Infrastructure
- Docker or Kubernetes runtime.
- API server, worker, Redis, PostgreSQL where configured, and browser service.
- Health checks for API, queue, and browser extraction.
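The three health checks above can be combined into one readiness report. The following is a minimal sketch, not Firecrawl's own health mechanism: `run_health_checks`, `overall_healthy`, and `http_probe` are hypothetical helper names, and any URL you pass to `http_probe` is an assumption about your own deployment.

```python
from typing import Callable, Dict
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 5.0) -> Callable[[], bool]:
    """Build a probe that passes when `url` answers with a 2xx status.

    The URL is whatever your deployment exposes; this sketch assumes nothing
    about Firecrawl's actual routes.
    """
    def probe() -> bool:
        with urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    return probe


def run_health_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each named probe and record 'ok', 'fail', or 'fail: <reason>'."""
    results: Dict[str, str] = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception as exc:  # a probe that raises counts as unhealthy
            results[name] = f"fail: {exc}"
    return results


def overall_healthy(results: Dict[str, str]) -> bool:
    """The deployment is healthy only if every component reported 'ok'."""
    return all(status == "ok" for status in results.values())


# Stub probes stand in for real API, queue, and browser checks.
report = run_health_checks({
    "api": lambda: True,
    "queue": lambda: True,
    "browser": lambda: False,
})
print(report, overall_healthy(report))
```

Keeping each component as a separate named probe means a failing browser service is reported on its own instead of being hidden behind a passing API check.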
Self-host Firecrawl
Self-hosting is useful when you need internal security boundaries or custom services. It also adds operational work: maintenance, queue and browser management, networking, and AGPL review.
- PORT, HOST, and authentication mode.

Firecrawl is primarily licensed under AGPL-3.0. If you modify and publicly run a network service based on AGPL-covered code, plan source availability, notices, license text, and legal review before launch.
The upstream self-host guide notes limitations around advanced managed capabilities. Public copy should not promise the same reliability, proxy coverage, or anti-blocking behavior as the official cloud unless you can prove it in your own deployment.
| Stage | Decision criterion | Evidence to keep |
|---|---|---|
| Local smoke | Can scrape a simple allowed URL. | Request, response, logs, queue status. |
| Browser path | Can handle JavaScript-heavy pages needed by the product. | Browser service logs, timeout settings, screenshot sample. |
| Scale path | Can crawl or batch scrape without queue collapse. | Limits, retry rules, memory and CPU observations. |
| Compliance path | Can enforce allowed domains, robots policy, and data boundaries. | Policy file, deny list, audit log sample. |
| License path | Can satisfy AGPL notices and source obligations. | License page, source link, modification changelog. |
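
For the compliance path, domain enforcement can be checked before a URL ever reaches the scraper. The following is a minimal sketch under stated assumptions: `is_url_allowed` and `host_matches` are hypothetical helpers, the allow and deny sets are illustrative, and this is not part of Firecrawl itself. It covers the domain allow and deny lists only, not robots policy or audit logging.

```python
from typing import Iterable
from urllib.parse import urlsplit


def host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def is_url_allowed(url: str,
                   allowed_domains: Iterable[str],
                   denied_domains: Iterable[str]) -> bool:
    """Allow only http(s) URLs on an allowed domain that is not denied.

    The deny list wins over the allow list, so a sensitive subdomain can be
    carved out of an otherwise allowed parent domain.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if not host:
        return False
    if any(host_matches(host, d) for d in denied_domains):
        return False
    return any(host_matches(host, d) for d in allowed_domains)


# Illustrative policy: values are placeholders, not recommendations.
ALLOWED = {"example.com"}
DENIED = {"private.example.com"}

for candidate in ("https://docs.example.com/guide",
                  "https://private.example.com/hr",
                  "https://evilexample.com/"):
    print(candidate, is_url_allowed(candidate, ALLOWED, DENIED))
```

Matching on the parsed hostname with a leading dot avoids treating `evilexample.com` as a subdomain of `example.com`. Robots rules would need a separate check, for example with the standard library's `urllib.robotparser`.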