- Docker-first and Docker-only unless user asks otherwise.
- Keep repo focused: stable Botasaurus scrape API wrapper, not generic framework.
- Endpoints:
GET /health,POST /scrape. - Stable legacy
/scrapefields:url,final_url,status_code,headers,html,error,metadata_error. - Additive diagnostics fields (current contract):
request_id,attempts,strategy_used,render_ms,blocked_detected,challenge_detected,error_category. - Request options (current contract):
navigation_mode,max_retries,wait_for_selector,wait_timeout_seconds,block_images. - Error codes:
400validation/resolution failure403SSRF guardrail block422request schema validation502scrape execution failure504timeout
POST /scrapeis async API over sync browser work (threadpool).- Each scrape request must use isolated runtime state:
- request-scoped runtime dir
/tmp/scrape/<request_id> - request-scoped browser profile
- no cache/profile/driver reuse across requests
- request-scoped runtime dir
- Cleanup is mandatory in
finally:- close browser driver
- delete request runtime dir
- remove in-memory active request id
- Keep request-id collision/invariant guard (
_active_request_ids) intact. driver.requests.getmetadata is best-effort; metadata failure must not fail HTML success.- Keep strategy engine behavior:
automode attempt order:google_get->google_get_bypass->get- do not alter retry semantics without docs/tests update
- Multi-arch image required:
- all architectures: Chromium install
- keep
/usr/bin/google-chromesymlink to Chromium for compatibility
- If browser install logic changes, re-verify binary path and Botasaurus startup.
- Keep SSRF guardrails: localhost/domain checks and blocked IP classes (loopback/private/link-local/multicast/reserved/unspecified).
- Do not weaken URL validation without explicit request plus docs/tests updates.
- Run
make smokebefore finish. make smokemust cover build, boot,/health,/scrapehappy path, strategy override, retry path, isolation check, localhost guardrail.- If API contract, Docker behavior, or error semantics changed, update README in same change.
- Keep commits scoped (infra vs API vs docs).