agentic-scraper-sandbox-plugin-execution-report
goal
Enable scraper as an agent that can:
- search from non-URL prompts,
- navigate and scrape links,
- execute plugin-based Python analysis (`numpy`, `pandas`, `bs4`) safely,
- run in a sandboxed per-request environment with cleanup.
what-was-implemented
- Added sandbox plugin executor: `backend/app/plugins/python_sandbox.py`
  - AST safety validation (restricted imports and blocked dangerous calls/attributes)
  - isolated execution with `python -I`
  - per-request temp workspace
  - automatic cleanup after execution
- Wired sandbox plugin execution into scrape flow (`/api/scrape/stream` and `/api/scrape/` via shared pipeline):
  - `mcp-python-sandbox`
  - `proc-python`
  - `proc-pandas`
  - `proc-numpy`
  - `proc-bs4`
- Added optional request field:
  - `python_code` (sandboxed code, must assign `result`)
- Enhanced non-URL asset resolution:
  - MCP search attempt via DuckDuckGo provider
  - deterministic fallback resolution for scraper workflows
- Updated plugin registry and installed plugin set for new plugins.
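
The AST safety validation step listed above could look roughly like the sketch below. It is not the contents of `python_sandbox.py`. The allowlist, the blocked-name set, and the function name `validate_code` are assumptions chosen to match the libraries and restrictions named in this report.

```python
import ast

# Assumption: the allowlist mirrors the libraries named in this report.
ALLOWED_IMPORTS = {"numpy", "pandas", "bs4"}
# Assumption: an illustrative set of builtins to reject outright.
BLOCKED_NAMES = {"open", "exec", "eval", "compile", "input"}


def validate_code(source: str) -> list[str]:
    """Return a list of violations; an empty list means the code passed."""
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error: {exc.msg}"]

    problems: list[str] = []
    assigns_result = False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name.split(".")[0] not in ALLOWED_IMPORTS:
                    problems.append(f"import not allowed: {alias.name}")
        elif isinstance(node, ast.ImportFrom):
            root = (node.module or "").split(".")[0]
            # Relative imports (level > 0) are rejected as well.
            if node.level or root not in ALLOWED_IMPORTS:
                problems.append(f"import not allowed: {node.module or '.'}")
        elif isinstance(node, ast.Attribute):
            if node.attr.startswith("__"):
                problems.append(f"dunder access not allowed: {node.attr}")
        elif isinstance(node, ast.Name):
            # Rejecting the bare name covers both calls and aliasing (f = open).
            if node.id in BLOCKED_NAMES or node.id.startswith("__"):
                problems.append(f"name not allowed: {node.id}")
            elif node.id == "result" and isinstance(node.ctx, ast.Store):
                assigns_result = True
    if not assigns_result:
        problems.append("code must assign `result`")
    return problems
```

Static checks of this kind are a first filter only. They complement the isolated process and the temp workspace and do not replace them.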
safety-model
- Sandbox runs in isolated temp directory per request (`scraperl-sandbox-<session>-*`)
- Dangerous operations blocked by static AST checks (`open`, `exec`, `eval`, `subprocess`, `os`-style operations, dunder access, etc.)
- No persistent artifacts are kept after run (workspace removed in `finally` cleanup).
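
A minimal sketch of the per-request workspace and `finally` cleanup described above, assuming the user code is written to a file and run in a child interpreter with `python -I`. The function name `run_sandboxed`, the `main.py` filename, the JSON-on-stdout convention, and the timeout value are illustrative assumptions, not details confirmed by this report.

```python
import json
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_sandboxed(code: str, session: str, timeout: float = 10.0) -> dict:
    """Run `code` in a throwaway workspace and return its `result` value."""
    # Directory prefix follows the pattern named in the safety model.
    workspace = Path(tempfile.mkdtemp(prefix=f"scraperl-sandbox-{session}-"))
    try:
        script = workspace / "main.py"  # hypothetical filename
        # Append a small epilogue that prints `result` as one JSON line.
        script.write_text(
            code + "\nimport json as _json\nprint(_json.dumps(result))\n"
        )
        proc = subprocess.run(
            [sys.executable, "-I", str(script)],  # -I: isolated mode
            cwd=workspace,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if proc.returncode != 0:
            return {"ok": False, "error": proc.stderr.strip()}
        # The last stdout line is the JSON epilogue; earlier lines are user prints.
        last_line = proc.stdout.strip().splitlines()[-1]
        return {"ok": True, "result": json.loads(last_line)}
    finally:
        # Runs on success, failure, and timeout alike.
        shutil.rmtree(workspace, ignore_errors=True)
```

Because the removal sits in `finally`, the workspace is deleted even when the child process fails or `subprocess.run` raises `TimeoutExpired`.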
one-request-validation-real-curl-n-runs
Each test was executed with a single request to `POST /api/scrape/stream`.
| Test | Status | Errors | URLs Processed | Python Analysis Present | Dataset Row Count |
|---|---|---|---|---|---|
| gold-csv-agentic | completed | 0 | 2 | true | 123 |
| ev-data-search-json | completed | 0 | 6 | true | - |
| direct-dataset-python-analysis | completed | 0 | 1 | true | 123 |
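
For illustration, a single validation request of this kind could be assembled as sketched below. Only the endpoint path and the `python_code` field come from this report. The base URL, the `query` field name, and the helper `build_stream_request` are hypothetical. The sketch builds the request object without sending it.

```python
import json
import urllib.request


def build_stream_request(
    base_url: str, prompt: str, python_code: str
) -> urllib.request.Request:
    """Assemble one POST /api/scrape/stream request with sandboxed code."""
    payload = {
        "query": prompt,  # hypothetical field name for the non-URL prompt
        "python_code": python_code,  # must assign `result`
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/scrape/stream",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local base URL and an illustrative prompt.
req = build_stream_request(
    "http://localhost:8000",
    "gold price trend",
    "import pandas as pd\nresult = {'rows': 0}",
)
```

Passing `req` to `urllib.request.urlopen` would send it. How the streamed response is framed is not specified here, so reading it is left out.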
notes
- Gold trend request produced monthly dataset rows from 2016 onward with source links in one stream request.
- Python plugin analysis was present in all validation scenarios.
- The agent step stream included planner, search, navigator, extractor, and verifier events, plus sandbox analysis events.
document-flow
flowchart TD
A[document] --> B[key-sections]
B --> C[implementation]
B --> D[operations]
B --> E[validation]
related-api-reference
| item | value |
|---|---|
| api-reference | api-reference.md |