overview
purpose
This document is the top-level guide for the ScrapeRL documentation set. It explains what the platform does, how the main runtime surfaces connect, and where to find detailed references.
platform-summary
| dimension | summary |
|---|---|
| core-goal | AI-first scraping workflows with RL-style episodes and dynamic agent planning |
| backend | FastAPI control plane with episode, scrape, agent, plugin, memory, and provider APIs |
| frontend | React dashboard for task submission, stream monitoring, and result inspection |
| runtime-pattern | session-based execution with real-time step/tool_call stream events |
| output-targets | json, csv, markdown, and text |
| integrations | OpenAI, Anthropic, Google, Groq, NVIDIA, plugin tools, memory layers |
primary-runtime-flows
flowchart TD
A[user-request] --> B[api-scrape-stream]
B --> C[agent-decision]
C --> D[tool-plan-and-execution]
D --> E[llm-extraction-and-formatting]
E --> F[complete-event]
B --> G[session-status-and-artifacts]
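The flow above ends with a complete event after a series of step and tool_call events. As a rough illustration of how a client might consume such a stream, here is a minimal sketch. It assumes the stream is framed as SSE-style `data: {...}` lines and that each event carries a `type` field; neither detail is confirmed by this overview, and the field names `index`, `tool`, and `status` are hypothetical. See api-reference.md for the actual stream/event contract.

```python
import json
from typing import Iterable, Iterator


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield decoded events from SSE-style lines, stopping after 'complete'."""
    for raw in lines:
        line = raw.strip()
        # Assumed framing: each event arrives as one "data: {...}" line.
        if not line.startswith("data:"):
            continue  # skip blank keep-alives, comments, and other fields
        event = json.loads(line[len("data:"):].strip())
        yield event
        # Assumed convention: the event type is stored under the "type" key.
        if event.get("type") == "complete":
            return


def summarize(lines: Iterable[str]) -> dict:
    """Count events per type and keep the final 'complete' event."""
    counts: dict = {}
    result = None
    for event in iter_events(lines):
        kind = event.get("type", "unknown")
        counts[kind] = counts.get(kind, 0) + 1
        if kind == "complete":
            result = event
    return {"counts": counts, "complete": result}


# Hypothetical sample stream; the last line is ignored because it
# arrives after the complete event.
sample = [
    'data: {"type": "step", "index": 1}',
    "",
    'data: {"type": "tool_call", "tool": "example_tool"}',
    'data: {"type": "complete", "status": "ok"}',
    'data: {"type": "step", "index": 99}',
]
summary = summarize(sample)
print(summary["counts"])
```

Because `iter_events` accepts any iterable of strings, the same function could be fed from an HTTP client's line iterator without changes to the parsing logic.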
documentation-navigation
| doc | focus-area |
|---|---|
| readme.md | documentation index |
| api-reference.md | complete endpoint catalog and stream/event contract |
| architecture.md | system topology, subsystem planes, reliability model |
| openenv.md | environment/action/observation/reward contract |
| features.md | advanced runtime features and toggles |
| memory.md | memory layers, storage, and operations |
| plugins.md | plugin registry and runtime tool-selection model |
| tool-calls.md | tool call payload schema and lifecycle |
| api.md | multi-model routing and provider behavior |
| settings.md | runtime setting controls and policy knobs |
| observability.md | telemetry/tracing/cost visibility |
| rewards.md | reward design and scoring structure |
| search-engine.md | search provider and retrieval routing details |
| mcp.md | mcp integration architecture |
| agents.md | agent roles and coordination model |
key-api-surfaces
| surface | endpoints |
|---|---|
| system-health | /api/health, /api/ready, /api/ping |
| episode-runtime | /api/episode/reset, /api/episode/step, /api/episode/state/{episode_id} |
| scrape-runtime | /api/scrape/stream, /api/scrape/{session_id}/status, /api/scrape/{session_id}/result |
| agent-tool-memory | /api/agents/*, /api/tools/*, /api/plugins/*, /api/memory/* |
| realtime-channel | /ws/episode/{episode_id} |
Use api-reference.md for full method/path listings.
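Several of the paths above contain placeholders such as `{session_id}` and `{episode_id}`. The following sketch shows one way a client could turn those templates into full URLs. The path templates are copied from the table; the dictionary keys, the `build_url` helper, and the `http://localhost:8000` base URL are hypothetical and chosen only for illustration.

```python
from urllib.parse import quote, urljoin

# Path templates copied from the table above; the keys are hypothetical labels.
ENDPOINTS = {
    "health": "/api/health",
    "episode_reset": "/api/episode/reset",
    "episode_step": "/api/episode/step",
    "episode_state": "/api/episode/state/{episode_id}",
    "scrape_stream": "/api/scrape/stream",
    "scrape_status": "/api/scrape/{session_id}/status",
    "scrape_result": "/api/scrape/{session_id}/result",
}


def build_url(base_url: str, name: str, **params: str) -> str:
    """Fill a path template and join it onto the base URL."""
    template = ENDPOINTS[name]
    # Percent-encode each identifier so it cannot alter the path structure.
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    path = template.format(**encoded)
    # Normalize slashes so a base URL with or without a trailing "/" works.
    return urljoin(base_url.rstrip("/") + "/", path.lstrip("/"))


# Assumed local base URL; substitute the address of your deployment.
print(build_url("http://localhost:8000", "scrape_status", session_id="abc123"))
```

Keeping the templates in one mapping means a path change only has to be made in one place, and a missing placeholder value raises a `KeyError` instead of silently producing a malformed URL.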
configuration-surfaces
| file | intent |
|---|---|
| .env.example | complete variable template for app + inference runtime |
| .env | local runtime values |
| docker-compose.yml | backend/frontend orchestration and env wiring |
| inference.py | OpenEnv-compliant inference entrypoint and stdout contract |
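Since .env.example is described as the complete variable template and .env holds local values, a quick consistency check between the two can catch missing settings before startup. The sketch below is a simplified illustration, not part of the project: it handles only plain `KEY=value` lines, and the variable names `EXAMPLE_API_KEY`, `EXAMPLE_PORT`, and `EXAMPLE_MODEL` are hypothetical placeholders rather than real ScrapeRL settings.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=value lines, ignoring blanks and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes; no escape handling in this sketch.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(template_text: str, local_text: str) -> list:
    """Return template keys that the local file does not define, sorted."""
    template = parse_env(template_text)
    local = parse_env(local_text)
    return sorted(key for key in template if key not in local)


# Hypothetical file contents; in practice these would be read from
# .env.example and .env on disk.
template_text = """
# provider credentials
EXAMPLE_API_KEY=
EXAMPLE_PORT=8000
EXAMPLE_MODEL="example-model"
"""
local_text = """
EXAMPLE_PORT=9000
"""
print(missing_keys(template_text, local_text))
```

For real files, the two strings could be loaded with `pathlib.Path(".env.example").read_text()` and `pathlib.Path(".env").read_text()`.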
recommended-reading-order
1. overview.md
2. api-reference.md
3. architecture.md
4. openenv.md
5. tool-calls.md
6. plugins.md
7. domain docs (memory.md, api.md, features.md, settings.md)
document-metadata
| key | value |
|---|---|
| document | overview.md |
| status | active |
| owner | platform-docs |