overview

purpose

This document is the top-level guide for the ScrapeRL documentation set. It explains what the platform does, how the main runtime surfaces connect, and where to find detailed references.

platform-summary

| dimension | summary |
| --- | --- |
| core-goal | AI-first scraping workflows with RL-style episodes and dynamic agent planning |
| backend | FastAPI control plane with episode, scrape, agent, plugin, memory, and provider APIs |
| frontend | React dashboard for task submission, stream monitoring, and result inspection |
| runtime-pattern | session-based execution with real-time step/tool_call stream events |
| output-targets | json, csv, markdown, and text |
| integrations | OpenAI, Anthropic, Google, Groq, NVIDIA, plugin tools, memory layers |

primary-runtime-flows

flowchart TD
    A[user-request] --> B[api-scrape-stream]
    B --> C[agent-decision]
    C --> D[tool-plan-and-execution]
    D --> E[llm-extraction-and-formatting]
    E --> F[complete-event]
    B --> G[session-status-and-artifacts]
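
The flow above can be read as a consumer loop: events arrive on the scrape stream until a complete event closes the session. Below is a minimal sketch in Python, assuming each event is a JSON object on its own line with a `type` field of `step`, `tool_call`, or `complete`. The field names, the `fetch_page` tool name, and the payload shape are illustrative assumptions; api-reference.md defines the actual stream/event contract.

```python
import json
from typing import Iterable


def consume_stream(lines: Iterable[str]) -> dict:
    """Fold a stream of JSON-line events into a small summary.

    Assumed (hypothetical) event shape: {"type": "...", ...}.
    """
    summary = {"steps": 0, "tool_calls": 0, "completed": False, "result": None}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        kind = event.get("type")
        if kind == "step":
            summary["steps"] += 1
        elif kind == "tool_call":
            summary["tool_calls"] += 1
        elif kind == "complete":
            summary["completed"] = True
            summary["result"] = event.get("result")
            break  # this sketch stops reading at the complete event
    return summary


# Hand-written sample events, not captured from a real session.
sample = [
    '{"type": "step", "index": 1}',
    '{"type": "tool_call", "name": "fetch_page"}',
    '{"type": "step", "index": 2}',
    '{"type": "complete", "result": {"format": "json"}}',
]
print(consume_stream(sample))
```

Because the function accepts any iterable of lines, the same loop could be fed from a live HTTP response body instead of the in-memory sample.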

documentation-navigation

| doc | focus-area |
| --- | --- |
| readme.md | documentation index |
| api-reference.md | complete endpoint catalog and stream/event contract |
| architecture.md | system topology, subsystem planes, reliability model |
| openenv.md | environment/action/observation/reward contract |
| features.md | advanced runtime features and toggles |
| memory.md | memory layers, storage, and operations |
| plugins.md | plugin registry and runtime tool-selection model |
| tool-calls.md | tool call payload schema and lifecycle |
| api.md | multi-model routing and provider behavior |
| settings.md | runtime setting controls and policy knobs |
| observability.md | telemetry/tracing/cost visibility |
| rewards.md | reward design and scoring structure |
| search-engine.md | search provider and retrieval routing details |
| mcp.md | MCP integration architecture |
| agents.md | agent roles and coordination model |

key-api-surfaces

| surface | endpoints |
| --- | --- |
| system-health | `/api/health`, `/api/ready`, `/api/ping` |
| episode-runtime | `/api/episode/reset`, `/api/episode/step`, `/api/episode/state/{episode_id}` |
| scrape-runtime | `/api/scrape/stream`, `/api/scrape/{session_id}/status`, `/api/scrape/{session_id}/result` |
| agent-tool-memory | `/api/agents/*`, `/api/tools/*`, `/api/plugins/*`, `/api/memory/*` |
| realtime-channel | `/ws/episode/{episode_id}` |

Use api-reference.md for full method/path listings.
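
The path templates listed above can be turned into concrete URLs with a small helper. This is a sketch only: the base URL `http://localhost:8000`, the dictionary keys, and the helper name `endpoint` are assumptions for illustration, not part of ScrapeRL. HTTP methods are left out because this overview does not list them.

```python
from urllib.parse import quote

# Path templates copied from the key-api-surfaces list; the keys are made up here.
PATHS = {
    "health": "/api/health",
    "episode_reset": "/api/episode/reset",
    "episode_step": "/api/episode/step",
    "episode_state": "/api/episode/state/{episode_id}",
    "scrape_stream": "/api/scrape/stream",
    "scrape_status": "/api/scrape/{session_id}/status",
    "scrape_result": "/api/scrape/{session_id}/result",
}


def endpoint(name: str, base_url: str = "http://localhost:8000", **params: str) -> str:
    """Build a full URL for a named surface (hypothetical helper)."""
    # Percent-encode path parameters so an ID stays inside one path segment.
    safe = {key: quote(value, safe="") for key, value in params.items()}
    return base_url.rstrip("/") + PATHS[name].format(**safe)


print(endpoint("health"))
print(endpoint("scrape_status", session_id="abc123"))
```

Keeping the templates in one mapping means a path change in api-reference.md needs to be mirrored in only one place on the client side.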

configuration-surfaces

| file | intent |
| --- | --- |
| .env.example | complete variable template for app + inference runtime |
| .env | local runtime values |
| docker-compose.yml | backend/frontend orchestration and env wiring |
| inference.py | OpenEnv-compliant inference entrypoint and stdout contract |
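
Since `.env.example` is the complete template and `.env` holds local values, one useful check is whether the local file defines every variable the template names. The sketch below does this with a deliberately simple `KEY=VALUE` parser. The variable names (`API_PORT`, `OPENAI_API_KEY`, `LOG_LEVEL`) are hypothetical placeholders, and the parser ignores features such as `export` prefixes and multi-line values that a full dotenv loader would handle.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any, from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(template_text: str, local_text: str) -> list:
    """Return template variables that the local file does not define."""
    template = parse_env(template_text)
    local = parse_env(local_text)
    return sorted(key for key in template if key not in local)


# Hypothetical file contents, for illustration only.
example = "# template\nAPI_PORT=8000\nOPENAI_API_KEY=\nLOG_LEVEL=info\n"
local = "API_PORT=9000\nLOG_LEVEL=debug\n"
print(missing_keys(example, local))
```

In a real checkout the two strings would come from reading `.env.example` and `.env` from disk.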

recommended-reading-order

  1. overview.md
  2. api-reference.md
  3. architecture.md
  4. openenv.md
  5. tool-calls.md
  6. plugins.md
  7. domain docs (memory.md, api.md, features.md, settings.md)

document-metadata

| key | value |
| --- | --- |
| document | overview.md |
| status | active |
| owner | platform-docs |