# plugins

## plugin-registry-overview

The plugin registry is the canonical catalog of callable capabilities used by the scraper runtime and the agent tool planner.

Current registry snapshot:

| metric | value |
| --- | ---: |
| plugin-groups | 12 |
| total-tools | 82 |
| source-file | `backend/app/plugins/registry.py` |

## plugin-group-matrix

| plugin-id | category | tool-count | primary-purpose |
| --- | --- | ---: | --- |
| `browser` | `browser` | 8 | navigation and interaction actions |
| `html-parser` | `parser` | 13 | html and dom parsing/extraction |
| `data-processing` | `data` | 13 | json/csv/dataframe-style transforms |
| `regex` | `extraction` | 5 | pattern matching and text extraction |
| `network` | `network` | 5 | http/url operations |
| `media` | `media` | 4 | media and document extraction |
| `analysis` | `analysis` | 7 | schema/relevance/stats/text analysis |
| `extraction` | `extraction` | 8 | contact/date/price/entity extraction |
| `validation` | `validation` | 7 | url/json/schema/signal validation |
| `storage` | `storage` | 5 | memory and cache operations |
| `sandbox` | `ai` | 3 | sandboxed code execution |
| `ai` | `ai` | 4 | ai completion/embedding/classification |

## runtime-usage-model

```mermaid
flowchart TD
    A[scrape request] --> B[resolve enabled plugins]
    B --> C[agent tool planner]
    C --> D[plugin registry catalog]
    D --> E[selected tool calls]
    E --> F[tool executor]
    F --> G[tool results and context updates]
    G --> H[llm extraction code generation]
    H --> I[sandbox execution]
    I --> J[formatted output and complete event]
```

## request-and-selection-rules

| input-surface | behavior |
| --- | --- |
| `enable_plugins` | requested plugin ids from the request payload |
| plugin-resolver | filters the requested ids to installed plugins and returns the enabled and missing id lists |
| `selected_agents` | controls agent roles/modules, independent of plugin install state |
| runtime planner | chooses tools dynamically from registry metadata, not from fixed site templates |

## plugin-extension-checklist

1. add new `ToolDefinition` entries in `backend/app/plugins/registry.py`
2. ensure tool names use the namespace format (`namespace.action`)
3. provide parameter and return schemas in the registry entry
4. implement runtime behavior in the agent executor if the namespace is executable in-agent
5. expose and verify the behavior via scrape stream step events

## plugin-extension-flow

```mermaid
sequenceDiagram
    participant Dev as developer
    participant Reg as plugin-registry
    participant Planner as agent-tool-planner
    participant Exec as tool-executor
    participant Stream as scrape-stream
    Dev->>Reg: add ToolDefinition
    Reg-->>Planner: tool metadata available
    Planner->>Exec: select and call tool
    Exec-->>Stream: tool_call result in step event
    Stream-->>Dev: visible runtime behavior
```

## recently-added-tools

| namespace | tool-name | intent |
| --- | --- | --- |
| `html` | `html.extract_meta` | capture title and meta tags |
| `html` | `html.extract_jsonld` | parse structured json-ld blocks |
| `html` | `html.detect_repeating_blocks` | identify repeated dom structures |
| `data` | `data.dedupe_rows` | remove duplicate records |
| `data` | `data.rank_rows` | rank rows by a selected score field |
| `data` | `data.select_columns` | project rows to the requested columns |
| `analysis` | `analysis.infer_schema` | infer field types/nullability |
| `analysis` | `analysis.score_relevance` | score rows against instructions |
| `extract` | `extract.top_n` | keep the top-n records |
| `validate` | `validate.data_completeness` | completeness score by field |
| `validate` | `validate.row_signal` | estimate row quality signal |

## related-api-reference

| item | value |
| --- | --- |
| api-reference | `api-reference.md` |
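
## plugin-extension-sketch

The extension checklist and the plugin-resolver rule can be illustrated with a minimal Python sketch. This is not the contents of `backend/app/plugins/registry.py`. It assumes a `ToolDefinition` with hypothetical `name`, `description`, `parameters`, and `returns` fields, plus a hypothetical `PluginRegistry` class and `resolve_plugins` helper. The naming regex is one possible reading of the `namespace.action` convention, chosen so that every tool name listed above passes.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Any

# Assumption: lowercase `namespace.action` names, e.g. "html.extract_meta".
TOOL_NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*$")


@dataclass(frozen=True)
class ToolDefinition:
    """Hypothetical registry entry shape; the real class may differ."""

    name: str
    description: str
    parameters: dict[str, Any] = field(default_factory=dict)  # parameter schema
    returns: dict[str, Any] = field(default_factory=dict)  # return schema

    def __post_init__(self) -> None:
        # Checklist step 2: reject names that are not `namespace.action`.
        if not TOOL_NAME_PATTERN.match(self.name):
            raise ValueError(f"tool name {self.name!r} is not namespace.action")

    @property
    def namespace(self) -> str:
        return self.name.split(".", 1)[0]


class PluginRegistry:
    """Hypothetical catalog mapping plugin ids to their tool definitions."""

    def __init__(self) -> None:
        self._plugins: dict[str, list[ToolDefinition]] = {}

    def register(self, plugin_id: str, tools: list[ToolDefinition]) -> None:
        # Tool names must be unique across all plugins so planner calls are unambiguous.
        known = {t.name for ts in self._plugins.values() for t in ts}
        for tool in tools:
            if tool.name in known:
                raise ValueError(f"duplicate tool name {tool.name!r}")
            known.add(tool.name)
        # Only extend after every tool has passed the duplicate check.
        self._plugins.setdefault(plugin_id, []).extend(tools)

    def installed_ids(self) -> set[str]:
        return set(self._plugins)

    def catalog(self, enabled: list[str]) -> list[ToolDefinition]:
        """Tool metadata the planner may choose from, limited to enabled plugins."""
        return [tool for pid in enabled for tool in self._plugins.get(pid, [])]


def resolve_plugins(
    requested: list[str], installed: set[str]
) -> tuple[list[str], list[str]]:
    """Split requested ids into (enabled, missing), keeping order and dropping repeats."""
    enabled: list[str] = []
    missing: list[str] = []
    for plugin_id in dict.fromkeys(requested):
        (enabled if plugin_id in installed else missing).append(plugin_id)
    return enabled, missing


if __name__ == "__main__":
    registry = PluginRegistry()
    registry.register("html-parser", [
        ToolDefinition("html.extract_meta", "capture title and meta tags"),
        ToolDefinition("html.extract_jsonld", "parse structured json-ld blocks"),
    ])
    registry.register("data-processing", [
        ToolDefinition("data.dedupe_rows", "remove duplicate records"),
    ])
    enabled, missing = resolve_plugins(
        ["html-parser", "ai", "html-parser"], registry.installed_ids()
    )
    print(enabled, missing)  # ['html-parser'] ['ai']
    print([t.name for t in registry.catalog(enabled)])
```

Rejecting malformed or duplicate names at registration time keeps the catalog the planner reads unambiguous. The real registry may enforce these rules elsewhere, differently, or not at all.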