# plugins
## plugin-registry-overview
The plugin registry is the canonical catalog of callable capabilities used by the scraper runtime and agent tool planner.
Current registry snapshot:
| metric | value |
| --- | ---: |
| plugin-groups | 12 |
| total-tools | 82 |
| source-file | `backend/app/plugins/registry.py` |
## plugin-group-matrix
| plugin-id | category | tool-count | primary-purpose |
| --- | --- | ---: | --- |
| `browser` | `browser` | 8 | navigation and interaction actions |
| `html-parser` | `parser` | 13 | html and dom parsing/extraction |
| `data-processing` | `data` | 13 | json/csv/dataframe style transforms |
| `regex` | `extraction` | 5 | pattern matching and text extraction |
| `network` | `network` | 5 | http/url operations |
| `media` | `media` | 4 | media and document extraction |
| `analysis` | `analysis` | 7 | schema/relevance/stats/text analysis |
| `extraction` | `extraction` | 8 | contact/date/price/entity extraction |
| `validation` | `validation` | 7 | url/json/schema/signal validation |
| `storage` | `storage` | 5 | memory and cache operations |
| `sandbox` | `ai` | 3 | sandboxed code execution |
| `ai` | `ai` | 4 | ai completion/embedding/classification |
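A few lines of Python can cross-check the matrix against the registry snapshot above. The rows are copied from the table. The check is only an illustration and is not part of `registry.py`. It also shows that the `extraction` and `ai` categories each span two plugin groups.

```python
from collections import Counter

# (plugin-id, category, tool-count) rows copied from the plugin-group matrix
PLUGIN_GROUPS = [
    ("browser", "browser", 8),
    ("html-parser", "parser", 13),
    ("data-processing", "data", 13),
    ("regex", "extraction", 5),
    ("network", "network", 5),
    ("media", "media", 4),
    ("analysis", "analysis", 7),
    ("extraction", "extraction", 8),
    ("validation", "validation", 7),
    ("storage", "storage", 5),
    ("sandbox", "ai", 3),
    ("ai", "ai", 4),
]


def tools_per_category(groups):
    """Sum tool counts per category; several plugin groups can share one category."""
    totals = Counter()
    for _plugin_id, category, count in groups:
        totals[category] += count
    return dict(totals)


if __name__ == "__main__":
    print(len(PLUGIN_GROUPS))                                # 12 plugin groups
    print(sum(count for _, _, count in PLUGIN_GROUPS))       # 82 tools
    print(tools_per_category(PLUGIN_GROUPS)["extraction"])   # 13 (regex + extraction)
```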
## runtime-usage-model
```mermaid
flowchart TD
A[scrape request] --> B[resolve enabled plugins]
B --> C[agent tool planner]
C --> D[plugin registry catalog]
D --> E[selected tool calls]
E --> F[tool executor]
F --> G[tool results and context updates]
G --> H[llm extraction code generation]
H --> I[sandbox execution]
I --> J[formatted output and complete event]
```
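Steps C to G of the flowchart can be made concrete with a toy Python loop. `Tool`, `plan`, `execute`, and the keyword-overlap selection rule are hypothetical. This page says only that the planner reads registry metadata, not how it scores tools. Steps H and I (LLM code generation and sandbox execution) are left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Tool:
    name: str                          # namespace.action, e.g. "data.dedupe_rows"
    description: str                   # metadata the planner reads
    run: Callable[[List[Row]], Any]    # executable behavior


def plan(catalog: List[Tool], instructions: str) -> List[Tool]:
    """Placeholder planner: keep tools whose description shares a word with the instructions."""
    words = set(instructions.lower().split())
    return [t for t in catalog if words & set(t.description.lower().split())]


def execute(selected: List[Tool], rows: List[Row]) -> Dict[str, Any]:
    """Run each selected tool; results keyed by tool name act as the context update."""
    return {t.name: t.run(rows) for t in selected}


def _dedupe(rows: List[Row]) -> List[Row]:
    # Order-preserving de-duplication; assumes field values are hashable.
    return [dict(k) for k in dict.fromkeys(tuple(sorted(r.items())) for r in rows)]


CATALOG = [
    Tool("data.dedupe_rows", "remove duplicate records", _dedupe),
    # Stand-in signal: share of non-empty rows (not the registry's real definition).
    Tool("validate.row_signal", "estimate row quality signal",
         lambda rows: sum(1 for r in rows if r) / len(rows) if rows else 0.0),
]

if __name__ == "__main__":
    rows = [{"sku": 1}, {"sku": 1}, {"sku": 2}]
    selected = plan(CATALOG, "remove duplicate products")
    print([t.name for t in selected])  # ['data.dedupe_rows']
    print(execute(selected, rows))     # {'data.dedupe_rows': [{'sku': 1}, {'sku': 2}]}
```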
## request-and-selection-rules
| input-surface | behavior |
| --- | --- |
| `enable_plugins` | requested plugin ids from the request payload |
| plugin-resolver | filters the requested ids against installed plugin ids and returns two lists: enabled (installed) and missing (requested but not installed) |
| `selected_agents` | controls agent roles/modules, independent of plugin install state |
| runtime planner | chooses tools dynamically from registry metadata rather than from fixed site templates |
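The plugin-resolver row describes a simple set split. The following is a minimal Python sketch. It assumes the installed plugin ids are available as a list and that request order should be preserved, neither of which this page specifies. `resolve_plugins` and the `pdf-tools` id are hypothetical. The sketch does not touch `selected_agents`, matching the table.

```python
from typing import Iterable, List, Tuple

# Plugin ids from the plugin-group matrix, used here as the "installed" set.
INSTALLED_PLUGINS = [
    "browser", "html-parser", "data-processing", "regex", "network", "media",
    "analysis", "extraction", "validation", "storage", "sandbox", "ai",
]


def resolve_plugins(requested: Iterable[str],
                    installed: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split requested plugin ids into (enabled, missing), keeping request order.

    Repeated ids in the request are collapsed to their first occurrence.
    """
    installed_set = set(installed)
    enabled: List[str] = []
    missing: List[str] = []
    for plugin_id in dict.fromkeys(requested):  # order-preserving de-duplication
        if plugin_id in installed_set:
            enabled.append(plugin_id)
        else:
            missing.append(plugin_id)
    return enabled, missing


if __name__ == "__main__":
    # "pdf-tools" is a made-up id that is not installed.
    print(resolve_plugins(["browser", "pdf-tools", "regex", "browser"], INSTALLED_PLUGINS))
    # (['browser', 'regex'], ['pdf-tools'])
```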
## plugin-extension-checklist
1. add new `ToolDefinition` entries in `backend/app/plugins/registry.py`
2. ensure tool names use namespace format (`namespace.action`)
3. provide parameter and return schemas in the registry entry
4. implement the runtime behavior in the agent executor if tools in the namespace run in-agent
5. expose the tool's calls and results in scrape-stream step events and verify the behavior there
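Checklist items 1 to 3 can be sketched as a dataclass that rejects names outside the `namespace.action` format. The field names (`parameters`, `returns`), the lowercase snake_case naming rule, and the JSON-Schema-style dictionaries are assumptions. The real `ToolDefinition` in `backend/app/plugins/registry.py` may be shaped differently. The example entry reuses `extract.top_n` from the recently added tools.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumed naming rule: lowercase snake_case namespace and action joined by one dot.
TOOL_NAME = re.compile(r"[a-z][a-z0-9_]*\.[a-z][a-z0-9_]*")


@dataclass
class ToolDefinition:
    """Hypothetical shape of a registry entry."""
    name: str
    description: str
    parameters: Dict[str, Any] = field(default_factory=dict)  # JSON-Schema-style input
    returns: Dict[str, Any] = field(default_factory=dict)     # JSON-Schema-style output

    def __post_init__(self) -> None:
        if not TOOL_NAME.fullmatch(self.name):
            raise ValueError(f"tool name {self.name!r} must use namespace.action format")

    @property
    def namespace(self) -> str:
        return self.name.split(".", 1)[0]


TOP_N = ToolDefinition(
    name="extract.top_n",
    description="keep top-n records",
    parameters={
        "type": "object",
        "properties": {
            "rows": {"type": "array"},
            "n": {"type": "integer", "minimum": 1},
        },
        "required": ["rows", "n"],
    },
    returns={"type": "array"},
)
```

Validating names at construction time means a malformed entry fails when the registry module is imported, not later when the planner tries to call it.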
## plugin-extension-flow
```mermaid
sequenceDiagram
participant Dev as developer
participant Reg as plugin-registry
participant Planner as agent-tool-planner
participant Exec as tool-executor
participant Stream as scrape-stream
Dev->>Reg: add ToolDefinition
Reg-->>Planner: tool metadata available
Planner->>Exec: select and call tool
Exec-->>Stream: tool_call result in step event
Stream-->>Dev: visible runtime behavior
```
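The last two arrows of the diagram depend on the step event carrying the tool call and its result. This page does not define the event schema or the transport. The sketch below assumes a server-sent-events style stream and a hypothetical payload with `type`, `step`, and `tool_call` fields. Both helper names are illustrative. Round-tripping an event is one way to check that a new tool's name and result actually reach the stream.

```python
import json
from typing import Any, Dict


def tool_call_step_event(step: int, tool: str, args: Dict[str, Any], result: Any) -> str:
    """Serialize one tool call as a server-sent-events frame (hypothetical schema)."""
    payload = {
        "type": "step",
        "step": step,
        "tool_call": {"name": tool, "arguments": args, "result": result},
    }
    # json.dumps without indent emits a single line, so one data: line suffices.
    return f"event: step\ndata: {json.dumps(payload)}\n\n"


def parse_step_event(frame: str) -> Dict[str, Any]:
    """Join the frame's data: lines and decode them as JSON."""
    data = [line[len("data: "):] for line in frame.splitlines() if line.startswith("data: ")]
    return json.loads("\n".join(data))


if __name__ == "__main__":
    frame = tool_call_step_event(3, "html.extract_meta", {"url": "https://example.com"},
                                 {"title": "Example Domain"})
    event = parse_step_event(frame)
    print(event["tool_call"]["name"])  # html.extract_meta
```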
## recently-added-tools
| namespace | tool-name | intent |
| --- | --- | --- |
| `html` | `html.extract_meta` | capture title and meta tags |
| `html` | `html.extract_jsonld` | parse structured json-ld blocks |
| `html` | `html.detect_repeating_blocks` | identify repeated dom structures |
| `data` | `data.dedupe_rows` | remove duplicate records |
| `data` | `data.rank_rows` | rank rows by selected score field |
| `data` | `data.select_columns` | project rows to requested columns |
| `analysis` | `analysis.infer_schema` | infer field types/nullability |
| `analysis` | `analysis.score_relevance` | score rows against instructions |
| `extract` | `extract.top_n` | keep top-n records |
| `validate` | `validate.data_completeness` | completeness score by field |
| `validate` | `validate.row_signal` | estimate row quality signal |
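The row-level tools in this table have intents simple enough to sketch in plain Python over lists of dicts. The function names mirror the tool names. The signatures (`keys`, `score_field`, `descending`, `fields`) and the "non-empty" rule are assumptions, and the registry's parameter schemas may differ.

```python
from typing import Any, Dict, List, Optional, Sequence

Row = Dict[str, Any]


def dedupe_rows(rows: List[Row], keys: Optional[Sequence[str]] = None) -> List[Row]:
    """data.dedupe_rows: keep the first row for each distinct combination of key fields."""
    seen = set()
    out: List[Row] = []
    for row in rows:
        fields = keys if keys is not None else sorted(row)
        # repr() lets unhashable values (lists, dicts) take part in the signature
        signature = tuple((f, repr(row.get(f))) for f in fields)
        if signature not in seen:
            seen.add(signature)
            out.append(row)
    return out


def rank_rows(rows: List[Row], score_field: str, descending: bool = True) -> List[Row]:
    """data.rank_rows: sort by a numeric score field; rows without a numeric score go last."""
    def has_score(r: Row) -> bool:
        return isinstance(r.get(score_field), (int, float))
    scored = sorted((r for r in rows if has_score(r)),
                    key=lambda r: r[score_field], reverse=descending)
    return scored + [r for r in rows if not has_score(r)]


def select_columns(rows: List[Row], columns: Sequence[str]) -> List[Row]:
    """data.select_columns: project each row onto the requested columns (missing -> None)."""
    return [{c: row.get(c) for c in columns} for row in rows]


def top_n(rows: List[Row], n: int) -> List[Row]:
    """extract.top_n: keep the first n rows, so rank first if order matters."""
    return rows[: max(n, 0)]


def data_completeness(rows: List[Row],
                      fields: Optional[Sequence[str]] = None) -> Dict[str, float]:
    """validate.data_completeness: fraction of rows with a non-empty value per field."""
    if not rows:
        return {}
    if fields is None:
        fields = sorted({k for r in rows for k in r})

    def filled(v: Any) -> bool:
        return v is not None and v != "" and v != [] and v != {}

    return {f: sum(filled(r.get(f)) for r in rows) / len(rows) for f in fields}


if __name__ == "__main__":
    rows = [
        {"name": "A", "price": 10, "score": 0.9},
        {"name": "A", "price": 10, "score": 0.9},
        {"name": "B", "price": None, "score": 0.4},
        {"name": "C", "price": 7, "score": 0.7},
    ]
    unique = dedupe_rows(rows)
    best = top_n(rank_rows(unique, "score"), 2)
    print(select_columns(best, ["name", "price"]))
    # [{'name': 'A', 'price': 10}, {'name': 'C', 'price': 7}]
    print(data_completeness(unique))
```

Chaining `dedupe_rows`, `rank_rows`, `top_n`, and `select_columns` in that order matches how the intents compose: `extract.top_n` only keeps the "top" records if the rows were ranked first.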
## related-api-reference
| item | value |
| --- | --- |
| api-reference | `api-reference.md` |