Add comprehensive README
README.md
ADDED
@@ -0,0 +1,108 @@
# Speculative Tool Actions

**Goal**: Test whether speculative decoding can be adapted from token prediction to agent action prediction.

**System**: A cheap model proposes candidate next actions (tool call, retrieval, file read, file write, repair attempt, verifier call, ask clarification, final answer, BLOCKED). A stronger model or verifier accepts, repairs, or rejects the proposal.

**Configurations Compared**:
- **A**: Always strong model (Qwen2.5-7B-Instruct)
- **B**: Cheap model only (Qwen3-1.7B)
- **C**: Cheap proposer + strong verifier
- **D**: Cheap proposer + trained trace judge (reward model)
- **E**: Multi-proposal reranking (3 cheap proposals scored by the strong model)
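
The propose/verify loop described above can be sketched in a few lines. This is a minimal illustration, not the repository's implementation: `Verdict`, `Decision`, `speculative_step`, and `rerank_step` are hypothetical names, the models are passed in as plain callables, and falling back to the strong model after a rejection is an assumption (the README does not say what happens after a reject).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    ACCEPT = "accept"
    REPAIR = "repair"
    REJECT = "reject"


@dataclass
class Decision:
    verdict: Verdict
    action: Optional[str] = None  # replacement action when verdict is REPAIR


def speculative_step(
    state: str,
    propose: Callable[[str], str],            # cheap model (hypothetical callable)
    verify: Callable[[str, str], Decision],   # strong model or trained judge
    fallback: Callable[[str], str],           # strong model acting alone
) -> str:
    """Return the next action for `state` after one propose/verify round."""
    proposal = propose(state)
    decision = verify(state, proposal)
    if decision.verdict is Verdict.ACCEPT:
        return proposal
    if decision.verdict is Verdict.REPAIR and decision.action is not None:
        return decision.action
    # Assumption: on rejection (or a repair with no replacement),
    # the strong model chooses the action itself.
    return fallback(state)


def rerank_step(
    state: str,
    propose_many: Callable[[str, int], List[str]],  # cheap model, k samples
    score: Callable[[str, str], float],             # strong scorer
    k: int = 3,
) -> str:
    """Configuration E style: draw k cheap proposals, keep the best-scored one."""
    candidates = propose_many(state, k)
    return max(candidates, key=lambda action: score(state, action))
```

Passing the models as callables keeps the sketch independent of any inference backend; configurations C and D would differ only in which `verify` callable is supplied.
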

## Repository Structure

| File | Description |
|------|-------------|
| `synthetic_data_and_train.py` | End-to-end pipeline: generate data, train proposer, train verifier, evaluate, report |
| `build_datasets_raw.py` | Dataset builder from SWE-smith + ToolBench |
| `train_proposer.py` | SFT training script for cheap proposer (Qwen3-1.7B + LoRA) |
| `train_verifier.py` | Reward model training script for verifier (Qwen3-4B + LoRA) |
| `eval_runner.py` | Evaluation runner for configs A-E |
| `pipeline_full.py` | Alternative full pipeline with real datasets |

## Datasets

- `narcolepticchicken/speculative-actions-proposer-sft`: SFT dataset for next-action prediction
- `narcolepticchicken/speculative-actions-verifier-pref`: Preference pairs for verifier training
- `narcolepticchicken/speculative-actions-eval`: Held-out evaluation set with gold labels

## Models

- `narcolepticchicken/speculative-proposer-qwen3-1.7b`: Cheap action proposer
- `narcolepticchicken/speculative-verifier-qwen3-4b`: Trained trace judge/verifier

## How to Run

### Generate Synthetic Data and Run the Full Pipeline

```bash
python synthetic_data_and_train.py
```

This single script:
1. Generates 5,500 synthetic agent traces covering 9 action types
2. Splits them into proposer SFT, verifier preference, and eval datasets
3. Pushes the datasets to the Hub
4. Trains the proposer (Qwen3-1.7B + LoRA)
5. Trains the verifier (Qwen3-4B + LoRA, using `RewardTrainer`)
6. Evaluates all 5 configurations (A-E) on 200 held-out examples
7. Generates a cost-quality frontier and ablation report
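
As a rough illustration of step 7, a cost-quality frontier can be computed as the set of configurations that no other configuration beats on both cost and quality. The sketch below is an assumption about what "frontier" means here (a Pareto frontier), not the script's actual code; `pareto_frontier` is a hypothetical helper, and the points are placeholders, not measured results.

```python
from typing import Dict, List, Tuple


def pareto_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of non-dominated points, sorted by cost.

    Each value is (cost, quality): lower cost and higher quality are better.
    A point is dominated if another point is at least as good on both axes
    and strictly better on at least one.
    """
    frontier = []
    for name, (cost, quality) in points.items():
        dominated = any(
            other_cost <= cost
            and other_quality >= quality
            and (other_cost < cost or other_quality > quality)
            for other, (other_cost, other_quality) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: points[n][0])


# Placeholder points for illustration only; these are NOT evaluation results.
placeholder_points = {
    "p1": (1.0, 0.90),
    "p2": (0.2, 0.60),
    "p3": (0.5, 0.55),  # dominated by p2: costs more, scores lower
    "p4": (0.6, 0.80),
}
frontier_names = pareto_frontier(placeholder_points)
```

With real evaluation output, the dictionary keys would be the configuration labels A-E and the values their measured cost and quality.
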

### Run on HF Jobs (GPU)

```python
# `hf_jobs` is not exported by the public `huggingface_hub` package.
# Assumption: `run_uv_job` (recent huggingface_hub releases) is the intended entry point.
from huggingface_hub import run_uv_job

job = run_uv_job(
    # Use /resolve/ (raw file) rather than /blob/ (HTML page) so the script itself is fetched.
    "https://huggingface.co/narcolepticchicken/speculative-tool-actions/resolve/main/synthetic_data_and_train.py",
    dependencies=["datasets", "transformers", "trl", "peft", "accelerate", "huggingface_hub", "trackio", "torch"],
    flavor="a10g-large",
    timeout="8h",
    # Assumption: the script reads its Trackio settings from these environment variables.
    env={
        "TRACKIO_PROJECT": "speculative-tool-actions",
        "TRACKIO_SPACE_ID": "narcolepticchicken/mlintern-7f3a9c2d",
    },
)
```

## Action Space

| Action | Description |
|--------|-------------|
| `tool_call` | Execute an external tool/API |
| `retrieval` | Search/retrieve information |
| `file_read` | Read a file from disk |
| `file_write` | Write/edit a file |
| `repair` | Attempt to fix an error/bug |
| `verifier` | Validate/check correctness |
| `ask_clarification` | Request more information from user |
| `final_answer` | Provide final response |
| `BLOCKED` | Refuse unsafe/invalid action |
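
The action labels in the table map naturally onto an enumeration. The sketch below is one possible encoding, not code from the repository: `ActionType` and `parse_action` are hypothetical names, and treating unrecognized labels as `BLOCKED` is an assumption.

```python
from enum import Enum


class ActionType(str, Enum):
    """The nine action labels from the Action Space table."""

    TOOL_CALL = "tool_call"
    RETRIEVAL = "retrieval"
    FILE_READ = "file_read"
    FILE_WRITE = "file_write"
    REPAIR = "repair"
    VERIFIER = "verifier"
    ASK_CLARIFICATION = "ask_clarification"
    FINAL_ANSWER = "final_answer"
    BLOCKED = "BLOCKED"


def parse_action(label: str) -> ActionType:
    """Map a raw model output to an ActionType, ignoring case and whitespace.

    Assumption: labels that match no known action are treated as BLOCKED;
    the repository may handle them differently.
    """
    cleaned = label.strip().lower()
    for action in ActionType:
        if cleaned == action.value.lower():
            return action
    return ActionType.BLOCKED
```

Normalizing the proposer's raw output this way gives the verifier a closed set of labels to compare against the gold action.
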

## Research Foundation

This work builds on:
- **DualSpec** (arXiv:2603.07416): Heterogeneous action speculation for deep research agents
- **TinyV** (arXiv:2505.14625): Lightweight LLM-based verifier for RL
- **Tool-Star** (arXiv:2505.16410): Multi-tool RL with cold-start + self-critic
- **DeepVerifier** (arXiv:2601.15808): Rubric-guided agent verification
- **EASD** (arXiv:2512.23765): Entropy-aware speculative decoding

## Cost Model

Relative token costs:
- Strong model (Qwen2.5-7B): input=1.0, output=1.0
- Cheap model (Qwen3-1.7B): input=0.2, output=0.2

Cost = input_tokens × input_cost + output_tokens × output_cost
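
The formula above translates directly into code. In this sketch the relative prices come from the list above, while `TokenPrice`, `call_cost`, `trace_cost`, and the token counts in `example_calls` are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class TokenPrice:
    input_cost: float
    output_cost: float


# Relative per-token prices from the cost model.
PRICES = {
    "strong": TokenPrice(input_cost=1.0, output_cost=1.0),  # Qwen2.5-7B
    "cheap": TokenPrice(input_cost=0.2, output_cost=0.2),   # Qwen3-1.7B
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input_tokens * input_cost + output_tokens * output_cost."""
    price = PRICES[model]
    return input_tokens * price.input_cost + output_tokens * price.output_cost


def trace_cost(calls: Iterable[Tuple[str, int, int]]) -> float:
    """Sum the cost of every (model, input_tokens, output_tokens) call in a trace."""
    return sum(call_cost(model, n_in, n_out) for model, n_in, n_out in calls)


# Hypothetical trace: two cheap calls followed by one strong call.
example_calls = [("cheap", 800, 40), ("cheap", 900, 40), ("strong", 1000, 60)]
example_total = trace_cost(example_calls)
```

Because the costs are relative, the totals are only meaningful for comparing configurations against each other, not as absolute prices.
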

## Citation

```bibtex
@software{speculative_tool_actions_2026,
  title = {Speculative Tool Actions},
  author = {ML Intern},
  year = {2026},
  url = {https://huggingface.co/narcolepticchicken/speculative-tool-actions}
}
```