# Speculative Tool Actions
Investigating whether speculative decoding can be adapted from token prediction to agent action prediction.
**Current state:** v2 evaluation complete (see [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md)). v3 datasets + 1.7B proposer trained. **Need:** train 4B verifier + 8B proposer, then run eval.
## Quick Start: Complete the Project
### One-command training (A100-large, ~2h):
```bash
python train_all_v3.py
```
Or via HF Jobs:
```python
hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"],
hardware_flavor="a100-large", timeout="12h")
```
### Then evaluate:
```bash
python eval_runner_v3.py
```
## Architecture
A cheap proposer (Qwen3-1.7B LoRA) predicts the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects the proposal; on rejection, the system falls back to the expensive 8B model.
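The cascade can be sketched as a single step function. This is an illustrative skeleton, not code from the repo: `propose`, `verify`, and `fallback` stand in for the actual model calls, and the return convention is an assumption.

```python
# Hypothetical sketch of the propose/verify/fallback cascade.
# `propose`, `verify`, and `fallback` are illustrative callables that
# would wrap the 1.7B, 4B, and 8B models respectively.

def speculative_step(state, propose, verify, fallback):
    """Run one agent step: cheap proposal, verifier gate, expensive fallback."""
    action = propose(state)           # cheap model proposes the next action
    if verify(state, action):         # verifier accepts or rejects
        return action, "cheap"
    return fallback(state), "expensive"  # rejected: defer to the 8B model
```

When the verifier's acceptance rate is high, most steps are served at the cheap model's cost, which is where the cost savings in the results below come from.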
**Action space:** `tool_call`, `retrieval`, `file_read`, `file_write`, `repair`, `verifier`, `ask_clarification`, `final_answer`, `BLOCKED`
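The nine labels above can be checked with a trivial validator. A minimal sketch; the container name `ACTIONS` and the helper are illustrative, not names from the repo:

```python
# The nine action labels from the action space above.
ACTIONS = {
    "tool_call", "retrieval", "file_read", "file_write", "repair",
    "verifier", "ask_clarification", "final_answer", "BLOCKED",
}

def is_valid_action(label: str) -> bool:
    """True iff `label` is one of the nine defined action types."""
    return label in ACTIONS
```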
## Files
| File | Purpose |
|------|---------|
| `train_all_v3.py` | Consolidated: trains 1.7B+4B+8B sequentially |
| `train_sft_v3.py` | Individual proposer training |
| `train_verifier_v3.py` | Individual verifier training |
| `eval_runner_v3.py` | Evaluation over all 5 configs |
| `PROJECT_REPORT.md` | Full project documentation + v2 results |
| `ABLATION_REPORT_v2.md` | v2 analysis (51% cheap vs 40% frozen 8B) |
| `eval_results_v2.json` | v2 raw results |
## v2 Results
| Config | Acc | Cost |
|--------|-----|------|
| A: 8B frozen | 40% | 1.00 |
| B: 1.7B cheap | **51%** | **0.15** |
| D: cheap + 4B RM | 51% | 0.25 |
| E: multi-proposal | 42% | 0.75 |
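One way to read the cost column is through a simple cascade cost model. This is a hedged sketch, not the accounting actually used in `eval_runner_v3.py`: per step, the proposer and verifier always run, and the expensive model runs only on the rejected fraction.

```python
# Illustrative per-step cost model for a propose/verify/fallback cascade.
# All parameter names are assumptions for exposition, with costs expressed
# relative to the expensive model (c_fallback = 1.0).

def expected_cost(p_accept: float, c_proposer: float,
                  c_verifier: float, c_fallback: float = 1.0) -> float:
    """Expected cost of one step: cheap path always runs, fallback runs on rejects."""
    return c_proposer + c_verifier + (1.0 - p_accept) * c_fallback
```

Under this model, a higher verifier acceptance rate drives the blended cost toward the cheap proposer's cost.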
See [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md) for analysis.
## v3 Status
| Component | Status |
|-----------|--------|
| Datasets (SFT, verifier, eval) | ✓ Built |
| 1.7B proposer | ✓ Trained |
| 4B verifier | ✗ Needs training |
| 8B proposer | ✗ Needs training |
| Eval runner | ✓ Ready |