| # Speculative Tool Actions |
|
|
| Investigating whether speculative decoding can be adapted from token prediction to agent action prediction. |
|
|
| **Current state:** v2 evaluation is complete (see [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md)). The v3 datasets are built and the 1.7B proposer is trained. **Remaining:** train the 4B verifier and the 8B proposer, then run the evaluation. |
|
|
| ## Quick Start: Complete the Project |
|
|
| ### One-command training (A100-large, ~2h): |
| ```bash |
| python train_all_v3.py |
| ``` |
|
|
| Or via HF Jobs: |
| ```python |
| hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py", |
| dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"], |
| hardware_flavor="a100-large", timeout="12h") |
| ``` |
|
|
| ### Then evaluate: |
| ```bash |
| python eval_runner_v3.py |
| ``` |
|
|
| ## Architecture |
|
|
| A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects the proposal. On rejection, the system falls back to the expensive 8B model. |
|
|
| **Action space:** `tool_call`, `retrieval`, `file_read`, `file_write`, `repair`, `verifier`, `ask_clarification`, `final_answer`, `BLOCKED` |
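The propose/verify/fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `eval_runner_v3.py`: the three models are stood in for by plain callables, and the names `speculative_step`, `StepResult`, and the `threshold` parameter are hypothetical. It also assumes the verifier returns a score that is compared to a threshold; the source only says the verifier accepts or rejects.

```python
from dataclasses import dataclass
from typing import Callable

# Action space as listed in this README.
ACTIONS = (
    "tool_call", "retrieval", "file_read", "file_write", "repair",
    "verifier", "ask_clarification", "final_answer", "BLOCKED",
)


@dataclass
class StepResult:
    action: str   # the action that will actually be executed
    source: str   # "proposer" if the cheap proposal was accepted, else "fallback"


def speculative_step(
    context: str,
    propose: Callable[[str], str],          # stand-in for the cheap 1.7B proposer
    verify: Callable[[str, str], float],    # stand-in for the 4B verifier (assumed to return a score)
    fallback: Callable[[str], str],         # stand-in for the expensive 8B model
    threshold: float = 0.5,                 # hypothetical acceptance threshold
) -> StepResult:
    """Run one speculative action step: propose, verify, fall back on rejection."""
    candidate = propose(context)
    # Proposals outside the action space are rejected without calling the verifier.
    if candidate in ACTIONS and verify(context, candidate) >= threshold:
        return StepResult(candidate, "proposer")
    # Rejected: pay for the expensive model.
    return StepResult(fallback(context), "fallback")


if __name__ == "__main__":
    # Stub models, for illustration only.
    result = speculative_step(
        "user asks to open config.yaml",
        propose=lambda ctx: "file_read",
        verify=lambda ctx, action: 0.9,
        fallback=lambda ctx: "ask_clarification",
    )
    print(result)
```

The expensive model is only invoked on rejection, so the cost saving depends on how often the verifier accepts the cheap proposal.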
|
|
| ## Files |
|
|
| | File | Purpose | |
| |------|---------| |
| | `train_all_v3.py` | Consolidated: trains 1.7B+4B+8B sequentially | |
| | `train_sft_v3.py` | Individual proposer training | |
| | `train_verifier_v3.py` | Individual verifier training | |
| | `eval_runner_v3.py` | All-5-configs evaluation | |
| | `PROJECT_REPORT.md` | Full project documentation + v2 results | |
| | `ABLATION_REPORT_v2.md` | v2 analysis (51% cheap vs 40% frozen 8B) | |
| | `eval_results_v2.json` | v2 raw results | |
|
|
| ## v2 Results |
|
|
| | Config | Accuracy | Relative cost | |
| |--------|-----|------| |
| | A: 8B frozen | 40% | 1.00 | |
| | B: 1.7B cheap | **51%** | **0.15** | |
| | D: cheap + 4B RM | 51% | 0.25 | |
| | E: multi-proposal | 42% | 0.75 | |
|
|
| See [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md) for analysis. |
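One quick way to compare the configs is accuracy per unit cost, computed from the table values above. This ratio is an illustrative metric chosen here, not one taken from the ablation report, and it assumes the cost column is normalized so that config A equals 1.00. The names `V2_RESULTS` and `accuracy_per_cost` are hypothetical.

```python
# (accuracy, cost) pairs copied from the v2 results table.
V2_RESULTS = {
    "A: 8B frozen": (0.40, 1.00),
    "B: 1.7B cheap": (0.51, 0.15),
    "D: cheap + 4B RM": (0.51, 0.25),
    "E: multi-proposal": (0.42, 0.75),
}


def accuracy_per_cost(results: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return accuracy divided by cost for each config."""
    return {name: acc / cost for name, (acc, cost) in results.items()}


# Rank configs from most to least accuracy per unit cost.
ranked = sorted(accuracy_per_cost(V2_RESULTS).items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    for name, ratio in ranked:
        print(f"{name}: {ratio:.2f}")
```

On these numbers config B ranks first, since adding the 4B verifier (config D) raises cost without changing accuracy in v2.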
|
|
| ## v3 Status |
|
|
| | Component | Status | |
| |-----------|--------| |
| | Datasets (SFT, verifier, eval) | Built | |
| | 1.7B proposer | Trained | |
| | 4B verifier | Needs training | |
| | 8B proposer | Needs training | |
| | Eval runner | Ready | |
|
|