Speculative Tool Actions
Investigating whether speculative decoding can be adapted from token prediction to agent action prediction.
Current state: v2 evaluation is complete (see ABLATION_REPORT_v2.md). The v3 datasets are built and the 1.7B proposer is trained. Remaining: train the 4B verifier and the 8B proposer, then run the evaluation.
Quick Start: Complete the Project
One-command training (A100-large, ~2h):
python train_all_v3.py
Or via HF Jobs:
hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"],
hardware_flavor="a100-large", timeout="12h")
Then evaluate:
python eval_runner_v3.py
Architecture
A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects the proposal. On rejection, the system falls back to the expensive 8B model.
Action space: tool_call, retrieval, file_read, file_write, repair, verifier, ask_clarification, final_answer, BLOCKED
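The propose/verify/fallback flow described above can be sketched as follows. This is a minimal illustration, not the project's actual code: `speculative_step`, the `Step` record, the three callables, and the `threshold` parameter are all hypothetical names, and the verifier is assumed to return a score in [0, 1].

```python
from dataclasses import dataclass
from typing import Callable

# Action space as listed in this README.
ACTIONS = (
    "tool_call", "retrieval", "file_read", "file_write", "repair",
    "verifier", "ask_clarification", "final_answer", "BLOCKED",
)


@dataclass
class Step:
    action: str
    source: str  # "proposer" if the cheap proposal was accepted, else "fallback"


def speculative_step(
    context: str,
    propose: Callable[[str], str],        # hypothetical: cheap 1.7B proposer
    verify: Callable[[str, str], float],  # hypothetical: 4B verifier score in [0, 1]
    fallback: Callable[[str], str],       # hypothetical: expensive 8B model
    threshold: float = 0.5,               # assumed acceptance threshold
) -> Step:
    """Accept the cheap proposal if the verifier approves; otherwise fall back."""
    candidate = propose(context)
    # Reject malformed proposals outright, without spending a verifier call.
    if candidate in ACTIONS and verify(context, candidate) >= threshold:
        return Step(candidate, "proposer")
    return Step(fallback(context), "fallback")


# Usage with stub models standing in for the real ones.
accepted = speculative_step(
    "read config.yaml",
    propose=lambda ctx: "file_read",
    verify=lambda ctx, action: 0.9,
    fallback=lambda ctx: "tool_call",
)
rejected = speculative_step(
    "read config.yaml",
    propose=lambda ctx: "file_write",
    verify=lambda ctx, action: 0.1,
    fallback=lambda ctx: "file_read",
)
```

Checking membership in `ACTIONS` before calling the verifier is a design choice of this sketch: it means an out-of-vocabulary proposal goes straight to the fallback model.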
Files
| File | Purpose |
|---|---|
| train_all_v3.py | Consolidated: trains 1.7B+4B+8B sequentially |
| train_sft_v3.py | Individual proposer training |
| train_verifier_v3.py | Individual verifier training |
| eval_runner_v3.py | All-5-configs evaluation |
| PROJECT_REPORT.md | Full project documentation + v2 results |
| ABLATION_REPORT_v2.md | v2 analysis (51% cheap vs 40% frozen 8B) |
| eval_results_v2.json | v2 raw results |
v2 Results
| Config | Acc | Cost |
|---|---|---|
| A: 8B frozen | 40% | 1.00 |
| B: 1.7B cheap | 51% | 0.15 |
| D: cheap + 4B RM | 51% | 0.25 |
| E: multi-proposal | 42% | 0.75 |
See ABLATION_REPORT_v2.md for analysis.
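To reason about when the fallback path pays off, the cost column can be folded into a simple expected-cost model. This is a rough sketch under stated assumptions, not the accounting used in the ablation report: it assumes costs are relative to the 8B model (1.00), that the verifier adds 0.10 (the difference between configs D and B), and that every rejection triggers one 8B call. `expected_cost` and `accept_rate` are hypothetical names.

```python
def expected_cost(
    proposer_cost: float,
    verifier_cost: float,
    fallback_cost: float,
    accept_rate: float,
) -> float:
    """Expected per-action cost when rejected proposals fall back to the big model.

    Assumed model: the proposer and verifier always run; the fallback runs
    only on rejection, i.e. with probability (1 - accept_rate).
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    return proposer_cost + verifier_cost + (1.0 - accept_rate) * fallback_cost


# Illustrative values: 0.15 and 1.00 come from the table above; 0.10 is the
# assumed verifier overhead (config D minus config B).
all_accepted = expected_cost(0.15, 0.10, 1.00, accept_rate=1.0)
mostly_accepted = expected_cost(0.15, 0.10, 1.00, accept_rate=0.8)
```

Under these assumptions the pipeline is cheaper than always calling the 8B model whenever the acceptance rate exceeds 0.25, since 0.25 + (1 - accept_rate) * 1.00 < 1.00 only then.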
v3 Status
| Component | Status |
|---|---|
| Datasets (SFT, verifier, eval) | ✓ Built |
| 1.7B proposer | ✓ Trained |
| 4B verifier | ✗ Needs training |
| 8B proposer | ✗ Needs training |
| Eval runner | ✓ Ready |