# Speculative Tool Actions
Investigating whether speculative decoding can be adapted from token prediction to agent action prediction.
**Current state:** v2 evaluation is complete (see [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md)). For v3, the datasets are built and the 1.7B proposer is trained. **Remaining:** train the 4B verifier and the 8B proposer, then run the evaluation.
## Quick Start: Complete the Project
### One-command training (A100-large, ~2h):
```bash
python train_all_v3.py
```
Or via HF Jobs:
```python
hf_jobs(
    operation="run",
    script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
    dependencies=["transformers>=4.51", "trl", "torch", "datasets", "accelerate", "peft", "huggingface_hub"],
    hardware_flavor="a100-large",
    timeout="12h",
)
```
### Then evaluate:
```bash
python eval_runner_v3.py
```
## Architecture
A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects. On rejection, fall back to the expensive 8B model.
**Action space:** `tool_call`, `retrieval`, `file_read`, `file_write`, `repair`, `verifier`, `ask_clarification`, `final_answer`, `BLOCKED`
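The propose/verify/fallback flow described above can be sketched as a single decision step. This is a minimal illustration, not the code in `eval_runner_v3.py`: the names `speculative_step`, `Decision`, `propose`, `verify`, and `fallback` are hypothetical, and the three models are stood in for by plain callables. Treating a proposal outside the action space as a rejection is also an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Action space as listed in this README.
ACTIONS = (
    "tool_call", "retrieval", "file_read", "file_write", "repair",
    "verifier", "ask_clarification", "final_answer", "BLOCKED",
)


@dataclass(frozen=True)
class Decision:
    action: str
    source: str  # "proposer" if the cheap proposal was accepted, else "fallback"


def speculative_step(
    state: str,
    propose: Callable[[str], str],        # stand-in for the cheap 1.7B proposer
    verify: Callable[[str, str], bool],   # stand-in for the 4B verifier
    fallback: Callable[[str], str],       # stand-in for the expensive 8B model
) -> Decision:
    """Pick the next agent action, calling the expensive model only on rejection."""
    candidate = propose(state)
    # Assumption: an out-of-vocabulary proposal counts as a rejection.
    if candidate in ACTIONS and verify(state, candidate):
        return Decision(candidate, "proposer")
    return Decision(fallback(state), "fallback")
```

In this sketch the expensive model is never called when the verifier accepts, which is where any cost saving would come from.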
## Files
| File | Purpose |
|------|---------|
| `train_all_v3.py` | Consolidated: trains 1.7B+4B+8B sequentially |
| `train_sft_v3.py` | Individual proposer training |
| `train_verifier_v3.py` | Individual verifier training |
| `eval_runner_v3.py` | All-5-configs evaluation |
| `PROJECT_REPORT.md` | Full project documentation + v2 results |
| `ABLATION_REPORT_v2.md` | v2 analysis (51% cheap vs 40% frozen 8B) |
| `eval_results_v2.json` | v2 raw results |
## v2 Results
| Config | Accuracy | Cost (A = 1.00) |
|--------|-----|------|
| A: 8B frozen | 40% | 1.00 |
| B: 1.7B cheap | **51%** | **0.15** |
| D: cheap + 4B RM | 51% | 0.25 |
| E: multi-proposal | 42% | 0.75 |
See [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md) for analysis.
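As a rough way to reason about when a propose/verify/fallback cascade pays off, the sketch below computes an expected per-step cost under a simple additive model. This model is an assumption for illustration and is not necessarily how the cost column above was computed. The function names are hypothetical, and the example figures (0.15 for the proposer, 0.10 for the verifier, 1.00 for the 8B fallback) are illustrative values loosely based on the table, with the verifier cost inferred from the difference between configs D and B.

```python
def expected_cost(c_proposer: float, c_verifier: float,
                  c_fallback: float, accept_rate: float) -> float:
    """Expected cost per step: always pay proposer + verifier,
    and pay the fallback only when the proposal is rejected."""
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be in [0, 1]")
    return c_proposer + c_verifier + (1.0 - accept_rate) * c_fallback


def break_even_accept_rate(c_proposer: float, c_verifier: float,
                           c_fallback: float) -> float:
    """Acceptance rate at which the cascade costs the same as
    always calling the fallback model directly."""
    return (c_proposer + c_verifier) / c_fallback


# Illustrative values only (assumptions, see the paragraph above).
cost_at_60 = expected_cost(0.15, 0.10, 1.00, accept_rate=0.60)
threshold = break_even_accept_rate(0.15, 0.10, 1.00)
```

Under these assumed figures, the cascade is cheaper than always calling the 8B model whenever the verifier accepts more than a quarter of proposals. Whether accuracy holds up at that acceptance rate is a separate question that only the evaluation can answer.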
## v3 Status
| Component | Status |
|-----------|--------|
| Datasets (SFT, verifier, eval) | ✓ Built |
| 1.7B proposer | ✓ Trained |
| 4B verifier | ✗ Needs training |
| 8B proposer | ✗ Needs training |
| Eval runner | ✓ Ready |