# Speculative Tool Actions

Investigating whether speculative decoding can be adapted from token prediction to agent action prediction.

**Current state:** v2 evaluation is complete (see [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md)). For v3, the datasets are built and the 1.7B proposer is trained. **Remaining:** train the 4B verifier and the 8B proposer, then run the evaluation.

## Quick Start: Complete the Project

### One-command training (A100-large, ~2h):
```bash
python train_all_v3.py
```

Or via HF Jobs:
```python
hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
        dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"],
        hardware_flavor="a100-large", timeout="12h")
```

### Then evaluate:
```bash
python eval_runner_v3.py
```

## Architecture

A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects. On rejection, fall back to the expensive 8B model.

**Action space:** `tool_call`, `retrieval`, `file_read`, `file_write`, `repair`, `verifier`, `ask_clarification`, `final_answer`, `BLOCKED`
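
The propose/verify/fallback flow described above can be sketched as a single decision step. This is a minimal illustration, not the code in `eval_runner_v3.py`: the names `speculative_step`, `Step`, and the `threshold` parameter are hypothetical, and it assumes the verifier returns a scalar acceptance score rather than a hard accept/reject label.

```python
from dataclasses import dataclass
from typing import Callable

# Action space as listed in this README.
ACTIONS = (
    "tool_call", "retrieval", "file_read", "file_write", "repair",
    "verifier", "ask_clarification", "final_answer", "BLOCKED",
)


@dataclass
class Step:
    action: str
    source: str  # "proposer" if the cheap proposal was accepted, else "fallback"


def speculative_step(
    context: str,
    propose: Callable[[str], str],        # stand-in for the cheap 1.7B proposer
    verify: Callable[[str, str], float],  # stand-in for the 4B verifier (assumed to return a score)
    fallback: Callable[[str], str],       # stand-in for the expensive 8B model
    threshold: float = 0.5,               # hypothetical acceptance threshold
) -> Step:
    """Accept the cheap proposal if the verifier scores it highly enough; otherwise fall back."""
    candidate = propose(context)
    # Proposals outside the action space are rejected without calling the verifier.
    if candidate in ACTIONS and verify(context, candidate) >= threshold:
        return Step(candidate, "proposer")
    return Step(fallback(context), "fallback")


# Usage with stub models in place of the real LoRA checkpoints.
accepted = speculative_step(
    "user asks to open config.yaml",
    propose=lambda ctx: "file_read",
    verify=lambda ctx, action: 0.9,
    fallback=lambda ctx: "tool_call",
)
rejected = speculative_step(
    "user asks to open config.yaml",
    propose=lambda ctx: "file_read",
    verify=lambda ctx, action: 0.1,
    fallback=lambda ctx: "tool_call",
)
```

In this sketch the 8B model is only invoked on rejection, which is where any cost saving would come from; how the actual runner batches or scores proposals may differ.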

## Files

| File | Purpose |
|------|---------|
| `train_all_v3.py` | Consolidated: trains 1.7B+4B+8B sequentially |
| `train_sft_v3.py` | Individual proposer training |
| `train_verifier_v3.py` | Individual verifier training |
| `eval_runner_v3.py` | All-5-configs evaluation |
| `PROJECT_REPORT.md` | Full project documentation + v2 results |
| `ABLATION_REPORT_v2.md` | v2 analysis (51% cheap vs 40% frozen 8B) |
| `eval_results_v2.json` | v2 raw results |

## v2 Results

| Config | Accuracy | Cost (8B frozen = 1.00) |
|--------|-----|------|
| A: 8B frozen | 40% | 1.00 |
| B: 1.7B cheap | **51%** | **0.15** |
| D: cheap + 4B RM | 51% | 0.25 |
| E: multi-proposal | 42% | 0.75 |

See [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md) for analysis.
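
One rough way to read the table is accuracy per unit of cost. The snippet below only re-derives ratios from the numbers in the table above; the metric itself and the names `v2_results` and `accuracy_per_cost` are illustrative choices, not part of the project's evaluation code.

```python
# (accuracy, cost) pairs copied from the v2 results table.
v2_results = {
    "A: 8B frozen": (0.40, 1.00),
    "B: 1.7B cheap": (0.51, 0.15),
    "D: cheap + 4B RM": (0.51, 0.25),
    "E: multi-proposal": (0.42, 0.75),
}


def accuracy_per_cost(results: dict) -> dict:
    """Return accuracy divided by cost for each config."""
    return {name: acc / cost for name, (acc, cost) in results.items()}


ratios = accuracy_per_cost(v2_results)
# Config names sorted from best to worst accuracy-per-cost.
ranked = sorted(ratios, key=ratios.get, reverse=True)

for name in ranked:
    print(f"{name}: {ratios[name]:.2f}")
```

By this ratio the cheap 1.7B config (B) ranks first and the frozen 8B config (A) last, though a single ratio ignores whether the absolute accuracy is acceptable for a given use.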

## v3 Status

| Component | Status |
|-----------|--------|
| Datasets (SFT, verifier, eval) | ✓ Built |
| 1.7B proposer | ✓ Trained |
| 4B verifier | ✗ Needs training |
| 8B proposer | ✗ Needs training |
| Eval runner | ✓ Ready |