Speculative Tool Actions

This project investigates whether speculative decoding can be adapted from token prediction to agent action prediction.

Current state: the v2 evaluation is complete (see ABLATION_REPORT_v2.md). For v3, the datasets are built and the 1.7B proposer is trained. Remaining work: train the 4B verifier and the 8B proposer, then run the evaluation.

Quick Start: Complete the Project

One-command training (A100-large, ~2h):

python train_all_v3.py

Or via HF Jobs:

hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
        dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"],
        hardware_flavor="a100-large", timeout="12h")

Then evaluate:

python eval_runner_v3.py

Architecture

A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects the proposal. On rejection, the system falls back to the expensive 8B model.

Action space: tool_call, retrieval, file_read, file_write, repair, verifier, ask_clarification, final_answer, BLOCKED
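
The propose / verify / fall-back flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repo: `Action`, `StepResult`, `speculative_step`, and the three callables (`propose`, `verify`, `fallback`) are hypothetical names, and the real models are replaced by plain Python functions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(str, Enum):
    """The nine action labels listed in the action space (names are illustrative)."""
    TOOL_CALL = "tool_call"
    RETRIEVAL = "retrieval"
    FILE_READ = "file_read"
    FILE_WRITE = "file_write"
    REPAIR = "repair"
    VERIFIER = "verifier"
    ASK_CLARIFICATION = "ask_clarification"
    FINAL_ANSWER = "final_answer"
    BLOCKED = "BLOCKED"


@dataclass
class StepResult:
    action: Action
    source: str  # "proposer" if the cheap proposal was accepted, else "fallback"


def speculative_step(
    context: str,
    propose: Callable[[str], Action],        # stand-in for the cheap 1.7B proposer
    verify: Callable[[str, Action], bool],   # stand-in for the 4B verifier
    fallback: Callable[[str], Action],       # stand-in for the expensive 8B model
) -> StepResult:
    """Propose cheaply, verify, and only call the expensive model on rejection."""
    candidate = propose(context)
    if verify(context, candidate):
        return StepResult(candidate, "proposer")
    return StepResult(fallback(context), "fallback")


if __name__ == "__main__":
    # Toy stubs: the verifier rejects any file_write proposal.
    result = speculative_step(
        "user asked to read config.yaml",
        propose=lambda ctx: Action.FILE_READ,
        verify=lambda ctx, action: action is not Action.FILE_WRITE,
        fallback=lambda ctx: Action.ASK_CLARIFICATION,
    )
    print(result.action.value, result.source)
```

In this sketch the expensive model is only invoked when the verifier rejects, which is where any cost saving would come from. How the verifier scores a proposal (threshold, reward score, or classification) is left open here.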

Files

File	Purpose
`train_all_v3.py`	Consolidated: trains 1.7B+4B+8B sequentially
`train_sft_v3.py`	Individual proposer training
`train_verifier_v3.py`	Individual verifier training
`eval_runner_v3.py`	All-5-configs evaluation
`PROJECT_REPORT.md`	Full project documentation + v2 results
`ABLATION_REPORT_v2.md`	v2 analysis (51% cheap vs 40% frozen 8B)
`eval_results_v2.json`	v2 raw results

v2 Results

Config	Acc	Cost
A: 8B frozen	40%	1.00
B: 1.7B cheap	51%	0.15
D: cheap + 4B RM	51%	0.25
E: multi-proposal	42%	0.75

See ABLATION_REPORT_v2.md for analysis.
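
One quick way to read the table is accuracy per unit of relative cost. The snippet below only re-uses the numbers from the table above; the ratio itself is an illustrative metric chosen for this sketch, not one taken from the ablation report, and `V2_RESULTS` and `accuracy_per_cost` are hypothetical names.

```python
# (accuracy, relative cost) per config, copied from the v2 results table.
V2_RESULTS = {
    "A: 8B frozen": (0.40, 1.00),
    "B: 1.7B cheap": (0.51, 0.15),
    "D: cheap + 4B RM": (0.51, 0.25),
    "E: multi-proposal": (0.42, 0.75),
}


def accuracy_per_cost(acc: float, cost: float) -> float:
    """Accuracy divided by relative cost (an illustrative metric only)."""
    return acc / cost


if __name__ == "__main__":
    for name, (acc, cost) in V2_RESULTS.items():
        print(f"{name}: {accuracy_per_cost(acc, cost):.2f}")
```

On the v2 numbers, config D matches config B's accuracy at a higher cost, so by this ratio B comes out ahead of D.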

v3 Status

Component	Status
Datasets (SFT, verifier, eval)	✓ Built
1.7B proposer	✓ Trained
4B verifier	✗ Needs training
8B proposer	✗ Needs training
Eval runner	✓ Ready