Speculative Tool Actions

Investigating whether speculative decoding can be adapted from token prediction to agent action prediction.

Current state: the v2 evaluation is complete (see ABLATION_REPORT_v2.md). For v3, the datasets are built and the 1.7B proposer is trained. Remaining work: train the 4B verifier and the 8B proposer, then run the evaluation.

Quick Start: Complete the Project

One-command training (A100-large, ~2h):

python train_all_v3.py

Or via HF Jobs:

hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
        dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"],
        hardware_flavor="a100-large", timeout="12h")

Then evaluate:

python eval_runner_v3.py

Architecture

A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects. On rejection, fall back to the expensive 8B model.

Action space: tool_call, retrieval, file_read, file_write, repair, verifier, ask_clarification, final_answer, BLOCKED
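The propose, verify, fall back flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the `Action` enum simply mirrors the action space listed above, and `speculative_step`, `Step`, and the three toy callables are hypothetical names standing in for the 1.7B proposer, the 4B verifier, and the 8B fallback model.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Action(str, Enum):
    """The nine action types listed in the README's action space."""
    TOOL_CALL = "tool_call"
    RETRIEVAL = "retrieval"
    FILE_READ = "file_read"
    FILE_WRITE = "file_write"
    REPAIR = "repair"
    VERIFIER = "verifier"
    ASK_CLARIFICATION = "ask_clarification"
    FINAL_ANSWER = "final_answer"
    BLOCKED = "BLOCKED"


@dataclass
class Step:
    action: Action
    source: str  # "proposer" if the draft was accepted, else "fallback"


def speculative_step(
    context: str,
    propose: Callable[[str], Action],       # cheap model (assumed interface)
    verify: Callable[[str, Action], bool],  # verifier (assumed interface)
    fallback: Callable[[str], Action],      # expensive model (assumed interface)
) -> Step:
    """Draft an action cheaply; only call the expensive model on rejection."""
    draft = propose(context)
    if verify(context, draft):
        return Step(draft, "proposer")
    return Step(fallback(context), "fallback")


# Toy stand-ins so the sketch runs without any model weights.
def toy_propose(context: str) -> Action:
    return Action.FILE_READ if "read" in context else Action.TOOL_CALL


def toy_verify(context: str, action: Action) -> bool:
    return action is Action.FILE_READ


def toy_fallback(context: str) -> Action:
    return Action.ASK_CLARIFICATION


accepted = speculative_step("read config.yaml", toy_propose, toy_verify, toy_fallback)
rejected = speculative_step("do something", toy_propose, toy_verify, toy_fallback)
print(accepted.source, accepted.action.value)  # proposer file_read
print(rejected.source, rejected.action.value)  # fallback ask_clarification
```

The expensive model is only invoked on the rejection path, which is where any cost saving would come from. How often that path is taken depends on the verifier's acceptance rate.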

Files

| File | Purpose |
| --- | --- |
| train_all_v3.py | Consolidated: trains 1.7B+4B+8B sequentially |
| train_sft_v3.py | Individual proposer training |
| train_verifier_v3.py | Individual verifier training |
| eval_runner_v3.py | All-5-configs evaluation |
| PROJECT_REPORT.md | Full project documentation + v2 results |
| ABLATION_REPORT_v2.md | v2 analysis (51% cheap vs 40% frozen 8B) |
| eval_results_v2.json | v2 raw results |

v2 Results

| Config | Acc | Cost |
| --- | --- | --- |
| A: 8B frozen | 40% | 1.00 |
| B: 1.7B cheap | 51% | 0.15 |
| D: cheap + 4B RM | 51% | 0.25 |
| E: multi-proposal | 42% | 0.75 |

See ABLATION_REPORT_v2.md for analysis.
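As a rough way to read the v2 numbers, the snippet below ranks the four reported configs by accuracy per unit cost. The ratio is an illustrative metric introduced here, not one taken from the ablation report, and it assumes the Cost column is a single comparable scalar per config, which the results suggest but do not define.

```python
# v2 results copied from the README: (accuracy, cost) per config.
results = {
    "A: 8B frozen": (0.40, 1.00),
    "B: 1.7B cheap": (0.51, 0.15),
    "D: cheap + 4B RM": (0.51, 0.25),
    "E: multi-proposal": (0.42, 0.75),
}


def accuracy_per_cost(acc: float, cost: float) -> float:
    """Illustrative efficiency ratio (assumption: cost is a comparable scalar)."""
    return acc / cost


# Sort configs from most to least accuracy per unit cost.
ranked = sorted(
    results.items(),
    key=lambda item: accuracy_per_cost(*item[1]),
    reverse=True,
)

for name, (acc, cost) in ranked:
    ratio = accuracy_per_cost(acc, cost)
    print(f"{name:<20} acc={acc:.2f} cost={cost:.2f} acc/cost={ratio:.2f}")
```

On these figures, config D matches B's accuracy at a higher cost, so in v2 the 4B verifier adds cost without an accuracy gain.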

v3 Status

| Component | Status |
| --- | --- |
| Datasets (SFT, verifier, eval) | ✓ Built |
| 1.7B proposer | ✓ Trained |
| 4B verifier | ✗ Needs training |
| 8B proposer | ✗ Needs training |
| Eval runner | ✓ Ready |