narcolepticchicken committed on
Commit
8911258
·
verified ·
1 Parent(s): ba7590b

Add comprehensive README

Files changed (1)
  1. README.md +108 -0
README.md ADDED
@@ -0,0 +1,108 @@
# Speculative Tool Actions

**Goal**: Test whether speculative decoding can be adapted from token prediction to agent action prediction.

**System**: A cheap model proposes candidate next actions (tool call, retrieval, file read/write, repair attempt, verifier call, ask clarification, final answer, BLOCKED). A stronger model or verifier accepts, repairs, or rejects the proposal.
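
The propose/verify round described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the `Action` dataclass, the `speculative_step` function, and the toy callables are hypothetical names, and falling back to the strong model on rejection is an assumption that this README does not state.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class Action:
    kind: str     # action label, e.g. "file_read" or "final_answer"
    payload: str  # arguments for the action (hypothetical free-form string)


# The verifier returns a verdict plus an optional corrected action.
Verdict = Tuple[str, Optional[Action]]


def speculative_step(
    state: str,
    propose: Callable[[str], Action],
    verify: Callable[[str, Action], Verdict],
    fallback: Callable[[str], Action],
) -> Tuple[Action, str]:
    """Run one propose/verify round and return (action, outcome)."""
    proposal = propose(state)
    verdict, repaired = verify(state, proposal)
    if verdict == "accept":
        return proposal, "accept"
    if verdict == "repair" and repaired is not None:
        return repaired, "repair"
    # Assumption: on rejection, the strong model picks the action itself.
    return fallback(state), "reject"


# Toy stand-ins for the cheap and strong models, for illustration only.
def toy_propose(state: str) -> Action:
    return Action("file_read", "config.yaml")


def toy_verify(state: str, action: Action) -> Verdict:
    if "unsafe" in state:
        return "reject", None
    if action.payload == "config.yaml":
        # Pretend the verifier knows the correct path and repairs it.
        return "repair", Action(action.kind, "conf/config.yaml")
    return "accept", None


def toy_fallback(state: str) -> Action:
    return Action("BLOCKED", "")


action, outcome = speculative_step(
    "read the config", toy_propose, toy_verify, toy_fallback
)
print(outcome, action)
```

Returning the outcome label alongside the action makes it straightforward to log how often proposals are accepted, repaired, or rejected.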

**Configurations Compared**:
- **A**: Always strong model (Qwen2.5-7B-Instruct)
- **B**: Cheap model only (Qwen3-1.7B)
- **C**: Cheap proposer + strong verifier
- **D**: Cheap proposer + trained trace judge (reward model)
- **E**: Multi-proposal reranking (3 cheap + strong scoring)
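
As a rough illustration of configuration E, the sketch below draws several cheap proposals and keeps the one a scorer ranks highest. The function `propose_and_rerank`, the toy sampler, and the toy scorer are hypothetical; in the actual pipeline the sampler would presumably be the cheap model and the scorer the strong model.

```python
from typing import Callable, List


def propose_and_rerank(
    state: str,
    sample: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int = 3,
) -> str:
    """Draw n cheap proposals and return the one the scorer ranks highest."""
    candidates: List[str] = [sample(state) for _ in range(n)]
    # Drop duplicates while keeping first-seen order, so each distinct
    # candidate is scored only once.
    unique = list(dict.fromkeys(candidates))
    return max(unique, key=lambda action: score(state, action))


# Toy stand-ins: a sampler that walks through fixed labels and a lookup scorer.
_labels = iter(["retrieval", "tool_call", "retrieval"])


def toy_sample(state: str) -> str:
    return next(_labels)


def toy_score(state: str, action: str) -> float:
    return {"tool_call": 0.9, "retrieval": 0.4}.get(action, 0.0)


best = propose_and_rerank("look up the weather", toy_sample, toy_score)
print(best)
```

Deduplicating before scoring matters for cost: the scorer is the expensive call, so identical proposals should not be scored twice.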

## Repository Structure

| File | Description |
|------|-------------|
| `synthetic_data_and_train.py` | End-to-end pipeline: generate data, train proposer, train verifier, evaluate, report |
| `build_datasets_raw.py` | Dataset builder from SWE-smith + ToolBench |
| `train_proposer.py` | SFT training script for cheap proposer (Qwen3-1.7B + LoRA) |
| `train_verifier.py` | Reward model training script for verifier (Qwen3-4B + LoRA) |
| `eval_runner.py` | Evaluation runner for configs A-E |
| `pipeline_full.py` | Alternative full pipeline with real datasets |

## Datasets

- `narcolepticchicken/speculative-actions-proposer-sft`: SFT dataset for next-action prediction
- `narcolepticchicken/speculative-actions-verifier-pref`: Preference pairs for verifier training
- `narcolepticchicken/speculative-actions-eval`: Held-out evaluation set with gold labels

## Models

- `narcolepticchicken/speculative-proposer-qwen3-1.7b`: Cheap action proposer
- `narcolepticchicken/speculative-verifier-qwen3-4b`: Trained trace judge/verifier

## How to Run

### Generate Synthetic Data & Full Pipeline

```bash
python synthetic_data_and_train.py
```

This single script:
1. Generates 5,500 synthetic agent traces with 9 action types
2. Splits into proposer SFT, verifier preference, and eval datasets
3. Pushes datasets to Hub
4. Trains proposer (Qwen3-1.7B + LoRA)
5. Trains verifier (Qwen3-4B + LoRA RewardTrainer)
6. Evaluates all 5 configurations (A-E) on 200 held-out examples
7. Generates cost-quality frontier and ablation report
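
One way to compute a cost-quality frontier like the one in step 7 is to keep only the points that no other point beats on both axes. The sketch below assumes lower cost and higher quality are better; `pareto_frontier` is a hypothetical helper, and the numbers are placeholders chosen for illustration, not results from this project.

```python
from typing import Dict, List, Tuple


def pareto_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return names of non-dominated (cost, quality) points, cheapest first."""
    frontier = []
    for name, (cost, quality) in points.items():
        # A point is dominated if another point is at least as good on both
        # axes and strictly better on at least one.
        dominated = any(
            (c <= cost and q >= quality) and (c < cost or q > quality)
            for other, (c, q) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: points[n][0])


# Placeholder (cost, quality) pairs; NOT measurements from the evaluation.
points = {
    "p1": (1.0, 0.90),
    "p2": (0.2, 0.60),
    "p3": (0.5, 0.85),
    "p4": (0.6, 0.70),
}
print(pareto_frontier(points))
```

Here `p4` drops out because `p3` is both cheaper and higher quality; the remaining points each trade cost against quality.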

### Run on HF Jobs (GPU)

```python
from huggingface_hub import hf_jobs

hf_jobs.run(
    script="https://huggingface.co/narcolepticchicken/speculative-tool-actions/blob/main/synthetic_data_and_train.py",
    dependencies=["datasets", "transformers", "trl", "peft", "accelerate", "huggingface_hub", "trackio", "torch"],
    hardware_flavor="a10g-large",
    timeout="8h",
    trackio_project="speculative-tool-actions",
    trackio_space_id="narcolepticchicken/mlintern-7f3a9c2d",
)
```

## Action Space

| Action | Description |
|--------|-------------|
| `tool_call` | Execute an external tool/API |
| `retrieval` | Search/retrieve information |
| `file_read` | Read a file from disk |
| `file_write` | Write/edit a file |
| `repair` | Attempt to fix an error/bug |
| `verifier` | Validate/check correctness |
| `ask_clarification` | Request more information from user |
| `final_answer` | Provide final response |
| `BLOCKED` | Refuse unsafe/invalid action |
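
The nine labels in the table map naturally onto an enumeration. The sketch below is one possible encoding, not the repository's own: `ActionType` and `parse_action` are hypothetical names, and mapping unrecognized labels to `BLOCKED` is an assumption made here for safety.

```python
from enum import Enum


class ActionType(str, Enum):
    TOOL_CALL = "tool_call"
    RETRIEVAL = "retrieval"
    FILE_READ = "file_read"
    FILE_WRITE = "file_write"
    REPAIR = "repair"
    VERIFIER = "verifier"
    ASK_CLARIFICATION = "ask_clarification"
    FINAL_ANSWER = "final_answer"
    BLOCKED = "BLOCKED"


def parse_action(label: str) -> ActionType:
    """Map a model-emitted label to an ActionType.

    Assumption: labels outside the action space are treated as BLOCKED
    rather than raising, so a malformed proposal is never executed.
    """
    try:
        return ActionType(label.strip())
    except ValueError:
        return ActionType.BLOCKED


print(parse_action(" file_read "))
print(parse_action("delete_everything"))
```

Inheriting from `str` keeps the members directly comparable to the raw labels stored in the datasets.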

## Research Foundation

This work builds on:
- **DualSpec** (arXiv:2603.07416): Heterogeneous action speculation for deep research agents
- **TinyV** (arXiv:2505.14625): Lightweight LLM-based verifier for RL
- **Tool-Star** (arXiv:2505.16410): Multi-tool RL with cold-start + self-critic
- **DeepVerifier** (arXiv:2601.15808): Rubric-guided agent verification
- **EASD** (arXiv:2512.23765): Entropy-aware speculative decoding

## Cost Model

Relative token costs:
- Strong model (Qwen2.5-7B-Instruct): input=1.0, output=1.0
- Cheap model (Qwen3-1.7B): input=0.2, output=0.2

Cost = input_tokens × input_cost + output_tokens × output_cost
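
The formula above translates directly into a small helper. In this sketch the `Rate` dataclass, the `RATES` table, and `call_cost` are hypothetical names; the per-token rates are the relative costs listed above, and the token counts in the example are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rate:
    input_cost: float
    output_cost: float


# Relative per-token costs from the list above.
RATES = {
    "strong": Rate(input_cost=1.0, output_cost=1.0),  # Qwen2.5-7B-Instruct
    "cheap": Rate(input_cost=0.2, output_cost=0.2),   # Qwen3-1.7B
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost = input_tokens * input_cost + output_tokens * output_cost."""
    rate = RATES[model]
    return input_tokens * rate.input_cost + output_tokens * rate.output_cost


# Illustrative call: 1,000 prompt tokens and 50 generated tokens.
strong = call_cost("strong", 1000, 50)
cheap = call_cost("cheap", 1000, 50)
print(f"strong={strong:.1f} cheap={cheap:.1f} ratio={cheap / strong:.2f}")
```

Because both rates for the cheap model are 0.2, any single call costs one fifth of the same call on the strong model; a configuration that also invokes a verifier would add that call's cost on top.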

## Citation

```bibtex
@software{speculative_tool_actions_2026,
  title = {Speculative Tool Actions},
  author = {ML Intern},
  year = {2026},
  url = {https://huggingface.co/narcolepticchicken/speculative-tool-actions}
}
```