narcolepticchicken/speculative-tool-actions


Commit 8911258 (verified; parent ba7590b) by narcolepticchicken, committed 3 days ago: "Add comprehensive README" (adds README.md, +108 -0).

# Speculative Tool Actions

**Goal**: Test whether speculative decoding can be adapted from token prediction to agent action prediction.

**System**: A cheap model proposes candidate next actions (tool call, retrieval, file read/write, repair attempt, verifier call, clarification request, final answer, or BLOCKED). A stronger model or verifier then accepts, repairs, or rejects each proposal.

**Configurations compared**:

- **A**: Strong model only (Qwen2.5-7B-Instruct)
- **B**: Cheap model only (Qwen3-1.7B)
- **C**: Cheap proposer + strong verifier
- **D**: Cheap proposer + trained trace judge (reward model)
- **E**: Multi-proposal reranking (three cheap proposals, scored by the strong model)
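
The accept/repair/reject loop behind configurations C and E can be sketched in a few lines of Python. This is a minimal illustration, not the repository's implementation: the `Action` and `Verdict` types, the function names, and the choice to fall back to the strong model when a draft is rejected are all assumptions, and the models are passed in as plain callables.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Action:
    """Hypothetical action record; field names are illustrative."""
    kind: str           # one of the nine action types, e.g. "file_read"
    argument: str = ""  # tool arguments, file path, answer text, ...


ACCEPT, REPAIR, REJECT = "accept", "repair", "reject"


@dataclass
class Verdict:
    decision: str                      # ACCEPT, REPAIR or REJECT
    repaired: Optional[Action] = None  # replacement action when decision == REPAIR


def speculative_step(context: str,
                     propose: Callable[[str], Action],
                     verify: Callable[[str, Action], Verdict],
                     strong_act: Callable[[str], Action]) -> Action:
    """Config C: the cheap model drafts one action; the strong verifier rules on it."""
    draft = propose(context)
    verdict = verify(context, draft)
    if verdict.decision == ACCEPT:
        return draft
    if verdict.decision == REPAIR and verdict.repaired is not None:
        return verdict.repaired
    # Assumed fallback: on rejection, the strong model produces the action itself.
    return strong_act(context)


def rerank_step(context: str,
                propose: Callable[[str], Action],
                score: Callable[[str, Action], float],
                k: int = 3) -> Action:
    """Config E: draw k cheap proposals and keep the one the strong scorer rates highest."""
    candidates: List[Action] = [propose(context) for _ in range(k)]
    return max(candidates, key=lambda a: score(context, a))


# Toy stand-ins for the models (hypothetical).
draft_model = lambda ctx: Action("file_read", "src/main.py")
judge = lambda ctx, a: Verdict(ACCEPT if a.kind == "file_read" else REJECT)
strong_model = lambda ctx: Action("final_answer", "done")
print(speculative_step("fix the failing test", draft_model, judge, strong_model))
# Action(kind='file_read', argument='src/main.py')
```

Passing the models as callables keeps the control flow testable with stubs; in a real run they would wrap the proposer, the strong model, or the trained judge.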

## Repository Structure

| File | Description |
|------|-------------|
| `synthetic_data_and_train.py` | End-to-end pipeline: generate data, train proposer, train verifier, evaluate, report |
| `build_datasets_raw.py` | Builds datasets from SWE-smith and ToolBench |
| `train_proposer.py` | SFT training script for the cheap proposer (Qwen3-1.7B + LoRA) |
| `train_verifier.py` | Reward-model training script for the verifier (Qwen3-4B + LoRA) |
| `eval_runner.py` | Evaluation runner for configurations A-E |
| `pipeline_full.py` | Alternative full pipeline using real datasets |

## Datasets

- `narcolepticchicken/speculative-actions-proposer-sft`: SFT dataset for next-action prediction
- `narcolepticchicken/speculative-actions-verifier-pref`: Preference pairs for verifier training
- `narcolepticchicken/speculative-actions-eval`: Held-out evaluation set with gold labels
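
A verifier preference pair can be built by pairing the gold next action with a wrong one. The sketch below assumes a `prompt`/`chosen`/`rejected` record and a JSON action encoding; both are hypothetical, since the actual schema of `speculative-actions-verifier-pref` is not described here, and the real pairs may use harder negatives than a random wrong action type.

```python
import json
import random
from typing import Dict

# The nine action labels listed in the Action Space table.
ACTION_TYPES = ["tool_call", "retrieval", "file_read", "file_write", "repair",
                "verifier", "ask_clarification", "final_answer", "BLOCKED"]


def make_preference_pair(context: str, gold_action: str,
                         rng: random.Random) -> Dict[str, str]:
    """Chosen = the gold next action; rejected = a random different action type."""
    if gold_action not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {gold_action}")
    wrong = rng.choice([a for a in ACTION_TYPES if a != gold_action])
    return {
        "prompt": context,  # hypothetical field layout
        "chosen": json.dumps({"action": gold_action}),
        "rejected": json.dumps({"action": wrong}),
    }


pair = make_preference_pair("NameError raised in utils.py", "file_read", random.Random(0))
print(pair["chosen"])    # {"action": "file_read"}
print(pair["rejected"])
```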

## Models

- `narcolepticchicken/speculative-proposer-qwen3-1.7b`: Cheap action proposer
- `narcolepticchicken/speculative-verifier-qwen3-4b`: Trained trace judge/verifier

## How to Run

### Generate Synthetic Data & Run the Full Pipeline

```bash
python synthetic_data_and_train.py
```

This single script:

1. Generates 5,500 synthetic agent traces covering the 9 action types
2. Splits them into proposer SFT, verifier preference, and evaluation datasets
3. Pushes the datasets to the Hub
4. Trains the proposer (Qwen3-1.7B + LoRA)
5. Trains the verifier (Qwen3-4B + LoRA, using TRL's `RewardTrainer`)
6. Evaluates all 5 configurations (A-E) on 200 held-out examples
7. Generates the cost-quality frontier and an ablation report
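
A cost-quality frontier in the usual (Pareto) sense keeps only the configurations that no other configuration beats on both cost and quality. A minimal sketch, assuming each configuration is summarized by a single (cost, quality) pair; the function name and the toy numbers are illustrative, not results from this project:

```python
from typing import Dict, List, Tuple


def pareto_frontier(points: Dict[str, Tuple[float, float]]) -> List[str]:
    """Names of non-dominated points, sorted by cost.

    points maps a configuration name to (cost, quality); lower cost and
    higher quality are better. A point is dominated if another point is at
    least as good on both axes and strictly better on one.
    """
    frontier = []
    for name, (cost, quality) in points.items():
        dominated = any(
            c <= cost and q >= quality and (c < cost or q > quality)
            for other, (c, q) in points.items()
            if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier, key=lambda n: points[n][0])


# Toy (cost, quality) values for illustration only; not measured results.
toy = {"p": (1.0, 0.80), "q": (0.2, 0.55), "r": (0.5, 0.78), "s": (0.6, 0.70)}
print(pareto_frontier(toy))  # ['q', 'r', 'p']
```

Here `s` drops out because `r` is both cheaper and better; every remaining point trades cost against quality.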
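
### Run on HF Jobs (GPU)

Using `run_uv_job` from a recent `huggingface_hub` release (requires an account with access to HF Jobs):

```python
from huggingface_hub import get_token, run_uv_job

job = run_uv_job(
    # Point at the raw file (resolve/), not the HTML page (blob/).
    script="https://huggingface.co/narcolepticchicken/speculative-tool-actions/resolve/main/synthetic_data_and_train.py",
    dependencies=["datasets", "transformers", "trl", "peft", "accelerate",
                  "huggingface_hub", "trackio", "torch"],
    flavor="a10g-large",
    timeout="8h",
    # The script pushes datasets and models to the Hub, so it needs a write token.
    secrets={"HF_TOKEN": get_token()},
)
# Trackio settings (project "speculative-tool-actions", Space
# narcolepticchicken/mlintern-7f3a9c2d) are not run_uv_job parameters;
# pass them to the script's own Trackio setup.
print(job.id)
```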

## Action Space

| Action | Description |
|--------|-------------|
| `tool_call` | Execute an external tool/API |
| `retrieval` | Search/retrieve information |
| `file_read` | Read a file from disk |
| `file_write` | Write/edit a file |
| `repair` | Attempt to fix an error/bug |
| `verifier` | Validate/check correctness |
| `ask_clarification` | Request more information from the user |
| `final_answer` | Provide the final response |
| `BLOCKED` | Refuse an unsafe/invalid action |
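
Because every proposal must land in this fixed action space, a parser that maps raw proposer output onto the nine labels is a natural guardrail. A sketch, assuming the proposer emits JSON with an `action` field (a hypothetical format) and treating anything unparseable or unknown as `BLOCKED` (an assumed policy, not necessarily the repository's):

```python
import json
from enum import Enum


class ActionType(str, Enum):
    TOOL_CALL = "tool_call"
    RETRIEVAL = "retrieval"
    FILE_READ = "file_read"
    FILE_WRITE = "file_write"
    REPAIR = "repair"
    VERIFIER = "verifier"
    ASK_CLARIFICATION = "ask_clarification"
    FINAL_ANSWER = "final_answer"
    BLOCKED = "BLOCKED"


def parse_proposal(raw: str) -> ActionType:
    """Map a proposer's raw JSON output to an ActionType.

    Malformed JSON, a missing "action" key, a non-object payload, or an
    unknown action name all map to BLOCKED (assumed policy), so a bad draft
    is never executed as a real action.
    """
    try:
        name = json.loads(raw)["action"]
        return ActionType(name)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        return ActionType.BLOCKED


print(parse_proposal('{"action": "file_read", "path": "setup.py"}').value)  # file_read
print(parse_proposal("read setup.py please").value)                         # BLOCKED
```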

## Research Foundation

This work builds on:

- **DualSpec** (arXiv:2603.07416): Heterogeneous action speculation for deep research agents
- **TinyV** (arXiv:2505.14625): Lightweight LLM-based verifier for RL
- **Tool-Star** (arXiv:2505.16410): Multi-tool RL with cold-start + self-critic
- **DeepVerifier** (arXiv:2601.15808): Rubric-guided agent verification
- **EASD** (arXiv:2512.23765): Entropy-aware speculative decoding

## Cost Model

Relative per-token costs:

- Strong model (Qwen2.5-7B): input = 1.0, output = 1.0
- Cheap model (Qwen3-1.7B): input = 0.2, output = 0.2

Per model call: `cost = input_tokens × input_cost + output_tokens × output_cost`
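
Applying the formula per model call and summing over calls gives a per-step cost for each configuration. The sketch hard-codes the relative prices above; the helper names, the token counts, and the assumption that the config C verifier reads the context plus the draft and writes a short verdict are illustrative only.

```python
from typing import Dict, List, Tuple

# Relative per-token prices from the cost model above.
PRICES: Dict[str, Dict[str, float]] = {
    "strong": {"input": 1.0, "output": 1.0},  # Qwen2.5-7B
    "cheap": {"input": 0.2, "output": 0.2},   # Qwen3-1.7B
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """cost = input_tokens * input_cost + output_tokens * output_cost"""
    price = PRICES[model]
    return input_tokens * price["input"] + output_tokens * price["output"]


def total_cost(calls: List[Tuple[str, int, int]]) -> float:
    """Sum call_cost over every (model, input_tokens, output_tokens) call in one step."""
    return sum(call_cost(m, i, o) for m, i, o in calls)


# Hypothetical token counts for one agent step (illustrative, not measured).
a = total_cost([("strong", 1200, 60)])                      # config A: strong model acts
b = total_cost([("cheap", 1200, 60)])                       # config B: cheap model acts
c = total_cost([("cheap", 1200, 60), ("strong", 1260, 5)])  # config C: draft, then verify
print(a, b, c)
```

With these toy counts C comes out above A, because the verifier re-reads the whole context at the strong-model rate; the comparison therefore hinges on how many tokens verification actually consumes.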

## Citation

```bibtex
@software{speculative_tool_actions_2026,
  title = {Speculative Tool Actions},
  author = {ML Intern},
  year = {2026},
  url = {https://huggingface.co/narcolepticchicken/speculative-tool-actions}
}
```