	# Speculative Tool Actions

	Investigating whether speculative decoding can be adapted from token prediction to agent action prediction.

	Current state: the v2 evaluation is complete (see [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md)). For v3, the datasets are built and the 1.7B proposer is trained. Remaining work: train the 4B verifier and the 8B proposer, then run the evaluation.

	## Quick Start: Complete the Project

	### One-command training (A100-large, ~2h):
	```bash
	python train_all_v3.py
	```

	Or via HF Jobs:
	```python
	hf_jobs(operation="run", script="https://hf.co/narcolepticchicken/speculative-tool-actions/resolve/main/train_all_v3.py",
	dependencies=["transformers>=4.51","trl","torch","datasets","accelerate","peft","huggingface_hub"],
	hardware_flavor="a100-large", timeout="12h")
	```

	### Then evaluate:
	```bash
	python eval_runner_v3.py
	```

	## Architecture

	A cheap model (Qwen3-1.7B LoRA) proposes the next agent action. A verifier (Qwen3-4B LoRA) accepts or rejects the proposal. On rejection, the system falls back to the expensive 8B model.

	Action space: `tool_call`, `retrieval`, `file_read`, `file_write`, `repair`, `verifier`, `ask_clarification`, `final_answer`, `BLOCKED`
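The propose, verify, fall-back flow described above can be sketched in a few lines. This is a minimal illustration, not the code in `train_all_v3.py` or `eval_runner_v3.py`: the names `speculative_action`, `Decision`, `propose`, `verify`, and `fallback` are hypothetical, the three models are passed in as plain callables, and treating a proposal outside the action space as a rejection is an assumption.

```python
from dataclasses import dataclass
from typing import Callable

# Action labels copied from the action space listed above.
ACTIONS = (
    "tool_call", "retrieval", "file_read", "file_write", "repair",
    "verifier", "ask_clarification", "final_answer", "BLOCKED",
)


@dataclass
class Decision:
    action: str
    source: str  # "proposer" if the cheap proposal was kept, else "fallback"


def speculative_action(
    context: str,
    propose: Callable[[str], str],        # stand-in for the 1.7B proposer
    verify: Callable[[str, str], bool],   # stand-in for the 4B verifier
    fallback: Callable[[str], str],       # stand-in for the 8B model
) -> Decision:
    """Keep the cheap proposal if the verifier accepts it, else call the fallback."""
    proposal = propose(context)
    # Assumption: a label outside the action space counts as a rejection.
    if proposal in ACTIONS and verify(context, proposal):
        return Decision(proposal, "proposer")
    return Decision(fallback(context), "fallback")


if __name__ == "__main__":
    # Toy callables in place of real models, for illustration only.
    decision = speculative_action(
        "user asks to open config.yaml",
        propose=lambda ctx: "file_read",
        verify=lambda ctx, action: action != "file_write",
        fallback=lambda ctx: "ask_clarification",
    )
    print(decision)
```

The expensive model is only invoked on the rejection path, which is where any cost saving would come from.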

	## Files

	| File | Purpose |
	|------|---------|
	| `train_all_v3.py` | Consolidated: trains 1.7B+4B+8B sequentially |
	| `train_sft_v3.py` | Individual proposer training |
	| `train_verifier_v3.py` | Individual verifier training |
	| `eval_runner_v3.py` | All-5-configs evaluation |
	| `PROJECT_REPORT.md` | Full project documentation + v2 results |
	| `ABLATION_REPORT_v2.md` | v2 analysis (51% cheap vs 40% frozen 8B) |
	| `eval_results_v2.json` | v2 raw results |

	## v2 Results

	| Config | Accuracy | Cost |
	|--------|----------|------|
	| A: 8B frozen | 40% | 1.00 |
	| B: 1.7B cheap | 51% | 0.15 |
	| D: cheap + 4B RM | 51% | 0.25 |
	| E: multi-proposal | 42% | 0.75 |

	See [ABLATION_REPORT_v2.md](./ABLATION_REPORT_v2.md) for analysis.
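One rough way to read the v2 table is accuracy per unit of cost. The sketch below copies the four rows from the table and ranks the configs by that ratio. The ratio is an illustrative metric chosen here, not one taken from the ablation report, and the function names are hypothetical.

```python
# (accuracy, cost) pairs copied from the v2 results table above.
V2_RESULTS = {
    "A: 8B frozen": (0.40, 1.00),
    "B: 1.7B cheap": (0.51, 0.15),
    "D: cheap + 4B RM": (0.51, 0.25),
    "E: multi-proposal": (0.42, 0.75),
}


def accuracy_per_cost(results):
    """Map each config name to accuracy divided by cost."""
    return {name: acc / cost for name, (acc, cost) in results.items()}


def rank_by_efficiency(results):
    """Return config names ordered from highest to lowest accuracy per cost."""
    ratios = accuracy_per_cost(results)
    return sorted(ratios, key=ratios.get, reverse=True)


if __name__ == "__main__":
    for name in rank_by_efficiency(V2_RESULTS):
        print(f"{name}: {accuracy_per_cost(V2_RESULTS)[name]:.2f}")
```

On these numbers the ordering is B, D, E, A: in v2 the verifier in config D adds cost over config B without changing accuracy.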

	## v3 Status

	| Component | Status |
	|-----------|--------|
	| Datasets (SFT, verifier, eval) | ✓ Built |
	| 1.7B proposer | ✓ Trained |
	| 4B verifier | ✗ Needs training |
	| 8B proposer | ✗ Needs training |
	| Eval runner | ✓ Ready |
