# Speculative Tool Actions – Related Work

## Foundational

**SpecInfer (2023)** – *Accelerating Generative LLM Serving with Tree-based Speculative Inference* (arxiv:2305.09781). Introduced token-tree speculative decoding: multiple draft tokens from small models are verified in parallel by the large model. A basis for much of the subsequent tree-based speculative decoding work.

**SuffixDecoding (2024)** – *SuffixDecoding: Speeding Up Large Language Model Inference with Tree-structured Suffix-based Drafting* (arxiv:2411.04975). Applied speculative decoding to agentic workloads by caching action suffixes and reusing them as drafts. Achieved up to 5.3× speedup on tool-calling benchmarks.

## Heterogeneous Speculation for Agents (2025-2026)

**DualSpec (Mar 2026)** – *DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation* (arxiv:2603.07416). Closest work to ours. Uses heterogeneous speculation: the large model handles high-entropy "Search" actions; the small model drafts low-entropy "Visit" actions. A semantic verifier (prompt-based, not trained) accepts/rejects drafts. **1.33-3.28× end-to-end speedup** with no pass@1 loss on GAIA, XBench-DeepSearch, Seal-0. Key insight: **entropy-based action partitioning** – some actions need System 2, others are fine with System 1.

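The partitioning idea can be sketched as a simple routing function. This is an illustration only: the action names, entropy values, and threshold below are our own assumptions, not taken from the DualSpec paper.

```python
from typing import Callable, Dict

# Hypothetical per-action-type entropy estimates: "search" queries are
# open-ended (high entropy), while "visit" just follows a known URL
# (low entropy). Values and threshold are illustrative.
ACTION_ENTROPY: Dict[str, float] = {"search": 2.4, "visit": 0.3}
ENTROPY_THRESHOLD = 1.0  # above this, route to the large model

def route_action(action_type: str,
                 large_model: Callable[[str], str],
                 small_model: Callable[[str], str]) -> Callable[[str], str]:
    """Pick the drafting model for an action type by its entropy."""
    # Unknown action types default to the large model (conservative choice).
    entropy = ACTION_ENTROPY.get(action_type, float("inf"))
    return large_model if entropy > ENTROPY_THRESHOLD else small_model
```

The conservative default (unknown actions go to the large model) mirrors the System 1/System 2 framing: only actions known to be low-entropy are delegated to the cheap drafter.
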
**Our contribution vs DualSpec:** We train a separate verifier model (SFT on ACCEPT/REJECT pairs) instead of using prompt-based critics, which should be both faster (a single classifier pass instead of an extra LLM critique) and more accurate. We also use a single cheap model for all actions rather than partitioning by type.

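One way the ACCEPT/REJECT SFT pairs could be serialized is sketched below; the field names and prompt template are our own illustration, not a fixed format from this project.

```python
import json

def make_verifier_example(context: str, proposed_action: dict, label: str) -> dict:
    """Serialize one SFT training example for the verifier.

    The verifier learns to map (context, proposed action) -> ACCEPT/REJECT.
    Prompt template and field names are illustrative assumptions.
    """
    assert label in ("ACCEPT", "REJECT")
    prompt = (f"Context:\n{context}\n\n"
              f"Proposed action:\n{json.dumps(proposed_action, sort_keys=True)}\n\n"
              "Verdict:")
    return {"prompt": prompt, "completion": label}
```

Keeping the completion to a single-token verdict is what makes the trained verifier cheap: one forward pass yields the decision.
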
**DSP (Aug 2025)** – *Dynamic Speculative Agent Planning* (arxiv:2509.01920). Uses online RL to predict how many speculative steps the drafter can produce correctly. Models optimal k as varying from 1-5 even within a single task (mean variance 1.46). **~2× latency reduction** with **30% lower total cost**.

**Our contribution vs DSP:** DSP uses exact action matching; we train a learned verifier. DSP predicts k dynamically; we use single-action proposal + verify (k=1 by design). The cost breakdown analysis in DSP directly motivates our approach: draft prompt tokens dominate the wasted cost of rejected speculation.

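The k=1 proposal-plus-verify loop described above can be sketched as follows. The callables are stand-ins for real models; the function name and signature are our own illustration.

```python
from typing import Callable, Tuple

def speculative_step(state: str,
                     draft: Callable[[str], str],
                     verify: Callable[[str, str], bool],
                     target: Callable[[str], str]) -> Tuple[str, bool]:
    """One k=1 speculative step: the cheap model drafts the next action,
    the trained verifier accepts or rejects it, and only on rejection do
    we pay for the large target model."""
    action = draft(state)
    if verify(state, action):
        return action, True       # draft accepted: large model skipped
    return target(state), False   # fallback: large model generates
```

Fixing k=1 by design means a rejection wastes at most one draft call, which is the failure mode DSP's cost breakdown highlights.
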
**SpecEyes (Mar 2026)** – *SpecEyes: Accelerating Agentic Multimodal LLMs* (arxiv:2603.23483). For multimodal agents: a lightweight MLLM drafts answers; "cognitive gating" via an answer separability score decides accept/fallback. **1.1-3.35× speedup** with **+6.7% accuracy gain**.

## Small Models for Tool Use

**TinyAgent (Sep 2024)** – *TinyAgent: Function Calling at the Edge* (arxiv:2409.00608). TinyLlama-1.1B and Wizard-2-7B **match or surpass GPT-4-Turbo** on function calling via SFT on LLMCompiler-format traces + Tool RAG. Demonstrates that small models can be extremely capable for tool use when trained properly.

**SLM for Agentic Systems Survey (Oct 2025)** – *Small Language Models for Agentic Systems* (arxiv:2510.03847). Formalizes the SLM-default/LLM-fallback pattern with uncertainty-aware routing and verifier cascades. Recommends schema-first prompting, type-safe function registries, and LoRA adaptation.

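The SLM-default/LLM-fallback pattern with uncertainty-aware routing reduces to a confidence gate. A minimal sketch, assuming the small model reports a self-confidence score (the threshold value is illustrative):

```python
from typing import Callable, Tuple

def slm_with_fallback(query: str,
                      slm: Callable[[str], Tuple[str, float]],
                      llm: Callable[[str], str],
                      min_confidence: float = 0.8) -> str:
    """SLM-default / LLM-fallback: answer with the small model unless its
    self-reported confidence falls below the threshold, then escalate to
    the large model. Threshold is an illustrative assumption."""
    answer, confidence = slm(query)
    if confidence >= min_confidence:
        return answer
    return llm(query)
```

A verifier cascade generalizes this gate: each stage either commits to an answer or escalates to the next, more expensive stage.
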
## Learned Routers and Verifiers

**RouteLLM (Jun 2024)** – *RouteLLM: Learning to Route LLMs with Preference Data* (arxiv:2406.18665). Trains routers (BERT, causal LLM, matrix factorization) on Chatbot Arena preference data. **>2× cost reduction** with no quality degradation. BERT router achieves the best cost-quality tradeoff.

**Internal Representation Hallucination Detection (Jan 2026)** – *Internal Representations as Indicators of Hallucinations in Agent Tool Selection* (arxiv:2601.05214). **86.4% accuracy** detecting tool-calling hallucinations in a single forward pass via a 2-layer MLP on final hidden states. Validates the approach of using separate classifiers to verify tool-call quality.

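The shape of such a probe is small enough to sketch end to end. This is a generic 2-layer MLP over a final hidden state, not the paper's trained weights; dimensions and initialization are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

class TwoLayerProbe:
    """2-layer MLP probe over a model's final hidden state, emitting the
    probability that a tool call is hallucinated. Dimensions are
    illustrative, not taken from the cited paper."""

    def __init__(self, d_model: int = 16, d_hidden: int = 8):
        self.w1 = rng.standard_normal((d_model, d_hidden)) * 0.1
        self.b1 = np.zeros(d_hidden)
        self.w2 = rng.standard_normal(d_hidden) * 0.1
        self.b2 = 0.0

    def forward(self, hidden_state: np.ndarray) -> float:
        z = np.maximum(hidden_state @ self.w1 + self.b1, 0.0)  # ReLU layer
        logit = float(z @ self.w2 + self.b2)                    # linear head
        return 1.0 / (1.0 + np.exp(-logit))                     # sigmoid prob
```

Because the probe reads a hidden state the model has already computed, the verdict adds only a tiny MLP forward pass on top of normal inference.
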
## How Our Work Differs

| Aspect | Prior Work | Our Work |
|--------|-----------|----------|
| Speculation target | Tokens (SpecInfer), actions+reasoning (DualSpec), multi-step plans (DSP) | Single next action type |
| Verifier type | Exact match (DSP), prompt-based critic (DualSpec), confidence heuristic (SpecEyes) | **Trained SFT classifier** |
| Model sizes | 72B+8B or 32B+4B asymmetric pairs | 8B + 1.7B + 4B verifier |
| Training data | Proprietary (DualSpec, DSP) or synthesized (TinyAgent) | ToolBench-derived, open |
| Safety focus | None | Explicit BLOCKED action + unsafe-action avoidance metric |

**Novel contribution:** First system to use a **trained SFT verifier** for speculative tool-action proposal verification. All prior work uses either exact matching, prompt-based critics, or confidence heuristics. Because a trained classifier renders its verdict in a single forward pass rather than an extra LLM critique, this design should offer a better accuracy-to-latency tradeoff.