# Wrench 35B: Purpose-Built Agentic Model
A LoRA fine-tuned version of Qwen3.5-35B-A3B (MoE, 3B active parameters), purpose-built for tool calling, error recovery, and system prompt following. Runs on 16GB VRAM.
## Benchmarks
| Benchmark | Score | Details |
|---|---|---|
| Clank Agentic Benchmark | 118/120 (98.3%) | 40-prompt, 8-category tool-calling evaluation |
| BFCL non_live/AST | 82.0% (1128/1390) | Berkeley Function Calling Leaderboard; independent, standardized benchmark |
## Clank Benchmark: Category Breakdown
| Category | Score | Max |
|---|---|---|
| Basic Tool Use | 15 | 15 |
| Multi-Step Tasks | 15 | 15 |
| Error Recovery | 14 | 15 |
| Response Quality | 15 | 15 |
| System Prompt Following | 15 | 15 |
| Planning & Reasoning | 15 | 15 |
| Tool Format Correctness | 14 | 15 |
| Safety & Restraint | 15 | 15 |
| Total | 118 | 120 |
## BFCL: Independent Validation

Tested on the Berkeley Function Calling Leaderboard non_live/AST category: 1,390 test cases across seven categories. This is an independent, standardized benchmark not designed by us.
| Category | Accuracy | Correct/Total |
|---|---|---|
| Simple (Python) | 84.75% | 339/400 |
| Simple (Java) | 44.0% | 44/100 |
| Simple (JavaScript) | 56.0% | 28/50 |
| Multiple | 84.5% | 169/200 |
| Parallel | 85.0% | 170/200 |
| Parallel Multiple | 82.5% | 165/200 |
| Irrelevance Detection | 88.75% | 213/240 |
| Overall | 82.0% | 1128/1390 |
## vs. Frontier Models
| Model | Clank Benchmark | Runs On | Cost |
|---|---|---|---|
| Wrench 35B v7 | 118/120 | 16GB GPU | Free |
| Claude Opus 4.6 | ~118/120 | Cloud | Paid |
| Claude Sonnet 4.6 | ~114/120 | Cloud | $20/mo |
| GPT-4o | ~110/120 | Cloud | $20/mo |
| Base Qwen 3.5 35B | ~55/120 | 16GB GPU | Free |
## Quick Start
### Ollama (recommended)

Download the GGUF and Modelfile from the Files tab, then:

```shell
ollama create wrench -f Modelfile
ollama run wrench
```
### llama.cpp

```shell
./llama-server -m wrench-35B-A3B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 32768
```
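Once the server is up, `llama-server` exposes an OpenAI-compatible `/v1/chat/completions` endpoint (default port 8080). A minimal sketch of a tool-calling request payload; the `get_weather` tool here is a hypothetical example for illustration, not something shipped with the model:

```python
import json

# Hypothetical example tool. llama-server (run with --jinja, as above)
# accepts OpenAI-style "tools" definitions in chat completion requests.
payload = {
    "model": "wrench",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
    "temperature": 0.4,  # matches the sampling settings above
}

print(json.dumps(payload, indent=2))
```

POST this to `http://localhost:8080/v1/chat/completions` with any HTTP or OpenAI-compatible client.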
### With Clank Gateway

```shell
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench" in config
```
## Model Details

| Detail | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B (MoE; 35B total, 3B active) |
| Fine-Tune Method | LoRA (rank 64, alpha 128) via HuggingFace PEFT |
| Training Data | 1,252 examples across 15 categories |
| Hardware | 2x NVIDIA H100 80GB |
| Training Time | ~1 hour |
| Final Loss | 0.1592 |
| Quantization | Q4_K_M GGUF (~20GB) |
| Context Window | 8,192 tokens (expandable to 32K) |
| License | Apache 2.0 |
## Training Data
All training data is published and auditable: ClankLabs/wrench-training-data
1,252 examples across 15 categories including tool calling, error recovery, multi-step chains, system prompt following, safety, planning, and frontier-gap targeting (uncertainty calibration, constraint following, strategy revision, long-context multiturn).
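As a rough sketch of the chat-style shape such an example typically takes (the field names, roles, and `<tool_call>` wrapper below are illustrative assumptions, not the dataset's actual schema; check the published data for the real format):

```python
# Illustrative only: field names and tags are assumptions, not the
# dataset's actual schema.
example = {
    "category": "tool_calling",
    "messages": [
        {"role": "system", "content": "You are an agent with access to tools."},
        {"role": "user", "content": "List the files in /tmp."},
        {"role": "assistant",
         "content": '<tool_call>{"name": "list_dir", "arguments": {"path": "/tmp"}}</tool_call>'},
    ],
}

def is_valid(ex):
    """Minimal sanity check: known roles only, with at least one assistant turn."""
    roles = [m["role"] for m in ex["messages"]]
    return "assistant" in roles and all(
        r in {"system", "user", "assistant", "tool"} for r in roles
    )

print(is_valid(example))
```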
## Links

- Wrench 9B: 114/120, runs on 8GB VRAM
- Training Data
- Clank Gateway: the AI agent gateway Wrench was built for
- clanklabs.dev/wrench
- Benchmark Methodology