# Wrench 35B: Purpose-Built Agentic Model

A LoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters), purpose-built for tool calling, error recovery, and system-prompt following. Runs in 16GB of VRAM.

## Benchmarks

| Benchmark | Score | Details |
|---|---|---|
| Clank Agentic Benchmark | 118/120 (98.3%) | 40-prompt, 8-category tool-calling evaluation |
| BFCL non_live/AST | 81.2% (1128/1390) | Berkeley Function Calling Leaderboard, an independent standardized benchmark |

### Clank Benchmark: Category Breakdown

| Category | Score | Max |
|---|---|---|
| Basic Tool Use | 15 | 15 |
| Multi-Step Tasks | 15 | 15 |
| Error Recovery | 14 | 15 |
| Response Quality | 15 | 15 |
| System Prompt Following | 15 | 15 |
| Planning & Reasoning | 15 | 15 |
| Tool Format Correctness | 14 | 15 |
| Safety & Restraint | 15 | 15 |
| **Total** | **118** | **120** |

## BFCL: Independent Validation

Tested on the Berkeley Function Calling Leaderboard non_live/AST category: 1,390 test cases across 7 categories. This is an independent, standardized benchmark not designed by us.

| Category | Accuracy | Correct/Total |
|---|---|---|
| Simple (Python) | 84.75% | 339/400 |
| Simple (Java) | 44.0% | 44/100 |
| Simple (JavaScript) | 56.0% | 28/50 |
| Multiple | 84.5% | 169/200 |
| Parallel | 85.0% | 170/200 |
| Parallel Multiple | 82.5% | 165/200 |
| Irrelevance Detection | 88.75% | 213/240 |
| **Overall** | **81.2%** | **1128/1390** |

## vs. Frontier Models

| Model | Clank Benchmark | Runs On | Cost |
|---|---|---|---|
| Wrench 35B v7 | 118/120 | 16GB GPU | Free |
| Claude Opus 4.6 | ~118/120 | Cloud | Paid |
| Claude Sonnet 4.6 | ~114/120 | Cloud | $20/mo |
| GPT-4o | ~110/120 | Cloud | $20/mo |
| Base Qwen 3.5 35B | ~55/120 | 16GB GPU | Free |

## Quick Start

### Ollama (recommended)

Download the GGUF and Modelfile from the Files tab, then:

```shell
ollama create wrench -f Modelfile
ollama run wrench
```
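Once the model is running, you can exercise its tool-calling ability through Ollama's local chat API. The sketch below is illustrative: `get_weather` is a made-up tool, not something shipped with the model, and it assumes Ollama is serving on its default port (11434). Any JSON-schema function definition works the same way.

```shell
# Illustrative only: "get_weather" is a hypothetical tool definition.
# Assumes Ollama is running on the default port 11434.
cat > request.json <<'EOF'
{
  "model": "wrench",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What's the weather in Paris right now?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
curl -s http://localhost:11434/api/chat -d @request.json \
  || echo '(Ollama not reachable; request payload left in request.json)'
```

If the fine-tune is doing its job, the response's `message.tool_calls` should name `get_weather` with `{"city": "Paris"}` rather than guessing an answer from parametric knowledge.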

### llama.cpp

```shell
./llama-server -m wrench-35B-A3B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 32768
```
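`llama-server` exposes an OpenAI-compatible chat endpoint (the `--jinja` flag is what enables the chat template, including its tool-call formatting). A minimal smoke test, assuming the server's default port of 8080:

```shell
# Minimal smoke test against llama-server's OpenAI-compatible endpoint.
# Assumes the default port 8080.
cat > chat.json <<'EOF'
{
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize what a GGUF file is in one sentence."}
  ]
}
EOF
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @chat.json \
  || echo '(llama-server not reachable; request payload left in chat.json)'
```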

### With Clank Gateway

```shell
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench" in config
```

## Model Details

| Detail | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B (MoE: 35B total, 3B active) |
| Fine-Tune Method | LoRA (rank 64, alpha 128) via Hugging Face PEFT |
| Training Data | 1,252 examples across 15 categories |
| Hardware | 2× NVIDIA H100 80GB |
| Training Time | ~1 hour |
| Final Loss | 0.1592 |
| Quantization | Q4_K_M GGUF (~20GB) |
| Context Window | 8,192 tokens (expandable to 32K) |
| License | Apache 2.0 |

## Training Data

All training data is published and auditable: ClankLabs/wrench-training-data

1,252 examples across 15 categories, including tool calling, error recovery, multi-step chains, system-prompt following, safety, planning, and frontier-gap targeting (uncertainty calibration, constraint following, strategy revision, and long-context multi-turn).
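For orientation, a tool-calling training example in the common chat-messages format might look like the sketch below. This is our illustrative guess at the shape, not a verbatim record: the `ping_host` tool and every value in it are invented, and the published repository is the authority on the actual schema.

```json
{
  "messages": [
    {"role": "system", "content": "You can call tools to answer questions."},
    {"role": "user", "content": "Is the staging server up?"},
    {"role": "assistant", "tool_calls": [
      {"type": "function", "function": {"name": "ping_host", "arguments": "{\"host\": \"staging.example.com\"}"}}
    ]},
    {"role": "tool", "content": "{\"reachable\": true, \"latency_ms\": 42}"},
    {"role": "assistant", "content": "Yes, staging is reachable (42 ms latency)."}
  ]
}
```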
