# Wrench 35B: Purpose-Built Agentic Model

A LoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters), purpose-built for tool calling, error recovery, and system-prompt following. Runs in 16GB of VRAM.

## Benchmarks

| Benchmark | Score | Details |
|---|---|---|
| Clank Agentic Benchmark | 118/120 (98.3%) | 40-prompt, 8-category tool-calling evaluation |
| BFCL non_live/AST | 81.2% (1128/1390) | Berkeley Function Calling Leaderboard, an independent standardized benchmark |

### Clank Benchmark: Category Breakdown

| Category | Score | Max |
|---|---|---|
| Basic Tool Use | 15 | 15 |
| Multi-Step Tasks | 15 | 15 |
| Error Recovery | 14 | 15 |
| Response Quality | 15 | 15 |
| System Prompt Following | 15 | 15 |
| Planning & Reasoning | 15 | 15 |
| Tool Format Correctness | 14 | 15 |
| Safety & Restraint | 15 | 15 |
| **Total** | **118** | **120** |

## BFCL: Independent Validation

Tested on the Berkeley Function Calling Leaderboard non_live/AST category: 1,390 test cases across 7 categories. This is an independent, standardized benchmark not designed by us.

| Category | Accuracy | Correct/Total |
|---|---|---|
| Simple (Python) | 84.75% | 339/400 |
| Simple (Java) | 44.0% | 44/100 |
| Simple (JavaScript) | 56.0% | 28/50 |
| Multiple | 84.5% | 169/200 |
| Parallel | 85.0% | 170/200 |
| Parallel Multiple | 82.5% | 165/200 |
| Irrelevance Detection | 88.75% | 213/240 |
| **Overall** | **81.2%** | **1128/1390** |

## vs. Frontier Models

| Model | Clank Benchmark | Runs On | Cost |
|---|---|---|---|
| Wrench 35B v7 | 118/120 | 16GB GPU | Free |
| Claude Opus 4.6 | ~118/120 | Cloud | Paid |
| Claude Sonnet 4.6 | ~114/120 | Cloud | $20/mo |
| GPT-4o | ~110/120 | Cloud | $20/mo |
| Base Qwen 3.5 35B | ~55/120 | 16GB GPU | Free |

## Quick Start

### Ollama (recommended)

Download the GGUF and Modelfile from the Files tab, then:

```shell
ollama create wrench -f Modelfile
ollama run wrench
```
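Once the model is running, you can exercise its tool-calling ability through Ollama's local chat API. The sketch below is illustrative: `get_weather` is a made-up tool, not something shipped with the model, and it assumes Ollama is serving on its default port (11434). Any JSON-schema function definition works the same way.

```shell
# Illustrative only: "get_weather" is a hypothetical tool definition.
# Assumes Ollama is running on the default port 11434.
cat > request.json <<'EOF'
{
  "model": "wrench",
  "stream": false,
  "messages": [
    {"role": "user", "content": "What's the weather in Paris right now?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }
  ]
}
EOF
curl -s http://localhost:11434/api/chat -d @request.json \
  || echo '(Ollama not reachable; request payload left in request.json)'
```

If the fine-tune is doing its job, the response's `message.tool_calls` should name `get_weather` with `{"city": "Paris"}` rather than guessing an answer from parametric knowledge.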

### llama.cpp

```shell
./llama-server -m wrench-35B-A3B-Q4_K_M.gguf --jinja -ngl 100 -fa on \
  --temp 0.4 --top-k 20 --top-p 0.95 --min-p 0 --presence-penalty 1.5 -c 32768
```
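`llama-server` exposes an OpenAI-compatible chat endpoint (the `--jinja` flag is what enables the chat template, including its tool-call formatting). A minimal smoke test, assuming the server's default port of 8080:

```shell
# Minimal smoke test against llama-server's OpenAI-compatible endpoint.
# Assumes the default port 8080.
cat > chat.json <<'EOF'
{
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize what a GGUF file is in one sentence."}
  ]
}
EOF
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d @chat.json \
  || echo '(llama-server not reachable; request payload left in chat.json)'
```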

### With Clank Gateway

```shell
npm install -g @clanklabs/clank
clank setup
# Set primary model to "ollama/wrench" in config
```

## Model Details

| Detail | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B (MoE: 35B total, 3B active) |
| Fine-Tune Method | LoRA (rank 64, alpha 128) via Hugging Face PEFT |
| Training Data | 1,252 examples across 15 categories |
| Hardware | 2× NVIDIA H100 80GB |
| Training Time | ~1 hour |
| Final Loss | 0.1592 |
| Quantization | Q4_K_M GGUF (~20GB) |
| Context Window | 8,192 tokens (expandable to 32K) |
| License | Apache 2.0 |

## Training Data

All training data is published and auditable: ClankLabs/wrench-training-data

1,252 examples across 15 categories, including tool calling, error recovery, multi-step chains, system-prompt following, safety, planning, and frontier-gap targeting (uncertainty calibration, constraint following, strategy revision, and long-context multi-turn).
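For orientation, a tool-calling training example in the common chat-messages format might look like the sketch below. This is our illustrative guess at the shape, not a verbatim record: the `ping_host` tool and every value in it are invented, and the published repository is the authority on the actual schema.

```json
{
  "messages": [
    {"role": "system", "content": "You can call tools to answer questions."},
    {"role": "user", "content": "Is the staging server up?"},
    {"role": "assistant", "tool_calls": [
      {"type": "function", "function": {"name": "ping_host", "arguments": "{\"host\": \"staging.example.com\"}"}}
    ]},
    {"role": "tool", "content": "{\"reachable\": true, \"latency_ms\": 42}"},
    {"role": "assistant", "content": "Yes, staging is reachable (42 ms latency)."}
  ]
}
```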
