# rad-model
A QLoRA fine-tune of Qwen2.5-7B-Instruct for Radicle and Git tool calling, distributed as a GGUF file for local inference with llama.cpp.
## Intended use
Tool-calling backend for CLI assistants that work with Radicle (decentralized code collaboration) and Git. The model selects and parameterizes the right CLI tool given a natural language request.
## Training
- Method: QLoRA (r=32, alpha=64) via Unsloth
- Dataset: h-d-h/rad-model-dataset — ~870 synthetic tool-calling examples covering 89 tools
- Hardware: NVIDIA RTX 3090 (24 GB VRAM)
- Quantization: Q4_K_M (~4.5 GB)
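The adapter setup above can be sketched with Unsloth's API. This is a hedged training-config sketch, not the actual training script: the base checkpoint name, sequence length, and target modules are assumptions; only `r=32` and `lora_alpha=64` come from this card.

```python
from unsloth import FastLanguageModel

# Assumed base checkpoint and sequence length; r / lora_alpha match the card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # QLoRA: frozen 4-bit base weights + trainable adapters
)
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=64,
    # Assumed: the usual attention + MLP projection targets for Qwen-style models
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

After training, the merged model would be exported to GGUF and quantized to Q4_K_M with llama.cpp's tooling.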
## Serving

```shell
llama-server -m rad-model-run6-q4_k_m.gguf --port 8080 -ngl 99 --host 0.0.0.0
```
llama-server exposes an OpenAI-compatible /v1/chat/completions endpoint with tool-calling support.
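A request against that endpoint looks like any OpenAI-style chat completion with a `tools` array. The sketch below only builds the request body; the tool schema (`rad_issue_list`) is hypothetical, shown to illustrate the wire format rather than any of the 89 actual tools.

```python
import json

# Hypothetical tool definition, for illustration only.
RAD_ISSUE_LIST = {
    "type": "function",
    "function": {
        "name": "rad_issue_list",
        "description": "List issues in the current Radicle repository.",
        "parameters": {
            "type": "object",
            "properties": {
                "state": {"type": "string", "enum": ["open", "closed", "all"]},
            },
        },
    },
}

def build_request(user_message: str, tools: list) -> dict:
    """Build a /v1/chat/completions body with tool calling enabled."""
    return {
        "model": "rad-model",  # llama-server accepts any model name here
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",
    }

body = build_request("show me the open issues", [RAD_ISSUE_LIST])
print(json.dumps(body, indent=2))
# POST this JSON to http://localhost:8080/v1/chat/completions, e.g. with
# urllib.request or an OpenAI client pointed at base_url="http://localhost:8080/v1".
```

The model's reply carries the selected tool and its arguments in the standard `tool_calls` field of the assistant message.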
## Evaluation
Evaluated on 88 held-out examples (stratified across all 89 tools). Scoring rubric:

- 1.0 — correct tool and correct arguments
- 0.75 — correct tool, extra arguments
- 0.5 — correct tool, wrong arguments
- 0.0 — wrong tool or no tool call
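One plausible implementation of this rubric is sketched below, reading "extra args" as a prediction whose arguments are a correct superset of the gold arguments. The tool names are hypothetical; the exact comparison logic used in the actual evaluation may differ.

```python
def score_example(pred, gold):
    """Score one prediction: pred/gold are (tool_name, args_dict); pred may be None."""
    if pred is None:
        return 0.0                       # no tool call
    pred_tool, pred_args = pred
    gold_tool, gold_args = gold
    if pred_tool != gold_tool:
        return 0.0                       # wrong tool
    if pred_args == gold_args:
        return 1.0                       # correct tool + arguments
    # Correct tool, every gold argument present and correct, plus extras.
    if all(pred_args.get(k) == v for k, v in gold_args.items()):
        return 0.75
    return 0.5                           # correct tool, wrong arguments

gold = ("rad_issue_list", {"state": "open"})
assert score_example(("rad_issue_list", {"state": "open"}), gold) == 1.0
assert score_example(("rad_issue_list", {"state": "open", "limit": 5}), gold) == 0.75
assert score_example(("rad_issue_list", {"state": "closed"}), gold) == 0.5
assert score_example(("git_log", {}), gold) == 0.0
assert score_example(None, gold) == 0.0
```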
See RESULTS.md for full experiment history.
## Limitations
- Trained on synthetic data only; may not handle ambiguous real-world requests well
- Tool descriptions heavily influence accuracy — the base model with good descriptions can outperform the fine-tune on some tasks
- English only
- Designed for single-turn or short multi-turn tool-calling; not a general chat model
## Source
Developed on Radicle: rad:z2YCwgkXrZkUTu8c4CQayvk9Pkpky
## License
Apache-2.0