# ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf

## Overview
GGUF exports of the C64-focused Ministral 3 14B reasoning fine-tune, ready for llama.cpp and Ollama.
Project source code and training pipeline:
Related repositories:
- LoRA: https://huggingface.co/ibitato/c64-ministral-3-14b-thinking-c64-reasoning-lora
- GGUF: https://huggingface.co/ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf
- Collection: https://huggingface.co/collections/ibitato/c64-ministral-3-14b-thinking-c64-reasoning-69a5bf2535468e14708e19da
## Technical Details

- Derived from: `mistralai/Ministral-3-14B-Reasoning-2512` + project LoRA adaptation
- Context length in GGUF metadata: 262,144 tokens
- Architecture in GGUF: `mistral3`
## Training Provenance
- DAPT checkpoint used: checkpoint-78
- SFT checkpoint used: checkpoint-306
- DAPT steps: 78 / 78
- SFT steps: 306 / 306
- Data splits: DAPT 408/27/45, SFT 1620/204/190
- Card generated at (UTC): 2026-03-02T16:47:37.079120+00:00
- Source git revision: 13fafe7
## Included Files

| File | Size |
|---|---|
| c64-ministral-3-14b-thinking-c64-F16.gguf | 25.17 GiB |
| c64-ministral-3-14b-thinking-c64-Q4_K_M.gguf | 7.67 GiB |
| c64-ministral-3-14b-thinking-c64-Q6_K.gguf | 10.33 GiB |
| c64-ministral-3-14b-thinking-c64-Q8_0.gguf | 13.37 GiB |
Modelfile templates are included for direct Ollama import.
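For orientation, an Ollama Modelfile for one of these quants typically looks like the sketch below. This is illustrative only; the shipped `Modelfile.Q4_K_M` / `Modelfile.Q6_K` / `Modelfile.Q8_0` templates are authoritative, and the `SYSTEM` text here is a placeholder, not the project's actual system prompt.

```
FROM ./c64-ministral-3-14b-thinking-c64-Q4_K_M.gguf

# Sampling and context settings (values chosen to mirror the llama-server
# flags used later in this card; adjust to taste)
PARAMETER temperature 0.15
PARAMETER num_ctx 32768

SYSTEM """You are a Commodore 64 programming assistant."""
```

After `ollama create`, the model can be used with `ollama run c64-ministral-c64-14b "Explain VIC-II timing."`.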
## Quick Start

### Ollama

```shell
ollama create c64-ministral-c64-14b -f Modelfile.Q4_K_M
ollama create c64-ministral-c64-14b-q6 -f Modelfile.Q6_K
ollama create c64-ministral-c64-14b-q8 -f Modelfile.Q8_0
```
### llama.cpp

```shell
llama-cli -m c64-ministral-3-14b-thinking-c64-Q6_K.gguf -ngl 99 -c 4096 -n 256 -p "Explain VIC-II timing."
```
### llama-server (OpenAI-compatible API / GUI reasoning panel)

```shell
python3 scripts/prompt_contract.py --model-profile 14b --print-full > .cache/runtime/c64_system_prompt_14b.txt

llama-server \
  -hf ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf:F16 \
  --host 0.0.0.0 --port 8080 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget -1 \
  --system-prompt-file .cache/runtime/c64_system_prompt_14b.txt \
  --ctx-size 32768 \
  -ngl 99 \
  --temp 0.15 \
  --threads "$(nproc)" \
  --fit on
```
Use `--reasoning-format none` to receive raw `[THINK]...[/THINK]` tags in the `content` field instead of separated reasoning fields.
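With `--reasoning-format deepseek`, the server is expected to return the chain of thought in a separate `reasoning_content` field of the chat completion message. A minimal sketch of client-side handling follows; the exact field name depends on your llama-server revision, so treat it as an assumption and verify against your server's output.

```python
# Sketch: split the reasoning field from the final answer in a response
# from llama-server's OpenAI-compatible /v1/chat/completions endpoint.
# The "reasoning_content" key is an assumption based on the deepseek
# reasoning format; check it against your llama-server version.

def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat-completion response dict."""
    msg = response["choices"][0]["message"]
    return msg.get("reasoning_content", ""), msg.get("content", "")

# Example against a mocked response shape (no server required):
mock = {
    "choices": [{
        "message": {
            "reasoning_content": "The VIC-II steals cycles on badlines...",
            "content": "Badlines occur when the raster line matches...",
        }
    }]
}
reasoning, answer = split_reasoning(mock)
```

The same function works unchanged on responses fetched with any HTTP client once they are decoded to a dict.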
## Reasoning Validation Snapshot

- Validation status: FAIL
- Source artifacts: `results/reasoning_validation/14b/20260302_152057`

Note: contract/format retention passed; the failure is due solely to the strict exact-token determinism check (hash mismatch across repeated same-seed runs).
| Metric | Value |
|---|---|
| single_think_tag_rate | 1.0000 |
| single_balanced_tag_rate | 1.0000 |
| single_final_after_think_rate | 1.0000 |
| multi_turn_retention_rate | 1.0000 |
| format_contract_pass_rate | 1.0000 |
| exact_hash_match_rate | 0.3403 |
| semantic_similarity_avg | 0.9956 |
| crash_or_timeout_rate | 0.0000 |
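The `exact_hash_match_rate` metric above can be understood as: for each prompt, hash every completion from repeated same-seed runs and count the fraction of prompts where all hashes agree. A minimal sketch of that computation (function name and data layout are illustrative, not taken from the project's validation code):

```python
import hashlib

def exact_hash_match_rate(runs: list[list[str]]) -> float:
    """Fraction of prompts whose repeated same-seed completions are
    byte-identical. `runs` holds one list of completions per prompt."""
    matches = 0
    for completions in runs:
        hashes = {hashlib.sha256(c.encode()).hexdigest() for c in completions}
        if len(hashes) == 1:  # all repeats produced identical text
            matches += 1
    return matches / len(runs)

# Two prompts: the first is deterministic, the second drifts by one token.
rate = exact_hash_match_rate([["A A A", "A A A"], ["B B", "B B'"]])
# -> 0.5
```

A rate of 0.3403 with semantic similarity near 1.0 indicates the repeats differ in token choice but not meaning, which is consistent with the note above.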
## Reference Throughput (project benchmark)

Measured via `benchmark_gguf_matrix.sh` on the infrastructure below.
Infrastructure used:
- Host OS: Fedora Linux 43 (Server Edition)
- Host kernel: 6.18.8-200.fc43.x86_64
- CPU: AMD RYZEN AI MAX+ 395 (16C/32T)
- System RAM: 30 GiB
- GPU: AMD Radeon 8060S (96.00 GiB VRAM visible to PyTorch)
- Container image: `rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1`
- llama.cpp revision: `2afcdb9`
- Benchmark command source: `scripts/inference/benchmark_gguf_matrix.sh`
| Quant | tok/s | eval runs | total tokens | GPU max % | VRAM max % | Power max W |
|---|---|---|---|---|---|---|
| F16 | 45.32 | 95 | 111 | 99.00 | 25.00 | 102.10 |
| Q4_K_M | 57.30 | 85 | 101 | 97.00 | 8.00 | 112.07 |
| Q6_K | 222.64 | 1 | 17 | 84.00 | 11.00 | 96.07 |
| Q8_0 | 172.68 | 95 | 111 | 99.00 | 14.00 | 97.08 |
Benchmark source CSV: `results/benchmarks/gguf_benchmark_14b_20260302_134011.csv`
Benchmark measured at (UTC): 2026-03-02T13:41:12.556669+00:00
Notes:
- Throughput depends on prompt length, generated tokens, and runtime flags.
- Compare rows with similar `eval runs` and `total tokens` for fair conclusions.
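The fairness caveat above matters here: the Q6_K row has a single eval run over 17 tokens, so its 222.64 tok/s is not comparable to the other rows. A small sketch of filtering out such outliers before comparing throughput (the threshold is arbitrary; the numbers are copied from the table above):

```python
# Rows transcribed from the benchmark table in this card.
rows = [
    {"quant": "F16",    "tok_s": 45.32,  "eval_runs": 95, "total_tokens": 111},
    {"quant": "Q4_K_M", "tok_s": 57.30,  "eval_runs": 85, "total_tokens": 101},
    {"quant": "Q6_K",   "tok_s": 222.64, "eval_runs": 1,  "total_tokens": 17},
    {"quant": "Q8_0",   "tok_s": 172.68, "eval_runs": 95, "total_tokens": 111},
]

MIN_RUNS = 50  # illustrative cutoff; pick one suited to your data

# Keep only rows with enough eval runs for the tok/s figure to be stable.
comparable = [r for r in rows if r["eval_runs"] >= MIN_RUNS]

# Among comparable rows, Q8_0 is the fastest quant on this hardware.
fastest = max(comparable, key=lambda r: r["tok_s"])
```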
## Model Tree

Base model: `mistralai/Ministral-3-14B-Base-2512`