# ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf

## Overview
GGUF exports of the C64-focused Ministral 3 14B reasoning fine-tune, ready for llama.cpp and Ollama.
Project source code and training pipeline:
Related repositories:
- LoRA: https://huggingface.co/ibitato/c64-ministral-3-14b-thinking-c64-reasoning-lora
- GGUF: https://huggingface.co/ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf
- Collection: https://huggingface.co/collections/ibitato/c64-ministral-3-14b-thinking-c64-reasoning-69a5bf2535468e14708e19da
## Technical Details

- Derived from: `mistralai/Ministral-3-14B-Reasoning-2512` + project LoRA adaptation
- Context length in GGUF metadata: 262,144 tokens
- Architecture in GGUF: `mistral3`
## Training Provenance
- DAPT checkpoint used: checkpoint-78
- SFT checkpoint used: checkpoint-306
- DAPT steps: 78 / 78
- SFT steps: 306 / 306
- Data splits: DAPT 408/27/45, SFT 1620/204/190
- Card generated at (UTC): 2026-03-02T16:47:37.079120+00:00
- Source git revision: 13fafe7
## Included Files

| File | Size |
|---|---|
| c64-ministral-3-14b-thinking-c64-F16.gguf | 25.17 GiB |
| c64-ministral-3-14b-thinking-c64-Q4_K_M.gguf | 7.67 GiB |
| c64-ministral-3-14b-thinking-c64-Q6_K.gguf | 10.33 GiB |
| c64-ministral-3-14b-thinking-c64-Q8_0.gguf | 13.37 GiB |
Modelfile templates are included for direct Ollama import.
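For orientation, an Ollama Modelfile for one of these quants typically looks like the sketch below. This is illustrative only; the shipped `Modelfile.Q4_K_M` / `Modelfile.Q6_K` / `Modelfile.Q8_0` templates are authoritative, and the `SYSTEM` text here is a placeholder, not the project's actual system prompt.

```
FROM ./c64-ministral-3-14b-thinking-c64-Q4_K_M.gguf

# Sampling and context settings (values chosen to mirror the llama-server
# flags used later in this card; adjust to taste)
PARAMETER temperature 0.15
PARAMETER num_ctx 32768

SYSTEM """You are a Commodore 64 programming assistant."""
```

After `ollama create`, the model can be used with `ollama run c64-ministral-c64-14b "Explain VIC-II timing."`.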
## Quick Start

### Ollama

```shell
ollama create c64-ministral-c64-14b -f Modelfile.Q4_K_M
ollama create c64-ministral-c64-14b-q6 -f Modelfile.Q6_K
ollama create c64-ministral-c64-14b-q8 -f Modelfile.Q8_0
```
### llama.cpp

```shell
llama-cli -m c64-ministral-3-14b-thinking-c64-Q6_K.gguf -ngl 99 -c 4096 -n 256 -p "Explain VIC-II timing."
```
### llama-server (OpenAI-compatible API / GUI reasoning panel)

```shell
python3 scripts/prompt_contract.py --model-profile 14b --print-full > .cache/runtime/c64_system_prompt_14b.txt

llama-server \
  -hf ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf:F16 \
  --host 0.0.0.0 --port 8080 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget -1 \
  --system-prompt-file .cache/runtime/c64_system_prompt_14b.txt \
  --ctx-size 32768 \
  -ngl 99 \
  --temp 0.15 \
  --threads "$(nproc)" \
  --fit on
```
Use `--reasoning-format none` to receive raw `[THINK]...[/THINK]` tags in the `content` field instead of separated reasoning fields.
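With `--reasoning-format deepseek`, the server is expected to return the chain of thought in a separate `reasoning_content` field of the chat completion message. A minimal sketch of client-side handling follows; the exact field name depends on your llama-server revision, so treat it as an assumption and verify against your server's output.

```python
# Sketch: split the reasoning field from the final answer in a response
# from llama-server's OpenAI-compatible /v1/chat/completions endpoint.
# The "reasoning_content" key is an assumption based on the deepseek
# reasoning format; check it against your llama-server version.

def split_reasoning(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat-completion response dict."""
    msg = response["choices"][0]["message"]
    return msg.get("reasoning_content", ""), msg.get("content", "")

# Example against a mocked response shape (no server required):
mock = {
    "choices": [{
        "message": {
            "reasoning_content": "The VIC-II steals cycles on badlines...",
            "content": "Badlines occur when the raster line matches...",
        }
    }]
}
reasoning, answer = split_reasoning(mock)
```

The same function works unchanged on responses fetched with any HTTP client once they are decoded to a dict.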
## Reasoning Validation Snapshot

- Validation status: FAIL
- Source artifacts: `results/reasoning_validation/14b/20260302_152057`

Note: contract/format retention passed; the failure is due solely to the strict exact-token determinism check (hash mismatch across repeated same-seed runs).
| Metric | Value |
|---|---|
| single_think_tag_rate | 1.0000 |
| single_balanced_tag_rate | 1.0000 |
| single_final_after_think_rate | 1.0000 |
| multi_turn_retention_rate | 1.0000 |
| format_contract_pass_rate | 1.0000 |
| exact_hash_match_rate | 0.3403 |
| semantic_similarity_avg | 0.9956 |
| crash_or_timeout_rate | 0.0000 |
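The `exact_hash_match_rate` metric above can be understood as: for each prompt, hash every completion from repeated same-seed runs and count the fraction of prompts where all hashes agree. A minimal sketch of that computation (function name and data layout are illustrative, not taken from the project's validation code):

```python
import hashlib

def exact_hash_match_rate(runs: list[list[str]]) -> float:
    """Fraction of prompts whose repeated same-seed completions are
    byte-identical. `runs` holds one list of completions per prompt."""
    matches = 0
    for completions in runs:
        hashes = {hashlib.sha256(c.encode()).hexdigest() for c in completions}
        if len(hashes) == 1:  # all repeats produced identical text
            matches += 1
    return matches / len(runs)

# Two prompts: the first is deterministic, the second drifts by one token.
rate = exact_hash_match_rate([["A A A", "A A A"], ["B B", "B B'"]])
# -> 0.5
```

A rate of 0.3403 with semantic similarity near 1.0 indicates the repeats differ in token choice but not meaning, which is consistent with the note above.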
## Reference Throughput (project benchmark)

Measured via `benchmark_gguf_matrix.sh` on the infrastructure below.
Infrastructure used:
- Host OS: Fedora Linux 43 (Server Edition)
- Host kernel: 6.18.8-200.fc43.x86_64
- CPU: AMD RYZEN AI MAX+ 395 (16C/32T)
- System RAM: 30 GiB
- GPU: AMD Radeon 8060S (96.00 GiB VRAM visible to PyTorch)
- Container image: `rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1`
- llama.cpp revision: `2afcdb9`
- Benchmark command source: `scripts/inference/benchmark_gguf_matrix.sh`
| Quant | tok/s | eval runs | total tokens | GPU max % | VRAM max % | Power max W |
|---|---|---|---|---|---|---|
| F16 | 45.32 | 95 | 111 | 99.00 | 25.00 | 102.10 |
| Q4_K_M | 57.30 | 85 | 101 | 97.00 | 8.00 | 112.07 |
| Q6_K | 222.64 | 1 | 17 | 84.00 | 11.00 | 96.07 |
| Q8_0 | 172.68 | 95 | 111 | 99.00 | 14.00 | 97.08 |
Benchmark source CSV: `results/benchmarks/gguf_benchmark_14b_20260302_134011.csv`
Benchmark measured at (UTC): 2026-03-02T13:41:12.556669+00:00
Notes:
- Throughput depends on prompt length, generated tokens, and runtime flags.
- Compare rows with similar `eval runs` and `total tokens` for fair conclusions.
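The fairness caveat above matters here: the Q6_K row has a single eval run over 17 tokens, so its 222.64 tok/s is not comparable to the other rows. A small sketch of filtering out such outliers before comparing throughput (the threshold is arbitrary; the numbers are copied from the table above):

```python
# Rows transcribed from the benchmark table in this card.
rows = [
    {"quant": "F16",    "tok_s": 45.32,  "eval_runs": 95, "total_tokens": 111},
    {"quant": "Q4_K_M", "tok_s": 57.30,  "eval_runs": 85, "total_tokens": 101},
    {"quant": "Q6_K",   "tok_s": 222.64, "eval_runs": 1,  "total_tokens": 17},
    {"quant": "Q8_0",   "tok_s": 172.68, "eval_runs": 95, "total_tokens": 111},
]

MIN_RUNS = 50  # illustrative cutoff; pick one suited to your data

# Keep only rows with enough eval runs for the tok/s figure to be stable.
comparable = [r for r in rows if r["eval_runs"] >= MIN_RUNS]

# Among comparable rows, Q8_0 is the fastest quant on this hardware.
fastest = max(comparable, key=lambda r: r["tok_s"])
```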
## Model Tree

Base model: `mistralai/Ministral-3-14B-Base-2512`