ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf

Overview

GGUF exports of the C64-focused Ministral 3 14B reasoning fine-tune, ready for llama.cpp and Ollama.

Project source code and training pipeline:

Related repositories:

Technical Details

  • Derived from: mistralai/Ministral-3-14B-Reasoning-2512 + the project's LoRA adapter
  • Context length in GGUF metadata: 262,144 tokens
  • Architecture in GGUF: mistral3

Training Provenance

  • DAPT checkpoint used: checkpoint-78
  • SFT checkpoint used: checkpoint-306
  • DAPT steps: 78 / 78
  • SFT steps: 306 / 306
  • Data splits: DAPT 408/27/45, SFT 1620/204/190
  • Card generated at (UTC): 2026-03-02T16:47:37.079120+00:00
  • Source git revision: 13fafe7

Included Files

File                                           Size
c64-ministral-3-14b-thinking-c64-F16.gguf      25.17 GiB
c64-ministral-3-14b-thinking-c64-Q4_K_M.gguf    7.67 GiB
c64-ministral-3-14b-thinking-c64-Q6_K.gguf     10.33 GiB
c64-ministral-3-14b-thinking-c64-Q8_0.gguf     13.37 GiB

Modelfile templates are included for direct Ollama import.

Quick Start

Ollama

ollama create c64-ministral-c64-14b -f Modelfile.Q4_K_M
ollama create c64-ministral-c64-14b-q6 -f Modelfile.Q6_K
ollama create c64-ministral-c64-14b-q8 -f Modelfile.Q8_0

llama.cpp

llama-cli -m c64-ministral-3-14b-thinking-c64-Q6_K.gguf -ngl 99 -c 4096 -n 256 -p "Explain VIC-II timing."

llama-server (OpenAI-compatible API / GUI reasoning panel)

python3 scripts/prompt_contract.py --model-profile 14b --print-full > .cache/runtime/c64_system_prompt_14b.txt
llama-server \
  -hf ibitato/c64-ministral-3-14b-thinking-c64-reasoning-gguf:F16 \
  --host 0.0.0.0 --port 8080 \
  --jinja \
  --reasoning-format deepseek \
  --reasoning-budget -1 \
  --system-prompt-file .cache/runtime/c64_system_prompt_14b.txt \
  --ctx-size 32768 \
  -ngl 99 \
  --temp 0.15 \
  --threads "$(nproc)" \
  --fit on

Use --reasoning-format none for raw [THINK]...[/THINK] tags in content instead of separated reasoning fields.
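With --reasoning-format deepseek, the server moves the model's thinking into a separate reasoning_content field on the response message. A minimal sketch of consuming that split (the payload below is an invented illustration shaped like the server's OpenAI-compatible output, not a real response):

```python
# Split the reasoning trace from the final answer in a llama-server
# /v1/chat/completions response (--reasoning-format deepseek places the
# model's thinking in a separate "reasoning_content" field).
def split_reasoning(response: dict) -> tuple[str, str]:
    msg = response["choices"][0]["message"]
    return msg.get("reasoning_content", ""), msg.get("content", "")

# Illustrative payload shaped like the server's output:
sample = {
    "choices": [{
        "message": {
            "reasoning_content": "The VIC-II steals cycles on badlines...",
            "content": "Badlines occur when the raster line matches YSCROLL.",
        }
    }]
}
thinking, answer = split_reasoning(sample)
print(answer)
```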

Reasoning Validation Snapshot

  • Validation status: FAIL

  • Source artifacts: results/reasoning_validation/14b/20260302_152057

  • Note: contract/format retention passed; failure is due to strict exact-token determinism (hash mismatch across repeated same-seed runs).

    Metric                        Value
    single_think_tag_rate         1.0000
    single_balanced_tag_rate      1.0000
    single_final_after_think_rate 1.0000
    multi_turn_retention_rate     1.0000
    format_contract_pass_rate     1.0000
    exact_hash_match_rate         0.3403
    semantic_similarity_avg       0.9956
    crash_or_timeout_rate         0.0000
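The gap between exact_hash_match_rate and semantic_similarity_avg above reflects how the two checks behave: an exact-token hash fails on the smallest wording change, while a similarity score still rates the outputs as near-identical. A sketch of that contrast (the sample strings are invented, and Jaccard token overlap stands in for whatever similarity metric the validation actually uses):

```python
import hashlib

def output_hash(text: str) -> str:
    """Hash the exact decoded output; any token difference changes it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

run_a = "The SID chip has three voices."
run_b = "The SID chip has 3 voices."  # same meaning, different tokens

# Exact-hash determinism fails on a one-token difference...
assert output_hash(run_a) != output_hash(run_b)

# ...while a crude token-overlap similarity still scores them as close.
def jaccard(a: str, b: str) -> float:
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

print(round(jaccard(run_a, run_b), 2))
```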

Reference Throughput (project benchmark)

Measured via benchmark_gguf_matrix.sh on the infrastructure below.

Infrastructure used:

  • Host OS: Fedora Linux 43 (Server Edition)
  • Host kernel: 6.18.8-200.fc43.x86_64
  • CPU: AMD Ryzen AI Max+ 395 (16C/32T)
  • System RAM: 30 GiB
  • GPU: AMD Radeon 8060S (96.00 GiB VRAM visible to PyTorch)
  • Container image: rocm/pytorch:rocm7.2_ubuntu24.04_py3.12_pytorch_release_2.9.1
  • llama.cpp revision: 2afcdb9
  • Benchmark command source: scripts/inference/benchmark_gguf_matrix.sh

Quant    tok/s    eval runs  total tokens  GPU max %  VRAM max %  Power max W
F16       45.32   95         111           99.00      25.00       102.10
Q4_K_M    57.30   85         101           97.00       8.00       112.07
Q6_K     222.64    1          17           84.00      11.00        96.07
Q8_0     172.68   95         111           99.00      14.00        97.08

Benchmark source CSV: results/benchmarks/gguf_benchmark_14b_20260302_134011.csv

Benchmark measured at (UTC): 2026-03-02T13:41:12.556669+00:00

Notes:

  • Throughput depends on prompt length, generated tokens, and runtime flags.
  • Compare rows with similar eval runs and total tokens for fair conclusions.
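The second note above can be applied mechanically: rows with very few eval runs (e.g. Q6_K's single run) should not be compared head-to-head with well-sampled rows. A sketch (values copied from the table; the min_runs threshold of 10 is an arbitrary choice):

```python
# Flag benchmark rows whose eval-run count is too low for a fair
# throughput comparison (values copied from the table above).
ROWS = [
    {"quant": "F16",    "tok_s": 45.32,  "eval_runs": 95},
    {"quant": "Q4_K_M", "tok_s": 57.30,  "eval_runs": 85},
    {"quant": "Q6_K",   "tok_s": 222.64, "eval_runs": 1},
    {"quant": "Q8_0",   "tok_s": 172.68, "eval_runs": 95},
]

def comparable_rows(rows, min_runs=10):
    """Keep only rows with enough runs to average out variance."""
    return [r for r in rows if r["eval_runs"] >= min_runs]

for row in comparable_rows(ROWS):
    print(f'{row["quant"]}: {row["tok_s"]} tok/s over {row["eval_runs"]} runs')
```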