ibitato/c64-ministral-3-8b-thinking-c64-reasoning-lora

Overview

This repository contains the LoRA adapter produced by fine-tuning a reasoning-capable Ministral 3 8B model on technical Commodore 64 material.

Objectives:

  • keep the reasoning behavior of the base model,
  • inject C64-specific technical knowledge,
  • support practical troubleshooting and low-level explanations (BASIC, KERNAL, memory map, VIC-II, SID, 6502/6510).

Base Model

  • Base model: mistralai/Ministral-3-8B-Reasoning-2512
  • Architecture: Mistral3ForConditionalGeneration (language component fine-tuned with LoRA)
  • Max context length: 262,144 tokens (from text_config.max_position_embeddings)

Training Data (project-local corpus)

  • DAPT: train=408, validation=27, test=45
  • SFT: train=1620, validation=204, test=190
  • Sources: curated Commodore 64 manuals and technical documents from this project.
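The split sizes above can be sanity-checked with a deterministic shuffle-and-carve split. The helper below is a hypothetical sketch (function name, seed, and split order are assumptions, not the project's actual pipeline):

```python
import random

def split_corpus(docs, n_val, n_test, seed=42):
    # Hypothetical helper: shuffle deterministically, carve off the test
    # and validation slices, and keep the remainder as the training set.
    docs = list(docs)
    random.Random(seed).shuffle(docs)
    test = docs[:n_test]
    val = docs[n_test:n_test + n_val]
    train = docs[n_test + n_val:]
    return train, val, test

# DAPT corpus: 408 train + 27 validation + 45 test = 480 documents total.
train, val, test = split_corpus(range(480), n_val=27, n_test=45)
print(len(train), len(val), len(test))  # 408 27 45
```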

Training Recipe

  • Pipeline: DAPT + SFT (LoRA)
  • Precision: bf16
  • Max sequence length: 2048
  • Batch size per device: 2
  • Gradient accumulation: 16
  • Learning rate: 2e-05
  • Epochs: 3.0
  • Warmup steps: 100
  • Logging/save/eval steps: 50/500/250
  • Optimizer: adamw_torch, eval strategy: steps
  • Gradient checkpointing: False
  • LoRA: r=16, alpha=32, dropout=0.05, scope=language_qkvo
  • SFT options: assistant_only_loss=True, packing=False
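The LoRA hyperparameters above translate into a PEFT configuration roughly like the following sketch. Reading scope=language_qkvo as the attention q/k/v/o projections of the language component is an assumption, as are the exact target module names (they follow the standard Mistral attention layout):

```python
from peft import LoraConfig

# Sketch of the adapter configuration implied by the recipe above
# (r=16, alpha=32, dropout=0.05, attention projections only).
# Target module names are assumed, not taken from the project config.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```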

Run summary:

  • DAPT checkpoint: checkpoint-39
  • SFT checkpoint: checkpoint-153
  • DAPT global steps: 39 / 39
  • SFT global steps: 153 / 153
  • Last logged SFT step: 150
  • Last logged SFT loss: 0.0734
  • Last logged SFT token accuracy: 0.9801
  • Card generated at (UTC): 2026-03-02T16:47:20.093292+00:00
  • Source git revision: 13fafe7
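The global step counts above are consistent with the recipe: with a per-device batch of 2 and gradient accumulation of 16, the effective batch is 32 (assuming a single device), and the final partial batch rounds up each epoch:

```python
import math

def global_steps(train_size, per_device_batch=2, grad_accum=16, epochs=3):
    # Steps per epoch = ceil(train_size / effective_batch); the last
    # partial batch still counts as one optimizer step.
    effective_batch = per_device_batch * grad_accum  # 32
    return math.ceil(train_size / effective_batch) * epochs

print(global_steps(408))   # DAPT: 39
print(global_steps(1620))  # SFT: 153
```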

Reasoning Validation Snapshot

  • Validation status: PASS
  • Source artifacts: results/reasoning_validation/8b/20260302_151302
  Metric                         Value
  single_think_tag_rate          1.0000
  single_balanced_tag_rate       1.0000
  single_final_after_think_rate  1.0000
  multi_turn_retention_rate      1.0000
  format_contract_pass_rate      1.0000
  exact_hash_match_rate          1.0000
  semantic_similarity_avg        1.0000
  crash_or_timeout_rate          0.0000
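As an illustration of what the format metrics measure, the check below sketches single_think_tag_rate: the fraction of responses with exactly one balanced <think>...</think> block followed by a final answer. This is only a sketch; the actual harness lives in the results/reasoning_validation artifacts.

```python
import re

def has_single_balanced_think(text):
    # Exactly one opening and one closing tag, with non-whitespace
    # content (the final answer) after the closing tag.
    opens = len(re.findall(r"<think>", text))
    closes = len(re.findall(r"</think>", text))
    if opens != 1 or closes != 1:
        return False
    return re.search(r"<think>.*?</think>\s*\S", text, flags=re.S) is not None

responses = [
    "<think>SID is the sound chip.</think> The SID generates three voices.",
    "<think>unbalanced answer without a close tag",
]
rate = sum(has_single_balanced_think(r) for r in responses) / len(responses)
print(rate)  # 0.5
```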

Usage (Transformers + PEFT)

import torch
from peft import PeftModel
from transformers import AutoTokenizer
from transformers.models.mistral3 import Mistral3ForConditionalGeneration

base_id = "mistralai/Ministral-3-8B-Reasoning-2512"
adapter_id = "ibitato/c64-ministral-3-8b-thinking-c64-reasoning-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id, trust_remote_code=True)
base_model = Mistral3ForConditionalGeneration.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # place the model on GPU when available, else CPU
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = "Explain the C64 SID chip in one concise paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
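Per the validation snapshot above, the model wraps its reasoning in a single <think>...</think> block before the final answer. A small helper to keep only the answer from the decoded text (a sketch, assuming that output format):

```python
import re

def strip_think(text):
    # Drop the first balanced <think>...</think> block plus any
    # whitespace that follows it, keeping only the final answer.
    return re.sub(r"<think>.*?</think>\s*", "", text, count=1, flags=re.S)

sample = "<think>SID = MOS 6581 sound chip.</think>The SID is the C64's sound chip."
print(strip_think(sample))  # The SID is the C64's sound chip.
```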

Limitations

  • This adapter is specialized for C64 technical assistance, not for broad benchmark optimization.
  • No additional safety fine-tuning was applied beyond the base model behavior.
  • Evaluation in this repo focuses on training diagnostics; downstream benchmarking should be done per target use case.