# Ornstein-26B-A4B-it-GGUF
GGUF quantizations of DJLougen/Ornstein-26B-A4B-it - a reasoning-focused fine-tune of Gemma 4 26B-A4B-it trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.
## Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded - balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
## What Makes Ornstein Different
Unlike typical reasoning fine-tunes built on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
- DDM AUC of 0.9705 separating premium from degenerate reasoning with 99.49% sensitivity
- Per-segment loss weighting: An Ornstein-Uhlenbeck stochastic process models quality evolution across each trace, generating loss weights that tell the trainer which parts matter most
The model uses extended thinking with multi-phase reasoning, self-correction, and verification before providing final answers.
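The keep/discard gate described above reduces to thresholding a scalar drift score (the card reports a fitted threshold of 1.463). A minimal sketch of that gate, with synthetic scores standing in for real DDM output — the function names and toy data here are illustrative assumptions, not the actual pipeline:

```python
# Illustrative drift-score gate. The threshold comes from the model card;
# the score semantics and toy data below are assumptions for the sketch.

DRIFT_THRESHOLD = 1.463  # fitted threshold reported on the model card

def classify_trace(drift_score: float) -> str:
    """Label a reasoning trace by its DDM drift score."""
    return "premium" if drift_score >= DRIFT_THRESHOLD else "degenerate"

def sensitivity(scores: list[float], labels: list[str]) -> float:
    """Fraction of truly degenerate traces the gate catches (true positive rate)."""
    caught = sum(1 for s, y in zip(scores, labels)
                 if y == "degenerate" and classify_trace(s) == "degenerate")
    total = sum(1 for y in labels if y == "degenerate")
    return caught / total

# Synthetic example: three degenerate traces, two premium
scores = [0.4, 0.9, 1.2, 1.8, 2.3]
labels = ["degenerate", "degenerate", "degenerate", "premium", "premium"]
print(classify_trace(2.0))          # premium
print(sensitivity(scores, labels))  # 1.0 on this toy data
```

On the real curation set the gate reached 99.49% sensitivity, i.e. it almost never let a degenerate trace through.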
## Available Quantizations
| Quantization | Use Case |
|---|---|
| Q8_0 | Near-lossless. Best quality for high-end consumer GPUs. |
| Q6_K | High quality, moderate size reduction. |
| Q5_K_M | Good balance of quality and size. |
| Q5_K_S | Lighter 5-bit variant. |
| Q5_0 | Standard 5-bit. |
| Q4_K_M | Recommended - strong quality/size tradeoff for most users. |
| Q4_K_S | Lighter 4-bit variant. |
| Q4_0 | Standard 4-bit. |
| Q3_K_L | Lighter 3-bit, larger variant. |
| Q3_K_M | Mid 3-bit. |
| Q3_K_S | Light 3-bit. |
| Q2_K | Minimal footprint, noticeable quality tradeoff. |
For NVIDIA FP4 quantization, see DJLougen/Ornstein-26B-A4B-it-NVFP4.
For full-precision weights, see DJLougen/Ornstein-26B-A4B-it.
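A rough way to estimate the on-disk size of each variant is total parameters × effective bits per weight. The bits-per-weight figures below are typical llama.cpp ballparks, not measured values for this repo, and real files add metadata and embedding overhead:

```python
# Rough GGUF size estimator. Bits-per-weight values are approximate
# llama.cpp ballparks (assumptions), not measurements of these files.

TOTAL_PARAMS = 26e9  # 26B total parameters (MoE; all experts are stored)

BITS_PER_WEIGHT = {  # approximate effective bits per weight
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}

def approx_size_gb(quant: str) -> float:
    """Approximate file size in GB for a given quantization."""
    return TOTAL_PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{approx_size_gb(q):.1f} GB")
```

Note that a MoE model stores all 26B parameters on disk even though only ~4B are active per token, so disk size tracks total parameters while inference speed tracks active ones.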
## Quick Start

### llama.cpp
```bash
# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-26B-A4B-it-GGUF Ornstein-26B-A4B-it-Q4_K_M.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-26B-A4B-it-Q4_K_M.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```
### Ollama

```bash
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-26B-A4B-it-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein -f Modelfile
ollama run ornstein
```
### LM Studio

1. Download the desired quantization from the Files tab
2. Load it in LM Studio
3. Set context length to 8192 for full reasoning depth
## Training Data at a Glance
- Drift Score Distribution: DDM drift scores for premium (blue) vs degenerate (red) traces, with the fitted threshold at 1.463 cleanly separating the two pools.
- Category Mix: Math-heavy (1,016), with code (124), science (45), and logic (44).
- Reasoning Depth: Premium traces average ~1,263 words of thinking vs ~281 for degenerate traces, which tend to be shallow repetition.
- Difficulty × Pool: The degenerate pool skews toward hard problems where models are most likely to loop or stall.
## About the Model
Ornstein-26B-A4B-it is trained on just 1,229 examples - but every one of them passed through a rigorous quality pipeline. Drift Diffusion Modeling (DDM) decomposes reasoning traces into segments and tracks whether the model is demonstrating genuine cognitive progress or just generating filler. Only traces that show real reasoning - self-correction, verification, exploration of alternatives - make it into the training set.
The hybrid curation pipeline uses AUC-based classification for the keep/discard decision, then an Ornstein-Uhlenbeck stochastic process for per-segment loss weighting. A trace that dips in quality but self-corrects is treated differently from one that sustains a collapse - the OU mean-reversion parameter captures this distinction naturally.
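One way to see how a mean-reversion parameter separates a self-correcting dip from a sustained collapse is to fit the discretized OU drift term to a per-segment quality trajectory. Everything below — the fitting method, the mean `mu`, and the toy quality scores — is an illustrative assumption, not the published pipeline:

```python
# Illustrative sketch: estimate the OU mean-reversion rate (theta) of a
# per-segment quality trajectory. Fitting method, mu, and toy data are
# assumptions for the sketch, not the actual curation pipeline.

def estimate_theta(quality: list[float], mu: float = 1.0) -> float:
    """Least-squares fit of q[t+1] - q[t] = theta * (mu - q[t]).
    Positive theta: quality reverts toward mu (self-correcting trace).
    Near-zero or negative theta: deviations persist or grow (collapse)."""
    num = den = 0.0
    for t in range(len(quality) - 1):
        dev = mu - quality[t]
        num += (quality[t + 1] - quality[t]) * dev
        den += dev * dev
    return num / den if den else 0.0

recovering = [1.0, 0.4, 0.7, 0.9, 1.0]    # dips, then self-corrects
collapsing = [1.0, 0.7, 0.5, 0.35, 0.25]  # drifts away and stays low

print(estimate_theta(recovering))  # positive: strong mean reversion
print(estimate_theta(collapsing))  # negative: no recovery
</antml>```

The two toy traces end up with opposite-signed estimates, which is exactly the distinction the card describes: the recovering trace would keep high loss weights, while the collapsing one would be down-weighted or discarded.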
Key numbers from the curation pipeline:
- AUC 0.9705 separating premium from degenerate reasoning
- 99.49% sensitivity catching low-quality traces
- ~5% false positive rate on genuine reasoning
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | Gemma 4 26B-A4B-it (via Unsloth) |
| Architecture | Gemma 4 (MoE, 26B total, A4B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Training examples | 1,229 (DDM-curated) |
| Max sequence length | 8192 |
## Data Quality Metrics
| Metric | Value |
|---|---|
| Total examples | 1,229 |
| Mean thinking depth | ~1,667 words |
| Self-correction present | 100% of traces |
| Verification present | 100% of traces |
| Exploration present | 100% of traces |
| Quality gate pass rate | 100% |
## Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max tokens | 8192 |
| Repeat penalty | 1.1 |
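The settings above can be packaged as an OpenAI-compatible request body for a local `llama-server` instance (llama.cpp's built-in server exposes `/v1/chat/completions`). The model name and the example prompt here are placeholders; `repeat_penalty` as a top-level field is a llama.cpp extension, not part of the OpenAI schema:

```python
import json

# Request body using the recommended sampling settings from the table above.
# Model name and prompt are placeholders for this sketch.
payload = {
    "model": "Ornstein-26B-A4B-it-Q4_K_M",
    "messages": [
        {"role": "system", "content": "You are a helpful reasoning assistant."},
        {"role": "user", "content": "Prove that the sum of two odd numbers is even."},
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repeat_penalty": 1.1,  # llama.cpp-specific extension field
}

print(json.dumps(payload, indent=2))
```

POST this to `http://localhost:8080/v1/chat/completions` (the llama.cpp server default) with any HTTP client.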
## Quantization Notes
- Q4_K_M is the sweet spot for most hardware - minimal perceptible quality loss on reasoning tasks
- Q8_0 recommended if you have the VRAM and want maximum fidelity
- Q2_K for minimum footprint, but expect some degradation on complex reasoning chains
## Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
## Limitations
- Single epoch training on 1,229 examples means the model retains most base Gemma 4 behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: DDM pipeline optimized for English; other languages reflect base model performance
- Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
## Citation

```bibtex
@misc{ornstein26b,
  author = {DJLougen},
  title = {Ornstein-26B-A4B-it: DDM-Curated Reasoning Fine-Tune of Gemma 4 26B-A4B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-26B-A4B-it}
}
```
## Links
- Full-precision model: DJLougen/Ornstein-26B-A4B-it
- NVIDIA FP4: DJLougen/Ornstein-26B-A4B-it-NVFP4
- Base model: Google Gemma 4 26B-A4B-it | unsloth/gemma-4-26B-A4B-it
## License
Apache 2.0