# Ornstein-26B-A4B-it-GGUF
GGUF quantizations of DJLougen/Ornstein-26B-A4B-it - a reasoning-focused fine-tune of Gemma 4 26B-A4B-it trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.
## Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded - balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
## What Makes Ornstein Different
Unlike typical reasoning fine-tunes built on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
- DDM AUC of 0.9705 separating premium from degenerate reasoning with 99.49% sensitivity
- Per-segment loss weighting: An Ornstein-Uhlenbeck stochastic process models quality evolution across each trace, generating loss weights that tell the trainer which parts matter most
The model uses extended thinking with multi-phase reasoning, self-correction, and verification before providing final answers.
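The keep/discard gate described above reduces to thresholding a scalar drift score (the card reports a fitted threshold of 1.463). A minimal sketch of that gate, with synthetic scores standing in for real DDM output — the function names and toy data here are illustrative assumptions, not the actual pipeline:

```python
# Illustrative drift-score gate. The threshold comes from the model card;
# the score semantics and toy data below are assumptions for the sketch.

DRIFT_THRESHOLD = 1.463  # fitted threshold reported on the model card

def classify_trace(drift_score: float) -> str:
    """Label a reasoning trace by its DDM drift score."""
    return "premium" if drift_score >= DRIFT_THRESHOLD else "degenerate"

def sensitivity(scores: list[float], labels: list[str]) -> float:
    """Fraction of truly degenerate traces the gate catches (true positive rate)."""
    caught = sum(1 for s, y in zip(scores, labels)
                 if y == "degenerate" and classify_trace(s) == "degenerate")
    total = sum(1 for y in labels if y == "degenerate")
    return caught / total

# Synthetic example: three degenerate traces, two premium
scores = [0.4, 0.9, 1.2, 1.8, 2.3]
labels = ["degenerate", "degenerate", "degenerate", "premium", "premium"]
print(classify_trace(2.0))          # premium
print(sensitivity(scores, labels))  # 1.0 on this toy data
```

On the real curation set the gate reached 99.49% sensitivity, i.e. it almost never let a degenerate trace through.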
## Available Quantizations
| Quantization | Use Case |
|---|---|
| Q8_0 | Near-lossless. Best quality for high-end consumer GPUs. |
| Q6_K | High quality, moderate size reduction. |
| Q5_K_M | Good balance of quality and size. |
| Q5_K_S | Lighter 5-bit variant. |
| Q5_0 | Standard 5-bit. |
| Q4_K_M | Recommended - strong quality/size tradeoff for most users. |
| Q4_K_S | Lighter 4-bit variant. |
| Q4_0 | Standard 4-bit. |
| Q3_K_L | Lighter 3-bit, larger variant. |
| Q3_K_M | Mid 3-bit. |
| Q3_K_S | Light 3-bit. |
| Q2_K | Minimal footprint, noticeable quality tradeoff. |
For NVIDIA FP4 quantization, see DJLougen/Ornstein-26B-A4B-it-NVFP4.
For full-precision weights, see DJLougen/Ornstein-26B-A4B-it.
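A rough way to estimate the on-disk size of each variant is total parameters × effective bits per weight. The bits-per-weight figures below are typical llama.cpp ballparks, not measured values for this repo, and real files add metadata and embedding overhead:

```python
# Rough GGUF size estimator. Bits-per-weight values are approximate
# llama.cpp ballparks (assumptions), not measurements of these files.

TOTAL_PARAMS = 26e9  # 26B total parameters (MoE; all experts are stored)

BITS_PER_WEIGHT = {  # approximate effective bits per weight
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
    "Q2_K": 2.6,
}

def approx_size_gb(quant: str) -> float:
    """Approximate file size in GB for a given quantization."""
    return TOTAL_PARAMS * BITS_PER_WEIGHT[quant] / 8 / 1e9

for q in BITS_PER_WEIGHT:
    print(f"{q}: ~{approx_size_gb(q):.1f} GB")
```

Note that a MoE model stores all 26B parameters on disk even though only ~4B are active per token, so disk size tracks total parameters while inference speed tracks active ones.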
## Quick Start

### llama.cpp
```bash
# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-26B-A4B-it-GGUF Ornstein-26B-A4B-it-Q4_K_M.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-26B-A4B-it-Q4_K_M.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```
### Ollama

```bash
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-26B-A4B-it-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein -f Modelfile
ollama run ornstein
```
### LM Studio

1. Download the desired quantization from the Files tab
2. Load it in LM Studio
3. Set context length to 8192 for full reasoning depth
## Training Data at a Glance
- Drift Score Distribution: DDM drift scores for premium (blue) vs degenerate (red) traces, with the fitted threshold at 1.463 cleanly separating the two pools.
- Category Mix: Math-heavy (1,016), with code (124), science (45), and logic (44).
- Reasoning Depth: Premium traces average ~1,263 words of thinking vs ~281 for degenerate traces, which tend to be shallow repetition.
- Difficulty × Pool: The degenerate pool skews toward hard problems where models are most likely to loop or stall.
## About the Model
Ornstein-26B-A4B-it is trained on just 1,229 examples - but every one of them passed through a rigorous quality pipeline. Drift Diffusion Modeling (DDM) decomposes reasoning traces into segments and tracks whether the model is demonstrating genuine cognitive progress or just generating filler. Only traces that show real reasoning - self-correction, verification, exploration of alternatives - make it into the training set.
The hybrid curation pipeline uses AUC-based classification for the keep/discard decision, then an Ornstein-Uhlenbeck stochastic process for per-segment loss weighting. A trace that dips in quality but self-corrects is treated differently from one that sustains a collapse - the OU mean-reversion parameter captures this distinction naturally.
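One way to see how a mean-reversion parameter separates a self-correcting dip from a sustained collapse is to fit the discretized OU drift term to a per-segment quality trajectory. Everything below — the fitting method, the mean `mu`, and the toy quality scores — is an illustrative assumption, not the published pipeline:

```python
# Illustrative sketch: estimate the OU mean-reversion rate (theta) of a
# per-segment quality trajectory. Fitting method, mu, and toy data are
# assumptions for the sketch, not the actual curation pipeline.

def estimate_theta(quality: list[float], mu: float = 1.0) -> float:
    """Least-squares fit of q[t+1] - q[t] = theta * (mu - q[t]).
    Positive theta: quality reverts toward mu (self-correcting trace).
    Near-zero or negative theta: deviations persist or grow (collapse)."""
    num = den = 0.0
    for t in range(len(quality) - 1):
        dev = mu - quality[t]
        num += (quality[t + 1] - quality[t]) * dev
        den += dev * dev
    return num / den if den else 0.0

recovering = [1.0, 0.4, 0.7, 0.9, 1.0]    # dips, then self-corrects
collapsing = [1.0, 0.7, 0.5, 0.35, 0.25]  # drifts away and stays low

print(estimate_theta(recovering))  # positive: strong mean reversion
print(estimate_theta(collapsing))  # negative: no recovery
</antml>```

The two toy traces end up with opposite-signed estimates, which is exactly the distinction the card describes: the recovering trace would keep high loss weights, while the collapsing one would be down-weighted or discarded.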
Key numbers from the curation pipeline:
- AUC 0.9705 separating premium from degenerate reasoning
- 99.49% sensitivity catching low-quality traces
- ~5% false positive rate on genuine reasoning
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | Gemma 4 26B-A4B-it (via Unsloth) |
| Architecture | Gemma 4 (MoE, 26B total, A4B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Training examples | 1,229 (DDM-curated) |
| Max sequence length | 8192 |
## Data Quality Metrics
| Metric | Value |
|---|---|
| Total examples | 1,229 |
| Mean thinking depth | ~1,667 words |
| Self-correction present | 100% of traces |
| Verification present | 100% of traces |
| Exploration present | 100% of traces |
| Quality gate pass rate | 100% |
## Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max tokens | 8192 |
| Repeat penalty | 1.1 |
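The settings above can be packaged as an OpenAI-compatible request body for a local `llama-server` instance (llama.cpp's built-in server exposes `/v1/chat/completions`). The model name and the example prompt here are placeholders; `repeat_penalty` as a top-level field is a llama.cpp extension, not part of the OpenAI schema:

```python
import json

# Request body using the recommended sampling settings from the table above.
# Model name and prompt are placeholders for this sketch.
payload = {
    "model": "Ornstein-26B-A4B-it-Q4_K_M",
    "messages": [
        {"role": "system", "content": "You are a helpful reasoning assistant."},
        {"role": "user", "content": "Prove that the sum of two odd numbers is even."},
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repeat_penalty": 1.1,  # llama.cpp-specific extension field
}

print(json.dumps(payload, indent=2))
```

POST this to `http://localhost:8080/v1/chat/completions` (the llama.cpp server default) with any HTTP client.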
## Quantization Notes
- Q4_K_M is the sweet spot for most hardware - minimal perceptible quality loss on reasoning tasks
- Q8_0 recommended if you have the VRAM and want maximum fidelity
- Q2_K for minimum footprint, but expect some degradation on complex reasoning chains
## Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
## Limitations
- Single epoch training on 1,229 examples means the model retains most base Gemma 4 behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: DDM pipeline optimized for English; other languages reflect base model performance
- Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
## Citation

```bibtex
@misc{ornstein26b,
  author = {DJLougen},
  title = {Ornstein-26B-A4B-it: DDM-Curated Reasoning Fine-Tune of Gemma 4 26B-A4B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-26B-A4B-it}
}
```
## Links
- Full-precision model: DJLougen/Ornstein-26B-A4B-it
- NVIDIA FP4: DJLougen/Ornstein-26B-A4B-it-NVFP4
- Base model: Google Gemma 4 26B-A4B-it | unsloth/gemma-4-26B-A4B-it
## License
Apache 2.0