Ornstein-26B-A4B-it-GGUF

GGUF quantizations of DJLougen/Ornstein-26B-A4B-it - a reasoning-focused fine-tune of Gemma 4 26B-A4B-it trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded - balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi


What Makes Ornstein Different

Unlike typical reasoning fine-tunes trained on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:

  • Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
  • Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
  • Strong separation: DDM AUC of 0.9705 between premium and degenerate reasoning, with 99.49% sensitivity
  • Per-segment loss weighting: An Ornstein-Uhlenbeck stochastic process models quality evolution across each trace, generating loss weights that tell the trainer which parts matter most
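
The per-segment weighting idea can be sketched as follows. This is an illustrative toy, not the actual pipeline: the deterministic mean-reverting update, the theta/mu/gain values, and the example quality scores are all assumptions; the real OU fit and its parameters are not published here.

```python
def ou_weights(quality, theta=0.5, mu=1.0, gain=0.2):
    """Toy sketch: smooth per-segment quality scores with a
    mean-reverting (Ornstein-Uhlenbeck-style) update, then normalize
    the smoothed scores into loss weights. theta, mu, and gain are
    hypothetical hyperparameters, not values from the actual pipeline."""
    x = quality[0]
    smoothed = [x]
    for q in quality[1:]:
        # Revert toward the mean mu, nudged by the observed segment quality q.
        x = x + theta * (mu - x) + gain * (q - x)
        smoothed.append(x)
    total = sum(smoothed)
    return [s / total for s in smoothed]

# A trace that dips but self-corrects keeps more weight on its final
# segments than one that collapses and stays low.
recovering = ou_weights([1.0, 0.3, 0.9, 1.0])
collapsing = ou_weights([1.0, 0.3, 0.2, 0.1])
print(recovering[-1] > collapsing[-1])  # True
```

The mean-reversion is what distinguishes a recoverable dip from a sustained collapse: after a bad segment, the smoothed state is pulled back toward the mean, so later good segments regain weight quickly.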

The model uses extended thinking with multi-phase reasoning, self-correction, and verification before providing final answers.


Available Quantizations

Quantization | Use Case
Q8_0 | Near-lossless. Best quality for high-end consumer GPUs.
Q6_K | High quality, moderate size reduction.
Q5_K_M | Good balance of quality and size.
Q5_K_S | Lighter 5-bit variant.
Q5_0 | Standard (legacy) 5-bit format.
Q4_K_M | Recommended - strong quality/size tradeoff for most users.
Q4_K_S | Lighter 4-bit variant.
Q4_0 | Standard (legacy) 4-bit format.
Q3_K_L | Largest 3-bit variant.
Q3_K_M | Mid-size 3-bit variant.
Q3_K_S | Smallest 3-bit variant.
Q2_K | Minimal footprint, noticeable quality tradeoff.

For NVIDIA FP4 quantization, see DJLougen/Ornstein-26B-A4B-it-NVFP4.
For full-precision weights, see DJLougen/Ornstein-26B-A4B-it.


Quick Start

llama.cpp

# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-26B-A4B-it-GGUF Ornstein-26B-A4B-it-Q4_K_M.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-26B-A4B-it-Q4_K_M.gguf \
  -p "You are a helpful reasoning assistant." \
  -c 8192 --temp 0.6 --top-p 0.95 -n 8192

Ollama

# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-26B-A4B-it-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_ctx 8192
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein -f Modelfile
ollama run ornstein

LM Studio

  1. Download the desired quantization from the Files tab
  2. Load it in LM Studio
  3. Set context length to 8192 for full reasoning depth

Training Data at a Glance

Training Data Profile

  • Drift Score Distribution: DDM drift scores for premium vs. degenerate traces, with the fitted threshold at 1.463 cleanly separating the two pools.
  • Category Mix: Math-heavy (1,016 examples), with code (124), science (45), and logic (44).
  • Reasoning Depth: Premium traces average ~1,263 words of thinking vs. ~281 for degenerate traces, which tend toward shallow repetition.
  • Difficulty × Pool: The degenerate pool skews toward hard problems, where models are most likely to loop or stall.

About the Model

Ornstein-26B-A4B-it is trained on just 1,229 examples - but every one of them passed through a rigorous quality pipeline. Drift Diffusion Modeling (DDM) decomposes reasoning traces into segments and tracks whether the model is demonstrating genuine cognitive progress or just generating filler. Only traces that show real reasoning - self-correction, verification, exploration of alternatives - make it into the training set.

The hybrid curation pipeline uses AUC-based classification for the keep/discard decision, then an Ornstein-Uhlenbeck stochastic process for per-segment loss weighting. A trace that dips in quality but self-corrects is treated differently from one that sustains a collapse - the OU mean-reversion parameter captures this distinction naturally.
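
A minimal sketch of the keep/discard stage, assuming per-trace drift scores have already been computed. The scores below are made up for illustration; only the 1.463 threshold comes from this card, and the real pipeline's AUC (0.9705) reflects its actual score distributions, not this toy data.

```python
def auc(pos, neg):
    """Mann-Whitney estimate of AUC: the probability that a randomly
    chosen premium trace scores above a randomly chosen degenerate one."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical drift scores (illustrative only).
premium = [2.1, 1.8, 2.5, 1.6, 2.9]
degenerate = [0.9, 1.2, 0.4, 1.5, 1.1]
print(auc(premium, degenerate))  # 1.0 - perfectly separable toy data

# Keep/discard decision at the fitted threshold from the card.
threshold = 1.463
keep = [s for s in premium + degenerate if s >= threshold]
```

Note that one degenerate score (1.5) slips past the threshold in this toy example: thresholding on a scalar score always trades sensitivity against false positives, which is why the card reports both numbers.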

Key numbers from the curation pipeline:

  • AUC 0.9705 separating premium from degenerate reasoning
  • 99.49% sensitivity catching low-quality traces
  • ~5% false positive rate on genuine reasoning
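
These rates follow the standard confusion-matrix definitions. The counts below are hypothetical numbers chosen only to reproduce the reported figures; the pipeline's actual counts are not published here.

```python
# Hypothetical confusion counts for the degenerate-trace detector.
true_pos = 391   # degenerate traces correctly flagged
false_neg = 2    # degenerate traces missed
false_pos = 40   # premium traces wrongly flagged
true_neg = 759   # premium traces correctly passed

sensitivity = true_pos / (true_pos + false_neg)  # recall on degenerate traces
fpr = false_pos / (false_pos + true_neg)         # genuine reasoning wrongly flagged
print(f"sensitivity: {sensitivity:.2%}, FPR: {fpr:.2%}")
# → sensitivity: 99.49%, FPR: 5.01%
```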

Training Configuration

Parameter | Value
Base model | Gemma 4 26B-A4B-it (via Unsloth)
Architecture | Gemma 4 (MoE, 26B total, A4B active)
Method | LoRA (rank 32, alpha 32)
Dropout | 0.05
Epochs | 1
Training examples | 1,229 (DDM-curated)
Max sequence length | 8192
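
As a rough sanity check on adapter size: LoRA at rank r adds r × (d_in + d_out) trainable parameters per adapted weight matrix, and with alpha equal to rank (32/32) the update is applied at scale 1.0. The hidden size below is an assumed placeholder, not the actual Gemma 4 dimension.

```python
def lora_params(d_in, d_out, r=32):
    # LoRA factorizes the weight update as B @ A, with
    # A: (r, d_in) and B: (d_out, r) -> r * (d_in + d_out) parameters.
    return r * (d_in + d_out)

# Hypothetical square attention projection; real Gemma 4 dims differ.
hidden = 4096
print(lora_params(hidden, hidden))  # 262144 trainable params for one matrix
```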

Data Quality Metrics

Metric | Value
Total examples | 1,229
Mean thinking depth | ~1,667 words
Self-correction present | 100% of traces
Verification present | 100% of traces
Exploration present | 100% of traces
Quality gate pass rate | 100%

Recommended Settings

Parameter | Suggested Value
Temperature | 0.6
Top-P | 0.95
Max tokens | 8192
Repeat penalty | 1.1
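
If you serve the model through an OpenAI-compatible endpoint (llama.cpp's llama-server and LM Studio both expose one), the settings above map onto a request payload roughly like this. The model name and port are assumptions for a local setup, not fixed values.

```python
import json

# Recommended sampling settings mapped onto an OpenAI-style
# chat-completions payload for a locally served GGUF.
payload = {
    "model": "ornstein-26b-a4b-it",  # name depends on your server config
    "messages": [
        {"role": "system", "content": "You are a helpful reasoning assistant."},
        {"role": "user", "content": "Prove that the sum of two odd numbers is even."},
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repeat_penalty": 1.1,  # llama.cpp extension; not part of the OpenAI schema
}
print(json.dumps(payload, indent=2))
# POST this to your local endpoint, e.g. http://localhost:8080/v1/chat/completions
```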

Quantization Notes

  • Q4_K_M is the sweet spot for most hardware - minimal perceptible quality loss on reasoning tasks
  • Q8_0 recommended if you have the VRAM and want maximum fidelity
  • Q2_K for minimum footprint, but expect some degradation on complex reasoning chains
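
As a back-of-envelope guide, GGUF file size is roughly parameters × effective bits per weight. The bits-per-weight figures below are approximations, since K-quants mix precisions across tensors; use them to gauge whether a file fits your VRAM, not as exact sizes.

```python
def gguf_size_gb(params_b, bits_per_weight):
    """Rough GGUF file-size estimate in GB: parameters x bits / 8,
    ignoring metadata and the mixed-precision layers real K-quants use."""
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

# Approximate effective bits per weight (illustrative, not exact).
for name, bpw in [("Q8_0", 8.5), ("Q4_K_M", 4.8), ("Q2_K", 2.6)]:
    print(f"{name}: ~{gguf_size_gb(26, bpw):.1f} GB")
```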

Intended Use

Designed for tasks requiring structured, multi-step reasoning:

  • Mathematics
  • Logic problems
  • Code analysis
  • Scientific problems
  • Complex question answering

Limitations

  • Single-epoch training on 1,229 examples means the model retains most base Gemma 4 behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
  • Language scope: DDM pipeline optimized for English; other languages reflect base model performance
  • Extended thinking can occasionally loop on adversarial or highly ambiguous prompts

Citation

@misc{ornstein26b,
  author = {DJLougen},
  title = {Ornstein-26B-A4B-it: DDM-Curated Reasoning Fine-Tune of Gemma 4 26B-A4B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-26B-A4B-it}
}

License

Apache 2.0
