Ornstein-26B-A4B-it-FP8

FP8 (8-bit floating point) quantization of DJLougen/Ornstein-26B-A4B-it - a reasoning-focused fine-tune of Gemma 4 26B-A4B-it trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.

Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded - balancing GPU costs against a student budget. If my models have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi


About This Quantization

This is the FP8 quantization of Ornstein-26B-A4B-it via bitsandbytes. FP8 cuts memory usage roughly in half compared to FP16/BF16 while preserving near-lossless quality - the best option when you want maximum fidelity at reduced VRAM.
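The halving claim is simple arithmetic over the raw weight storage. A back-of-envelope sketch for a 26B-parameter dense checkpoint (weights only; KV cache, activations, and framework overhead are not included):

```python
def weights_gib(n_params: float, bits_per_param: int) -> float:
    """Raw weight storage in GiB: params * bits / 8 bytes / 2^30."""
    return n_params * bits_per_param / 8 / 2**30

# 26B parameters at BF16 vs FP8
for name, bits in [("BF16", 16), ("FP8", 8)]:
    print(f"{name}: {weights_gib(26e9, bits):.1f} GiB")
```

At FP8 the weights alone drop from roughly 48 GiB to roughly 24 GiB, which is what moves the model into single-GPU territory on 32-48 GB cards.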

Format Notes
FP8 8-bit floating point via bitsandbytes. Near-lossless quality. Requires GPU with FP8 support (Ada Lovelace, Hopper, Blackwell).

What Makes Ornstein Different

Unlike typical reasoning fine-tunes that use large volumes of synthetic data, Ornstein implements quality-over-quantity:

  • Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
  • Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
  • DDM AUC of 0.9705 separating premium from degenerate reasoning with 99.49% sensitivity
  • Per-segment loss weighting: An Ornstein-Uhlenbeck stochastic process models quality evolution across each trace, generating loss weights that tell the trainer which parts matter most

The model uses extended thinking with multi-phase reasoning, self-correction, and verification before providing final answers.
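The DDM curation pipeline itself is not published, so the following is only an illustrative sketch of the per-segment weighting idea: simulate an Ornstein-Uhlenbeck process as a latent quality signal over trace segments, then map it to positive loss weights. All parameter values here are hypothetical, not the ones used in training.

```python
import math
import random

def ou_path(n_segments, theta=1.0, mu=0.0, sigma=0.5, dt=0.1, seed=0):
    """Euler-Maruyama simulation of an Ornstein-Uhlenbeck process:
    dx = theta * (mu - x) dt + sigma dW. Returns one value per segment."""
    rng = random.Random(seed)
    x, path = mu, []
    for _ in range(n_segments):
        x += theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
        path.append(x)
    return path

def loss_weights(quality_path):
    """Softmax over latent quality: positive per-segment weights summing to 1,
    so higher-quality segments contribute more to the training loss."""
    m = max(quality_path)
    exps = [math.exp(q - m) for q in quality_path]
    total = sum(exps)
    return [e / total for e in exps]

path = ou_path(12)
weights = loss_weights(path)
assert abs(sum(weights) - 1.0) < 1e-9
```

The mean-reverting term pulls quality back toward a baseline, which matches the intuition that reasoning traces drift in and out of productive stretches rather than decaying monotonically.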


Training Data at a Glance

Training Data Profile

  • Drift Score Distribution: DDM drift scores for premium (blue) vs degenerate (red) traces, with the fitted threshold at 1.463 cleanly separating the two pools.
  • Category Mix: Math-heavy (1,016), with code (124), science (45), and logic (44).
  • Reasoning Depth: Premium traces average ~1,263 words of thinking vs ~281 for degenerate traces, which tend to be shallow repetition.
  • Difficulty x Pool: The degenerate pool skews toward hard problems where models are most likely to loop or stall.
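The AUC and sensitivity figures above are straightforward to reproduce in principle. A minimal sketch with synthetic drift scores (not the real distributions) showing rank-based AUC and sensitivity at the fitted threshold of 1.463:

```python
def auc(pos, neg):
    """Rank-based AUC: probability a positive scores above a negative
    (ties count as 0.5)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sensitivity(pos, threshold):
    """Fraction of positives (premium traces) at or above the threshold."""
    return sum(p >= threshold for p in pos) / len(pos)

# Synthetic drift scores for illustration only
premium = [2.1, 1.9, 2.4, 1.7, 2.0]
degenerate = [0.6, 1.1, 0.9, 1.3]
print(auc(premium, degenerate))     # 1.0 on these fully separated samples
print(sensitivity(premium, 1.463))  # 1.0 here
```

On the real pools the reported AUC of 0.9705 implies some overlap between the two distributions, which is why a fitted threshold rather than perfect separation is needed.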

Training Configuration

| Parameter | Value |
| --- | --- |
| Base model | unsloth/gemma-4-26B-A4B-it |
| Architecture | Gemma 4 (MoE, 26B total, A4B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Training examples | 1,229 (DDM-curated) |
| Max sequence length | 8192 |
| Framework | Unsloth |
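For a sense of scale, the trainable-parameter count of a rank-32 adapter is easy to estimate: each adapted weight matrix gains two low-rank factors. The hidden size below is hypothetical, chosen only for illustration, since the exact projection shapes are not listed here.

```python
def lora_params(d_in: int, d_out: int, rank: int = 32) -> int:
    """Trainable parameters for one LoRA-adapted matrix:
    A is (d_in x rank), B is (rank x d_out)."""
    return rank * (d_in + d_out)

# Hypothetical square attention projection, e.g. hidden size 4096
hidden = 4096
print(lora_params(hidden, hidden))  # 262144 trainable params per such matrix
```

Even summed over every adapted projection, this is a tiny fraction of 26B parameters, consistent with the note under Limitations that the fine-tune shapes reasoning style rather than injecting new knowledge.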

Data Quality Metrics

| Metric | Value |
| --- | --- |
| Total examples | 1,229 |
| Mean thinking depth | ~1,667 words |
| Self-correction present | 100% of traces |
| Verification present | 100% of traces |
| Exploration present | 100% of traces |
| Quality gate pass rate | 100% |

Recommended Settings

| Parameter | Suggested value |
| --- | --- |
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max tokens | 8192 |
| Repeat penalty | 1.1 |
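These settings map directly onto common inference backends. A minimal sketch of a request body using the field names typical of OpenAI-compatible servers (names may differ per backend; `repetition_penalty` in particular varies):

```python
def sampling_payload(prompt: str) -> dict:
    """Build a request body with the card's recommended sampling settings."""
    return {
        "prompt": prompt,
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 8192,
        "repetition_penalty": 1.1,
    }
```

Send this to whatever completion endpoint your server exposes; only the four sampling values come from this card.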

Intended Use

Designed for tasks requiring structured, multi-step reasoning:

  • Mathematics
  • Logic problems
  • Code analysis
  • Scientific problems
  • Complex question answering

Limitations

  • Single-epoch training on 1,229 examples means the model retains most base Gemma 4 behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
  • Language scope: DDM pipeline optimized for English; other languages reflect base model performance
  • Extended thinking can occasionally loop on adversarial or highly ambiguous prompts

Citation

@misc{ornstein26b,
  author = {DJLougen},
  title = {Ornstein-26B-A4B-it: DDM-Curated Reasoning Fine-Tune of Gemma 4 26B-A4B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-26B-A4B-it}
}

License

Apache 2.0
