# Ornstein-26B-A4B-it-FP8
FP8 (8-bit floating point) quantization of DJLougen/Ornstein-26B-A4B-it - a reasoning-focused fine-tune of Gemma 4 26B-A4B-it trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.
## Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded - balancing GPU costs against a student budget. If my models have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
## About This Quantization
This is the FP8 quantization of Ornstein-26B-A4B-it, produced with bitsandbytes. FP8 cuts weight memory roughly in half compared to FP16/BF16 while preserving near-lossless quality - the best option when you want maximum fidelity with reduced VRAM usage.
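As a rough back-of-the-envelope check (my arithmetic, not an official benchmark - it ignores KV cache, activations, and framework overhead), the halving follows directly from bytes per parameter:

```python
# Approximate weight memory for a 26B-parameter model at different precisions.
PARAMS = 26e9  # total parameter count

def weight_gib(bytes_per_param: float) -> float:
    """Weight memory in GiB, ignoring runtime overhead."""
    return PARAMS * bytes_per_param / 2**30

bf16 = weight_gib(2.0)  # BF16/FP16: 2 bytes per weight -> ~48 GiB
fp8 = weight_gib(1.0)   # FP8: 1 byte per weight -> ~24 GiB
print(f"BF16 ~{bf16:.1f} GiB, FP8 ~{fp8:.1f} GiB ({fp8 / bf16:.0%} of BF16)")
```

Only ~4B parameters are active per token (A4B), but all 26B weights must still fit in VRAM, so the estimate above is what matters for loading.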
| Format | Notes |
|---|---|
| FP8 | 8-bit floating point via bitsandbytes. Near-lossless quality. Requires GPU with FP8 support (Ada Lovelace, Hopper, Blackwell). |
Other formats:
- Full precision: DJLougen/Ornstein-26B-A4B-it
- NVIDIA FP4: DJLougen/Ornstein-26B-A4B-it-NVFP4
- GGUF: DJLougen/Ornstein-26B-A4B-it-GGUF
- MLX 4-bit: DJLougen/Ornstein-26B-A4B-it-MLX-4bit
- MLX 8-bit: DJLougen/Ornstein-26B-A4B-it-MLX-8bit
## What Makes Ornstein Different
Unlike typical reasoning fine-tunes that rely on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
- Strong separation: DDM AUC of 0.9705 distinguishing premium from degenerate reasoning, with 99.49% sensitivity
- Per-segment loss weighting: An Ornstein-Uhlenbeck stochastic process models quality evolution across each trace, generating loss weights that tell the trainer which parts matter most
The model uses extended thinking with multi-phase reasoning, self-correction, and verification before providing final answers.
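The curation pipeline itself is not public, so the following is only a minimal sketch of the per-segment weighting idea: a discretized Ornstein-Uhlenbeck process models quality drifting along a trace, and the quality path is mapped to loss weights. The parameters (`theta`, `mu`, `sigma`) and the quality-to-weight mapping here are illustrative assumptions, not the pipeline's actual values.

```python
import random

def ou_quality_path(n_segments, theta=0.5, mu=1.0, sigma=0.2, x0=1.0, seed=0):
    """Simulate a discretized Ornstein-Uhlenbeck process:
        x[t+1] = x[t] + theta * (mu - x[t]) + sigma * noise
    Here x models per-segment reasoning quality along one trace;
    theta pulls quality back toward the mean mu, sigma adds noise."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n_segments - 1):
        x.append(x[-1] + theta * (mu - x[-1]) + sigma * rng.gauss(0, 1))
    return x

def loss_weights(quality):
    """Map quality scores to non-negative loss weights that sum to the
    segment count, so high-quality segments dominate the training loss."""
    clipped = [max(q, 0.0) for q in quality]
    total = sum(clipped) or 1.0
    return [len(clipped) * q / total for q in clipped]

quality = ou_quality_path(n_segments=8)
weights = loss_weights(quality)
```

The normalization keeps the average weight at 1.0, so reweighting changes where the gradient signal concentrates without changing its overall scale.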
## Training Data at a Glance
- Drift Score Distribution: DDM drift scores for premium vs. degenerate traces, with the fitted threshold at 1.463 cleanly separating the two pools.
- Category Mix: math-heavy (1,016 examples), plus code (124), science (45), and logic (44).
- Reasoning Depth: premium traces average ~1,263 words of thinking vs. ~281 for degenerate traces, which tend toward shallow repetition.
- Difficulty × Pool: the degenerate pool skews toward hard problems, where models are most likely to loop or stall.
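To make the threshold concrete, here is how the 1.463 cutoff would split traces by drift score. The scores and the `is_premium` helper are illustrative only; the pipeline's real interface is not published.

```python
THRESHOLD = 1.463  # fitted DDM drift-score cutoff from the model card

def is_premium(drift_score: float) -> bool:
    """Traces whose drift score clears the threshold go to the premium
    pool; the rest land in the degenerate pool."""
    return drift_score >= THRESHOLD

# Toy drift scores for illustration, straddling the cutoff.
scores = [2.10, 1.47, 1.46, 0.35]
pools = [is_premium(s) for s in scores]  # [True, True, False, False]
```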
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-26B-A4B-it |
| Architecture | Gemma 4 (MoE, 26B total, A4B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Training examples | 1,229 (DDM-curated) |
| Max sequence length | 8192 |
| Framework | Unsloth |
## Data Quality Metrics
| Metric | Value |
|---|---|
| Total examples | 1,229 |
| Mean thinking depth | ~1,667 words |
| Self-correction present | 100% of traces |
| Verification present | 100% of traces |
| Exploration present | 100% of traces |
| Quality gate pass rate | 100% |
## Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max tokens | 8192 |
| Repeat penalty | 1.1 |
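The table above maps onto a typical OpenAI-compatible chat request roughly as follows. Note that `repetition_penalty` is a server-side extension offered by some backends (e.g. vLLM), not part of the core OpenAI schema, so check what your serving stack actually accepts; the prompt is just a placeholder.

```python
import json

# Suggested sampling settings expressed as a request payload for an
# OpenAI-compatible endpoint. Field names follow that API; the
# repetition-penalty field name varies by backend.
payload = {
    "model": "DJLougen/Ornstein-26B-A4B-it-FP8",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repetition_penalty": 1.1,  # backend-specific extra parameter
}
body = json.dumps(payload)  # ready to POST to /v1/chat/completions
```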
## Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
## Limitations
- Single epoch training on 1,229 examples means the model retains most base Gemma 4 behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: DDM pipeline optimized for English; other languages reflect base model performance
- Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
## Citation

```bibtex
@misc{ornstein26b,
  author    = {DJLougen},
  title     = {Ornstein-26B-A4B-it: DDM-Curated Reasoning Fine-Tune of Gemma 4 26B-A4B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DJLougen/Ornstein-26B-A4B-it}
}
```
## Links
- Full-precision model: DJLougen/Ornstein-26B-A4B-it
- GGUF quantizations: DJLougen/Ornstein-26B-A4B-it-GGUF
- NVIDIA FP4: DJLougen/Ornstein-26B-A4B-it-NVFP4
- Base model: Google Gemma 4 26B-A4B-it | unsloth/gemma-4-26B-A4B-it
## License
Apache 2.0