# Ornstein-26B-A4B-it-FP8
FP8 (8-bit floating point) quantization of DJLougen/Ornstein-26B-A4B-it - a reasoning-focused fine-tune of Gemma 4 26B-A4B-it trained on 1,229 high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.
## Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded - balancing GPU costs against a student budget. If my models have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
## About This Quantization
This is the FP8 quantization of Ornstein-26B-A4B-it, produced with bitsandbytes. FP8 cuts weight memory roughly in half compared to FP16/BF16 while preserving near-lossless quality - the best option when you want maximum fidelity with reduced VRAM usage.
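As a rough back-of-the-envelope check (my arithmetic, not an official benchmark - it ignores KV cache, activations, and framework overhead), the halving follows directly from bytes per parameter:

```python
# Approximate weight memory for a 26B-parameter model at different precisions.
PARAMS = 26e9  # total parameter count

def weight_gib(bytes_per_param: float) -> float:
    """Weight memory in GiB, ignoring runtime overhead."""
    return PARAMS * bytes_per_param / 2**30

bf16 = weight_gib(2.0)  # BF16/FP16: 2 bytes per weight -> ~48 GiB
fp8 = weight_gib(1.0)   # FP8: 1 byte per weight -> ~24 GiB
print(f"BF16 ~{bf16:.1f} GiB, FP8 ~{fp8:.1f} GiB ({fp8 / bf16:.0%} of BF16)")
```

Only ~4B parameters are active per token (A4B), but all 26B weights must still fit in VRAM, so the estimate above is what matters for loading.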
| Format | Notes |
|---|---|
| FP8 | 8-bit floating point via bitsandbytes. Near-lossless quality. Requires GPU with FP8 support (Ada Lovelace, Hopper, Blackwell). |
Other formats:
- Full precision: DJLougen/Ornstein-26B-A4B-it
- NVIDIA FP4: DJLougen/Ornstein-26B-A4B-it-NVFP4
- GGUF: DJLougen/Ornstein-26B-A4B-it-GGUF
- MLX 4-bit: DJLougen/Ornstein-26B-A4B-it-MLX-4bit
- MLX 8-bit: DJLougen/Ornstein-26B-A4B-it-MLX-8bit
## What Makes Ornstein Different
Unlike typical reasoning fine-tunes that rely on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. Degenerate split: 799 premium traces + 430 selected degenerate traces = 1,229 total
- Strong separation: DDM AUC of 0.9705 distinguishing premium from degenerate reasoning, with 99.49% sensitivity
- Per-segment loss weighting: An Ornstein-Uhlenbeck stochastic process models quality evolution across each trace, generating loss weights that tell the trainer which parts matter most
The model uses extended thinking with multi-phase reasoning, self-correction, and verification before providing final answers.
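The curation pipeline itself is not public, so the following is only a minimal sketch of the per-segment weighting idea: a discretized Ornstein-Uhlenbeck process models quality drifting along a trace, and the quality path is mapped to loss weights. The parameters (`theta`, `mu`, `sigma`) and the quality-to-weight mapping here are illustrative assumptions, not the pipeline's actual values.

```python
import random

def ou_quality_path(n_segments, theta=0.5, mu=1.0, sigma=0.2, x0=1.0, seed=0):
    """Simulate a discretized Ornstein-Uhlenbeck process:
        x[t+1] = x[t] + theta * (mu - x[t]) + sigma * noise
    Here x models per-segment reasoning quality along one trace;
    theta pulls quality back toward the mean mu, sigma adds noise."""
    rng = random.Random(seed)
    x = [x0]
    for _ in range(n_segments - 1):
        x.append(x[-1] + theta * (mu - x[-1]) + sigma * rng.gauss(0, 1))
    return x

def loss_weights(quality):
    """Map quality scores to non-negative loss weights that sum to the
    segment count, so high-quality segments dominate the training loss."""
    clipped = [max(q, 0.0) for q in quality]
    total = sum(clipped) or 1.0
    return [len(clipped) * q / total for q in clipped]

quality = ou_quality_path(n_segments=8)
weights = loss_weights(quality)
```

The normalization keeps the average weight at 1.0, so reweighting changes where the gradient signal concentrates without changing its overall scale.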
## Training Data at a Glance
- Drift Score Distribution: DDM drift scores for premium vs. degenerate traces, with the fitted threshold at 1.463 cleanly separating the two pools.
- Category Mix: math-heavy (1,016 examples), plus code (124), science (45), and logic (44).
- Reasoning Depth: premium traces average ~1,263 words of thinking vs. ~281 for degenerate traces, which tend toward shallow repetition.
- Difficulty × Pool: the degenerate pool skews toward hard problems, where models are most likely to loop or stall.
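To make the threshold concrete, here is how the 1.463 cutoff would split traces by drift score. The scores and the `is_premium` helper are illustrative only; the pipeline's real interface is not published.

```python
THRESHOLD = 1.463  # fitted DDM drift-score cutoff from the model card

def is_premium(drift_score: float) -> bool:
    """Traces whose drift score clears the threshold go to the premium
    pool; the rest land in the degenerate pool."""
    return drift_score >= THRESHOLD

# Toy drift scores for illustration, straddling the cutoff.
scores = [2.10, 1.47, 1.46, 0.35]
pools = [is_premium(s) for s in scores]  # [True, True, False, False]
```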
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | unsloth/gemma-4-26B-A4B-it |
| Architecture | Gemma 4 (MoE, 26B total, A4B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Training examples | 1,229 (DDM-curated) |
| Max sequence length | 8192 |
| Framework | Unsloth |
## Data Quality Metrics
| Metric | Value |
|---|---|
| Total examples | 1,229 |
| Mean thinking depth | ~1,667 words |
| Self-correction present | 100% of traces |
| Verification present | 100% of traces |
| Exploration present | 100% of traces |
| Quality gate pass rate | 100% |
## Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max tokens | 8192 |
| Repeat penalty | 1.1 |
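The table above maps onto a typical OpenAI-compatible chat request roughly as follows. Note that `repetition_penalty` is a server-side extension offered by some backends (e.g. vLLM), not part of the core OpenAI schema, so check what your serving stack actually accepts; the prompt is just a placeholder.

```python
import json

# Suggested sampling settings expressed as a request payload for an
# OpenAI-compatible endpoint. Field names follow that API; the
# repetition-penalty field name varies by backend.
payload = {
    "model": "DJLougen/Ornstein-26B-A4B-it-FP8",
    "messages": [
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repetition_penalty": 1.1,  # backend-specific extra parameter
}
body = json.dumps(payload)  # ready to POST to /v1/chat/completions
```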
## Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
## Limitations
- Single epoch training on 1,229 examples means the model retains most base Gemma 4 behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: DDM pipeline optimized for English; other languages reflect base model performance
- Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
## Citation

```bibtex
@misc{ornstein26b,
  author    = {DJLougen},
  title     = {Ornstein-26B-A4B-it: DDM-Curated Reasoning Fine-Tune of Gemma 4 26B-A4B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DJLougen/Ornstein-26B-A4B-it}
}
```
## Links
- Full-precision model: DJLougen/Ornstein-26B-A4B-it
- GGUF quantizations: DJLougen/Ornstein-26B-A4B-it-GGUF
- NVIDIA FP4: DJLougen/Ornstein-26B-A4B-it-NVFP4
- Base model: Google Gemma 4 26B-A4B-it | unsloth/gemma-4-26B-A4B-it
## License
Apache 2.0