Ornstein-27B-v2-GGUF
GGUF quantizations of DJLougen/Ornstein-27B-v2 — a reasoning-focused fine-tune of Qwen 3.5 27B trained on high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline. Second iteration of the Ornstein series with improved segment-level reasoning modeling.
Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
What Makes Ornstein Different
Unlike typical reasoning fine-tunes that rely on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. Degenerate split: DDM pipeline cleanly separates premium from degenerate reasoning traces
- High-fidelity curation: Near-perfect AUC separating premium from degenerate reasoning with >99% sensitivity
- Improved segment modeling (v2): Enhanced modeling of individual reasoning segments for finer-grained quality separation
The model uses `<think>...</think>` blocks for extended multi-phase reasoning with self-correction and verification before providing final answers.
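If you consume raw completions downstream, the reasoning block can be separated from the final answer with plain POSIX parameter expansion. A minimal sketch (the sample output string is hypothetical):

```shell
# hypothetical raw model output: reasoning inside <think>...</think>, answer after
out='<think>2+2=4, check by counting. Verified.</think>The answer is 4.'

# strip everything up to and including the first closing </think> tag
answer="${out#*</think>}"
echo "$answer"   # -> The answer is 4.
```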
Available Quantizations
High Quality (Q4+)
| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q8_0 | 8.50 | 26.6 GB | Near-lossless, good for high-end consumer GPUs |
| Q6_K | 6.57 | 20.6 GB | High quality |
| Q5_K_M | 5.69 | 17.9 GB | Good balance |
| Q4_K_M | 4.89 | 15.4 GB | Recommended — strong quality/size tradeoff |
| IQ4_NL | 4.70 | 15 GB | High-quality 4-bit with importance matrix |
| Q4_K_S | 4.63 | 15 GB | Smaller Q4 K-quant |
| Q4_0 | 4.60 | 15 GB | Legacy 4-bit |
| IQ4_XS | 4.48 | 15 GB | Smallest 4-bit with importance matrix |
Medium (Q3)
| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q3_K_L | 4.26 | 14 GB | Best Q3 quality |
| Q3_K_M | 3.95 | 13 GB | Mid-range Q3 |
| IQ3_M | 3.74 | 12 GB | IQ3 with importance matrix |
| IQ3_S | 3.69 | 12 GB | Smaller IQ3 |
| Q3_K_S | 3.59 | 12 GB | Smallest K-quant Q3 |
| IQ3_XXS | 3.32 | 11 GB | Ultra-small IQ3 |
Low (Q2 and below)
| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q2_K | 3.18 | 10 GB | Best Q2 quality |
| IQ2_M | 2.97 | 9.4 GB | IQ2 with importance matrix |
| IQ2_S | 2.78 | 8.8 GB | Smaller IQ2 |
| IQ2_XXS | 2.51 | 7.9 GB | Ultra-small IQ2 |
| IQ1_M | 2.27 | 7.2 GB | Extreme compression |
| IQ1_S | 2.12 | 6.7 GB | Maximum compression — experimental |
IQ quants use an importance matrix for optimized weight distribution. Lower BPW quants (IQ2 and below) trade quality for significantly reduced memory — best suited for experimentation or constrained hardware.
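As a rule of thumb, file size follows directly from parameter count and BPW: size ≈ params × BPW / 8 bytes (the table sizes are in GiB, and small amounts of metadata are ignored). A quick sanity check against the Q4_K_M row:

```shell
# estimated GGUF size: parameters * bits-per-weight / 8 bytes, reported in GiB
awk 'BEGIN {
  params = 27e9        # Ornstein-27B parameter count
  bpw    = 4.89        # Q4_K_M bits per weight (from the table above)
  printf "%.1f GiB\n", params * bpw / 8 / (1024 ^ 3)
}'
# -> 15.4 GiB, matching the Q4_K_M entry
```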
Quick Start
llama.cpp
```shell
# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-27B-v2-GGUF Ornstein-27B-v2-Q4_K_M.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-27B-v2-Q4_K_M.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```
Ollama
```shell
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-27B-v2-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein-v2 -f Modelfile
ollama run ornstein-v2
```
LM Studio
- Download the desired quantization from the Files tab
- Load it in LM Studio
- Set context length to 8192 for full reasoning depth
Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max Tokens | 8192 |
| Repeat Penalty | 1.1 |
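These settings map directly onto llama.cpp's OpenAI-compatible server. A sketch, assuming a llama-server instance on port 8080 (`repeat_penalty` is a llama.cpp extension to the OpenAI request schema):

```shell
# request body applying the suggested sampling settings
payload='{
  "messages": [{"role": "user", "content": "Is 391 prime? Think step by step."}],
  "temperature": 0.6,
  "top_p": 0.95,
  "max_tokens": 8192,
  "repeat_penalty": 1.1
}'

# assumes a server started with: ./llama-server -m Ornstein-27B-v2-Q4_K_M.gguf -c 8192 --port 8080
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload"
```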
Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Qwen3.5-27B |
| Parameters | 27B |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Learning Rate | 1e-4 (cosine schedule, 10% warmup) |
| Max Sequence Length | 8192 |
| Micro Batch Size | 1 |
| Gradient Accumulation | 4 steps |
| Weight Decay | 0.01 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Framework | Unsloth |
Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
Limitations
- Single epoch training means the model retains most base Qwen 3.5 27B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: the DDM curation pipeline is optimized for English; performance in other languages largely reflects the base model
- Edge cases: Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
Citation
```bibtex
@misc{ornstein27bv2,
  author    = {DJLougen},
  title     = {Ornstein-27B-v2: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 27B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DJLougen/Ornstein-27B-v2}
}
```
Links
- Full-precision model: DJLougen/Ornstein-27B-v2
- Original Ornstein-27B: DJLougen/Ornstein-27B | GGUF
- Base model: Qwen/Qwen3.5-27B | unsloth/Qwen3.5-27B
