# Ornstein-27B-v2-GGUF

GGUF quantizations of [DJLougen/Ornstein-27B-v2](https://huggingface.co/DJLougen/Ornstein-27B-v2), a reasoning-focused fine-tune of Qwen 3.5 27B trained on high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline. This is the second iteration of the Ornstein series, with improved segment-level reasoning modeling.


## Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded, which means balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee; it goes a long way toward keeping these experiments running.

Support on Ko-fi



## What Makes Ornstein Different

Unlike typical reasoning fine-tunes that rely on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:

- **Detects degenerate reasoning:** identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- **Premium vs. degenerate split:** the DDM pipeline cleanly separates premium from degenerate reasoning traces
- **High-fidelity curation:** near-perfect AUC separating premium from degenerate traces, with >99% sensitivity
- **Improved segment modeling (v2):** models individual reasoning segments for finer-grained quality separation

The model uses `<think>...</think>` blocks for extended multi-phase reasoning with self-correction and verification before providing final answers.
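When consuming raw output programmatically, you generally want to strip the reasoning block before showing the final answer. A minimal sketch, assuming at most one well-formed `<think>...</think>` block per response (the helper name `split_reasoning` is illustrative, not part of any library):

```python
import re

def split_reasoning(text):
    """Split model output into (reasoning, answer).

    Assumes at most one well-formed <think>...</think> block.
    Returns (None, text) if no block is found.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return None, text.strip()
    return match.group(1).strip(), text[match.end():].strip()

raw = "<think>Check: 12 * 12 = 144. Verify by repeated addition.</think>144"
reasoning, answer = split_reasoning(raw)
print(answer)  # → 144
```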


## Available Quantizations

### High Quality (Q4+)

| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q8_0 | 8.50 | 26.6 GB | Near-lossless, good for high-end consumer GPUs |
| Q6_K | 6.57 | 20.6 GB | High quality |
| Q5_K_M | 5.69 | 17.9 GB | Good balance |
| Q4_K_M | 4.89 | 15.4 GB | **Recommended**: strong quality/size tradeoff |
| IQ4_NL | 4.70 | 15 GB | High-quality 4-bit with importance matrix |
| Q4_K_S | 4.63 | 15 GB | Smaller Q4 K-quant |
| Q4_0 | 4.60 | 15 GB | Legacy 4-bit |
| IQ4_XS | 4.48 | 15 GB | Smallest 4-bit with importance matrix |

### Medium (Q3)

| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q3_K_L | 4.26 | 14 GB | Best Q3 quality |
| Q3_K_M | 3.95 | 13 GB | Mid-range Q3 |
| IQ3_M | 3.74 | 12 GB | IQ3 with importance matrix |
| IQ3_S | 3.69 | 12 GB | Smaller IQ3 |
| Q3_K_S | 3.59 | 12 GB | Smallest K-quant Q3 |
| IQ3_XXS | 3.32 | 11 GB | Ultra-small IQ3 |

### Low (Q2 and below)

| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q2_K | 3.18 | 10 GB | Best Q2 quality |
| IQ2_M | 2.97 | 9.4 GB | IQ2 with importance matrix |
| IQ2_S | 2.78 | 8.8 GB | Smaller IQ2 |
| IQ2_XXS | 2.51 | 7.9 GB | Ultra-small IQ2 |
| IQ1_M | 2.27 | 7.2 GB | Extreme compression |
| IQ1_S | 2.12 | 6.7 GB | Maximum compression, experimental |

IQ quants use an importance matrix for optimized weight distribution. Lower BPW quants (IQ2 and below) trade quality for significantly reduced memory — best suited for experimentation or constrained hardware.


## Quick Start

### llama.cpp

```bash
# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-27B-v2-GGUF Ornstein-27B-v2-Q4_K_M.gguf --local-dir .

# Run with llama.cpp in conversation mode (-cnv); -p sets the system prompt
./llama-cli -m Ornstein-27B-v2-Q4_K_M.gguf -cnv \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```

### Ollama

```bash
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-27B-v2-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein-v2 -f Modelfile
ollama run ornstein-v2
```

### LM Studio

1. Download the desired quantization from the Files tab
2. Load it in LM Studio
3. Set context length to 8192 for full reasoning depth

## Recommended Settings

| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max Tokens | 8192 |
| Repeat Penalty | 1.1 |
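These settings map directly onto an OpenAI-compatible chat request, e.g. against llama.cpp's `llama-server`. A minimal sketch: the endpoint URL and model name are placeholders for your own setup, and `repeat_penalty` is a llama.cpp server extension, not part of the standard OpenAI schema:

```python
# Recommended sampling settings as an OpenAI-compatible chat payload.
payload = {
    "model": "ornstein-27b-v2",  # placeholder name
    "messages": [
        {"role": "system", "content": "You are a helpful reasoning assistant."},
        {"role": "user", "content": "Is 2^31 - 1 prime? Show your reasoning."},
    ],
    "temperature": 0.6,
    "top_p": 0.95,
    "max_tokens": 8192,
    "repeat_penalty": 1.1,  # llama.cpp extension (not standard OpenAI)
}

# Send with, e.g.:
# requests.post("http://localhost:8080/v1/chat/completions", json=payload)
```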

## Training Details

| Parameter | Value |
|---|---|
| Base Model | `unsloth/Qwen3.5-27B` |
| Parameters | 27B |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Learning Rate | 1e-4 (cosine schedule, 10% warmup) |
| Max Sequence Length | 8192 |
| Micro Batch Size | 1 |
| Gradient Accumulation | 4 steps |
| Weight Decay | 0.01 |
| LoRA Targets | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Framework | Unsloth |
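For readers reproducing a comparable adapter, the table above corresponds to a LoRA configuration along these lines. This is an illustrative reconstruction using Hugging Face `peft`'s `LoraConfig`, not the exact training script (the actual run used Unsloth, whose wrapper takes roughly the same fields):

```python
from peft import LoraConfig

# Illustrative adapter config mirroring the hyperparameter table above.
lora_config = LoraConfig(
    r=32,              # LoRA rank
    lora_alpha=32,     # scaling alpha
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```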

## Intended Use

Designed for tasks requiring structured, multi-step reasoning:

- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering

## Limitations

- **Single-epoch training:** the model retains most base Qwen 3.5 27B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- **Language scope:** the DDM pipeline is optimized for English; other languages reflect base model performance
- **Edge cases:** extended thinking can occasionally loop on adversarial or highly ambiguous prompts

## Citation

```bibtex
@misc{ornstein27bv2,
  author = {DJLougen},
  title = {Ornstein-27B-v2: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 27B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-27B-v2}
}
```
