Ornstein-27B-v2-GGUF
GGUF quantizations of DJLougen/Ornstein-27B-v2 — a reasoning-focused fine-tune of Qwen 3.5 27B trained on high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline. Second iteration of the Ornstein series with improved segment-level reasoning modeling.
Support This Work
I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.
What Makes Ornstein Different
Unlike typical reasoning fine-tunes that rely on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: Identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. Degenerate split: DDM pipeline cleanly separates premium from degenerate reasoning traces
- High-fidelity curation: Near-perfect AUC separating premium from degenerate reasoning with >99% sensitivity
- Improved segment modeling (v2): Enhanced modeling of individual reasoning segments for finer-grained quality separation
The model uses `<think>...</think>` blocks for extended multi-phase reasoning with self-correction and verification before providing final answers.
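If you consume raw completions downstream, the reasoning block can be separated from the final answer with plain POSIX parameter expansion. A minimal sketch (the sample output string is hypothetical):

```shell
# hypothetical raw model output: reasoning inside <think>...</think>, answer after
out='<think>2+2=4, check by counting. Verified.</think>The answer is 4.'

# strip everything up to and including the first closing </think> tag
answer="${out#*</think>}"
echo "$answer"   # -> The answer is 4.
```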
Available Quantizations
High Quality (Q4+)
| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q8_0 | 8.50 | 26.6 GB | Near-lossless, good for high-end consumer GPUs |
| Q6_K | 6.57 | 20.6 GB | High quality |
| Q5_K_M | 5.69 | 17.9 GB | Good balance |
| Q4_K_M | 4.89 | 15.4 GB | Recommended — strong quality/size tradeoff |
| IQ4_NL | 4.70 | 15 GB | High-quality 4-bit with importance matrix |
| Q4_K_S | 4.63 | 15 GB | Smaller Q4 K-quant |
| Q4_0 | 4.60 | 15 GB | Legacy 4-bit |
| IQ4_XS | 4.48 | 15 GB | Smallest 4-bit with importance matrix |
Medium (Q3)
| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q3_K_L | 4.26 | 14 GB | Best Q3 quality |
| Q3_K_M | 3.95 | 13 GB | Mid-range Q3 |
| IQ3_M | 3.74 | 12 GB | IQ3 with importance matrix |
| IQ3_S | 3.69 | 12 GB | Smaller IQ3 |
| Q3_K_S | 3.59 | 12 GB | Smallest K-quant Q3 |
| IQ3_XXS | 3.32 | 11 GB | Ultra-small IQ3 |
Low (Q2 and below)
| Quantization | BPW | Size | Use Case |
|---|---|---|---|
| Q2_K | 3.18 | 10 GB | Best Q2 quality |
| IQ2_M | 2.97 | 9.4 GB | IQ2 with importance matrix |
| IQ2_S | 2.78 | 8.8 GB | Smaller IQ2 |
| IQ2_XXS | 2.51 | 7.9 GB | Ultra-small IQ2 |
| IQ1_M | 2.27 | 7.2 GB | Extreme compression |
| IQ1_S | 2.12 | 6.7 GB | Maximum compression — experimental |
IQ quants use an importance matrix for optimized weight distribution. Lower BPW quants (IQ2 and below) trade quality for significantly reduced memory — best suited for experimentation or constrained hardware.
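As a rule of thumb, file size follows directly from parameter count and BPW: size ≈ params × BPW / 8 bytes (the table sizes are in GiB, and small amounts of metadata are ignored). A quick sanity check against the Q4_K_M row:

```shell
# estimated GGUF size: parameters * bits-per-weight / 8 bytes, reported in GiB
awk 'BEGIN {
  params = 27e9        # Ornstein-27B parameter count
  bpw    = 4.89        # Q4_K_M bits per weight (from the table above)
  printf "%.1f GiB\n", params * bpw / 8 / (1024 ^ 3)
}'
# -> 15.4 GiB, matching the Q4_K_M entry
```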
Quick Start
llama.cpp
```shell
# Download a quantization (example: Q4_K_M)
huggingface-cli download DJLougen/Ornstein-27B-v2-GGUF Ornstein-27B-v2-Q4_K_M.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-27B-v2-Q4_K_M.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```
Ollama
```shell
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-27B-v2-Q4_K_M.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein-v2 -f Modelfile
ollama run ornstein-v2
```
LM Studio
- Download the desired quantization from the Files tab
- Load it in LM Studio
- Set context length to 8192 for full reasoning depth
Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max Tokens | 8192 |
| Repeat Penalty | 1.1 |
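These settings map directly onto llama.cpp's OpenAI-compatible server. A sketch, assuming a llama-server instance on port 8080 (`repeat_penalty` is a llama.cpp extension to the OpenAI request schema):

```shell
# request body applying the suggested sampling settings
payload='{
  "messages": [{"role": "user", "content": "Is 391 prime? Think step by step."}],
  "temperature": 0.6,
  "top_p": 0.95,
  "max_tokens": 8192,
  "repeat_penalty": 1.1
}'

# assumes a server started with: ./llama-server -m Ornstein-27B-v2-Q4_K_M.gguf -c 8192 --port 8080
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d "$payload"
```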
Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Qwen3.5-27B |
| Parameters | 27B |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.05 |
| Epochs | 1 |
| Learning Rate | 1e-4 (cosine schedule, 10% warmup) |
| Max Sequence Length | 8192 |
| Micro Batch Size | 1 |
| Gradient Accumulation | 4 steps |
| Weight Decay | 0.01 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Framework | Unsloth |
Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
Limitations
- Single epoch training means the model retains most base Qwen 3.5 27B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: the DDM curation pipeline is optimized for English; performance in other languages largely reflects the base model
- Edge cases: Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
Citation
```bibtex
@misc{ornstein27bv2,
  author    = {DJLougen},
  title     = {Ornstein-27B-v2: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 27B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DJLougen/Ornstein-27B-v2}
}
```
Links
- Full-precision model: DJLougen/Ornstein-27B-v2
- Original Ornstein-27B: DJLougen/Ornstein-27B | GGUF
- Base model: Qwen/Qwen3.5-27B | unsloth/Qwen3.5-27B
