# Ornstein-122-A10B-GGUF
GGUF quantizations of DJLougen/Ornstein-122-A10B, a reasoning-focused fine-tune of Qwen 3.5 122B-A10B (MoE, ~10B active parameters per token), trained on high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.
## Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend far too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded, which means balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee; it goes a long way toward keeping these experiments running.
## What Makes Ornstein Different

Unlike typical reasoning fine-tunes, which rely on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:
- Detects degenerate reasoning: identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
- Premium vs. degenerate split: the DDM pipeline cleanly separates premium reasoning traces from degenerate ones
- High-fidelity curation: near-perfect AUC separating the two classes, with >99% sensitivity
- MoE efficiency: 122B total parameters with only ~10B active per token, giving big-model reasoning at a fraction of the compute
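The compute saving from sparse activation can be put in rough numbers. A minimal back-of-envelope sketch, assuming the common ~2 FLOPs-per-active-parameter estimate for a forward pass (an approximation, not a benchmark of this model):

```python
# Per-token forward FLOPs scale with *active* parameters (~2 FLOPs/param),
# so the MoE runs at roughly the per-token cost of a dense ~10B model.
dense_flops = 2 * 122e9   # hypothetical dense 122B model, per token
moe_flops = 2 * 10e9      # ~10B active parameters per token
speedup = dense_flops / moe_flops  # ratio of per-token compute
```

Memory is a different story: all 122B parameters still have to be resident, which is why the quantization sizes below are what they are.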
The model uses `<think>...</think>` blocks for extended multi-phase reasoning, with self-correction and verification before the final answer.
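Downstream code usually needs to strip the reasoning block before presenting the answer. A minimal sketch, assuming at most one `<think>` block per response; the `split_reasoning` helper is illustrative, not part of any library:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate the <think>...</think> block from the final answer.

    Assumes at most one think block, emitted before the answer.
    Returns (reasoning, answer); reasoning is "" if no block is present.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning("<think>2+2 is 4.</think>The answer is 4.")
```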
## Available Quantizations
Note: Uploads are in progress — more quantizations may be added.
| Quantization | Size | Use Case |
|---|---|---|
| F16 | ~261 GB (split) | Full precision, no quality loss |
| Q4_K_M | ~74 GB (split) | Best quality/size trade-off, recommended |
| Q8_0 | ~83 GB (split) | Higher precision, minimal quality loss |
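When a quantization doesn't fully fit in VRAM, llama.cpp can offload a subset of layers to the GPU with `-ngl`. A rough, hypothetical helper for picking that value; the layer count below is a placeholder for illustration, not a published figure for this model, and real usage also needs headroom for the KV cache:

```python
# Rough heuristic for llama.cpp's -ngl (number of GPU layers).
# Assumes weights are spread roughly evenly across layers.
def gpu_layers(model_gb: float, n_layers: int, vram_gb: float,
               reserve_gb: float = 2.0) -> int:
    """Estimate how many layers fit in VRAM, reserving some for KV cache."""
    per_layer_gb = model_gb / n_layers
    usable = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable / per_layer_gb))

# e.g. the ~74 GB Q4_K_M on a 24 GB card, assuming a 94-layer model
gpu_layers(74.0, 94, 24.0)
```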
## Quick Start

### llama.cpp

```bash
# Download a quantization (example: F16 split files)
huggingface-cli download DJLougen/Ornstein-122-A10B-gguf --local-dir .

# Run with llama.cpp (point at the first shard; llama.cpp loads the rest)
./llama-cli -m Ornstein-122-A10B-F16-00001-of-00006.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
```
### Ollama

```bash
# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-122-A10B-F16-00001-of-00006.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein-122 -f Modelfile
ollama run ornstein-122
```
### LM Studio
- Download the desired quantization from the Files tab
- Load it in LM Studio
- Set context length to 8192 for full reasoning depth
## Recommended Settings
| Parameter | Suggested Value |
|---|---|
| Temperature | 0.6 |
| Top-P | 0.95 |
| Max Tokens | 8192 |
| Repeat Penalty | 1.1 |
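If you serve a quantization behind llama.cpp's `llama-server`, these settings map directly onto its OpenAI-compatible chat endpoint. A minimal sketch of the request body (the `build_request` helper is illustrative; `repeat_penalty` is a llama.cpp extension rather than part of the OpenAI schema):

```python
import json

def build_request(prompt: str) -> dict:
    """Bundle the recommended sampling settings into a chat request body."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "top_p": 0.95,
        "max_tokens": 8192,
        "repeat_penalty": 1.1,  # llama.cpp-specific sampling extension
    }

# POST this JSON to the server's /v1/chat/completions endpoint
body = json.dumps(build_request("Prove that sqrt(2) is irrational."))
```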
## Training Details
| Parameter | Value |
|---|---|
| Base Model | unsloth/Qwen3.5-122B-A10B |
| Architecture | Mixture-of-Experts (122B total, ~10B active) |
| Method | LoRA (rank 32, alpha 32) |
| Dropout | 0.0 |
| Epochs | 1 |
| Learning Rate | 1e-4 (cosine schedule, 10% warmup) |
| Max Sequence Length | 8192 |
| Micro Batch Size | 1 |
| Gradient Accumulation | 4 steps |
| Weight Decay | 0.01 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj |
| Framework | Unsloth |
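The batch and warmup numbers in the table combine as follows; the dataset size below is a placeholder for illustration, since the actual trace count isn't published:

```python
# Back-of-envelope numbers for the schedule in the table above.
micro_batch = 1
grad_accum = 4
effective_batch = micro_batch * grad_accum   # 4 sequences per optimizer step

dataset_examples = 10_000                    # placeholder, not a published figure
steps = dataset_examples // effective_batch  # optimizer steps in the single epoch
warmup_steps = int(0.10 * steps)             # 10% linear warmup, then cosine decay
```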
## Intended Use
Designed for tasks requiring structured, multi-step reasoning:
- Mathematics
- Logic problems
- Code analysis
- Scientific problems
- Complex question answering
The MoE architecture makes it practical to run 122B-class reasoning on hardware that couldn't handle a dense model of the same size.
## Limitations

- Single-epoch training means the model retains most base Qwen 3.5 122B-A10B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
- Language scope: DDM pipeline optimized for English; other languages reflect base model performance
- Edge cases: Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
- Size: Even quantized, the 122B MoE model requires substantial storage and memory
## Citation

```bibtex
@misc{ornstein122a10b,
  author    = {DJLougen},
  title     = {Ornstein-122-A10B: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 122B-A10B},
  year      = {2026},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/DJLougen/Ornstein-122-A10B}
}
```
## Links
- Full-precision model: DJLougen/Ornstein-122-A10B
- NVFP4 quantization: DJLougen/Ornstein-122-A10B-nvfp4
- Ornstein-27B series: DJLougen/Ornstein-27B | v2
- Base model: Qwen/Qwen3.5-122B-A10B | unsloth/Qwen3.5-122B-A10B