Ornstein-122-A10B-GGUF

GGUF quantizations of DJLougen/Ornstein-122-A10B — a reasoning-focused fine-tune of Qwen 3.5 122B-A10B (MoE, ~10B active per token) trained on high-quality reasoning traces curated through a custom Drift Diffusion Modeling (DDM) pipeline.


Support This Work

I'm a PhD student in visual neuroscience at the University of Toronto who also happens to spend way too much time fine-tuning, merging, and quantizing open-weight models on rented H100s and a local DGX Spark. All training compute is self-funded — balancing GPU costs against a student budget. If my uploads have been useful to you, consider buying a PhD student a coffee. It goes a long way toward keeping these experiments running.

Support on Ko-fi



What Makes Ornstein Different

Unlike typical reasoning fine-tunes that train on large volumes of synthetic data, Ornstein takes a quality-over-quantity approach:

  • Detects degenerate reasoning: identifies "fake" reasoning that mimics thought without substance (hedging, restating, circling)
  • Premium vs. degenerate split: the DDM pipeline cleanly separates premium from degenerate reasoning traces
  • High-fidelity curation: near-perfect AUC in separating premium from degenerate traces, with >99% sensitivity
  • MoE efficiency: 122B total parameters with only ~10B active per token — big model reasoning at a fraction of the compute

The model uses <think>...</think> blocks for extended multi-phase reasoning with self-correction and verification before providing final answers.
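If the output is consumed programmatically, the reasoning block can be stripped before presenting the final answer. A minimal sketch using bash parameter expansion; the sample response string is illustrative, not actual model output:

```shell
# Strip the <think>...</think> reasoning block from a response,
# keeping only the final answer. The sample string is illustrative.
response='<think>Compute 2+2. Verify: 4.</think>The answer is 4.'
answer="${response#*</think>}"   # drop everything through the closing tag
echo "$answer"
```

This relies on `${var#pattern}` removing the shortest matching prefix, so only the first `</think>` and everything before it is dropped.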


Available Quantizations

Note: Uploads are in progress — more quantizations may be added.

Quantization  Size             Use Case
F16           ~261 GB (split)  Full precision, no quality loss
Q4_K_M        ~74 GB (split)   Best quality/size trade-off, recommended
Q8_0          ~83 GB (split)   Higher precision, minimal quality loss
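To avoid downloading every quantization, huggingface-cli supports glob filters on the file list. A sketch fetching only the Q4_K_M files, assuming their names follow the same pattern as the F16 shards shown below:

```shell
# Download only the Q4_K_M shards (~74 GB) instead of the full repo.
# The "*Q4_K_M*" glob assumes the filenames follow the naming used in this card.
huggingface-cli download DJLougen/Ornstein-122-A10B-gguf \
  --include "*Q4_K_M*" \
  --local-dir .
```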

Quick Start

llama.cpp

# Download a quantization (example: F16 split files)
huggingface-cli download DJLougen/Ornstein-122-A10B-gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Ornstein-122-A10B-F16-00001-of-00006.gguf \
  -p "You are a helpful reasoning assistant." \
  --temp 0.6 -n 8192
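llama.cpp loads all shards automatically when pointed at the first split file, but if a single file is preferred the shards can be combined with the llama-gguf-split tool (assuming a recent llama.cpp build that ships it):

```shell
# Merge the split F16 shards into a single GGUF (needs ~261 GB free disk).
# llama-gguf-split ships with recent llama.cpp builds.
./llama-gguf-split --merge \
  Ornstein-122-A10B-F16-00001-of-00006.gguf \
  Ornstein-122-A10B-F16.gguf
```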

Ollama

# Create a Modelfile
cat <<EOF > Modelfile
FROM ./Ornstein-122-A10B-F16-00001-of-00006.gguf
PARAMETER temperature 0.6
PARAMETER num_predict 8192
SYSTEM "You are a helpful reasoning assistant."
EOF

ollama create ornstein-122 -f Modelfile
ollama run ornstein-122

LM Studio

  1. Download the desired quantization from the Files tab
  2. Load it in LM Studio
  3. Set context length to 8192 for full reasoning depth

Recommended Settings

Parameter       Suggested Value
Temperature     0.6
Top-P           0.95
Max Tokens      8192
Repeat Penalty  1.1
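These settings map directly onto llama.cpp server flags. A sketch serving the model with the recommended values; the Q4_K_M shard filename is assumed for illustration:

```shell
# Serve with the recommended sampling settings
# (OpenAI-compatible API on port 8080). Filename is assumed.
./llama-server -m Ornstein-122-A10B-Q4_K_M-00001-of-00002.gguf \
  --temp 0.6 \
  --top-p 0.95 \
  --repeat-penalty 1.1 \
  -n 8192 \
  -c 8192 \
  --port 8080
```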

Training Details

Parameter              Value
Base Model             unsloth/Qwen3.5-122B-A10B
Architecture           Mixture-of-Experts (122B total, ~10B active)
Method                 LoRA (rank 32, alpha 32)
Dropout                0.0
Epochs                 1
Learning Rate          1e-4 (cosine schedule, 10% warmup)
Max Sequence Length    8192
Micro Batch Size       1
Gradient Accumulation  4 steps
Weight Decay           0.01
LoRA Targets           q_proj, k_proj, v_proj, o_proj
Framework              Unsloth

Intended Use

Designed for tasks requiring structured, multi-step reasoning:

  • Mathematics
  • Logic problems
  • Code analysis
  • Scientific problems
  • Complex question answering

The MoE architecture makes it practical to run 122B-class reasoning on hardware that couldn't handle a dense model of the same size.


Limitations

  • Single-epoch training means the model retains most base Qwen 3.5 122B-A10B behavior; the fine-tune primarily shapes reasoning style rather than injecting new knowledge
  • Language scope: the DDM pipeline is optimized for English; other languages reflect base model performance
  • Edge cases: Extended thinking can occasionally loop on adversarial or highly ambiguous prompts
  • Size: Even quantized, the 122B MoE model requires substantial storage and memory

Citation

@misc{ornstein122a10b,
  author = {DJLougen},
  title = {Ornstein-122-A10B: DDM-Curated Reasoning Fine-Tune of Qwen 3.5 122B-A10B},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/DJLougen/Ornstein-122-A10B}
}
