Qwen3.5-40B-RoughHouse-Claude-4.6-Opus — oQ2e (3.1 bpw)

oQ2e mixed-precision quantization of DavidAU/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-Polar-Deckard-Uncensored-Heretic-Thinking

Quantized using oMLX oQ — data-driven sensitivity-aware mixed-precision quantization for Apple Silicon. Standard mlx-lm compatible safetensors. Works with oMLX, LM Studio, mlx-lm, and any MLX-compatible inference server.

The recommended quant is oQ4e (~22GB, near-lossless). This oQ2e is provided as a research artifact showing the quality floor at extreme compression.


⚠️ Content Warning

This is a fully uncensored model. The original was abliterated via Heretic and fine-tuned without safety alignment. It may produce graphic, offensive, or inappropriate content. Use responsibly.


About the Original Model

Created by DavidAU. This is the "RoughHouse" variant: the raw release produced by expanding two 27B Qwen 3.5 fine-tunes to 40B parameters, without a final post-expansion training step.

  • Architecture: 40B dense (not MoE), 96 layers, 1275 tensors
  • Base: Qwen3.5-27B expanded to 40B (50% more layers than base)
  • Training: Claude/Polaris (5 datasets) + Deckard/PDK (5 datasets) + Heretic uncensoring
  • RoughHouse: Raw release without final training step after expansion

All Available Quants

| Quant | BPW | Size | MMLU | TruthfulQA | ARC-C | HellaSwag | Status |
|-------|-----|------|------|------------|-------|-----------|--------|
| bf16 (reference) | 16.0 | 73.6 GB | 86.2% | 85.3% | 94.3% | 90.5% | Source |
| oQ4e | ~4.8 | ~22 GB | 85.4% | 85.7% | 96.0% | 91.5% | ✅ Recommended |
| oQ2e (this) | ~3.1 | ~14.3 GB | 43.2% | 40.0% | 40.3% | 20.6% | ✅ Available |
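The listed sizes follow directly from parameter count times effective bits per weight. A quick sanity check, assuming a flat 40B parameters (the small gaps versus the table come from the true parameter count being slightly under 40B and from GB/GiB rounding):

```python
def quantized_size_gib(n_params: float, bpw: float) -> float:
    """Approximate on-disk size: parameters * bits-per-weight, in GiB."""
    return n_params * bpw / 8 / 2**30

# ~40B parameters at each effective BPW from the table above
print(round(quantized_size_gib(40e9, 16.0), 1))  # 74.5  (bf16 reference)
print(round(quantized_size_gib(40e9, 4.8), 1))   # 22.4  (oQ4e)
print(round(quantized_size_gib(40e9, 3.1), 1))   # 14.4  (oQ2e)
```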

Benchmark Results — oQ2e (3.1 bpw, ~14.3 GB)

| Benchmark | Samples | bf16 | oQ2e | Delta (pp) |
|-----------|---------|------|------|------------|
| MMLU | 1000 | 86.2% | 43.2% | -43.0 |
| HellaSwag | 1000 | 90.5% | 20.6% | -69.9 |
| TruthfulQA | 200 | 85.3% | 40.0% | -45.3 |
| ARC-Challenge | 300 | 94.3% | 40.3% | -54.0 |
| Winogrande | 200 | 83.0% | 44.0% | -39.0 |
| GSM8K | 200 | 97.0% | 30.5% | -66.5 |
| HumanEval | Full (164) | 82.3% | 29.3% | -53.0 |
| MBPP | 100 | 75.0% | 2.0% | -73.0 |
| LiveCodeBench | 100 | 28.0% | 5.0% | -23.0 |
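The Delta column is simply the oQ2e score minus the bf16 score, in percentage points; recomputing it from the rows above:

```python
# (bf16 %, oQ2e %) pairs copied from the benchmark table
scores = {
    "MMLU": (86.2, 43.2),
    "HellaSwag": (90.5, 20.6),
    "TruthfulQA": (85.3, 40.0),
    "ARC-Challenge": (94.3, 40.3),
    "Winogrande": (83.0, 44.0),
    "GSM8K": (97.0, 30.5),
    "HumanEval": (82.3, 29.3),
    "MBPP": (75.0, 2.0),
    "LiveCodeBench": (28.0, 5.0),
}
deltas = {name: round(oq2e - bf16, 1) for name, (bf16, oq2e) in scores.items()}
print(deltas["MMLU"])        # -43.0
print(deltas["Winogrande"])  # -39.0 (smallest drop)
print(deltas["MBPP"])        # -73.0 (largest drop)
```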

Key Observations

  • 2-bit is the quality floor for this model — knowledge partially retained (MMLU 43.2%) but reasoning and coding collapse
  • MBPP at 2.0% — Python code generation essentially non-functional at this compression
  • Winogrande held up best (-39.0), suggesting coreference resolution is more robust to compression
  • Still outperforms uniform 4-bit — MLX-community uniform 4-bit of Jackrong's 27B distill scored 23.8% MMLU, while this oQ 2-bit scores 43.2%
  • The cliff is between oQ4e and oQ2e — oQ4e retains 99% quality, oQ2e loses ~50%

Quantization Settings

| Parameter | Value |
|-----------|-------|
| Method | oQ (oMLX Universal Dynamic Quantization) |
| Level | oQ2e (Enhanced) |
| Enhanced (+) | Yes (GPTQ error compensation) |
| Effective BPW | ~3.1 |
| Calibration Dataset | Code + Multilingual + Tool Calling |
| Calibration Samples | 128 |
| Sequence Length | 512 |
| Hardware | Apple M3 Ultra, 512 GB unified memory |
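The trade-off behind the BPW levels comes down to grid resolution: fewer bits per weight means a coarser quantization grid and larger reconstruction error. The sketch below illustrates this with plain group-wise affine quantization; it is an illustrative toy, not oQ's actual algorithm, and all helper names are made up:

```python
import random

def quantize_group(weights, bits):
    """Affine round-trip over one group: map to [0, 2^bits - 1] and back."""
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # guard against an all-equal group
    return [round((w - lo) / scale) * scale + lo for w in weights]

def mean_abs_error(weights, bits, group_size=32):
    """Mean |w - dequantize(quantize(w))| over group-wise round-trips."""
    err = 0.0
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        err += sum(abs(w - d) for w, d in zip(group, quantize_group(group, bits)))
    return err / len(weights)

random.seed(0)
w = [random.gauss(0, 1) for _ in range(4096)]
# Fewer bits -> coarser grid -> larger error; mixed-precision schemes like oQ
# spend the higher bit widths on the layers most sensitive to this error.
assert mean_abs_error(w, 2) > mean_abs_error(w, 4) > mean_abs_error(w, 8)
```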

How to Use

oMLX

Drop the model folder into your oMLX models directory. Auto-detected on server start.

mlx-lm

```python
from mlx_lm import load, generate

model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ2BENCHMARKED")

messages = [{"role": "user", "content": "Your prompt here"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
```

Recommended Settings

Per DavidAU's guidance for the RoughHouse variant:

  • Temperature: 0.5 - 1.0
  • Min context window: 8k - 16k
  • Rep penalty: 1.05 - 1.1 (if looping occurs)
  • System prompt: Even a single sentence helps stabilize output
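A repetition penalty works by down-weighting the logits of tokens that have already been generated before sampling. A minimal sketch of the standard CTRL-style formulation (not mlx-lm's exact implementation):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """Divide positive logits (multiply negative ones) for already-seen tokens."""
    out = list(logits)
    for tok in set(generated_ids):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

logits = [2.0, -1.0, 0.5, 3.0]
penalized = apply_repetition_penalty(logits, generated_ids=[0, 1], penalty=1.1)
# Token 0: 2.0 / 1.1 ≈ 1.818; token 1: -1.0 * 1.1 = -1.1; tokens 2 and 3 unchanged
```

A mild penalty (1.05–1.1, as recommended above) nudges the model away from loops without distorting the distribution much; large values degrade fluency.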

Credits

  • Original Model: DavidAU — fine-tuning, expansion to 40B, Heretic uncensoring
  • Base Architecture: Qwen3.5-27B by Alibaba/Qwen Team
  • Quantization: oQ by jundot/oMLX
  • Benchmarks & Quantization: Hunterx, using the oMLX v0.2.20 Intelligence Benchmark suite on an M3 Ultra (512 GB)

License

Apache 2.0 (inherited from original model)
