# Qwen3.5-40B-RoughHouse-Claude-4.6-Opus — oQ2e (3.1 bpw)

oQ2e mixed-precision quantization of DavidAU/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-Polar-Deckard-Uncensored-Heretic-Thinking.
Quantized using oMLX oQ — data-driven sensitivity-aware mixed-precision quantization for Apple Silicon. Standard mlx-lm compatible safetensors. Works with oMLX, LM Studio, mlx-lm, and any MLX-compatible inference server.
⭐ The recommended quant is oQ4e (~22GB, near-lossless). This oQ2e is provided as a research artifact showing the quality floor at extreme compression.
## ⚠️ Content Warning
This is a fully uncensored model. The original was abliterated via Heretic and fine-tuned without safety alignment. It may produce graphic, offensive, or inappropriate content. Use responsibly.
## About the Original Model
Created by DavidAU. This is the "RoughHouse" variant: the raw, untrained release produced by expanding two 27B Qwen 3.5 fine-tunes to 40B parameters.
- Architecture: 40B dense (not MoE), 96 layers, 1275 tensors
- Base: Qwen3.5-27B expanded to 40B (50% more layers than base)
- Training: Claude/Polaris (5 datasets) + Deckard/PDK (5 datasets) + Heretic uncensoring
- RoughHouse: Raw release without final training step after expansion
## All Available Quants
| Quant | BPW | Size | MMLU | TruthfulQA | ARC-C | HellaSwag | Status |
|---|---|---|---|---|---|---|---|
| bf16 (reference) | 16.0 | 73.6 GB | 86.2% | 85.3% | 94.3% | 90.5% | Source |
| oQ4e ⭐ | ~4.8 | ~22 GB | 85.4% | 85.7% | 96.0% | 91.5% | ✅ Recommended |
| oQ2e (this) | ~3.1 | ~14.3 GB | 43.2% | 40.0% | 40.3% | 20.6% | ✅ Available |
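As a sanity check on the sizes above, file size scales roughly as parameters × bits-per-weight. A back-of-envelope sketch (assuming a flat ~40B quantized parameters, which is a simplification since embeddings and norm tensors are handled differently):

```python
# Rough model size: params * bits-per-weight / 8 bytes, reported in GiB.
# The ~40B parameter count is an approximation, not an exact tensor sum.
def approx_size_gib(params: float, bpw: float) -> float:
    return params * bpw / 8 / 2**30

print(round(approx_size_gib(40e9, 3.1), 1))  # ~14.4 GiB (oQ2e, table says ~14.3 GB)
print(round(approx_size_gib(40e9, 4.8), 1))  # ~22.4 GiB (oQ4e, table says ~22 GB)
```

Both estimates land within a few hundred MB of the table values, which is expected given the simplified parameter count.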
## Benchmark Results — oQ2e (3.1 bpw, ~14.3 GB)
| Benchmark | Samples | bf16 | oQ2e | Delta |
|---|---|---|---|---|
| MMLU | 1000 | 86.2% | 43.2% | -43.0 |
| HellaSwag | 1000 | 90.5% | 20.6% | -69.9 |
| TruthfulQA | 200 | 85.3% | 40.0% | -45.3 |
| ARC-Challenge | 300 | 94.3% | 40.3% | -54.0 |
| Winogrande | 200 | 83.0% | 44.0% | -39.0 |
| GSM8K | 200 | 97.0% | 30.5% | -66.5 |
| HumanEval | Full (164) | 82.3% | 29.3% | -53.0 |
| MBPP | 100 | 75.0% | 2.0% | -73.0 |
| LiveCodeBench | 100 | 28.0% | 5.0% | -23.0 |
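The Delta column is simply the bf16 score minus the oQ2e score, in percentage points:

```python
# Delta = bf16 score - oQ2e score, in percentage points.
scores = {          # (bf16 %, oQ2e %)
    "MMLU":          (86.2, 43.2),
    "HellaSwag":     (90.5, 20.6),
    "TruthfulQA":    (85.3, 40.0),
    "ARC-Challenge": (94.3, 40.3),
    "Winogrande":    (83.0, 44.0),
    "GSM8K":         (97.0, 30.5),
    "HumanEval":     (82.3, 29.3),
    "MBPP":          (75.0,  2.0),
    "LiveCodeBench": (28.0,  5.0),
}
deltas = {name: round(bf16 - oq2e, 1) for name, (bf16, oq2e) in scores.items()}
print(deltas["MMLU"], deltas["MBPP"])  # 43.0 73.0
```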
## Key Observations
- 2-bit is the quality floor for this model: knowledge is partially retained (MMLU 43.2%), but reasoning and coding collapse.
- MBPP at 2.0%: Python code generation is essentially non-functional at this compression.
- Winogrande held up best (-39.0 points), suggesting coreference resolution is more robust to compression.
- Still outperforms uniform 4-bit on knowledge recall: an MLX-community uniform 4-bit of Jackrong's 27B distill scored 23.8% MMLU, while this oQ 2-bit scores 43.2%.
- The quality cliff sits between oQ4e and oQ2e: oQ4e retains ~99% of bf16 quality, while oQ2e loses roughly half.
## Quantization Settings
| Parameter | Value |
|---|---|
| Method | oQ (oMLX Universal Dynamic Quantization) |
| Level | oQ2e (Enhanced) |
| Enhanced (+) | Yes (GPTQ error compensation) |
| Effective BPW | ~3.1 |
| Calibration Dataset | Code + Multilingual + Tool Calling |
| Calibration Samples | 128 |
| Sequence Length | 512 |
| Hardware | Apple M3 Ultra, 512GB Unified Memory |
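oQ's exact algorithm is not public, but the "sensitivity-aware mixed-precision" idea behind the settings above can be sketched in a few lines: score each layer's quantization sensitivity on calibration data, then greedily upgrade the most sensitive layers from 2-bit to 4-bit until the average-bpw budget is spent. Everything below (function name, equal-size-layer assumption, 2/4-bit-only palette) is a hypothetical illustration, not the oQ implementation:

```python
# Toy sketch of sensitivity-aware bit allocation (NOT the actual oQ algorithm).
# Layers with higher measured sensitivity get more bits; the greedy loop
# upgrades the most sensitive layers first until the average-bpw budget is spent.
def allocate_bits(sensitivity, target_bpw, low=2, high=4):
    n = len(sensitivity)
    bits = [low] * n
    # How many layers we can afford to upgrade from `low` to `high` bits,
    # assuming equal parameter counts per layer (a simplification).
    budget = int(n * (target_bpw - low) / (high - low))
    for i in sorted(range(n), key=lambda i: -sensitivity[i])[:budget]:
        bits[i] = high
    return bits

# 8 layers, calibration-measured sensitivities, ~3.0 bpw average target:
bits = allocate_bits([0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6], 3.0)
print(bits)  # [4, 2, 2, 4, 2, 4, 2, 4] -> average exactly 3.0 bpw
```

The real pipeline additionally applies GPTQ-style error compensation (the "Enhanced" step in the table), which updates remaining weights to absorb the rounding error of already-quantized ones.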
## How to Use

### oMLX

Drop the model folder into your oMLX models directory. Auto-detected on server start.

### mlx-lm
```python
from mlx_lm import load, generate

model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ2BENCHMARKED")

messages = [{"role": "user", "content": "Your prompt here"}]
# enable_thinking=True turns on the model's thinking mode via the chat template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
```
## Recommended Settings
Per DavidAU's guidance for the RoughHouse variant:
- Temperature: 0.5 - 1.0
- Min context window: 8k - 16k
- Rep penalty: 1.05 - 1.1 (if looping occurs)
- System prompt: Even a single sentence helps stabilize output
## Credits
- Original Model: DavidAU — fine-tuning, expansion to 40B, Heretic uncensoring
- Base Architecture: Qwen3.5-27B by Alibaba/Qwen Team
- Quantization: oQ by jundot/oMLX
- Benchmarks & Quantization: Hunterx, using the oMLX v0.2.20 Intelligence Benchmark suite on an M3 Ultra (512GB)
## License
Apache 2.0 (inherited from original model)