# Qwen3.5-40B-RoughHouse-Claude-4.6-Opus — oQ2e (3.1 bpw)

oQ2e mixed-precision quantization of DavidAU/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-Polar-Deckard-Uncensored-Heretic-Thinking.
Quantized using oMLX oQ — data-driven sensitivity-aware mixed-precision quantization for Apple Silicon. Standard mlx-lm compatible safetensors. Works with oMLX, LM Studio, mlx-lm, and any MLX-compatible inference server.
⭐ The recommended quant is oQ4e (~22GB, near-lossless). This oQ2e is provided as a research artifact showing the quality floor at extreme compression.
## ⚠️ Content Warning
This is a fully uncensored model. The original was abliterated via Heretic and fine-tuned without safety alignment. It may produce graphic, offensive, or inappropriate content. Use responsibly.
## About the Original Model
Created by DavidAU. This is the "RoughHouse" variant: the raw, untrained release produced by expanding two 27B Qwen 3.5 fine-tunes to 40B parameters.
- Architecture: 40B dense (not MoE), 96 layers, 1275 tensors
- Base: Qwen3.5-27B expanded to 40B (50% more layers than base)
- Training: Claude/Polaris (5 datasets) + Deckard/PDK (5 datasets) + Heretic uncensoring
- RoughHouse: Raw release without final training step after expansion
## All Available Quants
| Quant | BPW | Size | MMLU | TruthfulQA | ARC-C | HellaSwag | Status |
|---|---|---|---|---|---|---|---|
| bf16 (reference) | 16.0 | 73.6 GB | 86.2% | 85.3% | 94.3% | 90.5% | Source |
| oQ4e ⭐ | ~4.8 | ~22 GB | 85.4% | 85.7% | 96.0% | 91.5% | ✅ Recommended |
| oQ2e (this) | ~3.1 | ~14.3 GB | 43.2% | 40.0% | 40.3% | 20.6% | ✅ Available |
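As a sanity check on the sizes above, file size scales roughly as parameters × bits-per-weight. A back-of-envelope sketch (assuming a flat ~40B quantized parameters, which is a simplification since embeddings and norm tensors are handled differently):

```python
# Rough model size: params * bits-per-weight / 8 bytes, reported in GiB.
# The ~40B parameter count is an approximation, not an exact tensor sum.
def approx_size_gib(params: float, bpw: float) -> float:
    return params * bpw / 8 / 2**30

print(round(approx_size_gib(40e9, 3.1), 1))  # ~14.4 GiB (oQ2e, table says ~14.3 GB)
print(round(approx_size_gib(40e9, 4.8), 1))  # ~22.4 GiB (oQ4e, table says ~22 GB)
```

Both estimates land within a few hundred MB of the table values, which is expected given the simplified parameter count.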
## Benchmark Results — oQ2e (3.1 bpw, ~14.3 GB)
| Benchmark | Samples | bf16 | oQ2e | Delta |
|---|---|---|---|---|
| MMLU | 1000 | 86.2% | 43.2% | -43.0 |
| HellaSwag | 1000 | 90.5% | 20.6% | -69.9 |
| TruthfulQA | 200 | 85.3% | 40.0% | -45.3 |
| ARC-Challenge | 300 | 94.3% | 40.3% | -54.0 |
| Winogrande | 200 | 83.0% | 44.0% | -39.0 |
| GSM8K | 200 | 97.0% | 30.5% | -66.5 |
| HumanEval | Full (164) | 82.3% | 29.3% | -53.0 |
| MBPP | 100 | 75.0% | 2.0% | -73.0 |
| LiveCodeBench | 100 | 28.0% | 5.0% | -23.0 |
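The Delta column is simply the bf16 score minus the oQ2e score, in percentage points:

```python
# Delta = bf16 score - oQ2e score, in percentage points.
scores = {          # (bf16 %, oQ2e %)
    "MMLU":          (86.2, 43.2),
    "HellaSwag":     (90.5, 20.6),
    "TruthfulQA":    (85.3, 40.0),
    "ARC-Challenge": (94.3, 40.3),
    "Winogrande":    (83.0, 44.0),
    "GSM8K":         (97.0, 30.5),
    "HumanEval":     (82.3, 29.3),
    "MBPP":          (75.0,  2.0),
    "LiveCodeBench": (28.0,  5.0),
}
deltas = {name: round(bf16 - oq2e, 1) for name, (bf16, oq2e) in scores.items()}
print(deltas["MMLU"], deltas["MBPP"])  # 43.0 73.0
```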
## Key Observations
- 2-bit is the quality floor for this model: knowledge is partially retained (MMLU 43.2%), but reasoning and coding collapse.
- MBPP at 2.0%: Python code generation is essentially non-functional at this compression.
- Winogrande held up best (-39.0 points), suggesting coreference resolution is more robust to compression.
- Still outperforms uniform 4-bit on knowledge recall: an MLX-community uniform 4-bit of Jackrong's 27B distill scored 23.8% MMLU, while this oQ 2-bit scores 43.2%.
- The quality cliff sits between oQ4e and oQ2e: oQ4e retains ~99% of bf16 quality, while oQ2e loses roughly half.
## Quantization Settings
| Parameter | Value |
|---|---|
| Method | oQ (oMLX Universal Dynamic Quantization) |
| Level | oQ2e (Enhanced) |
| Enhanced (+) | Yes (GPTQ error compensation) |
| Effective BPW | ~3.1 |
| Calibration Dataset | Code + Multilingual + Tool Calling |
| Calibration Samples | 128 |
| Sequence Length | 512 |
| Hardware | Apple M3 Ultra, 512GB Unified Memory |
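oQ's exact algorithm is not public, but the "sensitivity-aware mixed-precision" idea behind the settings above can be sketched in a few lines: score each layer's quantization sensitivity on calibration data, then greedily upgrade the most sensitive layers from 2-bit to 4-bit until the average-bpw budget is spent. Everything below (function name, equal-size-layer assumption, 2/4-bit-only palette) is a hypothetical illustration, not the oQ implementation:

```python
# Toy sketch of sensitivity-aware bit allocation (NOT the actual oQ algorithm).
# Layers with higher measured sensitivity get more bits; the greedy loop
# upgrades the most sensitive layers first until the average-bpw budget is spent.
def allocate_bits(sensitivity, target_bpw, low=2, high=4):
    n = len(sensitivity)
    bits = [low] * n
    # How many layers we can afford to upgrade from `low` to `high` bits,
    # assuming equal parameter counts per layer (a simplification).
    budget = int(n * (target_bpw - low) / (high - low))
    for i in sorted(range(n), key=lambda i: -sensitivity[i])[:budget]:
        bits[i] = high
    return bits

# 8 layers, calibration-measured sensitivities, ~3.0 bpw average target:
bits = allocate_bits([0.9, 0.1, 0.4, 0.8, 0.2, 0.7, 0.3, 0.6], 3.0)
print(bits)  # [4, 2, 2, 4, 2, 4, 2, 4] -> average exactly 3.0 bpw
```

The real pipeline additionally applies GPTQ-style error compensation (the "Enhanced" step in the table), which updates remaining weights to absorb the rounding error of already-quantized ones.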
## How to Use

### oMLX

Drop the model folder into your oMLX models directory. Auto-detected on server start.

### mlx-lm
```python
from mlx_lm import load, generate

model, tokenizer = load("Hunterx/Qwen3.5-40B-RoughHouse-Claude-4.6-Opus-oQ2BENCHMARKED")

messages = [{"role": "user", "content": "Your prompt here"}]
# enable_thinking=True turns on the model's thinking mode via the chat template.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=2048)
```
## Recommended Settings
Per DavidAU's guidance for the RoughHouse variant:
- Temperature: 0.5 - 1.0
- Min context window: 8k - 16k
- Rep penalty: 1.05 - 1.1 (if looping occurs)
- System prompt: Even a single sentence helps stabilize output
## Credits
- Original Model: DavidAU — fine-tuning, expansion to 40B, Heretic uncensoring
- Base Architecture: Qwen3.5-27B by Alibaba/Qwen Team
- Quantization: oQ by jundot/oMLX
- Benchmarks & Quantization: Hunterx, using the oMLX v0.2.20 Intelligence Benchmark suite on an M3 Ultra (512GB)
## License
Apache 2.0 (inherited from original model)