# Qwopus MoE 35B-A3B — Claude Opus 4.6 Reasoning Distilled (GGUF)

QLoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters) with Claude Opus 4.6 reasoning distillation. Training recipe adapted from Jackrong's Qwopus3.5-27B-v3 — same datasets and methodology, applied to the MoE architecture.

## Credits

This model is based on the work of Jackrong and his Qwopus3.5-27B-v3 training methodology. The datasets, training philosophy (the "act-then-refine" paradigm), and structural reasoning approach are all derived from his research; see his complete training guide for the full methodology.

The key difference: we adapted his recipe from the 27B dense model to the 35B-A3B MoE architecture.

## Available Quantizations

| Quantization | Size | BPW | Min VRAM |
|---|---|---|---|
| Q8_0 | 35 GB | 8.52 | 1× 48 GB GPU |
| Q6_K | 27 GB | 6.58 | 1× 32 GB GPU |
| Q5_K_M | 24 GB | 5.70 | 1× 32 GB GPU |
| Q4_K_M | 20 GB | 4.87 | 1× 24 GB GPU |
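
As a rough sanity check, bits-per-weight can be recomputed from file size and parameter count. This is only a sketch: the sizes above are rounded and a GGUF file also carries non-weight metadata, so the results approximate rather than reproduce the table.

```python
def bits_per_weight(file_size_gib: float, total_params: float) -> float:
    # BPW = total file bits / number of parameters
    return file_size_gib * 2**30 * 8 / total_params

# Q8_0: 35 GiB over ~35B parameters
print(round(bits_per_weight(35, 35e9), 2))  # ≈ 8.59 (table: 8.52; sizes are rounded)
```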

## Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
| Max Context | 131,072 tokens (128K) |

## Benchmark Results

### Qwopus MoE (Jackrong recipe) vs. Opus Distilled v2 (previous QLoRA)

Benchmarked across 8 diverse tasks: coding, bug detection, reasoning, instruction following, research, and agentic planning.

| Test | Qwopus MoE | Opus Distilled v2 | Winner |
|---|---|---|---|
| Coding: LRU Cache | 6.9 KB content | 4.8 KB content | Qwopus |
| Coding: Async Scraper | 8.5 KB content | 7.6 KB content | Qwopus |
| Bug Detection | 2.5 KB + 2.1 KB thinking | 2.4 KB + 2.9 KB thinking | Tie |
| Reasoning: Probability | 0 chars (stuck thinking) | 1.3 KB content | v2 |
| Reasoning: Logic | 747 chars | 949 chars | v2 |
| JSON Output | 319 chars, 6.8 s | 325 chars, 1.4 s | v2 (5× faster) |
| Research: Architecture Analysis | 4.5 KB content | 696 chars (overthinks) | Qwopus |
| Agentic: CI/CD Planning | 6.9 KB content | 5.8 KB content | Qwopus |

### Speed

| Model | tok/s |
|---|---|
| Qwopus MoE | 175 |
| Opus Distilled v2 | 204 |

### Verdict

Qwopus MoE produces more useful visible output, with a better content-to-thinking ratio. It excels at tasks requiring detailed, user-facing responses (coding, research, planning). Opus Distilled v2 is 16% faster but has an aggressive thinking mode that sometimes produces minimal visible content.

**Best for:** coding assistants, research agents, content generation, and agentic workflows where output quality matters more than raw speed.

## Training Details

### Recipe (adapted from Jackrong's Qwopus3.5-27B-v3)

| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit base + LoRA adapters in BF16) |
| Framework | Unsloth 2026.4.2 + TRL |
| Base Model | unsloth/Qwen3.5-35B-A3B |
| LoRA Rank | 32 |
| LoRA Alpha | 32 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj (attention only) |
| Trainable Parameters | 6,881,280 (0.02% of 35B) |
| Learning Rate | 2e-5 (linear schedule) |
| Warmup | 5% of steps |
| Weight Decay | 0.001 |
| Optimizer | adamw_8bit |
| Epochs | 2 |
| Effective Batch Size | 12 (1 × 12 gradient accumulation) |
| Max Sequence Length | 4096 |
| Total Steps | 536 |
| Final Loss | 0.5517 |
| GPU | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| Training Time | ~3.5 hours |
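
The step count follows directly from the dataset size, epoch count, and effective batch size. A quick check (assuming the final partial batch of each epoch is rounded up to a full optimizer step):

```python
import math

examples, epochs, effective_batch = 3209, 2, 12

# Optimizer steps per epoch, with the last partial batch rounded up
steps_per_epoch = math.ceil(examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(total_steps)  # 536, matching the table
```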

### Differences from Jackrong's 27B recipe

| Aspect | Jackrong (27B dense) | Ours (35B-A3B MoE) |
|---|---|---|
| Base model | Qwen3.5-27B (dense) | Qwen3.5-35B-A3B (MoE) |
| LoRA rank | 64 | 32 (GPU memory constraint) |
| LoRA targets | q, k, v, o, gate, up, down | q, k, v, o only (MoE experts too large) |
| Trainable params | ~0.5% | 0.02% |
| Batch size | ~36 | 12 |
| Context length | 8192 | 4096 (GPU memory constraint) |
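
A minimal sketch of how the attention-only LoRA setup might look in Unsloth. This is untested and illustrative only; the model name and hyperparameters are taken from the recipe table above, and the calls follow Unsloth's `FastLanguageModel` API.

```python
from unsloth import FastLanguageModel

# Load the MoE base in 4-bit (QLoRA) at the 4096-token training context
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3.5-35B-A3B",
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention projections only; the MoE expert
# matrices (gate/up/down) stay frozen to fit within GPU memory
model = FastLanguageModel.get_peft_model(
    model,
    r=32,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```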

### Datasets (3,209 examples after quality filtering)

| Dataset | Examples | Description |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 reasoning traces |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen reasoning conversations |
| Roman1111111/claude-opus-4.6-10000x | ~250 (after filtering) | Claude Opus 4.6 conversations |

Quality filter: examples were kept only if the assistant content exceeded 100 characters.
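
The filter can be sketched as follows, assuming examples are stored in the usual `messages` chat format (field names here are illustrative, not taken from the actual training script):

```python
def passes_quality_filter(example: dict, min_chars: int = 100) -> bool:
    # Keep only examples whose combined assistant text exceeds min_chars
    assistant_text = "".join(
        turn["content"]
        for turn in example["messages"]
        if turn["role"] == "assistant"
    )
    return len(assistant_text) > min_chars
```

Applied across the three source datasets, a filter of this shape left the 3,209 examples listed above.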

## Usage with llama.cpp

```bash
llama-server \
  --model Qwopus-MoE-35B-A3B-Q8_0.gguf \
  --n-gpu-layers -1 \
  --ctx-size 131072 \
  --host 0.0.0.0 --port 8082
```

The model uses `<think>...</think>` reasoning tags natively (inherited from the Qwen3.5 base).
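
When consuming raw completions (for example, outside a chat UI that hides reasoning), the thinking block can be separated from the visible answer with a small helper. This is an illustrative sketch, not part of llama.cpp:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    # Returns (thinking, visible_content) from a raw model completion
    match = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    thinking = match.group(1).strip() if match else ""
    content = re.sub(r"<think>.*?</think>", "", text,
                     count=1, flags=re.DOTALL).strip()
    return thinking, content
```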

## Acknowledgements

- **Jackrong** — training methodology, datasets, and the Qwopus concept
- **Unsloth** — efficient QLoRA training framework
- **Qwen** — base model architecture