# Qwopus MoE 35B-A3B — Claude Opus 4.6 Reasoning Distilled (GGUF)
QLoRA fine-tune of Qwen3.5-35B-A3B (MoE, 3B active parameters) with Claude Opus 4.6 reasoning distillation. Training recipe adapted from Jackrong's Qwopus3.5-27B-v3 — same datasets and methodology, applied to the MoE architecture.
## Credits
This model is heavily inspired by and based on the work of Jackrong and his Qwopus3.5-27B-v3 training methodology. The datasets, training philosophy ("act-then-refine" paradigm), and structural reasoning approach are all derived from his research. Please check his complete training guide for the full methodology.
The key difference: we adapted his recipe from the 27B dense model to the 35B-A3B MoE architecture.
## Available Quantizations
| Quantization | Size | BPW | Min VRAM |
|---|---|---|---|
| Q8_0 | 35 GB | 8.52 | 1x 48GB GPU |
| Q6_K | 27 GB | 6.58 | 1x 32GB GPU |
| Q5_K_M | 24 GB | 5.70 | 1x 32GB GPU |
| Q4_K_M | 20 GB | 4.87 | 1x 24GB GPU |
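As a sanity check, the listed file sizes follow roughly from total parameter count × bits per weight (BPW). A short sketch, assuming ~35B total parameters (taken from the model name) and ignoring tokenizer/metadata overhead:

```python
def approx_size_gib(total_params: float, bpw: float) -> float:
    """Approximate GGUF file size in GiB: parameters x bits-per-weight / 8 bytes."""
    return total_params * bpw / 8 / 2**30

TOTAL_PARAMS = 35e9  # assumption: ~35B total parameters

for quant, bpw in [("Q8_0", 8.52), ("Q6_K", 6.58), ("Q5_K_M", 5.70), ("Q4_K_M", 4.87)]:
    print(f"{quant}: ~{approx_size_gib(TOTAL_PARAMS, bpw):.1f} GiB")
```

The estimates land within about a GiB of the table; real files also carry tokenizer and metadata blocks, and the exact parameter count differs from a round 35B.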
## Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B |
| Architecture | Mixture of Experts (MoE) |
| Total Parameters | ~35B |
| Active Parameters | ~3B per token |
| Max Context | 131,072 tokens (128K) |
## Benchmark Results

*Qwopus MoE (Jackrong recipe) vs. Opus Distilled v2 (previous QLoRA)*
Benchmarked across 8 diverse tasks: coding, bug detection, reasoning, instruction following, research, and agentic planning.
| Test | Qwopus MoE | Opus Distilled v2 | Winner |
|---|---|---|---|
| Coding: LRU Cache | 6.9KB content | 4.8KB content | Qwopus |
| Coding: Async Scraper | 8.5KB content | 7.6KB content | Qwopus |
| Bug Detection | 2.5KB + 2.1KB thinking | 2.4KB + 2.9KB thinking | Tie |
| Reasoning: Probability | 0 chars (stuck thinking) | 1.3KB content | v2 |
| Reasoning: Logic | 747 chars | 949 chars | v2 |
| JSON Output | 319 chars, 6.8s | 325 chars, 1.4s | v2 (5x faster) |
| Research: Architecture Analysis | 4.5KB content | 696 chars (overthinks) | Qwopus |
| Agentic: CI/CD Planning | 6.9KB content | 5.8KB content | Qwopus |
### Speed
| Model | tok/s |
|---|---|
| Qwopus MoE | 175 |
| Opus Distilled v2 | 204 |
### Verdict

Qwopus MoE produces more useful visible output, with a better content-to-thinking ratio. It excels at tasks requiring detailed, user-facing responses (coding, research, planning). Opus Distilled v2 is ~17% faster (204 vs. 175 tok/s) but its aggressive thinking mode sometimes produces minimal visible content.

**Best for:** coding assistants, research agents, content generation, and agentic workflows where output quality matters more than raw speed.
## Training Details

### Recipe (adapted from Jackrong's Qwopus3.5-27B-v3)
| Parameter | Value |
|---|---|
| Method | QLoRA (4-bit base + LoRA adapters in BF16) |
| Framework | Unsloth 2026.4.2 + TRL |
| Base Model | unsloth/Qwen3.5-35B-A3B |
| LoRA Rank | 32 |
| LoRA Alpha | 32 |
| LoRA Targets | q_proj, k_proj, v_proj, o_proj (attention only) |
| Trainable Parameters | 6,881,280 (0.02% of 35B) |
| Learning Rate | 2e-5 (linear schedule) |
| Warmup | 5% of steps |
| Weight Decay | 0.001 |
| Optimizer | adamw_8bit |
| Epochs | 2 |
| Effective Batch Size | 12 (per-device batch 1 × 12 gradient accumulation steps) |
| Max Sequence Length | 4096 |
| Total Steps | 536 |
| Final Loss | 0.5517 |
| GPU | NVIDIA RTX PRO 6000 Blackwell (96GB) |
| Training Time | ~3.5 hours |
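The arithmetic in the table is internally consistent: a micro-batch of 1 with 12 gradient-accumulation steps gives the effective batch of 12, and 3,209 examples over 2 epochs yields 536 optimizer steps. A quick sketch (the `config` dict merely mirrors the table for illustration; it is not the actual Unsloth/TRL invocation):

```python
from math import ceil

# Hyperparameters transcribed from the table above (illustrative only)
config = {
    "lora_rank": 32,
    "lora_alpha": 32,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj"],  # attention only
    "learning_rate": 2e-5,
    "warmup_ratio": 0.05,
    "weight_decay": 0.001,
    "num_train_epochs": 2,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 12,
    "max_seq_length": 4096,
}

# Effective batch = micro-batch x gradient accumulation
effective_batch = (config["per_device_train_batch_size"]
                   * config["gradient_accumulation_steps"])

# Optimizer steps: ceil(examples / effective batch) per epoch, times epochs
steps_per_epoch = ceil(3209 / effective_batch)  # 3,209 training examples
total_steps = steps_per_epoch * config["num_train_epochs"]
print(effective_batch, total_steps)  # → 12 536
```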
### Differences from Jackrong's 27B recipe
| Aspect | Jackrong (27B dense) | Ours (35B-A3B MoE) |
|---|---|---|
| Base model | Qwen3.5-27B (dense) | Qwen3.5-35B-A3B (MoE) |
| LoRA rank | 64 | 32 (GPU memory constraint) |
| LoRA targets | q, k, v, o, gate, up, down | q, k, v, o only (MoE experts too large) |
| Trainable params | ~0.5% | 0.02% |
| Batch size | ~36 | 12 |
| Context length | 8192 | 4096 (GPU memory constraint) |
### Datasets (3,209 examples after quality filtering)
| Dataset | Examples | Description |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude Opus 4.6 reasoning traces |
| Jackrong/Qwen3.5-reasoning-700x | 633 | Qwen reasoning conversations |
| Roman1111111/claude-opus-4.6-10000x | ~250 (after filtering) | Claude Opus 4.6 conversations |
**Quality filter:** required assistant content >100 characters.
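A minimal sketch of that filter, assuming the common chat-style `messages` schema. The card only specifies the >100-character threshold; the field names, and whether the rule applies to the last or every assistant turn, are assumptions:

```python
def passes_quality_filter(example: dict, min_chars: int = 100) -> bool:
    """Keep examples whose final assistant turn exceeds min_chars characters."""
    assistant_turns = [m for m in example["messages"] if m["role"] == "assistant"]
    return bool(assistant_turns) and len(assistant_turns[-1]["content"]) > min_chars

# Hypothetical examples in chat format
examples = [
    {"messages": [{"role": "user", "content": "hi"},
                  {"role": "assistant", "content": "short"}]},           # dropped
    {"messages": [{"role": "user", "content": "explain LRU caches"},
                  {"role": "assistant", "content": "x" * 500}]},          # kept
]
kept = [ex for ex in examples if passes_quality_filter(ex)]
print(len(kept))  # → 1
```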
## Usage with llama.cpp

```bash
llama-server \
  --model Qwopus-MoE-35B-A3B-Q8_0.gguf \
  --n-gpu-layers -1 \
  --ctx-size 131072 \
  --host 0.0.0.0 --port 8082
```
The model uses `<think>...</think>` reasoning tags natively (inherited from the Qwen3.5 base).
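Because the reasoning arrives inline with the answer, clients that only want the visible output need to strip the tags. A minimal sketch, assuming the reasoning is always wrapped in literal `<think>` tags:

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate <think>...</think> reasoning from the visible answer."""
    thinking = "\n".join(m.strip() for m in THINK_RE.findall(text))
    visible = THINK_RE.sub("", text).strip()
    return thinking, visible

# Hypothetical model output
raw = "<think>Sort by recency, evict the oldest key.</think>Use an OrderedDict-based LRU cache."
thinking, visible = split_reasoning(raw)
print(visible)  # → Use an OrderedDict-based LRU cache.
```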