Darwin-35B-A3B-Opus

Quality: original (bfloat16)

Model Specifications

| Specification | Value |
|---|---|
| Architecture | Qwen3.5 MoE (Gated DeltaNet + MoE) |
| Total Parameters | 35B |
| Active Parameters | 3B per forward pass |
| Layers | 40 |
| Layout | 10 x (3 x GDN-MoE + 1 x Attention-MoE) |
| Experts | 256 (8 routed + 1 shared active) |
| Context Length | 262,144 tokens (native) |
| Languages | 201 |
| Multimodal | Image and video |
| License | Apache 2.0 |
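
As a quick sanity check on the Layers and Layout rows above, the stated pattern reproduces the 40-layer count: 10 repeating blocks, each holding 3 Gated DeltaNet MoE layers followed by 1 attention MoE layer.

```python
# Sanity check of the layer layout stated in the spec table:
# 10 blocks of (3 Gated DeltaNet MoE layers + 1 attention MoE layer)
# gives the 40 layers listed above.
layout = (["gdn_moe"] * 3 + ["attn_moe"]) * 10
assert len(layout) == 40
assert layout.count("attn_moe") == 10  # one attention layer per block
```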

Parent Models

Both parents share the same Qwen3.5-35B-A3B architecture (40 layers, 256 experts, GDN+MoE hybrid); see the sketch after the table below for why this matters. The Mother is a LoRA SFT of that same base, not a different architecture: "text-only" describes its training data (Claude 4.6 Opus reasoning chains), not the model structure.

| Role | Model | Architecture | Training |
|---|---|---|---|
| Father | Qwen/Qwen3.5-35B-A3B | Qwen3.5-35B-A3B | Original pre-training + RLHF |
| Mother | Jackrong/Qwen3.5-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled | Qwen3.5-35B-A3B (same) | LoRA SFT with text-only Claude reasoning chains |
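
The Father/Mother framing implies the child was derived from two same-architecture checkpoints. This card does not document the actual combination recipe, so the following is only a generic, hypothetical linear weight merge (the paths are placeholders). It illustrates the point made above: because both parents are the same Qwen3.5-35B-A3B architecture, every tensor lines up one-to-one by name and shape.

```python
import mlx.core as mx

# Hypothetical 50/50 linear merge of two same-architecture checkpoints.
# This is NOT the documented recipe for Darwin-35B-A3B-Opus; it only
# shows why identical parent architectures matter for any merge.
father = mx.load("father/model.safetensors")  # placeholder paths
mother = mx.load("mother/model.safetensors")

merged = {}
for name, w_f in father.items():
    w_m = mother[name]
    assert w_f.shape == w_m.shape, f"shape mismatch at {name}"
    merged[name] = (0.5 * w_f + 0.5 * w_m).astype(w_f.dtype)

mx.save_safetensors("child/model.safetensors", merged)
```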

Source

This model was converted to MLX format from FINAL-Bench/Darwin-35B-A3B-Opus using mlx-vlm version 0.4.4 and is published as TheCluster/Darwin-35B-A3B-Opus-MLX-bf16.
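
A minimal usage sketch for the converted checkpoint is below. The repository id comes from this card; the `generate()` argument names and order have shifted across mlx-vlm releases, so treat the call as illustrative and check the version you have installed.

```python
# Minimal sketch of loading this MLX conversion with mlx-vlm.
# generate()'s signature varies between mlx-vlm releases; adjust
# arguments for your installed version.
from mlx_vlm import load, generate

model, processor = load("TheCluster/Darwin-35B-A3B-Opus-MLX-bf16")
output = generate(
    model,
    processor,
    prompt="Describe this image.",
    image="example.jpg",  # hypothetical local image path
    max_tokens=256,
)
print(output)
```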
