| Parameter | Value |
|---|---|
| direction_index | 19.72 |
| attn.o_proj.max_weight | 1.39 |
| attn.o_proj.max_weight_position | 24.41 |
| attn.o_proj.min_weight | 1.28 |
| attn.o_proj.min_weight_distance | 22.94 |
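The parameters above describe a directional-ablation ("abliteration") configuration, where a refusal direction is projected out of selected weight matrices such as `attn.o_proj`. Below is a minimal sketch of that core projection step, using numpy with illustrative names (`ablate_direction` is hypothetical; the actual tooling, its per-layer weighting, and how `direction_index` selects the direction are not shown here):

```python
import numpy as np

def ablate_direction(W, d):
    """Remove the component of W's outputs along direction d.

    Computes W' = (I - d d^T) W, where d is normalized to a unit
    vector over the output dimension, so that d^T W' = 0.
    """
    d = d / np.linalg.norm(d)          # unit refusal direction
    return W - np.outer(d, d) @ W      # project d out of every column
```

After this transformation, the weight matrix can no longer produce any output component along the ablated direction, which is the mechanism behind the reduced refusal rate reported below.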

| Metric | This model | Original model (JackBinary/Qwen3.5-24B-A3B-Claude-Opus-Gemini-3.1-Pro-Reasoning-Distilled) |
|---|---|---|
| KL divergence | 0.0913 | 0 (by definition) |
| Refusals | 17/100 | 98/100 |
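The KL divergence row above compares this model's next-token distributions against the original model's on the same prompts (the original scores 0 against itself by definition). A minimal sketch of computing a per-position KL(P || Q) from raw logits, using only the standard library (function names are illustrative; the actual evaluation harness is not shown):

```python
import math

def softmax(logits):
    """Convert a logit vector to a probability distribution."""
    m = max(logits)                               # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(logits_p, logits_q):
    """KL(P || Q) for one next-token position, given each model's logits."""
    p = softmax(logits_p)
    q = softmax(logits_q)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

In practice the reported number would be an average of such per-position values over an evaluation corpus; a small KL (here 0.0913) indicates the modified model's output distribution stays close to the original's.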
A fine-tuned version of sandeshrajx/Qwen3.5-24B-A3B-REAP-0.32, itself derived from Qwen3.5-35B-A3B. The goal of this project is simple: to build the best reasoning model that can comfortably fit and run on a 16GB GPU.
This qwen3_5_moe_text model was trained 2x faster with Unsloth.