# Qwen3.5-9B-Sculpt-Default
5% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.
Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with the standard Hugging Face `transformers` library.
This is the Default tier of Qwen3.5-9B.
Use case: Enterprise — maximum quality preservation
## Benchmark Results (lm_eval)
| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
## This Model vs Baseline
| Benchmark | Default | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 56.4 | 55.6 | +0.8 |
| gsm8k | 81.5 | 87.3 | -5.8 |
| hellaswag | 75.8 | 78.1 | -2.3 |
| mmlu | 76.2 | 78.7 | -2.5 |
| mmlu_abstract_algebra | 54.0 | 66.0 | -12.0 |
| mmlu_anatomy | 75.6 | 77.8 | -2.2 |
| mmlu_astronomy | 90.1 | 92.8 | -2.7 |
| mmlu_business_ethics | 79.0 | 82.0 | -3.0 |
| mmlu_clinical_knowledge | 82.3 | 86.8 | -4.5 |
| mmlu_college_biology | 91.7 | 93.1 | -1.4 |
| mmlu_college_chemistry | 60.0 | 59.0 | +1.0 |
| mmlu_college_computer_science | 68.0 | 82.0 | -14.0 |
| mmlu_college_mathematics | 58.0 | 64.0 | -6.0 |
| mmlu_college_medicine | 76.9 | 81.5 | -4.6 |
| mmlu_college_physics | 61.8 | 64.7 | -2.9 |
| mmlu_computer_security | 85.0 | 83.0 | +2.0 |
| mmlu_conceptual_physics | 87.2 | 90.2 | -3.0 |
| mmlu_econometrics | 66.7 | 73.7 | -7.0 |
| mmlu_electrical_engineering | 77.2 | 82.1 | -4.9 |
| mmlu_elementary_mathematics | 74.6 | 80.7 | -6.1 |
| mmlu_formal_logic | 65.1 | 65.9 | -0.8 |
| mmlu_global_facts | 42.0 | 50.0 | -8.0 |
| mmlu_high_school_biology | 92.6 | 93.5 | -0.9 |
| mmlu_high_school_chemistry | 73.4 | 77.8 | -4.4 |
| mmlu_high_school_computer_science | 83.0 | 88.0 | -5.0 |
| mmlu_high_school_european_history | 83.0 | 87.3 | -4.3 |
| mmlu_high_school_geography | 90.9 | 92.4 | -1.5 |
| mmlu_high_school_government_and_politics | 94.3 | 96.9 | -2.6 |
| mmlu_high_school_macroeconomics | 80.8 | 85.9 | -5.1 |
| mmlu_high_school_mathematics | 52.6 | 53.3 | -0.7 |
| mmlu_high_school_microeconomics | 88.7 | 93.3 | -4.6 |
| mmlu_high_school_physics | 67.5 | 72.8 | -5.3 |
| mmlu_high_school_psychology | 90.3 | 93.2 | -2.9 |
| mmlu_high_school_statistics | 74.5 | 78.7 | -4.2 |
| mmlu_high_school_us_history | 88.7 | 90.2 | -1.5 |
| mmlu_high_school_world_history | 86.9 | 89.9 | -3.0 |
| mmlu_human_aging | 75.8 | 78.9 | -3.1 |
| mmlu_human_sexuality | 82.4 | 86.3 | -3.9 |
| mmlu_humanities | 69.6 | 70.5 | -0.9 |
| mmlu_international_law | 87.6 | 90.1 | -2.5 |
| mmlu_jurisprudence | 85.2 | 84.3 | +0.9 |
| mmlu_logical_fallacies | 85.3 | 84.7 | +0.6 |
| mmlu_machine_learning | 59.8 | 66.1 | -6.3 |
| mmlu_management | 85.4 | 86.4 | -1.0 |
| mmlu_marketing | 92.7 | 95.7 | -3.0 |
| mmlu_medical_genetics | 88.0 | 91.0 | -3.0 |
| mmlu_miscellaneous | 87.7 | 90.3 | -2.6 |
| mmlu_moral_disputes | 80.1 | 81.2 | -1.1 |
| mmlu_moral_scenarios | 54.0 | 53.3 | +0.7 |
| mmlu_nutrition | 83.3 | 86.3 | -3.0 |
| mmlu_other | 80.3 | 83.1 | -2.8 |
| mmlu_philosophy | 79.7 | 80.4 | -0.7 |
| mmlu_prehistory | 81.2 | 84.3 | -3.1 |
| mmlu_professional_accounting | 65.2 | 65.6 | -0.4 |
| mmlu_professional_law | 59.3 | 60.3 | -1.0 |
| mmlu_professional_medicine | 89.0 | 91.5 | -2.5 |
| mmlu_professional_psychology | 78.9 | 82.8 | -3.9 |
| mmlu_public_relations | 68.2 | 73.6 | -5.4 |
| mmlu_security_studies | 77.6 | 76.7 | +0.9 |
| mmlu_social_sciences | 83.7 | 87.0 | -3.3 |
| mmlu_sociology | 88.1 | 89.1 | -1.0 |
| mmlu_stem | 74.5 | 78.3 | -3.8 |
| mmlu_us_foreign_policy | 87.0 | 90.0 | -3.0 |
| mmlu_virology | 56.0 | 56.6 | -0.6 |
| mmlu_world_religions | 87.7 | 86.5 | +1.2 |
| truthfulqa_mc2 | 52.6 | 53.7 | -1.1 |
| winogrande | 68.7 | 73.0 | -4.3 |
## Performance
| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 16.3 GB | 16.7 GB | -2.2% |
| Parameters | 8,752,476,672 | — | — |
| Prefill throughput | 4,618 tok/s | 4,566 tok/s | +1% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |
KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
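The size numbers above can be cross-checked against the keep fraction. A quick sketch, assuming bf16 weights (2 bytes per parameter) and reading the table's "GB" figures as GiB; the baseline parameter count is inferred from the checkpoint size, since the card does not list it directly:

```python
# Back-of-envelope check: relate the ~2% size reduction to kf=0.95.
GiB = 2**30
sculpt_params = 8_752_476_672      # from the table above
baseline_params = 16.7 * GiB / 2   # inferred from 16.7 GiB at bf16 (assumption)

removed = baseline_params - sculpt_params
ffn_params = removed / 0.05        # removed neurons are 5% of the FFN weights
print(f"removed: {removed / 1e9:.2f}B params "
      f"({100 * removed / baseline_params:.1f}% of total)")
print(f"implied FFN share: {100 * ffn_params / baseline_params:.0f}% of baseline")
```

Under these assumptions, pruning 5% of FFN neurons removes roughly 2.4% of total parameters, consistent with FFN layers holding about half the model's weights.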
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Default",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Default")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## All Sculpt Tiers
| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |
## Technical Details
- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5)
- Keep fraction: 0.95 (5% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
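For intuition, importance-aware neuron pruning can be sketched as follows. This is a minimal illustration, not the actual Sculpt algorithm: the real importance metric, block selection, and distillation repair (alpha=0.5) are not disclosed here, so the sketch assumes a common L2-norm proxy, where each intermediate neuron is scored by the norms of its input and output weight vectors:

```python
# Minimal sketch of importance-aware FFN neuron pruning.
# ASSUMPTION: L2-norm-product importance, a common proxy; the actual
# Sculpt scoring and block selection are not described on this card.
import math

def neuron_importance(up_row, down_col):
    # Score a neuron by the product of the L2 norms of its
    # incoming (up_proj row) and outgoing (down_proj column) weights.
    norm_in = math.sqrt(sum(w * w for w in up_row))
    norm_out = math.sqrt(sum(w * w for w in down_col))
    return norm_in * norm_out

def prune_ffn(up_rows, down_cols, keep_fraction):
    """Return indices of the intermediate neurons to keep.

    up_rows:   one row per intermediate neuron (its input weights)
    down_cols: one column per intermediate neuron (its output weights)
    """
    scores = [neuron_importance(u, d) for u, d in zip(up_rows, down_cols)]
    n_keep = max(1, round(keep_fraction * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: -scores[i])
    # Restore original ordering so the sliced matrices stay aligned.
    return sorted(ranked[:n_keep])

# Toy example: 4 intermediate neurons, d_model = 2, kf = 0.75 -> keep 3.
up_rows   = [[1.0, 0.0], [0.1, 0.1], [2.0, 1.0], [0.5, 0.5]]
down_cols = [[1.0, 1.0], [0.1, 0.0], [1.0, 0.0], [2.0, 0.0]]
kept = prune_ffn(up_rows, down_cols, keep_fraction=0.75)
print(kept)  # -> [0, 2, 3]: neuron 1 (smallest combined norm) is dropped
```

After slicing the FFN matrices down to the kept indices, the card's repair stage fine-tunes the smaller model against the live teacher to recover quality.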
## Compatibility
- HuggingFace Transformers
- vLLM
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
## Citation
```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```