# Qwen3.5-9B-Sculpt-Production
10% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with the standard `transformers` library. This is the Production tier of Qwen3.5-9B.

**Use case:** Enterprise — balanced quality and efficiency
## Benchmark Results (lm_eval)
| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
## This Model vs Baseline
| Benchmark | Production | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 56.8 | 55.6 | +1.2 |
| gsm8k | 74.5 | 87.3 | -12.8 |
| hellaswag | 75.1 | 78.1 | -3.0 |
| mmlu | 73.9 | 78.7 | -4.8 |
| mmlu_abstract_algebra | 53.0 | 66.0 | -13.0 |
| mmlu_anatomy | 70.4 | 77.8 | -7.4 |
| mmlu_astronomy | 86.2 | 92.8 | -6.6 |
| mmlu_business_ethics | 74.0 | 82.0 | -8.0 |
| mmlu_clinical_knowledge | 81.9 | 86.8 | -4.9 |
| mmlu_college_biology | 91.0 | 93.1 | -2.1 |
| mmlu_college_chemistry | 55.0 | 59.0 | -4.0 |
| mmlu_college_computer_science | 74.0 | 82.0 | -8.0 |
| mmlu_college_mathematics | 54.0 | 64.0 | -10.0 |
| mmlu_college_medicine | 78.0 | 81.5 | -3.5 |
| mmlu_college_physics | 62.7 | 64.7 | -2.0 |
| mmlu_computer_security | 77.0 | 83.0 | -6.0 |
| mmlu_conceptual_physics | 81.7 | 90.2 | -8.5 |
| mmlu_econometrics | 61.4 | 73.7 | -12.3 |
| mmlu_electrical_engineering | 73.1 | 82.1 | -9.0 |
| mmlu_elementary_mathematics | 71.7 | 80.7 | -9.0 |
| mmlu_formal_logic | 56.3 | 65.9 | -9.6 |
| mmlu_global_facts | 46.0 | 50.0 | -4.0 |
| mmlu_high_school_biology | 91.3 | 93.5 | -2.2 |
| mmlu_high_school_chemistry | 74.4 | 77.8 | -3.4 |
| mmlu_high_school_computer_science | 79.0 | 88.0 | -9.0 |
| mmlu_high_school_european_history | 85.5 | 87.3 | -1.8 |
| mmlu_high_school_geography | 88.9 | 92.4 | -3.5 |
| mmlu_high_school_government_and_politics | 88.6 | 96.9 | -8.3 |
| mmlu_high_school_macroeconomics | 80.8 | 85.9 | -5.1 |
| mmlu_high_school_mathematics | 47.4 | 53.3 | -5.9 |
| mmlu_high_school_microeconomics | 85.3 | 93.3 | -8.0 |
| mmlu_high_school_physics | 64.2 | 72.8 | -8.6 |
| mmlu_high_school_psychology | 91.4 | 93.2 | -1.8 |
| mmlu_high_school_statistics | 74.5 | 78.7 | -4.2 |
| mmlu_high_school_us_history | 83.3 | 90.2 | -6.9 |
| mmlu_high_school_world_history | 83.1 | 89.9 | -6.8 |
| mmlu_human_aging | 73.5 | 78.9 | -5.4 |
| mmlu_human_sexuality | 80.2 | 86.3 | -6.1 |
| mmlu_humanities | 66.4 | 70.5 | -4.1 |
| mmlu_international_law | 84.3 | 90.1 | -5.8 |
| mmlu_jurisprudence | 80.6 | 84.3 | -3.7 |
| mmlu_logical_fallacies | 84.7 | 84.7 | +0.0 |
| mmlu_machine_learning | 56.2 | 66.1 | -9.9 |
| mmlu_management | 87.4 | 86.4 | +1.0 |
| mmlu_marketing | 91.9 | 95.7 | -3.8 |
| mmlu_medical_genetics | 84.0 | 91.0 | -7.0 |
| mmlu_miscellaneous | 86.2 | 90.3 | -4.1 |
| mmlu_moral_disputes | 74.6 | 81.2 | -6.6 |
| mmlu_moral_scenarios | 52.5 | 53.3 | -0.8 |
| mmlu_nutrition | 78.4 | 86.3 | -7.9 |
| mmlu_other | 78.5 | 83.1 | -4.6 |
| mmlu_philosophy | 76.5 | 80.4 | -3.9 |
| mmlu_prehistory | 79.0 | 84.3 | -5.3 |
| mmlu_professional_accounting | 62.8 | 65.6 | -2.8 |
| mmlu_professional_law | 55.7 | 60.3 | -4.6 |
| mmlu_professional_medicine | 86.0 | 91.5 | -5.5 |
| mmlu_professional_psychology | 77.3 | 82.8 | -5.5 |
| mmlu_public_relations | 65.5 | 73.6 | -8.1 |
| mmlu_security_studies | 79.2 | 76.7 | +2.5 |
| mmlu_social_sciences | 82.8 | 87.0 | -4.2 |
| mmlu_sociology | 90.5 | 89.1 | +1.4 |
| mmlu_stem | 71.8 | 78.3 | -6.5 |
| mmlu_us_foreign_policy | 89.0 | 90.0 | -1.0 |
| mmlu_virology | 52.4 | 56.6 | -4.2 |
| mmlu_world_religions | 81.9 | 86.5 | -4.6 |
| truthfulqa_mc2 | 47.3 | 53.7 | -6.4 |
| winogrande | 69.8 | 73.0 | -3.2 |
## Performance
| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.8 GB | 16.7 GB | -5.1% |
| Parameters | 8,500,818,432 | — | — |
| Prefill throughput | 4,634 tok/s | 4,566 tok/s | +1% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |
KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
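The ~5% size reduction from a 10% FFN prune is consistent with simple parameter accounting: total shrinkage is roughly the FFN share of parameters times the pruned fraction. A back-of-envelope check (the FFN share below is an assumed illustrative figure, not a published number):

```python
# Back-of-envelope: total size change from FFN-only pruning.
ffn_share = 0.51   # assumed fraction of parameters in FFN layers (illustrative)
pruned = 0.10      # 10% of FFN neurons removed (keep fraction kf=0.9)

total_reduction = ffn_share * pruned
print(f"{total_reduction:.1%}")  # 5.1%
```

Attention, embedding, and norm parameters are untouched, which is also why the KV-cache footprint above is unchanged.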
## Quick Start
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads like any standard dense checkpoint — no custom code or kernels required.
model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Production",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Production")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## All Sculpt Tiers
| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |
## Technical Details
- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5)
- Keep fraction: 0.9 (10% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
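To illustrate the keep-fraction mechanism, here is a minimal NumPy sketch of importance-based FFN neuron pruning at kf=0.9. The importance criterion below (L2 norm of each neuron's up-projection row) is a common stand-in; Sculpt's actual block-selection criterion and distillation pipeline are not reproduced here.

```python
import numpy as np

def prune_ffn(w_up, w_down, keep_fraction=0.9):
    """Drop the least-important FFN neurons from one transformer block.

    w_up:   (d_ff, d_model) up-projection weights
    w_down: (d_model, d_ff) down-projection weights
    Importance = L2 norm of each neuron's up-projection row (illustrative
    stand-in for Sculpt's importance-aware selection).
    """
    d_ff = w_up.shape[0]
    n_keep = int(round(d_ff * keep_fraction))
    importance = np.linalg.norm(w_up, axis=1)
    # Indices of the n_keep most important neurons, kept in original order.
    keep = np.sort(np.argsort(importance)[-n_keep:])
    return w_up[keep, :], w_down[:, keep]

rng = np.random.default_rng(0)
w_up = rng.normal(size=(256, 64))
w_down = rng.normal(size=(64, 256))
w_up2, w_down2 = prune_ffn(w_up, w_down, keep_fraction=0.9)
print(w_up2.shape, w_down2.shape)  # (230, 64) (64, 230)
```

Because both projections are sliced along the same neuron axis, the result is still an ordinary dense FFN, which is why the output loads in any framework that reads standard safetensors.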
## Compatibility
- HuggingFace Transformers
- vLLM
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
## Citation
```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```