Qwen3.5-9B-Sculpt-Production

10% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with standard transformers.

This is the Production tier of Qwen3.5-9B.

Use case: Enterprise — balanced quality and efficiency

Benchmark Results (lm_eval)

| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |

This Model vs Baseline

| Benchmark | Production | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 56.8 | 55.6 | +1.2 |
| gsm8k | 74.5 | 87.3 | -12.8 |
| hellaswag | 75.1 | 78.1 | -3.0 |
| mmlu | 73.9 | 78.7 | -4.8 |
| mmlu_abstract_algebra | 53.0 | 66.0 | -13.0 |
| mmlu_anatomy | 70.4 | 77.8 | -7.4 |
| mmlu_astronomy | 86.2 | 92.8 | -6.6 |
| mmlu_business_ethics | 74.0 | 82.0 | -8.0 |
| mmlu_clinical_knowledge | 81.9 | 86.8 | -4.9 |
| mmlu_college_biology | 91.0 | 93.1 | -2.1 |
| mmlu_college_chemistry | 55.0 | 59.0 | -4.0 |
| mmlu_college_computer_science | 74.0 | 82.0 | -8.0 |
| mmlu_college_mathematics | 54.0 | 64.0 | -10.0 |
| mmlu_college_medicine | 78.0 | 81.5 | -3.5 |
| mmlu_college_physics | 62.7 | 64.7 | -2.0 |
| mmlu_computer_security | 77.0 | 83.0 | -6.0 |
| mmlu_conceptual_physics | 81.7 | 90.2 | -8.5 |
| mmlu_econometrics | 61.4 | 73.7 | -12.3 |
| mmlu_electrical_engineering | 73.1 | 82.1 | -9.0 |
| mmlu_elementary_mathematics | 71.7 | 80.7 | -9.0 |
| mmlu_formal_logic | 56.3 | 65.9 | -9.6 |
| mmlu_global_facts | 46.0 | 50.0 | -4.0 |
| mmlu_high_school_biology | 91.3 | 93.5 | -2.2 |
| mmlu_high_school_chemistry | 74.4 | 77.8 | -3.4 |
| mmlu_high_school_computer_science | 79.0 | 88.0 | -9.0 |
| mmlu_high_school_european_history | 85.5 | 87.3 | -1.8 |
| mmlu_high_school_geography | 88.9 | 92.4 | -3.5 |
| mmlu_high_school_government_and_politics | 88.6 | 96.9 | -8.3 |
| mmlu_high_school_macroeconomics | 80.8 | 85.9 | -5.1 |
| mmlu_high_school_mathematics | 47.4 | 53.3 | -5.9 |
| mmlu_high_school_microeconomics | 85.3 | 93.3 | -8.0 |
| mmlu_high_school_physics | 64.2 | 72.8 | -8.6 |
| mmlu_high_school_psychology | 91.4 | 93.2 | -1.8 |
| mmlu_high_school_statistics | 74.5 | 78.7 | -4.2 |
| mmlu_high_school_us_history | 83.3 | 90.2 | -6.9 |
| mmlu_high_school_world_history | 83.1 | 89.9 | -6.8 |
| mmlu_human_aging | 73.5 | 78.9 | -5.4 |
| mmlu_human_sexuality | 80.2 | 86.3 | -6.1 |
| mmlu_humanities | 66.4 | 70.5 | -4.1 |
| mmlu_international_law | 84.3 | 90.1 | -5.8 |
| mmlu_jurisprudence | 80.6 | 84.3 | -3.7 |
| mmlu_logical_fallacies | 84.7 | 84.7 | +0.0 |
| mmlu_machine_learning | 56.2 | 66.1 | -9.9 |
| mmlu_management | 87.4 | 86.4 | +1.0 |
| mmlu_marketing | 91.9 | 95.7 | -3.8 |
| mmlu_medical_genetics | 84.0 | 91.0 | -7.0 |
| mmlu_miscellaneous | 86.2 | 90.3 | -4.1 |
| mmlu_moral_disputes | 74.6 | 81.2 | -6.6 |
| mmlu_moral_scenarios | 52.5 | 53.3 | -0.8 |
| mmlu_nutrition | 78.4 | 86.3 | -7.9 |
| mmlu_other | 78.5 | 83.1 | -4.6 |
| mmlu_philosophy | 76.5 | 80.4 | -3.9 |
| mmlu_prehistory | 79.0 | 84.3 | -5.3 |
| mmlu_professional_accounting | 62.8 | 65.6 | -2.8 |
| mmlu_professional_law | 55.7 | 60.3 | -4.6 |
| mmlu_professional_medicine | 86.0 | 91.5 | -5.5 |
| mmlu_professional_psychology | 77.3 | 82.8 | -5.5 |
| mmlu_public_relations | 65.5 | 73.6 | -8.1 |
| mmlu_security_studies | 79.2 | 76.7 | +2.5 |
| mmlu_social_sciences | 82.8 | 87.0 | -4.2 |
| mmlu_sociology | 90.5 | 89.1 | +1.4 |
| mmlu_stem | 71.8 | 78.3 | -6.5 |
| mmlu_us_foreign_policy | 89.0 | 90.0 | -1.0 |
| mmlu_virology | 52.4 | 56.6 | -4.2 |
| mmlu_world_religions | 81.9 | 86.5 | -4.6 |
| truthfulqa_mc2 | 47.3 | 53.7 | -6.4 |
| winogrande | 69.8 | 73.0 | -3.2 |

Performance

| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.8 GB | 16.7 GB | -5.1% |
| Parameters | 8,500,818,432 | — | — |
| Prefill throughput | 4,634 tok/s | 4,566 tok/s | +1% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |

KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
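As a rough sanity check on the table above, the overall size reduction from FFN-only pruning is just the FFN parameter share times the removed fraction. A minimal sketch (the 0.5 FFN share is an illustrative assumption, not the actual Qwen3.5-9B breakdown):

```python
def overall_reduction(ffn_param_share: float, keep_fraction: float) -> float:
    """Fraction of total parameters removed when only FFN layers shrink."""
    return ffn_param_share * (1.0 - keep_fraction)

# If roughly half of all parameters sit in FFN blocks (assumed here),
# a kf=0.9 prune removes about 5% of the model overall:
print(f"{overall_reduction(0.5, 0.9):.1%}")  # 5.0%
```

This lines up with the observed -5.1% model-size change for the Production tier.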

Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Production",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Production")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

All Sculpt Tiers

| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |

Technical Details

  • Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5)
  • Keep fraction: 0.9 (10% of FFN neurons removed)
  • Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
  • Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
  • Hardware: 1x NVIDIA H200 141GB
  • Output: Standard dense transformer — loads with any HuggingFace-compatible framework
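The two ingredients named above (importance-aware neuron selection and a distillation-mixed loss) can be sketched in a few lines. This is a toy NumPy illustration under stated assumptions: the L2-norm importance score and the exact loss mix are plausible choices, not Sculpt's published internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SwiGLU-style FFN: gate/up project hidden -> intermediate,
# down projects intermediate -> hidden. Structural pruning removes
# whole intermediate neurons, i.e. matched rows/columns in all three.
hidden, inter = 8, 16
w_gate = rng.normal(size=(inter, hidden))
w_up = rng.normal(size=(inter, hidden))
w_down = rng.normal(size=(hidden, inter))

def prune_ffn(w_gate, w_up, w_down, keep_fraction=0.9):
    """Keep the top-k intermediate neurons by a simple importance score
    (product of each neuron's input/output weight norms). Illustrative
    heuristic, not necessarily the scoring Sculpt uses."""
    importance = (np.linalg.norm(w_gate, axis=1)
                  * np.linalg.norm(w_down, axis=0))
    k = int(round(keep_fraction * w_gate.shape[0]))
    keep = np.sort(np.argsort(importance)[-k:])
    return w_gate[keep], w_up[keep], w_down[:, keep]

g, u, d = prune_ffn(w_gate, w_up, w_down, keep_fraction=0.9)
print(g.shape, d.shape)  # (14, 8) (8, 14): 16 -> 14 neurons at kf=0.9

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """alpha * KL(teacher || student) + (1 - alpha) * cross-entropy,
    the usual way a live-teacher distillation term is blended in."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    ls, lt = log_softmax(student_logits), log_softmax(teacher_logits)
    kl = (np.exp(lt) * (lt - ls)).sum(axis=-1).mean()
    ce = -ls[np.arange(len(labels)), labels].mean()
    return alpha * kl + (1 - alpha) * ce
```

The pruned matrices remain dense, which is why the output loads as a standard transformer with no custom kernels.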

Compatibility

  • HuggingFace Transformers
  • vLLM
  • TGI (Text Generation Inference)
  • llama.cpp / GGUF conversion
  • AWQ / GPTQ quantization
  • Any framework that loads standard safetensors

Citation

```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```