# Qwen3.5-9B-Sculpt-Default

5% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with the standard Hugging Face `transformers` library.

This is the Default tier of Qwen3.5-9B.

Use case: Enterprise — maximum quality preservation

## Benchmark Results (lm_eval)

| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
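
These scores come from EleutherAI's lm-evaluation-harness, as noted in the heading. A minimal sketch for re-running the headline tasks via the Python API follows; the task names, few-shot settings, and batch size here are assumptions, since the exact evaluation config is not published on this card:

```python
import lm_eval

# Evaluate the Sculpt Default checkpoint on the six headline benchmarks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dystrio/Qwen3.5-9B-Sculpt-Default,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "arc_challenge", "truthfulqa_mc2", "winogrande", "gsm8k"],
    batch_size="auto",
)
# Print the metrics reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```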

## This Model vs Baseline

| Benchmark | Default | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 56.4 | 55.6 | +0.8 |
| gsm8k | 81.5 | 87.3 | -5.8 |
| hellaswag | 75.8 | 78.1 | -2.3 |
| mmlu | 76.2 | 78.7 | -2.5 |
| mmlu_abstract_algebra | 54.0 | 66.0 | -12.0 |
| mmlu_anatomy | 75.6 | 77.8 | -2.2 |
| mmlu_astronomy | 90.1 | 92.8 | -2.7 |
| mmlu_business_ethics | 79.0 | 82.0 | -3.0 |
| mmlu_clinical_knowledge | 82.3 | 86.8 | -4.5 |
| mmlu_college_biology | 91.7 | 93.1 | -1.4 |
| mmlu_college_chemistry | 60.0 | 59.0 | +1.0 |
| mmlu_college_computer_science | 68.0 | 82.0 | -14.0 |
| mmlu_college_mathematics | 58.0 | 64.0 | -6.0 |
| mmlu_college_medicine | 76.9 | 81.5 | -4.6 |
| mmlu_college_physics | 61.8 | 64.7 | -2.9 |
| mmlu_computer_security | 85.0 | 83.0 | +2.0 |
| mmlu_conceptual_physics | 87.2 | 90.2 | -3.0 |
| mmlu_econometrics | 66.7 | 73.7 | -7.0 |
| mmlu_electrical_engineering | 77.2 | 82.1 | -4.9 |
| mmlu_elementary_mathematics | 74.6 | 80.7 | -6.1 |
| mmlu_formal_logic | 65.1 | 65.9 | -0.8 |
| mmlu_global_facts | 42.0 | 50.0 | -8.0 |
| mmlu_high_school_biology | 92.6 | 93.5 | -0.9 |
| mmlu_high_school_chemistry | 73.4 | 77.8 | -4.4 |
| mmlu_high_school_computer_science | 83.0 | 88.0 | -5.0 |
| mmlu_high_school_european_history | 83.0 | 87.3 | -4.3 |
| mmlu_high_school_geography | 90.9 | 92.4 | -1.5 |
| mmlu_high_school_government_and_politics | 94.3 | 96.9 | -2.6 |
| mmlu_high_school_macroeconomics | 80.8 | 85.9 | -5.1 |
| mmlu_high_school_mathematics | 52.6 | 53.3 | -0.7 |
| mmlu_high_school_microeconomics | 88.7 | 93.3 | -4.6 |
| mmlu_high_school_physics | 67.5 | 72.8 | -5.3 |
| mmlu_high_school_psychology | 90.3 | 93.2 | -2.9 |
| mmlu_high_school_statistics | 74.5 | 78.7 | -4.2 |
| mmlu_high_school_us_history | 88.7 | 90.2 | -1.5 |
| mmlu_high_school_world_history | 86.9 | 89.9 | -3.0 |
| mmlu_human_aging | 75.8 | 78.9 | -3.1 |
| mmlu_human_sexuality | 82.4 | 86.3 | -3.9 |
| mmlu_humanities | 69.6 | 70.5 | -0.9 |
| mmlu_international_law | 87.6 | 90.1 | -2.5 |
| mmlu_jurisprudence | 85.2 | 84.3 | +0.9 |
| mmlu_logical_fallacies | 85.3 | 84.7 | +0.6 |
| mmlu_machine_learning | 59.8 | 66.1 | -6.3 |
| mmlu_management | 85.4 | 86.4 | -1.0 |
| mmlu_marketing | 92.7 | 95.7 | -3.0 |
| mmlu_medical_genetics | 88.0 | 91.0 | -3.0 |
| mmlu_miscellaneous | 87.7 | 90.3 | -2.6 |
| mmlu_moral_disputes | 80.1 | 81.2 | -1.1 |
| mmlu_moral_scenarios | 54.0 | 53.3 | +0.7 |
| mmlu_nutrition | 83.3 | 86.3 | -3.0 |
| mmlu_other | 80.3 | 83.1 | -2.8 |
| mmlu_philosophy | 79.7 | 80.4 | -0.7 |
| mmlu_prehistory | 81.2 | 84.3 | -3.1 |
| mmlu_professional_accounting | 65.2 | 65.6 | -0.4 |
| mmlu_professional_law | 59.3 | 60.3 | -1.0 |
| mmlu_professional_medicine | 89.0 | 91.5 | -2.5 |
| mmlu_professional_psychology | 78.9 | 82.8 | -3.9 |
| mmlu_public_relations | 68.2 | 73.6 | -5.4 |
| mmlu_security_studies | 77.6 | 76.7 | +0.9 |
| mmlu_social_sciences | 83.7 | 87.0 | -3.3 |
| mmlu_sociology | 88.1 | 89.1 | -1.0 |
| mmlu_stem | 74.5 | 78.3 | -3.8 |
| mmlu_us_foreign_policy | 87.0 | 90.0 | -3.0 |
| mmlu_virology | 56.0 | 56.6 | -0.6 |
| mmlu_world_religions | 87.7 | 86.5 | +1.2 |
| truthfulqa_mc2 | 52.6 | 53.7 | -1.1 |
| winogrande | 68.7 | 73.0 | -4.3 |

## Performance

| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 16.3 GB | 16.7 GB | -2.2% |
| Parameters | 8,752,476,672 | | |
| Prefill throughput | 4,618 tok/s | 4,566 tok/s | +1% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |

KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
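
To see where the saved parameters come from: in a SwiGLU-style FFN block (gate, up, and down projections), parameter count scales linearly with the intermediate width, so keeping 95% of FFN neurons removes 5% of FFN weight. A back-of-the-envelope sketch, where the `hidden`, `intermediate`, and layer-count values are illustrative placeholders rather than the actual Qwen3.5-9B config:

```python
def ffn_params(hidden_size: int, intermediate_size: int) -> int:
    # SwiGLU FFN: gate_proj + up_proj (hidden -> intermediate), down_proj (intermediate -> hidden).
    return 3 * hidden_size * intermediate_size

hidden, intermediate, layers = 4096, 14336, 40  # placeholder values only
kf = 0.95  # Sculpt Default keep fraction

full = layers * ffn_params(hidden, intermediate)
pruned = layers * ffn_params(hidden, int(intermediate * kf))
print(f"FFN params: {full / 1e9:.2f}B -> {pruned / 1e9:.2f}B ({1 - pruned / full:.1%} removed)")
```

Attention weights are untouched, which is also why the end-to-end size reduction (2.2%) is smaller than the 5% FFN cut: the FFN is only part of the total parameter budget.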

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads like any dense Hugging Face checkpoint (no custom code required).
model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Default",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Default")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## All Sculpt Tiers

| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | `dystrio/Qwen3.5-9B-Sculpt-Default` | kf=0.95 | Enterprise — maximum quality preservation |
| Production | `dystrio/Qwen3.5-9B-Sculpt-Production` | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | `dystrio/Qwen3.5-9B-Sculpt-Throughput` | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | `dystrio/Qwen3.5-9B-Sculpt-Experimental` | kf=0.82 | Local — maximum compression (1.27x prefill) |

## Technical Details

- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5; see the loss sketch after this list)
- Keep fraction: 0.95 (5% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
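
The card does not spell out the exact repair objective. A common formulation of live-teacher distillation with an alpha mixing weight looks like the sketch below; the function name and the temperature `T` are illustrative assumptions, not the published training code:

```python
import torch.nn.functional as F

def repair_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0):
    # Hard-label cross-entropy on the training mixture.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    # Soft-label KL divergence against the live (uncompressed) teacher.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (T * T)
    # alpha=0.5 weights the distillation and hard-label terms equally.
    return alpha * kl + (1.0 - alpha) * ce
```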

## Compatibility

- HuggingFace Transformers
- vLLM (minimal serving example below)
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
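
Because the output is a standard dense checkpoint, serving it with vLLM requires no plugins or custom model code. A minimal sketch, with illustrative sampling settings:

```python
from vllm import LLM, SamplingParams

# Load the Sculpt checkpoint exactly as any other dense HF model.
llm = LLM(model="dystrio/Qwen3.5-9B-Sculpt-Default", dtype="bfloat16")
sampling = SamplingParams(temperature=0.7, max_tokens=100)

outputs = llm.generate(["The future of AI inference is"], sampling)
print(outputs[0].outputs[0].text)
```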

## Citation

```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```