Qwen3.5-9B-Sculpt-Production

10% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with standard transformers.

This is the Production tier of Qwen3.5-9B.

Use case: Enterprise — balanced quality and efficiency

Benchmark Results (lm_eval)

| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |

This Model vs Baseline

| Benchmark | Production | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 56.8 | 55.6 | +1.2 |
| gsm8k | 74.5 | 87.3 | -12.8 |
| hellaswag | 75.1 | 78.1 | -3.0 |
| mmlu | 73.9 | 78.7 | -4.8 |
| mmlu_abstract_algebra | 53.0 | 66.0 | -13.0 |
| mmlu_anatomy | 70.4 | 77.8 | -7.4 |
| mmlu_astronomy | 86.2 | 92.8 | -6.6 |
| mmlu_business_ethics | 74.0 | 82.0 | -8.0 |
| mmlu_clinical_knowledge | 81.9 | 86.8 | -4.9 |
| mmlu_college_biology | 91.0 | 93.1 | -2.1 |
| mmlu_college_chemistry | 55.0 | 59.0 | -4.0 |
| mmlu_college_computer_science | 74.0 | 82.0 | -8.0 |
| mmlu_college_mathematics | 54.0 | 64.0 | -10.0 |
| mmlu_college_medicine | 78.0 | 81.5 | -3.5 |
| mmlu_college_physics | 62.7 | 64.7 | -2.0 |
| mmlu_computer_security | 77.0 | 83.0 | -6.0 |
| mmlu_conceptual_physics | 81.7 | 90.2 | -8.5 |
| mmlu_econometrics | 61.4 | 73.7 | -12.3 |
| mmlu_electrical_engineering | 73.1 | 82.1 | -9.0 |
| mmlu_elementary_mathematics | 71.7 | 80.7 | -9.0 |
| mmlu_formal_logic | 56.3 | 65.9 | -9.6 |
| mmlu_global_facts | 46.0 | 50.0 | -4.0 |
| mmlu_high_school_biology | 91.3 | 93.5 | -2.2 |
| mmlu_high_school_chemistry | 74.4 | 77.8 | -3.4 |
| mmlu_high_school_computer_science | 79.0 | 88.0 | -9.0 |
| mmlu_high_school_european_history | 85.5 | 87.3 | -1.8 |
| mmlu_high_school_geography | 88.9 | 92.4 | -3.5 |
| mmlu_high_school_government_and_politics | 88.6 | 96.9 | -8.3 |
| mmlu_high_school_macroeconomics | 80.8 | 85.9 | -5.1 |
| mmlu_high_school_mathematics | 47.4 | 53.3 | -5.9 |
| mmlu_high_school_microeconomics | 85.3 | 93.3 | -8.0 |
| mmlu_high_school_physics | 64.2 | 72.8 | -8.6 |
| mmlu_high_school_psychology | 91.4 | 93.2 | -1.8 |
| mmlu_high_school_statistics | 74.5 | 78.7 | -4.2 |
| mmlu_high_school_us_history | 83.3 | 90.2 | -6.9 |
| mmlu_high_school_world_history | 83.1 | 89.9 | -6.8 |
| mmlu_human_aging | 73.5 | 78.9 | -5.4 |
| mmlu_human_sexuality | 80.2 | 86.3 | -6.1 |
| mmlu_humanities | 66.4 | 70.5 | -4.1 |
| mmlu_international_law | 84.3 | 90.1 | -5.8 |
| mmlu_jurisprudence | 80.6 | 84.3 | -3.7 |
| mmlu_logical_fallacies | 84.7 | 84.7 | +0.0 |
| mmlu_machine_learning | 56.2 | 66.1 | -9.9 |
| mmlu_management | 87.4 | 86.4 | +1.0 |
| mmlu_marketing | 91.9 | 95.7 | -3.8 |
| mmlu_medical_genetics | 84.0 | 91.0 | -7.0 |
| mmlu_miscellaneous | 86.2 | 90.3 | -4.1 |
| mmlu_moral_disputes | 74.6 | 81.2 | -6.6 |
| mmlu_moral_scenarios | 52.5 | 53.3 | -0.8 |
| mmlu_nutrition | 78.4 | 86.3 | -7.9 |
| mmlu_other | 78.5 | 83.1 | -4.6 |
| mmlu_philosophy | 76.5 | 80.4 | -3.9 |
| mmlu_prehistory | 79.0 | 84.3 | -5.3 |
| mmlu_professional_accounting | 62.8 | 65.6 | -2.8 |
| mmlu_professional_law | 55.7 | 60.3 | -4.6 |
| mmlu_professional_medicine | 86.0 | 91.5 | -5.5 |
| mmlu_professional_psychology | 77.3 | 82.8 | -5.5 |
| mmlu_public_relations | 65.5 | 73.6 | -8.1 |
| mmlu_security_studies | 79.2 | 76.7 | +2.5 |
| mmlu_social_sciences | 82.8 | 87.0 | -4.2 |
| mmlu_sociology | 90.5 | 89.1 | +1.4 |
| mmlu_stem | 71.8 | 78.3 | -6.5 |
| mmlu_us_foreign_policy | 89.0 | 90.0 | -1.0 |
| mmlu_virology | 52.4 | 56.6 | -4.2 |
| mmlu_world_religions | 81.9 | 86.5 | -4.6 |
| truthfulqa_mc2 | 47.3 | 53.7 | -6.4 |
| winogrande | 69.8 | 73.0 | -3.2 |

Performance

| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 15.8 GB | 16.7 GB | -5.1% |
| Parameters | 8,500,818,432 | — | — |
| Prefill throughput | 4,634 tok/s | 4,566 tok/s | +1% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |

KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
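As a rough sanity check on the table above, the overall size reduction from FFN-only pruning is just the FFN parameter share times the removed fraction. A minimal sketch (the 0.5 FFN share is an illustrative assumption, not the actual Qwen3.5-9B breakdown):

```python
def overall_reduction(ffn_param_share: float, keep_fraction: float) -> float:
    """Fraction of total parameters removed when only FFN layers shrink."""
    return ffn_param_share * (1.0 - keep_fraction)

# If roughly half of all parameters sit in FFN blocks (assumed here),
# a kf=0.9 prune removes about 5% of the model overall:
print(f"{overall_reduction(0.5, 0.9):.1%}")  # 5.0%
```

This lines up with the observed -5.1% model-size change for the Production tier.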

Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Production",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Production")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

All Sculpt Tiers

| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | dystrio/Qwen3.5-9B-Sculpt-Default | kf=0.95 | Enterprise — maximum quality preservation |
| Production | dystrio/Qwen3.5-9B-Sculpt-Production | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | dystrio/Qwen3.5-9B-Sculpt-Throughput | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | dystrio/Qwen3.5-9B-Sculpt-Experimental | kf=0.82 | Local — maximum compression (1.27x prefill) |

Technical Details

  • Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5)
  • Keep fraction: 0.9 (10% of FFN neurons removed)
  • Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
  • Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
  • Hardware: 1x NVIDIA H200 141GB
  • Output: Standard dense transformer — loads with any HuggingFace-compatible framework
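The two ingredients named above (importance-aware neuron selection and a distillation-mixed loss) can be sketched in a few lines. This is a toy NumPy illustration under stated assumptions: the L2-norm importance score and the exact loss mix are plausible choices, not Sculpt's published internals.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy SwiGLU-style FFN: gate/up project hidden -> intermediate,
# down projects intermediate -> hidden. Structural pruning removes
# whole intermediate neurons, i.e. matched rows/columns in all three.
hidden, inter = 8, 16
w_gate = rng.normal(size=(inter, hidden))
w_up = rng.normal(size=(inter, hidden))
w_down = rng.normal(size=(hidden, inter))

def prune_ffn(w_gate, w_up, w_down, keep_fraction=0.9):
    """Keep the top-k intermediate neurons by a simple importance score
    (product of each neuron's input/output weight norms). Illustrative
    heuristic, not necessarily the scoring Sculpt uses."""
    importance = (np.linalg.norm(w_gate, axis=1)
                  * np.linalg.norm(w_down, axis=0))
    k = int(round(keep_fraction * w_gate.shape[0]))
    keep = np.sort(np.argsort(importance)[-k:])
    return w_gate[keep], w_up[keep], w_down[:, keep]

g, u, d = prune_ffn(w_gate, w_up, w_down, keep_fraction=0.9)
print(g.shape, d.shape)  # (14, 8) (8, 14): 16 -> 14 neurons at kf=0.9

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """alpha * KL(teacher || student) + (1 - alpha) * cross-entropy,
    the usual way a live-teacher distillation term is blended in."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    ls, lt = log_softmax(student_logits), log_softmax(teacher_logits)
    kl = (np.exp(lt) * (lt - ls)).sum(axis=-1).mean()
    ce = -ls[np.arange(len(labels)), labels].mean()
    return alpha * kl + (1 - alpha) * ce
```

The pruned matrices remain dense, which is why the output loads as a standard transformer with no custom kernels.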

Compatibility

  • HuggingFace Transformers
  • vLLM
  • TGI (Text Generation Inference)
  • llama.cpp / GGUF conversion
  • AWQ / GPTQ quantization
  • Any framework that loads standard safetensors

Citation

```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```