# Qwen3.5-9B-Sculpt-Default

5% FFN compression with live teacher distillation. Drop-in replacement — no custom kernels, no runtime changes.

Dystrio Sculpt structurally compresses transformer FFN layers, producing dense models that load with the standard Hugging Face `transformers` library.

This is the Default tier of Qwen3.5-9B.

Use case: Enterprise — maximum quality preservation

## Benchmark Results (lm_eval)

| Model | MMLU | HellaSwag | ARC-C | TruthfulQA | Winogrande | GSM8K |
|---|---|---|---|---|---|---|
| Qwen3.5-9B (baseline) | 78.7 | 78.1 | 55.6 | 53.7 | 73.0 | 87.3 |
| Sculpt Default (kf=0.95) | 76.2 (↓2.5) | 75.8 (↓2.3) | 56.4 (↑0.8) | 52.6 (↓1.1) | 68.7 (↓4.3) | 81.5 (↓5.8) |
| Sculpt Production (kf=0.9) | 73.9 (↓4.8) | 75.1 (↓3.0) | 56.8 (↑1.2) | 47.3 (↓6.4) | 69.8 (↓3.2) | 74.5 (↓12.8) |
| Sculpt Throughput (kf=0.88) | 70.8 (↓7.9) | 74.0 (↓4.1) | 57.2 (↑1.6) | 52.0 (↓1.7) | 70.7 (↓2.3) | 69.6 (↓17.7) |
| Sculpt Experimental (kf=0.82) | 70.2 (↓8.5) | 70.7 (↓7.4) | 53.6 (↓2.0) | 47.6 (↓6.1) | 66.6 (↓6.4) | 54.7 (↓32.6) |
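
These scores come from EleutherAI's lm-evaluation-harness, as noted in the heading. A minimal sketch for re-running the headline tasks via the Python API follows; the task names, few-shot settings, and batch size here are assumptions, since the exact evaluation config is not published on this card:

```python
import lm_eval

# Evaluate the Sculpt Default checkpoint on the six headline benchmarks.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dystrio/Qwen3.5-9B-Sculpt-Default,dtype=bfloat16",
    tasks=["mmlu", "hellaswag", "arc_challenge", "truthfulqa_mc2", "winogrande", "gsm8k"],
    batch_size="auto",
)
# Print the metrics reported for each task.
for task, metrics in results["results"].items():
    print(task, metrics)
```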

## This Model vs Baseline

| Benchmark | Default | Baseline | Delta |
|---|---|---|---|
| arc_challenge | 56.4 | 55.6 | +0.8 |
| gsm8k | 81.5 | 87.3 | -5.8 |
| hellaswag | 75.8 | 78.1 | -2.3 |
| mmlu | 76.2 | 78.7 | -2.5 |
| mmlu_abstract_algebra | 54.0 | 66.0 | -12.0 |
| mmlu_anatomy | 75.6 | 77.8 | -2.2 |
| mmlu_astronomy | 90.1 | 92.8 | -2.7 |
| mmlu_business_ethics | 79.0 | 82.0 | -3.0 |
| mmlu_clinical_knowledge | 82.3 | 86.8 | -4.5 |
| mmlu_college_biology | 91.7 | 93.1 | -1.4 |
| mmlu_college_chemistry | 60.0 | 59.0 | +1.0 |
| mmlu_college_computer_science | 68.0 | 82.0 | -14.0 |
| mmlu_college_mathematics | 58.0 | 64.0 | -6.0 |
| mmlu_college_medicine | 76.9 | 81.5 | -4.6 |
| mmlu_college_physics | 61.8 | 64.7 | -2.9 |
| mmlu_computer_security | 85.0 | 83.0 | +2.0 |
| mmlu_conceptual_physics | 87.2 | 90.2 | -3.0 |
| mmlu_econometrics | 66.7 | 73.7 | -7.0 |
| mmlu_electrical_engineering | 77.2 | 82.1 | -4.9 |
| mmlu_elementary_mathematics | 74.6 | 80.7 | -6.1 |
| mmlu_formal_logic | 65.1 | 65.9 | -0.8 |
| mmlu_global_facts | 42.0 | 50.0 | -8.0 |
| mmlu_high_school_biology | 92.6 | 93.5 | -0.9 |
| mmlu_high_school_chemistry | 73.4 | 77.8 | -4.4 |
| mmlu_high_school_computer_science | 83.0 | 88.0 | -5.0 |
| mmlu_high_school_european_history | 83.0 | 87.3 | -4.3 |
| mmlu_high_school_geography | 90.9 | 92.4 | -1.5 |
| mmlu_high_school_government_and_politics | 94.3 | 96.9 | -2.6 |
| mmlu_high_school_macroeconomics | 80.8 | 85.9 | -5.1 |
| mmlu_high_school_mathematics | 52.6 | 53.3 | -0.7 |
| mmlu_high_school_microeconomics | 88.7 | 93.3 | -4.6 |
| mmlu_high_school_physics | 67.5 | 72.8 | -5.3 |
| mmlu_high_school_psychology | 90.3 | 93.2 | -2.9 |
| mmlu_high_school_statistics | 74.5 | 78.7 | -4.2 |
| mmlu_high_school_us_history | 88.7 | 90.2 | -1.5 |
| mmlu_high_school_world_history | 86.9 | 89.9 | -3.0 |
| mmlu_human_aging | 75.8 | 78.9 | -3.1 |
| mmlu_human_sexuality | 82.4 | 86.3 | -3.9 |
| mmlu_humanities | 69.6 | 70.5 | -0.9 |
| mmlu_international_law | 87.6 | 90.1 | -2.5 |
| mmlu_jurisprudence | 85.2 | 84.3 | +0.9 |
| mmlu_logical_fallacies | 85.3 | 84.7 | +0.6 |
| mmlu_machine_learning | 59.8 | 66.1 | -6.3 |
| mmlu_management | 85.4 | 86.4 | -1.0 |
| mmlu_marketing | 92.7 | 95.7 | -3.0 |
| mmlu_medical_genetics | 88.0 | 91.0 | -3.0 |
| mmlu_miscellaneous | 87.7 | 90.3 | -2.6 |
| mmlu_moral_disputes | 80.1 | 81.2 | -1.1 |
| mmlu_moral_scenarios | 54.0 | 53.3 | +0.7 |
| mmlu_nutrition | 83.3 | 86.3 | -3.0 |
| mmlu_other | 80.3 | 83.1 | -2.8 |
| mmlu_philosophy | 79.7 | 80.4 | -0.7 |
| mmlu_prehistory | 81.2 | 84.3 | -3.1 |
| mmlu_professional_accounting | 65.2 | 65.6 | -0.4 |
| mmlu_professional_law | 59.3 | 60.3 | -1.0 |
| mmlu_professional_medicine | 89.0 | 91.5 | -2.5 |
| mmlu_professional_psychology | 78.9 | 82.8 | -3.9 |
| mmlu_public_relations | 68.2 | 73.6 | -5.4 |
| mmlu_security_studies | 77.6 | 76.7 | +0.9 |
| mmlu_social_sciences | 83.7 | 87.0 | -3.3 |
| mmlu_sociology | 88.1 | 89.1 | -1.0 |
| mmlu_stem | 74.5 | 78.3 | -3.8 |
| mmlu_us_foreign_policy | 87.0 | 90.0 | -3.0 |
| mmlu_virology | 56.0 | 56.6 | -0.6 |
| mmlu_world_religions | 87.7 | 86.5 | +1.2 |
| truthfulqa_mc2 | 52.6 | 53.7 | -1.1 |
| winogrande | 68.7 | 73.0 | -4.3 |

## Performance

| Metric | Sculpt | Baseline | Change |
|---|---|---|---|
| Model size | 16.3 GB | 16.7 GB | -2.2% |
| Parameters | 8,752,476,672 | | |
| Prefill throughput | 4,618 tok/s | 4,566 tok/s | +1% |
| Decode throughput | 36 tok/s | 37 tok/s | -4% |

KV-cache footprint is unchanged — Sculpt only compresses FFN layers, not attention.
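
To see where the saved parameters come from: in a SwiGLU-style FFN block (gate, up, and down projections), parameter count scales linearly with the intermediate width, so keeping 95% of FFN neurons removes 5% of FFN weight. A back-of-the-envelope sketch, where the `hidden`, `intermediate`, and layer-count values are illustrative placeholders rather than the actual Qwen3.5-9B config:

```python
def ffn_params(hidden_size: int, intermediate_size: int) -> int:
    # SwiGLU FFN: gate_proj + up_proj (hidden -> intermediate), down_proj (intermediate -> hidden).
    return 3 * hidden_size * intermediate_size

hidden, intermediate, layers = 4096, 14336, 40  # placeholder values only
kf = 0.95  # Sculpt Default keep fraction

full = layers * ffn_params(hidden, intermediate)
pruned = layers * ffn_params(hidden, int(intermediate * kf))
print(f"FFN params: {full / 1e9:.2f}B -> {pruned / 1e9:.2f}B ({1 - pruned / full:.1%} removed)")
```

Attention weights are untouched, which is also why the end-to-end size reduction (2.2%) is smaller than the 5% FFN cut: the FFN is only part of the total parameter budget.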

## Quick Start

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Loads like any dense Hugging Face checkpoint (no custom code required).
model = AutoModelForCausalLM.from_pretrained(
    "dystrio/Qwen3.5-9B-Sculpt-Default",
    torch_dtype="bfloat16",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dystrio/Qwen3.5-9B-Sculpt-Default")

inputs = tokenizer("The future of AI inference is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## All Sculpt Tiers

| Tier | HuggingFace | Config | Use Case |
|---|---|---|---|
| Default | `dystrio/Qwen3.5-9B-Sculpt-Default` | kf=0.95 | Enterprise — maximum quality preservation |
| Production | `dystrio/Qwen3.5-9B-Sculpt-Production` | kf=0.9 | Enterprise — balanced quality and efficiency |
| Throughput | `dystrio/Qwen3.5-9B-Sculpt-Throughput` | kf=0.88 | Local/throughput — speed sweet spot (1.25x prefill) |
| Experimental | `dystrio/Qwen3.5-9B-Sculpt-Experimental` | kf=0.82 | Local — maximum compression (1.27x prefill) |

## Technical Details

- Method: Structural FFN pruning with importance-aware block selection + live teacher distillation (alpha=0.5; see the loss sketch after this list)
- Keep fraction: 0.95 (5% of FFN neurons removed)
- Repair: 8-stage cosine-LR fine-tuning with best-checkpoint restore
- Training data: general_v2 mixture (WikiText, OpenHermes 2.5, MMLU, HellaSwag, GSM8K, OpenOrca)
- Hardware: 1x NVIDIA H200 141GB
- Output: Standard dense transformer — loads with any HuggingFace-compatible framework
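
The card does not spell out the exact repair objective. A common formulation of live-teacher distillation with an alpha mixing weight looks like the sketch below; the function name and the temperature `T` are illustrative assumptions, not the published training code:

```python
import torch.nn.functional as F

def repair_loss(student_logits, teacher_logits, labels, alpha=0.5, T=1.0):
    # Hard-label cross-entropy on the training mixture.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    # Soft-label KL divergence against the live (uncompressed) teacher.
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.log_softmax(teacher_logits / T, dim=-1),
        log_target=True,
        reduction="batchmean",
    ) * (T * T)
    # alpha=0.5 weights the distillation and hard-label terms equally.
    return alpha * kl + (1.0 - alpha) * ce
```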

## Compatibility

- HuggingFace Transformers
- vLLM (minimal serving example below)
- TGI (Text Generation Inference)
- llama.cpp / GGUF conversion
- AWQ / GPTQ quantization
- Any framework that loads standard safetensors
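
Because the output is a standard dense checkpoint, serving it with vLLM requires no plugins or custom model code. A minimal sketch, with illustrative sampling settings:

```python
from vllm import LLM, SamplingParams

# Load the Sculpt checkpoint exactly as any other dense HF model.
llm = LLM(model="dystrio/Qwen3.5-9B-Sculpt-Default", dtype="bfloat16")
sampling = SamplingParams(temperature=0.7, max_tokens=100)

outputs = llm.generate(["The future of AI inference is"], sampling)
print(outputs[0].outputs[0].text)
```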

## Citation

```bibtex
@misc{dystrio_sculpt_2026,
  title={Dystrio Sculpt: Structural Compilation for Transformer LLMs},
  author={Dystrio},
  year={2026},
  url={https://huggingface.co/dystrio}
}
```