# MiniCPM-o 4.5 – Sculpt Production (keep_frac=0.9)

10% compression – the best quality/size tradeoff.

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned; the vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.
## Quality (Downstream Probe – 250 questions)
| Metric | Baseline | This Model | Retention |
|---|---|---|---|
| Weighted Accuracy | 0.6756 | 0.6374 | 94.3% |
| MMLU | 0.6700 | 0.6400 | 95.5% |
| HellaSwag | 0.7625 | 0.7125 | 93.4% |
| ARC-Challenge | 0.6000 | 0.5571 | 92.9% |
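The retention column is simply the pruned-model score divided by the baseline score. A quick check, with the values copied from the table above:

```python
# Retention = pruned score / baseline score, per metric from the table.
scores = {
    "Weighted Accuracy": (0.6756, 0.6374),
    "MMLU": (0.6700, 0.6400),
    "HellaSwag": (0.7625, 0.7125),
    "ARC-Challenge": (0.6000, 0.5571),
}
for name, (baseline, pruned) in scores.items():
    retention = pruned / baseline * 100
    print(f"{name}: {retention:.1f}%")
```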
## Compression Details
- keep_frac: 0.9 (10% of MLP intermediate neurons removed)
- Method: Structural pruning with live teacher distillation (alpha=0.5)
- Repair: Full repair pass with workload-matched training data
- Architecture: All multimodal modules preserved; only LLM MLP layers compressed
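Sculpt's exact algorithm is not published here, but the core idea of structural MLP pruning can be sketched: score each intermediate neuron, keep the top `keep_frac` fraction, and slice the projection matrices so the pruned dimension disappears entirely (rather than just being zeroed). A minimal NumPy sketch; the L2-norm scoring rule and toy layer sizes are illustrative assumptions, not Dystrio's implementation:

```python
import numpy as np

def prune_mlp(w_up, w_down, keep_frac=0.9):
    """Structurally prune an MLP's intermediate neurons.

    w_up:   (intermediate, hidden)  up-projection weights
    w_down: (hidden, intermediate)  down-projection weights
    Neurons are scored by the L2 norm of their up-projection row
    (an illustrative importance heuristic, not Sculpt's actual one).
    """
    n_keep = int(w_up.shape[0] * keep_frac)
    scores = np.linalg.norm(w_up, axis=1)         # one score per neuron
    keep = np.sort(np.argsort(scores)[-n_keep:])  # top-k, original order
    return w_up[keep, :], w_down[:, keep]

rng = np.random.default_rng(0)
hidden, inter = 64, 256  # toy sizes for illustration
w_up = rng.normal(size=(inter, hidden))
w_down = rng.normal(size=(hidden, inter))
p_up, p_down = prune_mlp(w_up, w_down, keep_frac=0.9)
print(p_up.shape, p_down.shape)  # (230, 64) (64, 230)
```

During the repair pass, live distillation with `alpha=0.5` would weight the student's task loss and a distillation term against the unpruned teacher equally, though the specific loss formulation used by Sculpt is not documented here.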
## Intended Use
Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:
- LoRA fine-tuning on memory-constrained GPUs
- File description and indexing workloads
- Multimodal inference with lower VRAM requirements
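For a rough sense of the memory saved, a back-of-the-envelope estimate follows. The layer dimensions below (36 layers, hidden size 4096, intermediate size 12288, three projection matrices per MLP) are assumed Qwen3-8B-like values for illustration; only the MLP projections shrink, so total savings are smaller than 10% of the full model:

```python
# Back-of-the-envelope VRAM savings from removing 10% of MLP neurons.
# Layer dimensions are assumed (Qwen3-8B-like), for illustration only.
layers, hidden, intermediate = 36, 4096, 12288
bytes_per_param = 2  # bf16

mlp_params = layers * 3 * hidden * intermediate  # gate/up/down projections
removed = mlp_params * 0.10                      # keep_frac = 0.9
print(f"MLP params: {mlp_params / 1e9:.2f}B, "
      f"saved ~{removed * bytes_per_param / 1e9:.2f} GB in bf16")
```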
## How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Production",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Production",
    trust_remote_code=True,
)
```
## Base Model

openbmb/MiniCPM-o-4_5