MiniCPM-o 4.5 – Sculpt Aggressive (keep_frac=0.7)

30% of MLP intermediate neurons removed (the most aggressive Sculpt tier)

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned β€” vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.

Quality (Downstream Probe, 250 questions)

| Metric            | Baseline | This Model | Retention |
|-------------------|----------|------------|-----------|
| Weighted Accuracy | 0.6756   | 0.5241     | 77.6%     |
| MMLU              | 0.6700   | 0.5500     | 82.1%     |
| HellaSwag         | 0.7625   | 0.5250     | 68.9%     |
| ARC-Challenge     | 0.6000   | 0.4714     | 78.6%     |
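The retention column is simply the pruned score divided by the baseline score. A quick sanity check of the numbers above:

```python
# Reproduce the retention column: retention (%) = 100 * pruned / baseline.
baseline = {"Weighted Accuracy": 0.6756, "MMLU": 0.6700,
            "HellaSwag": 0.7625, "ARC-Challenge": 0.6000}
pruned = {"Weighted Accuracy": 0.5241, "MMLU": 0.5500,
          "HellaSwag": 0.5250, "ARC-Challenge": 0.4714}

retention = {k: round(100 * pruned[k] / baseline[k], 1) for k in baseline}
print(retention)
# {'Weighted Accuracy': 77.6, 'MMLU': 82.1, 'HellaSwag': 68.9, 'ARC-Challenge': 78.6}
```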

Compression Details

  • keep_frac: 0.7 (30% of MLP intermediate neurons removed)
  • Method: Structural pruning with live teacher distillation (alpha=0.5)
  • Repair: Full repair pass with workload-matched training data
  • Architecture: All multimodal modules preserved; only LLM MLP layers compressed
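To illustrate what keep_frac=0.7 means mechanically, here is a minimal NumPy sketch of width pruning on a single MLP layer: the lowest-importance 30% of intermediate neurons are dropped by removing the matching rows of the up-projection and columns of the down-projection. The L2-norm importance score is a common heuristic chosen here for illustration; the actual Sculpt criterion and the distillation loop (alpha=0.5) are not documented in this card.

```python
import numpy as np

def prune_mlp(up_w, down_w, keep_frac=0.7):
    """Structurally prune an MLP's intermediate dimension.

    up_w:   (intermediate, hidden) up-projection weight
    down_w: (hidden, intermediate) down-projection weight
    Keeps the top keep_frac of intermediate neurons, scored here by the
    L2 norm of each neuron's up-projection row (illustrative heuristic).
    In a gated MLP (e.g. Qwen-style), gate_proj rows are pruned with the
    same index set so the shapes stay consistent.
    """
    n_keep = int(up_w.shape[0] * keep_frac)
    importance = np.linalg.norm(up_w, axis=1)
    keep = np.sort(np.argsort(importance)[-n_keep:])  # indices of kept neurons
    return up_w[keep], down_w[:, keep]
```

After pruning, a repair/distillation pass retrains the smaller network against the original teacher to recover accuracy.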

Intended Use

Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:

  • LoRA fine-tuning on memory-constrained GPUs
  • File description and indexing workloads
  • Multimodal inference with lower VRAM requirements
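For a rough sense of the memory savings: BF16 stores two bytes per parameter, so weight memory scales linearly with parameter count (activations and KV cache come on top). A back-of-envelope estimate:

```python
def weight_gib(n_params_billions, bytes_per_param=2):
    """Approximate weight memory in GiB; BF16 = 2 bytes per parameter.
    Excludes activations, KV cache, and framework overhead."""
    return n_params_billions * 1e9 * bytes_per_param / 2**30

print(round(weight_gib(8), 1))  # -> 14.9 (full 8B backbone in BF16)
```

Pruning 30% of the MLP width reduces this in proportion to the MLP layers' share of total parameters.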

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Aggressive",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Aggressive",
    trust_remote_code=True,
)