MiniCPM-o 4.5 – Sculpt Aggressive (keep_frac=0.7)

30% of MLP intermediate neurons removed (the most aggressive Sculpt tier)

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned β€” vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.

Quality (Downstream Probe, 250 questions)

| Metric            | Baseline | This Model | Retention |
|-------------------|----------|------------|-----------|
| Weighted Accuracy | 0.6756   | 0.5241     | 77.6%     |
| MMLU              | 0.6700   | 0.5500     | 82.1%     |
| HellaSwag         | 0.7625   | 0.5250     | 68.9%     |
| ARC-Challenge     | 0.6000   | 0.4714     | 78.6%     |
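The retention column is simply the pruned score divided by the baseline score. A quick sanity check of the numbers above:

```python
# Reproduce the retention column: retention (%) = 100 * pruned / baseline.
baseline = {"Weighted Accuracy": 0.6756, "MMLU": 0.6700,
            "HellaSwag": 0.7625, "ARC-Challenge": 0.6000}
pruned = {"Weighted Accuracy": 0.5241, "MMLU": 0.5500,
          "HellaSwag": 0.5250, "ARC-Challenge": 0.4714}

retention = {k: round(100 * pruned[k] / baseline[k], 1) for k in baseline}
print(retention)
# {'Weighted Accuracy': 77.6, 'MMLU': 82.1, 'HellaSwag': 68.9, 'ARC-Challenge': 78.6}
```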

Compression Details

  • keep_frac: 0.7 (30% of MLP intermediate neurons removed)
  • Method: Structural pruning with live teacher distillation (alpha=0.5)
  • Repair: Full repair pass with workload-matched training data
  • Architecture: All multimodal modules preserved; only LLM MLP layers compressed
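To illustrate what keep_frac=0.7 means mechanically, here is a minimal NumPy sketch of width pruning on a single MLP layer: the lowest-importance 30% of intermediate neurons are dropped by removing the matching rows of the up-projection and columns of the down-projection. The L2-norm importance score is a common heuristic chosen here for illustration; the actual Sculpt criterion and the distillation loop (alpha=0.5) are not documented in this card.

```python
import numpy as np

def prune_mlp(up_w, down_w, keep_frac=0.7):
    """Structurally prune an MLP's intermediate dimension.

    up_w:   (intermediate, hidden) up-projection weight
    down_w: (hidden, intermediate) down-projection weight
    Keeps the top keep_frac of intermediate neurons, scored here by the
    L2 norm of each neuron's up-projection row (illustrative heuristic).
    In a gated MLP (e.g. Qwen-style), gate_proj rows are pruned with the
    same index set so the shapes stay consistent.
    """
    n_keep = int(up_w.shape[0] * keep_frac)
    importance = np.linalg.norm(up_w, axis=1)
    keep = np.sort(np.argsort(importance)[-n_keep:])  # indices of kept neurons
    return up_w[keep], down_w[:, keep]
```

After pruning, a repair/distillation pass retrains the smaller network against the original teacher to recover accuracy.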

Intended Use

Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:

  • LoRA fine-tuning on memory-constrained GPUs
  • File description and indexing workloads
  • Multimodal inference with lower VRAM requirements
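For a rough sense of the memory savings: BF16 stores two bytes per parameter, so weight memory scales linearly with parameter count (activations and KV cache come on top). A back-of-envelope estimate:

```python
def weight_gib(n_params_billions, bytes_per_param=2):
    """Approximate weight memory in GiB; BF16 = 2 bytes per parameter.
    Excludes activations, KV cache, and framework overhead."""
    return n_params_billions * 1e9 * bytes_per_param / 2**30

print(round(weight_gib(8), 1))  # -> 14.9 (full 8B backbone in BF16)
```

Pruning 30% of the MLP width reduces this in proportion to the MLP layers' share of total parameters.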

How to Use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Aggressive",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Aggressive",
    trust_remote_code=True,
)