# MiniCPM-o 4.5 – Sculpt Production (keep_frac=0.9)

10% compression – the best quality/size tradeoff.

Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned; the vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.
## Quality (Downstream Probe – 250 questions)
| Metric | Baseline | This Model | Retention |
|---|---|---|---|
| Weighted Accuracy | 0.6756 | 0.6374 | 94.3% |
| MMLU | 0.6700 | 0.6400 | 95.5% |
| HellaSwag | 0.7625 | 0.7125 | 93.4% |
| ARC-Challenge | 0.6000 | 0.5571 | 92.9% |
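The retention column is simply the pruned-model score divided by the baseline score. A quick check, with the values copied from the table above:

```python
# Retention = pruned score / baseline score, per metric from the table.
scores = {
    "Weighted Accuracy": (0.6756, 0.6374),
    "MMLU": (0.6700, 0.6400),
    "HellaSwag": (0.7625, 0.7125),
    "ARC-Challenge": (0.6000, 0.5571),
}
for name, (baseline, pruned) in scores.items():
    retention = pruned / baseline * 100
    print(f"{name}: {retention:.1f}%")
```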
## Compression Details
- keep_frac: 0.9 (10% of MLP intermediate neurons removed)
- Method: Structural pruning with live teacher distillation (alpha=0.5)
- Repair: Full repair pass with workload-matched training data
- Architecture: All multimodal modules preserved; only LLM MLP layers compressed
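Sculpt's exact algorithm is not published here, but the core idea of structural MLP pruning can be sketched: score each intermediate neuron, keep the top `keep_frac` fraction, and slice the projection matrices so the pruned dimension disappears entirely (rather than just being zeroed). A minimal NumPy sketch; the L2-norm scoring rule and toy layer sizes are illustrative assumptions, not Dystrio's implementation:

```python
import numpy as np

def prune_mlp(w_up, w_down, keep_frac=0.9):
    """Structurally prune an MLP's intermediate neurons.

    w_up:   (intermediate, hidden)  up-projection weights
    w_down: (hidden, intermediate)  down-projection weights
    Neurons are scored by the L2 norm of their up-projection row
    (an illustrative importance heuristic, not Sculpt's actual one).
    """
    n_keep = int(w_up.shape[0] * keep_frac)
    scores = np.linalg.norm(w_up, axis=1)         # one score per neuron
    keep = np.sort(np.argsort(scores)[-n_keep:])  # top-k, original order
    return w_up[keep, :], w_down[:, keep]

rng = np.random.default_rng(0)
hidden, inter = 64, 256  # toy sizes for illustration
w_up = rng.normal(size=(inter, hidden))
w_down = rng.normal(size=(hidden, inter))
p_up, p_down = prune_mlp(w_up, w_down, keep_frac=0.9)
print(p_up.shape, p_down.shape)  # (230, 64) (64, 230)
```

During the repair pass, live distillation with `alpha=0.5` would weight the student's task loss and a distillation term against the unpruned teacher equally, though the specific loss formulation used by Sculpt is not documented here.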
## Intended Use
Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:
- LoRA fine-tuning on memory-constrained GPUs
- File description and indexing workloads
- Multimodal inference with lower VRAM requirements
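For a rough sense of the memory saved, a back-of-the-envelope estimate follows. The layer dimensions below (36 layers, hidden size 4096, intermediate size 12288, three projection matrices per MLP) are assumed Qwen3-8B-like values for illustration; only the MLP projections shrink, so total savings are smaller than 10% of the full model:

```python
# Back-of-the-envelope VRAM savings from removing 10% of MLP neurons.
# Layer dimensions are assumed (Qwen3-8B-like), for illustration only.
layers, hidden, intermediate = 36, 4096, 12288
bytes_per_param = 2  # bf16

mlp_params = layers * 3 * hidden * intermediate  # gate/up/down projections
removed = mlp_params * 0.10                      # keep_frac = 0.9
print(f"MLP params: {mlp_params / 1e9:.2f}B, "
      f"saved ~{removed * bytes_per_param / 1e9:.2f} GB in bf16")
```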
## How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Production",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Production",
    trust_remote_code=True,
)
```
## Base Model

openbmb/MiniCPM-o-4_5