# MiniCPM-o 4.5 – Sculpt Aggressive (keep_frac=0.7)

30% compression – the maximum-compression variant.
Structurally pruned from openbmb/MiniCPM-o-4_5 using Dystrio Sculpt. Only the Qwen3-8B LLM backbone is pruned; the vision (SigLip2), audio (Whisper), and TTS (CosyVoice2) modules are untouched.
## Quality (Downstream Probe – 250 questions)
| Metric | Baseline | This Model | Retention |
|---|---|---|---|
| Weighted Accuracy | 0.6756 | 0.5241 | 77.6% |
| MMLU | 0.6700 | 0.5500 | 82.1% |
| HellaSwag | 0.7625 | 0.5250 | 68.9% |
| ARC-Challenge | 0.6000 | 0.4714 | 78.6% |
## Compression Details
- keep_frac: 0.7 (30% of MLP intermediate neurons removed)
- Method: Structural pruning with live teacher distillation (alpha=0.5)
- Repair: Full repair pass with workload-matched training data
- Architecture: All multimodal modules preserved; only LLM MLP layers compressed
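The two mechanisms above can be sketched in a few lines of NumPy: structural pruning drops a fixed fraction of each MLP's intermediate neurons (slicing both the up- and down-projection), and distillation blends a teacher-matching term with the task loss at `alpha=0.5`. This is a minimal illustration, not the actual Sculpt implementation — the neuron-importance heuristic and exact loss terms here are assumptions.

```python
import numpy as np

def prune_mlp(w_up, w_down, keep_frac=0.7):
    """Keep the keep_frac most important intermediate neurons of one MLP.

    w_up:   (d_model, d_ff) up-projection weights
    w_down: (d_ff, d_model) down-projection weights
    Importance = product of each neuron's in/out weight norms (a common
    magnitude heuristic; the real Sculpt criterion is not documented here).
    """
    n_keep = int(round(w_up.shape[1] * keep_frac))
    importance = np.linalg.norm(w_up, axis=0) * np.linalg.norm(w_down, axis=1)
    keep = np.sort(np.argsort(importance)[-n_keep:])  # keep original neuron order
    return w_up[:, keep], w_down[keep, :]

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Blend teacher KL with hard-label cross-entropy (alpha=0.5 as in the card)."""
    p_s, p_t = softmax(student_logits), softmax(teacher_logits)
    ce = -np.log(p_s[np.arange(len(labels)), labels]).mean()
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean()
    return alpha * kl + (1 - alpha) * ce

rng = np.random.default_rng(0)
w_up, w_down = rng.normal(size=(64, 256)), rng.normal(size=(256, 64))
w_up_p, w_down_p = prune_mlp(w_up, w_down, keep_frac=0.7)
print(w_up_p.shape, w_down_p.shape)  # (64, 179) (179, 64)
```

With `keep_frac=0.7` and `d_ff=256`, 179 of 256 intermediate neurons survive; both projection matrices shrink together, so the layer stays a valid MLP and no masking is needed at inference time.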
## Intended Use
Drop-in replacement for MiniCPM-o 4.5 with reduced memory footprint. Suitable for:
- LoRA fine-tuning on memory-constrained GPUs
- File description and indexing workloads
- Multimodal inference with lower VRAM requirements
## How to Use
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Aggressive",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "dystrio/MiniCPM-o-4_5-Sculpt-Aggressive",
    trust_remote_code=True,
)
```