MiniCPM-o 4.5 Evaluation
Evaluation scripts for openbmb/MiniCPM-o-4_5 on the same 6 benchmarks as CleverHans-Evaluation:
- Sync (DPO test set: synced / delay / early)
- VGGSoundSync (3k freetext)
- VideoMME (MCQ A/B/C/D)
- LVBench (MCQ)
- WorldSense (MCQ)
- Daily-Omni (MCQ)
Why a separate folder
MiniCPM-o 4.5 has a completely different architecture (SigLip2 + Whisper + Qwen3-8B, 9B params) and API (model.chat(msgs=...) style) vs Qwen3-Omni (generate() + qwen_omni_utils). Sharing code is impractical; data loading / metrics can still be reused from the other repo.
Setup
bash setup_env.sh # install MiniCPM-o dependencies in conda env 'minicpmo'
Layout
MiniCPM-Evaluation/
βββ README.md
βββ setup_env.sh
βββ scripts/
βββ minicpmo_inference.py # common inference wrapper
βββ test_minicpmo.py # quick sanity check (single sample)
βββ eval_videomme.py # per-benchmark evaluators
βββ eval_lvbench.py
βββ eval_worldsense.py
βββ eval_daily_omni.py
βββ eval_vggsoundsync.py
βββ eval_dpo_sync.py
Quick Start
conda activate minicpmo
cd /home/ubuntu/MiniCPM-Evaluation
# 1. Sanity check: single-sample inference
python scripts/test_minicpmo.py
# 2. Run a full benchmark (e.g. Daily-Omni)
python scripts/eval_daily_omni.py \
--data-dir /opt/dlami/nvme/daily_omni \
--output-dir /home/ubuntu/eval_results/daily_omni \
--label do_minicpmo_45
Publish to Hugging Face (model repo)
This tree is evaluation code only (no model weights). You can still host it under a Hugging Face model repo as a snapshot (e.g. next to weight releases).
pip install huggingface_hub
export HF_TOKEN=hf_... # or: huggingface-cli login
cd MiniCPM-Evaluation
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation
Private repo:
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation --private
Data paths (reused from CleverHans-Evaluation)
| Benchmark | Path |
|---|---|
| Sync videos | /opt/dlami/nvme/video_source/{original,random_shift_video,extracted_audio} |
| VGGSoundSync | /opt/dlami/nvme/vggsoundsync_test/ |
| VideoMME | /opt/dlami/nvme/videomme/data/data/ |
| LVBench | /opt/dlami/nvme/lvbench/ |
| WorldSense | /opt/dlami/nvme/worldsense/ |
| Daily-Omni | /opt/dlami/nvme/daily_omni/ |