# MiniCPM-o 4.5 Evaluation

Evaluation scripts for `openbmb/MiniCPM-o-4_5` on the same 6 benchmarks as `CleverHans-Evaluation`:

- Sync (DPO test set: synced / delay / early)
- VGGSoundSync (3k freetext)
- VideoMME (MCQ A/B/C/D)
- LVBench (MCQ)
- WorldSense (MCQ)
- Daily-Omni (MCQ)

## Why a separate folder

MiniCPM-o 4.5 has a completely different architecture (SigLip2 + Whisper + Qwen3-8B, 9B params) and API (`model.chat(msgs=...)` style) vs Qwen3-Omni (`generate()` + `qwen_omni_utils`). Sharing code is impractical; data loading / metrics can still be reused from the other repo.

## Setup

```bash
bash setup_env.sh   # install MiniCPM-o dependencies in conda env 'minicpmo'
```

## Layout

```
MiniCPM-Evaluation/
├── README.md
├── setup_env.sh
└── scripts/
    ├── minicpmo_inference.py     # common inference wrapper
    ├── test_minicpmo.py          # quick sanity check (single sample)
    ├── eval_videomme.py          # per-benchmark evaluators
    ├── eval_lvbench.py
    ├── eval_worldsense.py
    ├── eval_daily_omni.py
    ├── eval_vggsoundsync.py
    └── eval_dpo_sync.py
```

## Quick Start

```bash
conda activate minicpmo
cd /home/ubuntu/MiniCPM-Evaluation

# 1. Sanity check: single-sample inference
python scripts/test_minicpmo.py

# 2. Run a full benchmark (e.g. Daily-Omni)
python scripts/eval_daily_omni.py \
    --data-dir /opt/dlami/nvme/daily_omni \
    --output-dir /home/ubuntu/eval_results/daily_omni \
    --label do_minicpmo_45
```

## Publish to Hugging Face (model repo)

This tree is **evaluation code only** (no model weights). You can still host it under a Hugging Face **model** repo as a snapshot (e.g. next to weight releases).

```bash
pip install huggingface_hub
export HF_TOKEN=hf_...   # or: huggingface-cli login
cd MiniCPM-Evaluation
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation
```

Private repo:

```bash
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation --private
```

## Data paths (reused from CleverHans-Evaluation)

| Benchmark | Path |
|---|---|
| Sync videos | `/opt/dlami/nvme/video_source/{original,random_shift_video,extracted_audio}` |
| VGGSoundSync | `/opt/dlami/nvme/vggsoundsync_test/` |
| VideoMME | `/opt/dlami/nvme/videomme/data/data/` |
| LVBench | `/opt/dlami/nvme/lvbench/` |
| WorldSense | `/opt/dlami/nvme/worldsense/` |
| Daily-Omni | `/opt/dlami/nvme/daily_omni/` |
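
Before launching a long evaluation it can help to confirm that the directories in the table above actually exist on the machine. The snippet below is a minimal sketch of such a check and is not part of this repo: `BENCHMARK_PATHS` and `find_missing` are hypothetical names, and the Sync entry is assumed to expand into three sibling subfolders.

```python
from pathlib import Path

# Paths copied from the table above. Assumption: the brace pattern in the
# Sync row stands for three separate subdirectories of video_source/.
BENCHMARK_PATHS = {
    "Sync videos": [
        "/opt/dlami/nvme/video_source/original",
        "/opt/dlami/nvme/video_source/random_shift_video",
        "/opt/dlami/nvme/video_source/extracted_audio",
    ],
    "VGGSoundSync": ["/opt/dlami/nvme/vggsoundsync_test/"],
    "VideoMME": ["/opt/dlami/nvme/videomme/data/data/"],
    "LVBench": ["/opt/dlami/nvme/lvbench/"],
    "WorldSense": ["/opt/dlami/nvme/worldsense/"],
    "Daily-Omni": ["/opt/dlami/nvme/daily_omni/"],
}


def find_missing(paths_by_benchmark):
    """Return {benchmark: [missing dirs]} for benchmarks with any missing directory."""
    missing = {}
    for name, paths in paths_by_benchmark.items():
        absent = [p for p in paths if not Path(p).is_dir()]
        if absent:
            missing[name] = absent
    return missing


if __name__ == "__main__":
    missing = find_missing(BENCHMARK_PATHS)
    for name in BENCHMARK_PATHS:
        status = "MISSING" if name in missing else "ok"
        print(f"{name:<14} {status}")
        for path in missing.get(name, []):
            print(f"    not found: {path}")
```

The check only tests that each directory exists; it does not validate the contents, so a benchmark can still fail later if files inside are incomplete.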