# MiniCPM-o 4.5 Evaluation

Evaluation scripts for `openbmb/MiniCPM-o-4_5` on the same 6 benchmarks as `CleverHans-Evaluation`:

- Sync (DPO test set: synced / delay / early)
- VGGSoundSync (3k freetext)
- VideoMME (MCQ A/B/C/D)
- LVBench (MCQ)
- WorldSense (MCQ)
- Daily-Omni (MCQ)

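For the four MCQ benchmarks, scoring usually comes down to pulling one option letter out of a free-form model reply and comparing it with the gold answer. The sketch below shows one way to do that. The names `extract_choice` and `accuracy` and the regex heuristics are assumptions for illustration, not the parsing logic used by the scripts in this repo.

```python
import re
from typing import Iterable, Optional

# Explicit "answer is X" / "Answer: (X)" pattern; only the keyword is
# case-insensitive, the option letter itself must be uppercase A-D.
_EXPLICIT_RE = re.compile(r"(?i:answer)(?i:\s+is)?\s*[:\-]?\s*\(?([A-D])\b")
# Fallback: first standalone uppercase option letter anywhere in the reply.
_STANDALONE_RE = re.compile(r"\b([A-D])\b")


def extract_choice(reply: str) -> Optional[str]:
    """Return the option letter found in a model reply, or None if absent."""
    text = reply.strip()
    match = _EXPLICIT_RE.search(text) or _STANDALONE_RE.search(text)
    return match.group(1) if match else None


def accuracy(replies: Iterable[str], gold: Iterable[str]) -> float:
    """Fraction of replies whose extracted letter equals the gold letter.

    Replies with no parseable letter count as wrong.
    """
    pairs = list(zip(replies, gold))
    if not pairs:
        return 0.0
    correct = sum(extract_choice(r) == g.strip().upper() for r, g in pairs)
    return correct / len(pairs)
```

Checking the explicit pattern first avoids mistaking a sentence-initial article ("A dog barks, so the answer is C") for the chosen option.
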
## Why a separate folder

MiniCPM-o 4.5 differs from Qwen3-Omni in both architecture (SigLip2 + Whisper + Qwen3-8B, 9B params) and API: it is driven through `model.chat(msgs=...)`, whereas Qwen3-Omni uses `generate()` together with `qwen_omni_utils`. Sharing inference code is therefore impractical, but data loading and metrics can still be reused from the other repo.

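One way to keep data loading and metrics shared while isolating the model-specific call is a thin backend interface. The sketch below is hypothetical: `Sample`, `OmniBackend`, `EchoBackend`, and `run_benchmark` are illustrative names that do not exist in either repo, and the real MiniCPM-o or Qwen3-Omni call would sit inside a class implementing `answer`.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol


@dataclass
class Sample:
    video_path: str
    prompt: str
    gold: str


class OmniBackend(Protocol):
    """Anything that can turn (video, prompt) into a text reply."""

    def answer(self, video_path: str, prompt: str) -> str: ...


class EchoBackend:
    """Stand-in backend for dry runs: always returns a fixed reply."""

    def __init__(self, fixed_reply: str = "A") -> None:
        self.fixed_reply = fixed_reply

    def answer(self, video_path: str, prompt: str) -> str:
        return self.fixed_reply


def run_benchmark(backend: OmniBackend, samples: Iterable[Sample]) -> List[dict]:
    """Model-agnostic loop: only `backend.answer` touches the model."""
    records = []
    for sample in samples:
        reply = backend.answer(sample.video_path, sample.prompt)
        records.append(
            {
                "video": sample.video_path,
                "reply": reply,
                "gold": sample.gold,
                "correct": reply.strip() == sample.gold,
            }
        )
    return records
```

With this split, the loop and the record format stay identical across models, and only the backend class changes.
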
## Setup

```bash
bash setup_env.sh   # install MiniCPM-o dependencies in conda env 'minicpmo'
```

## Layout

```
MiniCPM-Evaluation/
├── README.md
├── setup_env.sh
└── scripts/
    ├── minicpmo_inference.py   # common inference wrapper
    ├── test_minicpmo.py        # quick sanity check (single sample)
    ├── eval_videomme.py        # per-benchmark evaluators
    ├── eval_lvbench.py
    ├── eval_worldsense.py
    ├── eval_daily_omni.py
    ├── eval_vggsoundsync.py
    └── eval_dpo_sync.py
```

## Quick Start

```bash
conda activate minicpmo
cd /home/ubuntu/MiniCPM-Evaluation

# 1. Sanity check: single-sample inference
python scripts/test_minicpmo.py

# 2. Run a full benchmark (e.g. Daily-Omni)
python scripts/eval_daily_omni.py \
    --data-dir /opt/dlami/nvme/daily_omni \
    --output-dir /home/ubuntu/eval_results/daily_omni \
    --label do_minicpmo_45
```

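The flags in the Daily-Omni command suggest a small command-line interface. A minimal sketch of such a parser follows. Only `--data-dir`, `--output-dir`, and `--label` come from the command above; `build_parser`, `parse`, `result_path`, the choice to make all three flags required, and the `<label>.json` naming are assumptions made for illustration.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run one benchmark evaluation.")
    # Flag names mirror the Quick Start command; marking them required is an assumption.
    parser.add_argument("--data-dir", type=Path, required=True,
                        help="Directory holding the benchmark data.")
    parser.add_argument("--output-dir", type=Path, required=True,
                        help="Directory where results are written.")
    parser.add_argument("--label", required=True,
                        help="Short run name used to tag the outputs.")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse `argv` (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)


def result_path(args: argparse.Namespace) -> Path:
    # Hypothetical naming scheme: one JSON file per run label.
    return args.output_dir / f"{args.label}.json"
```

Passing an explicit list to `parse` makes the parser easy to exercise without touching the real command line.
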
## Publish to Hugging Face (model repo)

This tree is **evaluation code only** (no model weights). You can still host it under a Hugging Face **model** repo as a snapshot (e.g. next to weight releases).

```bash
pip install huggingface_hub
export HF_TOKEN=hf_...   # or: huggingface-cli login
cd MiniCPM-Evaluation
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation
```

Private repo:

```bash
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation --private
```

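For orientation, an upload script of this kind can be built on two `huggingface_hub` calls, `HfApi.create_repo` and `HfApi.upload_folder`. The sketch below is not the contents of `scripts/upload_to_hf_model.py`: the `--folder` flag, the ignore patterns, and the function names are assumptions, and only `--repo-id` and `--private` are taken from the commands above.

```python
import argparse
import os
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Upload this tree to a HF model repo.")
    parser.add_argument("--repo-id", required=True, help="e.g. YourUsername/MiniCPM-Evaluation")
    parser.add_argument("--private", action="store_true", help="Create the repo as private.")
    # Hypothetical flag: which local folder to upload (defaults to the current directory).
    parser.add_argument("--folder", default=".")
    return parser.parse_args(argv)


def upload(args: argparse.Namespace) -> None:
    # Imported lazily so argument parsing works even without the package installed.
    from huggingface_hub import HfApi

    # With token=None, huggingface_hub falls back to the cached CLI login.
    api = HfApi(token=os.environ.get("HF_TOKEN"))
    api.create_repo(
        repo_id=args.repo_id,
        repo_type="model",
        private=args.private,
        exist_ok=True,  # do not fail if the repo already exists
    )
    api.upload_folder(
        folder_path=args.folder,
        repo_id=args.repo_id,
        repo_type="model",
        ignore_patterns=["*.pyc", "*__pycache__*"],  # assumed exclusions
    )
```

A script entry point would then call `upload(parse_args())`.
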
## Data paths (reused from CleverHans-Evaluation)

| Benchmark | Path |
|---|---|
| Sync videos | `/opt/dlami/nvme/video_source/{original,random_shift_video,extracted_audio}` |
| VGGSoundSync | `/opt/dlami/nvme/vggsoundsync_test/` |
| VideoMME | `/opt/dlami/nvme/videomme/data/data/` |
| LVBench | `/opt/dlami/nvme/lvbench/` |
| WorldSense | `/opt/dlami/nvme/worldsense/` |
| Daily-Omni | `/opt/dlami/nvme/daily_omni/` |

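Because every evaluator depends on these absolute paths, a quick existence check before a long run can save time. The sketch below copies the paths from the table, with the Sync entry expanded into its three subdirectories. The `DATA_PATHS` dictionary and the `missing_paths` helper are illustrative and not part of the repo.

```python
from pathlib import Path
from typing import Dict, List

# Paths copied from the table above.
DATA_PATHS: Dict[str, List[str]] = {
    "Sync videos": [
        "/opt/dlami/nvme/video_source/original",
        "/opt/dlami/nvme/video_source/random_shift_video",
        "/opt/dlami/nvme/video_source/extracted_audio",
    ],
    "VGGSoundSync": ["/opt/dlami/nvme/vggsoundsync_test/"],
    "VideoMME": ["/opt/dlami/nvme/videomme/data/data/"],
    "LVBench": ["/opt/dlami/nvme/lvbench/"],
    "WorldSense": ["/opt/dlami/nvme/worldsense/"],
    "Daily-Omni": ["/opt/dlami/nvme/daily_omni/"],
}


def missing_paths(paths: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each benchmark to the directories that do not exist on this machine."""
    missing: Dict[str, List[str]] = {}
    for name, dirs in paths.items():
        absent = [d for d in dirs if not Path(d).is_dir()]
        if absent:
            missing[name] = absent
    return missing
```

Calling `missing_paths(DATA_PATHS)` returns an empty dictionary when every directory is in place, and otherwise lists what is missing per benchmark.
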