MiniCPM-o 4.5 Evaluation

Evaluation scripts for openbmb/MiniCPM-o-4_5 on the same 6 benchmarks as CleverHans-Evaluation:

Sync (DPO test set: synced / delay / early)
VGGSoundSync (3k freetext)
VideoMME (MCQ A/B/C/D)
LVBench (MCQ)
WorldSense (MCQ)
Daily-Omni (MCQ)

Why a separate folder

MiniCPM-o 4.5 has a completely different architecture (SigLip2 + Whisper + Qwen3-8B, 9B params) and API (model.chat(msgs=...) style) vs Qwen3-Omni (generate() + qwen_omni_utils). Sharing code is impractical; data loading / metrics can still be reused from the other repo.

Setup

bash setup_env.sh       # install MiniCPM-o dependencies in conda env 'minicpmo'

Layout

MiniCPM-Evaluation/
├── README.md
├── setup_env.sh
└── scripts/
    ├── minicpmo_inference.py        # common inference wrapper
    ├── test_minicpmo.py             # quick sanity check (single sample)
    ├── eval_videomme.py             # per-benchmark evaluators
    ├── eval_lvbench.py
    ├── eval_worldsense.py
    ├── eval_daily_omni.py
    ├── eval_vggsoundsync.py
    └── eval_dpo_sync.py

Quick Start

conda activate minicpmo
cd /home/ubuntu/MiniCPM-Evaluation

# 1. Sanity check: single-sample inference
python scripts/test_minicpmo.py

# 2. Run a full benchmark (e.g. Daily-Omni)
python scripts/eval_daily_omni.py \
  --data-dir /opt/dlami/nvme/daily_omni \
  --output-dir /home/ubuntu/eval_results/daily_omni \
  --label do_minicpmo_45

Publish to Hugging Face (model repo)

This tree is evaluation code only (no model weights). You can still host it under a Hugging Face model repo as a snapshot (e.g. next to weight releases).

pip install huggingface_hub
export HF_TOKEN=hf_...   # or: huggingface-cli login
cd MiniCPM-Evaluation
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation

Private repo:

python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation --private

Data paths (reused from CleverHans-Evaluation)

Benchmark	Path
Sync videos	`/opt/dlami/nvme/video_source/{original,random_shift_video,extracted_audio}`
VGGSoundSync	`/opt/dlami/nvme/vggsoundsync_test/`
VideoMME	`/opt/dlami/nvme/videomme/data/data/`
LVBench	`/opt/dlami/nvme/lvbench/`
WorldSense	`/opt/dlami/nvme/worldsense/`
Daily-Omni	`/opt/dlami/nvme/daily_omni/`