# MiniCPM-o 4.5 Evaluation

Evaluation scripts for `openbmb/MiniCPM-o-4_5` on the same 6 benchmarks as `CleverHans-Evaluation`:

- Sync (DPO test set: synced / delay / early)
- VGGSoundSync (3k freetext)
- VideoMME (MCQ A/B/C/D)
- LVBench (MCQ)
- WorldSense (MCQ)
- Daily-Omni (MCQ)

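Four of the six benchmarks are multiple-choice (A/B/C/D), so each evaluator has to map a free-form model response onto a choice letter. A minimal sketch of such an extraction helper (hypothetical, for illustration only; the repo's evaluators may parse differently):

```python
import re

def extract_choice(response: str):
    """Pull an A/B/C/D answer letter out of a free-form model response.

    Hypothetical helper, not the repo's actual parsing code.
    """
    # Prefer an explicit "Answer: X" / "answer is (X)" pattern...
    m = re.search(r"[Aa]nswer\s*(?:is)?[:\s]*\(?([ABCD])\)?", response)
    if m:
        return m.group(1)
    # ...then fall back to the first standalone choice letter.
    m = re.search(r"\b([ABCD])\b", response)
    return m.group(1) if m else None
```

A fallback like this matters in practice: instruction-tuned models often answer with a sentence ("I think D is correct.") rather than a bare letter.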
## Why a separate folder

MiniCPM-o 4.5 has a completely different architecture (SigLIP2 + Whisper + Qwen3-8B, 9B params) and API (`model.chat(msgs=...)` style) from Qwen3-Omni (`generate()` + `qwen_omni_utils`). Sharing inference code is impractical, but data loading and metrics can still be reused from the other repo.
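
To illustrate the API difference: the `chat(msgs=...)` style takes a list of role/content messages whose content interleaves media and text. A minimal sketch of how an evaluator might assemble one (the exact media object types the model accepts — PIL frames, numpy audio, etc. — are assumptions here, not verified against the released API):

```python
def build_msgs(video_frames, audio, question: str):
    """Assemble a chat-style message list for a `model.chat(msgs=...)` call.

    Sketch only: content mixes media objects and a trailing text question;
    the accepted media types are an assumption, not the verified API.
    """
    content = list(video_frames)
    if audio is not None:
        content.append(audio)
    content.append(question)
    return [{"role": "user", "content": content}]

msgs = build_msgs(["<frame0>", "<frame1>"], "<audio>",
                  "Is the audio in sync with the video? Answer yes or no.")
# The actual inference call would then be roughly:
#   answer = model.chat(msgs=msgs, tokenizer=tokenizer)
```

By contrast, the Qwen3-Omni path builds a processor input with `qwen_omni_utils` and calls `generate()`, which is why the two wrappers cannot share an inference layer.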

## Setup

```bash
bash setup_env.sh       # install MiniCPM-o dependencies in conda env 'minicpmo'
```

## Layout

```
MiniCPM-Evaluation/
β”œβ”€β”€ README.md
β”œβ”€β”€ setup_env.sh
└── scripts/
    β”œβ”€β”€ minicpmo_inference.py        # common inference wrapper
    β”œβ”€β”€ test_minicpmo.py             # quick sanity check (single sample)
    β”œβ”€β”€ eval_videomme.py             # per-benchmark evaluators
    β”œβ”€β”€ eval_lvbench.py
    β”œβ”€β”€ eval_worldsense.py
    β”œβ”€β”€ eval_daily_omni.py
    β”œβ”€β”€ eval_vggsoundsync.py
    └── eval_dpo_sync.py
```

## Quick Start

```bash
conda activate minicpmo
cd /home/ubuntu/MiniCPM-Evaluation

# 1. Sanity check: single-sample inference
python scripts/test_minicpmo.py

# 2. Run a full benchmark (e.g. Daily-Omni)
python scripts/eval_daily_omni.py \
  --data-dir /opt/dlami/nvme/daily_omni \
  --output-dir /home/ubuntu/eval_results/daily_omni \
  --label do_minicpmo_45
```

## Publish to Hugging Face (model repo)

This tree is **evaluation code only** (no model weights). You can still host it
under a Hugging Face **model** repo as a snapshot (e.g. next to weight releases).

```bash
pip install huggingface_hub
export HF_TOKEN=hf_...   # or: huggingface-cli login
cd MiniCPM-Evaluation
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation
```

Private repo:

```bash
python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation --private
```
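
The upload script presumably wraps `huggingface_hub`'s standard calls. A minimal hedged equivalent (the flags mirror the commands above, but this is a sketch, not the repo's actual `upload_to_hf_model.py`):

```python
import argparse

def parse_args(argv=None):
    """CLI mirroring the flags shown above (sketch)."""
    p = argparse.ArgumentParser()
    p.add_argument("--repo-id", required=True)
    p.add_argument("--private", action="store_true")
    return p.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    # Import deferred so the file parses without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()  # picks up HF_TOKEN from the environment
    # create_repo with exist_ok=True is idempotent across re-runs.
    api.create_repo(args.repo_id, repo_type="model",
                    private=args.private, exist_ok=True)
    api.upload_folder(folder_path=".", repo_id=args.repo_id,
                      repo_type="model")

if __name__ == "__main__":
    main()
```

`upload_folder` pushes the current tree as a single commit to the model repo, which is what makes "evaluation code next to weight releases" work.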

## Data paths (reused from CleverHans-Evaluation)

| Benchmark | Path |
|---|---|
| Sync videos | `/opt/dlami/nvme/video_source/{original,random_shift_video,extracted_audio}` |
| VGGSoundSync | `/opt/dlami/nvme/vggsoundsync_test/` |
| VideoMME | `/opt/dlami/nvme/videomme/data/data/` |
| LVBench | `/opt/dlami/nvme/lvbench/` |
| WorldSense | `/opt/dlami/nvme/worldsense/` |
| Daily-Omni | `/opt/dlami/nvme/daily_omni/` |