Rakancorle11
/

MiniCPM-Evaluation

Model card Files Files and versions

MiniCPM-Evaluation / README.md

Rakancorle11's picture

Upload folder using huggingface_hub

b2c2640 verified 12 days ago

|

history blame contribute delete

2.48 kB

	# MiniCPM-o 4.5 Evaluation

	Evaluation scripts for `openbmb/MiniCPM-o-4_5` on the same 6 benchmarks as `CleverHans-Evaluation`:

	- Sync (DPO test set: synced / delay / early)
	- VGGSoundSync (3k freetext)
	- VideoMME (MCQ A/B/C/D)
	- LVBench (MCQ)
	- WorldSense (MCQ)
	- Daily-Omni (MCQ)

	## Why a separate folder

	MiniCPM-o 4.5 has a completely different architecture (SigLip2 + Whisper + Qwen3-8B, 9B params) and API (`model.chat(msgs=...)` style) vs Qwen3-Omni (`generate()` + `qwen_omni_utils`). Sharing code is impractical; data loading / metrics can still be reused from the other repo.

	## Setup

	```bash
	bash setup_env.sh # install MiniCPM-o dependencies in conda env 'minicpmo'
	```

	## Layout

	```
	MiniCPM-Evaluation/
	├── README.md
	├── setup_env.sh
	└── scripts/
	├── minicpmo_inference.py # common inference wrapper
	├── test_minicpmo.py # quick sanity check (single sample)
	├── eval_videomme.py # per-benchmark evaluators
	├── eval_lvbench.py
	├── eval_worldsense.py
	├── eval_daily_omni.py
	├── eval_vggsoundsync.py
	└── eval_dpo_sync.py
	```

	## Quick Start

	```bash
	conda activate minicpmo
	cd /home/ubuntu/MiniCPM-Evaluation

	# 1. Sanity check: single-sample inference
	python scripts/test_minicpmo.py

	# 2. Run a full benchmark (e.g. Daily-Omni)
	python scripts/eval_daily_omni.py \
	--data-dir /opt/dlami/nvme/daily_omni \
	--output-dir /home/ubuntu/eval_results/daily_omni \
	--label do_minicpmo_45
	```

	## Publish to Hugging Face (model repo)

	This tree is evaluation code only (no model weights). You can still host it
	under a Hugging Face model repo as a snapshot (e.g. next to weight releases).

	```bash
	pip install huggingface_hub
	export HF_TOKEN=hf_... # or: huggingface-cli login
	cd MiniCPM-Evaluation
	python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation
	```

	Private repo:

	```bash
	python scripts/upload_to_hf_model.py --repo-id YourUsername/MiniCPM-Evaluation --private
	```

	## Data paths (reused from CleverHans-Evaluation)

	\| Benchmark \| Path \|
	\|---\|---\|
	\| Sync videos \| `/opt/dlami/nvme/video_source/{original,random_shift_video,extracted_audio}` \|
	\| VGGSoundSync \| `/opt/dlami/nvme/vggsoundsync_test/` \|
	\| VideoMME \| `/opt/dlami/nvme/videomme/data/data/` \|
	\| LVBench \| `/opt/dlami/nvme/lvbench/` \|
	\| WorldSense \| `/opt/dlami/nvme/worldsense/` \|
	\| Daily-Omni \| `/opt/dlami/nvme/daily_omni/` \|