update readme
README.md CHANGED

@@ -13,20 +13,7 @@ A Pocket-Sized MLLM for Ultra-Efficient Image and Video Understanding on Your Phone

## MiniCPM-V 4.6 Thinking

- **MiniCPM-V 4.6 Thinking** is the long chain-of-thought reasoning variant of [MiniCPM-V 4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6). It generates an explicit reasoning trace before producing the final answer, substantially boosting performance on complex multimodal reasoning, math, and OCR-heavy tasks, while keeping the same edge-friendly architecture (SigLIP2-400M vision encoder + Qwen3.5-0.8B LLM) and the mixed 4x/16x visual token compression of MiniCPM-V 4.6.

- 🔥 **Leading Foundation Capability.**
  MiniCPM-V 4.6 scores 13 on the Artificial Analysis Intelligence Index benchmark, outperforming Qwen3.5-0.8B (score of 10) at 19x lower token cost and Qwen3.5-0.8B-Thinking (score of 11) at 43x lower token cost. It also surpasses the larger Ministral 3 3B (score of 11).

- 💪 **Strong Multimodal Capability.**
  MiniCPM-V 4.6 outperforms Qwen3.5-0.8B on most vision-language understanding tasks, and reaches Qwen3.5-2B-level capability on many benchmarks, including OpenCompass, RefCOCO, HallusionBench, MUIRBench, and OCRBench.

- 🚀 **Ultra-Efficient Architecture.**
  Based on the latest technique from [LLaVA-UHD v4](https://github.com/THUMAI-Lab/LLaVA-UHD-v4), MiniCPM-V 4.6 reduces visual encoding FLOPs by more than 50%, making it more efficient than even smaller models and giving it ~1.5x the token throughput of Qwen3.5-0.8B.
  It also supports a mixed 4x/16x visual token compression rate, allowing flexible switching between accuracy and speed.

- 📱 **Broad Mobile Platform Coverage.**
  MiniCPM-V 4.6 can be deployed across all three mainstream mobile platforms: iOS, Android, and HarmonyOS. With all edge adaptation code open-sourced, developers can reproduce the on-device experience in [just a few steps](#deploy-minicpm-v-46-on-ios-android-and-harmonyos-platforms).

- 🛠️ **Developer Friendly.**
  MiniCPM-V 4.6 is adapted to [inference frameworks](#use-minicpm-v-46-in-other-inference-and-training-frameworks) such as vLLM, SGLang, llama.cpp, and Ollama, and supports [fine-tuning ecosystems](#use-minicpm-v-46-in-other-inference-and-training-frameworks) such as SWIFT and LLaMA-Factory, so developers can quickly customize the model for new domains and tasks on consumer-grade GPUs. Multiple quantized variants are provided in GGUF, BNB, AWQ, and GPTQ formats.

+ **MiniCPM-V 4.6 Thinking** is the long chain-of-thought reasoning variant of [MiniCPM-V 4.6](https://huggingface.co/openbmb/MiniCPM-V-4.6). It generates an explicit reasoning trace before producing the final answer, substantially boosting performance on complex multimodal reasoning, math, and OCR-heavy tasks, while keeping the same edge-friendly architecture (SigLIP2-400M vision encoder + Qwen3.5-0.8B LLM) and the mixed 4x/16x visual token compression of MiniCPM-V 4.6.
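
As a quick illustration of the thinking mode the paragraph above describes, here is a minimal usage sketch via Hugging Face `transformers`. It assumes MiniCPM-V 4.6 keeps the `trust_remote_code` chat interface of earlier MiniCPM-V releases; the `enable_thinking` flag is an assumption, not something this README confirms.

```python
# Minimal sketch: image Q&A with an explicit reasoning trace.
# Assumes the 4.6 repo keeps the `model.chat(...)` interface of earlier
# MiniCPM-V releases; `enable_thinking` is an assumed kwarg.
import torch
from PIL import Image
from transformers import AutoModel, AutoTokenizer

repo = "openbmb/MiniCPM-V-4.6"  # repo id taken from the link above
model = AutoModel.from_pretrained(
    repo,
    trust_remote_code=True,      # MiniCPM-V ships custom modeling code
    torch_dtype=torch.bfloat16,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)

image = Image.open("example.jpg").convert("RGB")  # placeholder image path
msgs = [{"role": "user",
         "content": [image, "Solve the equation in the image step by step."]}]

# With thinking enabled, the output should contain the reasoning trace
# followed by the final answer, as the Thinking variant is described to do.
answer = model.chat(msgs=msgs, tokenizer=tokenizer, enable_thinking=True)
print(answer)
```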
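
The 🛠️ bullet lists vLLM among the supported inference frameworks. Below is a rough sketch of querying the model through vLLM's OpenAI-compatible server; the serve command and message format follow common vLLM multimodal usage and are assumptions rather than instructions taken from this README.

```python
# Sketch: query a vLLM OpenAI-compatible server hosting MiniCPM-V 4.6.
# Assumed server start (standard vLLM CLI):
#   vllm serve openbmb/MiniCPM-V-4.6 --trust-remote-code
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="openbmb/MiniCPM-V-4.6",
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url",  # placeholder image URL
             "image_url": {"url": "https://example.com/receipt.jpg"}},
            {"type": "text",
             "text": "What is the total amount on this receipt?"},
        ],
    }],
)
print(resp.choices[0].message.content)
```
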
### Evaluation <!-- omit in toc -->