Kun-Xiang/AtomThink-EMOVA-8B (Safetensors · llava_llama)


Kun Xiang committed on Dec 16, 2024

Commit 9128984 (verified) · 1 parent: c4e5991

Update README.md

Files changed (1)

README.md +54 -3

README.md CHANGED

@@ -1,3 +1,54 @@
----
-license: apache-2.0
----

+---
+task_categories:
+- text-generation
+size_categories:
+- 157K
+license: apache-2.0
+---
+# Model Card for AtomThink-EMOVA-8B
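+This model is post-trained from EMOVA-8B using the AtomThink framework and is intended for solving complex multimodal mathematical problems.
+Accuracy (%) compared with state-of-the-art methods on MathVista (General, Math, Total) and MathVerse (TL = Text Lite, TD = Text Dominant, VI = Vision Intensive, VD = Vision Dominant, VO = Vision Only, Total):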
+| **Model**             | **Inference** | **General** | **Math** | **Total** | **TL**   | **TD**   | **VI**   | **VD**   | **VO**   | **Total** |
+|-----------------------|---------------|-------------|----------|-----------|----------|----------|----------|----------|----------|-----------|
+| Random Choice         | -             | -           | -        | 17.9      | 12.4     | 12.4     | 12.4     | 12.4     | 12.4     | 12.4      |
+| Human                 | -             | -           | -        | -         | 70.9     | 71.2     | 61.4     | 68.3     | 66.7     | 66.7      |
+| OpenAI o1             | Slow Think*   | -           | -        | 73.9      | -        | -        | -        | -        | -        | -         |
+| GPT-4o                | CoT           | -           | -        | 63.8      | -        | -        | -        | -        | -        | -         |
+| GPT-4V                | CoT           | -           | -        | 49.9      | 56.6     | 63.1     | 51.4     | 50.8     | 50.3     | 54.4      |
+| LLaVA-NeXT-34B        | Direct        | -           | -        | 46.5      | 25.5     | 33.8     | 23.5     | 20.3     | 15.7     | 23.8      |
+| InternLM-XComposer2   | Direct        | -           | -        | 57.6      | 17.0     | 22.3     | 15.7     | 16.4     | 11.0     | 16.5      |
+| Qwen-VL-Plus          | Direct        | -           | -        | 43.3      | 11.1     | 15.7     | 9.0      | 13.0     | 10.0     | 11.8      |
+| LLaVA-1.5-13B         | Direct        | -           | -        | 27.6      | 15.2     | 19.4     | 16.8     | 15.2     | 11.3     | 15.6      |
+| G-LLaVA-7B            | Direct        | -           | -        | 53.4      | 20.7     | 20.9     | 17.2     | 14.6     | 9.4      | 16.6      |
+| MAVIS-7B              | Direct        | -           | -        | -         | 29.1     | 41.4     | 27.4     | 24.9     | 14.6     | 27.5      |
+| LLaVA-Llama3-8B       | Direct        | 34.1        | 25.6     | 29.5      | 16.0     | 19.3     | 16.4     | 13.1     | 15.0     | 15.9      |
+| EMOVA-8B-200k         | Direct        | 52.4        | 51.1     | 51.7      | 34.4     | 39.0     | 33.4     | 30.1     | 23.5     | 32.1      |
+| EMOVA w/ Formatted    | CoT           | 30.9        | 31.3     | 31.1      | 26.5     | 36.5     | 25.3     | 20.4     | 19.8     | 25.7      |
+| AtomThink-EMOVA       | Direct        | 53.9        | 52.4     | 53.1      | 33.6     | 39.0     | 33.8     | 28.0     | 24.4     | 31.8      |
+| AtomThink-EMOVA       | Quick Think   | 48.7        | **54.4** | **51.8**  | **36.5** | **42.4** | **34.1** | **32.9** | **29.7** | **35.1**  |
+| AtomThink-EMOVA       | Slow Think    | 48.9        | **57.0** | **53.3**  | **42.1** | **51.5** | **39.0** | **36.7** | **33.1** | **40.5**  |
+# Citation
+If you use this model in your research, please cite:
+```bibtex
+@article{xiang2024atomthink,
+  title={AtomThink: A Slow Thinking Framework for Multimodal Mathematical Reasoning},
+  author={Xiang, Kun and Liu, Zhili and Jiang, Zihao and Nie, Yunshuang and Huang, Runhui and Fan, Haoxiang and Li, Hanhui and Huang, Weiran and Zeng, Yihan and Han, Jianhua and others},
+  journal={arXiv preprint arXiv:2411.11930},
+  year={2024}
+}
+@article{chen2024emova,
+  title={{EMOVA}: Empowering Language Models to See, Hear and Speak with Vivid Emotions},
+  author={Chen, Kai and Gou, Yunhao and Huang, Runhui and Liu, Zhili and Tan, Daxin and Xu, Jing and Wang, Chunwei and Zhu, Yi and Zeng, Yihan and Yang, Kuo and others},
+  journal={arXiv preprint arXiv:2409.18042},
+  year={2024}
+}
+```
+# License
+The checkpoint is released under the Apache 2.0 license. Please ensure proper attribution when using this model.
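The comparison table reports absolute accuracy only. As a minimal sketch, the per-benchmark gain of an AtomThink-EMOVA setting over the EMOVA-8B-200k baseline can be computed directly from the reported numbers. The column labels, the `ROWS` dictionary, and the `gains` helper are hypothetical names introduced for illustration; the values are copied verbatim from the table.

```python
# Accuracy (%) values copied from the comparison table in the model card.
# Column labels are illustrative names, not official benchmark identifiers.
COLUMNS = [
    "MathVista-General", "MathVista-Math", "MathVista-Total",
    "MathVerse-TL", "MathVerse-TD", "MathVerse-VI",
    "MathVerse-VD", "MathVerse-VO", "MathVerse-Total",
]

ROWS = {
    "EMOVA-8B-200k (Direct)":        [52.4, 51.1, 51.7, 34.4, 39.0, 33.4, 30.1, 23.5, 32.1],
    "AtomThink-EMOVA (Direct)":      [53.9, 52.4, 53.1, 33.6, 39.0, 33.8, 28.0, 24.4, 31.8],
    "AtomThink-EMOVA (Quick Think)": [48.7, 54.4, 51.8, 36.5, 42.4, 34.1, 32.9, 29.7, 35.1],
    "AtomThink-EMOVA (Slow Think)":  [48.9, 57.0, 53.3, 42.1, 51.5, 39.0, 36.7, 33.1, 40.5],
}


def gains(baseline: str, candidate: str) -> dict[str, float]:
    """Return candidate minus baseline accuracy per column, in percentage points."""
    base, cand = ROWS[baseline], ROWS[candidate]
    # Round to one decimal to match the table's precision and hide float noise.
    return {col: round(c - b, 1) for col, b, c in zip(COLUMNS, base, cand)}


if __name__ == "__main__":
    delta = gains("EMOVA-8B-200k (Direct)", "AtomThink-EMOVA (Slow Think)")
    for col, value in delta.items():
        print(f"{col:>17}: {value:+.1f}")
```

Run against the Slow Think row, this shows a +8.4 point gain on MathVerse Total and +1.6 on MathVista Total, alongside a -3.5 point drop on MathVista General.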