AaronHan/MoSEAR


AaronHan committed on Oct 11, 2025

Commit 6d57089 (verified) · 1 parent: 150555d

Update README.md

Files changed (1)

README.md +114 -3

@@ -1,3 +1,114 @@
----
-license: bsd-3-clause
----

+---
+license: bsd-3-clause
+tags:
+- multimodal
+- emotion-recognition
+- llama
+- lora
+- acm-mm-2025
+---
+# MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning
+<div align="center">
+[![Paper](https://img.shields.io/badge/arXiv-2508.01181-b31b1b.svg)](https://arxiv.org/abs/2508.01181)
+[![Conference](https://img.shields.io/badge/ACM%20MM-2025%20Oral-blue)](https://2025.acmmm.org/)
+[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/ZhiyuanHan-Aaron/MoSEAR)
+</div>
+## 📋 Model Description
+This repository contains the **MoSEAR.pth** model weights for **MoSEAR** (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.
+**Key Features:**
+- **MoSE (Modality-Specific Experts)**: Parameter-efficient LoRA-based training with modality-specific experts
+- **AR (Attention Reallocation)**: Inference-time attention intervention mechanism
+- **CA-MER Benchmark**: New benchmark for evaluating emotion conflict scenarios
+## 🎯 Model Information
+- **Model Type**: Multimodal Emotion Reasoning Model
+- **Base Architecture**: LLaMA with vision-language interface
+- **Training Method**: LoRA (Low-Rank Adaptation) with modality-specific experts
+- **Checkpoint**: Best model from training (epoch 29)
+- **Task**: Multimodal emotion recognition with conflict handling
+## 📊 Performance
+The paper reports state-of-the-art performance on emotion conflict scenarios; see the [paper](https://arxiv.org/abs/2508.01181) for quantitative results. Highlights:
+- Handles inconsistent emotional cues across audio, visual, and text modalities
+- Effective attention reallocation during inference
+- Robust performance on CA-MER benchmark
+## 🚀 Usage
+### Loading the Model
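+```python
+import torch
+# Load the checkpoint onto the CPU.
+# Note: PyTorch >= 2.6 defaults to weights_only=True. If loading fails because the
+# file stores non-tensor objects, pass weights_only=False (only for files you trust).
+checkpoint = torch.load('MoSEAR.pth', map_location='cpu')
+# The checkpoint contains:
+# - model state dict
+# - optimizer state (if included)
+# - training metadata
+# Inspect the top-level keys to see what is actually stored
+print(list(checkpoint.keys()))
+```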
+### Full Pipeline
+For complete usage with the MoSEAR framework, please refer to the [GitHub repository](https://github.com/ZhiyuanHan-Aaron/MoSEAR).
+```bash
+# Clone the code repository
+git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
+cd MoSEAR
+# Download this checkpoint
+# Place it in the appropriate directory as per the repository instructions
+# Run inference
+bash scripts/inference.sh
+```
+## 📁 Model Files
+- `MoSEAR.pth`: Main model checkpoint (best performing model)
+## 📄 Citation
+If you use this model in your research, please cite:
+```bibtex
+@inproceedings{han2025mosear,
+  title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
+  author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
+  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
+  year={2025}
+}
+```
+## 📧 Contact
+**Zhiyuan Han**
+- Email: aaronhan@mail.ustc.edu.cn
+- GitHub: [@ZhiyuanHan-Aaron](https://github.com/ZhiyuanHan-Aaron)
+## 🙏 Acknowledgements
+This work builds upon:
+- [Emotion-LLaMA](https://arxiv.org/abs/2406.11161)
+- [MiniGPT-v2](https://arxiv.org/abs/2310.09478)
+- [AffectGPT](https://arxiv.org/abs/2306.15401)
+## 📜 License
+This model is released under the BSD 3-Clause License. See the [LICENSE](https://github.com/ZhiyuanHan-Aaron/MoSEAR/blob/main/LICENSE.md) for details.
+**Copyright © 2025 Zhiyuan Han**
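
The "Loading the Model" snippet in the README says the checkpoint holds a model state dict plus optional optimizer state and training metadata, but it does not name the keys. The following is a minimal sketch, not the repository's loader, for locating and previewing the weights once the file is loaded. The key names in `CANDIDATE_KEYS` and the helpers `find_state_dict` and `summarize` are hypothetical and based on common PyTorch conventions; the actual layout of `MoSEAR.pth` may differ.

```python
from collections.abc import Mapping

# Hypothetical wrapper keys (common conventions, NOT confirmed for MoSEAR.pth).
CANDIDATE_KEYS = ("model", "state_dict", "model_state_dict")


def find_state_dict(checkpoint):
    """Return the mapping that most likely holds the model weights."""
    if not isinstance(checkpoint, Mapping):
        raise TypeError(f"expected a mapping, got {type(checkpoint).__name__}")
    for key in CANDIDATE_KEYS:
        value = checkpoint.get(key)
        if isinstance(value, Mapping):
            return value
    # No wrapper key found: assume the checkpoint itself is the state dict.
    return checkpoint


def summarize(state_dict, limit=5):
    """Return (name, shape) pairs for the first `limit` entries."""
    rows = []
    for name, value in list(state_dict.items())[:limit]:
        # Tensors expose .shape; other values (ints, floats) get an empty shape.
        shape = tuple(getattr(value, "shape", ()))
        rows.append((name, shape))
    return rows


# Usage with the real file (requires torch and the downloaded checkpoint):
#   import torch
#   checkpoint = torch.load("MoSEAR.pth", map_location="cpu")
#   for name, shape in summarize(find_state_dict(checkpoint)):
#       print(name, shape)
```

Previewing names and shapes first helps confirm whether the file holds only the LoRA and expert weights or a full model before passing it to the inference scripts in the GitHub repository.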