---
license: bsd-3-clause
tags:
- multimodal
- emotion-recognition
- llama
- lora
- acm-mm-2025
---

# MoSEAR: Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning

<div align="center">

[![Paper](https://img.shields.io/badge/arXiv-2508.01181-b31b1b.svg)](https://arxiv.org/abs/2508.01181)
[![Conference](https://img.shields.io/badge/ACM%20MM-2025%20Oral-blue)](https://2025.acmmm.org/)
[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/ZhiyuanHan-Aaron/MoSEAR)

</div>

## 📋 Model Description

This repository contains the **MoSEAR.pth** model weights for **MoSEAR** (Modality-Specific Experts with Attention Reallocation), a framework designed to address emotion conflicts in multimodal emotion reasoning tasks.

**Key Features:**
- **MoSE (Modality-Specific Experts)**: Parameter-efficient LoRA-based training with modality-specific experts
- **AR (Attention Reallocation)**: Inference-time attention intervention mechanism
- **CA-MER Benchmark**: A new benchmark for evaluating emotion conflict scenarios

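LoRA, which MoSE builds on, keeps the pretrained weight frozen and learns only a low-rank additive update. The sketch below is an illustration of that general mechanism, not the authors' implementation; the dimensions, rank, and `alpha` scaling are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 8, 2

W = rng.standard_normal((d_out, d_in))        # frozen pretrained weight
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-init
alpha = 16.0                                  # illustrative scaling hyperparameter

def lora_forward(x):
    # Frozen path plus low-rank update: y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# With B initialised to zero, the adapted layer starts identical to the frozen one
assert np.allclose(lora_forward(x), W @ x)
```

Training a separate `(A, B)` pair per modality, while sharing the frozen backbone, is the essence of a modality-specific expert design.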

## 🎯 Model Information

- **Model Type**: Multimodal emotion reasoning model
- **Base Architecture**: LLaMA with a vision-language interface
- **Training Method**: LoRA (Low-Rank Adaptation) with modality-specific experts
- **Checkpoint**: Best model from training (epoch 29)
- **Task**: Multimodal emotion recognition with conflict handling

## 📊 Performance

This model achieves state-of-the-art performance on emotion conflict scenarios:
- Handles inconsistent emotional cues across audio, visual, and text modalities
- Applies attention reallocation effectively during inference
- Performs robustly on the CA-MER benchmark

## 🚀 Usage

### Loading the Model

```python
import torch

# Load the checkpoint onto the CPU
checkpoint = torch.load('MoSEAR.pth', map_location='cpu')

# The checkpoint contains:
# - the model state dict
# - the optimizer state (if included)
# - training metadata
```
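
Checkpoints from different training scripts wrap the weights differently, so it can help to unwrap defensively before calling `load_state_dict`. The wrapper key names below (`model`, `state_dict`, `model_state_dict`) are common conventions, not confirmed contents of `MoSEAR.pth`:

```python
def extract_state_dict(checkpoint):
    """Return the weights mapping regardless of how the checkpoint is wrapped."""
    if isinstance(checkpoint, dict):
        # Common wrapper keys; which one (if any) MoSEAR.pth uses is an assumption
        for key in ("model", "state_dict", "model_state_dict"):
            if key in checkpoint and isinstance(checkpoint[key], dict):
                return checkpoint[key]
    return checkpoint  # assume the file is already a bare state dict

# Example with a dummy wrapped checkpoint
ckpt = {"model": {"lora_A.weight": [0.0]}, "epoch": 29}
assert extract_state_dict(ckpt) == {"lora_A.weight": [0.0]}
```

The unwrapped dict can then be passed to `model.load_state_dict(...)` once the MoSEAR architecture has been instantiated following the repository code.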

### Full Pipeline

For complete usage with the MoSEAR framework, please refer to the [GitHub repository](https://github.com/ZhiyuanHan-Aaron/MoSEAR).

```bash
# Clone the code repository
git clone https://github.com/ZhiyuanHan-Aaron/MoSEAR.git
cd MoSEAR

# Download this checkpoint and place it in the
# directory specified by the repository instructions

# Run inference
bash scripts/inference.sh
```

## 📁 Model Files

- `MoSEAR.pth`: Main model checkpoint (best-performing model)

## 📄 Citation

If you use this model in your research, please cite:

```bibtex
@inproceedings{han2025mosear,
  title={Benchmarking and Bridging Emotion Conflicts for Multimodal Emotion Reasoning},
  author={Han, Zhiyuan and Li, Yifei and Chen, Yanyan and Liang, Xiaohan and Song, Mingming and Peng, Yongsheng and Yin, Guanghao and Ma, Huadong},
  booktitle={Proceedings of the 33rd ACM International Conference on Multimedia},
  year={2025}
}
```

## 📧 Contact

**Zhiyuan Han**
- Email: aaronhan@mail.ustc.edu.cn
- GitHub: [@ZhiyuanHan-Aaron](https://github.com/ZhiyuanHan-Aaron)

## 🙏 Acknowledgements

This work builds upon:
- [Emotion-LLaMA](https://arxiv.org/abs/2406.11161)
- [MiniGPT-v2](https://arxiv.org/abs/2310.09478)
- [AffectGPT](https://arxiv.org/abs/2306.15401)

## 📜 License

This model is released under the BSD 3-Clause License. See the [LICENSE](https://github.com/ZhiyuanHan-Aaron/MoSEAR/blob/main/LICENSE.md) file for details.

**Copyright © 2025 Zhiyuan Han**