---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-8B
---

<p align="center">
<img src="figures/logo.jpg" alt="AROMA Logo" width="120">
</p>
|
|
<h2 align="center">🧬 AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling<br>(ACL 2026 Findings)</h2>
|
|
<p align="center">
<a href="https://huggingface.co/papers/2604.20263" target="_blank">Paper</a> • <a href="https://github.com/blazerye/AROMA" target="_blank">Code</a> • <a href="https://huggingface.co/datasets/blazerye/PerturbReason" target="_blank">Datasets</a><br>
</p>
|
|
> Please refer to our [repository](https://github.com/blazerye/AROMA) and [paper](https://huggingface.co/papers/2604.20263) for more details.
|
|
## Overview
|
|
AROMA is a novel multimodal architecture for virtual cell modeling that integrates textual evidence, graph topology, and protein sequences to predict the effects of genetic perturbations.
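To make the cross-modal fusion idea concrete (AROMA uses cross-attention to relate perturbation and target-gene representations, as described below), here is a minimal single-head cross-attention sketch in plain Python. The embeddings, dimensions, and gene roles are illustrative placeholders, not AROMA's actual configuration:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attention(queries, keys, values):
    """Single-head cross-attention: each query row attends over all key/value rows."""
    d = len(keys[0])
    out = []
    for q in queries:
        # scaled dot-product attention weights over the other modality
        weights = softmax([dot(q, k) / math.sqrt(d) for k in keys])
        # attention-weighted sum of the value rows
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: 2 perturbation embeddings attend over 3 target-gene embeddings
# (values are made up purely to show the mechanism).
perturb = [[1.0, 0.0], [0.0, 1.0]]
target = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused = cross_attention(perturb, target, target)
```

Each fused row is a convex combination of the target-gene rows, weighted by how strongly the corresponding perturbation embedding attends to each of them.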
|
|
<p align="center">
<img src="figures/overview.jpg" alt="Overview">
</p>
|
|
The overall AROMA pipeline is illustrated in the figure above and is divided into three stages:
|
|
- **Data stage.** AROMA constructs two complementary knowledge graphs and a large-scale virtual cell reasoning dataset for evidence grounding.
|
|
- **Modeling stage.** AROMA adopts a retrieval-augmented strategy to incorporate query-relevant information, thereby providing explicit evidence cues for prediction. In addition, it jointly leverages topological representations learned by graph neural networks (GNNs) and protein sequence representations encoded by ESM-2, and applies a cross-attention module to explicitly model perturbation-target gene dependencies across modalities.
|
|
- **Training stage.** AROMA first performs multimodal supervised fine-tuning (SFT), and is then further optimized with Group Relative Policy Optimization (GRPO) reinforcement learning to enhance predictive performance while generating biologically meaningful explanations.
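The group-relative normalization at the core of GRPO can be sketched in a few lines: for a group of completions sampled from the same prompt, each completion's reward is standardized against the group's mean and standard deviation to form its advantage. The reward values below are made up purely for illustration:

```python
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Toy group of rewards for 4 completions sampled from the same prompt.
rewards = [1.0, 0.0, 0.5, 0.5]
adv = group_relative_advantages(rewards)
```

Completions that score above the group mean receive positive advantages and are reinforced; those below the mean are penalized, so no separate value network is needed to estimate a baseline.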
|
|
## Citation
If you find AROMA useful for your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{wang2026aroma,
  title = "{AROMA}: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling",
  author = "Wang, Zhenyu and Ye, Geyan and Liu, Wei and Ng, Man Tat Alexander",
  booktitle = "Findings of the Association for Computational Linguistics: ACL 2026",
  year = "2026",
  publisher = "Association for Computational Linguistics"
}
```