---
library_name: transformers
pipeline_tag: text-generation
base_model: Qwen/Qwen3-8B
---

<p align="center">
  <img src="figures/logo.jpg" alt="AROMA Logo" width="120">
</p>

<h2 align="center"> 🧬 AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling<br>(ACL 2026 Findings)</h2>

<p align="center">
  📃 <a href="https://huggingface.co/papers/2604.20263" target="_blank">Paper</a> • 🐙 <a href="https://github.com/blazerye/AROMA" target="_blank">Code</a> • 🗂️ <a href="https://huggingface.co/datasets/blazerye/PerturbReason" target="_blank">Datasets</a><br>
</p>

> Please refer to our [repository](https://github.com/blazerye/AROMA) and [paper](https://huggingface.co/papers/2604.20263) for more details.

## 🌐 Overview

AROMA is a novel multimodal architecture for virtual cell modeling that integrates textual evidence, graph topology, and protein sequences to predict the effects of genetic perturbations.

<p align="center">
  <img src="figures/overview.jpg" alt="Overview">
</p>

The overall AROMA pipeline is illustrated in the figure above and is divided into three stages:

- **Data stage.** AROMA constructs two complementary knowledge graphs and a large-scale virtual cell reasoning dataset for evidence grounding.  

- **Modeling stage.** AROMA adopts a retrieval-augmented strategy to incorporate query-relevant information, thereby providing explicit evidence cues for prediction. In addition, it jointly leverages topological representations learned from graph neural networks (GNNs) and protein sequence representations encoded by ESM-2, and applies a cross-attention module to explicitly model perturbation-target gene dependencies across modalities (an illustrative sketch follows this list).  

- **Training stage.** AROMA first performs multimodal supervised fine-tuning (SFT), and is then further optimized with Group Relative Policy Optimization (GRPO) reinforcement learning to enhance predictive performance while generating biologically meaningful explanations.
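
The cross-attention fusion mentioned in the modeling stage can be pictured with a minimal sketch. The snippet below is illustrative only and is not the released AROMA implementation: the hidden size, number of heads, module name, and the use of `torch.nn.MultiheadAttention` are assumptions made for clarity.

```python
# Illustrative sketch of cross-modal attention (NOT the released AROMA code).
# Assumptions: GNN topology embeddings and ESM-2 sequence embeddings are already
# computed and projected to a shared hidden size; all dimensions are hypothetical.
import torch
import torch.nn as nn


class CrossModalFusion(nn.Module):
    def __init__(self, hidden_dim: int = 512, num_heads: int = 8):
        super().__init__()
        # Queries come from the perturbation-gene side; keys/values from the target-gene side.
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(hidden_dim)

    def forward(self, pert_tokens: torch.Tensor, target_tokens: torch.Tensor) -> torch.Tensor:
        # pert_tokens:   (batch, n_pert, hidden_dim)   e.g. GNN embeddings of perturbed genes
        # target_tokens: (batch, n_target, hidden_dim) e.g. ESM-2 embeddings of target genes
        fused, _ = self.cross_attn(query=pert_tokens, key=target_tokens, value=target_tokens)
        return self.norm(pert_tokens + fused)  # residual connection over the query stream


# Toy usage with random tensors standing in for real embeddings.
fusion = CrossModalFusion()
pert = torch.randn(2, 4, 512)     # 4 perturbed genes per sample
target = torch.randn(2, 16, 512)  # 16 candidate target genes per sample
print(fusion(pert, target).shape)  # torch.Size([2, 4, 512])
```

Because the card metadata declares `library_name: transformers` and `pipeline_tag: text-generation`, the checkpoint should be queryable through the standard text-generation interface. The snippet below is a minimal loading sketch: the repository id and the prompt wording are placeholders, not the official inference recipe (see the repository linked above for that).

```python
# Minimal loading sketch based on the card metadata (transformers, text-generation).
# "blazerye/AROMA" is a placeholder repo id and the prompt format is hypothetical;
# refer to the official repository for the exact inference recipe.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "blazerye/AROMA"  # placeholder; replace with the actual model repository
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = "Predict the effect of knocking out gene TP53 on the expression of gene MDM2."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```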

## 📌 Citation
If you find AROMA useful for your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{wang2026aroma,
    title="{AROMA}: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling",
    author="Wang, Zhenyu and Ye, Geyan and Liu, Wei and Ng, Man Tat Alexander",
    booktitle="Findings of the Association for Computational Linguistics: ACL 2026",
    year="2026",
    publisher="Association for Computational Linguistics"
}
```