<p align="center">
<img src="figures/logo.jpg" alt="AROMA Logo" width="120">
</p>
<h2 align="center"> 🧬 AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling<br>(ACL 2026 Findings)</h2>
<p align="center">
📃 <a href="https://arxiv.org/pdf/2604.20263" target="_blank">Paper</a> • 🐙 <a href="https://github.com/blazerye/AROMA" target="_blank">Code</a> • 🗂️ <a href="https://huggingface.co/datasets/blazerye/PerturbReason" target="_blank">Datasets</a><br>
</p>
> Please refer to our [repository](https://github.com/blazerye/AROMA) and [paper](https://arxiv.org/pdf/2604.20263) for more details.
## 🌐 Overview
AROMA is a novel multimodal architecture for virtual cell modeling that integrates textual evidence, graph topology, and protein sequences to predict the effects of genetic perturbations.
<p align="center">
<img src="figures/overview.jpg" alt="Overview">
</p>
The overall AROMA pipeline is illustrated in the figure above and is divided into three stages:
- **Data stage.** AROMA constructs two complementary knowledge graphs and a large-scale virtual cell reasoning dataset for evidence grounding.
- **Modeling stage.** AROMA adopts a retrieval-augmented strategy to incorporate query-relevant information, providing explicit evidence cues for prediction. It also combines topological representations learned by graph neural networks (GNNs) with protein sequence representations encoded by ESM-2, and applies a cross-attention module to explicitly model perturbation-target gene dependencies across modalities.
- **Training stage.** AROMA first performs multimodal supervised fine-tuning (SFT), and is then further optimized with Group Relative Policy Optimization (GRPO) reinforcement learning to enhance predictive performance while generating biologically meaningful explanations.
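
To make the GRPO step in the training stage more concrete, the sketch below shows the group-relative advantage that gives the method its name: several responses are sampled for the same query, and each response's reward is normalized against the mean and standard deviation of its own group. This is a minimal illustration and not AROMA's training code. The function name `group_relative_advantages`, the example reward values, and the use of the population standard deviation are all assumptions; the reward design AROMA actually uses is described in the paper and repository.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples in the same group.

    `rewards` holds one scalar reward per response sampled for a single
    query. The population standard deviation is an assumption made here;
    implementations may use the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four responses to the same perturbation query,
# e.g. 1.0 for a correct prediction and 0.0 for an incorrect one.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's mean receive a positive advantage and those below receive a negative one, so the policy update needs no separately learned value model. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.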