blazerye committed on
Commit 64d5ceb (verified) · 1 parent: 4845006

Upload 3 files

Files changed (4)
  1. .gitattributes +1 -0
  2. README.md +28 -3
  3. figures/logo.jpg +0 -0
  4. figures/overview.jpg +3 -0
.gitattributes CHANGED
@@ -40,3 +40,4 @@ Test_Dataset_Augmented_Prompt/Jurkat.json filter=lfs diff=lfs merge=lfs -text
  Test_Dataset_Augmented_Prompt/K562.json filter=lfs diff=lfs merge=lfs -text
  Test_Dataset_Augmented_Prompt/RPE1.json filter=lfs diff=lfs merge=lfs -text
  AROMA_Perturb_490k.json filter=lfs diff=lfs merge=lfs -text
+ figures/overview.jpg filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,3 +1,28 @@
- ---
- license: apache-2.0
- ---
+ <p align="center">
+ <img src="figures/logo.jpg" alt="AROMA Logo" width="120">
+ </p>
+
+ <h2 align="center"> 🧬 AROMA: Augmented Reasoning Over a Multimodal Architecture for Virtual Cell Genetic Perturbation Modeling </h2>
+
+ <p align="center">
+ 📃 <a href="https://openreview.net/pdf?id=gRbreJtjST" target="_blank">Paper</a> • 🐙 <a href="https://github.com/blazerye/AROMA" target="_blank">Code</a> • 🗂️ <a href="https://huggingface.co/datasets/blazerye/PerturbReason" target="_blank">Datasets</a><br>
+ </p>
+ </p>
+
+ > Please refer to our [repository](https://github.com/blazerye/AROMA) and [paper](https://openreview.net/pdf?id=gRbreJtjST) for more details.
+
+ ## 🌐 Overview
+
+ AROMA is a novel multimodal architecture for virtual cell modeling that integrates textual evidence, graph topology, and protein sequences to predict the effects of genetic perturbations.
+
+ <p align="center">
+ <img src="figures/overview.jpg" alt="Overview">
+ </p>
+
+ The overall AROMA pipeline is illustrated in the figure above and is divided into three stages:
+
+ - **Data stage.** AROMA constructs two complementary knowledge graphs and a large-scale virtual cell reasoning dataset for evidence grounding.
+
+ - **Modeling stage.** AROMA adopts a retrieval-augmented strategy to incorporate query-relevant information, thereby providing explicit evidence cues for prediction. In addition, it jointly leverages topological representations learned from graph neural networks (GNN) and protein sequence representations encoded by ESM-2, and applies a cross-attention module to explicitly model perturbation-target gene dependencies across modalities.
+
+ - **Training stage.** AROMA first performs multimodal supervised fine-tuning (SFT), and is then further optimized with Group Relative Policy Optimization (GRPO) reinforcement learning to enhance predictive performance while generating biologically meaningful explanations.
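The Modeling-stage bullet says that AROMA fuses GNN topology embeddings and ESM-2 protein-sequence embeddings with a cross-attention module. This commit does not include that module. What follows is only a minimal, dependency-free sketch of plain single-head scaled dot-product cross-attention, softmax(QKᵀ/√d)V, under several assumptions:

- The perturbed gene's per-modality embeddings act as queries.
- The target gene's per-modality embeddings act as keys and values.
- All vectors already share one dimension d. Real GNN and ESM-2 embeddings differ in size and would first need learned projections.
- Learned Q/K/V projections and multiple heads are omitted.

The names `cross_attention`, `pert_tokens`, and `target_tokens`, and the toy 2-dimensional vectors, are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Single-head scaled dot-product cross-attention, softmax(Q K^T / sqrt(d)) V.

    Hypothetical setup: each query row is one modality embedding of the
    perturbed gene; key/value rows are the target gene's modality embeddings.
    Learned Q/K/V projections are omitted in this sketch.
    """
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and the same length")
    d = len(keys[0])
    scale = 1.0 / math.sqrt(d)
    out: Matrix = []
    for q in queries:
        # Similarity of this query to every key, scaled by 1/sqrt(d).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


if __name__ == "__main__":
    # Hypothetical toy embeddings, one row per modality (GNN topology, ESM-2 sequence).
    pert_tokens = [[0.9, 0.1], [0.2, 0.8]]
    target_tokens = [[1.0, 0.0], [0.0, 1.0]]
    fused = cross_attention(pert_tokens, target_tokens, target_tokens)
    print(fused)  # one fused vector per perturbation token
```

Because every output row is a convex combination of the value rows, each perturbation token ends up summarizing the target-gene modalities it scores as most relevant.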
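The Training-stage bullet names GRPO. As GRPO is commonly described, it samples a group of responses for each prompt and scores each one with a reward. It does not train a separate value model. Instead, each response's advantage is its reward normalized against the rest of its group.

This commit does not specify AROMA's reward design, group size, or normalization details. The sketch below shows only the generic group-relative normalization step. The function name, the `eps` term, the choice of population standard deviation, and the example rewards are all assumptions; implementations differ on these points.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize one prompt's group of rewards: A_i = (r_i - mean(r)) / (std(r) + eps).

    Above-average responses get a positive advantage and are reinforced;
    below-average ones get a negative advantage. The group mean acts as the
    baseline, so no critic network is needed.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for a group of 4 sampled answers to one perturbation
    # query, e.g. 1.0 if the predicted effect is judged correct, else 0.0.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
```

If every response in a group earns the same reward, all advantages are zero, and that group contributes no policy-gradient signal.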
figures/logo.jpg ADDED
figures/overview.jpg ADDED

Git LFS details (figures/overview.jpg)

  • SHA256: 37573bab13c2aafbc7953f40dc1f58ccb7e6c552375e0634396cf0cd4d2e7010
  • Pointer size: 132 Bytes
  • Size of remote file: 1.19 MB