Building a causality-aware single-cell RNA-seq foundation model via context-specific causal regulation modeling
scCAFM is a causality-aware foundation model designed for large-scale single-cell transcriptomic analysis. Unlike existing single-cell foundation models, which mainly learn associative gene relationships or operate only at the dataset or cell-type level, scCAFM performs cell-specific causal inference at atlas scale while simultaneously learning transferable gene and cell embeddings enriched with causal semantics. By jointly modeling gene regulatory structure and context-dependent embeddings, scCAFM provides a basis for studying heterogeneous cellular states, developmental trajectories, disease progression, and perturbation responses.
Key features
Structure foundation module (SFM)
- Efficient, context-aware causal GRN inference in a latent factor space.
- Uses a Mixture-of-Experts (MoE) architecture so different latent experts capture distinct regulatory contexts; this enables per-cell GRN specialization without learning a full causal model per cell.
- Outputs: per-cell directed edges with causal confidence, context assignment, and compact latent summaries.
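The idea of per-cell specialization through a mixture of experts can be illustrated with a minimal NumPy sketch. This is not scCAFM's implementation: the function `per_cell_grn`, the linear softmax gate, and all shapes are assumptions chosen for illustration. Each of K hypothetical experts holds one adjacency matrix over F latent factors, and a cell's GRN is the gate-weighted mixture of those matrices, so no separate causal model is fitted per cell.

```python
import numpy as np


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def per_cell_grn(cell_latents, gate_weights, expert_grns):
    """Mix K expert adjacency matrices into one matrix per cell.

    cell_latents: (n_cells, d) latent summary of each cell (hypothetical)
    gate_weights: (d, K) linear gate, one column per expert (hypothetical)
    expert_grns:  (K, F, F) one directed adjacency matrix per expert
    Returns the gate probabilities (n_cells, K) and the mixed
    per-cell adjacency matrices (n_cells, F, F).
    """
    gates = softmax(cell_latents @ gate_weights)         # (n_cells, K)
    grns = np.einsum("ck,kij->cij", gates, expert_grns)  # (n_cells, F, F)
    return gates, grns


# Toy data: random values stand in for learned parameters.
rng = np.random.default_rng(0)
n_cells, d, K, F = 5, 4, 3, 6
cell_latents = rng.normal(size=(n_cells, d))
gate_weights = rng.normal(size=(d, K))
expert_grns = rng.normal(size=(K, F, F))

gates, grns = per_cell_grn(cell_latents, gate_weights, expert_grns)
context = gates.argmax(axis=1)  # hard context assignment per cell
```

In this sketch the gate probabilities play the role of a soft context assignment and their argmax a hard one; how scCAFM actually derives edge confidences and context assignments is not specified here.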
Embedding foundation module (EFM)
- Learns gene and cell embeddings guided by the SFM-inferred causal structure (e.g., contrastive/cause-aware objectives).
- Embeddings are transferable: they improve downstream supervised and unsupervised tasks (drug sensitivity, perturbation response prediction, trajectory/lineage inference).
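One way a structure-guided contrastive objective could look is sketched below. It is an assumption for illustration only, not the EFM's actual loss: the function `edge_contrastive_loss`, the InfoNCE form, and the temperature value are all hypothetical. Gene pairs connected by an inferred regulator-to-target edge are treated as positives, and every other gene serves as a negative.

```python
import numpy as np


def edge_contrastive_loss(embeddings, edges, temperature=0.1):
    """InfoNCE-style loss that pulls edge-linked genes together.

    embeddings:  (n_genes, dim) gene embedding matrix
    edges:       list of (regulator_idx, target_idx) pairs, no self-loops
    temperature: softmax temperature (hypothetical default)
    """
    # Cosine similarity between all gene pairs, scaled by temperature.
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)  # a gene is never its own candidate

    # Row-wise log-softmax over candidate partner genes.
    row_max = sim.max(axis=1, keepdims=True)
    log_norm = np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    log_prob = sim - row_max - log_norm

    src, dst = np.asarray(edges).T
    return -log_prob[src, dst].mean()


# Toy embeddings: genes 0/1 point one way, genes 2/3 another.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
loss_aligned = edge_contrastive_loss(toy, [(0, 1), (2, 3)])
loss_misaligned = edge_contrastive_loss(toy, [(0, 2), (1, 3)])
```

The loss is lower when edge-linked genes already sit close together in the embedding space, which is the behavior a cause-aware objective of this kind would reward.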
Model assets
Model files are stored under models/:
- models/sfm_config.json
- models/sfm_model.safetensors
- models/cond_dict.json
- models/vocab.json
- models/vocab.safetensors
Supporting CSV resources such as human_tfs.csv, mouse_tfs.csv, OmniPath.csv, and homologous.csv stay at the repository root.
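Before running anything, it can help to confirm that a local checkout matches the layout described above. The following is a minimal standard-library sketch; the helper name `missing_assets` is hypothetical and not part of the scCAFM code base, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

# File names taken from the layout described in this README.
MODEL_FILES = [
    "sfm_config.json",
    "sfm_model.safetensors",
    "cond_dict.json",
    "vocab.json",
    "vocab.safetensors",
]
ROOT_RESOURCES = [
    "human_tfs.csv",
    "mouse_tfs.csv",
    "OmniPath.csv",
    "homologous.csv",
]


def missing_assets(repo_root="."):
    """Return the expected asset paths (relative, POSIX-style) that are absent."""
    root = Path(repo_root)
    expected = [root / "models" / name for name in MODEL_FILES]
    expected += [root / name for name in ROOT_RESOURCES]
    return [p.relative_to(root).as_posix() for p in expected if not p.is_file()]
```

Calling `missing_assets("path/to/checkout")` returns an empty list when all nine files are in place.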
The source code, training pipeline, and full documentation are maintained in the GitHub repository: