metadata
license: gpl-3.0
tags:
  - single-cell
  - genomics

Building a causality-aware single-cell RNA-seq foundation model via context-specific causal regulation modeling

scCAFM is a causality-aware foundation model designed for large-scale single-cell transcriptomic analysis. Unlike existing single-cell foundation models that mainly learn associative gene relationships or operate only at the dataset- or cell-type level, scCAFM enables cell-specific causal inference at atlas scale while simultaneously learning transferable gene and cell embeddings enriched with causal semantics. By jointly modeling gene regulatory structure and context-dependent embeddings, scCAFM provides a powerful foundation for studying heterogeneous cellular states, developmental trajectories, disease progression, and perturbation responses.

Key features

Structure foundation module (SFM)

  • Efficient, context-aware causal GRN inference in a latent factor space.
  • Uses a Mixture-of-Experts (MoE) architecture so different latent experts capture distinct regulatory contexts; this enables per-cell GRN specialization without learning a full causal model per cell.
  • Outputs: per-cell directed edges with causal confidence, context assignment, and compact latent summaries.
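The Mixture-of-Experts idea above can be illustrated with a toy sketch. This is not the scCAFM implementation: the function names, the softmax gating, and the linear mixing of expert matrices are all assumptions chosen only to show how a small set of shared experts can yield a different directed graph for each cell without fitting one causal model per cell.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def per_cell_grn(gate_scores, expert_grns):
    """Mix K expert adjacency matrices into one cell-specific matrix.

    gate_scores: K hypothetical gating logits for a single cell.
    expert_grns: K square matrices (lists of lists); entry [i][j] is the
                 weight of the directed edge i -> j under that expert.
    Returns the gating weights and the mixed adjacency matrix.
    """
    weights = softmax(gate_scores)
    n = len(expert_grns[0])
    mixed = [[0.0] * n for _ in range(n)]
    for w, grn in zip(weights, expert_grns):
        for i in range(n):
            for j in range(n):
                mixed[i][j] += w * grn[i][j]
    return weights, mixed

# Two toy experts over two latent factors, with opposite edge directions.
expert_a = [[0.0, 1.0], [0.0, 0.0]]  # factor 0 -> factor 1
expert_b = [[0.0, 0.0], [1.0, 0.0]]  # factor 1 -> factor 0

# A cell whose (assumed) gating logits favor the first expert.
weights, grn = per_cell_grn([2.0, 0.0], [expert_a, expert_b])
context = weights.index(max(weights))  # hard context assignment for this cell
```

In this sketch the mixed matrix plays the role of the per-cell directed edges, the gating weights stand in for edge confidence scaling, and `context` corresponds to the context assignment listed among the outputs.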

Embedding foundation module (EFM)

  • Learns gene and cell embeddings guided by the SFM-inferred causal structure (e.g., contrastive/cause-aware objectives).
  • Embeddings are transferable: they improve downstream supervised and unsupervised tasks (drug sensitivity, perturbation response prediction, trajectory/lineage inference).

Model assets

Model files are stored under models/:

  • models/sfm_config.json
  • models/sfm_model.safetensors
  • models/cond_dict.json
  • models/vocab.json
  • models/vocab.safetensors

Supporting CSV resources such as human_tfs.csv, mouse_tfs.csv, OmniPath.csv, and homologous.csv stay at the repository root.
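A quick way to confirm that a local copy of this repository matches the layout described above is to check for each listed file. The helper below is a minimal sketch using only the standard library; `missing_assets` is a hypothetical name, and the CSV list contains only the files named in this README, which may not be exhaustive.

```python
from pathlib import Path

# Paths taken from the asset list in this README.
MODEL_FILES = [
    "models/sfm_config.json",
    "models/sfm_model.safetensors",
    "models/cond_dict.json",
    "models/vocab.json",
    "models/vocab.safetensors",
]
# Supporting CSVs named in this README (the README says "such as",
# so other files may exist at the repository root).
ROOT_FILES = ["human_tfs.csv", "mouse_tfs.csv", "OmniPath.csv", "homologous.csv"]

def missing_assets(repo_root):
    """Return the relative paths of expected files not found under repo_root."""
    root = Path(repo_root)
    return [rel for rel in MODEL_FILES + ROOT_FILES if not (root / rel).is_file()]

if __name__ == "__main__":
    missing = missing_assets(".")
    print("all assets present" if not missing else f"missing: {missing}")
```

Once the files are in place, the JSON files can be read with `json.load`, and the `.safetensors` files can be opened with the `safetensors` package (for example `safetensors.numpy.load_file`). The tensor names stored inside those files are not documented here.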

The source code, training pipeline, and full documentation are maintained in the GitHub repository: