Mirali33/MOMO

Mirali33 committed 11 days ago

Commit 1b795c7 · verified · 1 parent: bb81293

Add model card

Files changed (1)

README.md +87 -0

	@@ -0,0 +1,87 @@

+---
+license: cc-by-4.0
+tags:
+  - mars
+  - remote-sensing
+  - vision-transformer
+  - foundation-model
+  - model-merging
+  - planetary-science
+---
+# MOMO: Mars Orbital Model
+**MOMO** is the first multi-sensor foundation model for Mars remote sensing, accepted at **CVPR 2026**.
+It integrates representations learned independently from three Martian orbital sensors — HiRISE, CTX, and THEMIS — spanning resolutions from 0.25 m/pixel to 100 m/pixel, using task arithmetic model merging with a novel **Equal Validation Loss (EVL)** checkpoint selection strategy.
+[![arXiv](https://img.shields.io/badge/arXiv-2604.02719-b31b1b.svg?logo=arxiv&logoColor=white)](https://arxiv.org/abs/2604.02719) [![GitHub](https://img.shields.io/badge/GitHub-kerner--lab%2FMOMO-black?logo=github&logoColor=white)](https://github.com/kerner-lab/MOMO)
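
The task arithmetic merging mentioned above can be sketched generically as follows. This is a minimal illustration of the general technique, not MOMO's implementation: the function name `task_arithmetic_merge`, the `scale` coefficient, the use of a shared `base` set of weights, and the toy scalar values are all assumptions made for this example, and the EVL checkpoint selection is not shown. The paper and the GitHub repository describe the actual procedure.

```python
def task_arithmetic_merge(base, finetuned, scale=1.0):
    """Merge models as base + scale * sum_i (finetuned_i - base).

    `base` and each entry of `finetuned` map parameter names to values that
    support +, - and * (floats here; torch tensors would work the same way).
    All names and the `scale` default are hypothetical, for illustration only.
    """
    merged = {}
    for name, base_w in base.items():
        # A task vector is the difference between a trained model and the base.
        task_vector_sum = sum(model[name] - base_w for model in finetuned)
        merged[name] = base_w + scale * task_vector_sum
    return merged


# Toy example with one scalar "parameter" per model (made-up numbers).
base = {"w": 1.0}
sensor_models = [{"w": 2.0}, {"w": 0.5}, {"w": 1.5}]  # stand-ins for three sensors
merged = task_arithmetic_merge(base, sensor_models, scale=0.5)
print(merged)
```

Because the function relies only on `+`, `-`, and `*`, the same loop would apply to state dicts of tensors, provided every model shares the same parameter names and shapes.
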
+---
+## Checkpoints
+Each model size includes 5 checkpoints:
+| File | Description |
+|------|-------------|
+| `ctx.pth` | Pre-trained on CTX (ConTeXt Camera) |
+| `hirise.pth` | Pre-trained on HiRISE (High Resolution Imaging Science Experiment) |
+| `themis.pth` | Pre-trained on THEMIS (THermal EMission Imaging System) |
+| `hirise_ctx_themis.pth` | Pre-trained jointly on all three sensors |
+| `momo.pth` | **MOMO** — merged model via task arithmetic + EVL (main contribution) |
+
+All five checkpoints are available for three ViT architectures:
+```
+vit-s-16/   ViT-Small (patch 16)
+vit-b-16/   ViT-Base  (patch 16)
+vit-l-16/   ViT-Large (patch 16)
+```
+ViT-Base is the primary model reported in the main paper. ViT-Small and ViT-Large results are reported in the supplementary material.
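
As a convenience, the file names for each size and checkpoint can be built programmatically. The sketch below assumes every file follows the `<size>/<name>.pth` pattern, which is inferred from the directory listing and the `vit-b-16/momo.pth` path used in this card; the helper `checkpoint_filename` and the two constant tuples are hypothetical names introduced here.

```python
MODEL_SIZES = ("vit-s-16", "vit-b-16", "vit-l-16")
CHECKPOINT_NAMES = ("ctx", "hirise", "themis", "hirise_ctx_themis", "momo")


def checkpoint_filename(size="vit-b-16", name="momo"):
    """Build the in-repo path of a checkpoint, assuming a <size>/<name>.pth layout."""
    if size not in MODEL_SIZES:
        raise ValueError(f"unknown model size: {size!r}")
    if name not in CHECKPOINT_NAMES:
        raise ValueError(f"unknown checkpoint name: {name!r}")
    return f"{size}/{name}.pth"


# Enumerate every size/checkpoint combination listed in this card.
all_files = [checkpoint_filename(s, n) for s in MODEL_SIZES for n in CHECKPOINT_NAMES]
print(len(all_files), all_files[0])
```

The returned string can be passed as the `filename` argument of `hf_hub_download`.
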
+---
+## Usage
+```python
+import torch
+from huggingface_hub import hf_hub_download
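+# Download the MOMO ViT-Base checkpoint from the Hugging Face Hub
+path = hf_hub_download(repo_id="Mirali33/MOMO", filename="vit-b-16/momo.pth")
+# weights_only=False unpickles arbitrary Python objects, so only load files you trust
+checkpoint = torch.load(path, map_location="cpu", weights_only=False)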
+```
+For full training and fine-tuning code, see the [MOMO GitHub repository](https://github.com/kerner-lab/MOMO).
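
This card does not describe the internal layout of the `.pth` files, so it can help to inspect the loaded object before handing it to a model. The helper below is a small sketch that assumes the checkpoint is either a plain state dict or a dictionary that wraps one under a common key. The function name `extract_state_dict`, the candidate keys `"state_dict"` and `"model"`, and the stand-in data are assumptions, not facts about these files.

```python
def extract_state_dict(checkpoint, wrapper_keys=("state_dict", "model")):
    """Return the nested weights dict if the checkpoint wraps one, else the checkpoint itself.

    `wrapper_keys` lists hypothetical wrapper names; adjust them after
    inspecting the keys of the real checkpoint.
    """
    if isinstance(checkpoint, dict):
        for key in wrapper_keys:
            if isinstance(checkpoint.get(key), dict):
                return checkpoint[key]
    return checkpoint


# Stand-in objects (not real MOMO data) covering both assumed layouts.
wrapped = {"model": {"layer.weight": [0.1, 0.2]}, "epoch": 10}
plain = {"layer.weight": [0.1, 0.2]}
print(sorted(extract_state_dict(wrapped)), sorted(extract_state_dict(plain)))
```

With a real checkpoint, `list(extract_state_dict(checkpoint))[:5]` would show the first few parameter names, which indicates which model definition the weights expect.
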
+---
+## Training Data
+MOMO is pre-trained on ~12 million samples (~4M per sensor) from Mars orbital imagery:
+- **HiRISE** — 0.25 m/pixel high-resolution visible spectrum images
+- **CTX** — 5 m/pixel context camera images
+- **THEMIS** — 100 m/pixel thermal infrared images
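
To give a sense of the scale gap between the sensors, the following sketch converts the resolutions listed above into the ground distance covered by one side of an image tile. The tile size of 224 pixels is an assumption chosen for illustration (it is not stated in this card), and `tile_footprint_m` is a hypothetical helper.

```python
# Resolutions as listed in this card, in metres per pixel.
RESOLUTION_M_PER_PIXEL = {"HiRISE": 0.25, "CTX": 5.0, "THEMIS": 100.0}


def tile_footprint_m(sensor, tile_size_px=224):
    """Ground distance in metres spanned by one side of a square tile.

    tile_size_px=224 is an assumed value, not one taken from the model card.
    """
    return RESOLUTION_M_PER_PIXEL[sensor] * tile_size_px


for sensor in RESOLUTION_M_PER_PIXEL:
    print(f"{sensor}: {tile_footprint_m(sensor):.0f} m per tile side")
```

Whatever the tile size, a THEMIS tile spans 400 times the distance of a HiRISE tile with the same pixel dimensions (100 / 0.25).
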
+---
+## Evaluation
+MOMO is evaluated on 9 downstream tasks from [Mars-Bench](https://arxiv.org/abs/2510.24010) (4 classification, 5 segmentation). It outperforms ImageNet pre-training, Earth observation foundation models (SatMAE, CROMA, Prithvi, TerraFM), sensor-specific pre-training, and fully supervised baselines.
+---
+## Citation
+```bibtex
+@inproceedings{purohit2026momo,
+  title     = {MOMO: Mars Orbital Model — Foundation Model for Mars Orbital Applications},
+  author    = {Purohit, Mirali and Gajera, Bimal and Mehta, Irish and Tokas, Bhanu and
+               Adler, Jacob and Lu, Steven and Dickenshied, Scott and Diniega, Serina and
+               Bue, Brian and Rebbapragada, Umaa and Kerner, Hannah},
+  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year      = {2026}
+}
+```