Add model card
README.md
ADDED
@@ -0,0 +1,87 @@
---
license: cc-by-4.0
tags:
- mars
- remote-sensing
- vision-transformer
- foundation-model
- model-merging
- planetary-science
---

# MOMO: Mars Orbital Model

**MOMO** is the first multi-sensor foundation model for Mars remote sensing, accepted at **CVPR 2026**.

It integrates representations learned independently from three Martian orbital sensors (HiRISE, CTX, and THEMIS), which span resolutions from 0.25 m/pixel to 100 m/pixel. The sensor-specific models are combined through task arithmetic model merging with a novel **Equal Validation Loss (EVL)** checkpoint selection strategy.
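
Task arithmetic, in its general form, adds scaled task vectors (each model's weights minus a shared starting point) back onto that starting point. The sketch below illustrates only that general idea on plain Python lists. It is not the authors' implementation and does not cover EVL checkpoint selection; the function name, the `scale` coefficient, the shared `base` weights, and the toy values are all assumptions made for illustration.

```python
def task_arithmetic_merge(base, finetuned_models, scale=1.0):
    """Merge models as base + scale * sum_i (model_i - base), key by key.

    All arguments are hypothetical stand-ins: `base` and each entry of
    `finetuned_models` map parameter names to flat lists of floats.
    """
    merged = {}
    for name, base_values in base.items():
        # Task vector for each model: its parameters minus the shared base
        task_vectors = [
            [w - b for w, b in zip(model[name], base_values)]
            for model in finetuned_models
        ]
        # Sum the task vectors element-wise, scale them, and add back the base
        summed = [sum(column) for column in zip(*task_vectors)]
        merged[name] = [b + scale * s for b, s in zip(base_values, summed)]
    return merged


# Toy example: one two-element "layer" and three sensor-specific models
base = {"layer.weight": [0.0, 1.0]}
hirise = {"layer.weight": [0.5, 1.0]}
ctx = {"layer.weight": [0.0, 1.5]}
themis = {"layer.weight": [-0.5, 1.0]}

merged = task_arithmetic_merge(base, [hirise, ctx, themis], scale=0.5)
print(merged)
```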

[Paper (arXiv:2604.02719)](https://arxiv.org/abs/2604.02719) [Code (GitHub)](https://github.com/kerner-lab/MOMO)

---

## Checkpoints

Each model size includes five checkpoints:

| File | Description |
|------|-------------|
| `ctx.pth` | Pre-trained on CTX (ConTeXt Camera) |
| `hirise.pth` | Pre-trained on HiRISE (High Resolution Imaging Science Experiment) |
| `themis.pth` | Pre-trained on THEMIS (THermal EMission Imaging System) |
| `hirise_ctx_themis.pth` | Pre-trained jointly on all three sensors |
| `momo.pth` | **MOMO**: merged model via task arithmetic + EVL (main contribution) |

Available for three ViT architectures:

```
vit-s-16/   ViT-Small (patch 16)
vit-b-16/   ViT-Base (patch 16)
vit-l-16/   ViT-Large (patch 16)
```
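
As a small convenience, the directory and file names listed above can be combined into repository-relative paths. This sketch assumes every architecture folder follows the same `<architecture>/<file>` layout as the `vit-b-16/momo.pth` path used elsewhere in this card; the helper name is hypothetical.

```python
from itertools import product

# Directory and file names as listed in this model card
ARCHITECTURES = ["vit-s-16", "vit-b-16", "vit-l-16"]
CHECKPOINT_FILES = [
    "ctx.pth",
    "hirise.pth",
    "themis.pth",
    "hirise_ctx_themis.pth",
    "momo.pth",
]


def checkpoint_filenames():
    """Return every `<architecture>/<file>` path, assuming one layout for all sizes."""
    return [f"{arch}/{name}" for arch, name in product(ARCHITECTURES, CHECKPOINT_FILES)]


filenames = checkpoint_filenames()
print(len(filenames))
print(filenames[:5])
```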

ViT-Base is the primary model reported in the main paper. ViT-Small and ViT-Large results are reported in the supplementary material.

---

## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download the MOMO ViT-Base checkpoint (cached locally after the first call)
path = hf_hub_download(repo_id="Mirali33/MOMO", filename="vit-b-16/momo.pth")

# weights_only=False lets torch.load unpickle arbitrary Python objects,
# so only use it with checkpoints from a source you trust
checkpoint = torch.load(path, map_location="cpu", weights_only=False)
```
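
This card does not describe the layout of the object that `torch.load` returns. The sketch below shows one defensive way to locate the parameter mapping, assuming the checkpoint is either a plain dictionary of weights or a dictionary that nests the weights under a wrapper key. The helper name, the wrapper key names, and the parameter name in the toy data are guesses for illustration, not documented properties of the MOMO files.

```python
def extract_state_dict(checkpoint, wrapper_keys=("state_dict", "model")):
    """Return the parameter mapping from a loaded checkpoint object.

    `wrapper_keys` lists guessed names for a nested weights dictionary;
    they are assumptions, not documented keys of the MOMO checkpoints.
    """
    if not isinstance(checkpoint, dict):
        raise TypeError(f"expected a dict-like checkpoint, got {type(checkpoint).__name__}")
    for key in wrapper_keys:
        nested = checkpoint.get(key)
        if isinstance(nested, dict):
            return nested
    # No wrapper found: treat the checkpoint itself as the parameter mapping
    return checkpoint


# Stand-in objects for illustration; a real checkpoint would hold tensors
wrapped = {"epoch": 10, "state_dict": {"patch_embed.proj.weight": [0.1, 0.2]}}
flat = {"patch_embed.proj.weight": [0.1, 0.2]}

print(sorted(extract_state_dict(wrapped)))
print(sorted(extract_state_dict(flat)))
```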

For full training and fine-tuning code, see the [MOMO GitHub repository](https://github.com/kerner-lab/MOMO).

---

## Training Data

MOMO is pre-trained on about 12 million samples (about 4 million per sensor) from Mars orbital imagery:

- **HiRISE**: 0.25 m/pixel high-resolution visible-spectrum images
- **CTX**: 5 m/pixel context camera images
- **THEMIS**: 100 m/pixel thermal infrared images
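
To give a feel for the scale gap between the sensors, the arithmetic below converts the resolutions listed above into the ground distance covered by one side of a 16-pixel ViT patch. It assumes images are fed to the model at their native resolution without resampling, which this card does not state; the constant and function names are illustrative.

```python
# Ground sampling distance in metres per pixel, as listed in this card
RESOLUTION_M_PER_PX = {"HiRISE": 0.25, "CTX": 5.0, "THEMIS": 100.0}
PATCH_SIZE_PX = 16  # patch size of the ViT checkpoints in this card


def patch_footprint_m(sensor, patch_size_px=PATCH_SIZE_PX):
    """Side length in metres of one square ViT patch at native resolution."""
    return RESOLUTION_M_PER_PX[sensor] * patch_size_px


for sensor in RESOLUTION_M_PER_PX:
    print(f"{sensor}: {patch_footprint_m(sensor):g} m per patch side")
```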

---

## Evaluation

MOMO is evaluated on 9 downstream tasks from [Mars-Bench](https://arxiv.org/abs/2510.24010) (4 classification, 5 segmentation), where it outperforms ImageNet pre-training, Earth observation foundation models (SatMAE, CROMA, Prithvi, TerraFM), sensor-specific pre-training, and fully supervised baselines.

---

## Citation

```bibtex
@inproceedings{purohit2026momo,
  title     = {MOMO: Mars Orbital Model --- Foundation Model for Mars Orbital Applications},
  author    = {Purohit, Mirali and Gajera, Bimal and Mehta, Irish and Tokas, Bhanu and
               Adler, Jacob and Lu, Steven and Dickenshied, Scott and Diniega, Serina and
               Bue, Brian and Rebbapragada, Umaa and Kerner, Hannah},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```