Galaxy-CLIP (Fine-tuned)

Model Description

Galaxy-CLIP is a domain-adapted vision-language model fine-tuned from OpenAI’s CLIP on astronomical galaxy imagery paired with synthetic natural language descriptions.

The model learns a shared embedding space between galaxy images and descriptive text, enabling tasks such as:

  • Zero-shot galaxy classification
  • Image–text retrieval (e.g. “spiral galaxy with prominent arms”)
  • Semantic search over astronomical datasets
  • Embedding generation for downstream astronomy models

This model is designed to improve CLIP’s understanding of astronomical morphology, which is poorly represented in generic internet-scale datasets.


Model Details

  • Base model: OpenAI CLIP (ViT-based)
  • Architecture: Dual encoder (image encoder + text encoder)
  • Task: Contrastive image-text learning
  • Parameters: ~200M
  • Framework: Hugging Face Transformers
  • Author: Michael Jupp (juppy44)

Training Data

The model was fine-tuned on:

  • Galaxy images (astronomy datasets, e.g. Galaxy Zoo-style data)
  • Text descriptions: astronolan/galaxy-descriptions

These descriptions are VLM-generated captions describing galaxy morphology and structure, such as:

  • “barred spiral galaxy with tightly wound arms”
  • “elliptical galaxy with smooth light distribution”
  • “irregular galaxy with asymmetrical structure”

This setup enables weakly-supervised multimodal learning without requiring manual annotation.


Training Procedure

The model was fine-tuned using standard CLIP-style contrastive learning:

  • Paired (image, text) samples
  • Cross-modal similarity objective (InfoNCE loss)
  • Batch-wise matching of correct vs incorrect pairs

Typical pipeline:

  • Image encoder → visual embeddings
  • Text encoder → language embeddings
  • Cosine similarity used for alignment

Fine-tuning shifts the model from generic vision-language understanding to astronomy-specific semantics, which is critical because base CLIP has little exposure to galaxy morphology.
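
The contrastive objective above can be sketched in plain PyTorch. This is an illustrative reimplementation of symmetric InfoNCE, not the actual training code; the batch uses random tensors in place of real encoder outputs, and the temperature value is an assumption:

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # L2-normalise both sets of embeddings
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise cosine similarities, scaled by temperature
    logits = image_emb @ text_emb.t() / temperature
    # The i-th image matches the i-th text in the batch
    targets = torch.arange(logits.size(0))
    # Symmetric InfoNCE: image-to-text and text-to-image
    loss_i = F.cross_entropy(logits, targets)
    loss_t = F.cross_entropy(logits.t(), targets)
    return (loss_i + loss_t) / 2

# Toy batch: 4 (image, text) pairs with 512-dim embeddings
torch.manual_seed(0)
img = torch.randn(4, 512)
txt = torch.randn(4, 512)
loss = clip_contrastive_loss(img, txt)
print(loss.item())
```

Minimising this loss pulls matching (image, text) pairs together in the shared embedding space while pushing mismatched pairs apart, batch by batch.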


Intended Uses

Direct Use

  • Zero-shot classification of galaxy morphology
  • Image–text similarity scoring
  • Semantic search over galaxy datasets
  • Embedding generation for clustering or retrieval
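
Embedding generation for clustering can be sketched with a minimal k-means routine. The vectors below are random stand-ins; in practice they would come from `model.get_image_features` over a galaxy collection, and this toy implementation is not part of the released code:

```python
import torch
import torch.nn.functional as F

def kmeans(embs, k=3, iters=20, seed=0):
    # Minimal k-means over L2-normalised embeddings
    g = torch.Generator().manual_seed(seed)
    embs = F.normalize(embs, dim=-1)
    # Initialise centroids from k random points
    centroids = embs[torch.randperm(embs.size(0), generator=g)[:k]]
    for _ in range(iters):
        # Assign each embedding to its nearest centroid
        dists = torch.cdist(embs, centroids)
        labels = dists.argmin(dim=1)
        # Recompute centroids as cluster means
        for j in range(k):
            mask = labels == j
            if mask.any():
                centroids[j] = embs[mask].mean(dim=0)
    return labels

torch.manual_seed(0)
embs = torch.randn(50, 512)  # stand-ins for image embeddings
labels = kmeans(embs)
print(labels.bincount())
```

With real embeddings, clusters would ideally group morphologically similar galaxies (spirals, ellipticals, irregulars) without labels.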

Downstream Use

  • Astronomy-focused RAG systems
  • Scientific dataset indexing
  • Feature extraction for classification models
  • Input embeddings for reasoning systems (LLMs + astronomy heads)

Out-of-Scope Use

  • High-precision scientific measurement (e.g. photometry, redshift estimation)
  • Astrophysical inference requiring calibrated data
  • Medical or non-astronomy domains

This is a representation model, not a physics model.


Limitations

  • Trained on synthetic descriptions, not human-verified annotations
  • May inherit biases or inaccuracies from caption generation
  • Performance depends heavily on dataset quality and diversity
  • Not robust to non-astronomical imagery

Example Usage

```python
from transformers import CLIPProcessor, CLIPModel
from PIL import Image
import torch

model = CLIPModel.from_pretrained("juppy44/galaxy-clip-finetuned")
processor = CLIPProcessor.from_pretrained("juppy44/galaxy-clip-finetuned")

# CLIP expects RGB input
image = Image.open("galaxy.jpg").convert("RGB")

texts = [
    "spiral galaxy with arms",
    "elliptical galaxy",
    "irregular galaxy",
]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)

# No gradients needed at inference time
with torch.no_grad():
    outputs = model(**inputs)

# One similarity score per candidate caption
logits_per_image = outputs.logits_per_image
probs = logits_per_image.softmax(dim=1)

print(probs)
```
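
Semantic search over a galaxy dataset reduces to cosine-similarity ranking over precomputed embeddings. The sketch below uses random tensors in place of real outputs from `model.get_text_features` (the query) and `model.get_image_features` (the gallery); the helper name is hypothetical:

```python
import torch
import torch.nn.functional as F

def rank_by_similarity(query_emb, gallery_embs, top_k=3):
    # Normalise so dot products equal cosine similarities
    q = F.normalize(query_emb, dim=-1)
    g = F.normalize(gallery_embs, dim=-1)
    sims = g @ q  # one score per gallery image
    scores, idx = sims.topk(top_k)
    return idx.tolist(), scores.tolist()

# Stand-ins for embeddings of a query caption and 100 galaxy images
torch.manual_seed(0)
query = torch.randn(512)         # e.g. "barred spiral galaxy"
gallery = torch.randn(100, 512)  # image embeddings
idx, scores = rank_by_similarity(query, gallery)
print(idx, scores)
```

Precomputing and caching the gallery embeddings once makes each subsequent text query a single matrix-vector product.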

Evaluation

No formal benchmark evaluation has been conducted yet.

However, the following improvements over base CLIP are expected:

  • Better alignment with astronomy-specific language
  • Improved retrieval for morphology-based queries
  • More meaningful embedding clusters for galaxy types

Bias & Risks

  • Synthetic captions may introduce hallucinated or oversimplified features
  • Model may overfit to text patterns rather than physical structure
  • Not suitable for scientific conclusions without validation

Future Work

  • Replace synthetic captions with expert-labelled datasets
  • Add spectral + tabular modalities
  • Train multi-head architectures for morphology + physics
  • Integrate with astronomy reasoning systems (LLMs + structured inputs)

Citation

If you use this model:

@misc{jupp2026galaxyclip,
  title={Galaxy-CLIP: Domain-adapted vision-language model for galaxy morphology},
  author={Jupp, Michael},
  year={2026},
  howpublished={\url{https://huggingface.co/juppy44/galaxy-clip-finetuned}}
}
