
MedGemma-4B-IT Fine-tuned for CRIMSON Scoring

This model is a fine-tuned version of google/medgemma-4b-it, trained with LoRA (adapters merged into the base model) on the training set of ReXGradient-160K for radiology report evaluation with CRIMSON scoring.

Model Details

  • Base Model: google/medgemma-4b-it
  • Fine-tuning Method: LoRA (merged into base model)
  • Language: English
  • Domain: Medical / Radiology
  • Task: Radiology report generation evaluation

Intended Use

This model is designed for CRIMSON scoring — evaluating the quality of AI-generated radiology reports by comparing them against ground truth reports and identifying errors (false findings, missing findings, attribute errors).

Installation & Usage

1. Install RadGame-MedGemma

git clone https://github.com/MohammedSB/RadGame-MedGemma
cd RadGame-MedGemma
pip install -e .

2. Use with CRIMSON

from CRIMSON.CRIMSON.generate_score import CRIMSONScore

# Initialize scorer with the finetuned model
scorer = CRIMSONScore(model_name="CRIMSONScore/medgemma-4b-it-crimson")

# Evaluate a candidate report against ground truth
result = scorer.evaluate(
    reference_findings="No acute cardiopulmonary abnormality. Heart size is normal.",
    predicted_findings="No acute cardiopulmonary abnormality. Mild cardiomegaly.",
    patient_context={"age": "65", "indication": "chest pain"},
    include_guidelines=False,
)

print(f"CRIMSON Score: {result['crimson_score']}")
print(f"Error counts: {result['error_counts']}")
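The result dictionary can be post-processed further, e.g. to aggregate the significance-weighted error counts. A minimal sketch follows; the category key names inside `error_counts` are an assumption based on the error types described above (false findings, missing findings, attribute errors), not a confirmed schema:

```python
def total_weighted_errors(result):
    """Sum the significance-weighted error counts in a CRIMSON result.

    Assumes result['error_counts'] maps category name -> weighted count;
    the exact key names are hypothetical.
    """
    return sum(result["error_counts"].values())


# Hypothetical example of what scorer.evaluate(...) might return
result = {
    "crimson_score": 0.72,
    "error_counts": {
        "false_findings": 1.0,
        "missing_findings": 0.0,
        "attribute_errors": 0.5,
    },
}
```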

Training Data Generation

The training data was generated using a multi-regime candidate generation pipeline designed to create diverse (ground truth, candidate) pairs with varying types of errors.

Regimes

The pipeline samples from 6 regimes with equal probability:

| Regime | Type | Description |
|---|---|---|
| 0 | Non-LLM | Random Report: substitutes a randomly selected report from a different study |
| 1 | Non-LLM | Similar Report: substitutes a semantically similar report using BERT embeddings (all-MiniLM-L6-v2) and cosine similarity, selecting from the top-5 most similar reports |
| 2 | LLM | Perfect Rewrite: rewrites the report to sound different while preserving exact clinical meaning |
| 3 | LLM | False Finding Injection: rewrites and introduces fabricated positive findings (e.g., new pathology, device, anatomical abnormality) |
| 4 | LLM | Attribute Error: rewrites and introduces attribute errors on existing findings (location/laterality, severity, morphology, measurements, certainty, temporal changes) |
| 5 | LLM | Omission Error: rewrites and omits clinically significant positive findings |

For LLM-based regimes (3, 4, 5), up to 2 error types can be combined in a single candidate (e.g., a report with both a false finding and an attribute error).
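The similar-report substitution in regime 1 can be sketched as follows. The embedding step is abstracted away (in the pipeline, embeddings come from all-MiniLM-L6-v2); this sketch only shows the top-5 cosine-similarity selection, with function and variable names that are illustrative, not taken from the repository:

```python
import numpy as np


def sample_similar_report(query_emb, corpus_embs, k=5, rng=None):
    """Pick one of the k most cosine-similar corpus reports to the query.

    query_emb:   (d,) embedding of the ground-truth report
    corpus_embs: (n, d) embeddings of candidate substitute reports
    """
    rng = rng or np.random.default_rng(0)
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                        # cosine similarity to every corpus report
    top_k = np.argsort(sims)[::-1][:k]  # indices of the k most similar reports
    return int(rng.choice(top_k))       # sample uniformly from the top-k
```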

Scoring Pipeline

Each (ground truth, candidate) pair was scored using CRIMSONScore (using Azure OpenAI GPT-5) to generate structured training labels including error analysis, CRIMSON scores, and significance-weighted error counts.
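One training label per pair might look like the following dataclass; the field names are hypothetical and only mirror the three label components named above (error analysis, CRIMSON score, significance-weighted error counts), not the pipeline's actual schema:

```python
from dataclasses import dataclass, field


@dataclass
class CrimsonLabel:
    """Hypothetical shape of one structured training label."""
    crimson_score: float
    error_analysis: str
    # significance-weighted count per error category (illustrative keys)
    error_counts: dict = field(default_factory=dict)
```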

Patient Context Augmentation

Patient context (age, sex, indication) was included with 80% probability per field during training.
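The per-field augmentation above amounts to independently keeping each context field with probability 0.8. A minimal sketch (function name is illustrative):

```python
import random


def augment_context(context, p_keep=0.8, rng=None):
    """Independently keep each patient-context field with probability p_keep."""
    rng = rng or random.Random(42)
    return {k: v for k, v in context.items() if rng.random() < p_keep}
```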

Training Details

Dataset

  • ReXGradient-160K (training split)

Hardware

  • 8x NVIDIA H100 80GB HBM3

Hyperparameters

| Parameter | Value |
|---|---|
| Epochs | 10 |
| Batch size (per device) | 4 |
| Gradient accumulation steps | 2 |
| Effective batch size | 32 |
| Learning rate | 1e-4 |
| Warmup ratio | 0.05 |
| Weight decay | 0.05 |
| Max sequence length | 4048 |
| Seed | 42 |

LoRA Configuration

| Parameter | Value |
|---|---|
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
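The configuration above can be expressed with the `peft` library; this is a sketch under stated assumptions, not the training script's actual code — in particular, `target_modules="all-linear"` is an assumption, since the model card does not list the adapted modules:

```python
from peft import LoraConfig

# Mirrors the LoRA hyperparameters reported in the table above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules="all-linear",  # assumption: not stated in the model card
)
```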

Limitations

  • Research use only — not validated for clinical decision-making
  • Designed specifically for CRIMSON scoring; not a general-purpose radiology model

Citation

If you use this model, please cite:

@article{sellergren2025medgemma,
  title={Medgemma technical report},
  author={Sellergren, Andrew and Kazemzadeh, Sahar and Jaroensri, Tiam and Kiraly, Atilla and Traverse, Madeleine and Kohlberger, Timo and Xu, Shawn and Jamil, Fayaz and Hughes, C{\'\i}an and Lau, Charles and others},
  journal={arXiv preprint arXiv:2507.05201},
  year={2025}
}

@article{zhang2025rexgradient,
  title={Rexgradient-160k: A large-scale publicly available dataset of chest radiographs with free-text reports},
  author={Zhang, Xiaoman and Acosta, Juli{\'a}n N and Miller, Josh and Huang, Ouwen and Rajpurkar, Pranav},
  journal={arXiv preprint arXiv:2505.00228},
  year={2025}
}

License

This model is subject to:

https://developers.google.com/health-ai-developer-foundations/terms
