---
title: Recap
emoji: 🩺
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: A patient's whole life, cited. MedGemma+Qwen on MI300X.
---
# Recap

> *Reads the whole chart so you don't have to.*

Drop in a patient's scattered medical records — lab PDFs, scans, photos, discharge summaries — and Recap gives you back two things:

1. **A chronological timeline** of every event, color-coded by type
2. **A chat box** where you can ask plain-language questions, with every answer cited to the exact source page or lab row

No diagnosis. No treatment. Just *"read everything and answer questions about what's been read."*

## The hackathon angle

Recap is built for the [AMD x LabLab.ai Developer Hackathon](https://lablab.ai/ai-hackathons/amd-developer) (May 2026). The technical headline:

> **The only GPU with enough memory to keep a patient's whole record co-resident with the reasoner.**

The premium-mode backend runs **MedGemma-27B-MM** (medical multimodal specialist) and **Qwen-32B** (reasoning + multilingual orchestrator) **co-resident on a single AMD MI300X (192 GB HBM3)**, along with cached imaging-foundation embeddings and a 128K-token KV cache. That footprint does not fit on an 80 GB H100 or A100.
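As a back-of-envelope check on the co-residency claim (assuming bf16 weights at 2 bytes per parameter; the per-token KV cost is model-dependent and left out of this arithmetic):

```python
# Rough VRAM budget for the premium stack. Parameter counts and the
# 192 GB figure come from this README; bf16 sizing is an illustrative
# assumption, not a measured footprint.
GB = 1024**3
medgemma_27b = 27e9 * 2 / GB   # ~50 GB of weights
qwen_32b = 32e9 * 2 / GB       # ~60 GB of weights
weights = medgemma_27b + qwen_32b
headroom = 192 - weights       # left for KV cache + embedding cache
print(f"weights ≈ {weights:.0f} GB, headroom ≈ {headroom:.0f} GB")
```

Roughly 110 GB of weights leaves about 82 GB for the KV cache and cached embeddings on a 192 GB card, while an 80 GB card cannot hold even one of the two models' weights alongside the other.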

The public Hugging Face Space runs a lite version (MedGemma-4B-MM on ZeroGPU H200) so anyone can try it.

## Architecture

```
            ┌────────────── HF Space (Gradio) ──────────────┐
            │  3 preloaded showcase patients                │
            │  Plotly timeline + chat with citations        │
            └────────────────┬─────────────────────┬────────┘
                             │                     │
                  ┌──────────┴──────┐    ┌─────────┴──────────┐
                  │ ZeroGPU (H200)  │    │ AMD MI300X (192GB) │
                  │ MedGemma-4B-MM  │    │ MedGemma-27B-MM    │
                  │ Always-on lite  │    │ + Qwen-32B reasoner│
                  │                 │    │ + foundation cache │
                  └─────────────────┘    └────────────────────┘
```

## Project structure

```
src/recap/
├── config.py             # env-driven config
├── models.py             # Event, Citation, Patient, Answer
├── ingestion/
│   ├── fhir.py           # Synthea bundles → events
│   ├── pdf.py            # lab PDFs → page records
│   └── image.py          # medical images → events
├── timeline.py           # chronological event view (TBD)
├── retrieval.py          # BM25 over events (TBD)
├── inference/            # gateway routing zerogpu vs mi300x (TBD)
├── reasoner.py           # two-stage MedGemma → Qwen (TBD)
└── ui/                   # Gradio components (TBD)

backend/                  # FastAPI on MI300X (TBD)
data/cases/               # showcase patients (Synthea + curated images)
scripts/                  # generators + smoke tests
space/                    # HF Space deploy artifacts
tests/                    # 13 passing unit tests
```
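The core types named in `models.py` might be shaped roughly like this; the field names and classes below are illustrative assumptions, not the actual schema:

```python
# Hypothetical shapes for the Event / Citation / Answer types listed in
# the tree above; the real src/recap/models.py may differ.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Citation:
    source: str        # e.g. a filename like "labs_2021.pdf"
    locator: str       # page number or lab-row identifier

@dataclass
class Event:
    when: date
    kind: str          # e.g. "lab", "imaging", "encounter" (drives timeline color)
    summary: str
    citations: list[Citation] = field(default_factory=list)

@dataclass
class Answer:
    text: str
    citations: list[Citation] = field(default_factory=list)

# Example: one timeline event backed by one citation
e = Event(date(2021, 3, 4), "lab", "eGFR 52 mL/min",
          [Citation("labs_2021.pdf", "page 3")])
print(e.kind, e.citations[0].locator)
```

Whatever the real fields are, the key property is that every `Event` and `Answer` carries `Citation` objects pointing back at a source page or lab row, which is what makes the chat answers verifiable.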

## Showcase cases

Built from [Synthea](https://github.com/synthetichealth/synthea) (Apache 2.0 synthetic patient generator) paired with condition-matched public imaging:

- **Sarah, 67** — kidney decline over 8 years (tests time-axis questions)
- **Marcus, 54** — suspicious lump → cancer journey (tests multimodal grounding)
- **Aisha, 29** — immigrant patient with foreign-language records (tests Qwen multilingual)

## Running locally

```bash
uv venv .venv --python 3.11
uv pip install --python .venv/bin/python -r requirements.txt
.venv/bin/python -m pytest tests/ -v       # 13 passing
.venv/bin/python app.py                    # local Gradio at :7860
```

Environment variables (all prefixed `RECAP_*`):

| Var | Default | Meaning |
|---|---|---|
| `RECAP_BACKEND` | `zerogpu` | One of `zerogpu`, `mi300x`, `mock` |
| `RECAP_MI300X_URL` | — | Premium-mode backend URL (set when the MI300X box is up) |
| `RECAP_MEDGEMMA_LITE` | `google/medgemma-1.5-4b-it` | Public-Space model |
| `RECAP_MEDGEMMA_PREMIUM` | `google/medgemma-27b-it` | MI300X model |
| `RECAP_QWEN` | `Qwen/Qwen3.6-27B` | Reasoner model — latest dense Qwen (Apr 2026), matched 27B class to MedGemma. Fallbacks: `Qwen/Qwen3-32B`, `Qwen/Qwen3-14B`, `Qwen/Qwen3.6-35B-A3B` |
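An env-driven `config.py` for these variables could look like the sketch below; the defaults mirror the table, but the structure is an assumption, not the actual implementation:

```python
# Sketch of env-driven config consistent with the RECAP_* table above;
# the real src/recap/config.py may be organized differently.
import os
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class RecapConfig:
    backend: str
    mi300x_url: Optional[str]
    medgemma_lite: str
    medgemma_premium: str
    qwen: str

    @classmethod
    def from_env(cls) -> "RecapConfig":
        env = os.environ.get
        return cls(
            backend=env("RECAP_BACKEND", "zerogpu"),
            mi300x_url=env("RECAP_MI300X_URL"),  # None when the MI300X box is down
            medgemma_lite=env("RECAP_MEDGEMMA_LITE", "google/medgemma-1.5-4b-it"),
            medgemma_premium=env("RECAP_MEDGEMMA_PREMIUM", "google/medgemma-27b-it"),
            qwen=env("RECAP_QWEN", "Qwen/Qwen3.6-27B"),
        )

cfg = RecapConfig.from_env()
print(cfg.backend)
```

Keeping the config frozen and built once from the environment makes it easy to swap `zerogpu`, `mi300x`, and `mock` backends without touching code.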

## Hugging Face Space deployment

The HF Space requires YAML frontmatter at the top of its README, which GitHub renders as an ugly metadata table. To keep the GitHub README clean and the HF README correct, the frontmatter lives in `space/header.md` and the deploy script assembles a combined `space/README.md` before pushing to the HF Space remote:

```bash
./scripts/build_hf_readme.sh                # writes space/README.md
# then push space/README.md to the HF Space repo
```
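The assembly step itself is presumably a simple concatenation of the frontmatter header and the README body. Here is a hypothetical sketch of what `build_hf_readme.sh` might do, demonstrated in a temp directory with stand-in files since the real script's contents aren't shown here:

```shell
# Hypothetical version of the assembly step; the actual
# scripts/build_hf_readme.sh may differ. Uses a temp dir with stand-in
# files so it is safe to run anywhere.
set -eu
tmp=$(mktemp -d)
printf -- '---\ntitle: Recap\nsdk: docker\n---\n' > "$tmp/header.md"  # stand-in for space/header.md
printf '# Recap\n' > "$tmp/GITHUB_README.md"                          # stand-in for the GitHub README
cat "$tmp/header.md" "$tmp/GITHUB_README.md" > "$tmp/README.md"       # frontmatter first, then body
head -n 1 "$tmp/README.md"
```

The only invariant the HF Space cares about is that the YAML frontmatter is the very first thing in the assembled `README.md`.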

## Tech stack

- **Models:** Google MedGemma 1.5 (4B-MM lite, 27B-MM premium), Alibaba **Qwen 3.6-27B** (latest, released 2026-04-22)
- **Serving:** vLLM-on-ROCm on MI300X, HF Transformers + ZeroGPU `@spaces.GPU` on the Space
- **Frontend:** Gradio 4.44, Plotly
- **Data:** Synthea synthetic FHIR + public CC0 imaging, packaged as an HF Dataset

## Disclaimer

**Not for clinical use.** Demo only. All patients are synthetic — no real PHI is touched, stored, or processed. The model card for MedGemma explicitly forbids unmodified clinical deployment.

## License

MIT (this repo). Upstream models retain their respective licenses (MedGemma → Google's Health AI Developer Foundations terms; Qwen → Tongyi Qianwen License).