Below is a compact but deepened tech design doc that applies your three constraints:

1. Reuse existing ClinicalTrials MCPs.
2. Make Parlant workflows map tightly onto real clinical screening.
3. Lay out a general patient plan (using synthetic data) that feels like a real-world journey.

No code; just user flow, data contracts, and architecture.

---

## **1. Scope & Positioning**

**PoC Goal (2-week sprint, YAGNI):** A working, demoable *patient-centric* trial-matching copilot that:

* Takes **synthetic NSCLC patients** (documents + minimal metadata).
* Uses **MedGemma 4B multimodal** to understand those artifacts.
* Uses **Gemini 3 Pro + Parlant** to orchestrate **patient-to-trials matching** via an **off-the-shelf ClinicalTrials MCP server**.
* Produces an **eligibility ledger + gap analysis** aligned with real clinical screening workflows (prescreen → validation), not "toy" UX.

We explicitly **don't** build our own trial MCP, our own search stack, or multi-service infra. Everything runs in a thin orchestrator + UI process.

---

## **2. Real-World Screening Workflow Mapping**

Evidence from clinical practice and trial-matching research converges on a two-stage flow:[appliedclinicaltrialsonline+4](https://www.appliedclinicaltrialsonline.com/view/clinical-trial-matching-solutions-understanding-the-landscape)

1. **Prescreening**
   * Quick eligibility judgment on a *minimal dataset*: diagnosis, stage, functional status (ECOG), basic labs, key comorbidities.
   * Usually: oncologist + coordinator + minimal EHR context.
   * Goal: "Is this patient worth deeper chart review for any trials here?"
2. **Validation (Full Match / Chart Review)**
   * Detailed comparison of the **full record** vs the **full inclusion/exclusion criteria**, often 40–60 criteria per trial.
   * Typically done by a coordinator/CRA with investigator sign-off.
   * Goal: for a *specific trial*, decide *eligible / excluded / unclear → needs further tests*.

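The stage-1 "minimal dataset" gate can be sketched in a few lines of Python. This is an illustrative sketch only; all field names (`diagnosis`, `stage`, `ecog`, `key_labs`, `comorbidities`) are assumptions for this PoC, not a standard schema:

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PrescreenDataset:
    """The stage-1 minimal dataset: just enough to judge prescreening."""
    diagnosis: Optional[str] = None
    stage: Optional[str] = None
    ecog: Optional[int] = None            # ECOG performance status, 0-5
    key_labs: dict = field(default_factory=dict)
    comorbidities: list = field(default_factory=list)  # may be legitimately empty

    def missing_fields(self) -> list:
        """Return which required minimal-dataset fields are still absent."""
        missing = []
        if not self.diagnosis:
            missing.append("diagnosis")
        if not self.stage:
            missing.append("stage")
        if self.ecog is None:             # ECOG 0 is a valid value, so test for None
            missing.append("ecog")
        if not self.key_labs:
            missing.append("key_labs")
        return missing

    def ready_for_prescreen(self) -> bool:
        """Stage 1 proceeds only once the minimal dataset is complete."""
        return not self.missing_fields()
```

For example, `PrescreenDataset(diagnosis="NSCLC", stage="IV", ecog=1).missing_fields()` returns `["key_labs"]`, which would drive a "please upload a lab report" prompt rather than proceeding to trial search.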

Our PoC should simulate this **two-stage workflow**:

* **Stage 1 = "Patient-First Prescreen"** → shortlist trials via MCP + Gemini, using a MedGemma-extracted "minimal dataset".
* **Stage 2 = "Trial-Specific Validation"** → trial-by-trial, criterion-by-criterion ledger using MedGemma evidence.

Parlant Journeys become the *explicit codification* of these two stages + transitions.

---

## **3. High-Level Architecture (YAGNI, Reusing MCP)**

## **3.1 Components**

**1) UI & Orchestrator (single process)**

* Streamlit/FastAPI-style app (exact stack is secondary) that:
  * Hosts the chat/stepper UI.
  * Embeds **Parlant** and maintains session state.
  * Calls external tools (Gemini API, MedGemma HF endpoint, ClinicalTrials MCP).

**2) Parlant Agent + Journey**

* A single Parlant agent, e.g. `patient_trial_copilot`.
* One **Journey** with explicit stages mirroring the real-world workflow:
  * `INGEST` → `PRESCREEN` → `VALIDATE_TRIALS` → `GAP_FOLLOWUP` → `SUMMARY`.
* Parlant rules enforce:
  * When to call which tool.
  * When to move from prescreen to validation.
  * When to ask the patient (synthetic persona) for more documents.

**3) MedGemma 4B Multimodal Service (HF endpoint)**

* Input: PDF(s) + optional images.
* Output: a structured **PatientProfile** + **evidence spans** (doc/page/region references).
* Used twice:
  * Once for **prescreen dataset** extraction.
  * Once for **criterion-level validation** (patient vs trial snippets).

**4) Gemini 3 Pro (LLM Planner & Re-ranker)**

* Uses Google AI / Vertex Gemini 3 Pro for:
  * Generating query parameters for the ClinicalTrials MCP from the PatientProfile.
  * Interpreting MCP results and producing a ranked **TrialCandidate** list.
  * Orchestrating criterion slicing and gap reasoning.
* Strategy: keep Gemini in **tools + structured outputs** mode; no direct free-form "actions".

**5) ClinicalTrials MCP Server (Existing)**

* Choose an existing **ClinicalTrials MCP server** rather than hand-rolling one: e.g.
one of the open-source MCP servers wrapping the ClinicalTrials.gov REST API v2.[github+3](https://github.com/JackKuo666/ClinicalTrials-MCP-Server)
* Must support at least:
  * `search_trials(parameters)` → list of (NCT ID, title, conditions, locations, status, phase, eligibility text).
  * `get_trial(nct_id)` → full record including inclusion/exclusion criteria.

## **3.2 Why Reuse MCP is Critical**

* **Time**: the ClinicalTrials.gov v2 API is detailed and somewhat finicky (paging, filters, field lists). Existing MCPs already encode those details plus JSON schemas.[nlm.nih+1](https://www.nlm.nih.gov/pubs/techbull/ma24/ma24_clinicaltrials_api.html)
* **Alignment with agentic ecosystems**: these MCP servers are already shaped as "tools" for LLMs. We just plug Parlant/Gemini on top.
* **YAGNI**: a custom MCP or RAG index for trials is a post-PoC optimization.

---

## **4. Data Contracts (Core JSON Schemas)**

We keep contracts minimal but explicit, so we can test each piece in isolation.

## **4.1 PatientProfile (v1)**

Output of MedGemma's **prescreen extraction**; updated as new docs arrive:

```json
{
  "patient_id": "string",
  "source_docs": [
    { "doc_id": "string", "type": "clinic_letter|pathology|lab|imaging", "meta": {} }
  ],
  "demographics": {
    "age": 52,
    "sex": "female"
  },
  "diagnosis": {
    "primary_condition": "Non-Small Cell Lung Cancer",
    "histology": "adenocarcinoma",
    "stage": "IVa",
    "diagnosis_date": "2024-05-01"
  },
  "performance_status": {
    "scale": "ECOG",
    "value": 1,
    "evidence": [{ "doc_id": "clinic_1", "page": 2, "span_id": "s_17" }]
  },
  "biomarkers": [
    {
      "name": "EGFR",
      "result": "Exon 19 deletion",
      "date": "2026-01-10",
      "evidence": [{ "doc_id": "path_egfr", "page": 1, "span_id": "s_3" }]
    }
  ],
  "key_labs": [
    {
      "name": "ANC",
      "value": 1.8,
      "unit": "10^9/L",
      "date": "2026-01-28",
      "evidence": [{ "doc_id": "labs_jan", "page": 1, "span_id": "tbl_anc" }]
    }
  ],
  "treatments": [
    {
      "drug_name": "Pembrolizumab",
      "start_date": "2024-06-01",
      "end_date": "2024-11-30",
      "line": 1,
      "evidence": [{ "doc_id": "clinic_2", "page": 3, "span_id": "s_45" }]
    }
  ],
  "comorbidities": [
    {
      "name": "CKD",
      "grade": "Stage 3",
      "evidence": [{ "doc_id": "clinic_1", "page": 2, "span_id": "s_20" }]
    }
  ],
  "imaging_summary": [
    {
      "modality": "MRI brain",
      "date": "2026-01-20",
      "finding": "Stable 3mm left frontal lesion, no enhancement",
      "interpretation": "likely inactive scar",
      "certainty": "low|medium|high",
      "evidence": [{ "doc_id": "mri_report", "page": 1, "span_id": "s_9" }]
    }
  ],
  "unknowns": [
    { "field": "PD-L1", "reason": "No clear mention", "importance": "high" }
  ]
}
```

Notes:

* `unknowns` is **explicit**, enabling Parlant to decide what to ask for in `GAP_FOLLOWUP`.
* The `evidence` structure enables the later criterion-level ledger to reference the same spans.
* This is **not** a fully normalized EHR; it's what's needed for prescreening.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC11612666/)

## **4.2 SearchAnchors (v1)**

Intermediate structure Gemini produces from the PatientProfile to drive the MCP search:

```json
{
  "condition": "Non-Small Cell Lung Cancer",
  "subtype": "adenocarcinoma",
  "biomarkers": ["EGFR exon 19 deletion"],
  "stage": "IV",
  "geography": {
    "country": "DE",
    "max_distance_km": 200
  },
  "age": 52,
  "performance_status_max": 1,
  "trial_filters": {
    "recruitment_status": ["Recruiting", "Not yet recruiting"],
    "phase": ["Phase 2", "Phase 3"]
  },
  "relaxation_order": [
    "phase",
    "distance",
    "biomarker_strictness"
  ]
}
```

This mirrors the patient-centric matching literature: patient characteristics + geography + site status.[nature+1](https://www.nature.com/articles/s41467-024-53081-z)

## **4.3 TrialCandidate (v1)**

Returned by the ClinicalTrials MCP search and lightly normalized:

```json
{
  "nct_id": "NCT01234567",
  "title": "Phase 3 Study of Osimertinib in EGFR+ NSCLC",
  "conditions": ["NSCLC"],
  "phase": "Phase 3",
  "status": "Recruiting",
  "locations": [
    { "country": "DE", "city": "Berlin" },
    { "country": "DE", "city": "Hamburg" }
  ],
  "age_range": { "min": 18, "max": 75 },
  "fingerprint_text": "short concatenation of title + key inclusion/exclusion + keywords",
  "eligibility_text": {
    "inclusion": "raw inclusion criteria text ...",
    "exclusion": "raw exclusion criteria text ..."
  }
}
```

`fingerprint_text` is purposely short and designed for Gemini reranking; the full eligibility text goes to MedGemma for criterion analysis.

## **4.4 EligibilityLedger (v1)**

The final artifact per trial, shown to the "clinician" or patient:

```json
{
  "patient_id": "P001",
  "nct_id": "NCT01234567",
  "overall_assessment": "likely_eligible|likely_ineligible|uncertain",
  "criteria": [
    {
      "criterion_id": "inc_1",
      "type": "inclusion",
      "text": "Histologically confirmed NSCLC, stage IIIB/IV",
      "decision": "met|not_met|unknown",
      "patient_evidence": [{ "doc_id": "clinic_1", "page": 1, "span_id": "s_12" }],
      "trial_evidence": [{ "field": "eligibility_text.inclusion", "offset_start": 0, "offset_end": 80 }]
    },
    {
      "criterion_id": "exc_3",
      "type": "exclusion",
      "text": "No prior treatment with immune checkpoint inhibitors",
      "decision": "not_met",
      "patient_evidence": [{ "doc_id": "clinic_2", "page": 3, "span_id": "s_45" }],
      "trial_evidence": [{ "field": "eligibility_text.exclusion", "offset_start": 211, "offset_end": 280 }]
    }
  ],
  "gaps": [
    {
      "description": "Requires brain MRI within 28 days; last MRI is 45 days old",
      "recommended_action": "Repeat brain MRI",
      "clinical_importance": "high"
    }
  ]
}
```

This mirrors TrialGPT's criterion-level output (explanation + evidence locations + decision), tuned to our multimodal extraction and PoC constraints.[nature](https://www.nature.com/articles/s41467-024-53081-z)

---

## **5. Parlant Workflow Design (Aligned with Real Clinical Work)**

We design a **single Parlant Journey** that approximates the real-world job of a trial coordinator/oncologist team, but in a patient-centric context.[pmc.ncbi.nlm.nih+3](https://pmc.ncbi.nlm.nih.gov/articles/PMC6685132/)

## **5.1 Journey States**

**States:**

1. `INGEST` (Document Collection)
2. `PRESCREEN` (Patient-Level Trial Shortlist)
3. `VALIDATE_TRIALS` (Trial-Level Eligibility Ledger)
4. `GAP_FOLLOWUP` (Patient Data Completion Loop)
5. `SUMMARY` (Shareable Packet & Next Steps)

## **State 1 — INGEST**

**Role in real world:** Patient (or referrer) provides records; a coordinator checks whether there is enough to prescreen.[trialchoices+2](https://www.trialchoices.org/post/what-to-expect-during-the-clinical-trial-screening-process)

**Inputs:**

* Uploaded PDFs/images (synthetic in the PoC).
* Lightweight metadata (age, sex, location) from a user form.

**Actions:**

* Parlant calls MedGemma with multimodal input (images + text) to generate `PatientProfile.v1`.
* The Parlant agent summarises back to the patient:
  * What it understood ("You have stage IV NSCLC, ECOG 1, EGFR unknown").
  * What it is missing ("I did not find EGFR mutation status or a recent brain MRI").

**Transitions:**

* If the **minimal prescreen dataset is present** (diagnosis + stage + ECOG + rough labs): → `PRESCREEN`.
* Else: stay in `INGEST` but trigger `GAP_FOLLOWUP`-style prompts ("Can you upload a pathology report or discharge summary?").

## **State 2 — PRESCREEN**

**Role in real world:** Pre-filter to "worth reviewing" trials based on limited data.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC11612666/)

**Inputs:**

* `PatientProfile.v1`.

**Actions:**

* Gemini converts the `PatientProfile` → `SearchAnchors.v1`.
* Parlant calls the **existing ClinicalTrials MCP**, mapping `SearchAnchors` to the MCP's parameters:
  * Condition keywords
  * Recruitment status
  * Phase filters
  * Geography
* Trials are returned as a `TrialCandidate` list.
* Gemini reranks them using `fingerprint_text` + `PatientProfile` to produce a shortlist (e.g., the top 20).
* Parlant communicates to the user:
  * "Based on your profile, I found 23 potentially relevant NSCLC trials; I'll now check each more carefully."

**Transitions:**

* If **0 trials** → `GAP_FOLLOWUP` (relax criteria and/or widen geography).
* If **>0 trials** → `VALIDATE_TRIALS`.

This maps to the patient-centric matching described in the applied literature: single patient → candidate trials, then deeper evaluation.[trec-cds+2](https://www.trec-cds.org/2021.html)

## **State 3 — VALIDATE_TRIALS**

**Role in real world:** Detailed chart review against the full eligibility criteria.[pmc.ncbi.nlm.nih+1](https://pmc.ncbi.nlm.nih.gov/articles/PMC6685132/)

**Inputs:**

* The shortlisted `TrialCandidate` list (e.g., top 10–20).

**Actions:**

For each trial in the shortlist:

1. Gemini slices the inclusion/exclusion text into atomic criteria (each with an ID and text).
2. For each criterion:
   * Parlant calls **MedGemma** with:
     * The `PatientProfile` + selected patient evidence snippets (and, where available, the underlying images).
     * The criterion text snippet.
   * MedGemma outputs:
     * `decision: met/not_met/unknown`.
     * `patient_evidence` span references (doc/page/span_id).
3. Parlant aggregates per trial into `EligibilityLedger.v1`.

**Outputs:**

* A ranked list of trials with:
  * A traffic-light label (green/yellow/red) for overall eligibility, with explanation.
  * Criterion-level breakdowns and evidence pointers.

**Transitions:**

* If **no trial has any green/yellow** (all clearly ineligible):
  * → `GAP_FOLLOWUP` to explore whether missing data (e.g., outdated labs) could change this.
* Else:
  * Offer `SUMMARY` while keeping `GAP_FOLLOWUP` open.

## **State 4 — GAP_FOLLOWUP**

**Role in real world:** Additional tests/data to confirm eligibility (e.g., labs, imaging).[pfizerclinicaltrials+2](https://www.pfizerclinicaltrials.com/about/steps-to-join)

**Inputs:**

* `PatientProfile.unknowns` + `EligibilityLedger.gaps`.

**Actions:**

* Gemini synthesizes the **minimal actionable set** of missing data:
  * E.g., "The most promising trials require: (1) current EGFR mutation status, (2) a brain MRI < 28 days old."
* Parlant:
  * Poses this to the patient in simple language.
  * For the PoC, the user (you, or a script) uploads new synthetic documents representing those tests.
* On a new upload, we go back through `INGEST` → update the `PatientProfile` → fast-path directly to `PRESCREEN`/`VALIDATE_TRIALS`.

**Transitions:**

* On new docs → `INGEST` (update and re-run).
* If the user declines or no additional data is possible → `SUMMARY` with a clear explanation ("Here's why the current trials don't fit").

## **State 5 — SUMMARY**

**Role in real world:** Coordinator/oncologist summarises findings, shares options, and discusses next steps.[pfizerclinicaltrials+2](https://www.pfizerclinicaltrials.com/about/steps-to-join)

**Inputs:**

* The final `PatientProfile`.
* The set of `EligibilityLedger` objects for the top trials.
* The list of `gaps`.

**Actions:**

* Generate:
  * A **patient-friendly summary**: a 3–5 bullet explanation of the matches.
  * A **clinician packet**: the aggregated ledger and evidence pointers, referencing doc IDs and trial NCT IDs.
* For the PoC: show in the UI + downloadable JSON/Markdown.

**Transitions:**

* End of Journey.

---

## **6. General Patient Plan (Synthetic Data Flow)**

We simulate realistic but synthetic patients and run them through exactly the journey above.

## **6.1 Synthetic Patient Generation & Formats**

**Sources:**

* TREC Clinical Trials Track 2021/2022 patient topics (free-text vignettes) as the ground truth for what each patient's story should convey.[trec-cds+3](https://www.trec-cds.org/2022.html)
* Synthea or custom scripts to generate structured NSCLC trajectories consistent with those vignettes (for the additional fields we want).

**Artifacts per patient:**

1. **Clinic letter PDF**
   * Plain text + embedded logo; maybe 1–2 key tables (comorbidities, meds).
2.
**Biomarker/pathology PDF**
   * EGFR/ALK/PD-L1 etc., with a small table or a scanned-like image.
3. **Lab report PDF**
   * Hematology and chemistry values, with dates.
4. **Imaging report PDF** (+ optional illustrative image)
   * Brain MRI/CT narrative with a lesion description; maybe a low-res "snapshot" image.

Each artifact is saved with metadata mapping to the underlying TREC topic (so we can label what the "true" conditions/stage/biomarkers are).

## **6.2 Patient Journey (Narrative)**

For each synthetic patient, "Anna":

1. **Pre-visit (INGEST)**
   * Anna (or a proxy) uploads her documents to the copilot.
   * MedGemma extracts a `PatientProfile`.
   * Parlant confirms: "You have stage IV NSCLC with ECOG 1 and prior pembrolizumab; I don't see your EGFR mutation test yet."
2. **Prescreen (PRESCREEN)**
   * Using `SearchAnchors`, trials are fetched via the ClinicalTrials MCP.
   * The system returns, e.g., 30 candidates; after reranking, the top 10 are selected for validation.
3. **Trial Validation (VALIDATE_TRIALS)**
   * For each of the top 10, the eligibility ledger is computed.
   * The system identifies, say, 3 trials with many green criteria but a few unknowns (e.g., a recent brain MRI).
4. **Gap-Driven Iteration (GAP_FOLLOWUP)**
   * Copilot: "You likely qualify for trial NCT01234567 if you have a brain MRI within the last 28 days. Your last MRI is 45 days old. If your doctor orders a new MRI and the report shows no active brain metastases, you may qualify. For this PoC, you can upload a 'new MRI report' file to simulate this."
   * A new synthetic PDF is uploaded; the `PatientProfile` is updated.
5. **Re-match & Summary (PRESCREEN → VALIDATE_TRIALS → SUMMARY)**
   * The system re-runs with the updated `PatientProfile`.
   * Now 3 trials are "likely eligible", with red flags on only non-critical criteria.
   * The copilot generates:
     * Patient summary: "Here are three trials that look promising for your situation, and why."
     * Clinician packet: ledger + evidence pointers that mimic a coordinator's notes.

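The journey Anna walks through can be sketched as a small state machine. This is illustrative pseudologic only, not Parlant's actual Journey API; the state and event names are assumptions drawn from the workflow design above:

```python
from enum import Enum


class State(Enum):
    INGEST = "ingest"
    PRESCREEN = "prescreen"
    VALIDATE_TRIALS = "validate_trials"
    GAP_FOLLOWUP = "gap_followup"
    SUMMARY = "summary"


# (current state, event) -> next state, mirroring the transitions in Section 5.
TRANSITIONS = {
    (State.INGEST, "minimal_dataset_ready"): State.PRESCREEN,
    (State.INGEST, "missing_minimal_dataset"): State.INGEST,   # stay; prompt for docs
    (State.PRESCREEN, "candidates_found"): State.VALIDATE_TRIALS,
    (State.PRESCREEN, "zero_candidates"): State.GAP_FOLLOWUP,
    (State.VALIDATE_TRIALS, "some_green_or_yellow"): State.SUMMARY,
    (State.VALIDATE_TRIALS, "all_clearly_ineligible"): State.GAP_FOLLOWUP,
    (State.GAP_FOLLOWUP, "new_documents"): State.INGEST,       # re-run the loop
    (State.GAP_FOLLOWUP, "no_further_data"): State.SUMMARY,
}


def step(state: State, event: str) -> State:
    """Advance the journey; unknown events keep the current state."""
    return TRANSITIONS.get((state, event), state)
```

Anna's gap-driven iteration is then just `step(State.GAP_FOLLOWUP, "new_documents")` returning `State.INGEST`, which triggers the fast-path re-run described in step 5.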

This general patient plan is consistent across synthetic cases but parameterized by each TREC topic (e.g., biomarker variant, comorbidity pattern).

---

## **7. How This Plan Fixes Earlier Gaps**

1. **No custom trial search stack**
   * We explicitly plug into existing ClinicalTrials MCPs built for LLM agents, aligning with your "don't reinvent the wheel" constraint and drastically lowering infra risk in 2 weeks.[github+2](https://github.com/cyanheads/clinicaltrialsgov-mcp-server)
2. **Parlant used as a real workflow engine, not just a wrapper**
   * States mirror the prescreen vs validation vs gap-closure split described in empirical screening studies and trial-matching frameworks.[appliedclinicaltrialsonline+3](https://www.appliedclinicaltrialsonline.com/view/clinical-trial-matching-solutions-understanding-the-landscape)
   * Parlant becomes the place where you encode "when do we ask a human for more information vs refine a query vs stop?"
3. **Patient plan grounded in real-world processes**
   * The synthetic patient journey isn't just "upload docs → list trials."
   * It follows actual clinical workflows: minimal dataset, prescreen, chart review, additional tests, and finally discussion/summary.[trialchoices+3](https://www.trialchoices.org/post/what-to-expect-during-the-clinical-trial-screening-process)
4. **Minimal, testable contracts**
   * PatientProfile, SearchAnchors, TrialCandidate, and EligibilityLedger together give you:
     * Places to measure MedGemma extraction F1.
     * Places to plug in TREC qrels (TrialCandidate → NDCG@10).[arxiv+2](https://arxiv.org/pdf/2202.07858.pdf)
   * They're small enough to implement quickly but rich enough to survive PoC → MVP.

Source: [https://www.perplexity.ai/search/simulate-as-an-experienced-cto-i6TIXOP9TX.rqA97awuc1Q?sm=d#3](https://www.perplexity.ai/search/simulate-as-an-experienced-cto-i6TIXOP9TX.rqA97awuc1Q?sm=d#3)
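To make the "plug in TREC qrels" hook from point 4 concrete, here is a minimal NDCG@10 scorer over a ranked `TrialCandidate` list. It is a sketch under the assumption that qrels have been loaded elsewhere as a `{nct_id: graded_relevance}` dict (the qrel loader and the ranking itself are out of scope):

```python
import math


def ndcg_at_k(ranked_nct_ids, qrels, k=10):
    """NDCG@k for one patient's ranked trial list against TREC-style qrels."""
    # Discounted cumulative gain of the system ranking.
    gains = [qrels.get(nct_id, 0) for nct_id in ranked_nct_ids[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    # Ideal DCG: the best achievable ordering of the judged trials.
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

A ranking that places the most relevant trials first scores 1.0; any misordering is discounted logarithmically by rank, which is exactly the per-topic number the TREC evaluation would aggregate.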