# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
TrialPath is an AI-powered clinical trial matching system for NSCLC (Non-Small Cell Lung Cancer) patients. Currently in PoC phase: models, service stubs, and UI with mock data are implemented; live AI integrations are pending.
Core idea: help patients understand which clinical trials they may qualify for, and transform "rejection" into "actionable next steps" via gap analysis.
## Architecture
See `architecture/overview.md` for the full architecture diagram, data flow, component details, and implementation status.
5 Components: Streamlit UI → Parlant Orchestrator → MedGemma 4B (extraction) + Gemini 3 Pro (planning) + ClinicalTrials MCP Server (search)
5 Data Contracts (Pydantic v2 in `trialpath/models/`): PatientProfile, SearchAnchors, TrialCandidate, EligibilityLedger, SearchLog
## Project Structure

```
trialpath/        # Backend module
  models/         # 5 Pydantic v2 data contracts (implemented)
  services/       # 4 service stubs: medgemma, gemini, mcp, parlant
  agent/          # Parlant journey logic (not yet implemented)
  tests/          # Backend TDD tests (37+ model, 33 service)
app/              # Streamlit frontend
  pages/          # 5-page journey (upload → profile → matching → gaps → summary)
  components/     # 6 reusable widgets
  services/       # State manager, Parlant client, mock data
  tests/          # Frontend TDD tests (30+ component, 5 page)
tests/            # Integration tests (18 tests)
architecture/     # Architecture documentation
docs/             # Design docs and TDD guides
```
## Documents
- `docs/Trialpath PRD.md`: Product requirements, success metrics, HAI-DEF submission plan
- `docs/TrialPath AI technical design.md`: Technical architecture, data contracts, Parlant workflow
- `docs/tdd-guide-*.md`: TDD implementation guides (backend, frontend, data/eval)
- `architecture/overview.md`: Architecture overview, data flow, component status
## Tech Stack
- Python 3.11+ (Streamlit + Pydantic v2)
- Google Gemini 3 Pro (orchestration): stubbed
- MedGemma 4B via Hugging Face endpoint (multimodal extraction): stubbed
- Parlant (agentic workflow engine): client ready, agent pending
- ClinicalTrials MCP Server (ClinicalTrials.gov API v2): client ready
## Success Targets
- MedGemma Extraction F1 >= 0.85
- Trial Retrieval Recall@50 >= 0.75
- Trial Ranking NDCG@10 >= 0.60
- Criterion Decision Accuracy >= 0.85
- Latency < 15s, Cost < $0.50/session
## Scope
- Disease: NSCLC only
- Data: Synthetic patients only (no real PHI)
- Timeline: 3-month PoC
## Dev Tools
- Use `huggingface-cli` for model deployment
- Use `uv`, `ruff`, and Astral's `ty`
- Use `ripgrep` for exploring the codebase
## Commit atomically
Always commit atomically to build a clear git history for the larger dev team.
## ALWAYS run scripts (bash/tests) in the background
- You MUST always run scripts in the background to keep the main context window unblocked.
- When using `timeout`, keep it under 1 minute.
## Lessons Learned (from past errors)
### Async/Sync: never use `asyncio.run()` in Streamlit
- Streamlit has its own event loop; `asyncio.run()` will raise `RuntimeError: This event loop is already running`
- Use `ThreadPoolExecutor` + `asyncio.run` in a background thread as a sync bridge
- If a method is declared `async`, verify the body actually awaits async I/O; don't wrap sync blocking calls in `async def` without `asyncio.to_thread`
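The sync-bridge pattern above can be sketched as follows; the helper names (`run_async`, `fetch_trials`) are illustrative, not the project's actual functions:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

_executor = ThreadPoolExecutor(max_workers=1)

def run_async(coro):
    """Sync bridge: run a coroutine on a worker thread that owns no event loop."""
    # asyncio.run() is safe here because the worker thread has no running loop,
    # unlike Streamlit's script thread.
    return _executor.submit(asyncio.run, coro).result()

async def fetch_trials():
    await asyncio.sleep(0)  # stand-in for real async I/O (HTTP, MCP, etc.)
    return ["NCT00000001"]

print(run_async(fetch_trials()))  # ['NCT00000001']
```

Calling `run_async(...)` from a Streamlit callback blocks only that callback, not the event loop Streamlit itself is running on.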
### Mocks must match real implementation
- Before writing test mocks, READ the actual service code first
- Example: the MCP client switched from `client.post()` to `client.stream()` but tests still mocked `.post()`; all tests passed locally but broke on integration
- Always verify mock signatures against the real method being called
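One way to make that kind of drift fail fast is `unittest.mock.create_autospec`, sketched here against a hypothetical `MCPClient` (not the project's real class):

```python
from unittest.mock import create_autospec

class MCPClient:
    """Stand-in for the real client, which now exposes stream(), not post()."""
    def stream(self, query: str):
        yield {"nct_id": "NCT00000001"}

# autospec mirrors the real class: a test that mocks a renamed or removed
# method (like the old .post()) raises immediately instead of passing silently.
mock_client = create_autospec(MCPClient, instance=True)
mock_client.stream.return_value = iter([{"nct_id": "NCT12345678"}])

assert next(mock_client.stream("nsclc egfr"))["nct_id"] == "NCT12345678"

try:
    mock_client.post("/search")  # not on the spec -> AttributeError
except AttributeError:
    print("stale mock caught")
```

With plain `Mock()`, the `.post()` call above would succeed and the stale test would keep passing locally.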
### Python import/path conflicts
- Never place an entrypoint file inside a package with the same name (e.g., `app/app.py` inside the `app/` package)
- Streamlit adds parent dirs to `sys.path`, creating ambiguous imports
### Git hygiene
- Always check `.gitignore` before committing; never commit `__pycache__/`, `.env`, or binary files
- Use `git diff --staged` to review before every commit
### Test stability
- Centralize mock data in `conftest.py` shared fixtures, not inline per test
- When data contracts change, update fixtures in ONE place
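A minimal `conftest.py` sketch of this pattern; the fixture name and fields are illustrative, not the project's actual fixtures:

```python
import pytest

def make_patient_profile() -> dict:
    """Single source of truth for test patient data.

    When the PatientProfile contract changes, this is the only place to update.
    """
    return {"age": 62, "diagnosis": "NSCLC", "stage": "IIIB"}

@pytest.fixture
def patient_profile():
    # Tests request this by name instead of building mock data inline
    return make_patient_profile()
```

A test then just declares the fixture as a parameter (`def test_x(patient_profile): ...`) and never constructs its own copy of the data.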
### Bash output: prefer dedicated tools
- Use Read/Grep/Glob instead of bash pipes for file operations
- Keep bash commands simple and single-purpose; complex piped commands risk misreading output
- Always read the FULL output of bash commands before drawing conclusions
## Cognitive Lessons (avoid repeating these thinking errors)
### Know where configs live: don't re-discover every session
- ALL env vars and defaults: `trialpath/config.py` (single source of truth)
- Key env vars: `GEMINI_API_KEY`, `GEMINI_MODEL` (gemini-3-pro), `HF_TOKEN`, `MEDGEMMA_ENDPOINT_URL`, `MCP_URL` (:3000), `PARLANT_URL` (:8800), `SESSION_COST_BUDGET`
- MedGemma retry settings: `MEDGEMMA_MAX_RETRIES`, `MEDGEMMA_RETRY_BACKOFF`, `MEDGEMMA_MAX_WAIT`, `MEDGEMMA_COLD_START_TIMEOUT`
- `.env` file is gitignored; never commit it again (API keys were leaked once in commit 53efc3c)
- Config consumers: gemini_planner, medgemma_extractor, mcp_client, parlant_bridge, agent/tools, direct_pipeline
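A sketch of the single-source-of-truth pattern; the default values below are assumptions, not the real contents of `trialpath/config.py`:

```python
# config.py-style sketch: read env vars once, with defaults, import everywhere.
# Defaults shown here are illustrative assumptions.
import os

GEMINI_MODEL = os.environ.get("GEMINI_MODEL", "gemini-3-pro")
MCP_URL = os.environ.get("MCP_URL", "http://localhost:3000")
PARLANT_URL = os.environ.get("PARLANT_URL", "http://localhost:8800")
MEDGEMMA_MAX_RETRIES = int(os.environ.get("MEDGEMMA_MAX_RETRIES", "3"))
```

Consumers then write `from trialpath.config import MCP_URL` instead of sprinkling `os.environ.get()` calls through service code.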
### Don't flip-flop on implementation decisions
- `max_output_tokens` was added (65536) to fix truncation, then removed to "use defaults", causing regressions
- Inline `os.environ.get()` calls were refactored to config imports, touching 6+ files each time
- LESSON: make the decision ONCE with reasoning, document it, and stick with it
### Remember the project's fallback chain
- Pipeline has a 3-tier fallback: Parlant → direct API (`direct_pipeline.py`) → mock data
- Demo mode bypasses file upload and loads MOCK_PATIENT_PROFILE directly
- Don't re-implement fallback logic; it already exists in `direct_pipeline.py`
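For orientation only, the 3-tier chain behaves roughly like this sketch; the function names are hypothetical, and the real logic lives in `direct_pipeline.py`:

```python
def run_pipeline(query, parlant, direct_api, mock_data):
    """Illustrative 3-tier fallback: Parlant -> direct API -> mock data."""
    for tier in (parlant, direct_api):
        try:
            return tier(query)
        except Exception:
            continue  # this tier is down; fall through to the next one
    return mock_data  # last resort: canned demo data

def down(_query):
    raise ConnectionError("Parlant unreachable")

result = run_pipeline("nsclc egfr", down, lambda q: ["NCT00000001"], ["MOCK"])
print(result)  # ['NCT00000001']
```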
### Read existing code before writing new code
- Service instances were re-created on every call in `agent/tools.py` until a caching fix
- This pattern (wasteful instantiation) could have been caught by reading the code first
- ALWAYS read the file you're about to modify, especially service constructors
### Don't lose track of what's stubbed vs real
- MedGemma: real HF endpoint wired (with retry/cold-start logic)
- Gemini: real API wired (with rate limiting)
- MCP/ClinicalTrials: has both MCP client AND direct API fallback
- Parlant: client ready, agent journey logic NOT yet implemented
- UI: all 5 pages functional with mock data fallback
### Centralize shared state: don't scatter it
- Streamlit state keys: `patient_profile`, `trial_candidates`, `eligibility_ledgers`, `parlant_session_id`, `parlant_session_active`, `last_event_offset`, `journey_state`
- Test fixtures: centralized in `conftest.py` (root level), not per test file
- Mock data: `app/services/mock_data.py` (single file for all mock objects)
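A sketch of idempotent state initialization using these keys; the default values are assumptions, and `st.session_state` (dict-like) stands in for the plain dict used here:

```python
# Defaults are illustrative; the keys match the list above.
STATE_DEFAULTS = {
    "patient_profile": None,
    "trial_candidates": [],
    "eligibility_ledgers": [],
    "parlant_session_id": None,
    "parlant_session_active": False,
    "last_event_offset": 0,
    "journey_state": "upload",
}

def init_state(session_state) -> None:
    """Idempotent init: fill missing keys only, never clobber values from earlier reruns."""
    for key, default in STATE_DEFAULTS.items():
        session_state.setdefault(key, default)

state = {"journey_state": "matching"}  # pretend a rerun preserved this key
init_state(state)
print(state["journey_state"])  # matching
```

Calling `init_state(st.session_state)` at the top of every page keeps key names in one place instead of scattering `if "x" not in st.session_state` checks across files.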