# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

TrialPath is an AI-powered clinical trial matching system for NSCLC (Non-Small Cell Lung Cancer) patients. Currently in **PoC phase** — models, service stubs, and UI with mock data are implemented; live AI integrations are pending.

**Core idea:** Help patients understand which clinical trials they may qualify for; transform "rejection" into "actionable next steps" via gap analysis.

## Architecture

See `architecture/overview.md` for the full architecture diagram, data flow, component details, and implementation status.

**5 Components**: Streamlit UI → Parlant Orchestrator → MedGemma 4B (extraction) + Gemini 3 Pro (planning) + ClinicalTrials MCP Server (search)

**5 Data Contracts** (Pydantic v2 in `trialpath/models/`): `PatientProfile`, `SearchAnchors`, `TrialCandidate`, `EligibilityLedger`, `SearchLog`

## Project Structure

```
trialpath/        # Backend module
  models/         # 5 Pydantic v2 data contracts (implemented)
  services/       # 4 service stubs: medgemma, gemini, mcp, parlant
  agent/          # Parlant journey logic (not yet implemented)
  tests/          # Backend TDD tests (37+ model, 33 service)
app/              # Streamlit frontend
  pages/          # 5-page journey (upload → profile → matching → gaps → summary)
  components/     # 6 reusable widgets
  services/       # State manager, parlant client, mock data
  tests/          # Frontend TDD tests (30+ component, 5 page)
tests/            # Integration tests (18 tests)
architecture/     # Architecture documentation
docs/             # Design docs and TDD guides
```

## Documents

- `docs/Trialpath PRD.md` — Product requirements, success metrics, HAI-DEF submission plan
- `docs/TrialPath AI technical design.md` — Technical architecture, data contracts, Parlant workflow
- `docs/tdd-guide-*.md` — TDD implementation guides (backend, frontend, data/eval)
- `architecture/overview.md` — Architecture overview, data flow, component status

## Tech Stack

- Python 3.11+ (Streamlit + Pydantic v2)
- 
Google Gemini 3 Pro (orchestration) — stubbed
- MedGemma 4B via Hugging Face endpoint (multimodal extraction) — stubbed
- Parlant (agentic workflow engine) — client ready, agent pending
- ClinicalTrials MCP Server (ClinicalTrials.gov API v2) — client ready

## Success Targets

- MedGemma Extraction F1 >= 0.85
- Trial Retrieval Recall@50 >= 0.75
- Trial Ranking NDCG@10 >= 0.60
- Criterion Decision Accuracy >= 0.85
- Latency < 15s, Cost < $0.50/session

## Scope

- Disease: NSCLC only
- Data: Synthetic patients only (no real PHI)
- Timeline: 3-month PoC

## Dev tools

- Use the `huggingface` CLI for model deployment
- Use `uv`, `ruff`, and Astral's `ty`
- Use `ripgrep` for exploring the codebase

## Commit atomically

Always commit atomically to build a clear git history for the larger dev team.

## ALWAYS run scripts (bash/tests) in the background

- You MUST always run scripts in the background to unblock the main context window
- When using a timeout, it must be under 1 minute

## Lessons Learned (from past errors)

### Async/Sync: never use asyncio.run() in Streamlit

- Streamlit has its own event loop; `asyncio.run()` will raise `RuntimeError: This event loop is already running`
- Use `ThreadPoolExecutor` + `asyncio.run()` in a background thread as a sync bridge
- If a method is declared `async`, verify the body actually awaits async I/O — don't wrap sync blocking calls in `async def` without `asyncio.to_thread`

### Mocks must match real implementation

- Before writing test mocks, READ the actual service code first
- Example: the MCP client switched from `client.post()` to `client.stream()` but tests still mocked `.post()` → all tests passed locally but broke on integration
- Always verify mock signatures against the real method being called

### Python import/path conflicts

- Never place an entrypoint file inside a package with the same name (e.g., `app/app.py` inside the `app/` package)
- Streamlit adds parent dirs to `sys.path`, creating ambiguous imports

### Git hygiene

- Always check `.gitignore` 
before committing; never commit `__pycache__/`, `.env`, or binary files
- Use `git diff --staged` to review before every commit

### Test stability

- Centralize mock data in `conftest.py` shared fixtures, not inline per-test
- When data contracts change, update fixtures in ONE place

### Bash output: prefer dedicated tools

- Use Read/Grep/Glob instead of bash pipes for file operations
- Keep bash commands simple and single-purpose; complex piped commands risk misreading output
- Always read the FULL output of bash commands before drawing conclusions

## Cognitive Lessons (avoid repeating these thinking errors)

### Know where configs live — don't re-discover every session

- ALL env vars and defaults: `trialpath/config.py` (single source of truth)
- Key env vars: `GEMINI_API_KEY`, `GEMINI_MODEL` (gemini-3-pro), `HF_TOKEN`, `MEDGEMMA_ENDPOINT_URL`, `MCP_URL` (:3000), `PARLANT_URL` (:8800), `SESSION_COST_BUDGET`
- MedGemma retry settings: `MEDGEMMA_MAX_RETRIES`, `MEDGEMMA_RETRY_BACKOFF`, `MEDGEMMA_MAX_WAIT`, `MEDGEMMA_COLD_START_TIMEOUT`
- `.env` file is gitignored — never commit it again (API keys were leaked once in commit 53efc3c)
- Config consumers: gemini_planner, medgemma_extractor, mcp_client, parlant_bridge, agent/tools, direct_pipeline

### Don't flip-flop on implementation decisions

- `max_output_tokens` was added (65536) to fix truncation, then removed to "use defaults", causing regressions
- Inline `os.environ.get()` calls were refactored to config imports, touching 6+ files each time
- LESSON: Make the decision ONCE with reasoning, document it, stick with it

### Remember the project's fallback chain

- The pipeline has a 3-tier fallback: Parlant → direct API (direct_pipeline.py) → mock data
- Demo mode bypasses file upload and loads MOCK_PATIENT_PROFILE directly
- Don't re-implement fallback logic — it already exists in `direct_pipeline.py`

### Read existing code before writing new code

- Service instances were re-created per call in `agent/tools.py` until the caching fix
- This 
pattern (wasteful instantiation) could have been caught by reading the code first
- ALWAYS read the file you're about to modify, especially service constructors

### Don't lose track of what's stubbed vs real

- MedGemma: real HF endpoint wired (with retry/cold-start logic)
- Gemini: real API wired (with rate limiting)
- MCP/ClinicalTrials: has both MCP client AND direct API fallback
- Parlant: client ready, agent journey logic NOT yet implemented
- UI: all 5 pages functional with mock data fallback

### Centralize shared state — don't scatter it

- Streamlit state keys: `patient_profile`, `trial_candidates`, `eligibility_ledgers`, `parlant_session_id`, `parlant_session_active`, `last_event_offset`, `journey_state`
- Test fixtures: centralized in `conftest.py` (root level), not per-test-file
- Mock data: `app/services/mock_data.py` (single file for all mock objects)
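For orientation, a Pydantic v2 data contract like the five in `trialpath/models/` might be sketched as follows — the field names here are illustrative guesses, not the real `PatientProfile` schema:

```python
from pydantic import BaseModel, Field

class PatientProfile(BaseModel):
    """Illustrative shape only — see trialpath/models/ for the real contract."""
    patient_id: str
    diagnosis: str = "NSCLC"                                  # PoC scope: NSCLC only
    stage: str | None = None
    biomarkers: list[str] = Field(default_factory=list)

# Pydantic v2 API: validate on construction, serialize with model_dump()
profile = PatientProfile(patient_id="demo-001", stage="IIIB",
                         biomarkers=["EGFR exon 19 del"])
```

Tests should build instances through shared fixtures rather than repeating literal dicts like this one.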
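The sync bridge described under "Async/Sync" above can be sketched like this — `run_async` and `_demo` are hypothetical names, not the project's actual helper:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# asyncio.run() on a worker thread gets a fresh event loop instead of
# colliding with Streamlit's already-running one.
_bridge = ThreadPoolExecutor(max_workers=1)

def run_async(coro, timeout: float = 30.0):
    """Block the calling (sync) thread until the coroutine finishes."""
    return _bridge.submit(asyncio.run, coro).result(timeout=timeout)

async def _demo() -> str:
    await asyncio.sleep(0)  # stands in for real async I/O
    return "ok"

result = run_async(_demo())  # safe to call from Streamlit callbacks
```

Calling `asyncio.run(_demo())` directly inside a Streamlit script is what raises the `RuntimeError`; the detour through a worker thread is the whole fix.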
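One way to enforce the "mocks must match real implementation" lesson is autospeccing, so a stale mock of `.post()` fails the moment it is touched — `MCPClient` below is a stand-in class, not TrialPath's real client:

```python
from unittest.mock import create_autospec

class MCPClient:
    """Stand-in for the real client after the switch to streaming."""
    def stream(self, url: str):
        ...  # real API: stream(), not post()

# An autospecced mock only exposes attributes the real class has.
mock_client = create_autospec(MCPClient, instance=True)
mock_client.stream("http://example.test")      # fine
try:
    mock_client.post("http://example.test")    # no such method on the spec
    stale_call_rejected = False
except AttributeError:
    stale_call_rejected = True
```

Plain `Mock()` would have happily accepted `.post()` and hidden the drift until integration.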
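The "centralize mock data in `conftest.py`" rule might look like the sketch below — a single factory behind a fixture, so a contract change is fixed in one place (names and fields are illustrative):

```python
import pytest

def make_patient_profile() -> dict:
    """Single source of truth for mock patient data in tests."""
    return {"patient_id": "demo-001", "diagnosis": "NSCLC", "stage": "IIIB"}

@pytest.fixture
def patient_profile_data() -> dict:
    # Tests request `patient_profile_data` as an argument; when the data
    # contract changes, only make_patient_profile() needs updating.
    return make_patient_profile()
```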
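The single-source-of-truth pattern in `trialpath/config.py` presumably reads every env var once with a default — a sketch, where only the values the sections above state (`gemini-3-pro`, ports 3000/8800, the $0.50 budget) are grounded and the rest is assumed:

```python
import os

# Every env var and default lives here; other modules import from config,
# never call os.environ.get() inline.
GEMINI_MODEL = os.environ.get("GEMINI_MODEL", "gemini-3-pro")
MCP_URL = os.environ.get("MCP_URL", "http://localhost:3000")
PARLANT_URL = os.environ.get("PARLANT_URL", "http://localhost:8800")
SESSION_COST_BUDGET = float(os.environ.get("SESSION_COST_BUDGET", "0.50"))
```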
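The caching fix for per-call service instantiation in `agent/tools.py` can be sketched with `functools.lru_cache` — `GeminiPlanner` and `get_planner` are illustrative stand-ins, not the project's real names:

```python
from functools import lru_cache

class GeminiPlanner:
    """Stand-in for a service whose constructor is expensive."""
    instances = 0
    def __init__(self) -> None:
        GeminiPlanner.instances += 1  # count constructions for the demo

@lru_cache(maxsize=1)
def get_planner() -> GeminiPlanner:
    # Tool functions call get_planner() instead of GeminiPlanner(),
    # so the client is built once per process, not once per call.
    return GeminiPlanner()

for _ in range(3):
    get_planner()
```

Reading the constructor before calling it per-request is exactly the "read existing code first" habit this lesson is about.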