# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

TrialPath is an AI-powered clinical trial matching system for NSCLC (Non-Small Cell Lung Cancer) patients. Currently in **PoC phase** — models, service stubs, and UI with mock data are implemented; live AI integrations are pending.

**Core idea:** Help patients understand which clinical trials they may qualify for; transform "rejection" into "actionable next steps" via gap analysis.

## Architecture

See `architecture/overview.md` for the full architecture diagram, data flow, component details, and implementation status.

**5 Components**: Streamlit UI → Parlant Orchestrator → MedGemma 4B (extraction) + Gemini 3 Pro (planning) + ClinicalTrials MCP Server (search)

**5 Data Contracts** (Pydantic v2 in `trialpath/models/`): `PatientProfile`, `SearchAnchors`, `TrialCandidate`, `EligibilityLedger`, `SearchLog`

## Project Structure

```
trialpath/        # Backend module
  models/         # 5 Pydantic v2 data contracts (implemented)
  services/       # 4 service stubs: medgemma, gemini, mcp, parlant
  agent/          # Parlant journey logic (not yet implemented)
  tests/          # Backend TDD tests (37+ model, 33 service)
app/              # Streamlit frontend
  pages/          # 5-page journey (upload → profile → matching → gaps → summary)
  components/     # 6 reusable widgets
  services/       # State manager, parlant client, mock data
  tests/          # Frontend TDD tests (30+ component, 5 page)
tests/            # Integration tests (18 tests)
architecture/     # Architecture documentation
docs/             # Design docs and TDD guides
```

## Documents

- `docs/Trialpath PRD.md` — Product requirements, success metrics, HAI-DEF submission plan
- `docs/TrialPath AI technical design.md` — Technical architecture, data contracts, Parlant workflow
- `docs/tdd-guide-*.md` — TDD implementation guides (backend, frontend, data/eval)
- `architecture/overview.md` — Architecture overview, data flow, component status

## Tech Stack

- Python 3.11+ (Streamlit + Pydantic v2)
- 
Google Gemini 3 Pro (orchestration) — stubbed
- MedGemma 4B via Hugging Face endpoint (multimodal extraction) — stubbed
- Parlant (agentic workflow engine) — client ready, agent pending
- ClinicalTrials MCP Server (ClinicalTrials.gov API v2) — client ready

## Success Targets

- MedGemma Extraction F1 >= 0.85
- Trial Retrieval Recall@50 >= 0.75
- Trial Ranking NDCG@10 >= 0.60
- Criterion Decision Accuracy >= 0.85
- Latency < 15s, Cost < $0.50/session

## Scope

- Disease: NSCLC only
- Data: Synthetic patients only (no real PHI)
- Timeline: 3-month PoC

## Dev tools

- Use the `huggingface` CLI for model deployment
- Use `uv`, `ruff`, and Astral's `ty`
- Use `ripgrep` for exploring the codebase

## Commit atomically

Always commit atomically to build a clear git history for the larger dev team.

## ALWAYS run scripts (bash/tests) in the background

- You MUST always run scripts in the background to unblock the main context window
- When using a timeout, it must be under 1 minute

## Lessons Learned (from past errors)

### Async/Sync: never use asyncio.run() in Streamlit

- Streamlit has its own event loop; `asyncio.run()` will raise `RuntimeError: This event loop is already running`
- Use `ThreadPoolExecutor` + `asyncio.run()` in a background thread as a sync bridge
- If a method is declared `async`, verify the body actually awaits async I/O — don't wrap sync blocking calls in `async def` without `asyncio.to_thread`

### Mocks must match real implementation

- Before writing test mocks, READ the actual service code first
- Example: the MCP client switched from `client.post()` to `client.stream()` but tests still mocked `.post()` → all tests passed locally but broke on integration
- Always verify mock signatures against the real method being called

### Python import/path conflicts

- Never place an entrypoint file inside a package with the same name (e.g., `app/app.py` inside the `app/` package)
- Streamlit adds parent dirs to `sys.path`, creating ambiguous imports

### Git hygiene

- Always check `.gitignore` 
before committing; never commit `__pycache__/`, `.env`, or binary files
- Use `git diff --staged` to review before every commit

### Test stability

- Centralize mock data in `conftest.py` shared fixtures, not inline per-test
- When data contracts change, update fixtures in ONE place

### Bash output: prefer dedicated tools

- Use Read/Grep/Glob instead of bash pipes for file operations
- Keep bash commands simple and single-purpose; complex piped commands risk misreading output
- Always read the FULL output of bash commands before drawing conclusions

## Cognitive Lessons (avoid repeating these thinking errors)

### Know where configs live — don't re-discover every session

- ALL env vars and defaults: `trialpath/config.py` (single source of truth)
- Key env vars: `GEMINI_API_KEY`, `GEMINI_MODEL` (gemini-3-pro), `HF_TOKEN`, `MEDGEMMA_ENDPOINT_URL`, `MCP_URL` (:3000), `PARLANT_URL` (:8800), `SESSION_COST_BUDGET`
- MedGemma retry settings: `MEDGEMMA_MAX_RETRIES`, `MEDGEMMA_RETRY_BACKOFF`, `MEDGEMMA_MAX_WAIT`, `MEDGEMMA_COLD_START_TIMEOUT`
- `.env` file is gitignored — never commit it again (API keys were leaked once in commit 53efc3c)
- Config consumers: gemini_planner, medgemma_extractor, mcp_client, parlant_bridge, agent/tools, direct_pipeline

### Don't flip-flop on implementation decisions

- `max_output_tokens` was added (65536) to fix truncation, then removed to "use defaults", causing regressions
- Inline `os.environ.get()` calls were refactored to config imports, touching 6+ files each time
- LESSON: Make the decision ONCE with reasoning, document it, stick with it

### Remember the project's fallback chain

- The pipeline has a 3-tier fallback: Parlant → direct API (direct_pipeline.py) → mock data
- Demo mode bypasses file upload and loads MOCK_PATIENT_PROFILE directly
- Don't re-implement fallback logic — it already exists in `direct_pipeline.py`

### Read existing code before writing new code

- Service instances were re-created per call in `agent/tools.py` until the caching fix
- This 
pattern (wasteful instantiation) could have been caught by reading the code first
- ALWAYS read the file you're about to modify, especially service constructors

### Don't lose track of what's stubbed vs real

- MedGemma: real HF endpoint wired (with retry/cold-start logic)
- Gemini: real API wired (with rate limiting)
- MCP/ClinicalTrials: has both MCP client AND direct API fallback
- Parlant: client ready, agent journey logic NOT yet implemented
- UI: all 5 pages functional with mock data fallback

### Centralize shared state — don't scatter it

- Streamlit state keys: `patient_profile`, `trial_candidates`, `eligibility_ledgers`, `parlant_session_id`, `parlant_session_active`, `last_event_offset`, `journey_state`
- Test fixtures: centralized in `conftest.py` (root level), not per-test-file
- Mock data: `app/services/mock_data.py` (single file for all mock objects)
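For orientation, a Pydantic v2 data contract like the five in `trialpath/models/` might be sketched as follows — the field names here are illustrative guesses, not the real `PatientProfile` schema:

```python
from pydantic import BaseModel, Field

class PatientProfile(BaseModel):
    """Illustrative shape only — see trialpath/models/ for the real contract."""
    patient_id: str
    diagnosis: str = "NSCLC"                                  # PoC scope: NSCLC only
    stage: str | None = None
    biomarkers: list[str] = Field(default_factory=list)

# Pydantic v2 API: validate on construction, serialize with model_dump()
profile = PatientProfile(patient_id="demo-001", stage="IIIB",
                         biomarkers=["EGFR exon 19 del"])
```

Tests should build instances through shared fixtures rather than repeating literal dicts like this one.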
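The sync bridge described under "Async/Sync" above can be sketched like this — `run_async` and `_demo` are hypothetical names, not the project's actual helper:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# asyncio.run() on a worker thread gets a fresh event loop instead of
# colliding with Streamlit's already-running one.
_bridge = ThreadPoolExecutor(max_workers=1)

def run_async(coro, timeout: float = 30.0):
    """Block the calling (sync) thread until the coroutine finishes."""
    return _bridge.submit(asyncio.run, coro).result(timeout=timeout)

async def _demo() -> str:
    await asyncio.sleep(0)  # stands in for real async I/O
    return "ok"

result = run_async(_demo())  # safe to call from Streamlit callbacks
```

Calling `asyncio.run(_demo())` directly inside a Streamlit script is what raises the `RuntimeError`; the detour through a worker thread is the whole fix.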
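One way to enforce the "mocks must match real implementation" lesson is autospeccing, so a stale mock of `.post()` fails the moment it is touched — `MCPClient` below is a stand-in class, not TrialPath's real client:

```python
from unittest.mock import create_autospec

class MCPClient:
    """Stand-in for the real client after the switch to streaming."""
    def stream(self, url: str):
        ...  # real API: stream(), not post()

# An autospecced mock only exposes attributes the real class has.
mock_client = create_autospec(MCPClient, instance=True)
mock_client.stream("http://example.test")      # fine
try:
    mock_client.post("http://example.test")    # no such method on the spec
    stale_call_rejected = False
except AttributeError:
    stale_call_rejected = True
```

Plain `Mock()` would have happily accepted `.post()` and hidden the drift until integration.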
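The "centralize mock data in `conftest.py`" rule might look like the sketch below — a single factory behind a fixture, so a contract change is fixed in one place (names and fields are illustrative):

```python
import pytest

def make_patient_profile() -> dict:
    """Single source of truth for mock patient data in tests."""
    return {"patient_id": "demo-001", "diagnosis": "NSCLC", "stage": "IIIB"}

@pytest.fixture
def patient_profile_data() -> dict:
    # Tests request `patient_profile_data` as an argument; when the data
    # contract changes, only make_patient_profile() needs updating.
    return make_patient_profile()
```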
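The single-source-of-truth pattern in `trialpath/config.py` presumably reads every env var once with a default — a sketch, where only the values the sections above state (`gemini-3-pro`, ports 3000/8800, the $0.50 budget) are grounded and the rest is assumed:

```python
import os

# Every env var and default lives here; other modules import from config,
# never call os.environ.get() inline.
GEMINI_MODEL = os.environ.get("GEMINI_MODEL", "gemini-3-pro")
MCP_URL = os.environ.get("MCP_URL", "http://localhost:3000")
PARLANT_URL = os.environ.get("PARLANT_URL", "http://localhost:8800")
SESSION_COST_BUDGET = float(os.environ.get("SESSION_COST_BUDGET", "0.50"))
```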
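The caching fix for per-call service instantiation in `agent/tools.py` can be sketched with `functools.lru_cache` — `GeminiPlanner` and `get_planner` are illustrative stand-ins, not the project's real names:

```python
from functools import lru_cache

class GeminiPlanner:
    """Stand-in for a service whose constructor is expensive."""
    instances = 0
    def __init__(self) -> None:
        GeminiPlanner.instances += 1  # count constructions for the demo

@lru_cache(maxsize=1)
def get_planner() -> GeminiPlanner:
    # Tool functions call get_planner() instead of GeminiPlanner(),
    # so the client is built once per process, not once per call.
    return GeminiPlanner()

for _ in range(3):
    get_planner()
```

Reading the constructor before calling it per-request is exactly the "read existing code first" habit this lesson is about.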