
CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

TrialPath is an AI-powered clinical trial matching system for NSCLC (Non-Small Cell Lung Cancer) patients. Currently in PoC phase — models, service stubs, and UI with mock data are implemented; live AI integrations are pending.

Core idea: help patients understand which clinical trials they may qualify for, and transform "rejection" into "actionable next steps" via gap analysis.

Architecture

See architecture/overview.md for full architecture diagram, data flow, component details, and implementation status.

5 Components: Streamlit UI → Parlant Orchestrator → MedGemma 4B (extraction) + Gemini 3 Pro (planning) + ClinicalTrials MCP Server (search)

5 Data Contracts (Pydantic v2 in trialpath/models/): PatientProfile, SearchAnchors, TrialCandidate, EligibilityLedger, SearchLog

Project Structure

trialpath/                  # Backend module
  models/                   # 5 Pydantic v2 data contracts (implemented)
  services/                 # 4 service stubs: medgemma, gemini, mcp, parlant
  agent/                    # Parlant journey logic (not yet implemented)
  tests/                    # Backend TDD tests (37+ model, 33 service)
app/                        # Streamlit frontend
  pages/                    # 5-page journey (upload → profile → matching → gaps → summary)
  components/               # 6 reusable widgets
  services/                 # State manager, parlant client, mock data
  tests/                    # Frontend TDD tests (30+ component, 5 page)
tests/                      # Integration tests (18 tests)
architecture/               # Architecture documentation
docs/                       # Design docs and TDD guides

Documents

  • docs/Trialpath PRD.md — Product requirements, success metrics, HAI-DEF submission plan
  • docs/TrialPath AI technical design.md — Technical architecture, data contracts, Parlant workflow
  • docs/tdd-guide-*.md — TDD implementation guides (backend, frontend, data/eval)
  • architecture/overview.md — Architecture overview, data flow, component status

Tech Stack

  • Python 3.11+ (Streamlit + Pydantic v2)
  • Google Gemini 3 Pro (orchestration) — stubbed
  • MedGemma 4B via Hugging Face endpoint (multimodal extraction) — stubbed
  • Parlant (agentic workflow engine) — client ready, agent pending
  • ClinicalTrials MCP Server (ClinicalTrials.gov API v2) — client ready

Success Targets

  • MedGemma Extraction F1 >= 0.85
  • Trial Retrieval Recall@50 >= 0.75
  • Trial Ranking NDCG@10 >= 0.60
  • Criterion Decision Accuracy >= 0.85
  • Latency < 15s, Cost < $0.50/session

Scope

  • Disease: NSCLC only
  • Data: Synthetic patients only (no real PHI)
  • Timeline: 3-month PoC

Dev tools

  • Use the Hugging Face CLI for model deployment
  • Use uv, ruff, and astral ty
  • Use ripgrep for exploring the codebase

Commit atomically

Always commit atomically to build a clear git history for the larger dev team

ALWAYS run scripts (bash/tests) in the background

  • You MUST always run scripts in the background to unblock the main context window
  • When using timeout, it must be under 1 minute

Lessons Learned (from past errors)

Async/Sync: never use asyncio.run() in Streamlit

  • Streamlit runs its own event loop; calling asyncio.run() raises RuntimeError: This event loop is already running
  • Use ThreadPoolExecutor + asyncio.run in a background thread as a sync bridge
  • If a method is declared async, verify the body actually awaits async I/O — don't wrap sync blocking calls in async def without asyncio.to_thread
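
The sync bridge above can be sketched as follows. This is a minimal illustration, not the project's actual bridge code; fetch_trials is a hypothetical async service call standing in for a real client method.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical async service call; real code would await network I/O here.
async def fetch_trials(query: str) -> list[str]:
    await asyncio.sleep(0)
    return [f"NCT-{query}"]

_executor = ThreadPoolExecutor(max_workers=1)

def run_async(coro):
    """Sync bridge: run a coroutine on a worker thread's own event loop.

    Safe to call from Streamlit, where the main thread's loop is already
    running and a direct asyncio.run() would raise RuntimeError.
    """
    return _executor.submit(asyncio.run, coro).result()

result = run_async(fetch_trials("NSCLC"))
```

Because asyncio.run executes on the worker thread, it creates a fresh event loop there and never collides with Streamlit's.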

Mocks must match real implementation

  • Before writing test mocks, READ the actual service code first
  • Example: MCP client switched from client.post() to client.stream() but tests still mocked .post() → all tests passed locally but broke on integration
  • Always verify mock signatures against the real method being called
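
One way to enforce this is unittest.mock's autospec, which makes a mock reject attributes the real class lacks. The MCPClient below is a hypothetical stand-in mirroring the real client's surface (stream(), not post()):

```python
from unittest import mock

# Hypothetical client mirroring the real MCP client's surface.
class MCPClient:
    def stream(self, payload: dict) -> dict:
        raise NotImplementedError("network call")

# create_autospec makes the mock reject attributes the real class lacks,
# so a stale test that still mocks .post() fails loudly instead of
# passing silently.
client = mock.create_autospec(MCPClient, instance=True)
client.stream.return_value = {"studies": []}

assert client.stream({"condition": "NSCLC"}) == {"studies": []}

try:
    client.post({"condition": "NSCLC"})  # not on the real class
    stale_mock_caught = False
except AttributeError:
    stale_mock_caught = True
```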

Python import/path conflicts

  • Never place an entrypoint file inside a package with the same name (e.g., app/app.py inside app/ package)
  • Streamlit adds parent dirs to sys.path, creating ambiguous imports

Git hygiene

  • Always check .gitignore before committing; never commit __pycache__/, .env, or binary files
  • Use git diff --staged to review before every commit

Test stability

  • Centralize mock data in conftest.py shared fixtures, not inline per-test
  • When data contracts change, update fixtures in ONE place
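
A conftest.py following this rule might look like the sketch below. Field names and defaults are illustrative, not the real PatientProfile contract; the factory lets individual tests override fields without duplicating the base data.

```python
# conftest.py (root level) — single home for shared mock data.
import pytest

def make_patient_profile(**overrides) -> dict:
    """Factory for mock PatientProfile data; fields are illustrative."""
    profile = {
        "diagnosis": "NSCLC",
        "stage": "IIIA",
        "ecog": 1,
    }
    profile.update(overrides)
    return profile

@pytest.fixture
def patient_profile() -> dict:
    # Tests take `patient_profile` as an argument instead of inlining
    # data; when the data contract changes, only this file is touched.
    return make_patient_profile()
```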

Bash output: prefer dedicated tools

  • Use Read/Grep/Glob instead of bash pipes for file operations
  • Keep bash commands simple and single-purpose; complex piped commands risk misreading output
  • Always read the FULL output of bash commands before drawing conclusions

Cognitive Lessons (avoid repeating these thinking errors)

Know where configs live β€” don't re-discover every session

  • ALL env vars and defaults: trialpath/config.py (single source of truth)
  • Key env vars: GEMINI_API_KEY, GEMINI_MODEL (gemini-3-pro), HF_TOKEN, MEDGEMMA_ENDPOINT_URL, MCP_URL (:3000), PARLANT_URL (:8800), SESSION_COST_BUDGET
  • MedGemma retry settings: MEDGEMMA_MAX_RETRIES, MEDGEMMA_RETRY_BACKOFF, MEDGEMMA_MAX_WAIT, MEDGEMMA_COLD_START_TIMEOUT
  • .env file is gitignored — never commit it again (API keys were leaked once in commit 53efc3c)
  • Config consumers: gemini_planner, medgemma_extractor, mcp_client, parlant_bridge, agent/tools, direct_pipeline
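
The single-source-of-truth pattern looks roughly like this sketch. Names mirror the env vars listed above, but the defaults here are assumptions, not the real values in trialpath/config.py:

```python
# trialpath/config.py (illustrative sketch) — every env var is read in
# exactly one place; consumers import these names instead of calling
# os.environ.get() inline.
import os

GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY", "")
GEMINI_MODEL = os.environ.get("GEMINI_MODEL", "gemini-3-pro")
MCP_URL = os.environ.get("MCP_URL", "http://localhost:3000")
PARLANT_URL = os.environ.get("PARLANT_URL", "http://localhost:8800")
# Defaults below are assumptions for illustration only.
SESSION_COST_BUDGET = float(os.environ.get("SESSION_COST_BUDGET", "0.50"))
MEDGEMMA_MAX_RETRIES = int(os.environ.get("MEDGEMMA_MAX_RETRIES", "3"))
```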

Don't flip-flop on implementation decisions

  • max_output_tokens was added (65536) to fix truncation, then removed to "use defaults", causing regressions
  • os.environ.get() inline was refactored to config imports, touching 6+ files each time
  • LESSON: Make the decision ONCE with reasoning, document it, stick with it

Remember the project's fallback chain

  • Pipeline has 3-tier fallback: Parlant → direct API (direct_pipeline.py) → mock data
  • Demo mode bypasses file upload and loads MOCK_PATIENT_PROFILE directly
  • Don't re-implement fallback logic β€” it already exists in direct_pipeline.py
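
The 3-tier fallback can be sketched as below. Function names here are placeholders for illustration, not the real direct_pipeline.py API; the two failing tiers simulate Parlant and the direct API being unavailable.

```python
def run_matching(profile: dict) -> tuple[str, list]:
    """Try Parlant, then the direct API pipeline, then mock data."""
    for tier, runner in [
        ("parlant", call_parlant),
        ("direct", call_direct_pipeline),
        ("mock", load_mock_candidates),
    ]:
        try:
            return tier, runner(profile)
        except Exception:
            continue  # fall through to the next tier
    raise RuntimeError("all tiers failed")

def call_parlant(profile):  # placeholder: pretend Parlant is down
    raise ConnectionError("parlant unreachable")

def call_direct_pipeline(profile):  # placeholder: pretend API errors out
    raise TimeoutError("direct API timed out")

def load_mock_candidates(profile):  # final tier always succeeds
    return [{"nct_id": "NCT00000000", "title": "Mock trial"}]

tier, candidates = run_matching({"diagnosis": "NSCLC"})
```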

Read existing code before writing new code

  • Service instances were re-created per call in agent/tools.py until caching fix
  • This pattern (wasteful instantiation) could have been caught by reading the code first
  • ALWAYS read the file you're about to modify, especially service constructors
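
The caching fix for per-call instantiation can be as simple as an lru_cache accessor. The service class below is a hypothetical stand-in; the counter exists only to demonstrate that construction happens once.

```python
from functools import lru_cache

class MCPClient:
    instances_created = 0  # counter just to demonstrate caching

    def __init__(self) -> None:
        MCPClient.instances_created += 1

@lru_cache(maxsize=None)
def get_mcp_client() -> MCPClient:
    # First call constructs the client; later calls return the same
    # instance instead of re-creating it per tool invocation.
    return MCPClient()

a = get_mcp_client()
b = get_mcp_client()
```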

Don't lose track of what's stubbed vs real

  • MedGemma: real HF endpoint wired (with retry/cold-start logic)
  • Gemini: real API wired (with rate limiting)
  • MCP/ClinicalTrials: has both MCP client AND direct API fallback
  • Parlant: client ready, agent journey logic NOT yet implemented
  • UI: all 5 pages functional with mock data fallback

Centralize shared state β€” don't scatter it

  • Streamlit state keys: patient_profile, trial_candidates, eligibility_ledgers, parlant_session_id, parlant_session_active, last_event_offset, journey_state
  • Test fixtures: centralized in conftest.py (root level), not per-test-file
  • Mock data: app/services/mock_data.py (single file for all mock objects)
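
Centralized state initialization might look like this sketch. The defaults are assumptions, and `session_state` here is a plain dict standing in for st.session_state so the example runs without Streamlit installed; setdefault keeps any value a page has already written.

```python
# One place that seeds every known session-state key exactly once.
STATE_DEFAULTS = {
    "patient_profile": None,
    "trial_candidates": [],
    "eligibility_ledgers": [],
    "parlant_session_id": None,
    "parlant_session_active": False,
    "last_event_offset": 0,
    "journey_state": "upload",  # assumed initial value
}

def init_state(session_state: dict) -> None:
    """Seed missing keys without clobbering existing values."""
    for key, default in STATE_DEFAULTS.items():
        session_state.setdefault(key, default)

# Stand-in for st.session_state with one pre-existing value.
session_state: dict = {"journey_state": "matching"}
init_state(session_state)
```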