# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**FDAM AI Pipeline** - Fire Damage Assessment Methodology v4.0.1 implementation. An AI-powered system that generates professional Cleaning Specifications / Scope of Work documents for fire damage restoration.

- **Deployment**: HuggingFace Spaces with Nvidia L4 (22GB VRAM per GPU, single GPU used)
- **Local Dev**: RTX 4090 (24GB) - can run the 4B model; use mock models for faster iteration
- **Spec Document**: `FDAM_AI_Pipeline_Technical_Spec.md` is the authoritative technical reference

## Critical Constraints

1. **No External API Calls** - 100% locally-owned models only (no Claude/OpenAI APIs)
2. **Memory Budget** - Single L4 (22GB): ~10GB vision (4B) + ~4GB embedding + ~4GB reranker (~18GB used, ~4GB headroom)
3. **Processing Time** - 60-90 seconds per assessment is acceptable
4. **MVP Scope** - Phase 1 (PRE) and Phase 2 (PRA) only; no lab results processing yet
5. **Static RAG** - Knowledge base is pre-indexed; no user document uploads

## Tech Stack

| Component | Technology |
|-----------|------------|
| UI Framework | Gradio 6.x |
| Vision | Qwen/Qwen3-VL-4B-Thinking (via vLLM, single GPU) |
| Embeddings | Qwen/Qwen3-VL-Embedding-2B (2048-dim) |
| Reranker | Qwen/Qwen3-VL-Reranker-2B |
| Inference | vLLM (single GPU, no tensor parallelism) |
| Vector Store | ChromaDB 0.4.x |
| Validation | Pydantic 2.x |
| PDF Generation | Pandoc 3.x |
| Package Manager | pip + requirements.txt |

## UI Components (Gradio 6.x)

**Simplified 2-Tab UI:** Input + Results/Chat. Single-room workflow with integrated chat for Q&A and document modifications.
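The memory budget in Critical Constraints above is simple arithmetic worth keeping honest in code; a minimal sanity check (the per-model figures are this document's planning estimates, not measured values):

```python
# Sanity-check the single-L4 VRAM budget from "Critical Constraints".
# All figures are planning estimates from this document, not measurements.
BUDGET_GB = 22  # usable VRAM on the HuggingFace Spaces L4
PLANNED_GB = {
    "vision_4b_bf16": 10,
    "embedding_2b": 4,
    "reranker_2b": 4,
}

used = sum(PLANNED_GB.values())
headroom = BUDGET_GB - used  # left over for KV cache and overhead
print(f"used={used}GB headroom={headroom}GB")
```

If a model swap changes any of these figures, re-run the arithmetic before deploying; the pipeline assumes at least ~4GB of headroom for the vLLM KV cache.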
### Tab 1: Input

Uses `gr.Accordion` for collapsible sections:

- **Room Details** (open by default): Name, dimensions, ceiling height, facility classification, construction era
- **Images** (open by default): Multi-file upload, gallery preview, image count
- **Field Observations** (collapsed by default): 15 qualitative observation fields

### Tab 2: Results + Chat

- **Results Display**: Annotated gallery, assessment stats (JSON), SOW document (markdown)
- **Downloads**: Markdown and PDF export
- **Chat Interface**: Q&A about results, document modifications via `gr.Chatbot(type="messages")`
- **Quick Actions**: Pre-defined buttons for common queries

The frontend uses optimized input components:

| Field | Component | Notes |
|-------|-----------|-------|
| Room Name | `gr.Textbox` | Required field |
| Dimensions | `gr.Number` | Length, Width in feet |
| Ceiling Height | `gr.Dropdown` + custom option | 8-20 ft presets |
| Facility Classification | `gr.Radio` | operational, non-operational, public-childcare |
| Construction Era | `gr.Radio` | pre-1980, 1980-2000, post-2000 |
| Image Upload | `gr.Files(file_count="multiple")` | Batch upload, auto-assigned to room |
| Chat | `gr.Chatbot(type="messages")` | Gradio 6 messages format |

**Keyboard Shortcuts:**

- `Ctrl+1`: Navigate to Input tab
- `Ctrl+2`: Navigate to Results tab

## Development Commands

```sh
# Install dependencies
pip install -r requirements.txt

# Run locally with mock models
MOCK_MODELS=true python app.py

# Run with real models (requires ~18GB VRAM, e.g. L4 on HuggingFace or RTX 4090 locally)
python app.py

# Recommended tooling (install as dev dependencies)
ruff check .    # Linting
ruff format .   # Formatting
mypy .          # Type checking

# Note: Tests removed - testing occurs on HuggingFace due to GPU/ChromaDB requirements
```

## Architecture

### 6-Stage Processing Pipeline

1. **Input Validation** - Pydantic schema validation (schemas/input.py)
2. **Vision Analysis** - Per-image zone/material/condition detection (pipeline/vision.py)
3.
**RAG Retrieval** - Disposition lookup, thresholds, methods (rag/retriever.py)
4. **FDAM Logic** - Disposition matrix application (pipeline/main.py)
5. **Calculations** - Surface areas, ACH, labor estimates (pipeline/calculations.py)
6. **Document Generation** - SOW, sampling plan, confidence report (pipeline/generator.py)

### Target Project Structure

```
├── app.py                  # Gradio entry point
├── config/                 # Inference and app settings
├── models/                 # Model loading (mock vs real)
├── rag/                    # Chunking, vectorstore, retrieval
├── schemas/                # Pydantic input/output models
├── pipeline/               # Main processing logic + chat handler
│   └── chat.py             # Chat handler for Q&A and document mods
├── ui/                     # Gradio UI components
│   └── tabs/               # Tab modules
│       ├── input_tab.py    # Combined input (room + images + observations)
│       └── results_tab.py  # Results display + chat interface
├── RAG-KB/                 # Knowledge base source files
├── chroma_db/              # ChromaDB persistence (generated)
└── sample_images/          # Sample fire damage images for testing
```

## Domain Knowledge

### Zone Classifications

- **Burn Zone**: Direct fire involvement, structural char, exposed/damaged elements
- **Near-Field**: Adjacent to burn zone, heavy smoke/heat exposure, visible contamination
- **Far-Field**: Smoke migration only, light deposits, no structural damage

### Condition Levels

- **Background**: No visible contamination
- **Light**: Faint discoloration, minimal deposits
- **Moderate**: Visible film/deposits, surface color altered
- **Heavy**: Thick deposits, surface texture obscured
- **Structural Damage**: Physical damage requiring repair before cleaning

### Dispositions (FDAM §4.3)

- **No Action**: Document only
- **Clean**: Standard cleaning protocol
- **Evaluate**: Requires professional judgment
- **Remove**: Material must be removed
- **Remove/Repair**: Remove and repair/replace

### Facility Classifications (affects thresholds)

- **Operational**: Active workplace (higher thresholds: 500 µg/100cm² lead)
- **Non-Operational**: Unoccupied
(lower thresholds: 22 µg/100cm² lead)
- **Public/Childcare**: Most stringent (EPA/HUD Oct 2024: 0.54 µg/100cm² floors)

### Key Calculations

- **ACH Formula**: `Units = (Volume × 4) / (CFM × 60)` per NADCA ACR 2021 — air scrubber units for 4 air changes/hour, with room volume in ft³ and CFM rated per unit
- **Sample Density**: Varies by area size per FDAM §2.3
- **Ceiling Deck**: Enhanced sampling (1 per 2,500 SF per FDAM §4.5)

## RAG Knowledge Base

Source documents in `/RAG-KB/`:

- FDAM v4.0.1 methodology (primary reference)
- BNL SOP IH75190 (metals clearance thresholds)
- IICRC/RIA/CIRI Technical Guide (wildfire restoration)
- Lab method guides (PLM, ICP-MS)

**Chunking rules:**

- Keep tables intact (never split markdown tables)
- Preserve headers with content
- Include metadata (source, category, section)

## Confidence Framework

| Score | Level | Action |
|-------|-------|--------|
| ≥90% | Very High | Accept without review |
| 70-89% | High | Accept, note in report |
| 50-69% | Moderate | Flag for human review |
| <50% | Low | Require human verification |

## Model Loading

All 3 models are loaded at startup (~18GB total on single L4 GPU):

```python
import torch
from vllm import LLM

# Vision model via vLLM (single GPU, no tensor parallelism)
vision_model = LLM(
    model="Qwen/Qwen3-VL-4B-Thinking",
    tensor_parallel_size=1,  # Single GPU
    trust_remote_code=True,
    gpu_memory_utilization=0.80,
    max_model_len=16384,
)

# Embedding and Reranker use official Qwen3VL loaders
from scripts.qwen3_vl import Qwen3VLEmbedder, Qwen3VLReranker

embedding_model = Qwen3VLEmbedder("Qwen/Qwen3-VL-Embedding-2B", torch_dtype=torch.bfloat16)
reranker_model = Qwen3VLReranker("Qwen/Qwen3-VL-Reranker-2B", torch_dtype=torch.bfloat16)
```

Expected memory usage (~18GB total on single L4):

- Vision model (4B BF16): ~10GB
- Embedding model (2B): ~4GB
- Reranker model (2B): ~4GB
- Headroom: ~4GB for KV cache and overhead

## Local Development Strategy

The RTX 4090 (24GB VRAM) can run the 4B model stack (~18GB). Two options:

**Option A: Real Models Locally**

1.
Set `MOCK_MODELS=false` (or omit - defaults to false)
2. Models will download and load (~18GB VRAM)
3. Full inference testing locally

**Option B: Mock Models (faster iteration)**

1. Set the `MOCK_MODELS=true` environment variable
2. Mock responses return realistic JSON matching the vision output schema (2048-dim embeddings)
3. Test pipeline logic, UI, and calculations without real inference

**Deployment:**

1. Deploy to HuggingFace Spaces for production testing
2. Request build logs after deployment to confirm success
3. After changing embedding dimensions, rebuild ChromaDB: `python -m rag.index_builder --rebuild`

## Code Style

- Use `Literal["a", "b", "c"]` unions instead of Enum for simple string choices
- Pydantic models for all input/output validation
- Explicit return types on public functions
- Result types or explicit error returns over thrown exceptions
- Group imports: stdlib → third-party → local

## WSL Note

Dev servers must be exposed for WSL access. Use `--host 0.0.0.0` with Gradio:

```python
app.launch(server_name="0.0.0.0", server_port=7860)
```
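The `Literal`-over-Enum rule from Code Style above can be illustrated with the facility-classification thresholds from Domain Knowledge; this is a minimal sketch, and the `FacilityClass` alias, `LEAD_THRESHOLDS` table, and `lead_threshold` helper are hypothetical names, not part of the actual codebase:

```python
from typing import Literal

# Literal union instead of Enum for a simple string choice, per the style guide.
FacilityClass = Literal["operational", "non-operational", "public-childcare"]

# Lead clearance thresholds (µg/100 cm²) from "Facility Classifications" above.
# Illustrative only — the real pipeline retrieves these from the RAG knowledge base.
LEAD_THRESHOLDS: dict[FacilityClass, float] = {
    "operational": 500.0,
    "non-operational": 22.0,
    "public-childcare": 0.54,
}


def lead_threshold(facility: FacilityClass) -> float:
    """Explicit return type on a public function, per the style guide."""
    return LEAD_THRESHOLDS[facility]
```

With `Literal`, mypy rejects `lead_threshold("residential")` at type-check time without the ceremony of an Enum class, and the values serialize as plain strings in Pydantic models.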