# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**FDAM AI Pipeline** - Fire Damage Assessment Methodology v4.0.1 implementation. An AI-powered system that generates professional Cleaning Specifications / Scope of Work documents for fire damage restoration.

- **Deployment**: HuggingFace Spaces with Nvidia L4 (22GB VRAM per GPU, single GPU used)
- **Local Dev**: RTX 4090 (24GB) - can run the 4B model; use mock models for faster iteration
- **Spec Document**: `FDAM_AI_Pipeline_Technical_Spec.md` is the authoritative technical reference

## Critical Constraints

1. **No External API Calls** - 100% locally-owned models only (no Claude/OpenAI APIs)
2. **Memory Budget** - Single L4 (22GB): ~10GB vision (4B) + ~4GB embedding + ~4GB reranker (~18GB used, ~4GB headroom)
3. **Processing Time** - 60-90 seconds per assessment is acceptable
4. **MVP Scope** - Phase 1 (PRE) and Phase 2 (PRA) only; no lab results processing yet
5. **Static RAG** - Knowledge base is pre-indexed; no user document uploads

## Tech Stack

| Component | Technology |
|-----------|------------|
| UI Framework | Gradio 6.x |
| Vision | Qwen/Qwen3-VL-4B-Thinking (via vLLM, single GPU) |
| Embeddings | Qwen/Qwen3-VL-Embedding-2B (2048-dim) |
| Reranker | Qwen/Qwen3-VL-Reranker-2B |
| Inference | vLLM (single GPU, no tensor parallelism) |
| Vector Store | ChromaDB 0.4.x |
| Validation | Pydantic 2.x |
| PDF Generation | Pandoc 3.x |
| Package Manager | pip + requirements.txt |

## UI Components (Gradio 6.x)

**Simplified 2-Tab UI:** Input + Results/Chat. Single-room workflow with integrated chat for Q&A and document modifications.
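The memory budget in Critical Constraints above is simple arithmetic worth keeping honest in code; a minimal sanity check (the per-model figures are this document's planning estimates, not measured values):

```python
# Sanity-check the single-L4 VRAM budget from "Critical Constraints".
# All figures are planning estimates from this document, not measurements.
BUDGET_GB = 22  # usable VRAM on the HuggingFace Spaces L4
PLANNED_GB = {
    "vision_4b_bf16": 10,
    "embedding_2b": 4,
    "reranker_2b": 4,
}

used = sum(PLANNED_GB.values())
headroom = BUDGET_GB - used  # left over for KV cache and overhead
print(f"used={used}GB headroom={headroom}GB")
```

If a model swap changes any of these figures, re-run the arithmetic before deploying; the pipeline assumes at least ~4GB of headroom for the vLLM KV cache.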
### Tab 1: Input

Uses `gr.Accordion` for collapsible sections:

- **Room Details** (open by default): Name, dimensions, ceiling height, facility classification, construction era
- **Images** (open by default): Multi-file upload, gallery preview, image count
- **Field Observations** (collapsed by default): 15 qualitative observation fields

### Tab 2: Results + Chat

- **Results Display**: Annotated gallery, assessment stats (JSON), SOW document (markdown)
- **Downloads**: Markdown and PDF export
- **Chat Interface**: Q&A about results, document modifications via `gr.Chatbot(type="messages")`
- **Quick Actions**: Pre-defined buttons for common queries

The frontend uses optimized input components:

| Field | Component | Notes |
|-------|-----------|-------|
| Room Name | `gr.Textbox` | Required field |
| Dimensions | `gr.Number` | Length, Width in feet |
| Ceiling Height | `gr.Dropdown` + custom option | 8-20 ft presets |
| Facility Classification | `gr.Radio` | operational, non-operational, public-childcare |
| Construction Era | `gr.Radio` | pre-1980, 1980-2000, post-2000 |
| Image Upload | `gr.Files(file_count="multiple")` | Batch upload, auto-assigned to room |
| Chat | `gr.Chatbot(type="messages")` | Gradio 6 messages format |

**Keyboard Shortcuts:**

- `Ctrl+1`: Navigate to Input tab
- `Ctrl+2`: Navigate to Results tab

## Development Commands

```sh
# Install dependencies
pip install -r requirements.txt

# Run locally with mock models
MOCK_MODELS=true python app.py

# Run with real models (requires ~18GB VRAM, e.g. L4 on HuggingFace or RTX 4090 locally)
python app.py

# Recommended tooling (install as dev dependencies)
ruff check .    # Linting
ruff format .   # Formatting
mypy .          # Type checking

# Note: Tests removed - testing occurs on HuggingFace due to GPU/ChromaDB requirements
```

## Architecture

### 6-Stage Processing Pipeline

1. **Input Validation** - Pydantic schema validation (schemas/input.py)
2. **Vision Analysis** - Per-image zone/material/condition detection (pipeline/vision.py)
3.
**RAG Retrieval** - Disposition lookup, thresholds, methods (rag/retriever.py)
4. **FDAM Logic** - Disposition matrix application (pipeline/main.py)
5. **Calculations** - Surface areas, ACH, labor estimates (pipeline/calculations.py)
6. **Document Generation** - SOW, sampling plan, confidence report (pipeline/generator.py)

### Target Project Structure

```
├── app.py                  # Gradio entry point
├── config/                 # Inference and app settings
├── models/                 # Model loading (mock vs real)
├── rag/                    # Chunking, vectorstore, retrieval
├── schemas/                # Pydantic input/output models
├── pipeline/               # Main processing logic + chat handler
│   └── chat.py             # Chat handler for Q&A and document mods
├── ui/                     # Gradio UI components
│   └── tabs/               # Tab modules
│       ├── input_tab.py    # Combined input (room + images + observations)
│       └── results_tab.py  # Results display + chat interface
├── RAG-KB/                 # Knowledge base source files
├── chroma_db/              # ChromaDB persistence (generated)
└── sample_images/          # Sample fire damage images for testing
```

## Domain Knowledge

### Zone Classifications

- **Burn Zone**: Direct fire involvement, structural char, exposed/damaged elements
- **Near-Field**: Adjacent to burn zone, heavy smoke/heat exposure, visible contamination
- **Far-Field**: Smoke migration only, light deposits, no structural damage

### Condition Levels

- **Background**: No visible contamination
- **Light**: Faint discoloration, minimal deposits
- **Moderate**: Visible film/deposits, surface color altered
- **Heavy**: Thick deposits, surface texture obscured
- **Structural Damage**: Physical damage requiring repair before cleaning

### Dispositions (FDAM §4.3)

- **No Action**: Document only
- **Clean**: Standard cleaning protocol
- **Evaluate**: Requires professional judgment
- **Remove**: Material must be removed
- **Remove/Repair**: Remove and repair/replace

### Facility Classifications (affects thresholds)

- **Operational**: Active workplace (higher thresholds: 500 µg/100cm² lead)
- **Non-Operational**: Unoccupied
(lower thresholds: 22 µg/100cm² lead)
- **Public/Childcare**: Most stringent (EPA/HUD Oct 2024: 0.54 µg/100cm² floors)

### Key Calculations

- **ACH Formula**: `Units = (Volume × 4) / (CFM × 60)` per NADCA ACR 2021 — air scrubber units for 4 air changes/hour, with room volume in ft³ and CFM rated per unit
- **Sample Density**: Varies by area size per FDAM §2.3
- **Ceiling Deck**: Enhanced sampling (1 per 2,500 SF per FDAM §4.5)

## RAG Knowledge Base

Source documents in `/RAG-KB/`:

- FDAM v4.0.1 methodology (primary reference)
- BNL SOP IH75190 (metals clearance thresholds)
- IICRC/RIA/CIRI Technical Guide (wildfire restoration)
- Lab method guides (PLM, ICP-MS)

**Chunking rules:**

- Keep tables intact (never split markdown tables)
- Preserve headers with content
- Include metadata (source, category, section)

## Confidence Framework

| Score | Level | Action |
|-------|-------|--------|
| ≥90% | Very High | Accept without review |
| 70-89% | High | Accept, note in report |
| 50-69% | Moderate | Flag for human review |
| <50% | Low | Require human verification |

## Model Loading

All 3 models are loaded at startup (~18GB total on single L4 GPU):

```python
import torch
from vllm import LLM

# Vision model via vLLM (single GPU, no tensor parallelism)
vision_model = LLM(
    model="Qwen/Qwen3-VL-4B-Thinking",
    tensor_parallel_size=1,  # Single GPU
    trust_remote_code=True,
    gpu_memory_utilization=0.80,
    max_model_len=16384,
)

# Embedding and Reranker use official Qwen3VL loaders
from scripts.qwen3_vl import Qwen3VLEmbedder, Qwen3VLReranker

embedding_model = Qwen3VLEmbedder("Qwen/Qwen3-VL-Embedding-2B", torch_dtype=torch.bfloat16)
reranker_model = Qwen3VLReranker("Qwen/Qwen3-VL-Reranker-2B", torch_dtype=torch.bfloat16)
```

Expected memory usage (~18GB total on single L4):

- Vision model (4B BF16): ~10GB
- Embedding model (2B): ~4GB
- Reranker model (2B): ~4GB
- Headroom: ~4GB for KV cache and overhead

## Local Development Strategy

The RTX 4090 (24GB VRAM) can run the 4B model stack (~18GB). Two options:

**Option A: Real Models Locally**

1.
Set `MOCK_MODELS=false` (or omit - defaults to false)
2. Models will download and load (~18GB VRAM)
3. Full inference testing locally

**Option B: Mock Models (faster iteration)**

1. Set the `MOCK_MODELS=true` environment variable
2. Mock responses return realistic JSON matching the vision output schema (2048-dim embeddings)
3. Test pipeline logic, UI, and calculations without real inference

**Deployment:**

1. Deploy to HuggingFace Spaces for production testing
2. Request build logs after deployment to confirm success
3. After changing embedding dimensions, rebuild ChromaDB: `python -m rag.index_builder --rebuild`

## Code Style

- Use `Literal["a", "b", "c"]` unions instead of Enum for simple string choices
- Pydantic models for all input/output validation
- Explicit return types on public functions
- Result types or explicit error returns over thrown exceptions
- Group imports: stdlib → third-party → local

## WSL Note

Dev servers must be exposed for WSL access. Use `--host 0.0.0.0` with Gradio:

```python
app.launch(server_name="0.0.0.0", server_port=7860)
```
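The `Literal`-over-Enum rule from Code Style above can be illustrated with the facility-classification thresholds from Domain Knowledge; this is a minimal sketch, and the `FacilityClass` alias, `LEAD_THRESHOLDS` table, and `lead_threshold` helper are hypothetical names, not part of the actual codebase:

```python
from typing import Literal

# Literal union instead of Enum for a simple string choice, per the style guide.
FacilityClass = Literal["operational", "non-operational", "public-childcare"]

# Lead clearance thresholds (µg/100 cm²) from "Facility Classifications" above.
# Illustrative only — the real pipeline retrieves these from the RAG knowledge base.
LEAD_THRESHOLDS: dict[FacilityClass, float] = {
    "operational": 500.0,
    "non-operational": 22.0,
    "public-childcare": 0.54,
}


def lead_threshold(facility: FacilityClass) -> float:
    """Explicit return type on a public function, per the style guide."""
    return LEAD_THRESHOLDS[facility]
```

With `Literal`, mypy rejects `lead_threshold("residential")` at type-check time without the ceremony of an Enum class, and the values serialize as plain strings in Pydantic models.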