Speed up CPU inference: halve token limits, pre-download models, fix OMP threads 4af4003 cgoodmaker Claude Opus 4.6 committed on Mar 2
Use bfloat16 on CPU to halve memory (8GB vs 16GB float32) 0989643 cgoodmaker Claude Opus 4.6 committed on Feb 26
Fix MCP subprocess deadlock: use stderr=None instead of PIPE da343a7 cgoodmaker Claude Opus 4.6 committed on Feb 23
Add timeout and stderr logging to MCP subprocess to debug tool hangs c376e14 cgoodmaker Claude Opus 4.6 committed on Feb 23
Remove unused files: old Gradio frontend, dead model code, test artifacts 672ed11 cgoodmaker Claude Opus 4.6 committed on Feb 23
Force MCP tool models to CPU to avoid GPU VRAM contention with MedGemma 1a97904 cgoodmaker Claude Opus 4.6 committed on Feb 23
Add RAG Phase 4 management guidance, rebuild guidelines index (286 chunks), post-analysis hint UI 5241b71 cgoodmaker Claude Opus 4.6 committed on Feb 23
Use dtype instead of deprecated torch_dtype in model_kwargs 82f82ac cgoodmaker Claude Opus 4.6 committed on Feb 23
Redesign chat UI and fix MedGemma generation config issues 58a4476 cgoodmaker Claude Opus 4.6 committed on Feb 23