# CORTEX: Agentic Graph RAG Platform

> **CORTEX** is a production-grade, agentic Knowledge Graph platform that transforms unstructured documents and web content into an intelligent, queryable knowledge graph, with a full-featured React UI, streaming AI chat, real-time graph visualization, simulation personas, and deep ontology governance.

---

## ✨ What's Been Built

### 🖥️ Full-Stack Application

| Layer | Stack |
|---|---|
| **Backend API** | FastAPI (async) + Python 3.12 |
| **Task Queue** | Celery + Redis |
| **Graph + Vector DB** | Neo4j 5.x (unified) |
| **LLM Layer** | OpenAI, Anthropic, Google Gemini, Ollama |
| **Frontend** | React 18 + TypeScript + Vite |
| **Unified Start** | `npm run rag` (concurrently launches all 3 processes) |

---

## 🚀 Features

### 📥 Document Ingestion Pipeline

- **Multi-format ingestion**: PDF, TXT, MD, DOCX, CSV, XLSX, PPTX, JSON
- **Web scraping**: Single-page scrape via `POST /api/documents/scrape`
- **Deep web crawling**: Multi-depth Playwright-powered crawler (`POST /api/documents/crawl`) via Crawl4AI
- **Async Celery workers**: Upload returns instantly with a `task_id`; background workers build the graph
- **Re-ingest**: Admin can trigger re-processing of any stored document
- **Document preview & download**: In-browser preview of text/Markdown; PDF download via API
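
Because an upload returns a `task_id` immediately, a client usually polls the status endpoint until the background worker finishes. The following is a minimal sketch of that loop, not the project's client code. The `status` field and the `SUCCESS`/`FAILURE` values are assumptions borrowed from Celery's built-in task states, and `fetch_status` is a hypothetical callable standing in for an authenticated `GET /api/documents/status/{task_id}` request.

```python
import time
from typing import Callable, Dict

# Assumed terminal states (Celery's built-in names); the real API may differ.
TERMINAL_STATES = {"SUCCESS", "FAILURE"}


def wait_for_ingestion(
    fetch_status: Callable[[str], Dict[str, str]],
    task_id: str,
    interval_s: float = 2.0,
    max_polls: int = 30,
) -> Dict[str, str]:
    """Poll until the task reaches a terminal state or max_polls is exhausted."""
    for _ in range(max_polls):
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")


# Stand-in for the HTTP call so the sketch runs without a server.
_fake_responses = iter(
    [{"status": "PENDING"}, {"status": "STARTED"}, {"status": "SUCCESS"}]
)
result = wait_for_ingestion(
    lambda _task_id: next(_fake_responses), "demo-task", interval_s=0.0
)
print(result)
```

Passing the fetch function in as an argument keeps the polling logic independent of any particular HTTP library.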

### 🔭 Ontology Management

- **Auto-generation**: LLM analyzes document chunks to propose entity types & relationship types
- **LLM-powered refinement**: `POST /api/ontology/refine` refines the schema with optional human feedback
- **Versioning**: Each schema change bumps the version (`v1.0` → `v1.1`, etc.)
- **Document-scoped stats**: `/api/ontology/stats?document_id=...` returns entity/relationship breakdowns for a specific document
- **Visual editor**: Ontology view in UI with editable entity types and relationship types
- **Ontology Drift Detection**: Automated drift detection compares live graph against new chunk samples; exposes pending/approved/rejected drift reports with admin approve/reject workflow
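
The version bump described above can be illustrated with a few lines of Python. This is a sketch under assumptions: the helper name `bump_minor` is hypothetical, and treating the second number as an ever-increasing minor counter (so `v1.9` becomes `v1.10`) is a guess, since the source only shows `v1.0` → `v1.1`.

```python
import re

# Matches version labels of the form "v<major>.<minor>", e.g. "v1.0".
_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)$")


def bump_minor(version: str) -> str:
    """Return the next minor version label (hypothetical helper)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized ontology version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return f"v{major}.{minor + 1}"


print(bump_minor("v1.0"))
```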

### 🤖 Agentic Retrieval System

- **LangGraph orchestration**: State-machine ReACT agent with multi-step reasoning and fallback mechanisms
- **Tool routing**: Dynamically selects from Vector Search, Graph Traversal, Cypher Generation, Metadata Filtering, Community Search, and Temporal Queries
- **Streaming responses**: Server-Sent Events (SSE) with real-time reasoning steps surfaced in the UI
- **Multi-turn conversations**: Persistent conversation threads stored in Neo4j, per-user
- **Document-scoped queries**: Filter retrieval to a specific document via `document_id`
- **Graph of Thoughts (GoT)**: Optional GoT reasoning mode for complex multi-hop queries
- **LLM-as-a-Judge (inline)**: Optional per-response quality scoring with hallucination risk, grounded/ungrounded claims, and confidence reasoning displayed in chat
- **Confidence display**: Confidence score, hallucination risk, and judge reasoning shown directly in the chat bubble
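
Since responses are streamed as Server-Sent Events, a client has to split the stream into events before it can render reasoning steps. The sketch below parses the standard SSE wire format (`event:` and `data:` fields, blank line as terminator, `:` for comments). The event names `reasoning_step` and `answer` and the JSON payload shapes are illustrative assumptions, not the documented schema of `/api/query`.

```python
import json
from typing import Dict, Iterable, Iterator, List


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per SSE event; a blank line terminates each event."""
    event_type = "message"  # default event type defined by the SSE format
    data_lines: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data_lines:
                yield {"event": event_type, "data": "\n".join(data_lines)}
            event_type, data_lines = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # a single leading space is not part of the value
            if field == "event":
                event_type = value
            elif field == "data":
                data_lines.append(value)


# Hypothetical stream; real event names and payloads may differ.
sample_stream = [
    "event: reasoning_step\n",
    'data: {"tool": "vector_search"}\n',
    "\n",
    ": keep-alive\n",
    "event: answer\n",
    'data: {"text": "done"}\n',
    "\n",
]
events = list(iter_sse_events(sample_stream))
for event in events:
    print(event["event"], json.loads(event["data"]))
```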

### 📊 RAGAS Evaluation & Quality Dashboard

- **`POST /api/eval/score`**: Run RAGAS-style evaluation on any Q&A pair (faithfulness, relevancy, context precision, hallucination detection)
- **`GET /api/eval/dashboard`**: Aggregate evaluation history: average scores, hallucination rate, trend timeline
- Results persisted in Neo4j for longitudinal quality tracking
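
To make the dashboard aggregates concrete, here is a small sketch of how average scores and a hallucination rate could be derived from stored evaluation results. The record keys (`faithfulness`, `relevancy`, `context_precision`, `hallucination`) are hypothetical names chosen to mirror the metrics listed above; the actual stored schema is not shown in this README.

```python
from statistics import mean
from typing import Any, Dict, List, Mapping

# Hypothetical metric keys mirroring the metrics named in the feature list.
METRICS = ("faithfulness", "relevancy", "context_precision")


def summarize_evaluations(results: List[Mapping[str, Any]]) -> Dict[str, float]:
    """Aggregate per-answer scores into dashboard-style averages."""
    if not results:
        return {"count": 0}
    summary: Dict[str, float] = {
        f"avg_{metric}": round(mean(r[metric] for r in results), 3)
        for metric in METRICS
    }
    flagged = sum(1 for r in results if r["hallucination"])
    summary["hallucination_rate"] = flagged / len(results)
    summary["count"] = len(results)
    return summary


history = [
    {"faithfulness": 0.9, "relevancy": 0.8, "context_precision": 0.7, "hallucination": False},
    {"faithfulness": 0.7, "relevancy": 0.6, "context_precision": 0.5, "hallucination": True},
]
dashboard = summarize_evaluations(history)
print(dashboard)
```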

### 🗺️ Graph Intelligence

- **D3 force-directed visualization**: Interactive knowledge graph with zoom, pan, node selection, and a details modal
- **Graph Export**: Export full or document-scoped graph as JSON, Cypher, or GraphML
- **Community Detection**: Weakly-connected-components (WCC) community assignment with `POST /api/graph/communities/assign`
- **Community listing**: `GET /api/graph/communities` returns the top communities by entity count
- **Temporal Queries**: `GET /api/entities/{entity_name}/at-time` retrieves entity relationships at a historical point in time
- **Semantic Entity Deduplication**: Multi-stage entity resolution with configurable similarity thresholds (`POST /api/entities/deduplicate`)
- **Entity Enrichment**: LLM-synthesized profile summaries for every entity, stored as `e.summary` (`POST /api/entities/enrich`)
- **Entity Chat (scoped)**: `POST /api/entities/{entity_name}/chat` provides a multi-turn conversation scoped entirely to a single entity's graph neighborhood
- **Graph Memory Updater**: Push raw text directly into the live knowledge graph without re-ingesting a document (`POST /api/graph/update`)
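
For readers unfamiliar with weakly connected components, the sketch below shows what WCC community assignment computes: edge direction is ignored, and every node reachable from another ends up in the same group. It uses an in-memory union-find purely for illustration. The platform itself most likely delegates this work to Neo4j (the prerequisites list the GDS plugin), so treat this as an explanation of the idea rather than the actual implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def weakly_connected_components(
    nodes: Iterable[str], edges: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group nodes into components, ignoring edge direction (union-find)."""
    parent: Dict[str, str] = {node: node for node in nodes}

    def find(node: str) -> str:
        # Path halving keeps the trees shallow.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        parent.setdefault(source, source)
        parent.setdefault(target, target)
        root_a, root_b = find(source), find(target)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = defaultdict(list)
    for node in parent:
        groups[find(node)].append(node)
    # Largest communities first, mirroring a "top communities" listing.
    return sorted((sorted(group) for group in groups.values()), key=len, reverse=True)


communities = weakly_connected_components(
    ["A", "B", "C", "D", "E"],
    [("A", "B"), ("C", "B"), ("D", "E")],
)
print(communities)
```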

### 📝 Analytical Report Agent (ReACT)

- **`POST /api/report`**: ReACT multi-step report agent using InsightForge / PanoramaSearch / QuickSearch tools
- Decomposes the topic into sub-questions → retrieves graph data → synthesizes sections → compiles a structured Markdown report
- Exposed in the **Insights** view (copy/download report as Markdown)
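
The decompose → retrieve → synthesize → compile flow can be sketched as a plain function that wires three steps together. Everything here is hypothetical: `decompose`, `retrieve`, and `synthesize` are stand-in callables (in the real agent they would be backed by an LLM and the graph tools), and the Markdown layout is only one plausible output shape.

```python
from typing import Callable, List


def build_report(
    topic: str,
    decompose: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    synthesize: Callable[[str, List[str]], str],
) -> str:
    """Compile one Markdown section per sub-question into a single report."""
    sections = []
    for question in decompose(topic):
        evidence = retrieve(question)  # graph facts for this sub-question
        sections.append(f"## {question}\n\n{synthesize(question, evidence)}")
    return f"# {topic}\n\n" + "\n\n".join(sections) + "\n"


# Stub steps so the sketch runs without an LLM or a graph database.
report = build_report(
    "Supply chain risk",
    decompose=lambda topic: [f"What drives {topic}?", f"Who is exposed to {topic}?"],
    retrieve=lambda question: [f"evidence for: {question}"],
    synthesize=lambda question, evidence: f"Based on {len(evidence)} graph fact(s).",
)
print(report)
```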

### 🎭 Simulation & Persona Engine

- **Persona generation**: Celery task that generates personas from graph entities (`POST /api/v1/simulation/generate_personas`)
- **Simulation ticks**: Background tick loop (`POST /api/v1/simulation/tick`)
- **Live persona interview**: `POST /api/v1/simulation/interview` runs a roleplay chat with any graph entity, injecting its Neo4j memory as system context
- **SimulationRunView**: Dedicated UI view for managing and interacting with simulation personas

### 🛡️ Admin Dashboard

- **System statistics**: Node count, relationship count, LLM provider, environment
- **User management**: List users, update scopes/roles (RBAC)
- **Document vault**: View and delete all ingested documents
- **Graph CRUD**: Search, inspect, and delete graph nodes from the admin panel
- **Ontology governance**: Review and approve/reject pending ontology proposals
- **Celery task monitor**: View active and reserved tasks from the admin panel
- **Self-demotion guard**: Admins cannot demote their own account
- **Re-ingest button**: Re-queue any stored document from the document vault
- **User activity metrics**: Per-user conversation count, message count, last active timestamp
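
The self-demotion guard amounts to a single check before a scope update is applied. A minimal sketch, assuming that "demotion" means removing the `admin` scope from one's own account; the function and exception names are hypothetical.

```python
from typing import List


class SelfDemotionError(ValueError):
    """Raised when an admin tries to remove their own admin scope."""


def validate_scope_update(
    acting_user: str, target_user: str, new_scopes: List[str]
) -> List[str]:
    """Return the normalized scopes, or raise if an admin would demote themselves."""
    if acting_user == target_user and "admin" not in new_scopes:
        raise SelfDemotionError("admins cannot remove their own admin scope")
    return sorted(set(new_scopes))


# An admin may change another user's scopes freely.
print(validate_scope_update("alice", "bob", ["read", "write"]))
```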

### 🔐 Authentication & Security

- **JWT authentication**: Token-based auth with configurable expiry
- **RBAC scopes**: `read`, `write`, `admin` scopes enforced per endpoint
- **User registration**: `POST /api/auth/register`
- **Pydantic validation**: All API inputs validated at the model layer
- **Cypher injection prevention**: Schema validation and query whitelisting
- **File upload limits**: File size and MIME type enforcement
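
Per-endpoint scope enforcement boils down to comparing the scopes carried by the token with the scopes an endpoint requires. The sketch below shows that comparison in isolation. It assumes the three scopes are independent (whether `admin` implies `read` and `write` in CORTEX is not stated here), and the helper names are hypothetical.

```python
from typing import Iterable, Set

KNOWN_SCOPES = {"read", "write", "admin"}


def missing_scopes(granted: Iterable[str], required: Iterable[str]) -> Set[str]:
    """Return the required scopes that the caller's token does not carry."""
    required_set = set(required)
    unknown = required_set - KNOWN_SCOPES
    if unknown:
        raise ValueError(f"unknown scopes requested: {sorted(unknown)}")
    return required_set - set(granted)


def is_allowed(granted: Iterable[str], required: Iterable[str]) -> bool:
    """True when every required scope has been granted."""
    return not missing_scopes(granted, required)


print(is_allowed(["read", "write"], ["write"]))
print(missing_scopes(["read"], ["read", "admin"]))
```

In a handler, a non-empty result from `missing_scopes` would typically translate into an HTTP 403 response.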

### 🌐 Frontend (React/TypeScript)

Seven fully implemented views accessible from the `CORTEX` top navigation bar:

| Route | View | Description |
|---|---|---|
| `/` | **Home** | Animated stats dashboard: documents, entities, relationships, graph health |
| `/process` | **Process** | Upload files or scrape/crawl URLs; view ingestion queue and document list |
| `/ontology` | **Ontology** | View/edit the live ontology schema; run LLM refinement; inspect entity/relationship stats per doc |
| `/interact` | **Interact** | Streaming AI chat with reasoning steps, confidence, hallucination risk; conversation history |
| `/simulate` | **Simulate** | Simulation persona management and live interview interface |
| `/insights` | **Insights** | Topic-driven analytical report generation with copy/download |
| `/admin` | **Admin** _(admin-only)_ | Full admin panel for users, docs, tasks, ontology governance |

### 🔭 Observability

- **OpenTelemetry**: Distributed tracing (silenced from console; configured for export)
- **Health check**: `GET /api/system/health` reports Neo4j, Redis, and Celery worker status
- **System stats**: `GET /api/system/stats` returns document, entity, relationship, and chunk counts
- **User stats**: `GET /api/system/my-stats` returns per-user conversation and message activity

---

## 🏗️ Architecture

```
┌──────────────────────────────────────────────────────────────────────────────┐
│                          React Frontend (CORTEX)                             │
│    Home │ Process │ Ontology │ Interact │ Simulate │ Insights │ Admin        │
└─────────────────────────────┬────────────────────────────────────────────────┘
                              │ HTTP / SSE
┌─────────────────────────────▼────────────────────────────────────────────────┐
│                     FastAPI Gateway (port 8000)                              │
│          JWT Auth · RBAC Scopes · CORS · OpenTelemetry                       │
└──────┬──────────────────────┬──────────────────────┬─────────────────────────┘
       │                      │                      │
┌──────▼──────┐   ┌───────────▼──────────┐  ┌────────▼───────────────────┐
│  Ingestion  │   │  ReACT Agent System  │  │  Report Agent (ReACT)      │
│  Pipeline   │   │  - Vector Search     │  │  - InsightForge            │
│  - Parser   │   │  - Graph Traversal   │  │  - PanoramaSearch          │
│  - Ontology │   │  - Cypher Gen (GoT)  │  │  - QuickSearch             │
│  - Extractor│   │  - Community Search  │  │  - Markdown output         │
│  - Web      │   │  - Temporal Queries  │  └────────────────────────────┘
│    Crawler  │   │  - LLM-as-a-Judge    │
└──────┬──────┘   └─────────┬────────────┘
       │                    │
┌──────▼────────────────────▼───────────────────┐
│              Neo4j 5.x Database               │
│  Entities · Chunks · Relationships ·          │
│  Vector Index · Conversations ·               │
│  EvalResults · DriftReports · Users           │
└───────────────────────────────────────────────┘
       │
┌──────▼──────────────────────┐
│  Celery Workers (Redis)     │
│  - Async document ingestion │
│  - Persona generation       │
│  - Simulation ticks         │
└─────────────────────────────┘
```

---

## 📦 Project Structure

```
graph-RAG/
├── src/graph_rag_service/
│   ├── api/
│   │   ├── server.py          # Main FastAPI app + all API routes (1900 lines)
│   │   ├── auth.py            # JWT auth + RBAC helpers
│   │   ├── admin.py           # Admin sub-router
│   │   ├── simulation.py      # Simulation / persona interview router
│   │   └── models.py          # All Pydantic request/response models
│   ├── core/
│   │   ├── abstractions.py    # Abstract base classes (GraphStore, VectorStore, LLMProvider)
│   │   ├── models.py          # Domain data models
│   │   ├── neo4j_store.py     # Full Neo4j implementation (graph + vector)
│   │   ├── llm_factory.py     # Multi-LLM provider factory + UnifiedLLMProvider
│   │   ├── entity_resolver.py # Semantic entity deduplication
│   │   └── storage.py         # File storage abstraction
│   ├── ingestion/
│   │   ├── pipeline.py        # End-to-end ingestion orchestrator
│   │   ├── document_processor.py  # Multi-format document parsing
│   │   ├── ontology_generator.py  # LLM ontology generation + refinement
│   │   ├── extractor.py       # Entity + relationship extraction
│   │   ├── web_crawler.py     # Playwright-based deep web crawler (Crawl4AI)
│   │   └── persona_generator.py   # Simulation persona generation
│   ├── retrieval/
│   │   ├── agent.py           # LangGraph ReACT retrieval agent
│   │   ├── tools.py           # Retrieval tools + RAGEvaluator (RAGAS)
│   │   └── report_agent.py    # ReACT analytical report agent
│   ├── services/
│   │   ├── graph_memory_updater.py   # Push raw text → live graph
│   │   ├── entity_enricher.py        # LLM entity profile summaries
│   │   └── ontology_drift_detector.py # Automated schema drift detection
│   ├── workers/
│   │   └── celery_worker.py   # Celery app + ingest_document_task
│   ├── observability/
│   │   └── tracing.py         # OpenTelemetry setup (console suppressed)
│   ├── config.py              # Pydantic settings (all env vars)
│   └── main.py                # Uvicorn entry point
├── frontend-react/
│   └── src/
│       ├── views/
│       │   ├── Home.tsx            # Animated stats dashboard
│       │   ├── Process.tsx         # Document upload + URL scrape/crawl
│       │   ├── Ontology.tsx        # Schema editor + stats
│       │   ├── InteractionView.tsx # Streaming chat + conversation history
│       │   ├── SimulationRunView.tsx # Persona simulation UI
│       │   ├── InsightsView.tsx    # Report generation + copy/download
│       │   ├── AdminDashboard.tsx  # Full admin panel
│       │   └── Login.tsx           # Login page
│       ├── components/
│       │   └── GraphCanvas.tsx     # D3 force-directed graph + node modal
│       ├── context/
│       │   └── AuthContext.tsx     # JWT auth context + hooks
│       └── App.tsx                 # Router + top-nav (CORTEX branding)
├── tests/                      # Test suite
├── data/uploads/               # Uploaded documents (local storage)
├── .env.example                # All configurable environment variables
├── pyproject.toml              # Python project + uv dependencies
├── package.json                # Unified start scripts (npm run rag)
├── ARCHITECTURE.md             # Detailed architecture design doc
└── QUICKSTART.md               # 5-minute quick start guide
```

---

## ⚑ Quick Start

### Prerequisites

- Python 3.12+
- Node.js 18+
- Neo4j 5.x (running) with **APOC** and **Graph Data Science (GDS)** plugins installed
- Redis (running)
- Ollama *(optional, for local LLMs)*

### 1. Clone & Install

```bash
git clone <repository-url>
cd graph-RAG

# Installs Python deps (uv), frontend (npm), and Playwright Chromium
npm install
```

### 2. Configure Environment

```bash
cp .env.example .env
# Fill in NEO4J_URI, NEO4J_PASSWORD, and your LLM API keys
```

### 3. Start Neo4j

```bash
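# Pin the 5.x image and enable the APOC and GDS plugins listed in the prerequisites
docker run -d --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  -e NEO4J_PLUGINS='["apoc", "graph-data-science"]' \
  neo4j:5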
```

### 4. Start Redis

```bash
docker run -d --name redis -p 6379:6379 redis:alpine
```

### 5. Launch Everything

```bash
npm run rag
```

This starts three color-coded processes concurrently. Once they are running, the following URLs are available:

| Service | URL |
|---|---|
| **API Server** | `http://localhost:8000` |
| **API Docs** | `http://localhost:8000/docs` |
| **React Frontend** | `http://localhost:5173` |

> Default credentials: `admin` / `admin`

---

## 🔑 Environment Variables

Copy `.env.example` to `.env` and configure:

```env
# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=password

# Redis
REDIS_HOST=localhost
REDIS_PORT=6379

# LLM Provider (openai | anthropic | gemini | ollama)
DEFAULT_LLM_PROVIDER=gemini
GOOGLE_API_KEY=your-key-here

# Optional: OpenAI / Anthropic
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Optional: Ollama (local)
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=deepseek-r1:7b
OLLAMA_EMBEDDING_MODEL=nomic-embed-text

# Feature flags
ENABLE_LLM_JUDGE=true

# Security
SECRET_KEY=change-this-in-production
ACCESS_TOKEN_EXPIRE_MINUTES=1440
```
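
For illustration, the variables above could be read and validated in Python as follows. The project itself uses Pydantic settings in `config.py`; this standard-library stand-in only shows the idea. The `Settings` dataclass, the `load_settings` helper, and the fallback defaults (copied from the sample values above) are assumptions, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping

_ALLOWED_PROVIDERS = {"openai", "anthropic", "gemini", "ollama"}
_TRUTHY = {"1", "true", "yes"}


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    redis_host: str
    redis_port: int
    default_llm_provider: str
    enable_llm_judge: bool
    access_token_expire_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build typed settings from environment variables (hypothetical helper)."""
    provider = env.get("DEFAULT_LLM_PROVIDER", "gemini").lower()
    if provider not in _ALLOWED_PROVIDERS:
        raise ValueError(f"unsupported DEFAULT_LLM_PROVIDER: {provider!r}")
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        default_llm_provider=provider,
        enable_llm_judge=env.get("ENABLE_LLM_JUDGE", "true").lower() in _TRUTHY,
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "1440")),
    )


settings = load_settings({"REDIS_PORT": "6380", "ENABLE_LLM_JUDGE": "false"})
print(settings)
```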

---

## 🌐 API Reference

### Authentication
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/auth/register` | Register new user |
| `POST` | `/api/auth/login` | Login → JWT token |
| `GET` | `/api/auth/me` | Get current user info |

### Documents
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/documents/upload` | Upload file (PDF, DOCX, TXT, MD, CSV, XLSX, PPTX, JSON) |
| `POST` | `/api/documents/scrape` | Scrape single URL → ingest |
| `POST` | `/api/documents/crawl` | Deep multi-page Playwright crawl → ingest *(API Only)* |
| `GET` | `/api/documents` | List all ingested documents |
| `DELETE` | `/api/documents/{id}` | Delete document + graph chunks |
| `GET` | `/api/documents/{id}/download` | Download source file |
| `GET` | `/api/documents/{id}/preview` | Preview text content |
| `GET` | `/api/documents/status/{task_id}` | Ingestion task status |

### Query & Chat
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/query` | Agentic query (streaming or JSON); supports `document_id`, `use_got` |
| `GET` | `/api/conversations` | List conversation threads |
| `GET` | `/api/conversations/{id}` | Get conversation + messages |
| `DELETE` | `/api/conversations/{id}` | Delete conversation |
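
As a rough illustration of calling the query endpoint, the sketch below builds (but does not send) a request with the standard library. The `document_id` and `use_got` options come from the table above; the `query` field name and the `Bearer` authorization header are assumptions, so check the Swagger UI at `/docs` for the actual request model.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(
    base_url: str,
    token: str,
    question: str,
    document_id: Optional[str] = None,
    use_got: bool = False,
) -> urllib.request.Request:
    """Prepare a POST /api/query request; field names are partly assumed."""
    body = {"query": question, "use_got": use_got}  # "query" is an assumed field name
    if document_id is not None:
        body["document_id"] = document_id  # restrict retrieval to one document
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed JWT bearer scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    "http://localhost:8000", "TOKEN", "Who founded Acme?", document_id="doc-1", use_got=True
)
print(request.get_method(), request.full_url)
```

Sending the request is then a matter of passing it to `urllib.request.urlopen` once the API server is running.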

### Ontology
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/ontology` | Get current ontology |
| `PUT` | `/api/ontology` | Update ontology (admin) |
| `POST` | `/api/ontology/refine` | LLM-powered ontology refinement |
| `GET` | `/api/ontology/stats` | Entity/relationship counts (optional doc filter) |
| `POST` | `/api/ontology/drift/detect` | Trigger drift detection |
| `GET` | `/api/ontology/drift` | List drift reports |
| `POST` | `/api/ontology/drift/{id}/approve` | Approve drift → merge into ontology |
| `POST` | `/api/ontology/drift/{id}/reject` | Reject drift report |

### Graph
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/graph/visualization` | Graph nodes + edges for D3 rendering |
| `GET` | `/api/graph/export` | Export graph (json \| cypher \| graphml) |
| `POST` | `/api/graph/update` | Push raw text → merge into live graph |
| `POST` | `/api/graph/communities/assign` | Run WCC community detection |
| `GET` | `/api/graph/communities` | List top communities |

### Entities
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/entities/deduplicate` | Semantic entity resolution + merge |
| `POST` | `/api/entities/enrich` | Generate LLM summaries for all entities |
| `GET` | `/api/entities/{name}/summary` | Get enriched entity profile |
| `POST` | `/api/entities/{name}/chat` | Multi-turn entity-scoped chat |
| `GET` | `/api/entities/{name}/at-time` | Temporal query (ISO 8601 date) |
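
Entity names can contain spaces or slashes, and ISO 8601 timestamps contain `:` and `+`, so both need URL-encoding when building a temporal query. A minimal sketch follows; the query parameter name `timestamp` is hypothetical, since the table only states that the endpoint takes an ISO 8601 date.

```python
from datetime import datetime, timezone
from urllib.parse import quote, urlencode


def at_time_url(base_url: str, entity_name: str, when: datetime) -> str:
    """Build the temporal-query URL; the 'timestamp' parameter name is assumed."""
    if when.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid ambiguity")
    # safe="" also encodes "/" so the name stays within one path segment.
    path = f"/api/entities/{quote(entity_name, safe='')}/at-time"
    query = urlencode({"timestamp": when.isoformat()})
    return f"{base_url.rstrip('/')}{path}?{query}"


url = at_time_url(
    "http://localhost:8000",
    "Acme Corp",
    datetime(2024, 1, 15, tzinfo=timezone.utc),
)
print(url)
```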

### Reports & Evaluation
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/report` | Generate ReACT analytical report (markdown) |
| `POST` | `/api/eval/score` | RAGAS evaluation of a Q&A pair |
| `GET` | `/api/eval/dashboard` | Evaluation history dashboard |

### Simulation
| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/simulation/interview` | Live persona interview (in-character LLM) |
| `GET` | `/api/v1/simulation/report` | Sandbox analytical report *(API Only)* |
| `POST` | `/api/v1/simulation/generate_personas` | Queue persona generation task *(API Only)* |
| `POST` | `/api/v1/simulation/tick` | Advance simulation tick *(API Only)* |

### System & Admin
| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/system/health` | Neo4j + Redis + Celery health |
| `GET` | `/api/system/stats` | Document, entity, relationship counts |
| `GET` | `/api/system/my-stats` | Current user's activity stats |
| `GET` | `/api/system/formats` | Supported ingestion file formats |
| `GET` | `/api/admin/stats` | Admin-only system stats |
| `GET` | `/api/admin/users` | List all users |
| `PUT` | `/api/admin/users/{username}/role` | Update user scopes |
| `GET` | `/api/admin/tasks` | View Celery tasks |
| `GET` | `/api/admin/documents` | Admin document vault |
| `POST` | `/api/admin/documents/{id}/reingest` | Re-queue document for ingestion |
| `GET` | `/api/admin/graph/nodes` | Search graph nodes |
| `DELETE` | `/api/admin/graph/nodes/{id}` | Delete a graph node |

---

## 🧪 Testing

```bash
# Run tests
uv run pytest

# With coverage
uv run pytest --cov=src/graph_rag_service
```

---

## 🚀 Production Deployment

| Process | Command |
|---|---|
| **API Server** | `uv run python main.py` |
| **Celery Worker** | `uv run celery -A src.graph_rag_service.workers.celery_worker worker --loglevel=info --concurrency=4 --pool=threads` |
| **React Build** | `cd frontend-react && npm run build` |

The built React assets can be served directly by FastAPI (static file mount), or deployed to a CDN separately. Neo4j and Redis can be run via Docker, managed cloud services (AuraDB, Redis Cloud), or self-hosted.

---

## 📄 Additional Documentation

- **[ARCHITECTURE.md](./ARCHITECTURE.md)**: Deep dive into the system design, data flow, and component interactions
- **[QUICKSTART.md](./QUICKSTART.md)**: 5-minute environment setup guide
- **`/docs`**: Interactive Swagger UI (auto-generated from FastAPI)

---

**Project Status**: Production-grade MVP · Actively developed  
**License**: Proprietary, all rights reserved