| # CORTEX: Agentic Graph RAG Platform |
|
|
| > **CORTEX** is a production-grade, agentic Knowledge Graph platform that transforms unstructured documents and web content into an intelligent, queryable knowledge graph, with a full-featured React UI, streaming AI chat, real-time graph visualization, simulation personas, and deep ontology governance. |
|
|
| --- |
|
|
| ## What's Been Built |
|
|
| ### Full-Stack Application |
|
|
| | Layer | Stack | |
| |---|---| |
| | **Backend API** | FastAPI (async) + Python 3.12 | |
| | **Task Queue** | Celery + Redis | |
| | **Graph + Vector DB** | Neo4j 5.x (unified) | |
| | **LLM Layer** | OpenAI, Anthropic, Google Gemini, Ollama | |
| | **Frontend** | React 18 + TypeScript + Vite | |
| | **Unified Start** | `npm run rag` (concurrently launches all 3 processes) | |
|
|
| --- |
|
|
| ## Features |
|
|
| ### Document Ingestion Pipeline |
|
|
| - **Multi-format ingestion**: PDF, TXT, MD, DOCX, CSV, XLSX, PPTX, JSON |
| - **Web scraping**: Single-page scrape via `POST /api/documents/scrape` |
| - **Deep web crawling**: Multi-depth Playwright-powered crawler (`POST /api/documents/crawl`) via Crawl4AI |
| - **Async Celery workers**: Upload returns instantly with a `task_id`; background workers build the graph |
| - **Re-ingest**: Admin can trigger re-processing of any stored document |
| - **Document preview & download**: In-browser preview of text/Markdown; PDF download via API |
|
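The asynchronous flow above (the upload returns a `task_id` and workers finish later) implies a client-side polling loop against `GET /api/documents/status/{task_id}`. The following is a minimal sketch of such a loop, not code from this repository. It assumes the status payload is a JSON object with a `status` field that ends in `SUCCESS` or `FAILURE` (Celery's default state names); the real response shape is not documented here. `wait_for_task` and `fetch_status` are hypothetical names.

```python
import time
from typing import Callable, Dict

TERMINAL_STATES = {"SUCCESS", "FAILURE"}  # assumed: Celery's default terminal states


def wait_for_task(
    fetch_status: Callable[[str], Dict],
    task_id: str,
    interval_s: float = 2.0,
    max_polls: int = 150,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the task reaches a terminal state or max_polls is exhausted.

    fetch_status is any callable that performs
    GET /api/documents/status/{task_id} and returns the decoded JSON body.
    """
    for attempt in range(max_polls):
        payload = fetch_status(task_id)
        if payload.get("status") in TERMINAL_STATES:
            return payload
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"task {task_id} still pending after {max_polls} polls")
```

Passing the HTTP call in as `fetch_status` keeps the loop independent of any particular HTTP client and makes it easy to exercise with a stub.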
|
| ### Ontology Management |
|
|
| - **Auto-generation**: LLM analyzes document chunks to propose entity types & relationship types |
| - **LLM-powered refinement**: `POST /api/ontology/refine` refines the schema, with optional human feedback |
| - **Versioning**: Each schema change bumps the version (`v1.0` → `v1.1`, etc.) |
| - **Document-scoped stats**: `/api/ontology/stats?document_id=...` returns entity/relationship breakdowns for a specific document |
| - **Visual editor**: Ontology view in UI with editable entity types and relationship types |
| - **Ontology Drift Detection**: Automated drift detection compares live graph against new chunk samples; exposes pending/approved/rejected drift reports with admin approve/reject workflow |
|
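The versioning rule above (`v1.0` → `v1.1`) can be illustrated with a small helper. This is a sketch under one assumption: only the minor component is incremented on each schema change. The text does not say when, or whether, the major component changes, and `bump_minor` is a hypothetical name.

```python
import re

_VERSION_RE = re.compile(r"^v(\d+)\.(\d+)$")


def bump_minor(version: str) -> str:
    """Return the next minor version, e.g. 'v1.0' -> 'v1.1'."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Components are compared as integers, so 'v1.9' is followed by 'v1.10'.
    return f"v{major}.{minor + 1}"
```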
|
| ### Agentic Retrieval System |
|
|
| - **LangGraph orchestration**: State-machine ReACT agent with multi-step reasoning and fallback mechanisms |
| - **Tool routing**: Dynamically selects from Vector Search, Graph Traversal, Cypher Generation, Metadata Filtering, Community Search, and Temporal Queries |
| - **Streaming responses**: Server-Sent Events (SSE) with real-time reasoning steps surfaced in the UI |
| - **Multi-turn conversations**: Persistent conversation threads stored in Neo4j, per-user |
| - **Document-scoped queries**: Filter retrieval to a specific document via `document_id` |
| - **Graph of Thoughts (GoT)**: Optional GoT reasoning mode for complex multi-hop queries |
| - **LLM-as-a-Judge (inline)**: Optional per-response quality scoring with hallucination risk, grounded/ungrounded claims, and confidence reasoning displayed in chat |
| - **Confidence display**: Confidence score, hallucination risk, and judge reasoning shown directly in the chat bubble |
|
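Streaming responses arrive as Server-Sent Events, so a client has to split the stream into events before it can render reasoning steps. The sketch below parses the standard SSE wire format (`event:` and `data:` fields, with a blank line as the event terminator). The event name `reasoning` and the JSON payloads in the usage example are illustrative assumptions, not the platform's documented event schema.

```python
import json
from typing import Dict, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per SSE event: {"event": name, "data": decoded JSON or raw text}."""
    event_name, data_parts = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":  # a blank line terminates the current event
            if data_parts:
                text = "\n".join(data_parts)
                try:
                    data = json.loads(text)
                except json.JSONDecodeError:
                    data = text  # not JSON: pass the raw text through
                yield {"event": event_name, "data": data}
            event_name, data_parts = "message", []
        elif line.startswith(":"):  # comment / keep-alive line
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event_name = value
            elif field == "data":
                data_parts.append(value)
```

A usage example with a hand-written stream:

```python
stream = ["event: reasoning\n", 'data: {"step": "vector_search"}\n', "\n"]
for event in iter_sse_events(stream):
    print(event["event"], event["data"])
```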
|
| ### RAGAS Evaluation & Quality Dashboard |
|
|
| - **`POST /api/eval/score`**: Run RAGAS-style evaluation on any Q&A pair (faithfulness, relevancy, context precision, hallucination detection) |
| - **`GET /api/eval/dashboard`**: Aggregate evaluation history (average scores, hallucination rate, trend timeline) |
| - Results persisted in Neo4j for longitudinal quality tracking |
|
|
| ### Graph Intelligence |
|
|
| - **D3 force-directed visualization**: Interactive knowledge graph with zoom, pan, node selection, and a details modal |
| - **Graph Export**: Export full or document-scoped graph as JSON, Cypher, or GraphML |
| - **Community Detection**: Weakly-connected-components (WCC) community assignment with `POST /api/graph/communities/assign` |
| - **Community listing**: `GET /api/graph/communities` returns the top communities by entity count |
| - **Temporal Queries**: `GET /api/entities/{entity_name}/at-time` retrieves entity relationships at a historical point in time |
| - **Semantic Entity Deduplication**: Multi-stage entity resolution with configurable similarity thresholds (`POST /api/entities/deduplicate`) |
| - **Entity Enrichment**: LLM-synthesized profile summaries for every entity, stored as `e.summary` (`POST /api/entities/enrich`) |
| - **Entity Chat (scoped)**: `POST /api/entities/{entity_name}/chat` runs a multi-turn conversation scoped entirely to a single entity's graph neighborhood |
| - **Graph Memory Updater**: Push raw text directly into the live knowledge graph without re-ingesting a document (`POST /api/graph/update`) |
|
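To make the community-detection bullet concrete: weakly connected components group nodes that are reachable from one another when edge direction is ignored. The platform presumably delegates this to Neo4j (the prerequisites list the Graph Data Science plugin), so the pure-Python union-find below only illustrates what WCC computes. It is not the platform's implementation, and the function name is hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple


def weakly_connected_components(
    nodes: Iterable[Hashable], edges: Iterable[Tuple[Hashable, Hashable]]
) -> Dict[Hashable, int]:
    """Map each node to a community id; edge direction is ignored (the 'weak' in WCC)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in edges:
        parent.setdefault(src, src)
        parent.setdefault(dst, dst)
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    # Relabel roots as dense integer ids in first-seen order.
    ids: Dict[Hashable, int] = {}
    return {n: ids.setdefault(find(n), len(ids)) for n in parent}
```

For example, nodes `A`, `B`, `C` linked by `A→B` and `C→B` fall into one community even though no directed path connects `A` to `C`.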
|
| ### Analytical Report Agent (ReACT) |
|
|
| - **`POST /api/report`**: ReACT multi-step report agent using InsightForge / PanoramaSearch / QuickSearch tools |
| - Decomposes the topic into sub-questions → retrieves graph data → synthesizes sections → compiles a structured Markdown report |
| - Exposed in the **Insights** view (copy/download report as Markdown) |
|
|
| ### Simulation & Persona Engine |
|
|
| - **Persona generation**: Celery task that generates personas from graph entities (`POST /api/v1/simulation/generate_personas`) |
| - **Simulation ticks**: Background tick loop (`POST /api/v1/simulation/tick`) |
| - **Live persona interview**: `POST /api/v1/simulation/interview` runs a role-play chat with any graph entity, injecting its Neo4j memory as system context |
| - **SimulationRunView**: Dedicated UI view for managing and interacting with simulation personas |
|
|
| ### Admin Dashboard |
|
|
| - **System statistics**: Node count, relationship count, LLM provider, environment |
| - **User management**: List users, update scopes/roles (RBAC) |
| - **Document vault**: View and delete all ingested documents |
| - **Graph CRUD**: Search, inspect, and delete graph nodes from the admin panel |
| - **Ontology governance**: Review and approve/reject pending ontology proposals |
| - **Celery task monitor**: View active and reserved tasks from the admin panel |
| - **Self-demotion guard**: Admins cannot demote their own account |
| - **Re-ingest button**: Re-queue any stored document from the document vault |
| - **User activity metrics**: Per-user conversation count, message count, last active timestamp |
|
|
| ### Authentication & Security |
|
|
| - **JWT authentication**: Token-based auth with configurable expiry |
| - **RBAC scopes**: `read`, `write`, `admin` scopes enforced per endpoint |
| - **User registration**: `POST /api/auth/register` |
| - **Pydantic validation**: All API inputs validated at the model layer |
| - **Cypher injection prevention**: Schema validation and query whitelisting |
| - **File upload limits**: File size and MIME type enforcement |
|
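The Cypher injection bullet combines two ideas: identifiers such as labels are checked against a known schema, and user-supplied values travel as query parameters. The sketch below shows that pattern in isolation. It is an illustration under stated assumptions, not the project's validator: `build_entity_lookup`, the `name` property, and the `LIMIT 25` cap are all hypothetical.

```python
import re
from typing import Dict, Set, Tuple

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_entity_lookup(
    label: str, name: str, allowed_labels: Set[str]
) -> Tuple[str, Dict[str, str]]:
    """Return (cypher, params) for a lookup by label and name.

    The label is checked against the ontology allowlist before interpolation;
    the user-supplied name is passed as a query parameter, never concatenated.
    """
    if label not in allowed_labels or not _IDENTIFIER.match(label):
        raise ValueError(f"label not permitted: {label!r}")
    cypher = f"MATCH (e:`{label}`) WHERE e.name = $name RETURN e LIMIT 25"
    return cypher, {"name": name}
```

The returned pair would be handed to the database driver together, so the hostile-looking `name` value below never becomes part of the query text:

```python
cypher, params = build_entity_lookup("Person", "x' OR 1=1 //", {"Person", "Company"})
```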
|
| ### Frontend (React/TypeScript) |
|
|
| Seven fully implemented views accessible from the `CORTEX` top navigation bar: |
|
|
| | Route | View | Description | |
| |---|---|---| |
| | `/` | **Home** | Animated stats dashboard: documents, entities, relationships, graph health | |
| | `/process` | **Process** | Upload files or scrape/crawl URLs; view ingestion queue and document list | |
| | `/ontology` | **Ontology** | View/edit the live ontology schema; run LLM refinement; inspect entity/relationship stats per doc | |
| | `/interact` | **Interact** | Streaming AI chat with reasoning steps, confidence, hallucination risk; conversation history | |
| | `/simulate` | **Simulate** | Simulation persona management and live interview interface | |
| | `/insights` | **Insights** | Topic-driven analytical report generation with copy/download | |
| | `/admin` | **Admin** _(admin-only)_ | Full admin panel for users, docs, tasks, ontology governance | |
|
|
| ### Observability |
|
|
| - **OpenTelemetry**: Distributed tracing (silenced from console; configured for export) |
| - **Health check**: `GET /api/system/health` reports Neo4j, Redis, and Celery worker status |
| - **System stats**: `GET /api/system/stats` returns document, entity, relationship, and chunk counts |
| - **User stats**: `GET /api/system/my-stats` returns per-user conversation and message activity |
|
|
| --- |
|
|
| ## Architecture |
|
|
| ``` |
| ┌─────────────────────────────────────────────────────────────────────────────┐ |
| │                           React Frontend (CORTEX)                           │ |
| │     Home │ Process │ Ontology │ Interact │ Simulate │ Insights │ Admin      │ |
| └──────────────────────────────┬──────────────────────────────────────────────┘ |
|                                │ HTTP / SSE |
| ┌──────────────────────────────▼──────────────────────────────────────────────┐ |
| │                         FastAPI Gateway (port 8000)                         │ |
| │                JWT Auth · RBAC Scopes · CORS · OpenTelemetry                │ |
| └──────┬──────────────────────┬──────────────────────┬────────────────────────┘ |
|        │                      │                      │ |
| ┌──────▼──────┐   ┌───────────▼──────────┐   ┌───────▼────────────────────┐ |
| │ Ingestion   │   │ ReACT Agent System   │   │ Report Agent (ReACT)       │ |
| │ Pipeline    │   │ - Vector Search      │   │ - InsightForge             │ |
| │ - Parser    │   │ - Graph Traversal    │   │ - PanoramaSearch           │ |
| │ - Ontology  │   │ - Cypher Gen (GoT)   │   │ - QuickSearch              │ |
| │ - Extractor │   │ - Community Search   │   │ - Markdown output          │ |
| │ - Web       │   │ - Temporal Queries   │   └────────────────────────────┘ |
| │   Crawler   │   │ - LLM-as-a-Judge     │ |
| └──────┬──────┘   └─────────┬────────────┘ |
|        │                    │ |
| ┌──────▼────────────────────▼──────────────────┐ |
| │              Neo4j 5.x Database              │ |
| │  Entities · Chunks · Relationships ·         │ |
| │  Vector Index · Conversations ·              │ |
| │  EvalResults · DriftReports · Users          │ |
| └──────────────────────────────────────────────┘ |
|        │ |
| ┌──────▼──────────────────────┐ |
| │   Celery Workers (Redis)    │ |
| │ - Async document ingestion  │ |
| │ - Persona generation        │ |
| │ - Simulation ticks          │ |
| └─────────────────────────────┘ |
| ``` |
|
|
| --- |
|
|
| ## Project Structure |
|
|
| ``` |
| graph-RAG/ |
| ├── src/graph_rag_service/ |
| │   ├── api/ |
| │   │   ├── server.py  # Main FastAPI app + all API routes (1900 lines) |
| │   │   ├── auth.py  # JWT auth + RBAC helpers |
| │   │   ├── admin.py  # Admin sub-router |
| │   │   ├── simulation.py  # Simulation / persona interview router |
| │   │   └── models.py  # All Pydantic request/response models |
| │   ├── core/ |
| │   │   ├── abstractions.py  # Abstract base classes (GraphStore, VectorStore, LLMProvider) |
| │   │   ├── models.py  # Domain data models |
| │   │   ├── neo4j_store.py  # Full Neo4j implementation (graph + vector) |
| │   │   ├── llm_factory.py  # Multi-LLM provider factory + UnifiedLLMProvider |
| │   │   ├── entity_resolver.py  # Semantic entity deduplication |
| │   │   └── storage.py  # File storage abstraction |
| │   ├── ingestion/ |
| │   │   ├── pipeline.py  # End-to-end ingestion orchestrator |
| │   │   ├── document_processor.py  # Multi-format document parsing |
| │   │   ├── ontology_generator.py  # LLM ontology generation + refinement |
| │   │   ├── extractor.py  # Entity + relationship extraction |
| │   │   ├── web_crawler.py  # Playwright-based deep web crawler (Crawl4AI) |
| │   │   └── persona_generator.py  # Simulation persona generation |
| │   ├── retrieval/ |
| │   │   ├── agent.py  # LangGraph ReACT retrieval agent |
| │   │   ├── tools.py  # Retrieval tools + RAGEvaluator (RAGAS) |
| │   │   └── report_agent.py  # ReACT analytical report agent |
| │   ├── services/ |
| │   │   ├── graph_memory_updater.py  # Push raw text → live graph |
| │   │   ├── entity_enricher.py  # LLM entity profile summaries |
| │   │   └── ontology_drift_detector.py  # Automated schema drift detection |
| │   ├── workers/ |
| │   │   └── celery_worker.py  # Celery app + ingest_document_task |
| │   ├── observability/ |
| │   │   └── tracing.py  # OpenTelemetry setup (console suppressed) |
| │   ├── config.py  # Pydantic settings (all env vars) |
| │   └── main.py  # Uvicorn entry point |
| ├── frontend-react/ |
| │   └── src/ |
| │       ├── views/ |
| │       │   ├── Home.tsx  # Animated stats dashboard |
| │       │   ├── Process.tsx  # Document upload + URL scrape/crawl |
| │       │   ├── Ontology.tsx  # Schema editor + stats |
| │       │   ├── InteractionView.tsx  # Streaming chat + conversation history |
| │       │   ├── SimulationRunView.tsx  # Persona simulation UI |
| │       │   ├── InsightsView.tsx  # Report generation + copy/download |
| │       │   ├── AdminDashboard.tsx  # Full admin panel |
| │       │   └── Login.tsx  # Login page |
| │       ├── components/ |
| │       │   └── GraphCanvas.tsx  # D3 force-directed graph + node modal |
| │       ├── context/ |
| │       │   └── AuthContext.tsx  # JWT auth context + hooks |
| │       └── App.tsx  # Router + top-nav (CORTEX branding) |
| ├── tests/  # Test suite |
| ├── data/uploads/  # Uploaded documents (local storage) |
| ├── .env.example  # All configurable environment variables |
| ├── pyproject.toml  # Python project + uv dependencies |
| ├── package.json  # Unified start scripts (npm run rag) |
| ├── ARCHITECTURE.md  # Detailed architecture design doc |
| └── QUICKSTART.md  # 5-minute quick start guide |
| ``` |
|
|
| --- |
|
|
| ## Quick Start |
|
|
| ### Prerequisites |
|
|
| - Python 3.12+ |
| - Node.js 18+ |
| - Neo4j 5.x (running) with **APOC** and **Graph Data Science (GDS)** plugins installed |
| - Redis (running) |
| - Ollama *(optional, for local LLMs)* |
|
|
| ### 1. Clone & Install |
|
|
| ```bash |
| git clone <repository-url> |
| cd graph-RAG |
| |
| # Installs Python deps (uv), frontend (npm), and Playwright Chromium |
| npm install |
| ``` |
|
|
| ### 2. Configure Environment |
|
|
| ```bash |
| cp .env.example .env |
| # Fill in NEO4J_URI, NEO4J_PASSWORD, and your LLM API keys |
| ``` |
|
|
| ### 3. Start Neo4j |
|
|
| ```bash |
| docker run -d --name neo4j \ |
|   -p 7474:7474 -p 7687:7687 \ |
|   -e NEO4J_AUTH=neo4j/password \ |
|   -e NEO4J_PLUGINS='["apoc", "graph-data-science"]' \ |
|   neo4j:5 |
| ``` |
|
|
| ### 4. Start Redis |
|
|
| ```bash |
| docker run -d --name redis -p 6379:6379 redis:alpine |
| ``` |
|
|
| ### 5. Launch Everything |
|
|
| ```bash |
| npm run rag |
| ``` |
|
|
| This starts three color-coded processes concurrently: |
|
|
| | Process | URL | |
| |---|---| |
| | **API Server** | `http://localhost:8000` | |
| | **API Docs** | `http://localhost:8000/docs` | |
| | **React Frontend** | `http://localhost:5173` | |
|
|
| > Default credentials: `admin` / `admin` |
|
|
| --- |
|
|
| ## Environment Variables |
|
|
| Copy `.env.example` to `.env` and configure: |
|
|
| ```env |
| # Neo4j |
| NEO4J_URI=bolt://localhost:7687 |
| NEO4J_USER=neo4j |
| NEO4J_PASSWORD=password |
| |
| # Redis |
| REDIS_HOST=localhost |
| REDIS_PORT=6379 |
| |
| # LLM Provider (openai | anthropic | gemini | ollama) |
| DEFAULT_LLM_PROVIDER=gemini |
| GOOGLE_API_KEY=your-key-here |
| |
| # Optional: OpenAI / Anthropic |
| OPENAI_API_KEY=sk-... |
| ANTHROPIC_API_KEY=sk-ant-... |
| |
| # Optional: Ollama (local) |
| OLLAMA_BASE_URL=http://localhost:11434 |
| OLLAMA_MODEL=deepseek-r1:7b |
| OLLAMA_EMBEDDING_MODEL=nomic-embed-text |
| |
| # Feature flags |
| ENABLE_LLM_JUDGE=true |
| |
| # Security |
| SECRET_KEY=change-this-in-production |
| ACCESS_TOKEN_EXPIRE_MINUTES=1440 |
| ``` |
|
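The project structure lists `config.py` as the Pydantic settings module, and its contents are not shown here. The standard-library sketch below therefore only illustrates how a subset of the variables above could map to typed values. `Settings`, `load_settings`, the default values (copied from the sample `.env`), and the set of strings accepted as boolean "true" are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping

SUPPORTED_PROVIDERS = {"openai", "anthropic", "gemini", "ollama"}


@dataclass(frozen=True)
class Settings:
    neo4j_uri: str
    neo4j_user: str
    neo4j_password: str
    redis_host: str
    redis_port: int
    default_llm_provider: str
    enable_llm_judge: bool
    access_token_expire_minutes: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from an environment mapping, using the sample values as defaults."""
    provider = env.get("DEFAULT_LLM_PROVIDER", "gemini").lower()
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported DEFAULT_LLM_PROVIDER: {provider!r}")
    judge_flag = env.get("ENABLE_LLM_JUDGE", "true").strip().lower()
    return Settings(
        neo4j_uri=env.get("NEO4J_URI", "bolt://localhost:7687"),
        neo4j_user=env.get("NEO4J_USER", "neo4j"),
        neo4j_password=env["NEO4J_PASSWORD"],  # no default: fail fast if unset
        redis_host=env.get("REDIS_HOST", "localhost"),
        redis_port=int(env.get("REDIS_PORT", "6379")),
        default_llm_provider=provider,
        enable_llm_judge=judge_flag in {"1", "true", "yes"},
        access_token_expire_minutes=int(env.get("ACCESS_TOKEN_EXPIRE_MINUTES", "1440")),
    )
```

The default of 1440 minutes corresponds to a 24-hour token lifetime.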
|
| --- |
|
|
| ## API Reference |
|
|
| ### Authentication |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `POST` | `/api/auth/register` | Register new user | |
| | `POST` | `/api/auth/login` | Login → JWT token | |
| | `GET` | `/api/auth/me` | Get current user info | |
|
|
| ### Documents |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `POST` | `/api/documents/upload` | Upload file (PDF, DOCX, TXT, MD, CSV, XLSX, PPTX, JSON) | |
| | `POST` | `/api/documents/scrape` | Scrape single URL → ingest | |
| | `POST` | `/api/documents/crawl` | Deep multi-page Playwright crawl → ingest *(API Only)* | |
| | `GET` | `/api/documents` | List all ingested documents | |
| | `DELETE` | `/api/documents/{id}` | Delete document + graph chunks | |
| | `GET` | `/api/documents/{id}/download` | Download source file | |
| | `GET` | `/api/documents/{id}/preview` | Preview text content | |
| | `GET` | `/api/documents/status/{task_id}` | Ingestion task status | |
|
|
| ### Query & Chat |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `POST` | `/api/query` | Agentic query (streaming or JSON); supports `document_id`, `use_got` | |
| | `GET` | `/api/conversations` | List conversation threads | |
| | `GET` | `/api/conversations/{id}` | Get conversation + messages | |
| | `DELETE` | `/api/conversations/{id}` | Delete conversation | |
|
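As an illustration of the query endpoint, the sketch below prepares (but does not send) a `POST /api/query` request with the standard library. `document_id` and `use_got` come from the table above. The `query` field name, the `Bearer` authorization scheme, and the helper name `build_query_request` are assumptions; check the Swagger UI at `/docs` for the actual request model.

```python
import json
import urllib.request
from typing import Optional


def build_query_request(
    base_url: str,
    token: str,
    question: str,
    document_id: Optional[str] = None,
    use_got: bool = False,
) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/query request."""
    payload = {"query": question, "use_got": use_got}  # "query" is an assumed field name
    if document_id is not None:
        payload["document_id"] = document_id  # restrict retrieval to one document
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed scheme for the JWT
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server.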
|
| ### Ontology |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `GET` | `/api/ontology` | Get current ontology | |
| | `PUT` | `/api/ontology` | Update ontology (admin) | |
| | `POST` | `/api/ontology/refine` | LLM-powered ontology refinement | |
| | `GET` | `/api/ontology/stats` | Entity/relationship counts (optional doc filter) | |
| | `POST` | `/api/ontology/drift/detect` | Trigger drift detection | |
| | `GET` | `/api/ontology/drift` | List drift reports | |
| | `POST` | `/api/ontology/drift/{id}/approve` | Approve drift → merge into ontology | |
| | `POST` | `/api/ontology/drift/{id}/reject` | Reject drift report | |
|
|
| ### Graph |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `GET` | `/api/graph/visualization` | Graph nodes + edges for D3 rendering | |
| | `GET` | `/api/graph/export` | Export graph (json \| cypher \| graphml) | |
| | `POST` | `/api/graph/update` | Push raw text → merge into live graph | |
| | `POST` | `/api/graph/communities/assign` | Run WCC community detection | |
| | `GET` | `/api/graph/communities` | List top communities | |
|
|
| ### Entities |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `POST` | `/api/entities/deduplicate` | Semantic entity resolution + merge | |
| | `POST` | `/api/entities/enrich` | Generate LLM summaries for all entities | |
| | `GET` | `/api/entities/{name}/summary` | Get enriched entity profile | |
| | `POST` | `/api/entities/{name}/chat` | Multi-turn entity-scoped chat | |
| | `GET` | `/api/entities/{name}/at-time` | Temporal query (ISO 8601 date) | |
|
|
| ### Reports & Evaluation |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `POST` | `/api/report` | Generate ReACT analytical report (markdown) | |
| | `POST` | `/api/eval/score` | RAGAS evaluation of a Q&A pair | |
| | `GET` | `/api/eval/dashboard` | Evaluation history dashboard | |
|
|
| ### Simulation |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `POST` | `/api/v1/simulation/interview` | Live persona interview (in-character LLM) | |
| | `GET` | `/api/v1/simulation/report` | Sandbox analytical report *(API Only)* | |
| | `POST` | `/api/v1/simulation/generate_personas` | Queue persona generation task *(API Only)* | |
| | `POST` | `/api/v1/simulation/tick` | Advance simulation tick *(API Only)* | |
|
|
| ### System & Admin |
| | Method | Endpoint | Description | |
| |---|---|---| |
| | `GET` | `/api/system/health` | Neo4j + Redis + Celery health | |
| | `GET` | `/api/system/stats` | Document, entity, relationship counts | |
| | `GET` | `/api/system/my-stats` | Current user's activity stats | |
| | `GET` | `/api/system/formats` | Supported ingestion file formats | |
| | `GET` | `/api/admin/stats` | Admin-only system stats | |
| | `GET` | `/api/admin/users` | List all users | |
| | `PUT` | `/api/admin/users/{username}/role` | Update user scopes | |
| | `GET` | `/api/admin/tasks` | View Celery tasks | |
| | `GET` | `/api/admin/documents` | Admin document vault | |
| | `POST` | `/api/admin/documents/{id}/reingest` | Re-queue document for ingestion | |
| | `GET` | `/api/admin/graph/nodes` | Search graph nodes | |
| | `DELETE` | `/api/admin/graph/nodes/{id}` | Delete a graph node | |
|
|
| --- |
|
|
| ## Testing |
|
|
| ```bash |
| # Run tests |
| uv run pytest |
| |
| # With coverage |
| uv run pytest --cov=src/graph_rag_service |
| ``` |
|
|
| --- |
|
|
| ## Production Deployment |
|
|
| | Process | Command | |
| |---|---| |
| | **API Server** | `uv run python main.py` | |
| | **Celery Worker** | `uv run celery -A src.graph_rag_service.workers.celery_worker worker --loglevel=info --concurrency=4 --pool=threads` | |
| | **React Build** | `cd frontend-react && npm run build` | |
|
|
| The built React assets can be served directly by FastAPI (static file mount), or deployed to a CDN separately. Neo4j and Redis can be run via Docker, managed cloud services (AuraDB, Redis Cloud), or self-hosted. |
|
|
| --- |
|
|
| ## Additional Documentation |
|
|
| - **[ARCHITECTURE.md](./ARCHITECTURE.md)**: Deep dive into the system design, data flow, and component interactions |
| - **[QUICKSTART.md](./QUICKSTART.md)**: 5-minute environment setup guide |
| - **`/docs`**: Interactive Swagger UI (auto-generated by FastAPI) |
|
|
| --- |
|
|
| **Project Status**: Production-grade MVP · Actively developed |
| **License**: Proprietary, all rights reserved |
|
|