---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---

# 🧠 Agentic Reliability Framework MVP

**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**

This project explores **agentic reliability systems**: it blends observability, vector-based persistence, and AI inference to build self-healing cloud operations.

Built with:
- ⚡ **Gradio 5.49.1** for live visualization & dashboard UI
- 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support
- 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- 🔍 **FAISS** for similarity search across past incidents
- 🔒 **FileLock** for safe concurrent saves in multi-user environments
- 🤖 **Hugging Face Router Inference API** for adaptive reliability insights
- ☁️ **Python 3.10** runtime


---

## 🚀 Features

| Capability | Description |
|------------|-------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |

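
As a rough illustration of the threshold-based detection and simulated healing listed in the table, the sketch below flags an event when latency or error rate crosses a limit. The threshold values, the `Event` class, the `"Normal"` status label, and the rule for choosing an action are all assumptions made for illustration; they are not taken from `app.py`, and the thresholds here are fixed rather than adaptive.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the values the app actually uses are not stated in this README.
LATENCY_THRESHOLD_MS = 150.0
ERROR_RATE_THRESHOLD = 0.05


@dataclass
class Event:
    component: str
    latency: float      # milliseconds
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def detect_status(event: Event) -> str:
    """Return "Anomaly" if either metric exceeds its threshold, else "Normal"."""
    if event.latency > LATENCY_THRESHOLD_MS or event.error_rate > ERROR_RATE_THRESHOLD:
        return "Anomaly"
    return "Normal"


def simulate_healing(event: Event) -> str:
    """Pick a simulated action; nothing is actually restarted or scaled."""
    if detect_status(event) == "Normal":
        return "No action needed"
    if event.error_rate > ERROR_RATE_THRESHOLD:
        return "Restarted container"
    return "Scaled up instance"


# Same telemetry values as the sample event shown in Example Output.
event = Event(component="api-service", latency=224, error_rate=0.062)
print(detect_status(event), "|", simulate_healing(event))
```

An adaptive variant could derive the two limits from recent history instead of constants, but that design is not described here.
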

---

## 🧠 Example Output

✅ **Event Processed (Anomaly)**

    Component: api-service
    Latency: 224 ms
    Error Rate: 0.062
    Status: Anomaly
    Analysis: Error 404: Not Found
    Healing Action: Restarted container (Found 3 similar incidents)


---

## 🧩 Architecture Overview

    ┌──────────────────────┐
    │  Gradio Frontend UI  │
    └─────────┬────────────┘
              │ (submit telemetry)
              ▼
    ┌──────────────────────┐
    │  FastAPI /add-event  │
    │ + API Key validation │
    └─────────┬────────────┘
              │ (call)
              ▼
    ┌─────────────────────────────┐
    │ Hugging Face Inference API  │
    │ → Reliability insight text  │
    └─────────┬───────────────────┘
              │
              ▼
    ┌──────────────────────────────┐
    │ FAISS + Sentence Transformers│
    │ → Embedding + similarity map │
    └──────────────────────────────┘

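
The last box in the diagram, embedding an incident and looking up similar past ones, can be sketched without any third-party packages. This is a stand-in and not the app's code: a toy bag-of-words vector replaces the `all-MiniLM-L6-v2` embedding, and a brute-force cosine scan replaces the FAISS index. The `IncidentMemory` class and its method names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence embedding)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class IncidentMemory:
    """Hypothetical in-memory store; a FAISS index would replace the linear scan."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[Counter] = []

    def add(self, text: str) -> None:
        """Store the incident text together with its vector."""
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Return up to k stored incidents, most similar first."""
        q = embed(query)
        scored = [(t, cosine(q, v)) for t, v in zip(self.texts, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


memory = IncidentMemory()
memory.add("api-service high latency timeout")
memory.add("db-service disk full")
memory.add("api-service high error rate")

for text, score in memory.search("api-service latency spike", k=2):
    print(f"{score:.2f}  {text}")
```

The shape of the interface (add a text, query for the top `k` neighbours) is the part that carries over; persistence to disk and file locking are left out of this sketch.
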

---

## 🧾 API Usage

**Endpoint:**
`POST /add-event`

**Headers:**
`X-API-Key: <your_api_key>`

**Body:**
```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

**Response:**
```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```

---

## ⚙️ Run Locally

```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860 in your browser.

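
With the app running locally, the `/add-event` call from the API Usage section can be issued from Python. The sketch below uses only the standard library. The base URL, the `ARF_API_KEY` environment variable name, and the `build_add_event_request` and `send` helpers are assumptions for illustration; in particular, the REST endpoint may listen on a different port than the Gradio UI.

```python
import json
import os
import urllib.request

# Assumed base URL and environment variable name; adjust to match your deployment.
BASE_URL = "http://localhost:7860"
API_KEY = os.environ.get("ARF_API_KEY", "<your_api_key>")


def build_add_event_request(component: str, latency: float, error_rate: float) -> urllib.request.Request:
    """Build (but do not send) a POST /add-event request with the X-API-Key header."""
    payload = {"component": component, "latency": latency, "error_rate": error_rate}
    return urllib.request.Request(
        url=f"{BASE_URL}/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_add_event_request("api-service", 200, 0.04)
print(req.get_method(), req.full_url)
# Once the server is running, send the request:
# print(send(req))
```

Building the request separately from sending it makes the payload and headers easy to inspect before any network call is made.
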

---

## 🌍 Live Space & Collaboration

👉 Launch Live Demo on Hugging Face

👉 [Contribute or Fork on GitHub](https://github.com/petterjuan/agentic-reliability-framework)

---

## 🧭 Author

Juan D. Petter

AI Engineer & Cloud Architect

Building Agentic Systems for Scalable Automation | ex-NetApp

🔗 LinkedIn • GitHub

---

## 🪪 License

MIT License © 2025 Juan D. Petter