---
title: "Agentic Reliability Framework MVP"
emoji: "🧠"
colorFrom: "indigo"
colorTo: "blue"
sdk: "gradio"
sdk_version: "5.49.1"
app_file: "app.py"
pinned: true
python_version: "3.10"
license: "mit"
---
# 🧠 Agentic Reliability Framework MVP
**Adaptive anomaly detection + AI-driven self-healing + persistent FAISS memory.**
This project explores **agentic reliability systems** — blending observability, vector-based persistence, and AI inference to create self-healing cloud operations.
Built with:
- **Gradio 5.49.1** for live visualization & dashboard UI
- 🧩 **FastAPI** for REST endpoints (`/add-event`) with API key support
- 🧠 **Sentence Transformers** (`all-MiniLM-L6-v2`) for embedding-based anomaly memory
- 🔍 **FAISS** for similarity search across past incidents
- 🔒 **FileLock** for safe concurrent saves in multi-user environments
- 🤖 **Hugging Face Router Inference API** for adaptive reliability insights
- ☁️ **Python 3.10** runtime
---
## 🚀 Features
| Capability | Description |
|-------------|--------------|
| **Adaptive Anomaly Detection** | Detects anomalies dynamically based on latency and error-rate thresholds |
| **AI Root Cause Analysis** | Uses the Hugging Face Inference API for contextual one-line incident summaries |
| **Self-Healing Actions** | Simulates healing actions (scale-up, restart, etc.) |
| **Persistent Memory (FAISS)** | Learns from prior incidents, clusters patterns, and retrieves similar cases |
| **Secure REST API** | `/add-event` endpoint secured by `X-API-Key` header |
| **Interactive Gradio UI** | Visualize, test, and analyze events live in your browser |
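
The adaptive detection row above can be illustrated with a small sketch. This is not the project's actual code: the class name `AdaptiveDetector`, the rolling-window baseline, the `"Normal"` status label, and every numeric threshold are assumptions chosen only so that the sample event in this README (224 ms, 0.062 error rate) is flagged.

```python
from collections import deque
from statistics import mean, pstdev


class AdaptiveDetector:
    """Hypothetical detector: rolling latency baseline plus an error-rate ceiling."""

    def __init__(self, window=50, k=3.0, min_latency_ms=150.0, max_error_rate=0.05):
        # All defaults are illustrative assumptions, not values from the project.
        self.latencies = deque(maxlen=window)
        self.k = k
        self.min_latency_ms = min_latency_ms
        self.max_error_rate = max_error_rate

    def latency_threshold(self):
        # Fall back to the floor until there is enough history for a baseline.
        if len(self.latencies) < 2:
            return self.min_latency_ms
        adaptive = mean(self.latencies) + self.k * pstdev(self.latencies)
        return max(self.min_latency_ms, adaptive)

    def check(self, latency_ms, error_rate):
        is_anomaly = (
            latency_ms > self.latency_threshold()
            or error_rate > self.max_error_rate
        )
        if not is_anomaly:
            # Only normal events update the baseline, so spikes do not inflate it.
            self.latencies.append(latency_ms)
        return "Anomaly" if is_anomaly else "Normal"


detector = AdaptiveDetector()
for latency in (100, 110, 90, 105, 95):
    detector.check(latency, 0.01)
print(detector.check(224, 0.062))
```

Keeping anomalous samples out of the window is one possible design choice; it stops a burst of slow requests from raising the threshold and hiding later incidents.
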
---
## 🧠 Example Output
**Event Processed (Anomaly)**

- Component: api-service
- Latency: 224 ms
- Error Rate: 0.062
- Status: Anomaly
- Analysis: Error 404: Not Found
- Healing Action: Restarted container (Found 3 similar incidents)
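
A healing-action string like the one above could be produced by a simple rule-based policy. The sketch below is only an illustration: the function `choose_healing_action`, its thresholds, and the "Scaled up replicas" and "No action needed" messages are assumptions; only the "Restarted container (Found N similar incidents)" wording comes from the example output.

```python
def choose_healing_action(latency_ms, error_rate, similar_incidents=0):
    """Hypothetical policy mapping telemetry to a simulated healing action."""
    # Thresholds are illustrative assumptions, not the project's values.
    if error_rate > 0.05:
        action = "Restarted container"
    elif latency_ms > 150:
        action = "Scaled up replicas"
    else:
        return "No action needed"

    # Append the memory lookup result when similar past incidents exist.
    if similar_incidents:
        action += f" (Found {similar_incidents} similar incidents)"
    return action


print(choose_healing_action(224, 0.062, similar_incidents=3))
```
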
---
## 🧩 Architecture Overview

    ┌──────────────────────┐
    │ Gradio Frontend UI   │
    └─────────┬────────────┘
              │ (submit telemetry)
    ┌──────────────────────┐
    │ FastAPI /add-event   │
    │ + API Key validation │
    └─────────┬────────────┘
              │ (call)
    ┌─────────────────────────────┐
    │ Hugging Face Inference API  │
    │ → Reliability insight text  │
    └─────────┬───────────────────┘
    ┌───────────────────────────────┐
    │ FAISS + Sentence Transformers │
    │ → Embedding + similarity map  │
    └───────────────────────────────┘

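
The last stage of the diagram, embedding incidents and retrieving similar ones, can be sketched without external dependencies. This is a stand-in, not the project's implementation: `toy_embed` is a hashed bag-of-words substitute for the `all-MiniLM-L6-v2` encoder (which produces 384-dimensional vectors), and the brute-force cosine search plays the role of a FAISS index. The class name `IncidentMemory` and the sample incident strings are assumptions.

```python
import math
import zlib

DIM = 64  # toy dimension for the stand-in embedding


def toy_embed(text):
    """Hash tokens into buckets and L2-normalise (stand-in for a real encoder)."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class IncidentMemory:
    """Hypothetical in-memory store with brute-force cosine similarity search."""

    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text):
        self.texts.append(text)
        self.vectors.append(toy_embed(text))

    def search(self, query, k=3):
        q = toy_embed(query)
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [
            (sum(a * b for a, b in zip(q, v)), t)
            for v, t in zip(self.vectors, self.texts)
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


memory = IncidentMemory()
memory.add("api-service latency 224 error_rate 0.062")
memory.add("db-service latency 40 error_rate 0.001")
memory.add("auth-service latency 900 error_rate 0.3")
for score, text in memory.search("api-service latency 230 error_rate 0.07", k=2):
    print(round(score, 3), text)
```

Swapping in a real encoder and a FAISS index would change only `toy_embed` and the body of `search`; the add-then-search flow stays the same.
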
---
## 🧾 API Usage
**Endpoint:**
`POST /add-event`
**Headers:**
`X-API-Key: <your_api_key>`
**Body:**
```json
{
  "component": "api-service",
  "latency": 200,
  "error_rate": 0.04
}
```

**Response:**

```json
{
  "status": "ok",
  "event": {
    "timestamp": "2025-11-08 23:29:03",
    "component": "api-service",
    "status": "Anomaly",
    "analysis": "Error 404: Not Found",
    "healing_action": "Restarted container Found 3 similar incidents ..."
  }
}
```

## 🛠️ Run Locally

```bash
git clone https://github.com/petterjuan/agentic-reliability-framework.git
cd agentic-reliability-framework
pip install -r requirements.txt
python app.py
```

Then open http://localhost:7860
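
With the app running, the `POST /add-event` call documented under API Usage can be exercised from Python using only the standard library. This is a minimal sketch: the helper names `build_add_event_request` and `send_event` are hypothetical, and it assumes the REST endpoint is reachable at the same base URL as the UI, which this README does not confirm.

```python
import json
import urllib.request


def build_add_event_request(base_url, api_key, component, latency, error_rate):
    """Build (but do not send) a POST /add-event request with the X-API-Key header."""
    payload = {
        "component": component,
        "latency": latency,
        "error_rate": error_rate,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/add-event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send_event(request, timeout=10):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Assumed base URL and placeholder key; replace with your own values.
request = build_add_event_request(
    "http://localhost:7860", "<your_api_key>", "api-service", 200, 0.04
)
print(request.get_method(), request.full_url)
# result = send_event(request)  # uncomment once the server is running
```
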

## 🌍 Live Space & Collaboration

- 👉 Launch Live Demo on Hugging Face
- 👉 [Contribute or Fork on GitHub](https://github.com/petterjuan/agentic-reliability-framework)

## 🧭 Author

Juan D. Petter

AI Engineer & Cloud Architect

Building Agentic Systems for Scalable Automation | ex-NetApp

🔗 LinkedIn • GitHub

## 🪪 License

MIT License © 2025 Juan D. Petter