title: Enterprise-AI-Gateway
emoji: π
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
short_description: Resilient AI Mesh - Secure, Cost-Aware, Speed-Optimized
Enterprise-AI-Gateway
Resilient AI mesh: secure, cost-aware, speed-optimized gateway for LLM applications.
| Resource | Link |
|---|---|
| Live Demo | huggingface.co/spaces/vn6295337/Enterprise-AI-Gateway |
| Demo Video | github.com/vn6295337/Enterprise-AI-Gateway/issues/4 |
| Business Guide | BUSINESS_README.md |
The Problem
Enterprise AI adoption faces three critical barriers:
- Reliability Risk: Single-provider dependencies create unacceptable downtime. When your LLM provider goes down, operations halt.
- Security Exposure: LLM applications are vulnerable to prompt injection, PII leaks, and harmful content generation.
- Compliance Uncertainty: Regulated industries need audit trails, content moderation, and demonstrable safety controls.
The Solution
A security-first API gateway that sits between your applications and LLM providers:
- Multi-provider failover: Automatic cascade through 3 providers ensures 99.8% uptime
- 4-layer security pipeline: Auth, input validation, AI safety, and rate limiting
- Compliance-ready: Full audit trails with cascade paths, latency, and cost tracking
Why This Matters
Most enterprise AI deployments fail not because of bad models, but because they lack reliability and security controls. This architecture demonstrates how to build production-grade AI infrastructure, a pattern applicable to any domain requiring consistent, safe LLM access.
Architecture
┌─────────────────────────────────────────────────────────────────┐
│                          USER REQUEST                           │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                  LAYER 1: AUTH & RATE LIMITING                  │
│  • API Key validation (X-API-Key header)                        │
│  • DDoS protection (configurable rate limits)                   │
│  • Token limit enforcement (4096 max)                           │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                      LAYER 2: INPUT GUARD                       │
│  • Prompt injection detection (regex patterns)                  │
│  • PII detection (SSN, credit cards, emails, API keys)          │
│  • SQL/Command injection patterns                               │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                       LAYER 3: AI SAFETY                        │
│  Primary: Gemini 2.5 Flash classification                       │
│  Fallback: Lakera Guard API                                     │
│  Categories: Sexual, Hate, Harassment, Dangerous, Civic         │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│             LAYER 4: LLM ROUTER (CASCADE FAILOVER)              │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐      │
│  │   Gemini    │───▶│    Groq     │───▶│   OpenRouter    │      │
│  │  (Primary)  │    │(Fallback 1) │    │  (Fallback 2)   │      │
│  └─────────────┘    └─────────────┘    └─────────────────┘      │
└────────────────────────────────┬────────────────────────────────┘
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                           AI RESPONSE                           │
│  + provider, latency_ms, cascade_path, cost_estimate_usd        │
└─────────────────────────────────────────────────────────────────┘
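Layer 2 is described as regex-based. The following is a minimal sketch of that kind of check, not the project's actual rule set: the pattern dictionaries, the `scan_prompt` name, and every regex in it are illustrative assumptions.

```python
import re

# Hypothetical patterns for illustration only; the gateway's real rules are not shown here.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}
INJECTION_PATTERNS = {
    "prompt_injection": re.compile(
        r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE
    ),
}


def scan_prompt(prompt: str) -> list[str]:
    """Return the names of all patterns that match the prompt (empty list if clean)."""
    findings = []
    for name, pattern in {**PII_PATTERNS, **INJECTION_PATTERNS}.items():
        if pattern.search(prompt):
            findings.append(name)
    return findings
```

A caller could reject the request whenever `scan_prompt` returns a non-empty list, before the prompt reaches the AI safety layer or any provider.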
Data Flow
Request → Auth → Rate Limit → Input Guard → AI Safety → LLM Router
                                                            ↓
Response ← Gemini → fails? → Groq → fails? → OpenRouter
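The cascade in the data flow can be sketched as a loop over providers in priority order. This is an illustration under assumptions, not the gateway's router: `cascade_query`, the `Provider` callable type, the `"failed"` status value, and the `error` field are hypothetical, and only the `"success"` status and the top-level field names come from the response example in this README.

```python
import time
from typing import Callable

# A provider is modelled as a callable that takes a prompt and returns text,
# raising an exception on any failure (assumption for this sketch).
Provider = Callable[[str], str]


def cascade_query(prompt: str, providers: list[tuple[str, Provider]]) -> dict:
    """Try each provider in order and return the first successful answer."""
    cascade_path = []
    for name, call in providers:
        started = time.perf_counter()
        try:
            text = call(prompt)
        except Exception as exc:
            latency_ms = round((time.perf_counter() - started) * 1000)
            cascade_path.append(
                {"provider": name, "status": "failed",
                 "latency_ms": latency_ms, "error": str(exc)}
            )
            continue  # fall through to the next provider
        latency_ms = round((time.perf_counter() - started) * 1000)
        cascade_path.append(
            {"provider": name, "status": "success", "latency_ms": latency_ms}
        )
        return {
            "response": text,
            "provider": name,
            "latency_ms": latency_ms,
            "cascade_path": cascade_path,
        }
    tried = [step["provider"] for step in cascade_path]
    raise RuntimeError(f"all providers failed: {tried}")
```

Recording every attempt in `cascade_path`, including failures, is what lets the audit trail show which providers were tried before one answered.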
Features
| Component | Role | Implementation |
|---|---|---|
| Auth | API key validation | Constant-time comparison, env-based secrets |
| Rate Limiter | DDoS protection | SlowAPI, configurable per-minute limits |
| Input Guard | Injection/PII detection | Regex patterns for known attack vectors |
| AI Safety | Content moderation | Gemini classification + Lakera Guard fallback |
| LLM Router | Provider orchestration | Cascade failover with latency tracking |
| Metrics | Observability | Thread-safe store, real-time /metrics endpoint |
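The Auth row mentions constant-time comparison with env-based secrets. A minimal sketch of that idea using the standard library's `hmac.compare_digest` follows; the function name `is_valid_api_key` is hypothetical, and reading the expected key from `SERVICE_API_KEY` is an assumption based on the Configuration section.

```python
import hmac
import os
from typing import Optional


def is_valid_api_key(presented: Optional[str], expected: Optional[str] = None) -> bool:
    """Compare the presented key to the expected one without leaking timing information."""
    if expected is None:
        # Assumption: the gateway's secret lives in the SERVICE_API_KEY variable.
        expected = os.environ.get("SERVICE_API_KEY", "")
    if not presented or not expected:
        return False  # a missing header or an unset secret never authenticates
    # compare_digest takes time independent of where the two values first differ.
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` can return as soon as the first differing character is found, which is the timing signal this comparison is meant to avoid.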
Providers
| Provider | Role | Free Tier | Avg Latency | Context Window |
|---|---|---|---|---|
| Gemini | Primary | 15 RPM | ~120ms | 1M tokens |
| Groq | Fallback 1 | 30 RPM | ~87ms | 128K tokens |
| OpenRouter | Fallback 2 | Varies | ~200ms | Model-dependent |
API Endpoints
| Endpoint | Method | Auth | Description |
|---|---|---|---|
| `/` | GET | No | Interactive dashboard |
| `/health` | GET | No | Health check |
| `/query` | POST | Yes | LLM query with cascade fallback |
| `/check-toxicity` | POST | No | Content safety classification |
| `/metrics` | GET | No | Gateway performance metrics |
| `/providers` | GET | No | Provider config and pricing |
| `/batch/resilience` | POST | Yes | Batch resilience testing (up to 10 prompts) |
| `/batch/security` | POST | No | Batch PII/injection testing (up to 20 prompts) |
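The `/metrics` endpoint is described elsewhere in this README as backed by a thread-safe store. One way such a store could look is sketched below; the `MetricsStore` class, its method names, and the fields it reports are assumptions for illustration, not the gateway's actual metrics schema.

```python
import threading
from collections import Counter


class MetricsStore:
    """Hypothetical in-memory counters guarded by a single lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._requests = Counter()
        self._latency_ms_total = Counter()

    def record(self, provider: str, latency_ms: float) -> None:
        # Both counters are updated under the same lock so a snapshot never sees one without the other.
        with self._lock:
            self._requests[provider] += 1
            self._latency_ms_total[provider] += latency_ms

    def snapshot(self) -> dict:
        # Build a plain dict copy while holding the lock; callers can serialize it freely.
        with self._lock:
            return {
                provider: {
                    "requests": count,
                    "avg_latency_ms": self._latency_ms_total[provider] / count,
                }
                for provider, count in self._requests.items()
            }
```

A request handler would call `record` after each provider call, and the metrics endpoint would return `snapshot()` as JSON.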
Query Example
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{"prompt": "What is machine learning?", "max_tokens": 150}'
Response:
{
"response": "Machine learning is...",
"provider": "gemini",
"latency_ms": 120,
"cascade_path": [{"provider": "gemini", "status": "success", "latency_ms": 120}],
"cost_estimate_usd": 0.000015
}
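The same request can be made from Python with only the standard library. This is a sketch: the helper names `build_query_request` and `send_query` are hypothetical, the base URL assumes the local server from the curl example, and the response is assumed to be JSON shaped like the sample above.

```python
import json
import urllib.request


def build_query_request(prompt: str, api_key: str, max_tokens: int = 150,
                        base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /query request with the JSON body and X-API-Key header."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def send_query(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With a gateway running locally, `send_query(build_query_request("What is machine learning?", "YOUR_API_KEY"))` would return the parsed response, from which `provider` and `cascade_path` can be read.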
Configuration
Required: SERVICE_API_KEY, GEMINI_API_KEY
Optional: GROQ_API_KEY, OPENROUTER_API_KEY, LAKERA_API_KEY, TOXICITY_THRESHOLD, RATE_LIMIT
Copy .env.example to .env and configure your keys. See Configuration Guide for full details.
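A startup check for these variables might look like the sketch below. It is an assumption about how one could validate the environment, not a description of what the gateway does: `load_settings` is a hypothetical helper, and values are kept as raw strings because the formats of `TOXICITY_THRESHOLD` and `RATE_LIMIT` are not specified here.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("SERVICE_API_KEY", "GEMINI_API_KEY")
OPTIONAL = ("GROQ_API_KEY", "OPENROUTER_API_KEY", "LAKERA_API_KEY",
            "TOXICITY_THRESHOLD", "RATE_LIMIT")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Fail fast if a required variable is missing; unset optional ones become None."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings: dict[str, Optional[str]] = {name: env[name] for name in REQUIRED}
    # No defaults are invented for optional values; an empty string is treated as unset.
    settings.update({name: env.get(name) or None for name in OPTIONAL})
    return settings
```

Passing the environment as a parameter keeps the check easy to exercise in tests without touching the real process environment.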
Quick Start
git clone https://github.com/vn6295337/Enterprise-AI-Gateway.git
cd Enterprise-AI-Gateway
pip install -r requirements.txt
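# Gateway key checked against the X-API-Key header (listed as required under Configuration)
export SERVICE_API_KEY="your-key"
# Set at least one provider API key
export GEMINI_API_KEY="your-key" # or
export GROQ_API_KEY="your-key" # or
export OPENROUTER_API_KEY="your-key"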
./start-app.sh
Development
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --reload
Testing
python -m pytest tests/
Deployment
Docker
docker build -t llm-secure-gateway .
docker run -p 8000:8000 \
-e SERVICE_API_KEY=your-key \
-e GEMINI_API_KEY=your-gemini-key \
llm-secure-gateway
Hugging Face Spaces
- Create Space at huggingface.co/new-space
- Select "Docker" SDK
- Add repository as source
- Configure Secrets with API keys
Roadmap
- Streaming responses via Server-Sent Events
- Redis-based rate limiting for horizontal scaling
- Custom safety policies per organization
- Provider performance analytics dashboard
- Webhook notifications for blocked requests
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Implement changes with tests
- Submit a pull request
Documentation
| Doc | Description |
|---|---|
| API Reference | Complete endpoint documentation |
| Architecture | System design deep dive |
| Security Overview | Security layers and threat model |
| Configuration | Environment variables reference |
| Deployment | Docker and cloud deployment |
| FAQ | Frequently asked questions |
License
MIT License