vn6295337's picture
Initial commit: Enterprise-AI-Gateway - Secure LLM gateway
bb0c63f
metadata
title: Enterprise-AI-Gateway
emoji: πŸ”
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
short_description: Resilient AI Mesh - Secure, Cost-Aware, Speed-Optimized

Enterprise-AI-Gateway

Resilient AI mesh: secure, cost-aware, speed-optimized gateway for LLM applications.

Python 3.8+ License: MIT


The Problem

Enterprise AI adoption faces three critical barriers:

  • Reliability Risk β€” Single-provider dependencies create unacceptable downtime. When your LLM provider goes down, operations halt.
  • Security Exposure β€” LLM applications are vulnerable to prompt injection, PII leaks, and harmful content generation.
  • Compliance Uncertainty β€” Regulated industries need audit trails, content moderation, and demonstrable safety controls.

The Solution

A security-first API gateway that sits between your applications and LLM providers:

  • Multi-provider failover β€” Automatic cascade through 3 providers ensures 99.8% uptime
  • 4-layer security pipeline β€” Auth, input validation, AI safety, and rate limiting
  • Compliance-ready β€” Full audit trails with cascade paths, latency, and cost tracking

Why This Matters

Most enterprise AI deployments fail not from bad models, but from lack of reliability and security controls. This architecture demonstrates how to build production-grade AI infrastructureβ€”a pattern applicable to any domain requiring consistent, safe LLM access.


Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         USER REQUEST                            β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LAYER 1: AUTH & RATE LIMITING                                  β”‚
β”‚  β€’ API Key validation (X-API-Key header)                        β”‚
β”‚  β€’ DDoS protection (configurable rate limits)                   β”‚
β”‚  β€’ Token limit enforcement (4096 max)                           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LAYER 2: INPUT GUARD                                           β”‚
β”‚  β€’ Prompt injection detection (regex patterns)                  β”‚
β”‚  β€’ PII detection (SSN, credit cards, emails, API keys)          β”‚
β”‚  β€’ SQL/Command injection patterns                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LAYER 3: AI SAFETY                                             β”‚
β”‚  Primary: Gemini 2.5 Flash classification                       β”‚
β”‚  Fallback: Lakera Guard API                                     β”‚
β”‚  Categories: Sexual, Hate, Harassment, Dangerous, Civic         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  LAYER 4: LLM ROUTER (CASCADE FAILOVER)                         β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚   Gemini    │───▢│    Groq     │───▢│   OpenRouter    β”‚     β”‚
β”‚  β”‚  (Primary)  β”‚    β”‚ (Fallback 1)β”‚    β”‚  (Fallback 2)   β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        AI RESPONSE                              β”‚
β”‚  + provider, latency_ms, cascade_path, cost_estimate_usd        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Data Flow

Request β†’ Auth β†’ Rate Limit β†’ Input Guard β†’ AI Safety β†’ LLM Router
                                                            ↓
Response ← Gemini ← fails? β†’ Groq ← fails? β†’ OpenRouter

Features

Component Role Implementation
Auth API key validation Constant-time comparison, env-based secrets
Rate Limiter DDoS protection SlowAPI, configurable per-minute limits
Input Guard Injection/PII detection Regex patterns for known attack vectors
AI Safety Content moderation Gemini classification + Lakera Guard fallback
LLM Router Provider orchestration Cascade failover with latency tracking
Metrics Observability Thread-safe store, real-time /metrics endpoint

Providers

Provider Role Free Tier Avg Latency Context Window
Gemini Primary 15 RPM ~120ms 1M tokens
Groq Fallback 1 30 RPM ~87ms 128K tokens
OpenRouter Fallback 2 Varies ~200ms Model-dependent

API Endpoints

Endpoint Method Auth Description
/ GET No Interactive dashboard
/health GET No Health check
/query POST Yes LLM query with cascade fallback
/check-toxicity POST No Content safety classification
/metrics GET No Gateway performance metrics
/providers GET No Provider config and pricing
/batch/resilience POST Yes Batch resilience testing (up to 10 prompts)
/batch/security POST No Batch PII/injection testing (up to 20 prompts)

Query Example

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"prompt": "What is machine learning?", "max_tokens": 150}'

Response:

{
  "response": "Machine learning is...",
  "provider": "gemini",
  "latency_ms": 120,
  "cascade_path": [{"provider": "gemini", "status": "success", "latency_ms": 120}],
  "cost_estimate_usd": 0.000015
}

Configuration

Required: SERVICE_API_KEY, GEMINI_API_KEY

Optional: GROQ_API_KEY, OPENROUTER_API_KEY, LAKERA_API_KEY, TOXICITY_THRESHOLD, RATE_LIMIT

Copy .env.example to .env and configure your keys. See Configuration Guide for full details.


Quick Start

git clone https://github.com/vn6295337/Enterprise-AI-Gateway.git
cd Enterprise-AI-Gateway
pip install -r requirements.txt

# Set at least one provider API key
export GEMINI_API_KEY="your-key"      # or
export GROQ_API_KEY="your-key"        # or
export OPENROUTER_API_KEY="your-key"

./start-app.sh

Development

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --reload

Testing

python -m pytest tests/

Deployment

Docker

docker build -t llm-secure-gateway .
docker run -p 8000:8000 \
  -e SERVICE_API_KEY=your-key \
  -e GEMINI_API_KEY=your-gemini-key \
  llm-secure-gateway

Hugging Face Spaces

  1. Create Space at huggingface.co/new-space
  2. Select "Docker" SDK
  3. Add repository as source
  4. Configure Secrets with API keys

Roadmap

  • Streaming responses via Server-Sent Events
  • Redis-based rate limiting for horizontal scaling
  • Custom safety policies per organization
  • Provider performance analytics dashboard
  • Webhook notifications for blocked requests

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Implement changes with tests
  4. Submit a pull request

Documentation

Doc Description
API Reference Complete endpoint documentation
Architecture System design deep dive
Security Overview Security layers and threat model
Configuration Environment variables reference
Deployment Docker and cloud deployment
FAQ Frequently asked questions

License

MIT License