title: Enterprise-AI-Gateway
emoji: 🔐
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
short_description: Resilient AI Mesh - Secure, Cost-Aware, Speed-Optimized

Enterprise-AI-Gateway

Resilient AI mesh: secure, cost-aware, speed-optimized gateway for LLM applications.

Resource	Link
Live Demo	huggingface.co/spaces/vn6295337/Enterprise-AI-Gateway
Demo Video	github.com/vn6295337/Enterprise-AI-Gateway/issues/4
Business Guide	BUSINESS_README.md

The Problem

Enterprise AI adoption faces three critical barriers:

Reliability Risk — Single-provider dependencies create unacceptable downtime. When your LLM provider goes down, operations halt.
Security Exposure — LLM applications are vulnerable to prompt injection, PII leaks, and harmful content generation.
Compliance Uncertainty — Regulated industries need audit trails, content moderation, and demonstrable safety controls.

The Solution

A security-first API gateway that sits between your applications and LLM providers:

Multi-provider failover — Automatic cascade through 3 providers ensures 99.8% uptime
4-layer security pipeline — Auth, input validation, AI safety, and rate limiting
Compliance-ready — Full audit trails with cascade paths, latency, and cost tracking

Why This Matters

Most enterprise AI deployments fail not from bad models, but from lack of reliability and security controls. This architecture demonstrates how to build production-grade AI infrastructure—a pattern applicable to any domain requiring consistent, safe LLM access.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         USER REQUEST                            │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 1: AUTH & RATE LIMITING                                  │
│  • API Key validation (X-API-Key header)                        │
│  • DDoS protection (configurable rate limits)                   │
│  • Token limit enforcement (4096 max)                           │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 2: INPUT GUARD                                           │
│  • Prompt injection detection (regex patterns)                  │
│  • PII detection (SSN, credit cards, emails, API keys)          │
│  • SQL/Command injection patterns                               │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 3: AI SAFETY                                             │
│  Primary: Gemini 2.5 Flash classification                       │
│  Fallback: Lakera Guard API                                     │
│  Categories: Sexual, Hate, Harassment, Dangerous, Civic         │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│  LAYER 4: LLM ROUTER (CASCADE FAILOVER)                         │
│  ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐     │
│  │   Gemini    │───▶│    Groq     │───▶│   OpenRouter    │     │
│  │  (Primary)  │    │ (Fallback 1)│    │  (Fallback 2)   │     │
│  └─────────────┘    └─────────────┘    └─────────────────┘     │
└─────────────────────────────┬───────────────────────────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                        AI RESPONSE                              │
│  + provider, latency_ms, cascade_path, cost_estimate_usd        │
└─────────────────────────────────────────────────────────────────┘
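
The regex-based checks in Layer 2 can be illustrated with a minimal sketch. The patterns and the `scan_prompt` helper below are hypothetical examples written for this README, not the gateway's actual rule set, which is likely broader.

```python
import re

# Hypothetical patterns for illustration only; the real Input Guard rules may differ.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}
INJECTION_PATTERNS = [
    re.compile(r"ignore (?:all )?(?:previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (?:your )?system prompt", re.IGNORECASE),
]


def scan_prompt(prompt: str) -> dict:
    """Report which PII types and injection phrases appear in a prompt."""
    pii = sorted(name for name, rx in PII_PATTERNS.items() if rx.search(prompt))
    injection = any(rx.search(prompt) for rx in INJECTION_PATTERNS)
    return {"pii": pii, "injection": injection, "blocked": bool(pii) or injection}
```

A prompt such as `"My SSN is 123-45-6789"` would be flagged with `pii == ["ssn"]`, while an ordinary question passes through unblocked. Regex checks like these only catch known shapes, which is presumably why the AI Safety layer follows.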

Data Flow

Request → Auth → Rate Limit → Input Guard → AI Safety → LLM Router
LLM Router → Gemini → (fails?) → Groq → (fails?) → OpenRouter
Response ← first provider that succeeds
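
The cascade can be sketched in a few lines. This is an illustrative outline, not the gateway's router: providers are modeled as plain callables, and the `cascade_query` name, the `"failed"` status value, and the `error` field are assumptions. The `provider`, `latency_ms`, and `cascade_path` fields follow the response shape shown in this README.

```python
import time
from typing import Callable, List, Tuple

# Each provider is a (name, callable) pair. A real gateway would wrap
# provider SDK or HTTP calls here; these callables are stand-ins.
Provider = Tuple[str, Callable[[str], str]]


def cascade_query(prompt: str, providers: List[Provider]) -> dict:
    """Try providers in order and return the first successful answer."""
    cascade_path = []
    for name, call in providers:
        start = time.perf_counter()
        try:
            text = call(prompt)
        except Exception as exc:  # any provider error triggers the next fallback
            latency_ms = round((time.perf_counter() - start) * 1000)
            cascade_path.append(
                {"provider": name, "status": "failed",
                 "latency_ms": latency_ms, "error": str(exc)}
            )
            continue
        latency_ms = round((time.perf_counter() - start) * 1000)
        cascade_path.append(
            {"provider": name, "status": "success", "latency_ms": latency_ms}
        )
        return {"response": text, "provider": name,
                "latency_ms": latency_ms, "cascade_path": cascade_path}
    raise RuntimeError("All providers failed")
```

Recording every attempt in `cascade_path`, including failures, is what makes the audit trail useful: a caller can see which providers were tried before the one that answered.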

Features

Component	Role	Implementation
Auth	API key validation	Constant-time comparison, env-based secrets
Rate Limiter	DDoS protection	SlowAPI, configurable per-minute limits
Input Guard	Injection/PII detection	Regex patterns for known attack vectors
AI Safety	Content moderation	Gemini classification + Lakera Guard fallback
LLM Router	Provider orchestration	Cascade failover with latency tracking
Metrics	Observability	Thread-safe store, real-time /metrics endpoint
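
As an illustration of the constant-time comparison listed for Auth, here is a minimal sketch using the standard library's `hmac.compare_digest`. The `is_valid_api_key` helper is a hypothetical name, and reading the expected key from `SERVICE_API_KEY` is assumed from the Configuration section.

```python
import hmac
import os
from typing import Optional


def is_valid_api_key(presented: Optional[str], expected: Optional[str] = None) -> bool:
    """Compare the presented key to the expected one without leaking timing info."""
    if expected is None:
        # Assumption: the gateway's key lives in the SERVICE_API_KEY env variable.
        expected = os.environ.get("SERVICE_API_KEY", "")
    if not presented or not expected:
        return False  # reject missing keys and unconfigured gateways
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

A plain `==` comparison can return as soon as the first differing character is found, which in principle lets an attacker recover a key byte by byte from response timings. `compare_digest` avoids that shortcut.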

Providers

Provider	Role	Free Tier	Avg Latency	Context Window
Gemini	Primary	15 RPM	~120ms	1M tokens
Groq	Fallback 1	30 RPM	~87ms	128K tokens
OpenRouter	Fallback 2	Varies	~200ms	Model-dependent

API Endpoints

Endpoint	Method	Auth	Description
`/`	GET	No	Interactive dashboard
`/health`	GET	No	Health check
`/query`	POST	Yes	LLM query with cascade fallback
`/check-toxicity`	POST	No	Content safety classification
`/metrics`	GET	No	Gateway performance metrics
`/providers`	GET	No	Provider config and pricing
`/batch/resilience`	POST	Yes	Batch resilience testing (up to 10 prompts)
`/batch/security`	POST	No	Batch PII/injection testing (up to 20 prompts)

Query Example

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -H "X-API-Key: YOUR_API_KEY" \
  -d '{"prompt": "What is machine learning?", "max_tokens": 150}'

Response:

{
  "response": "Machine learning is...",
  "provider": "gemini",
  "latency_ms": 120,
  "cascade_path": [{"provider": "gemini", "status": "success", "latency_ms": 120}],
  "cost_estimate_usd": 0.000015
}
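
The same call can be made from Python with only the standard library. This is a sketch that mirrors the curl example above; the helper names are hypothetical, and the response fields are assumed to match the sample JSON.

```python
import json
import urllib.request


def build_query_request(base_url: str, api_key: str, prompt: str,
                        max_tokens: int = 150) -> urllib.request.Request:
    """Build the POST /query request with the X-API-Key header."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def summarize(raw: str) -> str:
    """Turn a /query response body into a one-line summary."""
    data = json.loads(raw)
    hops = " -> ".join(f"{h['provider']}:{h['status']}" for h in data["cascade_path"])
    return f"{data['provider']} in {data['latency_ms']} ms via [{hops}]"


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request to a running gateway and return the raw body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

With a gateway running locally, `summarize(send(build_query_request("http://localhost:8000", "YOUR_API_KEY", "What is machine learning?")))` would print the serving provider and the cascade path.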

Configuration

Required: SERVICE_API_KEY, GEMINI_API_KEY

Optional: GROQ_API_KEY, OPENROUTER_API_KEY, LAKERA_API_KEY, TOXICITY_THRESHOLD, RATE_LIMIT

Copy .env.example to .env and configure your keys. See Configuration Guide for full details.
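
A startup check for these variables might look like the sketch below. The `Settings` class and `load_settings` function are hypothetical; only the variable names come from this README. `TOXICITY_THRESHOLD` and `RATE_LIMIT` are kept as raw strings because their formats are not specified here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED = ("SERVICE_API_KEY", "GEMINI_API_KEY")


@dataclass(frozen=True)
class Settings:
    service_api_key: str
    gemini_api_key: str
    groq_api_key: Optional[str] = None
    openrouter_api_key: Optional[str] = None
    lakera_api_key: Optional[str] = None
    toxicity_threshold: Optional[str] = None  # raw string; format not assumed
    rate_limit: Optional[str] = None          # raw string; format not assumed


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast when a required variable is missing or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        service_api_key=env["SERVICE_API_KEY"],
        gemini_api_key=env["GEMINI_API_KEY"],
        groq_api_key=env.get("GROQ_API_KEY"),
        openrouter_api_key=env.get("OPENROUTER_API_KEY"),
        lakera_api_key=env.get("LAKERA_API_KEY"),
        toxicity_threshold=env.get("TOXICITY_THRESHOLD"),
        rate_limit=env.get("RATE_LIMIT"),
    )
```

Passing the environment as a parameter keeps the loader easy to test with a plain dictionary instead of real environment variables.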

Quick Start

git clone https://github.com/vn6295337/Enterprise-AI-Gateway.git
cd Enterprise-AI-Gateway
pip install -r requirements.txt

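# Gateway key that clients send in the X-API-Key header (listed as required under Configuration)
export SERVICE_API_KEY="your-service-key"

# Set at least one provider API key
export GEMINI_API_KEY="your-key"      # or
export GROQ_API_KEY="your-key"        # or
export OPENROUTER_API_KEY="your-key"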

./start-app.sh

Development

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
uvicorn src.main:app --reload

Testing

python -m pytest tests/

Deployment

Docker

docker build -t llm-secure-gateway .
docker run -p 8000:8000 \
  -e SERVICE_API_KEY=your-key \
  -e GEMINI_API_KEY=your-gemini-key \
  llm-secure-gateway

Hugging Face Spaces

Create Space at huggingface.co/new-space
Select "Docker" SDK
Add repository as source
Configure Secrets with API keys

Roadmap

Streaming responses via Server-Sent Events
Redis-based rate limiting for horizontal scaling
Custom safety policies per organization
Provider performance analytics dashboard
Webhook notifications for blocked requests

Contributing

Contributions welcome! Please:

Fork the repository
Create a feature branch
Implement changes with tests
Submit a pull request

Documentation

Doc	Description
API Reference	Complete endpoint documentation
Architecture	System design deep dive
Security Overview	Security layers and threat model
Configuration	Environment variables reference
Deployment	Docker and cloud deployment
FAQ	Frequently asked questions

License

MIT License