	---
	title: Enterprise-AI-Gateway
	emoji: 🔐
	colorFrom: blue
	colorTo: purple
	sdk: docker
	pinned: false
	short_description: "Resilient AI Mesh - Secure, Cost-Aware, Speed-Optimized"
	---

	# Enterprise-AI-Gateway

	Resilient AI mesh: secure, cost-aware, speed-optimized gateway for LLM applications.

	[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
	[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

	| Resource | Link |
	|----------|------|
	| Live Demo | [huggingface.co/spaces/vn6295337/Enterprise-AI-Gateway](https://huggingface.co/spaces/vn6295337/Enterprise-AI-Gateway) |
	| Demo Video | [github.com/vn6295337/Enterprise-AI-Gateway/issues/4](http://github.com/vn6295337/Enterprise-AI-Gateway/issues/4) |
	| Business Guide | [BUSINESS_README.md](BUSINESS_README.md) |

	---

	## The Problem

	Enterprise AI adoption faces three critical barriers:

	- Reliability Risk — Single-provider dependencies create unacceptable downtime. When your LLM provider goes down, operations halt.
	- Security Exposure — LLM applications are vulnerable to prompt injection, PII leaks, and harmful content generation.
	- Compliance Uncertainty — Regulated industries need audit trails, content moderation, and demonstrable safety controls.

	## The Solution

	A security-first API gateway that sits between your applications and LLM providers:

	- Multi-provider failover — Automatic cascade through 3 providers ensures 99.8% uptime
	- 4-layer security pipeline — Auth, input validation, AI safety, and rate limiting
	- Compliance-ready — Full audit trails with cascade paths, latency, and cost tracking

	## Why This Matters

	Most enterprise AI deployments fail not from bad models, but from lack of reliability and security controls. This architecture demonstrates how to build production-grade AI infrastructure—a pattern applicable to any domain requiring consistent, safe LLM access.

	---

	## Architecture

	```
	┌─────────────────────────────────────────────────────────────────┐
	│                          USER REQUEST                           │
	└─────────────────────────────┬───────────────────────────────────┘
	                              ▼
	┌─────────────────────────────────────────────────────────────────┐
	│  LAYER 1: AUTH & RATE LIMITING                                  │
	│  • API Key validation (X-API-Key header)                        │
	│  • DDoS protection (configurable rate limits)                   │
	│  • Token limit enforcement (4096 max)                           │
	└─────────────────────────────┬───────────────────────────────────┘
	                              ▼
	┌─────────────────────────────────────────────────────────────────┐
	│  LAYER 2: INPUT GUARD                                           │
	│  • Prompt injection detection (regex patterns)                  │
	│  • PII detection (SSN, credit cards, emails, API keys)          │
	│  • SQL/Command injection patterns                               │
	└─────────────────────────────┬───────────────────────────────────┘
	                              ▼
	┌─────────────────────────────────────────────────────────────────┐
	│  LAYER 3: AI SAFETY                                             │
	│  Primary: Gemini 2.5 Flash classification                       │
	│  Fallback: Lakera Guard API                                     │
	│  Categories: Sexual, Hate, Harassment, Dangerous, Civic         │
	└─────────────────────────────┬───────────────────────────────────┘
	                              ▼
	┌─────────────────────────────────────────────────────────────────┐
	│  LAYER 4: LLM ROUTER (CASCADE FAILOVER)                         │
	│    ┌─────────────┐    ┌─────────────┐    ┌─────────────────┐    │
	│    │   Gemini    │───▶│    Groq     │───▶│   OpenRouter    │    │
	│    │  (Primary)  │    │ (Fallback 1)│    │  (Fallback 2)   │    │
	│    └─────────────┘    └─────────────┘    └─────────────────┘    │
	└─────────────────────────────┬───────────────────────────────────┘
	                              ▼
	┌─────────────────────────────────────────────────────────────────┐
	│                           AI RESPONSE                           │
	│  + provider, latency_ms, cascade_path, cost_estimate_usd        │
	└─────────────────────────────────────────────────────────────────┘
	```
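Layer 2 in the diagram relies on regex patterns for injection and PII detection. A minimal sketch of that idea follows. The pattern set, the `scan_prompt` name, and the shape of the returned dict are illustrative assumptions, not the project's actual rules.

```python
import re

# Hypothetical patterns for illustration only; the gateway's real rule set
# is not shown in this README.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}
INJECTION_PATTERNS = [
    re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE),
    re.compile(r"\b(?:drop\s+table|union\s+select)\b", re.IGNORECASE),
]


def scan_prompt(prompt: str) -> dict:
    """Report which PII types matched and whether an injection pattern matched."""
    pii = sorted(name for name, rx in PII_PATTERNS.items() if rx.search(prompt))
    injection = any(rx.search(prompt) for rx in INJECTION_PATTERNS)
    return {"pii": pii, "injection": injection, "blocked": bool(pii) or injection}
```

Regex checks like these are cheap enough to run before any model call, which is presumably why this layer sits ahead of the AI safety classifier.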

	### Data Flow

	```
	Request → Auth → Rate Limit → Input Guard → AI Safety → LLM Router
	                                                            ↓
	Response ← Gemini ← fails? → Groq ← fails? → OpenRouter
	```
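The cascade in the flow above can be sketched as a loop that tries each provider in order and records every attempt. This is an illustration under assumptions: `cascade_query`, the stub providers, and the `"failed"` status and `error` field are hypothetical; only the `provider`, `status`, `latency_ms`, and `cascade_path` names come from the response format documented in this README.

```python
import time
from typing import Callable, Dict, List, Tuple

# A provider is a name plus a callable that takes a prompt and returns text.
Provider = Tuple[str, Callable[[str], str]]


def cascade_query(prompt: str, providers: List[Provider]) -> Dict:
    """Try providers in order; return the first success plus the path taken."""
    cascade_path = []
    for name, call in providers:
        start = time.perf_counter()
        try:
            text = call(prompt)
        except Exception as exc:  # any provider error moves on to the next fallback
            latency_ms = round((time.perf_counter() - start) * 1000)
            cascade_path.append({"provider": name, "status": "failed",
                                 "latency_ms": latency_ms, "error": str(exc)})
            continue
        latency_ms = round((time.perf_counter() - start) * 1000)
        cascade_path.append({"provider": name, "status": "success",
                             "latency_ms": latency_ms})
        return {"response": text, "provider": name,
                "latency_ms": latency_ms, "cascade_path": cascade_path}
    tried = [step["provider"] for step in cascade_path]
    raise RuntimeError(f"all providers failed: {tried}")


# Stub providers standing in for real Gemini and Groq clients.
def failing_gemini(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def working_groq(prompt: str) -> str:
    return f"echo: {prompt}"


result = cascade_query("hello", [("gemini", failing_gemini), ("groq", working_groq)])
```

Here `result["provider"]` is `"groq"`, and `result["cascade_path"]` holds one failed Gemini entry followed by the successful Groq entry.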

	---

	## Features

	| Component | Role | Implementation |
	|-----------|------|----------------|
	| Auth | API key validation | Constant-time comparison, env-based secrets |
	| Rate Limiter | DDoS protection | SlowAPI, configurable per-minute limits |
	| Input Guard | Injection/PII detection | Regex patterns for known attack vectors |
	| AI Safety | Content moderation | Gemini classification + Lakera Guard fallback |
	| LLM Router | Provider orchestration | Cascade failover with latency tracking |
	| Metrics | Observability | Thread-safe store, real-time `/metrics` endpoint |
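The Auth row mentions constant-time comparison with env-based secrets. A minimal sketch of that technique uses the standard library's `hmac.compare_digest`. The function name `is_valid_api_key` and the fail-closed behavior for an unset key are assumptions for illustration.

```python
import hmac
import os


def is_valid_api_key(presented: str, expected: str) -> bool:
    """Compare keys in constant time so timing does not reveal partial matches."""
    if not expected:
        return False  # assumption: fail closed when no key is configured
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


# The expected key would come from the environment, e.g. SERVICE_API_KEY.
expected_key = os.environ.get("SERVICE_API_KEY", "")
```

A plain `==` comparison can return as soon as the first differing character is found, which is what the constant-time variant avoids.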

	---

	## Providers

	| Provider | Role | Free Tier | Avg Latency | Context Window |
	|----------|------|-----------|-------------|----------------|
	| Gemini | Primary | 15 RPM | ~120ms | 1M tokens |
	| Groq | Fallback 1 | 30 RPM | ~87ms | 128K tokens |
	| OpenRouter | Fallback 2 | Varies | ~200ms | Model-dependent |

	---

	## API Endpoints

	| Endpoint | Method | Auth | Description |
	|----------|--------|------|-------------|
	| `/` | GET | No | Interactive dashboard |
	| `/health` | GET | No | Health check |
	| `/query` | POST | Yes | LLM query with cascade fallback |
	| `/check-toxicity` | POST | No | Content safety classification |
	| `/metrics` | GET | No | Gateway performance metrics |
	| `/providers` | GET | No | Provider config and pricing |
	| `/batch/resilience` | POST | Yes | Batch resilience testing (up to 10 prompts) |
	| `/batch/security` | POST | No | Batch PII/injection testing (up to 20 prompts) |
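The `/metrics` endpoint is described elsewhere in this README as backed by a thread-safe store. One way such a store could look is sketched below. The `MetricsStore` class and its field names are hypothetical and do not reflect the gateway's actual metrics schema.

```python
import threading
from collections import defaultdict


class MetricsStore:
    """In-memory counters guarded by a lock so concurrent requests do not race."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._requests = defaultdict(int)
        self._latency_ms = defaultdict(float)

    def record(self, provider: str, latency_ms: float) -> None:
        """Count one request and add its latency to the provider's running total."""
        with self._lock:
            self._requests[provider] += 1
            self._latency_ms[provider] += latency_ms

    def snapshot(self) -> dict:
        """Return per-provider request counts and average latency."""
        with self._lock:
            return {
                name: {"requests": count,
                       "avg_latency_ms": self._latency_ms[name] / count}
                for name, count in self._requests.items()
            }
```

Taking the lock in `snapshot` as well as `record` keeps a reader from seeing a request count that has been incremented before the matching latency was added.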

	### Query Example

	```bash
	curl -X POST http://localhost:8000/query \
	  -H "Content-Type: application/json" \
	  -H "X-API-Key: YOUR_API_KEY" \
	  -d '{"prompt": "What is machine learning?", "max_tokens": 150}'
	```

	Response:
	```json
	{
	  "response": "Machine learning is...",
	  "provider": "gemini",
	  "latency_ms": 120,
	  "cascade_path": [{"provider": "gemini", "status": "success", "latency_ms": 120}],
	  "cost_estimate_usd": 0.000015
	}
	```
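The same call can be made from Python with only the standard library. This is a sketch that mirrors the curl example above. The helper names `build_query_request` and `send_query` are hypothetical, and the base URL assumes a gateway running locally on port 8000.

```python
import json
import urllib.request


def build_query_request(base_url: str, api_key: str, prompt: str,
                        max_tokens: int = 150) -> urllib.request.Request:
    """Build the POST /query request without sending it."""
    body = json.dumps({"prompt": prompt, "max_tokens": max_tokens}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )


def send_query(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a gateway running, `send_query(build_query_request("http://localhost:8000", "YOUR_API_KEY", "What is machine learning?"))` should return a dict with the fields shown in the response example.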

	---

	## Configuration

	Required: `SERVICE_API_KEY`, `GEMINI_API_KEY`

	Optional: `GROQ_API_KEY`, `OPENROUTER_API_KEY`, `LAKERA_API_KEY`, `TOXICITY_THRESHOLD`, `RATE_LIMIT`

	Copy `.env.example` to `.env` and configure your keys. See [Configuration Guide](docs/configuration.md) for full details.
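
A startup check along these lines can catch a missing required variable before the first request arrives. This is a sketch, not the project's loader: `load_settings` is a hypothetical name, and no defaults are assumed for the optional variables because none are documented here.

```python
import os
from typing import Mapping, Optional

REQUIRED = ("SERVICE_API_KEY", "GEMINI_API_KEY")
OPTIONAL = ("GROQ_API_KEY", "OPENROUTER_API_KEY", "LAKERA_API_KEY",
            "TOXICITY_THRESHOLD", "RATE_LIMIT")


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect gateway settings, failing fast if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED}
    # Optional values are included only when set; no defaults are assumed here.
    settings.update({name: env[name] for name in OPTIONAL if env.get(name)})
    return settings
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.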

	---

	## Quick Start

	```bash
	git clone https://github.com/vn6295337/Enterprise-AI-Gateway.git
	cd Enterprise-AI-Gateway
	pip install -r requirements.txt

	# Gateway auth key (listed as required under Configuration)
	export SERVICE_API_KEY="your-key"

	# Set at least one provider API key
	export GEMINI_API_KEY="your-key" # or
	export GROQ_API_KEY="your-key" # or
	export OPENROUTER_API_KEY="your-key"

	./start-app.sh
	```

	---

	## Development

	```bash
	python3 -m venv venv
	source venv/bin/activate
	pip install -r requirements.txt
	uvicorn src.main:app --reload
	```

	---

	## Testing

	```bash
	python -m pytest tests/
	```

	---

	## Deployment

	### Docker

	```bash
	docker build -t llm-secure-gateway .
	docker run -p 8000:8000 \
	  -e SERVICE_API_KEY=your-key \
	  -e GEMINI_API_KEY=your-gemini-key \
	  llm-secure-gateway
	```

	### Hugging Face Spaces

	1. Create a Space at [huggingface.co/new-space](https://huggingface.co/new-space)
	2. Select the "Docker" SDK
	3. Add this repository as the source
	4. Add your API keys as Secrets in the Space settings

	---

	## Roadmap

	- [ ] Streaming responses via Server-Sent Events
	- [ ] Redis-based rate limiting for horizontal scaling
	- [ ] Custom safety policies per organization
	- [ ] Provider performance analytics dashboard
	- [ ] Webhook notifications for blocked requests

	---

	## Contributing

	Contributions welcome! Please:

	1. Fork the repository
	2. Create a feature branch
	3. Implement changes with tests
	4. Submit a pull request

	---

	## Documentation

	| Doc | Description |
	|-----|-------------|
	| [API Reference](docs/api_reference.md) | Complete endpoint documentation |
	| [Architecture](docs/architecture.md) | System design deep dive |
	| [Security Overview](docs/security_overview.md) | Security layers and threat model |
	| [Configuration](docs/configuration.md) | Environment variables reference |
	| [Deployment](docs/deployment.md) | Docker and cloud deployment |
	| [FAQ](docs/faq.md) | Frequently asked questions |

	---

	## License

	MIT License