Frequently Asked Questions (FAQ)
Primary Responsibility: Q&A format help with cross-references to detailed docs
This document answers common questions about the Enterprise AI Gateway.
Table of Contents
- General Questions
- Technical Questions
- Security Questions
- Deployment Questions
- Usage Questions
- Troubleshooting
General Questions
What is the Enterprise AI Gateway?
The Enterprise AI Gateway is a production-ready API gateway that provides secure, reliable access to multiple Large Language Model (LLM) providers with built-in failover, security controls, and governance features. It acts as an intelligent intermediary between your applications and LLM providers.
Why should I use the Enterprise AI Gateway instead of calling LLM providers directly?
The Enterprise AI Gateway provides several advantages over direct LLM calls:
- High Availability: Automatic failover between multiple providers ensures 99.8% uptime
- Security: Built-in authentication, rate limiting, input validation, and prompt injection protection
- Cost Optimization: Intelligent routing and token budget enforcement reduce costs
- Simplified Integration: Single API endpoint instead of managing multiple provider APIs
- Governance: Centralized control over AI access and usage
- Performance: Optimized response times and caching capabilities
Is the Enterprise AI Gateway free to use?
Yes, the Enterprise AI Gateway is open-source and free to use. You only pay for the LLM provider APIs you use through the gateway. The gateway itself can be deployed on free-tier infrastructure like Hugging Face Spaces.
Which LLM providers does the gateway support?
The gateway currently supports:
- Google Gemini (primary provider)
- Groq (secondary fallback)
- OpenRouter (tertiary fallback)
Additional providers can be added through custom integration.
Technical Questions
What programming language and framework are used?
The Enterprise AI Gateway is built with:
- Python 3.8+ as the primary language
- FastAPI as the web framework
- Uvicorn as the ASGI server
- Pydantic for data validation
- SlowAPI for rate limiting
How does the multi-provider failover work?
The gateway implements a sequential cascade approach:
1. Try the primary provider (Gemini)
2. If it fails, try the secondary provider (Groq)
3. If that fails, try the tertiary provider (OpenRouter)
4. Return an error if all providers fail
Each provider attempt includes timeout handling and error recovery.
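The cascade above can be sketched in Python. The provider order matches the docs, but `call` functions and the timeout value are illustrative assumptions, not the gateway's actual internals:

```python
def query_with_failover(prompt, providers, timeout=30):
    """Try each (name, call) provider in order; return the first success.

    Illustrative sketch: each call is expected to raise on timeout,
    quota exhaustion, or network error, which triggers the next provider.
    """
    errors = []
    for name, call in providers:
        try:
            return {"provider": name, "text": call(prompt, timeout=timeout)}
        except Exception as exc:  # timeout, quota, network error, etc.
            errors.append(f"{name}: {exc}")
    # All providers failed -> surface a 500-style error with details
    raise RuntimeError("All LLM providers failed: " + "; ".join(errors))
```

Usage would look like `query_with_failover(prompt, [("gemini", call_gemini), ("groq", call_groq), ("openrouter", call_openrouter)])`, where the `call_*` functions wrap each provider's API.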
What are the system requirements?
Minimum requirements:
- Python 3.8 or higher
- 512 MB RAM
- 100 MB disk space
- Internet connectivity
Recommended for production:
- Python 3.11+
- 1 GB RAM
- 500 MB disk space
- Stable broadband connection
How is the gateway configured?
Configuration is done through environment variables. See Configuration Guide for the complete list of required and optional settings.
What APIs does the gateway provide?
See API Reference for complete documentation.
Main endpoints:
- `POST /query` - LLM query with security checks
- `POST /check-toxicity` - Content safety check
- `GET /health` - Service health status
- `GET /metrics` - Performance metrics
Security Questions
For detailed security documentation, see Security Overview.
How secure is the Enterprise AI Gateway?
The gateway implements 4 security layers (Authentication, Input Guard, AI Safety, Transport Security). See Security Overview for the complete security architecture.
How are API keys protected?
- Stored as environment variables (never in code)
- Constant-time comparison (prevents timing attacks)
- Easy rotation process
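Constant-time comparison is available in Python's standard library via `hmac.compare_digest`. This helper is a sketch; only the `SERVICE_API_KEY` variable name comes from these docs:

```python
import hmac
import os

def verify_api_key(provided_key: str) -> bool:
    """Compare the client's key against SERVICE_API_KEY in constant time.

    hmac.compare_digest avoids early-exit string comparison, so response
    timing does not leak how many leading characters matched.
    Illustrative sketch, not the gateway's actual implementation.
    """
    expected = os.environ.get("SERVICE_API_KEY", "")
    return bool(expected) and hmac.compare_digest(provided_key, expected)
```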
What prompt injection attacks are detected?
The gateway uses a dual-layer detection system:
Pattern-Based Detection (Layer 2 - Input Guard):
- `ignore all previous instructions`
- `disregard all prior instructions`
- `you are now`
- `system:`
AI-Based Detection (Layer 3 - AI Safety):
- Gemini 2.5 Flash content classification
- Lakera Guard fallback for advanced jailbreak attempts
What harmful content is blocked?
The AI Safety layer blocks sexually explicit content, hate speech, harassment, dangerous content, and civic integrity violations. See Security Overview for the complete list of harm categories and how the Gemini + Lakera Guard fallback works.
How does rate limiting work?
By default, the gateway implements IP-based rate limiting:
- 10 requests per minute per IP address
- Configurable through the `RATE_LIMIT` environment variable
- Returns a 429 status code when the limit is exceeded
Note: In cloud environments with proxies, all requests may appear from the same IP.
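A minimal sketch of the per-IP limiting idea. The gateway itself uses SlowAPI; the sliding-window class below is illustrative only:

```python
import time
from collections import defaultdict

class SlidingWindowLimiter:
    """Illustrative per-IP rate limiter: allow `limit` requests per
    `window` seconds. Not the gateway's actual SlowAPI-based code."""

    def __init__(self, limit=10, window=60.0):
        self.limit = limit
        self.window = window
        self.hits = defaultdict(list)  # ip -> timestamps of recent requests

    def allow(self, ip: str, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window
        recent = [t for t in self.hits[ip] if now - t < self.window]
        if len(recent) >= self.limit:
            self.hits[ip] = recent
            return False  # caller should respond with HTTP 429
        recent.append(now)
        self.hits[ip] = recent
        return True
```

This also shows why the proxy caveat matters: if every request arrives from the proxy's IP, all clients share one bucket.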
Deployment Questions
Where can I deploy the Enterprise AI Gateway?
The gateway can be deployed on:
- Hugging Face Spaces (recommended for getting started)
- Docker Containers (any cloud platform)
- Traditional Cloud Providers (AWS, GCP, Azure)
- On-premises Servers
- Kubernetes Clusters
How do I deploy to Hugging Face Spaces?
See Deployment Guide - Hugging Face Spaces for step-by-step instructions.
Can I run the gateway in a serverless environment?
Yes, the gateway can be adapted for serverless deployment, though it works best in long-running deployments because IP-based rate limiting keeps state in memory. For serverless, consider:
- Using API key-based rate limiting instead of IP-based
- Implementing external state management for rate limiting
- Adapting the cascade logic for short-lived executions
How do I build a Docker image?
Build the Docker image from source:
```shell
git clone https://github.com/vn6295337/Enterprise-AI-Gateway.git
cd Enterprise-AI-Gateway
docker build -t enterprise-ai-gateway .
```
See Deployment Guide for complete Docker deployment instructions.
Usage Questions
How do I get started quickly?
See the Quick Start in the documentation index.
What's the format for querying the LLM?
Send a POST request to /query with this JSON format:
```json
{
  "prompt": "Your question here",
  "max_tokens": 256,
  "temperature": 0.7
}
```
Include your API key in the X-API-Key header.
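As a sketch, the request can be assembled with Python's standard library. The base URL is whatever host you deployed the gateway to (an assumption here), and nothing is sent until you pass the request to `urllib.request.urlopen`:

```python
import json
import urllib.request

def build_query_request(base_url: str, api_key: str, prompt: str,
                        max_tokens: int = 256, temperature: float = 0.7):
    """Build (but do not send) a POST /query request with the documented
    JSON body and X-API-Key header."""
    body = json.dumps({
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/query",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

To actually send it: `urllib.request.urlopen(build_query_request("https://your-gateway.example", "your-key", "Hello"))`.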
How do I handle errors from the gateway?
The gateway returns standard HTTP status codes:
- 200: Success
- 401: Invalid or missing API key
- 422: Invalid request parameters or blocked prompt
- 429: Rate limit exceeded
- 500: All LLM providers failed
Check the response body for detailed error messages.
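A sketch of client-side handling for these codes. The response body field names used here (`response`, `detail`) are assumptions for illustration, not a documented schema:

```python
def handle_gateway_response(status: int, body: dict) -> str:
    """Map the documented status codes to client-side actions.

    Illustrative only; adapt the raised exceptions and the body field
    names to your client's conventions.
    """
    if status == 200:
        return body.get("response", "")
    if status == 401:
        raise PermissionError("Invalid or missing API key")
    if status == 422:
        raise ValueError(body.get("detail", "Invalid or blocked prompt"))
    if status == 429:
        raise RuntimeError("Rate limit exceeded; back off and retry")
    raise RuntimeError(body.get("detail", "All LLM providers failed"))
```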
Can I customize the prompt injection detection patterns?
Yes, the injection patterns are defined in src/main.py:
```python
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?(previous|above|prior)\s+instructions?",
    r"disregard\s+(all\s+)?(previous|above|prior)\s+instructions?",
    r"you\s+are\s+now",
    r"system\s*:\s*",
]
```
You can modify these patterns or add new ones as needed.
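For example, the patterns can be compiled and applied to incoming prompts like this. The `is_injection` helper name is illustrative; the real code in src/main.py may structure the check differently:

```python
import re

# Patterns copied from the FAQ above
INJECTION_PATTERNS = [
    r"ignore\s+(all\s+)?(previous|above|prior)\s+instructions?",
    r"disregard\s+(all\s+)?(previous|above|prior)\s+instructions?",
    r"you\s+are\s+now",
    r"system\s*:\s*",
]

# Compile once, case-insensitively, so matching is cheap per request
_COMPILED = [re.compile(p, re.IGNORECASE) for p in INJECTION_PATTERNS]

def is_injection(prompt: str) -> bool:
    """Return True if any pattern matches anywhere in the prompt."""
    return any(p.search(prompt) for p in _COMPILED)
```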
Troubleshooting
For detailed troubleshooting, see Troubleshooting Guide.
Quick fixes:
- 401 errors → Check the `X-API-Key` header and the `SERVICE_API_KEY` env var
- 422 errors → Check prompt length (1-4000 chars) and injection patterns
- 500 errors → Check provider API keys and quotas
- Slow responses → Check provider status and network latency
Getting More Help
- Check the Troubleshooting Guide
- Review GitHub Issues
- Open a new issue with your question