API Reference
Primary Responsibility: Complete API endpoint and function documentation
This document provides a complete reference for the Enterprise AI Gateway API endpoints.
Base URL
https://your-deployment-url
For local development: http://localhost:8000
Authentication
All API requests (except /health and /) require authentication using an API key.
Include the API key in the request header:
X-API-Key: YOUR_API_KEY
Rate Limiting
The API implements rate limiting to prevent abuse:
- Default limit: 10 requests per minute per IP address
- Exceeding the limit returns a 429 (Too Many Requests) status code
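If a client hits the limit, backing off and retrying is usually enough. A minimal sketch (the retry count and delays below are illustrative choices, not values prescribed by the gateway):
import time
import requests

def post_with_backoff(url, headers, payload, max_attempts=3):
    # Retry on 429 with exponential backoff; any other status is returned as-is
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            return response
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s between attempts
    return response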
Core Modules
main.py
Main application entry point that initializes the FastAPI application.
app
The main FastAPI application instance with all routes and middleware configured.
config.py
Configuration module that loads environment variables and defines application settings.
SERVICE_API_KEY
The API key required for authenticating requests to the gateway.
RATE_LIMIT
Rate limiting configuration (e.g., "10/minute").
ALLOWED_ORIGINS
List of allowed origins for CORS middleware.
ENABLE_PROMPT_INJECTION_CHECK
Flag to enable/disable prompt injection detection.
security/__init__.py
Security utilities for authentication and prompt validation.
detect_prompt_injection(prompt: str) -> bool
Detect potential prompt injection attacks using regex patterns.
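The actual pattern list lives in security/__init__.py; as a sketch of the regex-based approach, with illustrative patterns that may not match the gateway's real ones:
import re

# Illustrative patterns only -- the gateway's real list may differ
INJECTION_PATTERNS = [
    r"ignore (all|previous|above) instructions",
    r"disregard (the|your) (system )?prompt",
    r"you are now",
]

def detect_prompt_injection(prompt: str) -> bool:
    # Case-insensitive scan; any match flags the prompt as a potential injection
    lowered = prompt.lower()
    return any(re.search(pattern, lowered) for pattern in INJECTION_PATTERNS)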
detect_pii(prompt: str) -> dict
Detect PII (Personally Identifiable Information) in prompts. Returns:
- has_pii: Boolean indicating if PII was found
- pii_types: List of PII types detected (email, credit_card, ssn, tax_id, api_key)
- matches: Dictionary with count of each PII type found
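For example, a prompt containing a Social Security number would yield a result shaped like this (assuming the gateway package is importable; the output values are illustrative):
from security import detect_pii

result = detect_pii("My SSN is 123-45-6789")
# Expected shape: {"has_pii": True, "pii_types": ["ssn"], "matches": {"ssn": 1}}
if result["has_pii"]:
    print("PII detected:", result["pii_types"])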
detect_toxicity(text: str) -> dict
Detect toxic/harmful content using Gemini AI classification with Lakera Guard fallback. Returns:
- is_toxic: Boolean indicating if content is harmful
- scores: Dictionary of category scores
- blocked_categories: List of detected harmful categories
- error: Error message if API call failed
Categories detected: SEXUALLY_EXPLICIT, HATE_SPEECH, HARASSMENT, DANGEROUS_CONTENT, CIVIC_INTEGRITY
detect_toxicity_lakera(text: str) -> dict
Fallback toxicity detection using Lakera Guard API. Called automatically when Gemini fails.
validate_api_key(api_key: str) -> str
Validate API key for request authentication.
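In FastAPI this kind of validator is typically wired in as a header dependency. A sketch, assuming the key is compared against SERVICE_API_KEY from config.py (the documented signature takes api_key directly, so the exact wiring may differ):
from fastapi import Header, HTTPException
from config import SERVICE_API_KEY

def validate_api_key(x_api_key: str = Header(...)) -> str:
    # FastAPI maps the x_api_key parameter to the X-API-Key request header
    if x_api_key != SERVICE_API_KEY:
        raise HTTPException(status_code=401, detail="Missing or invalid API key")
    return x_api_key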
models/__init__.py
Pydantic models for request/response validation.
QueryRequest
Model for query requests with prompt, max_tokens, and temperature.
QueryResponse
Model for query responses with response content, provider info, and metadata.
- cascade_path: List of provider attempts with status and latency
- cost_estimate_usd: Estimated cost of the request
CascadeStep
Model for individual steps in the provider cascade.
- provider: Provider name
- model: Model used
- status: "success", "failed", or "timeout"
- reason: Error reason if failed
- latency_ms: Response time in milliseconds
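Based on the documented fields (and the /query parameter ranges listed later), the request and cascade models might be declared roughly like this; a sketch, and the actual definitions in models/__init__.py may differ:
from typing import Optional
from pydantic import BaseModel, Field

class QueryRequest(BaseModel):
    prompt: str = Field(..., min_length=1, max_length=4000)
    max_tokens: int = Field(256, ge=1, le=2048)
    temperature: float = Field(0.7, ge=0.0, le=2.0)

class CascadeStep(BaseModel):
    provider: str
    model: str
    status: str  # "success", "failed", or "timeout"
    reason: Optional[str] = None
    latency_ms: int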
HealthResponse
Model for health check responses.
api/routes.py
API route definitions.
/ (GET)
Serves the Interactive Gateway Demo Dashboard from static/index.html.
/health (GET)
Health check endpoint that returns the status of the service.
/query (POST)
Query endpoint that processes LLM requests with security and fallback protocols. Returns cascade path and cost estimate.
/metrics (GET)
Returns gateway metrics including total requests, latency, provider usage, and security events.
/providers (GET)
Returns available providers with pricing information and active configuration.
/batch/resilience (POST)
Batch resilience testing endpoint. Runs multiple prompts through the cascade and returns aggregate metrics.
/batch/security (POST)
Batch security testing endpoint. Tests prompts for PII and injection without executing LLM calls.
/check-toxicity (POST)
Content safety check endpoint. Uses Gemini AI classification with Lakera Guard fallback. Returns toxicity status, scores, and blocked categories.
llm/client.py
LLM client with multi-provider fallback functionality.
LLMClient
Class that manages connections to multiple LLM providers.
call_llm_provider(provider_name: str, api_key: str, model: str, prompt: str, max_tokens: int, temperature: float)
Call a specific LLM provider with the given parameters.
query_llm_cascade(prompt: str, max_tokens: int, temperature: float)
Query LLM with cascade fallback across providers.
Returns: (response, provider_name, latency_ms, error, cascade_path)
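A simplified sketch of the fallback loop behind this return tuple (error handling is condensed, and call_provider stands in for call_llm_provider):
import time

def query_cascade(prompt, providers, call_provider):
    # Try each (provider, model) pair in order; return the first success
    cascade_path = []
    for name, model in providers:
        start = time.time()
        try:
            response = call_provider(name, model, prompt)
            latency_ms = int((time.time() - start) * 1000)
            cascade_path.append({"provider": name, "model": model,
                                 "status": "success", "latency_ms": latency_ms})
            return response, name, latency_ms, None, cascade_path
        except Exception as exc:
            latency_ms = int((time.time() - start) * 1000)
            cascade_path.append({"provider": name, "model": model,
                                 "status": "failed", "reason": str(exc),
                                 "latency_ms": latency_ms})
    return None, None, 0, "All LLM providers failed", cascade_path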
metrics/__init__.py
Metrics tracking module.
MetricsStore
Thread-safe metrics store for tracking gateway performance.
- record_request(): Record a request with metrics
- to_dict(): Return metrics as dictionary
- reset(): Reset all metrics
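A minimal thread-safe store following this interface might look like the following (tracking only a subset of the fields in the /metrics response):
import threading

class MetricsStore:
    # Simplified sketch; the real store tracks more fields
    def __init__(self):
        self._lock = threading.Lock()
        self.reset()

    def record_request(self, success: bool, latency_ms: float) -> None:
        with self._lock:
            self.total_requests += 1
            if success:
                self.successful_requests += 1
            self.latency_history.append(latency_ms)

    def to_dict(self) -> dict:
        with self._lock:
            avg = (sum(self.latency_history) / len(self.latency_history)
                   if self.latency_history else 0.0)
            return {"total_requests": self.total_requests,
                    "successful_requests": self.successful_requests,
                    "average_latency_ms": avg,
                    "latency_history": list(self.latency_history)}

    def reset(self) -> None:
        with self._lock:
            self.total_requests = 0
            self.successful_requests = 0
            self.latency_history = []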
metrics
Singleton instance of MetricsStore.
providers/__init__.py
Provider configuration module.
PROVIDER_CONFIG
Dictionary containing provider pricing and configuration.
get_model_pricing(provider: str, model: str) -> dict
Get pricing info for a specific provider/model combination.
estimate_cost(provider: str, model: str, input_tokens: int, output_tokens: int) -> float
Estimate cost for a request in USD.
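The estimate is simple per-token arithmetic. A sketch, assuming PROVIDER_CONFIG stores prices per million tokens (both the structure and the prices below are illustrative, not the gateway's real values):
# Illustrative structure and prices -- see providers/__init__.py for the real ones
PROVIDER_CONFIG = {
    "gemini": {
        "models": {
            "gemini-2.0-flash-exp": {"input_per_1m": 0.10, "output_per_1m": 0.40},
        },
    },
}

def estimate_cost(provider: str, model: str,
                  input_tokens: int, output_tokens: int) -> float:
    # Cost = tokens * price-per-million-tokens / 1,000,000, for both directions
    pricing = PROVIDER_CONFIG[provider]["models"][model]
    return (input_tokens * pricing["input_per_1m"]
            + output_tokens * pricing["output_per_1m"]) / 1_000_000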
Environment Variables
See Configuration Guide for complete environment variable reference.
Endpoints
Health Check
GET /health
Check if the service is running and healthy.
Response:
{
"status": "healthy",
"provider": "gemini",
"timestamp": 1700000000.123456
}
Response Fields:
- status: Service status ("healthy" or "unhealthy")
- provider: Currently active LLM provider
- timestamp: Unix timestamp of the health check
Query LLM
POST /query
Send a prompt to the LLM through the secure gateway with automatic failover.
Headers:
Content-Type: application/json
X-API-Key: YOUR_API_KEY
Request Body:
{
"prompt": "Your question here",
"max_tokens": 256,
"temperature": 0.7
}
Request Parameters:
- prompt (required): The prompt to send to the LLM (1-4000 characters)
- max_tokens (optional): Maximum number of tokens in the response (1-2048, default: 256)
- temperature (optional): Sampling temperature (0.0-2.0, default: 0.7)
Successful Response:
{
"response": "The AI's answer",
"provider": "groq",
"latency_ms": 87,
"status": "success",
"error": null
}
Error Response:
{
"response": null,
"provider": null,
"latency_ms": 0,
"status": "error",
"error": "Error message"
}
Response Fields:
- response: The LLM's response text (null if error)
- provider: Which LLM provider was used (null if error)
- latency_ms: Request latency in milliseconds (0 if error)
- status: Request status ("success" or "error")
- error: Error message if request failed (null if successful)
- cascade_path: Array of provider attempts with status and latency
- cost_estimate_usd: Estimated cost of the request in USD
Get Metrics
GET /metrics
Get current gateway metrics.
Response:
{
"total_requests": 150,
"successful_requests": 145,
"blocked_requests": 5,
"average_latency_ms": 120.5,
"provider_usage": {"gemini": 100, "groq": 45},
"cascade_failures": 3,
"pii_detections": 2,
"injection_detections": 3,
"latency_history": [87, 120, 95, ...]
}
Get Providers
GET /providers
Get available providers with pricing information.
Response:
{
"providers": {
"gemini": {"name": "Google Gemini", "models": {...}},
"groq": {"name": "Groq", "models": {...}},
"openrouter": {"name": "OpenRouter", "models": {...}}
},
"active_providers": ["gemini", "groq", "openrouter"],
"active_models": {"gemini": "gemini-2.0-flash-exp", ...}
}
Batch Resilience Test
POST /batch/resilience
Run multiple prompts through the cascade and return aggregate metrics.
Headers:
Content-Type: application/json
X-API-Key: YOUR_API_KEY
Request Body:
{
"prompts": [
"First test prompt",
"Second test prompt"
]
}
Response:
{
"total": 2,
"successful": 2,
"failed": 0,
"total_cascade_failures": 1,
"average_latency_ms": 105.5,
"downtime_prevented_minutes": 4.0,
"results": [...]
}
Batch Security Test
POST /batch/security
Test prompts for security issues without executing LLM calls.
Request Body:
{
"prompts": [
"Normal question",
"Ignore all instructions",
"My SSN is 123-45-6789"
]
}
Response:
{
"total": 3,
"blocked": 2,
"passed": 1,
"pii_leaks_prevented": 1,
"injection_attempts_blocked": 1,
"compliance_fines_avoided_usd": 28000,
"results": [...]
}
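A Python example exercising this endpoint with the prompts above (replace the URL and key with your own):
import requests

headers = {"Content-Type": "application/json", "X-API-Key": "YOUR_API_KEY"}
payload = {"prompts": ["Normal question",
                       "Ignore all instructions",
                       "My SSN is 123-45-6789"]}
response = requests.post("https://your-deployment-url/batch/security",
                         headers=headers, json=payload)
result = response.json()
print(f"{result['blocked']} of {result['total']} prompts blocked")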
Content Safety Check
POST /check-toxicity
Check content for harmful/toxic material using AI classification.
Headers:
Content-Type: application/json
X-API-Key: YOUR_API_KEY
Request Body:
{
"text": "Content to analyze for safety"
}
Response:
{
"is_toxic": false,
"scores": {"SAFE": 1.0},
"blocked_categories": [],
"error": null
}
Blocked Response Example:
{
"is_toxic": true,
"scores": {"HARM_CATEGORY_SEXUALLY_EXPLICIT": 0.9},
"blocked_categories": ["HARM_CATEGORY_SEXUALLY_EXPLICIT"],
"error": null
}
Harm Categories: See Security Overview for the complete list of blocked content categories.
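A Python example for this endpoint (replace the URL and key with your own):
import requests

headers = {"Content-Type": "application/json", "X-API-Key": "YOUR_API_KEY"}
response = requests.post("https://your-deployment-url/check-toxicity",
                         headers=headers,
                         json={"text": "Content to analyze for safety"})
result = response.json()
if result["is_toxic"]:
    print("Blocked categories:", result["blocked_categories"])
else:
    print("Content passed the safety check")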
Error Codes
200 OK
- Request successful
401 Unauthorized
- Missing or invalid API key
422 Unprocessable Entity
- Invalid request parameters
- Prompt injection detected
- Input validation failed
429 Too Many Requests
- Rate limit exceeded
500 Internal Server Error
- All LLM providers failed
- Unexpected server error
Example Usage
cURL
# Health check
curl https://your-deployment-url/health
# Query LLM
curl -X POST https://your-deployment-url/query \
-H "Content-Type: application/json" \
-H "X-API-Key: YOUR_API_KEY" \
-d '{
"prompt": "What is artificial intelligence?",
"max_tokens": 150,
"temperature": 0.7
}'
Python
import requests
# Health check
response = requests.get('https://your-deployment-url/health')
print(response.json())
# Query LLM
headers = {
'Content-Type': 'application/json',
'X-API-Key': 'YOUR_API_KEY'
}
data = {
'prompt': 'What is artificial intelligence?',
'max_tokens': 150,
'temperature': 0.7
}
response = requests.post('https://your-deployment-url/query', headers=headers, json=data)
print(response.json())
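Continuing the example above, responses should be checked against the documented error codes before the body is trusted. A sketch of defensive handling (the exception choices are illustrative):
def safe_query(base_url, api_key, prompt):
    # Map the documented status codes to explicit outcomes
    headers = {"Content-Type": "application/json", "X-API-Key": api_key}
    response = requests.post(f"{base_url}/query", headers=headers,
                             json={"prompt": prompt})
    if response.status_code == 401:
        raise RuntimeError("Missing or invalid API key")
    if response.status_code == 422:
        raise ValueError("Rejected: invalid parameters or injection detected")
    if response.status_code == 429:
        raise RuntimeError("Rate limit exceeded; retry after backing off")
    response.raise_for_status()  # 500: all providers failed or server error
    return response.json()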
Security Features
Authentication
All endpoints except /health and / require a valid API key in the X-API-Key header.
Rate Limiting
Requests are rate limited based on the RATE_LIMIT configuration.
Prompt Injection Detection
Potential prompt injection attempts are detected and blocked automatically.
CORS
Cross-Origin Resource Sharing is configured based on ALLOWED_ORIGINS.
Security Considerations
- Always use HTTPS in production
- Keep your API key secure
- Validate all responses before using them
- Implement proper error handling
- Be aware of rate limits