# ml-intern Production System
Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.
## Architecture
```
Clients (CLI / Web / API) -> Nginx (SSL, Rate Limit) -> FastAPI (xN) -> Redis + Postgres
                                                            |
                                                            +-> Background Workers
                                                            |
                                                            +-> Prometheus + Grafana
```
## Production Features
| Feature | Technology | Benefit |
|---------|-----------|---------|
| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits (see sketch below) |
| **Circuit Breaker** | Redis-backed | Prevents cascade failures |
| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
| **Multi-Tenancy** | PostgreSQL RLS (Row-Level Security) | Isolated per-tenant sessions |
| **Cost Tracking** | Per-session budget | Spending limits and alerts |
| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
| **Health Checks** | /health endpoint | Self-healing and load balancing |
| **Graceful Shutdown** | Signal handlers | Drain in-flight requests |
| **Metrics** | Prometheus + Grafana | Full observability |
| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |
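Of the features above, the Redis token bucket is worth a closer look, since it is what enforces the per-tenant, per-provider RPM limits across all API replicas. The sketch below (redis-py plus an atomic Lua script) is illustrative only: the `ratelimit:{tenant}:{provider}` key scheme and the `RateLimiter` class are assumptions, not the project's actual code.

```python
# Illustrative Redis token bucket; key names and parameters are assumptions.
import time

import redis

# Refill-and-consume in one server-side script: recompute tokens from the
# elapsed time, then take one if available. Because Redis runs the script
# atomically, concurrent API replicas cannot race each other.
TOKEN_BUCKET = """
local key      = KEYS[1]
local capacity = tonumber(ARGV[1])  -- max burst size (tokens)
local rate     = tonumber(ARGV[2])  -- tokens added per second
local now      = tonumber(ARGV[3])

local bucket = redis.call('HMGET', key, 'tokens', 'ts')
local tokens = tonumber(bucket[1]) or capacity
local ts     = tonumber(bucket[2]) or now

tokens = math.min(capacity, tokens + (now - ts) * rate)
local allowed = tokens >= 1
if allowed then tokens = tokens - 1 end

redis.call('HSET', key, 'tokens', tokens, 'ts', now)
redis.call('EXPIRE', key, math.ceil(capacity / rate) * 2)
return allowed and 1 or 0
"""

class RateLimiter:
    def __init__(self, client: redis.Redis, rpm: int = 60):
        self.script = client.register_script(TOKEN_BUCKET)
        self.capacity = rpm      # allow bursts up to one minute's quota
        self.rate = rpm / 60.0   # steady-state refill, tokens per second

    def allow(self, tenant: str, provider: str) -> bool:
        key = f"ratelimit:{tenant}:{provider}"
        return bool(self.script(keys=[key], args=[self.capacity, self.rate, time.time()]))

limiter = RateLimiter(redis.Redis(host="localhost", port=6379), rpm=60)
if not limiter.allow("tenant-a", "groq"):
    raise RuntimeError("rate limit exceeded")  # map to HTTP 429 in the API layer
```

Keeping the refill-and-consume step in a single Lua script is what makes the limiter safe under horizontal scaling; the same Redis-side atomicity is also what makes Redis a natural backing store for the circuit-breaker state.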
## Quick Start
```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 2. Start infrastructure
docker-compose up -d redis postgres nginx prometheus grafana

# 3. Start application
docker-compose up -d api worker

# 4. Verify
curl http://localhost/health
curl http://localhost/v1/models
```
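If the bring-up is scripted (CI, smoke tests), it helps to block until the stack actually reports healthy before firing traffic. A minimal sketch using httpx, assuming `/health` returns HTTP 200 once dependencies are up:

```python
# Poll /health until the API is ready; assumes 200 means healthy.
import time

import httpx

def wait_for_healthy(url: str = "http://localhost/health",
                     timeout: float = 60.0, interval: float = 2.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if httpx.get(url, timeout=5.0).status_code == 200:
                return
        except httpx.TransportError:
            pass  # containers still starting; not accepting connections yet
        time.sleep(interval)
    raise TimeoutError(f"{url} not healthy after {timeout}s")

wait_for_healthy()
```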
## Dashboards
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- pgAdmin: http://localhost:5050
## API Usage
```bash
curl -X POST http://localhost/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Correlation-ID: $(uuidgen)" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "session_id": "my-session-123"
  }'
```
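The same call from Python, using httpx (already part of the stack for connection pooling). The response shape is an assumption here, inferred from the OpenAI-compatible `/v1/chat/completions` path:

```python
# Same request as the curl example above, from Python.
import uuid

import httpx

resp = httpx.post(
    "http://localhost/v1/chat/completions",
    headers={
        "Content-Type": "application/json",
        "X-Correlation-ID": str(uuid.uuid4()),  # ties logs and Jaeger traces together
    },
    json={
        "model": "groq/llama-3.3-70b-versatile",
        "messages": [{"role": "user", "content": "Hello"}],
        "session_id": "my-session-123",
    },
    timeout=30.0,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])  # assumes OpenAI-style response
```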
## Kubernetes
```bash
cd k8s && chmod +x deploy.sh && ./deploy.sh
```
## Helm
```bash
cd helm/ml-intern
helm dependency update
helm install ml-intern . --namespace ml-intern --create-namespace
```
## Scaling
```bash
# Horizontal
docker-compose up -d --scale api=4
kubectl -n ml-intern scale deployment ml-intern-api --replicas=10

# HPA (auto-scale)
kubectl apply -f k8s/deployment-api.yml   # includes HPA
```