# ml-intern Production System

Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.

## Architecture

```
Clients (CLI / Web / API)
  -> Nginx (SSL, Rate Limit)
    -> FastAPI (xN) -> Redis + Postgres
         |
         -> Background Workers
         |
    Prometheus + Grafana
```

## Production Features

| Feature | Technology | Benefit |
|---------|-----------|---------|
| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits |
| **Circuit Breaker** | Redis-backed | Prevents cascade failures |
| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
| **Multi-Tenancy** | PostgreSQL RLS | Isolated sessions |
| **Cost Tracking** | Per-session budget | Spending limits and alerts |
| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
| **Health Checks** | /health endpoint | Self-healing and load balancing |
| **Graceful Shutdown** | Signal handlers | Drain in-flight requests |
| **Metrics** | Prometheus + Grafana | Full observability |
| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |

## Quick Start

```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 2. Start infrastructure
docker-compose up -d redis postgres nginx prometheus grafana

# 3. Start application
docker-compose up -d api worker

# 4. Verify
curl http://localhost/health
curl http://localhost/v1/models
```

## Dashboards

- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- pgAdmin: http://localhost:5050

## API Usage

```bash
curl -X POST http://localhost/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Correlation-ID: $(uuidgen)" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "session_id": "my-session-123"
  }'
```

## Kubernetes

```bash
cd k8s && chmod +x deploy.sh && ./deploy.sh
```

## Helm

```bash
cd helm/ml-intern
helm dependency update
helm install ml-intern . --namespace ml-intern --create-namespace
```

## Scaling

```bash
# Horizontal
docker-compose up -d --scale api=4
kubectl -n ml-intern scale deployment ml-intern-api --replicas=10

# HPA (auto-scale)
kubectl apply -f k8s/deployment-api.yml  # Includes HPA
```
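The distributed rate limiter listed under Production Features is a Redis token bucket. The core refill arithmetic can be sketched in plain Python; this is a minimal in-memory, single-process illustration (class and parameter names are ours, not the project's), whereas the deployed limiter would keep the `tokens`/`last` state in Redis so every API replica draws from the same per-tenant budget:

```python
import time


class TokenBucket:
    """In-memory sketch of the token-bucket algorithm.

    The production limiter is Redis-backed so all replicas share state;
    this local version only illustrates the refill/consume math.
    """

    def __init__(self, rpm: int):
        self.capacity = float(rpm)       # burst size: one minute of requests
        self.tokens = float(rpm)         # start full
        self.refill_rate = rpm / 60.0    # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A 60-RPM bucket admits a burst of 60 requests, then admits roughly one more per second as tokens refill.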
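The circuit breaker that guards provider calls follows the usual closed / open / half-open state machine. A hedged local sketch of that machine is below (thresholds, method names, and state strings are illustrative assumptions; the deployed breaker stores its counters in Redis so all replicas open and close together):

```python
import time


class CircuitBreaker:
    """Sketch of the circuit-breaker state machine.

    closed    -> requests flow; failures are counted.
    open      -> requests are rejected until reset_timeout elapses.
    half-open -> one probe request is allowed; success closes the
                 breaker, failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let one probe through
                return True
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```

Rejecting requests while open is what prevents the cascade failures the features table mentions: a struggling provider stops receiving traffic instead of accumulating timed-out calls.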
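Request caching works by keying completed LLM responses on a hash of the request payload and expiring them after a TTL. The sketch below is an in-memory stand-in (names and the 300 s default are our assumptions); the production cache uses Redis with per-key TTLs so cached completions are shared across API replicas:

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory sketch of TTL-based request caching.

    Production uses Redis with expiring keys; this dict-backed version
    only illustrates the keying and expiry logic.
    """

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Deterministic key over the request payload: identical
        # (model, messages) pairs hit the same cache entry.
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, k: str):
        hit = self._store.get(k)
        if hit is not None and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        return None  # miss or expired

    def set(self, k: str, value: str) -> None:
        self._store[k] = (time.monotonic(), value)
```

Serving a repeated prompt from cache skips the provider call entirely, which is where the cost and latency savings in the features table come from.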