# ml-intern Production System

Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.

## Architecture

```
Clients (CLI / Web / API)
  -> Nginx (SSL, Rate Limit)
    -> FastAPI (xN) -> Redis + Postgres
         |
         -> Background Workers
         |
    Prometheus + Grafana
```

## Production Features

| Feature | Technology | Benefit |
|---------|-----------|---------|
| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits |
| **Circuit Breaker** | Redis-backed | Prevents cascade failures |
| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
| **Multi-Tenancy** | PostgreSQL RLS | Isolated sessions |
| **Cost Tracking** | Per-session budget | Spending limits and alerts |
| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
| **Health Checks** | /health endpoint | Self-healing and load balancing |
| **Graceful Shutdown** | Signal handlers | Drain in-flight requests |
| **Metrics** | Prometheus + Grafana | Full observability |
| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |

## Quick Start

```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 2. Start infrastructure
docker-compose up -d redis postgres nginx prometheus grafana

# 3. Start application
docker-compose up -d api worker

# 4. Verify
curl http://localhost/health
curl http://localhost/v1/models
```

## Dashboards

- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- pgAdmin: http://localhost:5050

## API Usage

```bash
curl -X POST http://localhost/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Correlation-ID: $(uuidgen)" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "session_id": "my-session-123"
  }'
```

## Kubernetes

```bash
cd k8s && chmod +x deploy.sh && ./deploy.sh
```

## Helm

```bash
cd helm/ml-intern
helm dependency update
helm install ml-intern . --namespace ml-intern --create-namespace
```

## Scaling

```bash
# Horizontal
docker-compose up -d --scale api=4
kubectl -n ml-intern scale deployment ml-intern-api --replicas=10

# HPA (auto-scale)
kubectl apply -f k8s/deployment-api.yml  # Includes HPA
```
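The distributed rate limiter listed under Production Features is a Redis token bucket. The core refill arithmetic can be sketched in plain Python; this is a minimal in-memory, single-process illustration (class and parameter names are ours, not the project's), whereas the deployed limiter would keep the `tokens`/`last` state in Redis so every API replica draws from the same per-tenant budget:

```python
import time


class TokenBucket:
    """In-memory sketch of the token-bucket algorithm.

    The production limiter is Redis-backed so all replicas share state;
    this local version only illustrates the refill/consume math.
    """

    def __init__(self, rpm: int):
        self.capacity = float(rpm)       # burst size: one minute of requests
        self.tokens = float(rpm)         # start full
        self.refill_rate = rpm / 60.0    # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A 60-RPM bucket admits a burst of 60 requests, then admits roughly one more per second as tokens refill.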
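The circuit breaker that guards provider calls follows the usual closed / open / half-open state machine. A hedged local sketch of that machine is below (thresholds, method names, and state strings are illustrative assumptions; the deployed breaker stores its counters in Redis so all replicas open and close together):

```python
import time


class CircuitBreaker:
    """Sketch of the circuit-breaker state machine.

    closed    -> requests flow; failures are counted.
    open      -> requests are rejected until reset_timeout elapses.
    half-open -> one probe request is allowed; success closes the
                 breaker, failure re-opens it.
    """

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # let one probe through
                return True
            return False
        return True

    def record_success(self) -> None:
        self.failures = 0
        self.state = "closed"

    def record_failure(self) -> None:
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = time.monotonic()
```

Rejecting requests while open is what prevents the cascade failures the features table mentions: a struggling provider stops receiving traffic instead of accumulating timed-out calls.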
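Request caching works by keying completed LLM responses on a hash of the request payload and expiring them after a TTL. The sketch below is an in-memory stand-in (names and the 300 s default are our assumptions); the production cache uses Redis with per-key TTLs so cached completions are shared across API replicas:

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory sketch of TTL-based request caching.

    Production uses Redis with expiring keys; this dict-backed version
    only illustrates the keying and expiry logic.
    """

    def __init__(self, ttl: float = 300.0):
        self.ttl = ttl
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(model: str, messages: list) -> str:
        # Deterministic key over the request payload: identical
        # (model, messages) pairs hit the same cache entry.
        raw = json.dumps({"model": model, "messages": messages}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get(self, k: str):
        hit = self._store.get(k)
        if hit is not None and time.monotonic() - hit[0] < self.ttl:
            return hit[1]
        return None  # miss or expired

    def set(self, k: str, value: str) -> None:
        self._store[k] = (time.monotonic(), value)
```

Serving a repeated prompt from cache skips the provider call entirely, which is where the cost and latency savings in the features table come from.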