Upload production/README.md
production/README.md
ADDED
@@ -0,0 +1,91 @@
# ml-intern Production System

Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.

## Architecture

```
Clients (CLI / Web / API) -> Nginx (SSL, Rate Limit) -> FastAPI (xN) -> Redis + Postgres
                                                            |
                                                            -> Background Workers
                                                            |
                                                   Prometheus + Grafana
```

## Production Features

| Feature | Technology | Benefit |
|---------|-----------|---------|
| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits |
| **Circuit Breaker** | Redis-backed | Prevents cascade failures |
| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
| **Multi-Tenancy** | PostgreSQL RLS | Isolated sessions |
| **Cost Tracking** | Per-session budget | Spending limits and alerts |
| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
| **Health Checks** | /health endpoint | Self-healing and load balancing |
| **Graceful Shutdown** | Signal handlers | Drain in-flight requests |
| **Metrics** | Prometheus + Grafana | Full observability |
| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |
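
The rate-limiting row names a Redis token bucket with per-tenant, per-provider RPM limits. As a rough illustration of the token-bucket idea only, here is an in-memory sketch. The project's actual Redis implementation, key layout, and limits are not shown in this README, so `TokenBucket`, `check_rate_limit`, and the default of 60 RPM are hypothetical names and values.

```python
class TokenBucket:
    """In-memory token bucket (hypothetical stand-in for a Redis-backed one)."""

    def __init__(self, rpm):
        self.rpm = rpm                # capacity and refill rate: requests per minute
        self.tokens = float(rpm)      # start with a full bucket
        self.updated_at = None        # time of the last call, in seconds

    def allow(self, now):
        """Refill for the time elapsed since the last call, then spend one token."""
        if self.updated_at is not None:
            elapsed = max(0.0, now - self.updated_at)
            refill = elapsed * self.rpm / 60.0
            self.tokens = min(float(self.rpm), self.tokens + refill)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per (tenant, provider) pair, so limits do not leak across tenants.
buckets = {}


def check_rate_limit(tenant, provider, now, rpm=60):
    """Return True if this tenant may call this provider at time `now`."""
    bucket = buckets.setdefault((tenant, provider), TokenBucket(rpm))
    return bucket.allow(now)
```

In a multi-replica deployment the bucket state would have to live in shared storage (Redis, per the table) and be updated atomically; this sketch keeps it in process memory purely to show the refill arithmetic.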

## Quick Start

```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 2. Start infrastructure
docker-compose up -d redis postgres nginx prometheus grafana

# 3. Start application
docker-compose up -d api worker

# 4. Verify
curl http://localhost/health
curl http://localhost/v1/models
```
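
Right after `docker-compose up`, the API may need a few seconds before `/health` responds, so the verify step can fail on the first try. A small polling helper is one way to handle that. This is a sketch: the URL comes from the Quick Start above, while the helper names, retry count, and delay are assumptions, and an HTTP 200 is assumed to mean "healthy".

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=2.0):
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_until_healthy(check, attempts=30, delay=1.0, sleep=time.sleep):
    """Call `check()` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Example (needs the stack from the Quick Start to be running):
# wait_until_healthy(lambda: http_ok("http://localhost/health"))
```

Passing the check as a callable keeps the retry loop independent of HTTP, so the same helper could poll `/v1/models` or any other readiness signal.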

## Dashboards

- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- pgAdmin: http://localhost:5050

## API Usage

```bash
curl -X POST http://localhost/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Correlation-ID: $(uuidgen)" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "session_id": "my-session-123"
  }'
```
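
The same request can be built from Python with the standard library. This sketch mirrors the curl call above: the URL, headers, and body fields are taken from it. The function name `build_chat_request` is hypothetical, and the response format is not documented in this README, so the example only prints the raw body.

```python
import json
import urllib.request
import uuid


def build_chat_request(content, session_id,
                       model="groq/llama-3.3-70b-versatile",
                       base_url="http://localhost"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "session_id": session_id,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # A fresh ID per request, like $(uuidgen) in the curl example.
            "X-Correlation-ID": str(uuid.uuid4()),
        },
        method="POST",
    )


# To send it (needs the stack from the Quick Start to be running):
# with urllib.request.urlopen(build_chat_request("Hello", "my-session-123")) as resp:
#     print(resp.read().decode("utf-8"))
```

Logging the generated correlation ID on the client side makes it easier to find the matching trace in Jaeger later.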

## Kubernetes

```bash
cd k8s && chmod +x deploy.sh && ./deploy.sh
```

## Helm

```bash
cd helm/ml-intern
helm dependency update
helm install ml-intern . --namespace ml-intern --create-namespace
```

## Scaling

```bash
# Horizontal
docker-compose up -d --scale api=4
kubectl -n ml-intern scale deployment ml-intern-api --replicas=10

# HPA (auto-scale)
kubectl apply -f k8s/deployment-api.yml  # Includes HPA
```
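
When sizing the HPA, it can help to know how the Horizontal Pod Autoscaler picks a replica count. Kubernetes documents the core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The sketch below applies that formula. The min/max bounds and the sample numbers are illustrative assumptions, not values from `k8s/deployment-api.yml`, and the real controller also applies a tolerance band and stabilization windows that are omitted here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Apply the documented HPA scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Example: 4 pods at 90% average CPU with a 60% target scale out to 6 pods.
print(desired_replicas(4, 90, 60))
```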