# ml-intern Production System

Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.

## Architecture

```
Clients (CLI / Web / API) -> Nginx (SSL, Rate Limit) -> FastAPI (xN) -> Redis + Postgres
                                                            |
                                                            +-> Background Workers
                                                            |
                                                            +-> Prometheus + Grafana
```

## Production Features

| Feature | Technology | Benefit |
|---------|-----------|---------|
| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits |
| **Circuit Breaker** | Redis-backed | Prevents cascading failures |
| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
| **Multi-Tenancy** | PostgreSQL RLS | Row-level isolation between tenants |
| **Cost Tracking** | Per-session budget | Spending limits and alerts |
| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
| **Health Checks** | `/health` endpoint | Self-healing and load balancing |
| **Graceful Shutdown** | Signal handlers | Drains in-flight requests |
| **Metrics** | Prometheus + Grafana | Full observability |
| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |

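The rate limiter above keeps its state in Redis so that every API replica draws from the same per-`(tenant, provider)` bucket. As a sketch of the underlying algorithm only, here is a minimal in-memory token bucket (illustrative — class and method names are not from the codebase, and the production version would update the bucket atomically in Redis, typically via a Lua script):

```python
import time

class TokenBucket:
    """In-memory sketch of a token bucket enforcing an RPM limit.
    Capacity = max burst; tokens refill continuously at rpm/60 per second."""

    def __init__(self, rpm: int):
        self.capacity = float(rpm)
        self.refill_rate = rpm / 60.0       # tokens added per second
        self.tokens = float(rpm)            # start full
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rpm=60)
results = [bucket.allow() for _ in range(61)]
print(results.count(True))  # 60 immediate calls pass; the 61st is throttled
```

The same shape generalizes to the distributed case: the `tokens`/`last_refill` pair becomes a Redis hash keyed per tenant and provider, and the read-modify-write in `allow()` must run atomically.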
## Quick Start

```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 2. Start infrastructure
docker-compose up -d redis postgres nginx prometheus grafana

# 3. Start application
docker-compose up -d api worker

# 4. Verify
curl http://localhost/health
curl http://localhost/v1/models
```

## Dashboards

- Grafana: http://localhost:3000 (default login admin/admin; change it for production)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- pgAdmin: http://localhost:5050

## API Usage

```bash
curl -X POST http://localhost/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Correlation-ID: $(uuidgen)" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "session_id": "my-session-123"
  }'
```

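Identical chat requests can be served from the Redis TTL cache instead of the LLM provider, which is what makes the caching row in the feature table pay off. A minimal in-memory sketch of that idea (the production cache lives in Redis with a TTL; the function and class names below are illustrative, not from the codebase):

```python
import hashlib
import json
import time

def cache_key(model: str, messages: list) -> str:
    """Derive a deterministic key from the request body, so identical
    prompts map to the same cache entry."""
    payload = json.dumps({"model": model, "messages": messages}, sort_keys=True)
    return "chat:" + hashlib.sha256(payload.encode()).hexdigest()

class TTLCache:
    """In-memory stand-in for a Redis cache with per-key expiry."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl: float):
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() > expires_at:
            del self._store[key]            # lazily evict expired entries
            return None
        return value

cache = TTLCache()
key = cache_key("groq/llama-3.3-70b-versatile",
                [{"role": "user", "content": "Hello"}])
cache.set(key, {"content": "Hi there!"}, ttl=300)
print(cache.get(key))
```

Note the `sort_keys=True` in `cache_key`: without it, semantically identical payloads could serialize differently and miss the cache.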
## Kubernetes

```bash
cd k8s && chmod +x deploy.sh && ./deploy.sh
```

## Helm

```bash
cd helm/ml-intern
helm dependency update
helm install ml-intern . --namespace ml-intern --create-namespace
```

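Chart defaults can be overridden with a values file at install time. The keys below are hypothetical, shown only to illustrate the mechanism — consult `helm/ml-intern/values.yaml` for the chart's actual schema:

```yaml
# values-prod.yaml -- example override file.
# All keys here are hypothetical; check helm/ml-intern/values.yaml
# for the names this chart really uses.
api:
  replicas: 4
  resources:
    requests: {cpu: 500m, memory: 512Mi}
    limits:   {cpu: "1",  memory: 1Gi}
redis:
  enabled: true
```

Apply it with `helm upgrade --install ml-intern . -f values-prod.yaml --namespace ml-intern`.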
## Scaling

```bash
# Horizontal
docker-compose up -d --scale api=4
kubectl -n ml-intern scale deployment ml-intern-api --replicas=10

# HPA (auto-scale)
kubectl apply -f k8s/deployment-api.yml  # Includes HPA
```
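For reference, a HorizontalPodAutoscaler like the one bundled in `k8s/deployment-api.yml` generally has this shape (a representative `autoscaling/v2` manifest — the replica bounds and CPU threshold here are illustrative, not copied from the file):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ml-intern-api
  namespace: ml-intern
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ml-intern-api
  minReplicas: 2        # illustrative bounds
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out above 70% average CPU
```

With the HPA in place, manual `kubectl scale` commands are only needed to override autoscaling temporarily; the controller will otherwise adjust replicas between the configured bounds.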