raazkumar/ml-intern-local-fork


raazkumar committed 2 days ago

Commit 03126cc · verified · 1 Parent(s): a8c86bb

Upload production/README.md

Files changed (1)

production/README.md +91 -0

production/README.md ADDED

	@@ -0,0 +1,91 @@

+# ml-intern Production System
+Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.
+## Architecture
+```
+Clients (CLI / Web / API) -> Nginx (SSL, Rate Limit) -> FastAPI (xN) -> Redis + Postgres
+                                                              |
+                                                              -> Background Workers
+                                                              |
+                                                        Prometheus + Grafana
+```
+## Production Features
+| Feature | Technology | Benefit |
+|---------|-----------|---------|
+| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits |
+| **Circuit Breaker** | Redis-backed | Prevents cascade failures |
+| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
+| **Multi-Tenancy** | PostgreSQL RLS | Isolated sessions |
+| **Cost Tracking** | Per-session budget | Spending limits and alerts |
+| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
+| **Health Checks** | /health endpoint | Self-healing and load balancing |
+| **Graceful Shutdown** | Signal handlers | Drain in-flight requests |
+| **Metrics** | Prometheus + Grafana | Full observability |
+| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |
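The table names a Redis token bucket for per-tenant, per-provider RPM limits but does not show how it works. The block below is a minimal in-memory sketch of the token bucket idea only, not the project's implementation: the names `RateLimiter`, `TokenBucket`, and `allow`, and the choice of a bucket that starts full, are assumptions made for illustration. A distributed version would keep the same state in Redis and update it atomically.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class TokenBucket:
    """State for one (tenant, provider) pair (hypothetical structure)."""
    capacity: float           # maximum tokens the bucket can hold
    refill_per_second: float  # tokens added per second
    tokens: float             # tokens currently available
    last_refill: float        # clock reading at the last refill


class RateLimiter:
    """In-memory token bucket keyed by (tenant, provider).

    Illustration only: the README's limiter is Redis-backed, which this
    sketch does not attempt to reproduce.
    """

    def __init__(self, rpm: int, clock: Callable[[], float] = time.monotonic):
        self.rpm = rpm
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def allow(self, tenant: str, provider: str) -> bool:
        """Return True and consume one token if the request may proceed."""
        now = self.clock()
        key = (tenant, provider)
        bucket = self.buckets.get(key)
        if bucket is None:
            # Assumption: a new bucket starts full, allowing an initial burst.
            bucket = TokenBucket(
                capacity=float(self.rpm),
                refill_per_second=self.rpm / 60.0,
                tokens=float(self.rpm),
                last_refill=now,
            )
            self.buckets[key] = bucket

        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = now - bucket.last_refill
        bucket.tokens = min(
            bucket.capacity, bucket.tokens + elapsed * bucket.refill_per_second
        )
        bucket.last_refill = now

        if bucket.tokens >= 1.0:
            bucket.tokens -= 1.0
            return True
        return False
```

Keying the buckets by `(tenant, provider)` means one tenant exhausting its budget for a provider does not affect other tenants or other providers.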
+## Quick Start
+```bash
+# 1. Configure environment
+cp .env.example .env
+# Edit .env with your API keys
+# 2. Start infrastructure
+docker-compose up -d redis postgres nginx prometheus grafana
+# 3. Start application
+docker-compose up -d api worker
+# 4. Verify
+curl http://localhost/health
+curl http://localhost/v1/models
+```
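Right after `docker-compose up`, the `curl` check in step 4 can fail simply because the containers are still starting. A small polling helper is one way to handle that. The sketch below assumes that an HTTP 200 from `/health` means the service is ready, which the README does not state explicitly. The function names `http_ok` and `wait_for_health` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status raised as HTTPError.
        return False


def wait_for_health(
    url: str,
    attempts: int = 30,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `url` until `probe` succeeds or `attempts` are used up."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to wait after the final attempt
    return False
```

Calling `wait_for_health("http://localhost/health")` before the first API request gives the stack up to about 30 seconds to come up. The `probe` and `sleep` parameters exist only so the loop can be exercised without a running server.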
+## Dashboards
+- Grafana: http://localhost:3000 (admin/admin)
+- Prometheus: http://localhost:9090
+- Jaeger: http://localhost:16686
+- pgAdmin: http://localhost:5050
+## API Usage
+```bash
+curl -X POST http://localhost/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "X-Correlation-ID: $(uuidgen)" \
+  -d '{
+    "model": "groq/llama-3.3-70b-versatile",
+    "messages": [{"role": "user", "content": "Hello"}],
+    "session_id": "my-session-123"
+  }'
+```
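For callers that prefer Python to `curl`, the same request can be built with the standard library. This is a sketch that mirrors the URL, headers, and body fields from the example above. The helper name `build_chat_request` is hypothetical, and nothing here assumes a particular response schema, since the README does not document one.

```python
import json
import urllib.request
import uuid
from typing import Optional


def build_chat_request(
    base_url: str,
    model: str,
    content: str,
    session_id: str,
    correlation_id: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "session_id": session_id,
    }
    headers = {
        "Content-Type": "application/json",
        # Equivalent of $(uuidgen) in the curl example.
        "X-Correlation-ID": correlation_id or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the result to `urllib.request.urlopen(req)` and read the response body. Reusing one correlation ID across related calls should make them easier to find in Jaeger.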
+## Kubernetes
+```bash
+cd k8s && chmod +x deploy.sh && ./deploy.sh
+```
+## Helm
+```bash
+cd helm/ml-intern
+helm dependency update
+helm install ml-intern . --namespace ml-intern --create-namespace
+```
+## Scaling
+```bash
+# Horizontal
+docker-compose up -d --scale api=4
+kubectl -n ml-intern scale deployment ml-intern-api --replicas=10
+# HPA (auto-scale)
+kubectl apply -f k8s/deployment-api.yml  # Includes HPA
+```