# ml-intern Production System

Production-grade deployment of ml-intern with horizontal scaling, distributed rate limiting, circuit breakers, caching, multi-tenancy, and comprehensive observability.

## Architecture

```
Clients (CLI / Web / API)
           |
           v
Nginx (SSL, rate limit)
           |
           v
      FastAPI (xN) ------> Background Workers
           |
           v
    Redis + Postgres

  Prometheus + Grafana (observability for all services)
```

## Production Features

| Feature | Technology | Benefit |
|---------|-----------|---------|
| **Distributed Rate Limiting** | Redis Token Bucket | Per-tenant, per-provider RPM limits |
| **Circuit Breaker** | Redis-backed | Prevents cascade failures |
| **Request Caching** | Redis TTL | Reduces LLM costs and latency |
| **Multi-Tenancy** | PostgreSQL RLS | Isolated sessions |
| **Cost Tracking** | Per-session budget | Spending limits and alerts |
| **Connection Pooling** | AsyncPG + HTTPX | Efficient DB and API connections |
| **Health Checks** | /health endpoint | Self-healing and load balancing |
| **Graceful Shutdown** | Signal handlers | Drain in-flight requests |
| **Metrics** | Prometheus + Grafana | Full observability |
| **Distributed Tracing** | Jaeger + Correlation IDs | Debug across services |

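The rate limiter in the table above keeps a token bucket per (tenant, provider) key in Redis. Its refill-and-take logic can be sketched in-process like this — a minimal single-process sketch, not the actual implementation (class and parameter names here are illustrative; the deployed version updates the bucket atomically in Redis, typically via a Lua script):

```python
import time

class TokenBucket:
    """In-memory sketch of token-bucket rate limiting.

    Production stores (tokens, last_refill) in Redis per tenant/provider
    so all API replicas share one bucket; this sketch is single-process.
    """

    def __init__(self, rpm: int):
        self.capacity = rpm             # burst size == requests per minute
        self.tokens = float(rpm)
        self.refill_rate = rpm / 60.0   # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rpm=2)
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
```

The third call is denied because both tokens are spent and the bucket refills at only `rpm / 60` tokens per second.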

## Quick Start

```bash
# 1. Configure environment
cp .env.example .env
# Edit .env with your API keys

# 2. Start infrastructure
docker-compose up -d redis postgres nginx prometheus grafana

# 3. Start application
docker-compose up -d api worker

# 4. Verify
curl http://localhost/health
curl http://localhost/v1/models
```

## Dashboards

- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
- Jaeger: http://localhost:16686
- pgAdmin: http://localhost:5050

## API Usage

```bash
curl -X POST http://localhost/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-Correlation-ID: $(uuidgen)" \
  -d '{
    "model": "groq/llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello"}],
    "session_id": "my-session-123"
  }'
```

## Kubernetes

```bash
cd k8s && chmod +x deploy.sh && ./deploy.sh
```

## Helm

```bash
cd helm/ml-intern
helm dependency update
helm install ml-intern . --namespace ml-intern --create-namespace
```

## Scaling

```bash
# Horizontal
docker-compose up -d --scale api=4
kubectl -n ml-intern scale deployment ml-intern-api --replicas=10

# HPA (auto-scale)
kubectl apply -f k8s/deployment-api.yml  # Includes HPA
```