AudioForge / QUICK_START_AGENTS.md

Quick Start: Agent Architecture

TL;DR

Problem: AudioCraft's dependencies don't ship wheels for Python 3.13
Solution: Run ML services as separate agents with Python 3.11

Architecture

Main API (Python 3.13, Port 8001)
    ↓ HTTP calls
Music Agent (Python 3.11, Port 8002) ← Handles MusicGen
Vocal Agent (Python 3.11, Port 8003) ← Handles Bark
Processing Agent (Python 3.11, Port 8004) ← Handles Demucs
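The fan-out above can be captured as a small routing table on the API side. A minimal sketch (the task names and a helper like `agent_url` are illustrative assumptions; in practice the URLs would come from environment variables such as `MUSIC_AGENT_URL` in the Docker Compose section below):

```python
# Hypothetical routing table mapping task types to agent base URLs,
# mirroring the architecture diagram above.
AGENTS = {
    "music": "http://localhost:8002",       # MusicGen
    "vocals": "http://localhost:8003",      # Bark
    "processing": "http://localhost:8004",  # Demucs
}

def agent_url(task_type: str, endpoint: str = "generate") -> str:
    """Resolve the full endpoint URL for a given task type."""
    try:
        base = AGENTS[task_type]
    except KeyError:
        raise ValueError(f"No agent registered for task type: {task_type!r}")
    return f"{base}/{endpoint}"
```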

Set Up the Music Agent (5 minutes)

Step 1: Create Python 3.11 Environment

cd agents\music
py -3.11 -m venv venv
venv\Scripts\activate

Step 2: Install Dependencies

# Install PyTorch first (CPU version)
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu

# Install other dependencies
pip install -r requirements.txt

Step 3: Run the Agent

python main.py

Agent runs on http://localhost:8002
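If you're writing `agents/music/main.py` from scratch, the request contract the curl test in Step 4 exercises looks roughly like this. A sketch only: the field names match the curl payload, but the duration cap is an assumption, and in the real agent a model like this would back a `POST /generate` route:

```python
# Sketch of the /generate request payload the music agent validates.
# Field names ("prompt", "duration") match the Step 4 curl example;
# MAX_DURATION is a hypothetical cap to bound CPU generation time.
from dataclasses import dataclass

MAX_DURATION = 30  # seconds (assumed limit)

@dataclass
class GenerateRequest:
    prompt: str
    duration: int = 10

    def validate(self) -> None:
        if not self.prompt.strip():
            raise ValueError("prompt must be non-empty")
        if not 1 <= self.duration <= MAX_DURATION:
            raise ValueError(f"duration must be 1-{MAX_DURATION} seconds")
```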

Step 4: Test the Agent

# Health check
curl http://localhost:8002/health

# Generate music
curl -X POST http://localhost:8002/generate `
  -H "Content-Type: application/json" `
  -d '{"prompt": "Epic orchestral soundtrack", "duration": 10}'

Update Main API to Use Agent

Option A: Direct HTTP Calls

# backend/app/services/music_generation.py
import httpx

class MusicGenerationService:
    def __init__(self):
        self.agent_url = "http://localhost:8002"
    
    async def generate(self, prompt: str, duration: int):
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.agent_url}/generate",
                json={"prompt": prompt, "duration": duration},
                timeout=300.0  # generation can take minutes on CPU
            )
            response.raise_for_status()  # surface agent errors instead of parsing an error body
            return response.json()

Option B: Celery Tasks (Recommended for Production)

# backend/app/tasks/music_tasks.py
from celery import Celery
import httpx

celery_app = Celery('audioforge', broker='redis://localhost:6379/0')

@celery_app.task
def generate_music_task(generation_id: str, prompt: str, duration: int):
    # Celery tasks are synchronous (async def tasks are not natively
    # supported), so use httpx's sync client here.
    with httpx.Client(timeout=300.0) as client:
        response = client.post(
            "http://music-agent:8002/generate",
            json={
                "prompt": prompt,
                "duration": duration,
                "callback_url": f"http://api:8001/callbacks/generation/{generation_id}"
            }
        )
    response.raise_for_status()
    return response.json()

# Dispatch from the API with:
# generate_music_task.delay(generation_id, prompt, duration)

Docker Compose (Production)

version: '3.8'

services:
  # Main API - Python 3.13
  api:
    build: ./backend
    ports: ["8001:8001"]
    environment:
      - MUSIC_AGENT_URL=http://music-agent:8002
    depends_on:
      - postgres
      - redis
      - music-agent
  
  # Music Agent - Python 3.11
  music-agent:
    build: ./agents/music
    ports: ["8002:8002"]
    volumes:
      - audio_storage:/app/storage
    environment:
      - MUSICGEN_DEVICE=cpu
  
  postgres:
    image: postgres:16-alpine
    
  redis:
    image: redis:7-alpine

volumes:
  audio_storage:

Start everything:

docker-compose up -d

Benefits

βœ… No Python version conflicts - Each service uses the right Python version
βœ… Independent scaling - Scale music generation separately from API
βœ… Fault isolation - If music agent crashes, API stays up
βœ… Easy updates - Update ML models without touching API
βœ… Resource control - Allocate GPU to specific agents
βœ… Development speed - Teams work on different agents independently

Migration Path

Phase 1: Run Agent Alongside (This Week)

  • Keep existing backend code
  • Start music agent on port 8002
  • Route new requests to agent
  • Old requests still use monolithic service

Phase 2: Switch Traffic (Next Week)

  • Update orchestrator to call agent
  • Monitor performance
  • Rollback if issues

Phase 3: Remove Old Code (Week 3)

  • Delete monolithic ML code
  • Keep only orchestrator
  • Full agent architecture

Performance Comparison

Monolithic (Current)

  • Startup: 30-60 seconds (load all models)
  • Memory: 4-8 GB (all models loaded)
  • Scaling: Vertical only (bigger server)

Agent Architecture

  • Startup: 5 seconds (API), 30 seconds (agents)
  • Memory: 1 GB (API), 2-4 GB per agent
  • Scaling: Horizontal (more agent instances)

Cost Analysis

Development

  • Initial: +2 weeks (build agents)
  • Ongoing: -50% (easier maintenance)

Infrastructure

  • Development: Same (run locally)
  • Production: -30% (scale only what's needed)

Monitoring

Each agent exposes metrics:

# GET /metrics
{
  "requests_total": 1234,
  "requests_failed": 12,
  "avg_generation_time": 45.2,
  "model_loaded": true,
  "memory_usage_mb": 2048
}

Aggregate in Grafana dashboard.
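Before wiring up Grafana, the payload above can be sanity-checked in a few lines of Python. A sketch (the 5% failure-rate threshold is an assumed health rule, not an AudioForge convention):

```python
# Derive basic health signals from the /metrics payload shown above.
def failure_rate(metrics: dict) -> float:
    """Fraction of failed requests; 0.0 when no requests have been made."""
    total = metrics.get("requests_total", 0)
    if total == 0:
        return 0.0
    return metrics.get("requests_failed", 0) / total

def is_healthy(metrics: dict, max_failure_rate: float = 0.05) -> bool:
    """Hypothetical rule: model loaded and failure rate under 5%."""
    return bool(metrics.get("model_loaded")) and failure_rate(metrics) <= max_failure_rate
```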

Troubleshooting

Agent won't start

# Check Python version
python --version  # Should be 3.11.x

# Check dependencies
pip list | findstr torch

Can't connect to agent

# Check if running
curl http://localhost:8002/health

# Check the port is listening (then check firewall rules)
netstat -ano | findstr :8002

Generation fails

  • Check the agent logs for model loading errors
  • Verify the storage directory (e.g. /app/storage) exists and is writable

Next Steps

  1. βœ… Read AGENT_ARCHITECTURE.md for full design
  2. ⏳ Set up Music Agent (follow steps above)
  3. ⏳ Test generation end-to-end
  4. ⏳ Update main API orchestrator
  5. ⏳ Deploy to staging
  6. ⏳ Create Vocal and Processing agents

Questions?

This architecture is industry-standard for ML services:

  • OpenAI uses it (separate models as services)
  • Hugging Face Inference API uses it
  • Stable Diffusion deployments use it

You're in good company! πŸŽ‰