
Next Steps: Get Music Generation Working

TL;DR

Run these commands in PowerShell to get the music agent running in about 30 minutes (the first generation adds roughly 10 minutes for the one-time model download):

cd agents\music
py -3.11 -m venv venv
.\venv\Scripts\activate
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
pip install fastapi uvicorn[standard] pydantic httpx python-dotenv
pip install transformers librosa soundfile "numpy<2.0.0"
pip install git+https://github.com/facebookresearch/audiocraft.git
python main.py

Then test:

curl http://localhost:8002/health

Detailed Steps

Step 1: Navigate to Music Agent (1 minute)

# Adjust the path to wherever you cloned AudioForge
cd C:\Users\Keith\AudioForge\agents\music

Step 2: Create Python 3.11 Environment (2 minutes)

# Create virtual environment with Python 3.11
py -3.11 -m venv venv

# Activate it
.\venv\Scripts\activate

# Verify Python version
python --version
# Should show: Python 3.11.x (for example, Python 3.11.9)
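
The same version check can be done from inside Python, which is handy if the agent should refuse to start on the wrong interpreter. This is a minimal sketch; the function name `is_supported_python` is hypothetical and not part of AudioForge.

```python
import sys

# Major.minor version this guide targets.
REQUIRED = (3, 11)


def is_supported_python(version_info=None):
    """Return True when the interpreter is a Python 3.11.x release.

    `version_info` defaults to the running interpreter; passing a tuple
    such as (3, 11, 9) makes the check easy to test.
    """
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED
```

Calling `is_supported_python()` with no arguments inside the activated venv should return `True` if Step 2 worked.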

Step 3: Install PyTorch (5-10 minutes)

# Install PyTorch 2.1.0 CPU version
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu

This downloads ~200MB. Wait for completion.

Step 4: Install Web Framework (1 minute)

pip install fastapi uvicorn[standard] pydantic httpx python-dotenv

Step 5: Install Audio Libraries (2 minutes)

pip install transformers librosa soundfile "numpy<2.0.0"

Step 6: Install AudioCraft (5-10 minutes)

# This clones and installs from GitHub
pip install git+https://github.com/facebookresearch/audiocraft.git

Note: pip may print dependency-conflict warnings here. They are usually harmless, but run pip show torch afterwards to confirm the installed version is still 2.1.0.
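
To confirm the installs actually landed in the active environment, a quick check like the one below can help. It is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and it only confirms that a package can be located, not that its native libraries load correctly.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be located.

    find_spec() looks a module up without importing it, so this stays fast
    even for heavy packages such as torch.
    """
    missing = []
    for name in names:
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing
```

For this guide, `missing_modules(["torch", "torchaudio", "transformers", "audiocraft"])` would be expected to return an empty list once Steps 3 to 6 have completed.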

Step 7: Create Storage Directory (10 seconds)

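# -Force creates missing parent folders and does not fail if the folder already exists
New-Item -ItemType Directory -Force -Path storage\audio\music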

Step 8: Start the Agent (5 seconds)

python main.py

You should see:

INFO:     Started server process [12345]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8002

Step 9: Test the Agent (1 minute)

Open a NEW PowerShell window (keep the agent running):

# Health check
curl http://localhost:8002/health

# Should return:
# {
#   "status": "healthy",
#   "python_version": "3.11.9",
#   "torch_available": true,
#   "audiocraft_available": true,
#   "device": "cpu"
# }
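
The health response above can also be checked from a script, for example before sending a generation request. The sketch below assumes the field names shown in that sample response; `health_problems` and `fetch_health` are hypothetical helpers, not part of the agent.

```python
import json
import urllib.request

# Fields that must be literally true for the agent to be usable,
# taken from the sample /health response in this guide.
EXPECTED_TRUE = ("torch_available", "audiocraft_available")


def health_problems(payload):
    """Return a list of human-readable problems found in a /health payload."""
    problems = []
    if payload.get("status") != "healthy":
        problems.append(f"status is {payload.get('status')!r}, expected 'healthy'")
    for key in EXPECTED_TRUE:
        if payload.get(key) is not True:
            problems.append(f"{key} is not true")
    return problems


def fetch_health(base_url="http://localhost:8002", timeout=5.0):
    """GET <base_url>/health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the agent running, `health_problems(fetch_health())` should return an empty list; any entries point at what to fix first.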

Step 10: Generate Music! (1-2 minutes)

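# Generate 10 seconds of music
# (Invoke-RestMethod is used because in Windows PowerShell "curl" is an alias
# for Invoke-WebRequest and does not accept -X, -H, or -d)
Invoke-RestMethod -Method Post -Uri http://localhost:8002/generate `
  -ContentType "application/json" `
  -Body '{"prompt": "Epic orchestral soundtrack", "duration": 10}' | ConvertTo-Json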

First time: Downloads model (~1.5GB) - takes 5-10 minutes
After that: Generates in 30-60 seconds

Response:

{
  "task_id": "music_abc123",
  "status": "completed",
  "audio_path": "./storage/audio/music/music_abc123.wav",
  "metadata": {
    "duration": 10,
    "sample_rate": 32000,
    "model": "facebook/musicgen-small"
  }
}
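
The same request can be sent from Python without extra dependencies. This is a sketch based on the request and response shapes shown above; the helper names and the 900-second timeout are assumptions, chosen so the first call has room for the model download.

```python
import json
import urllib.request


def build_generate_request(prompt, duration, base_url="http://localhost:8002"):
    """Build (but do not send) the POST /generate request."""
    body = json.dumps({"prompt": prompt, "duration": duration}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=900.0):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A typical call would be `send(build_generate_request("Epic orchestral soundtrack", 10))`, after which the `audio_path` key of the result points at the generated file.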

Step 11: Listen to Your Music! 🎵

# Open the generated file (replace music_abc123 with the task_id from your response)
start .\storage\audio\music\music_abc123.wav
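
Before listening, it can be useful to confirm that the file matches the metadata in the response (32000 Hz, about 10 seconds). The sketch below uses the standard-library `wave` module and assumes the agent writes ordinary PCM WAV files; `wav_summary` is a hypothetical helper.

```python
import wave


def wav_summary(path):
    """Return (sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
    # Duration is the number of sample frames divided by frames per second.
    return rate, frames / rate
```

For the example response above, `wav_summary("storage/audio/music/music_abc123.wav")` would be expected to report a 32000 Hz sample rate and a duration close to 10 seconds.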

Troubleshooting

Error: "py -3.11 not found"

Python 3.11 not installed. Install from: https://www.python.org/downloads/release/python-3119/

Error: "torch not found" when running

The virtual environment is probably not active in this window. Activate it:

.\venv\Scripts\activate

Error: "audiocraft not found"

Installation might have failed. Try:

pip install --no-cache-dir git+https://github.com/facebookresearch/audiocraft.git

Error: "CUDA out of memory"

With the CPU-only PyTorch build this should not happen. If it does, force CPU mode explicitly:

# Set environment variable
$env:MUSICGEN_DEVICE="cpu"
python main.py
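
For context, an agent could resolve its device along the following lines. This is only a sketch of one plausible approach, not the actual logic in main.py: it assumes MUSICGEN_DEVICE takes priority and that CUDA is auto-detected otherwise, and `resolve_device` is a hypothetical name.

```python
import os


def resolve_device(env=None):
    """Pick a device string: MUSICGEN_DEVICE wins, otherwise auto-detect.

    `env` defaults to os.environ; passing a plain dict makes this testable.
    """
    env = os.environ if env is None else env
    override = env.get("MUSICGEN_DEVICE", "").strip().lower()
    if override:
        return override
    try:
        import torch  # imported lazily; a missing torch falls back to CPU
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

Under this assumption, setting the environment variable as shown above makes the result `"cpu"` regardless of what hardware is present.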

Agent starts but health check fails

Check if port 8002 is already in use:

netstat -ano | findstr :8002

If the port is taken, stop the process that owns it (the last column of the netstat output is its PID) or change the port in main.py.
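
The port check can also be scripted, for instance as a preflight step before starting the agent. This is a minimal standard-library sketch; `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0
```

`port_in_use(8002)` returning `True` before the agent has been started suggests another process already holds the port.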

What's Next?

Option A: Integrate with Main API

Update backend/app/services/orchestrator.py:

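import httpx

class Orchestrator:
    def __init__(self, music_agent_url: str = "http://localhost:8002"):
        # Base URL of the music agent started in Step 8
        self.music_agent_url = music_agent_url

    async def generate_music(self, prompt: str, duration: int) -> dict:
        """Ask the music agent to generate audio and return its JSON response."""
        async with httpx.AsyncClient() as client:
            response = await client.post(
                f"{self.music_agent_url}/generate",
                json={"prompt": prompt, "duration": duration},
                # Generation is slow on CPU; the first call also downloads the model
                timeout=300.0,
            )
            # Raise on 4xx/5xx instead of returning an error body as if it succeeded
            response.raise_for_status()
            return response.json()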

Option B: Test from Frontend

The frontend already has the generation form. Just make sure:

  1. Backend is running (port 8001)
  2. Music Agent is running (port 8002)
  3. Backend is configured to call the agent (see Option A)

Option C: Build More Agents

Repeat this process for:

  • Vocal Agent (port 8003) - Bark for vocals
  • Processing Agent (port 8004) - Demucs for stems

Performance Tips

Speed Up Generation

  1. Use smaller model:

    {"model": "facebook/musicgen-small"}  // Faster
    {"model": "facebook/musicgen-medium"} // Better quality
    {"model": "facebook/musicgen-large"}  // Best quality, slowest
    
  2. Shorter duration:

    {"duration": 10}  // 30 seconds generation
    {"duration": 30}  // 90 seconds generation
    
  3. Use GPU (if available):

    # Install CUDA version of PyTorch
    pip install torch==2.1.0+cu118 torchaudio==2.1.0+cu118 --index-url https://download.pytorch.org/whl/cu118
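
The duration figures above (10 seconds of audio in about 30 seconds, 30 seconds in about 90 seconds) imply roughly 3x real time on CPU. The sketch below turns that into a rough estimate, for example to pick a client timeout. The factor of 3.0 is read off those two data points and will differ by machine and model size; `estimate_generation_seconds` is a hypothetical helper.

```python
def estimate_generation_seconds(audio_seconds, cpu_factor=3.0):
    """Roughly estimate wall-clock generation time for a clip on CPU.

    cpu_factor is seconds of compute per second of audio; 3.0 matches the
    10 s -> 30 s and 30 s -> 90 s figures quoted in this guide.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds * cpu_factor
```

The estimate excludes the one-time model download on the first request, so a timeout for that call needs extra headroom.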
    

Reduce Memory Usage

  1. Use smaller model (see above)
  2. Generate shorter clips
  3. Close other applications

Production Deployment

Docker (Recommended)

# Build image
docker build -t audioforge-music-agent ./agents/music

# Run container
docker run -p 8002:8002 -v ${PWD}/storage:/app/storage audioforge-music-agent

Docker Compose (Best)

# Start all services
docker-compose up -d

# View logs
docker-compose logs -f music-agent

# Stop services
docker-compose down

Success Criteria

You'll know it's working when:

  1. ✅ Health check returns "status": "healthy"
  2. ✅ Generate request returns "status": "completed"
  3. ✅ Audio file exists in storage/audio/music/
  4. ✅ Audio file plays and sounds like music
  5. ✅ Subsequent generations are faster (model cached)

Timeline

| Task | Time | Cumulative |
|---|---|---|
| Setup environment | 2 min | 2 min |
| Install PyTorch | 10 min | 12 min |
| Install dependencies | 5 min | 17 min |
| Install AudioCraft | 10 min | 27 min |
| Start agent | 1 min | 28 min |
| Test & verify | 2 min | 30 min |
| First generation | 10 min | 40 min |
| Subsequent generations | 1 min | - |

Total to first music: ~40 minutes (including model download)
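
The cumulative column in the timeline above is a running sum of the per-task times, which can be reproduced as follows. The task list is copied from the timeline; `cumulative_minutes` is a hypothetical helper.

```python
from itertools import accumulate

# (task, minutes) pairs from the timeline, up to the first generation.
TIMELINE = [
    ("Setup environment", 2),
    ("Install PyTorch", 10),
    ("Install dependencies", 5),
    ("Install AudioCraft", 10),
    ("Start agent", 1),
    ("Test & verify", 2),
    ("First generation", 10),
]


def cumulative_minutes(tasks):
    """Return the running total of minutes after each task."""
    return list(accumulate(minutes for _, minutes in tasks))
```

The last value of `cumulative_minutes(TIMELINE)` is the 40-minute total quoted above.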

Resources

  • Architecture: AGENT_ARCHITECTURE.md
  • Quick Start: QUICK_START_AGENTS.md
  • Solution Overview: SOLUTION_SUMMARY.md
  • Test Results: TEST_RESULTS.md

Questions?

The agent architecture solves:

  • ✅ Python version conflicts
  • ✅ Dependency hell
  • ✅ Scalability issues
  • ✅ Deployment complexity

Running each model behind its own small HTTP service is a common pattern in production ML systems.


Ready? Let's forge some audio! 🎵

cd agents\music
py -3.11 -m venv venv
.\venv\Scripts\activate
pip install torch==2.1.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
python main.py