Upload folder using huggingface_hub

Browse files

Files changed (3) hide show

NVIDIA_NIM_GUIDE.md +233 -0
reverie/common/llm.py +16 -0
reverie/config/config.py +3 -0

NVIDIA_NIM_GUIDE.md ADDED Viewed

	@@ -0,0 +1,233 @@

+# Using NVIDIA NIM with StoryBox
+NVIDIA NIM provides optimized inference for LLMs via an OpenAI-compatible API. This guide shows how to use NIM with StoryBox.
+## What is NVIDIA NIM?
+NVIDIA NIM (NVIDIA Inference Microservices) is a set of easy-to-use microservices for deploying AI models. It exposes an OpenAI-compatible API, so it works seamlessly with StoryBox's existing `ChatOpenAI` integration.
+## Setup Options
+### Option 1: NVIDIA AI Enterprise (Cloud)
+Use NVIDIA-hosted models via the NIM API.
+#### Step 1: Get API Key
+1. Go to https://build.nvidia.com
+2. Sign in with your NVIDIA account
+3. Generate an API key
+#### Step 2: Set Environment Variables
+```bash
+export NIM_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxxxx"
+# Optional: override the default endpoint
+export NIM_BASE_URL="https://integrate.api.nvidia.com/v1"
+```
+#### Step 3: Configure StoryBox
+Edit `reverie/config/config.py`:
+```python
+# Use NVIDIA NIM model
+# Format: nvidia/<model-name>
+# The "nvidia/" prefix tells StoryBox to route to NIM
+llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
+# llm_model_name = 'nvidia/meta/llama-3.1-70b-instruct'
+# llm_model_name = 'nvidia/mistralai/mistral-7b-instruct-v0.3'
+# llm_model_name = 'nvidia/nvidia/nemotron-4-340b-instruct'
+# llm_model_name = 'nvidia/google/gemma-2-9b-it'
+# llm_model_name = 'nvidia/microsoft/phi-3-mini-128k-instruct'
+# NIM settings (reads from env vars by default)
+nim_base_url = os.getenv('NIM_BASE_URL', 'https://integrate.api.nvidia.com/v1')
+nim_api_key = os.getenv('NIM_API_KEY', '<YOUR_NIM_API_KEY>')
+```
+#### Step 4: Run
+```bash
+cd /app/storybox/reverie
+python run.py
+```
+---
+### Option 2: Self-Hosted NIM (Local/Docker)
+Run NIM on your own GPU infrastructure.
+#### Step 1: Prerequisites
+- NVIDIA GPU with at least 24GB VRAM (for 8B models)
+- Docker with NVIDIA Container Toolkit
+- NVIDIA driver 535+ and CUDA 12.2+
+#### Step 2: Pull and Run NIM Container
+```bash
+# Login to NVIDIA Container Registry
+docker login nvcr.io
+# Username: $oauthtoken
+# Password: <YOUR_NGC_API_KEY>
+# Run Llama 3.1 8B NIM
+docker run --gpus all --rm \
+  -p 8000:8000 \
+  -e NGC_API_KEY=<YOUR_NGC_API_KEY> \
+  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
+# Or run Mistral 7B
+docker run --gpus all --rm \
+  -p 8000:8000 \
+  -e NGC_API_KEY=<YOUR_NGC_API_KEY> \
+  nvcr.io/nim/mistralai/mistral-7b-instruct-v0.3:latest
+```
+#### Step 3: Configure StoryBox for Local NIM
+```python
+# In reverie/config/config.py
+llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
+# Point to your local NIM instance
+nim_base_url = 'http://localhost:8000/v1'
+nim_api_key = 'not-needed-for-local'  # Local NIM doesn't require auth by default
+```
+#### Step 4: Run
+```bash
+cd /app/storybox/reverie
+python run.py
+```
+---
+### Option 3: NIM on Kubernetes / Cloud
+For production deployments, run NIM on Kubernetes or cloud GPU instances.
+#### Example: AWS EC2 g5.xlarge (A10G GPU)
+```bash
+# SSH into your GPU instance
+ssh -i key.pem ubuntu@<instance-ip>
+# Install Docker and NVIDIA Container Toolkit
+# ... (standard setup)
+# Run NIM
+docker run --gpus all --rm \
+  -p 8000:8000 \
+  -e NGC_API_KEY=$NGC_API_KEY \
+  nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
+# From your local machine, configure StoryBox:
+# nim_base_url = 'http://<instance-ip>:8000/v1'
+```
+---
+## Available NIM Models
+| Model | NIM Name | VRAM (self-hosted) | Context |
+|-------|----------|-------------------|---------|
+| Llama 3.1 8B | `meta/llama-3.1-8b-instruct` | ~24 GB | 128K |
+| Llama 3.1 70B | `meta/llama-3.1-70b-instruct` | ~140 GB | 128K |
+| Mistral 7B | `mistralai/mistral-7b-instruct-v0.3` | ~24 GB | 32K |
+| Mixtral 8x7B | `mistralai/mixtral-8x7b-instruct-v0.1` | ~100 GB | 32K |
+| Nemotron-4 340B | `nvidia/nemotron-4-340b-instruct` | ~700 GB | 4K |
+| Gemma 2 9B | `google/gemma-2-9b-it` | ~24 GB | 8K |
+| Gemma 2 27B | `google/gemma-2-27b-it` | ~80 GB | 8K |
+| Phi-3 Mini | `microsoft/phi-3-mini-128k-instruct` | ~16 GB | 128K |
+| Phi-3 Medium | `microsoft/phi-3-medium-128k-instruct` | ~48 GB | 128K |
+| Qwen2.5 7B | `qwen/qwen2.5-7b-instruct` | ~24 GB | 128K |
+**Note:** For cloud NIM, check https://build.nvidia.com for the latest available models.
+---
+## Configuration Summary
+```python
+# reverie/config/config.py
+# NVIDIA NIM (cloud)
+llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
+nim_base_url = 'https://integrate.api.nvidia.com/v1'
+nim_api_key = os.getenv('NIM_API_KEY')
+# NVIDIA NIM (self-hosted local)
+llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
+nim_base_url = 'http://localhost:8000/v1'
+nim_api_key = 'not-needed'
+# NVIDIA NIM (self-hosted remote)
+llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
+nim_base_url = 'http://your-server-ip:8000/v1'
+nim_api_key = 'not-needed'
+```
+---
+## Environment Variables
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `NIM_API_KEY` | Your NVIDIA API key | `<YOUR_NIM_API_KEY>` |
+| `NIM_BASE_URL` | NIM endpoint URL | `https://integrate.api.nvidia.com/v1` |
+---
+## Troubleshooting
+### "Authentication failed"
+- Check your `NIM_API_KEY` is set correctly
+- For cloud NIM, ensure your key is active at https://build.nvidia.com
+### "Model not found"
+- Verify the model name format: `nvidia/<org>/<model-name>`
+- Check available models at https://build.nvidia.com
+### Connection timeout
+- For self-hosted: ensure the container is running and port is exposed
+- Check firewall rules for port 8000
+### Out of memory (self-hosted)
+- Use a smaller model (e.g., Phi-3 Mini instead of Llama 70B)
+- Enable quantization: add `--env QUANTIZATION=int8` to docker run
+- Use tensor parallelism for large models: `--gpus all` with multiple GPUs
+---
+## Performance Comparison
+| Setup | Tokens/sec | Latency | Cost |
+|-------|-----------|---------|------|
+| OpenAI GPT-4o-mini | ~150 | Low | $0.60/M tokens |
+| NVIDIA NIM Cloud (8B) | ~100 | Low | ~$0.10/M tokens |
+| Self-hosted NIM (A100) | ~80 | Very Low | Hardware cost only |
+| Self-hosted NIM (A10G) | ~40 | Low | Hardware cost only |
+| Ollama (local) | ~30 | Very Low | Free |
+---
+## Quick Reference
+```bash
+# 1. Set API key (for cloud NIM)
+export NIM_API_KEY="nvapi-..."
+# 2. Edit config
+# llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
+# 3. Run
+python run.py
+```
+For more details, visit:
+- https://build.nvidia.com (Cloud NIM)
+- https://docs.nvidia.com/nim/ (Self-hosted NIM)

reverie/common/llm.py CHANGED Viewed

@@ -29,6 +29,22 @@ def get_chat_model(
             timeout=Config.timeout
         )
     # Huggingface
     elif model_name in {'mistralai/Mistral-7B-Instruct-v0.3'}:
         llm = HuggingFacePipeline.from_model_id(

             timeout=Config.timeout
         )
+    # NVIDIA NIM (OpenAI-compatible API)
+    elif model_name.startswith('nvidia/'):
+        nim_model = model_name.replace('nvidia/', '')
+        nim_url = getattr(Config, 'nim_base_url', 'https://integrate.api.nvidia.com/v1')
+        nim_key = getattr(Config, 'nim_api_key', os.getenv('NIM_API_KEY'))
+        logger.info(f"Using NVIDIA NIM model: {nim_model} at {nim_url}")
+        chat_model = ChatOpenAI(
+            model=nim_model,
+            temperature=temperature,
+            max_retries=Config.max_retries,
+            timeout=Config.timeout,
+            base_url=nim_url,
+            api_key=nim_key,
+            max_tokens=Config.max_tokens
+        )
     # Huggingface
     elif model_name in {'mistralai/Mistral-7B-Instruct-v0.3'}:
         llm = HuggingFacePipeline.from_model_id(

reverie/config/config.py CHANGED Viewed

@@ -47,6 +47,9 @@ class Config:
     api_key = os.getenv('OPENAI_API_KEY', '<YOUR_API_KEY>')
     ## Ollama base URL (for local models)
     ollama_base_url = 'http://localhost:11434'
     # plan
     ## Random values in choose_reaction

     api_key = os.getenv('OPENAI_API_KEY', '<YOUR_API_KEY>')
     ## Ollama base URL (for local models)
     ollama_base_url = 'http://localhost:11434'
+    ## NVIDIA NIM settings
+    nim_base_url = os.getenv('NIM_BASE_URL', 'https://integrate.api.nvidia.com/v1')
+    nim_api_key = os.getenv('NIM_API_KEY', '<YOUR_NIM_API_KEY>')
     # plan
     ## Random values in choose_reaction