raazkumar commited on
Commit
7d850d4
·
verified ·
1 Parent(s): 88346c6

Upload folder using huggingface_hub

Browse files
NVIDIA_NIM_GUIDE.md ADDED
@@ -0,0 +1,233 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Using NVIDIA NIM with StoryBox
2
+
3
+ NVIDIA NIM provides optimized inference for LLMs via an OpenAI-compatible API. This guide shows how to use NIM with StoryBox.
4
+
5
+ ## What is NVIDIA NIM?
6
+
7
+ NVIDIA NIM (NVIDIA Inference Microservices) is a set of easy-to-use microservices for deploying AI models. It exposes an OpenAI-compatible API, so it works seamlessly with StoryBox's existing `ChatOpenAI` integration.
8
+
9
+ ## Setup Options
10
+
11
+ ### Option 1: NVIDIA AI Enterprise (Cloud)
12
+
13
+ Use NVIDIA-hosted models via the NIM API.
14
+
15
+ #### Step 1: Get API Key
16
+
17
+ 1. Go to https://build.nvidia.com
18
+ 2. Sign in with your NVIDIA account
19
+ 3. Generate an API key
20
+
21
+ #### Step 2: Set Environment Variables
22
+
23
+ ```bash
24
+ export NIM_API_KEY="nvapi-xxxxxxxxxxxxxxxxxxxxxxxx"
25
+ # Optional: override the default endpoint
26
+ export NIM_BASE_URL="https://integrate.api.nvidia.com/v1"
27
+ ```
28
+
29
+ #### Step 3: Configure StoryBox
30
+
31
+ Edit `reverie/config/config.py`:
32
+
33
+ ```python
34
+ # Use NVIDIA NIM model
35
+ # Format: nvidia/<model-name>
36
+ # The "nvidia/" prefix tells StoryBox to route to NIM
37
+ llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
38
+ # llm_model_name = 'nvidia/meta/llama-3.1-70b-instruct'
39
+ # llm_model_name = 'nvidia/mistralai/mistral-7b-instruct-v0.3'
40
+ # llm_model_name = 'nvidia/nvidia/nemotron-4-340b-instruct'
41
+ # llm_model_name = 'nvidia/google/gemma-2-9b-it'
42
+ # llm_model_name = 'nvidia/microsoft/phi-3-mini-128k-instruct'
43
+
44
+ # NIM settings (reads from env vars by default)
45
+ nim_base_url = os.getenv('NIM_BASE_URL', 'https://integrate.api.nvidia.com/v1')
46
+ nim_api_key = os.getenv('NIM_API_KEY', '<YOUR_NIM_API_KEY>')
47
+ ```
48
+
49
+ #### Step 4: Run
50
+
51
+ ```bash
52
+ cd /app/storybox/reverie
53
+ python run.py
54
+ ```
55
+
56
+ ---
57
+
58
+ ### Option 2: Self-Hosted NIM (Local/Docker)
59
+
60
+ Run NIM on your own GPU infrastructure.
61
+
62
+ #### Step 1: Prerequisites
63
+
64
+ - NVIDIA GPU with at least 24GB VRAM (for 8B models)
65
+ - Docker with NVIDIA Container Toolkit
66
+ - NVIDIA driver 535+ and CUDA 12.2+
67
+
68
+ #### Step 2: Pull and Run NIM Container
69
+
70
+ ```bash
71
+ # Login to NVIDIA Container Registry
72
+ docker login nvcr.io
73
+ # Username: $oauthtoken
74
+ # Password: <YOUR_NGC_API_KEY>
75
+
76
+ # Run Llama 3.1 8B NIM
77
+ docker run --gpus all --rm \
78
+ -p 8000:8000 \
79
+ -e NGC_API_KEY=<YOUR_NGC_API_KEY> \
80
+ nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
81
+
82
+ # Or run Mistral 7B
83
+ docker run --gpus all --rm \
84
+ -p 8000:8000 \
85
+ -e NGC_API_KEY=<YOUR_NGC_API_KEY> \
86
+ nvcr.io/nim/mistralai/mistral-7b-instruct-v0.3:latest
87
+ ```
88
+
89
+ #### Step 3: Configure StoryBox for Local NIM
90
+
91
+ ```python
92
+ # In reverie/config/config.py
93
+ llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
94
+
95
+ # Point to your local NIM instance
96
+ nim_base_url = 'http://localhost:8000/v1'
97
+ nim_api_key = 'not-needed-for-local' # Local NIM doesn't require auth by default
98
+ ```
99
+
100
+ #### Step 4: Run
101
+
102
+ ```bash
103
+ cd /app/storybox/reverie
104
+ python run.py
105
+ ```
106
+
107
+ ---
108
+
109
+ ### Option 3: NIM on Kubernetes / Cloud
110
+
111
+ For production deployments, run NIM on Kubernetes or cloud GPU instances.
112
+
113
+ #### Example: AWS EC2 g5.xlarge (A10G GPU)
114
+
115
+ ```bash
116
+ # SSH into your GPU instance
117
+ ssh -i key.pem ubuntu@<instance-ip>
118
+
119
+ # Install Docker and NVIDIA Container Toolkit
120
+ # ... (standard setup)
121
+
122
+ # Run NIM
123
+ docker run --gpus all --rm \
124
+ -p 8000:8000 \
125
+ -e NGC_API_KEY=$NGC_API_KEY \
126
+ nvcr.io/nim/meta/llama-3.1-8b-instruct:latest
127
+
128
+ # From your local machine, configure StoryBox:
129
+ # nim_base_url = 'http://<instance-ip>:8000/v1'
130
+ ```
131
+
132
+ ---
133
+
134
+ ## Available NIM Models
135
+
136
+ | Model | NIM Name | VRAM (self-hosted) | Context |
137
+ |-------|----------|-------------------|---------|
138
+ | Llama 3.1 8B | `meta/llama-3.1-8b-instruct` | ~24 GB | 128K |
139
+ | Llama 3.1 70B | `meta/llama-3.1-70b-instruct` | ~140 GB | 128K |
140
+ | Mistral 7B | `mistralai/mistral-7b-instruct-v0.3` | ~24 GB | 32K |
141
+ | Mixtral 8x7B | `mistralai/mixtral-8x7b-instruct-v0.1` | ~100 GB | 32K |
142
+ | Nemotron-4 340B | `nvidia/nemotron-4-340b-instruct` | ~700 GB | 4K |
143
+ | Gemma 2 9B | `google/gemma-2-9b-it` | ~24 GB | 8K |
144
+ | Gemma 2 27B | `google/gemma-2-27b-it` | ~80 GB | 8K |
145
+ | Phi-3 Mini | `microsoft/phi-3-mini-128k-instruct` | ~16 GB | 128K |
146
+ | Phi-3 Medium | `microsoft/phi-3-medium-128k-instruct` | ~48 GB | 128K |
147
+ | Qwen2.5 7B | `qwen/qwen2.5-7b-instruct` | ~24 GB | 128K |
148
+
149
+ **Note:** For cloud NIM, check https://build.nvidia.com for the latest available models.
150
+
151
+ ---
152
+
153
+ ## Configuration Summary
154
+
155
+ ```python
156
+ # reverie/config/config.py
157
+
158
+ # NVIDIA NIM (cloud)
159
+ llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
160
+ nim_base_url = 'https://integrate.api.nvidia.com/v1'
161
+ nim_api_key = os.getenv('NIM_API_KEY')
162
+
163
+ # NVIDIA NIM (self-hosted local)
164
+ llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
165
+ nim_base_url = 'http://localhost:8000/v1'
166
+ nim_api_key = 'not-needed'
167
+
168
+ # NVIDIA NIM (self-hosted remote)
169
+ llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
170
+ nim_base_url = 'http://your-server-ip:8000/v1'
171
+ nim_api_key = 'not-needed'
172
+ ```
173
+
174
+ ---
175
+
176
+ ## Environment Variables
177
+
178
+ | Variable | Description | Default |
179
+ |----------|-------------|---------|
180
+ | `NIM_API_KEY` | Your NVIDIA API key | `<YOUR_NIM_API_KEY>` |
181
+ | `NIM_BASE_URL` | NIM endpoint URL | `https://integrate.api.nvidia.com/v1` |
182
+
183
+ ---
184
+
185
+ ## Troubleshooting
186
+
187
+ ### "Authentication failed"
188
+ - Check your `NIM_API_KEY` is set correctly
189
+ - For cloud NIM, ensure your key is active at https://build.nvidia.com
190
+
191
+ ### "Model not found"
192
+ - Verify the model name format: `nvidia/<org>/<model-name>`
193
+ - Check available models at https://build.nvidia.com
194
+
195
+ ### Connection timeout
196
+ - For self-hosted: ensure the container is running and port is exposed
197
+ - Check firewall rules for port 8000
198
+
199
+ ### Out of memory (self-hosted)
200
+ - Use a smaller model (e.g., Phi-3 Mini instead of Llama 70B)
201
+ - Enable quantization: add `--env QUANTIZATION=int8` to docker run
202
+ - Use tensor parallelism for large models: `--gpus all` with multiple GPUs
203
+
204
+ ---
205
+
206
+ ## Performance Comparison
207
+
208
+ | Setup | Tokens/sec | Latency | Cost |
209
+ |-------|-----------|---------|------|
210
+ | OpenAI GPT-4o-mini | ~150 | Low | $0.60/M tokens |
211
+ | NVIDIA NIM Cloud (8B) | ~100 | Low | ~$0.10/M tokens |
212
+ | Self-hosted NIM (A100) | ~80 | Very Low | Hardware cost only |
213
+ | Self-hosted NIM (A10G) | ~40 | Low | Hardware cost only |
214
+ | Ollama (local) | ~30 | Very Low | Free |
215
+
216
+ ---
217
+
218
+ ## Quick Reference
219
+
220
+ ```bash
221
+ # 1. Set API key (for cloud NIM)
222
+ export NIM_API_KEY="nvapi-..."
223
+
224
+ # 2. Edit config
225
+ # llm_model_name = 'nvidia/meta/llama-3.1-8b-instruct'
226
+
227
+ # 3. Run
228
+ python run.py
229
+ ```
230
+
231
+ For more details, visit:
232
+ - https://build.nvidia.com (Cloud NIM)
233
+ - https://docs.nvidia.com/nim/ (Self-hosted NIM)
reverie/common/llm.py CHANGED
@@ -29,6 +29,22 @@ def get_chat_model(
29
  timeout=Config.timeout
30
  )
31
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
32
  # Huggingface
33
  elif model_name in {'mistralai/Mistral-7B-Instruct-v0.3'}:
34
  llm = HuggingFacePipeline.from_model_id(
 
29
  timeout=Config.timeout
30
  )
31
 
32
+ # NVIDIA NIM (OpenAI-compatible API)
33
+ elif model_name.startswith('nvidia/'):
34
+ nim_model = model_name.replace('nvidia/', '')
35
+ nim_url = getattr(Config, 'nim_base_url', 'https://integrate.api.nvidia.com/v1')
36
+ nim_key = getattr(Config, 'nim_api_key', os.getenv('NIM_API_KEY'))
37
+ logger.info(f"Using NVIDIA NIM model: {nim_model} at {nim_url}")
38
+ chat_model = ChatOpenAI(
39
+ model=nim_model,
40
+ temperature=temperature,
41
+ max_retries=Config.max_retries,
42
+ timeout=Config.timeout,
43
+ base_url=nim_url,
44
+ api_key=nim_key,
45
+ max_tokens=Config.max_tokens
46
+ )
47
+
48
  # Huggingface
49
  elif model_name in {'mistralai/Mistral-7B-Instruct-v0.3'}:
50
  llm = HuggingFacePipeline.from_model_id(
reverie/config/config.py CHANGED
@@ -47,6 +47,9 @@ class Config:
47
  api_key = os.getenv('OPENAI_API_KEY', '<YOUR_API_KEY>')
48
  ## Ollama base URL (for local models)
49
  ollama_base_url = 'http://localhost:11434'
 
 
 
50
 
51
  # plan
52
  ## Random values in choose_reaction
 
47
  api_key = os.getenv('OPENAI_API_KEY', '<YOUR_API_KEY>')
48
  ## Ollama base URL (for local models)
49
  ollama_base_url = 'http://localhost:11434'
50
+ ## NVIDIA NIM settings
51
+ nim_base_url = os.getenv('NIM_BASE_URL', 'https://integrate.api.nvidia.com/v1')
52
+ nim_api_key = os.getenv('NIM_API_KEY', '<YOUR_NIM_API_KEY>')
53
 
54
  # plan
55
  ## Random values in choose_reaction