Add SGLang deployment section
#1
by JustinTong - opened
README.md
CHANGED
|
@@ -41,7 +41,7 @@ scratch to handle variable image sizes and aspect ratios.
|
|
| 41 |
Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
|
| 42 |
|
| 43 |
> [!Note]
|
| 44 |
-
> To speed up local inference using vLLM, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
|
| 45 |
|
| 46 |
## Key Features
|
| 47 |
|
|
@@ -164,7 +164,7 @@ The model can be deployed with:
|
|
| 164 |
- [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See here for [Unsloth's GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF) - text only for now.
|
| 165 |
- [`LM studio`](https://lmstudio.ai/): WIP stay tuned !
|
| 166 |
- [`Ollama`](https://ollama.com//): See [here](https://ollama.com/library/mistral-medium-3.5).
|
| 167 |
-
- [`SGLang`](https://github.com/sgl-project/sglang): See [here](
|
| 168 |
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers).
|
| 169 |
|
| 170 |
For optimal performance, we recommend using the Mistral AI API if local serving is subpar.
|
|
@@ -486,6 +486,33 @@ print(response.choices[0].message.content)
|
|
| 486 |
|
| 487 |
</details>
|
| 488 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 489 |
## Transformers
|
| 490 |
|
| 491 |
### Installation
|
|
|
|
| 41 |
Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
|
| 42 |
|
| 43 |
> [!Note]
|
| 44 |
+
> To speed up local inference using vLLM or SGLang, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
|
| 45 |
|
| 46 |
## Key Features
|
| 47 |
|
|
|
|
| 164 |
- [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See here for [Unsloth's GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF) - text only for now.
|
| 165 |
- [`LM studio`](https://lmstudio.ai/): WIP stay tuned !
|
| 166 |
- [`Ollama`](https://ollama.com//): See [here](https://ollama.com/library/mistral-medium-3.5).
|
| 167 |
+
- [`SGLang`](https://github.com/sgl-project/sglang): See [here](#sglang).
|
| 168 |
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers).
|
| 169 |
|
| 170 |
For optimal performance, we recommend using the Mistral AI API if local serving is subpar.
|
|
|
|
| 486 |
|
| 487 |
</details>
|
| 488 |
|
| 489 |
+
## SGLang
|
| 490 |
+
|
| 491 |
+
Serve Mistral Medium 3.5 with the [SGLang library](https://github.com/sgl-project/sglang) for production-ready inference.
|
| 492 |
+
|
| 493 |
+
> [!Note]
|
| 494 |
+
> To speed up local inference using SGLang, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
|
| 495 |
+
|
| 496 |
+
### Installation
|
| 497 |
+
|
| 498 |
+
Day-zero support ships in dedicated docker tags:
|
| 499 |
+
|
| 500 |
+
```
|
| 501 |
+
docker pull lmsysorg/sglang:dev-mistral-medium-3.5 # H100 / H200 (Hopper, CUDA 12.9)
|
| 502 |
+
docker pull lmsysorg/sglang:dev-cu13-mistral-medium-3.5 # B200 / B300 (Blackwell, CUDA 13.0)
|
| 503 |
+
```
|
| 504 |
+
|
| 505 |
+
Or follow the [SGLang installation guide](https://docs.sglang.io/get-started/install). Requires `transformers >= 5.4.0`.
|
| 506 |
+
|
| 507 |
+
### Serve the Model
|
| 508 |
+
|
| 509 |
+
```bash
|
| 510 |
+
python -m sglang.launch_server --model-path mistralai/Mistral-Medium-3.5-128B \
|
| 511 |
+
--tp 8 --tool-call-parser mistral --reasoning-parser mistral
|
| 512 |
+
```
|
| 513 |
+
|
| 514 |
+
For the full deployment guide, benchmarks, and per-request examples (reasoning effort, tool calls, vision, streaming), see the [SGLang cookbook entry for Mistral Medium 3.5](https://docs.sglang.io/cookbook/autoregressive/Mistral/Mistral-Medium-3.5).
|
| 515 |
+
|
| 516 |
## Transformers
|
| 517 |
|
| 518 |
### Installation
|