mistralai
/

Mistral-Medium-3.5-128B

Model card Files Files and versions

Add SGLang deployment section

#1

by JustinTong - opened 25 days ago

base: refs/heads/main

←

from: refs/pr/1

Discussion Files changed

Files changed (1) hide show

README.md +29 -2

README.md CHANGED Viewed

@@ -41,7 +41,7 @@ scratch to handle variable image sizes and aspect ratios.
 Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
 > [!Note]
-> To speed up local inference using vLLM, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
 ## Key Features
@@ -164,7 +164,7 @@ The model can be deployed with:
 - [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See here for [Unsloth's GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF) - text only for now.
 - [`LM studio`](https://lmstudio.ai/): WIP stay tuned !
 - [`Ollama`](https://ollama.com//): See [here](https://ollama.com/library/mistral-medium-3.5).
-- [`SGLang`](https://github.com/sgl-project/sglang): See [here](https://docs.sglang.io/basic_usage/send_request.html).
 - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers).
 For optimal performance, we recommend using the Mistral AI API if local serving is subpar.
@@ -486,6 +486,33 @@ print(response.choices[0].message.content)
 </details>
 ## Transformers
 ### Installation

 Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
 > [!Note]
+> To speed up local inference using vLLM or SGLang, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
 ## Key Features
 - [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See here for [Unsloth's GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF) - text only for now.
 - [`LM studio`](https://lmstudio.ai/): WIP stay tuned !
 - [`Ollama`](https://ollama.com//): See [here](https://ollama.com/library/mistral-medium-3.5).
+- [`SGLang`](https://github.com/sgl-project/sglang): See [here](#sglang).
 - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers).
 For optimal performance, we recommend using the Mistral AI API if local serving is subpar.
 </details>
+## SGLang
+Serve Mistral Medium 3.5 with the [SGLang library](https://github.com/sgl-project/sglang) for production-ready inference.
+> [!Note]
+> To speed up local inference using SGLang, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
+### Installation
+Day-zero support ships in dedicated docker tags:
+```
+docker pull lmsysorg/sglang:dev-mistral-medium-3.5         # H100 / H200 (Hopper, CUDA 12.9)
+docker pull lmsysorg/sglang:dev-cu13-mistral-medium-3.5    # B200 / B300 (Blackwell, CUDA 13.0)
+```
+Or follow the [SGLang installation guide](https://docs.sglang.io/get-started/install). Requires `transformers >= 5.4.0`.
+### Serve the Model
+```bash
+python -m sglang.launch_server --model-path mistralai/Mistral-Medium-3.5-128B \
+  --tp 8 --tool-call-parser mistral --reasoning-parser mistral
+```
+For the full deployment guide, benchmarks, and per-request examples (reasoning effort, tool calls, vision, streaming), see the [SGLang cookbook entry for Mistral Medium 3.5](https://docs.sglang.io/cookbook/autoregressive/Mistral/Mistral-Medium-3.5).
 ## Transformers
 ### Installation