Add SGLang deployment section

#1
by JustinTong - opened
Files changed (1) hide show
  1. README.md +29 -2
README.md CHANGED
@@ -41,7 +41,7 @@ scratch to handle variable image sizes and aspect ratios.
41
  Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
42
 
43
  > [!Note]
44
- > To speed up local inference using vLLM, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
45
 
46
  ## Key Features
47
 
@@ -164,7 +164,7 @@ The model can be deployed with:
164
  - [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See here for [Unsloth's GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF) - text only for now.
165
  - [`LM studio`](https://lmstudio.ai/): WIP stay tuned !
166
  - [`Ollama`](https://ollama.com//): See [here](https://ollama.com/library/mistral-medium-3.5).
167
- - [`SGLang`](https://github.com/sgl-project/sglang): See [here](https://docs.sglang.io/basic_usage/send_request.html).
168
  - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers).
169
 
170
  For optimal performance, we recommend using the Mistral AI API if local serving is subpar.
@@ -486,6 +486,33 @@ print(response.choices[0].message.content)
486
 
487
  </details>
488
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
489
  ## Transformers
490
 
491
  ### Installation
 
41
  Find more information on our [blog](https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5).
42
 
43
  > [!Note]
44
+ > To speed up local inference using vLLM or SGLang, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
45
 
46
  ## Key Features
47
 
 
164
  - [`llama.cpp`](https://github.com/ggml-org/llama.cpp): See here for [Unsloth's GGUF](https://huggingface.co/unsloth/Mistral-Medium-3.5-128B-GGUF) - text only for now.
165
  - [`LM studio`](https://lmstudio.ai/): WIP stay tuned !
166
  - [`Ollama`](https://ollama.com//): See [here](https://ollama.com/library/mistral-medium-3.5).
167
+ - [`SGLang`](https://github.com/sgl-project/sglang): See [here](#sglang).
168
  - [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers).
169
 
170
  For optimal performance, we recommend using the Mistral AI API if local serving is subpar.
 
486
 
487
  </details>
488
 
489
+ ## SGLang
490
+
491
+ Serve Mistral Medium 3.5 with the [SGLang library](https://github.com/sgl-project/sglang) for production-ready inference.
492
+
493
+ > [!Note]
494
+ > To speed up local inference using SGLang, check out our released [EAGLE model](https://huggingface.co/mistralai/Mistral-Medium-3.5-128B-EAGLE).
495
+
496
+ ### Installation
497
+
498
+ Day-zero support ships in dedicated docker tags:
499
+
500
+ ```
501
+ docker pull lmsysorg/sglang:dev-mistral-medium-3.5 # H100 / H200 (Hopper, CUDA 12.9)
502
+ docker pull lmsysorg/sglang:dev-cu13-mistral-medium-3.5 # B200 / B300 (Blackwell, CUDA 13.0)
503
+ ```
504
+
505
+ Or follow the [SGLang installation guide](https://docs.sglang.io/get-started/install). Requires `transformers >= 5.4.0`.
506
+
507
+ ### Serve the Model
508
+
509
+ ```bash
510
+ python -m sglang.launch_server --model-path mistralai/Mistral-Medium-3.5-128B \
511
+ --tp 8 --tool-call-parser mistral --reasoning-parser mistral
512
+ ```
513
+
514
+ For the full deployment guide, benchmarks, and per-request examples (reasoning effort, tool calls, vision, streaming), see the [SGLang cookbook entry for Mistral Medium 3.5](https://docs.sglang.io/cookbook/autoregressive/Mistral/Mistral-Medium-3.5).
515
+
516
  ## Transformers
517
 
518
  ### Installation