Update README.md
README.md CHANGED

```diff
@@ -7,7 +7,7 @@ base_model:
 
 ## Model Description
 
-**MiMo-V2.5-NVFP4** is an NVFP4-quantized version of [
+**MiMo-V2.5-NVFP4** is an NVFP4-quantized version of [XiaomiMiMo/MiMo-V2.5](https://huggingface.co/XiaomiMiMo/MiMo-V2.5).
 
 This is a multi-modal model, supporting text, images, audio and video. This quantization carefully preserves those capabilities.
 
@@ -51,7 +51,7 @@ Note: You will of course want to modify this to bind mount your HF cache, or you
   -e OMP_NUM_THREADS=16 \
   -e SAFETENSORS_FAST_GPU=1 \
   -e CUTE_DSL_ARCH="sm_120a" \
-  docker.io/
+  docker.io/lukealonso/sglang-cuda13-b12x \
   python -m sglang.launch_server \
   --model-path lukealonso/MiMo-V2.5-NVFP4 \
   --served-model-name "MiMo-V2.5" \
```