Update README.md
README.md CHANGED

```diff
@@ -7,7 +7,7 @@ base_model:
 
 ## Model Description
 
-**MiMo-V2.5-NVFP4** is an NVFP4-quantized version of [
+**MiMo-V2.5-NVFP4** is an NVFP4-quantized version of [XiaomiMiMo/MiMo-V2.5](https://huggingface.co/XiaomiMiMo/MiMo-V2.5).
 
 This is a multi-modal model, supporting text, images, audio and video. This quantization carefully preserves those capabilities.
 
@@ -51,7 +51,7 @@ Note: You will of course want to modify this to bind mount your HF cache, or you
   -e OMP_NUM_THREADS=16 \
   -e SAFETENSORS_FAST_GPU=1 \
   -e CUTE_DSL_ARCH="sm_120a" \
-  docker.io/
+  docker.io/lukealonso/sglang-cuda13-b12x \
   python -m sglang.launch_server \
   --model-path lukealonso/MiMo-V2.5-NVFP4 \
   --served-model-name "MiMo-V2.5" \
```