docs: add KTransformers CPU offloading inference guide (#34)
Commit: 69da6c01b1df1d19073003a959a6f3be742a598a
Co-authored-by: Weiyu Xie <ErvinX@users.noreply.huggingface.co>
README.md CHANGED

@@ -250,6 +250,12 @@ curl -i http://localhost:9001/v1/chat/completions \
   }'
 ```
 
+### Inference with KTransformers (CPU Offloading)
+
+[KTransformers](https://github.com/kvcache-ai/ktransformers), built on top of SGLang, enables efficient MiMo-V2-Flash deployment on consumer-grade hardware by offloading MoE expert computation to the CPU. With **4× RTX 5090 + 2× AMD EPYC 9355**, it achieves up to **35.7 tokens/s** decode speed.
+
+For a quick start and benchmarks, see [KTransformers](https://ktransformers.net/zh/benchmarks#MiMo-V2-Flash-FP8-TP4).
+
 ### Notifications
 
 #### 1. System prompt
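The added section defers to an external quick-start, so as a hedged illustration only: a KTransformers/SGLang deployment exposes an OpenAI-compatible `/v1/chat/completions` endpoint, and a request to it might be built as sketched below. The port (9001) mirrors the curl example already in this README's hunk header; the model name is an assumption, not confirmed by this commit.

```python
import json

# Sketch of calling the OpenAI-compatible endpoint that a KTransformers/SGLang
# deployment serves. The port (9001) mirrors the curl example in the diff's
# hunk header; the model name is an assumption, not confirmed by this README.
BASE_URL = "http://localhost:9001/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "MiMo-V2-Flash") -> dict:
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_chat_request("Hello")
print(json.dumps(payload, indent=2))

# To actually send it (requires a running server):
#   import urllib.request
#   req = urllib.request.Request(
#       BASE_URL,
#       data=json.dumps(payload).encode(),
#       headers={"Content-Type": "application/json"},
#   )
#   print(urllib.request.urlopen(req).read().decode())
```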