Spaces: tritesh/dflash-mlx-universal-demo

tritesh committed 1 day ago

Commit 0b3ae42 (verified) · 1 parent: d728bf2

Upload README.md

Files changed (1)

README.md +64 -10

@@ -1,15 +1,69 @@
 ---
-sdk: gradio
-title: Dflash Mlx Universal Demo
-emoji: 📊
 colorFrom: purple
-colorTo: purple
-sdk_version: 6.14.0
-python_version: '3.13'
 app_file: app.py
-pinned: false
-tags:
-- ml-intern
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: "DFlash-MLX-Universal: Interactive Demo"
+emoji: 🚀
 colorFrom: purple
+colorTo: blue
+sdk: gradio
+sdk_version: "5.0.0"
 app_file: app.py
+pinned: true
 ---
+# 🚀 DFlash-MLX-Universal Demo
+**Block Diffusion Speculative Decoding for Apple Silicon (MLX)**
+This interactive demo showcases [DFlash](https://arxiv.org/abs/2602.06036) — a block diffusion model that accelerates LLM inference by **6×** on Apple Silicon with **lossless output**.
+## What is DFlash?
+- **Traditional speculative decoding**: Drafts 1 token at a time → 2-3× speedup
+- **DFlash**: Drafts 16 tokens in parallel via diffusion → **6× speedup**
+- **Key innovation**: Draft model conditions on target model's hidden states (KV injection)
+- **Result**: Output identical to greedy autoregressive generation
+## Demo Tabs
+| Tab | What it does |
+|-----|-------------|
+| 🏃 **Quick Start** | Select a model, enter a prompt, generate code & see simulated results |
+| 🛠️ **Convert Drafter** | Get the `uv` command to convert official drafters to MLX format |
+| 🎓 **Training** | Code template to train custom drafters for unsupported models |
+| 🖥️ **Server** | Commands to start an OpenAI-compatible local server |
+| 📊 **Benchmarks** | Performance table: 6× speedup across 6 models |
+| 📖 **Architecture** | Deep dive into how block diffusion + KV injection works |
+| 📦 **Installation** | `uv` and `pip` setup instructions |
+## Supported Models
+- **Qwen3** (4B, 8B)
+- **Qwen3.5** (4B, 9B, 27B)
+- **Qwen3.6** (27B, 35B-A3B)
+- **LLaMA-3.1** (8B)
+- **Gemma-4** (31B)
+## Quick Start (on your Mac)
+```bash
+# 1. Install uv
+brew install uv
+# 2. Clone and setup
+git clone https://huggingface.co/tritesh/dflash-mlx-universal.git
+cd dflash-mlx-universal
+./setup_uv.sh
+# 3. Convert a drafter
+uv run python -m dflash_mlx.convert \
+    --model z-lab/Qwen3-4B-DFlash-b16 \
+    --output ./Qwen3-4B-DFlash-mlx
+# 4. Generate
+uv run python examples/qwen3_4b_demo.py
+```
+## Links
+- **Paper**: [arXiv:2602.06036](https://arxiv.org/abs/2602.06036)
+- **Repository**: [tritesh/dflash-mlx-universal](https://huggingface.co/tritesh/dflash-mlx-universal)
+- **Package**: `dflash-mlx-universal` (PyPI compatible)
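
Note on the "What is DFlash?" bullets in the README above: the claim that output is identical to greedy autoregressive generation follows from the verify step of speculative decoding, which can be illustrated independently of the drafter. The sketch below is a toy illustration only, not the `dflash_mlx` implementation. `toy_target`, `toy_drafter`, and `speculative_decode` are hypothetical names introduced here, the drafter is a simple stand-in rather than a block diffusion model, and KV injection is not modeled. It shows only that accepting the longest draft prefix that matches the target's greedy choice, then appending the target's own token, reproduces plain greedy decoding.

```python
from typing import Callable, List, Sequence, Tuple

Token = int
NextToken = Callable[[Sequence[Token]], Token]            # greedy next-token rule
DraftBlock = Callable[[Sequence[Token], int], List[Token]]  # proposes a block of tokens


def toy_target(tokens: Sequence[Token]) -> Token:
    """Hypothetical stand-in for the target model's greedy next token."""
    return (tokens[-1] * 3 + 1) % 11


def toy_drafter(tokens: Sequence[Token], block_size: int) -> List[Token]:
    """Hypothetical drafter: follows the target rule but errs at draft index 5."""
    ctx = list(tokens)
    block: List[Token] = []
    for i in range(block_size):
        tok = toy_target(ctx)
        if i == 5:
            tok = (tok + 1) % 11  # deliberate mistake so that verification has work to do
        block.append(tok)
        ctx.append(tok)
    return block


def greedy_decode(target_next: NextToken, prompt: Sequence[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(target_next(tokens))
    return tokens[len(prompt):]


def speculative_decode(
    target_next: NextToken,
    draft_block: DraftBlock,
    prompt: Sequence[Token],
    n: int,
    block_size: int = 16,
) -> Tuple[List[Token], int]:
    """Draft a block, keep the prefix the target agrees with, then add one target token.

    Returns the generated tokens and the number of verification rounds. In a real
    system each round would be a single batched forward pass of the target model;
    here the target rule is simply called per position.
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < n:
        draft = draft_block(tokens, block_size)
        rounds += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok != expected:
                accepted.append(expected)  # replace the first wrong draft token
                break
            accepted.append(tok)
        else:
            # Every draft token matched, so the target contributes one extra token.
            accepted.append(target_next(tokens + accepted))
        remaining = n - (len(tokens) - len(prompt))
        tokens.extend(accepted[:remaining])
    return tokens[len(prompt):], rounds
```

With the toy drafter above, each round accepts five draft tokens plus one correction, so 24 tokens take 4 verification rounds instead of 24 sequential target calls, and the result equals `greedy_decode` on the same prompt. Real speedups depend on the drafter's acceptance rate and on hardware, which this sketch does not measure.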