Spaces:
Runtime error
Runtime error
| title: 'DFlash-MLX-Universal: Interactive Demo' | |
| emoji: π | |
| colorFrom: purple | |
| colorTo: blue | |
| sdk: gradio | |
| sdk_version: 5.0.0 | |
| app_file: app.py | |
| pinned: true | |
| tags: | |
| - ml-intern | |
| # π DFlash-MLX-Universal Demo | |
| **Block Diffusion Speculative Decoding for Apple Silicon (MLX)** | |
| This interactive demo showcases [DFlash](https://arxiv.org/abs/2602.06036) β a block diffusion model that accelerates LLM inference by **6Γ** on Apple Silicon with **lossless output**. | |
| ## What is DFlash? | |
| - **Traditional speculative decoding**: Drafts 1 token at a time β 2-3Γ speedup | |
| - **DFlash**: Drafts 16 tokens in parallel via diffusion β **6Γ speedup** | |
| - **Key innovation**: Draft model conditions on target model's hidden states (KV injection) | |
| - **Result**: Output identical to greedy autoregressive generation | |
| ## Demo Tabs | |
| | Tab | What it does | | |
| |-----|-------------| | |
| | π **Quick Start** | Select a model, enter a prompt, generate code & see simulated results | | |
| | π οΈ **Convert Drafter** | Get the `uv` command to convert official drafters to MLX format | | |
| | π **Training** | Code template to train custom drafters for unsupported models | | |
| | π₯οΈ **Server** | Commands to start an OpenAI-compatible local server | | |
| | π **Benchmarks** | Performance table: 6Γ speedup across 6 models | | |
| | π **Architecture** | Deep dive into how block diffusion + KV injection works | | |
| | π¦ **Installation** | `uv` and `pip` setup instructions | | |
| ## Supported Models | |
| - **Qwen3** (4B, 8B) | |
| - **Qwen3.5** (4B, 9B, 27B) | |
| - **Qwen3.6** (27B, 35B-A3B) | |
| - **LLaMA-3.1** (8B) | |
| - **Gemma-4** (31B) | |
| ## Quick Start (on your Mac) | |
| ```bash | |
| # 1. Install uv | |
| brew install uv | |
| # 2. Clone and setup | |
| git clone https://huggingface.co/tritesh/dflash-mlx-universal.git | |
| cd dflash-mlx-universal | |
| ./setup_uv.sh | |
| # 3. Convert a drafter | |
| uv run python -m dflash_mlx.convert \ | |
| --model z-lab/Qwen3-4B-DFlash-b16 \ | |
| --output ./Qwen3-4B-DFlash-mlx | |
| # 4. Generate | |
| uv run python examples/qwen3_4b_demo.py | |
| ``` | |
| ## Links | |
| - **Paper**: [arXiv:2602.06036](https://arxiv.org/abs/2602.06036) | |
| - **Repository**: [tritesh/dflash-mlx-universal](https://huggingface.co/tritesh/dflash-mlx-universal) | |
| - **Package**: `dflash-mlx-universal` (PyPI compatible) | |