---
title: 'DFlash-MLX-Universal: Interactive Demo'
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: true
tags:
- ml-intern
---

# 🚀 DFlash-MLX-Universal Demo

**Block Diffusion Speculative Decoding for Apple Silicon (MLX)**

This interactive demo showcases [DFlash](https://arxiv.org/abs/2602.06036), a block diffusion model that accelerates LLM inference by **6×** on Apple Silicon with **lossless output**.

## What is DFlash?

- **Traditional speculative decoding**: Drafts 1 token at a time → 2-3× speedup
- **DFlash**: Drafts 16 tokens in parallel via diffusion → **6× speedup**
- **Key innovation**: Draft model conditions on target model's hidden states (KV injection)
- **Result**: Output identical to greedy autoregressive generation

## Demo Tabs

| Tab | What it does |
|-----|-------------|
| 🏃 **Quick Start** | Select a model, enter a prompt, generate code & see simulated results |
| 🛠️ **Convert Drafter** | Get the `uv` command to convert official drafters to MLX format |
| 🎓 **Training** | Code template to train custom drafters for unsupported models |
| 🖥️ **Server** | Commands to start an OpenAI-compatible local server |
| 📊 **Benchmarks** | Performance table: 6× speedup across 6 models |
| 📖 **Architecture** | Deep dive into how block diffusion + KV injection works |
| 📦 **Installation** | `uv` and `pip` setup instructions |

## Supported Models

- **Qwen3** (4B, 8B)
- **Qwen3.5** (4B, 9B, 27B)
- **Qwen3.6** (27B, 35B-A3B)
- **LLaMA-3.1** (8B)
- **Gemma-4** (31B)

## Quick Start (on your Mac)

```bash
# 1. Install uv
brew install uv

# 2. Clone and set up
git clone https://huggingface.co/tritesh/dflash-mlx-universal.git
cd dflash-mlx-universal
./setup_uv.sh

# 3. Convert a drafter
uv run python -m dflash_mlx.convert \
  --model z-lab/Qwen3-4B-DFlash-b16 \
  --output ./Qwen3-4B-DFlash-mlx

# 4. Generate
uv run python examples/qwen3_4b_demo.py
```

## Links

- **Paper**: [arXiv:2602.06036](https://arxiv.org/abs/2602.06036)
- **Repository**: [tritesh/dflash-mlx-universal](https://huggingface.co/tritesh/dflash-mlx-universal)
- **Package**: `dflash-mlx-universal` (PyPI compatible)
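
## Illustrative Sketch: Draft and Verify

The "What is DFlash?" section says a block of tokens is drafted at once and that the output is identical to greedy autoregressive generation. The toy Python sketch below illustrates why block verification can be lossless. It is not the `dflash_mlx` implementation: `target_next`, `draft_block`, and `speculative_generate` are hypothetical names, the "models" are simple arithmetic functions, and the drafter here runs sequentially with injected mistakes, whereas the real drafter is described as proposing its block in parallel via diffusion.

```python
from typing import List, Tuple

VOCAB_SIZE = 50  # toy vocabulary size (assumption for this sketch)


def target_next(context: List[int]) -> int:
    """Toy stand-in for the target model's greedy next-token choice."""
    return (sum(context) * 31 + len(context)) % VOCAB_SIZE


def draft_block(context: List[int], block_size: int) -> List[int]:
    """Toy stand-in for the drafter: proposes `block_size` tokens.

    It mimics the target but deliberately errs at some positions so that
    the rejection path is exercised.
    """
    ctx = list(context)
    block = []
    for _ in range(block_size):
        tok = target_next(ctx)
        if len(ctx) % 5 == 4:
            tok = (tok + 1) % VOCAB_SIZE  # injected mistake
        block.append(tok)
        ctx.append(tok)
    return block


def greedy_generate(prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(target_next(tokens))
    return tokens


def speculative_generate(
    prompt: List[int], max_new_tokens: int, block_size: int = 16
) -> Tuple[List[int], int]:
    """Draft a block, keep the longest prefix the target agrees with.

    Returns the generated tokens and the number of verification rounds.
    A real target would score the whole block in one forward pass; this
    sketch emulates that pass position by position.
    """
    tokens = list(prompt)
    rounds = 0
    while len(tokens) - len(prompt) < max_new_tokens:
        draft = draft_block(tokens, block_size)
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = target_next(tokens + accepted)
            if tok == expected:
                accepted.append(tok)       # draft token matches greedy choice
            else:
                accepted.append(expected)  # replace first mismatch, stop
                break
        else:
            # Whole block accepted: the target contributes one extra token.
            accepted.append(target_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens], rounds


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec_tokens, rounds = speculative_generate(prompt, max_new_tokens=40)
    print("matches greedy:", spec_tokens == greedy_generate(prompt, 40))
    print("verification rounds:", rounds)
```

Every token appended in `speculative_generate` equals the target's own greedy choice for its context, so the result matches `greedy_generate` regardless of drafter quality. Drafter quality only affects how many tokens are accepted per round, which is where any speedup would come from.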