Spaces:
Runtime error
Runtime error
File size: 2,214 Bytes
edca3c2 5c66bee 0b3ae42 edca3c2 0b3ae42 5c66bee edca3c2 0b3ae42 5c66bee edca3c2 0b3ae42 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 | ---
title: 'DFlash-MLX-Universal: Interactive Demo'
emoji: π
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: true
tags:
- ml-intern
---
# π DFlash-MLX-Universal Demo
**Block Diffusion Speculative Decoding for Apple Silicon (MLX)**
This interactive demo showcases [DFlash](https://arxiv.org/abs/2602.06036) β a block diffusion model that accelerates LLM inference by **6Γ** on Apple Silicon with **lossless output**.
## What is DFlash?
- **Traditional speculative decoding**: Drafts 1 token at a time β 2-3Γ speedup
- **DFlash**: Drafts 16 tokens in parallel via diffusion β **6Γ speedup**
- **Key innovation**: Draft model conditions on target model's hidden states (KV injection)
- **Result**: Output identical to greedy autoregressive generation
## Demo Tabs
| Tab | What it does |
|-----|-------------|
| π **Quick Start** | Select a model, enter a prompt, generate code & see simulated results |
| π οΈ **Convert Drafter** | Get the `uv` command to convert official drafters to MLX format |
| π **Training** | Code template to train custom drafters for unsupported models |
| π₯οΈ **Server** | Commands to start an OpenAI-compatible local server |
| π **Benchmarks** | Performance table: 6Γ speedup across 6 models |
| π **Architecture** | Deep dive into how block diffusion + KV injection works |
| π¦ **Installation** | `uv` and `pip` setup instructions |
## Supported Models
- **Qwen3** (4B, 8B)
- **Qwen3.5** (4B, 9B, 27B)
- **Qwen3.6** (27B, 35B-A3B)
- **LLaMA-3.1** (8B)
- **Gemma-4** (31B)
## Quick Start (on your Mac)
```bash
# 1. Install uv
brew install uv
# 2. Clone and setup
git clone https://huggingface.co/tritesh/dflash-mlx-universal.git
cd dflash-mlx-universal
./setup_uv.sh
# 3. Convert a drafter
uv run python -m dflash_mlx.convert \
--model z-lab/Qwen3-4B-DFlash-b16 \
--output ./Qwen3-4B-DFlash-mlx
# 4. Generate
uv run python examples/qwen3_4b_demo.py
```
## Links
- **Paper**: [arXiv:2602.06036](https://arxiv.org/abs/2602.06036)
- **Repository**: [tritesh/dflash-mlx-universal](https://huggingface.co/tritesh/dflash-mlx-universal)
- **Package**: `dflash-mlx-universal` (PyPI compatible)
|