Spaces:

tritesh
/

dflash-mlx-universal-demo

Runtime error

App Files Files Community

dflash-mlx-universal-demo / README.md

tritesh

Update ML Intern artifact metadata

5c66bee verified 1 day ago

preview code

raw

history blame contribute delete

2.21 kB

	---
	title: 'DFlash-MLX-Universal: Interactive Demo'
	emoji: 🚀
	colorFrom: purple
	colorTo: blue
	sdk: gradio
	sdk_version: 5.0.0
	app_file: app.py
	pinned: true
	tags:
	- ml-intern
	---

	# 🚀 DFlash-MLX-Universal Demo

	Block Diffusion Speculative Decoding for Apple Silicon (MLX)

	This interactive demo showcases [DFlash](https://arxiv.org/abs/2602.06036) — a block diffusion model that accelerates LLM inference by 6× on Apple Silicon with lossless output.

	## What is DFlash?

	- Traditional speculative decoding: Drafts 1 token at a time → 2-3× speedup
	- DFlash: Drafts 16 tokens in parallel via diffusion → 6× speedup
	- Key innovation: Draft model conditions on target model's hidden states (KV injection)
	- Result: Output identical to greedy autoregressive generation

	## Demo Tabs

	\| Tab \| What it does \|
	\|-----\|-------------\|
	\| 🏃 Quick Start \| Select a model, enter a prompt, generate code & see simulated results \|
	\| 🛠️ Convert Drafter \| Get the `uv` command to convert official drafters to MLX format \|
	\| 🎓 Training \| Code template to train custom drafters for unsupported models \|
	\| 🖥️ Server \| Commands to start an OpenAI-compatible local server \|
	\| 📊 Benchmarks \| Performance table: 6× speedup across 6 models \|
	\| 📖 Architecture \| Deep dive into how block diffusion + KV injection works \|
	\| 📦 Installation \| `uv` and `pip` setup instructions \|

	## Supported Models

	- Qwen3 (4B, 8B)
	- Qwen3.5 (4B, 9B, 27B)
	- Qwen3.6 (27B, 35B-A3B)
	- LLaMA-3.1 (8B)
	- Gemma-4 (31B)

	## Quick Start (on your Mac)

	```bash
	# 1. Install uv
	brew install uv

	# 2. Clone and setup
	git clone https://huggingface.co/tritesh/dflash-mlx-universal.git
	cd dflash-mlx-universal
	./setup_uv.sh

	# 3. Convert a drafter
	uv run python -m dflash_mlx.convert \
	--model z-lab/Qwen3-4B-DFlash-b16 \
	--output ./Qwen3-4B-DFlash-mlx

	# 4. Generate
	uv run python examples/qwen3_4b_demo.py
	```

	## Links

	- Paper: [arXiv:2602.06036](https://arxiv.org/abs/2602.06036)
	- Repository: [tritesh/dflash-mlx-universal](https://huggingface.co/tritesh/dflash-mlx-universal)
	- Package: `dflash-mlx-universal` (PyPI compatible)