Spaces:

tritesh
/

dflash-mlx-universal-demo

Runtime error

App Files Files Community

dflash-mlx-universal-demo / README.md

tritesh

Update ML Intern artifact metadata

5c66bee verified 1 day ago

preview code

raw

history blame contribute delete

2.21 kB

A newer version of the Gradio SDK is available: 6.14.0

Upgrade

metadata

title: 'DFlash-MLX-Universal: Interactive Demo'
emoji: 🚀
colorFrom: purple
colorTo: blue
sdk: gradio
sdk_version: 5.0.0
app_file: app.py
pinned: true
tags:
  - ml-intern

🚀 DFlash-MLX-Universal Demo

Block Diffusion Speculative Decoding for Apple Silicon (MLX)

This interactive demo showcases DFlash — a block diffusion model that accelerates LLM inference by 6× on Apple Silicon with lossless output.

What is DFlash?

Traditional speculative decoding: Drafts 1 token at a time → 2-3× speedup
DFlash: Drafts 16 tokens in parallel via diffusion → 6× speedup
Key innovation: Draft model conditions on target model's hidden states (KV injection)
Result: Output identical to greedy autoregressive generation

Demo Tabs

Tab	What it does
🏃 Quick Start	Select a model, enter a prompt, generate code & see simulated results
🛠️ Convert Drafter	Get the `uv` command to convert official drafters to MLX format
🎓 Training	Code template to train custom drafters for unsupported models
🖥️ Server	Commands to start an OpenAI-compatible local server
📊 Benchmarks	Performance table: 6× speedup across 6 models
📖 Architecture	Deep dive into how block diffusion + KV injection works
📦 Installation	`uv` and `pip` setup instructions

Supported Models

Qwen3 (4B, 8B)
Qwen3.5 (4B, 9B, 27B)
Qwen3.6 (27B, 35B-A3B)
LLaMA-3.1 (8B)
Gemma-4 (31B)

Quick Start (on your Mac)

# 1. Install uv
brew install uv

# 2. Clone and setup
git clone https://huggingface.co/tritesh/dflash-mlx-universal.git
cd dflash-mlx-universal
./setup_uv.sh

# 3. Convert a drafter
uv run python -m dflash_mlx.convert \
    --model z-lab/Qwen3-4B-DFlash-b16 \
    --output ./Qwen3-4B-DFlash-mlx

# 4. Generate
uv run python examples/qwen3_4b_demo.py

🚀 DFlash-MLX-Universal Demo

What is DFlash?

Demo Tabs

Supported Models

Quick Start (on your Mac)

Links