---
title: Diffusion Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---
# 🤖 Diffusion Chatbot

Flask server hosting the Qwen3-0.6B-diffusion-bd3lm-v0.1 model with real-time streaming inference. Watch diffusion language models generate text step by step!
## ✨ Features

- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
- 🔄 **Progressive Generation**: See how different parts of the text appear at different steps
## 📡 API Endpoints

### 1. Health Check

```http
GET /health
```

### 2. Generate Text (Simple)

```http
POST /generate
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 256
}
```

### 3. Generate with Real-time Streaming (SSE) ⭐

```http
POST /generate_sse
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 100,
  "capture_interval": 10
}
```
## 💡 Example Usage

```bash
# Simple generation
curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'

# Real-time streaming
curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
```
## 🔧 Technical Details

| Component | Technology |
|---|---|
| Model | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
| Framework | Flask + PyTorch |
| Method | Block Diffusion Language Model (BD3LM) |
| Base Model | Qwen |
## ⚙️ Configuration

| Variable | Description | Default |
|---|---|---|
| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
| `PORT` | Server port | `7860` |
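A server typically resolves these variables at startup with the defaults above. This is a minimal sketch of that pattern, not the Space's actual code; the function name `load_config` is illustrative.

```python
import os

def load_config(env=None):
    """Resolve server settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return {
        "model_name": env.get(
            "MODEL_NAME", "dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"
        ),
        "port": int(env.get("PORT", "7860")),
    }

print(load_config({}))                # no overrides -> all defaults
print(load_config({"PORT": "8080"}))  # override only the port
```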
## 🧠 How It Works

Unlike traditional language models that generate text left to right, diffusion language models:

1. Start with all tokens masked
2. Iteratively denoise over multiple steps
3. Generate different parts of the text at different steps
4. Create a unique "thought process" visualization
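To make the masked-denoising idea concrete, here is a toy sketch that reveals tokens out of left-to-right order over several steps. It unmasks a known target string rather than predicting tokens, so it only illustrates the reveal schedule, not the actual BD3LM model.

```python
import random

def toy_denoise(target, steps=3, seed=0):
    """Reveal masked positions over `steps` rounds, mimicking how a
    diffusion LM fills in text out of order. Returns every snapshot."""
    rng = random.Random(seed)
    state = ["[MASK]"] * len(target)
    hidden = list(range(len(target)))
    history = [state[:]]
    for step in range(steps):
        # Ceil-divide so every position is revealed by the final step.
        k = -(-len(hidden) // (steps - step))
        for i in rng.sample(hidden, k):
            state[i] = target[i]
            hidden.remove(i)
        history.append(state[:])
    return history

for snapshot in toy_denoise("Roses are red , violets are blue".split()):
    print(" ".join(snapshot))
```

Each printed line corresponds to one snapshot the `/generate_sse` endpoint would stream at a given `capture_interval`.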
## 📝 Notes

- The model downloads automatically on first run (~1.5 GB)
- The first request may be slow while the model loads
- GPU is optional; the server falls back to CPU automatically
- A lower `capture_interval` means more frequent streaming updates