---
title: Diffusion Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# 🤖 Diffusion Chatbot


A Flask server hosting the Qwen3-0.6B-diffusion-bd3lm-v0.1 model with real-time streaming inference. Watch a diffusion language model generate text step by step!

## ✨ Features

- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
- 🔄 **Progressive Generation**: See how different parts of the text appear at different steps

## 📡 API Endpoints

### 1. Health Check

```http
GET /health
```

### 2. Generate Text (Simple)

```http
POST /generate
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 256
}
```

### 3. Generate with Real-time Streaming (SSE) ⭐

```http
POST /generate_sse
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 100,
  "capture_interval": 10
}
```
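The SSE stream can be consumed from Python. The sketch below assumes the server emits standard SSE frames whose `data:` payload is a JSON object; the exact field names (`step`, `text`) are illustrative, not taken from the server code.

```python
# Minimal SSE-parsing sketch for the /generate_sse endpoint.
# Assumes standard SSE framing: events separated by blank lines,
# each carrying a "data:" line with a JSON payload.
import json

def parse_sse_events(raw: str) -> list:
    """Split a raw SSE stream into the JSON payloads of its events."""
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events

if __name__ == "__main__":
    # Hypothetical usage against a deployed Space (needs the third-party
    # `requests` library; the URL is a placeholder):
    # import requests
    # resp = requests.post(
    #     "https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse",
    #     json={"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10},
    #     stream=True,
    # )
    # for event in parse_sse_events(resp.text):
    #     print(event)
    pass
```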

## 💡 Example Usage

```bash
# Simple generation
curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'

# Real-time streaming
curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
```
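The same `/generate` call can be made from Python with only the standard library; a sketch, with the Space URL kept as a placeholder:

```python
# Standard-library equivalent of the curl example for /generate.
# BASE_URL is a placeholder; substitute your own Space URL.
import json
import urllib.request

BASE_URL = "https://YOUR_USERNAME-diffusion-chatbot.hf.space"

def build_request(prompt: str, max_new_tokens: int = 50) -> urllib.request.Request:
    """Assemble a JSON POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # The first request may be slow while the model downloads and loads.
    with urllib.request.urlopen(build_request("Hello, how are you?")) as resp:
        print(json.loads(resp.read()))
```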

## 🔧 Technical Details

| Component | Technology |
|-----------|------------|
| Model | dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1 |
| Framework | Flask + PyTorch |
| Method | Block Diffusion Language Model (BD3LM) |
| Base Model | Qwen |

## ⚙️ Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
| `PORT` | Server port | `7860` |
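One way the server might read these variables, with the documented defaults (an illustrative sketch; the actual server code may differ):

```python
# Illustrative config loading with the defaults from the table above.
# Not the actual server code; just shows the documented variables in use.
import os

def load_config(env=os.environ) -> dict:
    """Read MODEL_NAME and PORT from the environment, falling back to defaults."""
    return {
        "model_name": env.get("MODEL_NAME", "dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"),
        "port": int(env.get("PORT", "7860")),
    }
```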

## 🧠 How It Works

Unlike traditional language models that generate text left-to-right, diffusion language models:

  1. Start with all tokens masked
  2. Iteratively denoise over multiple steps
  3. Generate different parts of text at different steps
  4. Create a unique "thought process" visualization
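The steps above can be sketched with a toy unmasking loop. This is not the real BD3LM sampler (which scores tokens with the neural network and unmasks block by block); here the reveal order is simply random, to show how the text fills in over steps:

```python
# Toy illustration of iterative denoising: start fully masked and reveal a
# few positions per step. NOT the real BD3LM sampler; reveal order is random.
import random

MASK = "[MASK]"

def toy_denoise(target_tokens, tokens_per_step=2, seed=0):
    """Yield the partially denoised token sequence after each step."""
    rng = random.Random(seed)
    seq = [MASK] * len(target_tokens)
    hidden = list(range(len(target_tokens)))
    while hidden:
        # Reveal up to `tokens_per_step` still-masked positions.
        for pos in rng.sample(hidden, min(tokens_per_step, len(hidden))):
            seq[pos] = target_tokens[pos]
            hidden.remove(pos)
        yield list(seq)

if __name__ == "__main__":
    for step, state in enumerate(toy_denoise("the cat sat on the mat".split())):
        print(step, " ".join(state))
```

Different parts of the sentence appear at different steps, which is exactly what the SSE stream lets you watch live.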

πŸ“ Notes

- Model downloads automatically on first run (~1.5 GB)
- The first request may be slow while the model loads
- GPU is optional; the server falls back to CPU automatically
- A lower `capture_interval` means more frequent streaming updates

πŸ™ Acknowledgments