---
title: Diffusion Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# 🤖 Diffusion Chatbot


A Flask server hosting the Qwen3-0.6B-diffusion-bd3lm-v0.1 model with real-time streaming inference. Watch a diffusion language model generate text step by step!

## ✨ Features

- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
- 🔄 **Progressive Generation**: See how different parts of the text appear at different steps

## 📡 API Endpoints

### 1. Health Check

```http
GET /health
```

### 2. Generate Text (Simple)

```http
POST /generate
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 256
}
```

### 3. Generate with Real-time Streaming (SSE) ⭐

```http
POST /generate_sse
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 100,
  "capture_interval": 10
}
```
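The SSE stream can be consumed from Python. The sketch below assumes the server emits standard SSE frames whose `data:` payload is a JSON object; the exact field names (`step`, `text`) are illustrative, not taken from the server code.

```python
# Minimal SSE-parsing sketch for the /generate_sse endpoint.
# Assumes standard SSE framing: events separated by blank lines,
# each carrying a "data:" line with a JSON payload.
import json

def parse_sse_events(raw: str) -> list:
    """Split a raw SSE stream into the JSON payloads of its events."""
    events = []
    for block in raw.split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data:"):
                events.append(json.loads(line[len("data:"):].strip()))
    return events

if __name__ == "__main__":
    # Hypothetical usage against a deployed Space (needs the third-party
    # `requests` library; the URL is a placeholder):
    # import requests
    # resp = requests.post(
    #     "https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse",
    #     json={"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10},
    #     stream=True,
    # )
    # for event in parse_sse_events(resp.text):
    #     print(event)
    pass
```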

## 💡 Example Usage

```bash
# Simple generation
curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'

# Real-time streaming
curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
```
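The same `/generate` call can be made from Python with only the standard library; a sketch, with the Space URL kept as a placeholder:

```python
# Standard-library equivalent of the curl example for /generate.
# BASE_URL is a placeholder; substitute your own Space URL.
import json
import urllib.request

BASE_URL = "https://YOUR_USERNAME-diffusion-chatbot.hf.space"

def build_request(prompt: str, max_new_tokens: int = 50) -> urllib.request.Request:
    """Assemble a JSON POST request for the /generate endpoint."""
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # The first request may be slow while the model downloads and loads.
    with urllib.request.urlopen(build_request("Hello, how are you?")) as resp:
        print(json.loads(resp.read()))
```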

## 🔧 Technical Details

| Component | Technology |
|-----------|------------|
| Model | dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1 |
| Framework | Flask + PyTorch |
| Method | Block Diffusion Language Model (BD3LM) |
| Base Model | Qwen |

## ⚙️ Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
| `PORT` | Server port | `7860` |
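One way the server might read these variables, with the documented defaults (an illustrative sketch; the actual server code may differ):

```python
# Illustrative config loading with the defaults from the table above.
# Not the actual server code; just shows the documented variables in use.
import os

def load_config(env=os.environ) -> dict:
    """Read MODEL_NAME and PORT from the environment, falling back to defaults."""
    return {
        "model_name": env.get("MODEL_NAME", "dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"),
        "port": int(env.get("PORT", "7860")),
    }
```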

## 🧠 How It Works

Unlike traditional language models that generate text left-to-right, diffusion language models:

  1. Start with all tokens masked
  2. Iteratively denoise over multiple steps
  3. Generate different parts of text at different steps
  4. Create a unique "thought process" visualization
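The steps above can be sketched with a toy unmasking loop. This is not the real BD3LM sampler (which scores tokens with the neural network and unmasks block by block); here the reveal order is simply random, to show how the text fills in over steps:

```python
# Toy illustration of iterative denoising: start fully masked and reveal a
# few positions per step. NOT the real BD3LM sampler; reveal order is random.
import random

MASK = "[MASK]"

def toy_denoise(target_tokens, tokens_per_step=2, seed=0):
    """Yield the partially denoised token sequence after each step."""
    rng = random.Random(seed)
    seq = [MASK] * len(target_tokens)
    hidden = list(range(len(target_tokens)))
    while hidden:
        # Reveal up to `tokens_per_step` still-masked positions.
        for pos in rng.sample(hidden, min(tokens_per_step, len(hidden))):
            seq[pos] = target_tokens[pos]
            hidden.remove(pos)
        yield list(seq)

if __name__ == "__main__":
    for step, state in enumerate(toy_denoise("the cat sat on the mat".split())):
        print(step, " ".join(state))
```

Different parts of the sentence appear at different steps, which is exactly what the SSE stream lets you watch live.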

πŸ“ Notes

- Model downloads automatically on first run (~1.5 GB)
- The first request may be slow while the model loads
- GPU is optional; the server falls back to CPU automatically
- A lower `capture_interval` means more frequent streaming updates

πŸ™ Acknowledgments