---
title: Diffusion Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# 🤖 Diffusion Chatbot

[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://www.docker.com/)
[![Python](https://img.shields.io/badge/Python-3.10+-green)](https://www.python.org/)

Flask server hosting the **Qwen3-0.6B-diffusion-bd3lm-v0.1** model with real-time streaming inference. Watch a diffusion language model generate text step by step!

## ✨ Features

- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
- 🔄 **Progressive Generation**: See how different parts of the text appear at different steps

## 📡 API Endpoints

### 1. Health Check

```bash
GET /health
```

### 2. Generate Text (Simple)

```bash
POST /generate
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 256
}
```

### 3. Generate with Real-time Streaming (SSE) ⭐

```bash
POST /generate_sse
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 100,
  "capture_interval": 10
}
```

## 💡 Example Usage

```bash
# Simple generation
curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'

# Real-time streaming
curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
```

## 🔧 Technical Details

| Component | Technology |
|-----------|------------|
| **Model** | [dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) |
| **Framework** | Flask + PyTorch |
| **Method** | Block Diffusion Language Model (BD3LM) |
| **Base Model** | Qwen |

## ⚙️ Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
| `PORT` | Server port | `7860` |

## 🧠 How It Works

Unlike traditional language models that generate text left to right, diffusion language models:

1. Start with all tokens masked
2. Iteratively denoise over multiple steps
3. Generate different parts of the text at different steps
4. Create a unique "thought process" visualization

## 📝 Notes

- The model downloads automatically on first run (~1.5 GB)
- The first request may be slow while the model loads
- GPU is optional; the server falls back to CPU automatically
- A lower `capture_interval` means more frequent streaming updates

## 🙏 Acknowledgments

- Model: [dllm-hub](https://huggingface.co/dllm-hub)
- Framework: [dLLM](https://github.com/ZHZisZZ/dllm)
- Base: [Qwen](https://github.com/QwenLM/Qwen)
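## 🐍 Python Client Sketch

The `/generate_sse` endpoint can also be consumed from Python. Below is a minimal standard-library sketch; it assumes the server emits standard SSE `data:` lines carrying JSON payloads. The exact field names inside each event are not documented here, so the client simply yields whatever JSON each event carries.

```python
import json
import urllib.request


def parse_sse_line(line: str):
    """Return the JSON payload of an SSE 'data:' line, or None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload:
        return None
    return json.loads(payload)


def stream_generate(base_url: str, prompt: str,
                    max_new_tokens: int = 100, capture_interval: int = 10):
    """Yield parsed events from the /generate_sse endpoint as they arrive."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/generate_sse",
        data=json.dumps({
            "prompt": prompt,
            "max_new_tokens": max_new_tokens,
            "capture_interval": capture_interval,
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # iterate the response line by line
            event = parse_sse_line(raw.decode("utf-8"))
            if event is not None:
                yield event


# Usage (against a running Space):
# for event in stream_generate("https://YOUR_USERNAME-diffusion-chatbot.hf.space",
#                              "Write a poem"):
#     print(event)
```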
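## 🧪 Denoising Sketch

The four steps in "How It Works" can be illustrated with a toy loop: start from an all-masked sequence and unmask a few positions per step. This is only an illustration of the idea, not the actual BD3LM sampler (which predicts tokens with the model rather than revealing a known target).

```python
import random

MASK = "[MASK]"


def toy_denoise(target_tokens, steps=4, seed=0):
    """Return the list of intermediate states from all-masked to fully revealed."""
    rng = random.Random(seed)
    state = [MASK] * len(target_tokens)
    hidden = list(range(len(target_tokens)))  # still-masked positions
    states = [state[:]]
    for step in range(steps):
        # Reveal roughly an equal share of the remaining positions each step.
        k = max(1, len(hidden) // (steps - step))
        for i in rng.sample(hidden, k):
            state[i] = target_tokens[i]
            hidden.remove(i)
        states.append(state[:])
    return states


# Print each intermediate state to mimic the streaming visualization:
for s in toy_denoise("the cat sat on the mat".split()):
    print(" ".join(s))
```

Because positions are revealed in a non-left-to-right order, different parts of the sentence appear at different steps, which is exactly what the SSE stream lets you watch.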