---
title: Diffusion Chatbot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
license: apache-2.0
---

# 🤖 Diffusion Chatbot

[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://www.docker.com/)
[![Python](https://img.shields.io/badge/Python-3.10+-green)](https://www.python.org/)

Flask server hosting the **Qwen3-0.6B-diffusion-bd3lm-v0.1** model with real-time streaming inference. Watch a diffusion language model generate text step by step!

## ✨ Features

- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
- 🔄 **Progressive Generation**: See how different parts of the text appear at different steps

## 📡 API Endpoints

### 1. Health Check

```bash
GET /health
```

### 2. Generate Text (Simple)

```bash
POST /generate
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 256
}
```

### 3. Generate with Real-time Streaming (SSE) ⭐

```bash
POST /generate_sse
Content-Type: application/json

{
  "prompt": "Your question here",
  "max_new_tokens": 100,
  "capture_interval": 10
}
```

## 💡 Example Usage

```bash
# Simple generation
curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'

# Real-time streaming
curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
```

## 🔧 Technical Details

| Component | Technology |
|-----------|------------|
| **Model** | [dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) |
| **Framework** | Flask + PyTorch |
| **Method** | Block Diffusion Language Model (BD3LM) |
| **Base Model** | Qwen |

## ⚙️ Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
| `PORT` | Server port | `7860` |

## 🧠 How It Works

Unlike traditional language models that generate text left to right, diffusion language models:

1. Start with all tokens masked
2. Iteratively denoise over multiple steps
3. Generate different parts of the text at different steps
4. Create a unique "thought process" visualization

## 📝 Notes

- The model downloads automatically on first run (~1.5 GB)
- The first request may be slow while the model loads
- GPU is optional; the server falls back to CPU automatically
- A lower `capture_interval` means more frequent streaming updates

## 🙏 Acknowledgments

- Model: [dllm-hub](https://huggingface.co/dllm-hub)
- Framework: [dLLM](https://github.com/ZHZisZZ/dllm)
- Base: [Qwen](https://github.com/QwenLM/Qwen)
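## 🐍 Python Client Sketch

The `/generate_sse` endpoint can also be consumed from Python. Below is a minimal standard-library sketch; it assumes the server emits standard SSE `data:` lines carrying JSON payloads. The exact field names inside each event are not documented here, so the client simply yields whatever JSON each event carries.

```python
import json
import urllib.request


def parse_sse_line(line: str):
    """Return the JSON payload of an SSE 'data:' line, or None for other lines."""
    line = line.strip()
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload:
        return None
    return json.loads(payload)


def stream_generate(base_url: str, prompt: str,
                    max_new_tokens: int = 100, capture_interval: int = 10):
    """Yield parsed events from the /generate_sse endpoint as they arrive."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/generate_sse",
        data=json.dumps({
            "prompt": prompt,
            "max_new_tokens": max_new_tokens,
            "capture_interval": capture_interval,
        }).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:  # iterate the response line by line
            event = parse_sse_line(raw.decode("utf-8"))
            if event is not None:
                yield event


# Usage (against a running Space):
# for event in stream_generate("https://YOUR_USERNAME-diffusion-chatbot.hf.space",
#                              "Write a poem"):
#     print(event)
```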
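## 🧪 Denoising Sketch

The four steps in "How It Works" can be illustrated with a toy loop: start from an all-masked sequence and unmask a few positions per step. This is only an illustration of the idea, not the actual BD3LM sampler (which predicts tokens with the model rather than revealing a known target).

```python
import random

MASK = "[MASK]"


def toy_denoise(target_tokens, steps=4, seed=0):
    """Return the list of intermediate states from all-masked to fully revealed."""
    rng = random.Random(seed)
    state = [MASK] * len(target_tokens)
    hidden = list(range(len(target_tokens)))  # still-masked positions
    states = [state[:]]
    for step in range(steps):
        # Reveal roughly an equal share of the remaining positions each step.
        k = max(1, len(hidden) // (steps - step))
        for i in rng.sample(hidden, k):
            state[i] = target_tokens[i]
            hidden.remove(i)
        states.append(state[:])
    return states


# Print each intermediate state to mimic the streaming visualization:
for s in toy_denoise("the cat sat on the mat".split()):
    print(" ".join(s))
```

Because positions are revealed in a non-left-to-right order, different parts of the sentence appear at different steps, which is exactly what the SSE stream lets you watch.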