Spaces: Cashel/diffusion-chatbot (status: Runtime error)


Cashel committed on Mar 7: commit 0758411 (verified), 1 parent: a919dff

Upload folder using huggingface_hub


Files changed (1)

README.md +58 -21

README.md CHANGED

@@ -8,25 +8,29 @@ pinned: false
 license: apache-2.0
 ---
-# Diffusion Chatbot
-Flask server hosting the Qwen3-0.6B-diffusion-bd3lm-v0.1 model with real-time streaming inference.
-## Features
-- **Real-time streaming**: Watch the diffusion model generate text step-by-step
-- **Three endpoints**: Simple generation, batch intermediate states, and real-time SSE streaming
-- **GPU support**: Automatically uses GPU if available, falls back to CPU
-## API Endpoints
-### Health Check
-```
 GET /health
 ```
-### Generate Text
-```
 POST /generate
 Content-Type: application/json
@@ -36,8 +40,8 @@ Content-Type: application/json
 }
 ```
-### Generate with Real-time Streaming (SSE)
-```
 POST /generate_sse
 Content-Type: application/json
@@ -48,21 +52,54 @@ Content-Type: application/json
 }
 ```
-## Example Usage
 ```bash
 curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
   -H "Content-Type: application/json" \
   -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'
 ```
-## Technical Details
-- **Model**: [dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1)
-- **Framework**: Flask + PyTorch
-- **Diffusion Method**: Block Diffusion Language Model (BD3LM)
-## Environment Variables
-- `MODEL_NAME`: HuggingFace model name (default: dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1)
-- `PORT`: Server port (default: 7860 for HF Spaces)

 license: apache-2.0
 ---
+# 🤖 Diffusion Chatbot
+[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://www.docker.com/)
+[![Python](https://img.shields.io/badge/Python-3.10+-green)](https://www.python.org/)
+Flask server hosting the **Qwen3-0.6B-diffusion-bd3lm-v0.1** model with real-time streaming inference. Watch diffusion language models generate text step-by-step!
+## ✨ Features
+- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
+- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
+- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
+- 🔄 **Progressive Generation**: See how different parts of text appear at different steps
+## 📡 API Endpoints
+### 1. Health Check
+```bash
 GET /health
 ```
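The README notes that the first request may be slow while the model loads, so a client may want to poll `/health` before sending work. The following is a minimal standard-library Python sketch. It assumes, since the README does not say so, that `/health` answers with HTTP 200 once the server is ready. `fetch_status` and `wait_until_healthy` are hypothetical helper names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError (it is a subclass).
        return err.code
    except (urllib.error.URLError, TimeoutError):
        return 0


def wait_until_healthy(
    base_url: str,
    fetch: Callable[[str], int] = fetch_status,
    attempts: int = 30,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll GET /health until it returns 200 (assumed 'ready') or attempts run out."""
    for attempt in range(attempts):
        if fetch(f"{base_url}/health") == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay_s)  # no point sleeping after the last attempt
    return False
```

`fetch` and `sleep` are injectable so the polling logic can be exercised without a live Space. In normal use you would call `wait_until_healthy("https://YOUR_USERNAME-diffusion-chatbot.hf.space")`.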
+### 2. Generate Text (Simple)
+```bash
 POST /generate
 Content-Type: application/json
 }
 ```
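The request body for `POST /generate` is collapsed in this diff view. The curl example in the README sends `prompt` and `max_new_tokens`, so the minimal Python client below builds its request from those two fields only. The helper names are hypothetical. Because the response schema is not shown, the decoded JSON is returned unchanged.

```python
import json
import urllib.request


def build_generate_request(
    base_url: str, prompt: str, max_new_tokens: int = 50
) -> urllib.request.Request:
    """Build a POST /generate request carrying the fields from the curl example."""
    body = json.dumps({"prompt": prompt, "max_new_tokens": max_new_tokens}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, prompt: str, max_new_tokens: int = 50, timeout: float = 120.0) -> dict:
    """Send the request and return the decoded JSON body without interpreting it."""
    req = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: `generate("https://YOUR_USERNAME-diffusion-chatbot.hf.space", "Hello, how are you?")`. A generous timeout is used because the first request may trigger the model load.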
+### 3. Generate with Real-time Streaming (SSE) ⭐
+```bash
 POST /generate_sse
 Content-Type: application/json
 }
 ```
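`/generate_sse` streams its progress as Server-Sent Events, so a client has to split the byte stream into events. The sketch below implements that framing step using the basic SSE rules: `data:` lines accumulate, a blank line ends an event, and lines starting with `:` are comments. It also includes a hypothetical streaming helper. The request fields come from the README's curl example (`prompt`, `max_new_tokens`, `capture_interval`). The format of each event's payload is not documented here, so payloads are yielded as raw strings.

```python
import json
import urllib.request
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    data_lines: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event built so far.
            if data_lines:
                yield "\n".join(data_lines)
                data_lines = []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # drop the single optional space after the colon
        if field == "data":
            data_lines.append(value)
        # 'event', 'id' and 'retry' fields are ignored in this sketch.
    # An event not terminated by a blank line is discarded, as in the SSE spec.


def stream_generate_sse(
    base_url: str,
    prompt: str,
    max_new_tokens: int = 100,
    capture_interval: int = 10,
    timeout: float = 300.0,
) -> Iterator[str]:
    """POST to /generate_sse and yield each event payload as it arrives."""
    body = json.dumps(
        {"prompt": prompt, "max_new_tokens": max_new_tokens, "capture_interval": capture_interval}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/generate_sse",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "text/event-stream"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Iterating the response yields raw lines as bytes.
        yield from iter_sse_data(raw.decode("utf-8") for raw in resp)
```

Keeping the parser separate from the network call plays the same role as `curl -N` in the README: each event is handled as soon as its terminating blank line arrives, rather than after the whole response is buffered.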
+## 💡 Example Usage
 ```bash
+# Simple generation
 curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
   -H "Content-Type: application/json" \
   -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'
+# Real-time streaming
+curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
 ```
+## 🔧 Technical Details
+| Component | Technology |
+|-----------|------------|
+| **Model** | [dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) |
+| **Framework** | Flask + PyTorch |
+| **Method** | Block Diffusion Language Model (BD3LM) |
+| **Base Model** | Qwen |
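BD3LM sits between autoregressive and diffusion decoding. Text is produced block by block, left to right, while the tokens inside the current block are denoised together. The attention pattern this implies at generation time can be sketched as a block-causal mask. This illustrates the general technique, not code taken from this Space, and the `block_size` values used here are arbitrary.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j.

    Positions attend bidirectionally inside their own block and to every
    earlier block, but never to later blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [
        [(j // block_size) <= (i // block_size) for j in range(seq_len)]
        for i in range(seq_len)
    ]
```

With `block_size=1` the mask reduces to the ordinary causal mask of a left-to-right model. When `block_size` equals the sequence length, every position sees every other, as in full-sequence diffusion.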
+## ⚙️ Configuration
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
+| `PORT` | Server port | `7860` |
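The sketch below shows one way a server might read the two variables in the table, applying their documented defaults. `ServerConfig` and `load_config` are hypothetical names; the README does not show how the app parses these variables.

```python
import os
from typing import Mapping, NamedTuple

DEFAULT_MODEL_NAME = "dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1"
DEFAULT_PORT = 7860


class ServerConfig(NamedTuple):
    model_name: str
    port: int


def load_config(env: Mapping[str, str] = os.environ) -> ServerConfig:
    """Read MODEL_NAME and PORT, treating unset or empty values as 'use the default'."""
    model_name = env.get("MODEL_NAME") or DEFAULT_MODEL_NAME
    port_text = env.get("PORT") or str(DEFAULT_PORT)
    try:
        port = int(port_text)
    except ValueError as exc:
        raise ValueError(f"PORT must be an integer, got {port_text!r}") from exc
    return ServerConfig(model_name, port)
```

The resulting port could then be passed to Flask as `app.run(host="0.0.0.0", port=cfg.port)`. Binding to `0.0.0.0` makes the server reachable from outside its container.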
+## 🧠 How It Works
+Unlike traditional language models that generate text left-to-right, diffusion language models:
+1. Start with all tokens masked
+2. Iteratively denoise over multiple steps
+3. Generate different parts of text at different steps
+4. Create a unique "thought process" visualization
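The steps above can be illustrated with a toy denoising loop. A real model re-predicts every masked position at each step. Here, a fixed `target` list stands in for the model's predictions and a fixed `confidence` list stands in for its scores, and each step reveals the most confident positions that are still masked. Confidence-based unmasking is one common schedule for masked diffusion LMs; this sketch does not claim it is the exact schedule this Space uses.

```python
import math

MASK = "[MASK]"


def toy_denoise(target: list[str], confidence: list[float], steps: int) -> list[list[str]]:
    """Return the sequence after each step of a toy confidence-based unmasking loop.

    `target` stands in for the model's predicted tokens and `confidence` for its
    per-position scores; a real model would recompute both at every step.
    """
    if steps < 1:
        raise ValueError("steps must be >= 1")
    if len(target) != len(confidence):
        raise ValueError("target and confidence must have the same length")
    n = len(target)
    per_step = math.ceil(n / steps)  # how many positions to reveal per step
    state = [MASK] * n  # step 0: every position is masked
    history: list[list[str]] = []
    for _ in range(steps):
        masked = [i for i in range(n) if state[i] == MASK]
        if not masked:
            break
        # Reveal the most confident masked positions first.
        masked.sort(key=lambda i: confidence[i], reverse=True)
        for i in masked[:per_step]:
            state[i] = target[i]
        history.append(list(state))
    return history
```

Consider `target=["The", "cat", "sat", "down"]`, `confidence=[0.9, 0.2, 0.5, 0.7]` and `steps=2`. The first state reveals "The" and "down", and the second fills in "cat" and "sat". This is the "different parts at different steps" behaviour described above.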
+## 📝 Notes
+- Model downloads automatically on first run (~1.5GB)
+- First request may be slow as model loads
+- GPU is optional - automatic CPU fallback
+- Lower `capture_interval` = more frequent updates
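The note on `capture_interval` describes a trade-off between update frequency and the number of streamed events. The README does not define the parameter precisely, so the sketch below assumes that the server emits one snapshot every `capture_interval` denoising steps, plus one for the final state. `snapshot_steps` is a hypothetical helper.

```python
def snapshot_steps(total_steps: int, capture_interval: int) -> list[int]:
    """1-based steps at which an intermediate state would be emitted (assumed policy).

    One snapshot every `capture_interval` steps, plus the final step if the
    interval does not land on it exactly.
    """
    if capture_interval < 1:
        raise ValueError("capture_interval must be >= 1")
    steps = list(range(capture_interval, total_steps + 1, capture_interval))
    if total_steps > 0 and (not steps or steps[-1] != total_steps):
        steps.append(total_steps)
    return steps
```

Under this assumption, a 100-step run would emit 10 snapshots with `capture_interval=10` and 100 with `capture_interval=1`, each one an extra SSE event.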
+## 🙏 Acknowledgments
+- Model: [dllm-hub](https://huggingface.co/dllm-hub)
+- Framework: [dLLM](https://github.com/ZHZisZZ/dllm)
+- Base: [Qwen](https://github.com/QwenLM/Qwen)