Cashel committed
Commit 0758411 · verified · Parent: a919dff

Upload folder using huggingface_hub

Files changed (1): README.md (+58 -21)
README.md CHANGED
````diff
@@ -8,25 +8,29 @@ pinned: false
 license: apache-2.0
 ---
 
-# Diffusion Chatbot
+# 🤖 Diffusion Chatbot
 
-Flask server hosting the Qwen3-0.6B-diffusion-bd3lm-v0.1 model with real-time streaming inference.
+[![Docker](https://img.shields.io/badge/Docker-Ready-blue)](https://www.docker.com/)
+[![Python](https://img.shields.io/badge/Python-3.10+-green)](https://www.python.org/)
 
-## Features
+Flask server hosting the **Qwen3-0.6B-diffusion-bd3lm-v0.1** model with real-time streaming inference. Watch diffusion language models generate text step-by-step!
 
-- **Real-time streaming**: Watch the diffusion model generate text step-by-step
-- **Three endpoints**: Simple generation, batch intermediate states, and real-time SSE streaming
-- **GPU support**: Automatically uses GPU if available, falls back to CPU
+## ✨ Features
 
-## API Endpoints
+- 🎯 **Real-time Streaming**: Watch the diffusion denoising process live
+- 📡 **Three API Endpoints**: Simple generation, batch states, and SSE streaming
+- ⚡ **GPU Support**: Automatic GPU detection with CPU fallback
+- 🔄 **Progressive Generation**: See how different parts of text appear at different steps
 
-### Health Check
-```
+## 📡 API Endpoints
+
+### 1. Health Check
+```bash
 GET /health
 ```
 
-### Generate Text
-```
+### 2. Generate Text (Simple)
+```bash
 POST /generate
 Content-Type: application/json
 
@@ -36,8 +40,8 @@ Content-Type: application/json
 }
 ```
 
-### Generate with Real-time Streaming (SSE)
-```
+### 3. Generate with Real-time Streaming (SSE) ⭐
+```bash
 POST /generate_sse
 Content-Type: application/json
 
@@ -48,21 +52,54 @@ Content-Type: application/json
 }
 ```
 
-## Example Usage
+## 💡 Example Usage
 
 ```bash
+# Simple generation
 curl -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate \
   -H "Content-Type: application/json" \
   -d '{"prompt": "Hello, how are you?", "max_new_tokens": 50}'
+
+# Real-time streaming
+curl -N -X POST https://YOUR_USERNAME-diffusion-chatbot.hf.space/generate_sse \
+  -H "Content-Type: application/json" \
+  -d '{"prompt": "Write a poem", "max_new_tokens": 100, "capture_interval": 10}'
 ```
 
-## Technical Details
+## 🔧 Technical Details
+
+| Component | Technology |
+|-----------|------------|
+| **Model** | [dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1) |
+| **Framework** | Flask + PyTorch |
+| **Method** | Block Diffusion Language Model (BD3LM) |
+| **Base Model** | Qwen |
+
+## ⚙️ Configuration
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `MODEL_NAME` | HuggingFace model name | `dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1` |
+| `PORT` | Server port | `7860` |
+
+## 🧠 How It Works
+
+Unlike traditional language models that generate text left-to-right, diffusion language models:
+
+1. Start with all tokens masked
+2. Iteratively denoise over multiple steps
+3. Generate different parts of text at different steps
+4. Create a unique "thought process" visualization
+
+## 📝 Notes
 
-- **Model**: [dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1](https://huggingface.co/dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1)
-- **Framework**: Flask + PyTorch
-- **Diffusion Method**: Block Diffusion Language Model (BD3LM)
+- Model downloads automatically on first run (~1.5GB)
+- First request may be slow as model loads
+- GPU is optional - automatic CPU fallback
+- Lower `capture_interval` = more frequent updates
 
-## Environment Variables
+## 🙏 Acknowledgments
 
-- `MODEL_NAME`: HuggingFace model name (default: dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1)
-- `PORT`: Server port (default: 7860 for HF Spaces)
+- Model: [dllm-hub](https://huggingface.co/dllm-hub)
+- Framework: [dLLM](https://github.com/ZHZisZZ/dllm)
+- Base: [Qwen](https://github.com/QwenLM/Qwen)
````
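The `/generate_sse` endpoint added in the updated README streams Server-Sent Events (hence `curl -N` to keep the connection open). The commit does not show the event payload format, so the sample below uses a hypothetical `{"step": …}` body; the `data:` line framing itself, however, is standard SSE. A minimal client-side parser sketch:

```python
def parse_sse_data(raw: str) -> list[str]:
    """Extract data payloads from a raw Server-Sent Events stream.

    Per the SSE format, an event is a block of lines ended by a blank
    line; lines starting with "data:" carry the payload, and multiple
    data lines within one event are joined with newlines.
    """
    events, buf = [], []
    for line in raw.splitlines():
        if line.startswith("data:"):
            buf.append(line[5:].lstrip())
        elif line == "" and buf:  # blank line terminates the event
            events.append("\n".join(buf))
            buf = []
    if buf:  # stream ended without a trailing blank line
        events.append("\n".join(buf))
    return events

# Hypothetical payloads -- the real event schema is not shown in this commit.
sample = 'data: {"step": 10}\n\ndata: {"step": 20}\n\n'
print(parse_sse_data(sample))  # → ['{"step": 10}', '{"step": 20}']
```

With the `requests` library, `requests.post(url, json=payload, stream=True)` followed by `response.iter_lines(decode_unicode=True)` would yield the same lines this function expects.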
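The server's startup code is not part of this diff, so the following is only a sketch of how a Flask app would typically honor the `MODEL_NAME` and `PORT` variables from the Configuration table; the defaults are the documented ones, and any entrypoint name (e.g. `app.py`) is an assumption:

```python
import os

# Defaults taken from the README's Configuration table.
DEFAULTS = {
    "MODEL_NAME": "dllm-hub/Qwen3-0.6B-diffusion-bd3lm-v0.1",
    "PORT": "7860",  # HF Spaces expects the server on 7860
}

def load_config(env=None):
    """Resolve server settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "model_name": env.get("MODEL_NAME", DEFAULTS["MODEL_NAME"]),
        "port": int(env.get("PORT", DEFAULTS["PORT"])),
    }

cfg = load_config({})  # no overrides -> documented defaults
print(cfg["model_name"], cfg["port"])
# A Flask app would then serve with: app.run(host="0.0.0.0", port=cfg["port"])
```

Overriding at launch would look like `PORT=8000 python app.py` (entrypoint name hypothetical).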
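The "How It Works" list in the updated README (all tokens masked, then iteratively denoised out of left-to-right order) can be illustrated with a toy unmasking loop. This is an illustration of the idea only, not the BD3LM sampler: positions here are revealed at random rather than chosen by a model, and snapshots mimic the `capture_interval` behavior of the streaming endpoint.

```python
import random

MASK = "_"  # stand-in for the mask token

def toy_denoise(target_tokens, steps=8, capture_interval=2, seed=0):
    """Reveal masked positions over several steps, mimicking how a
    diffusion LM fills in different parts of the text at different steps."""
    rng = random.Random(seed)
    seq = [MASK] * len(target_tokens)
    hidden = list(range(len(target_tokens)))
    snapshots = []
    for step in range(1, steps + 1):
        # unmask a share of the remaining positions each step
        k = max(1, len(hidden) // (steps - step + 1))
        for pos in rng.sample(hidden, min(k, len(hidden))):
            seq[pos] = target_tokens[pos]
            hidden.remove(pos)
        if step % capture_interval == 0 or not hidden:
            snapshots.append(" ".join(seq))
        if not hidden:
            break
    return snapshots

snaps = toy_denoise("the cat sat on the mat".split())
for s in snaps:
    print(s)  # earlier snapshots still contain "_" masks
```

Each printed snapshot corresponds to one intermediate state the SSE endpoint would stream; a lower `capture_interval` yields more frequent snapshots.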