---
datasets:
- Skylion007/openwebtext
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- perplexity
pipeline_tag: text-generation
---
# LangFlow
LangFlow is a continuous diffusion language model that operates in embedding space. Unlike discrete diffusion models (MDLM, SEDD, DUO), LangFlow performs diffusion directly on continuous token embeddings, enabling smoother denoising dynamics.
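As a rough intuition for what "diffusion in embedding space" means, the sketch below integrates a learned velocity field from Gaussian noise toward token embeddings, then rounds each position to its nearest vocabulary embedding. This is a generic illustration, not LangFlow's actual sampler: `velocity_model` and `embed_table` are hypothetical stand-ins, and the model's real entry point is `generate_samples` (see the next section).
```python
import torch

def sample_embeddings(velocity_model, embed_table, seq_len=1024, num_steps=128):
    """Generic embedding-space diffusion sampler (illustrative only).

    `velocity_model(x, t)` is a hypothetical denoiser/velocity network;
    `embed_table` is the (vocab_size, dim) token-embedding matrix.
    """
    dim = embed_table.shape[1]
    x = torch.randn(1, seq_len, dim)       # start from pure Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((1,), i * dt)
        x = x + dt * velocity_model(x, t)  # Euler step along the learned flow
    # Decode: map each continuous vector to its nearest token embedding
    dists = torch.cdist(x, embed_table.unsqueeze(0))
    return dists.argmin(dim=-1)            # token ids, shape (1, seq_len)
```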
## Using LangFlow
To generate text with the pre-trained model, run the following snippet:
```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

# LangFlow reuses the standard GPT-2 tokenizer; the model weights and
# custom sampling code live in this repo (hence trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForMaskedLM.from_pretrained('chumengl/langflow-owt', trust_remote_code=True)

# Generate samples (num_steps controls the number of denoising steps)
samples = model.generate_samples(num_samples=5, num_steps=128)
texts = tokenizer.batch_decode(samples, skip_special_tokens=True)
for text in texts:
    print(text)
```
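The signature of `generate_samples` is defined by the repository's remote code; `num_steps` sets the number of denoising steps, and fewer steps generally trade sample quality for speed (a typical property of diffusion samplers, assumed here rather than documented). For longer sampling runs, moving the model to a GPU first is the standard transformers pattern:
```python
import torch

# Assumption: generate_samples runs on whatever device the model is on
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = model.to(device).eval()

with torch.no_grad():
    samples = model.generate_samples(num_samples=5, num_steps=128)
```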
## Model Details
- **Architecture**: DiT (Diffusion Transformer) backbone with adaptive layer normalization
- **Context Length**: 1024 tokens
- **Parameters**: ~130M non-embedding parameters (similar to GPT-2 medium)
- **Training**: 1M steps on OpenWebText corpus
- **Tokenizer**: GPT-2 tokenizer (50,257 vocab size)
## Model Card Contact
Chumeng Liang (chumengl@illinois.edu)