---
datasets:
- Skylion007/openwebtext
language:
- en
library_name: transformers
license: apache-2.0
metrics:
- perplexity
pipeline_tag: text-generation
---

# LangFlow

LangFlow is a continuous diffusion language model that operates in embedding space. Unlike discrete diffusion models (MDLM, SEDD, DUO), LangFlow performs diffusion directly on continuous token embeddings, enabling smoother denoising dynamics. A simplified sketch of this sampling process appears at the end of this card.

## Using LangFlow

To generate text with the pre-trained model, run the following snippet:

```python
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('gpt2')
model = AutoModelForMaskedLM.from_pretrained('chumengl/langflow-owt', trust_remote_code=True)

# Generate samples
samples = model.generate_samples(num_samples=5, num_steps=128)
texts = tokenizer.batch_decode(samples, skip_special_tokens=True)
for text in texts:
    print(text)
```

## Model Details

- **Architecture**: DiT (Diffusion Transformer) backbone with adaptive layer normalization (see the sketch at the end of this card)
- **Context Length**: 1024 tokens
- **Parameters**: ~130M non-embedding parameters (similar to GPT-2 medium)
- **Training**: 1M steps on the OpenWebText corpus
- **Tokenizer**: GPT-2 tokenizer (50,257-token vocabulary)

## Model Card Contact

Chumeng Liang (chumengl@illinois.edu)
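
## Sampling Sketch

For intuition, here is a minimal sketch of what one pass of reverse diffusion over continuous token embeddings can look like. This is not LangFlow's actual implementation: the `denoiser` call signature, the linear (flow-matching-style) noise schedule, and the final nearest-embedding rounding step are all assumptions made for illustration.

```python
import torch

def sample_embedding_diffusion(denoiser, embedding_table, seq_len, dim, num_steps=128):
    """Illustrative reverse diffusion in embedding space (not the shipped sampler).

    Assumes denoiser(x_t, t) predicts the clean embeddings x_0 from noisy
    embeddings x_t at time t, with x_t = (1 - t) * x_0 + t * noise.
    """
    x_t = torch.randn(1, seq_len, dim)  # start from pure Gaussian noise
    timesteps = torch.linspace(1.0, 0.0, num_steps + 1)

    for i in range(num_steps):
        t, s = timesteps[i], timesteps[i + 1]
        x0_hat = denoiser(x_t, t)  # predicted clean embeddings
        # Deterministic update from time t to s; under the assumed linear
        # schedule this re-interpolates the state toward the prediction,
        # giving x_s = (1 - s) * x0_hat + s * noise_hat.
        x_t = x_t + (t - s) / t * (x0_hat - x_t)

    # Round each continuous embedding to the nearest token embedding.
    dists = torch.cdist(x_t, embedding_table.unsqueeze(0))  # (1, seq_len, vocab)
    return dists.argmin(dim=-1)                             # (1, seq_len) token ids
```

In practice, use the bundled `generate_samples` method shown above; this sketch only conveys the mechanics of denoising in embedding space.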
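
## Adaptive LayerNorm Sketch

For context on the architecture bullet above: "adaptive layer normalization" in DiT-style backbones refers to a LayerNorm whose per-channel scale and shift are regressed from a conditioning vector (typically the diffusion-time embedding) rather than learned as fixed parameters. The module below is a generic sketch of that idea, not LangFlow's exact layer.

```python
import torch
import torch.nn as nn

class AdaLayerNorm(nn.Module):
    """Generic DiT-style adaptive LayerNorm (illustrative sketch)."""

    def __init__(self, dim: int, cond_dim: int):
        super().__init__()
        # Parameter-free LayerNorm; scale/shift come from the condition instead.
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.to_scale_shift = nn.Linear(cond_dim, 2 * dim)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, dim); cond: (batch, cond_dim), e.g. a time embedding
        scale, shift = self.to_scale_shift(cond).chunk(2, dim=-1)
        return self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
```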