---
language: en
tags:
- mask-predict
- diffusion
- masked-lm
library_name: transformers
base_model: answerdotai/ModernBERT-base
pipeline_tag: fill-mask
---

# modernbert-diffusion-code

## Model Summary

A diffusion-style masked language model fine-tuned in `code` mode using a discrete denoising objective.

## Model Details

- **Model ID:** philipp-zettl/modernbert-diffusion-code
- **Base model:** answerdotai/ModernBERT-base
- **Training mode:** code
- **Task type:** Masked token denoising / diffusion-style infilling

## Intended Use

Intended for code completion, infilling, and refactoring tasks on Python-like code.

**Example**

```python
from refinebert.diffusion_engine import MaskedDiffusionEngine

# Load the fine-tuned checkpoint by its model ID.
engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-code")

# Extend the prompt with 20 new tokens, denoised over 12 steps.
prompt = "def fibonacci(n):"
output = engine.generate(prompt, num_new_tokens=20, steps=12, guidance_scale=3.0)
print(output)
```

## Training Data

Datasets are streamed from Hugging Face and mixed by mode.

### Dataset Mix

| Dataset | Percentage | Purpose |
| --- | --- | --- |
| bigcode/the-stack-dedup (python) | 100% | Python code |

Fallback: if The Stack is unavailable, data loading may fall back to CodeParrot.

## Training Procedure

- **Steps:** 150000
- **Batch size:** 4
- **Sequence length:** 256
- **Learning rate:** 5e-05
- **CFG dropout probability:** 0.1
- **Samples loaded into RAM:** 100000

## Training Time & Hardware

- **Duration:** 7h 50m 28s
- **Hardware:** NVIDIA GeForce RTX 2060 x1 (CUDA available)

## Metrics (Training)

| Metric | Value |
| --- | --- |
| Training loss (latest) | 3.2864 |
| Training loss (mean) | 3.1062 |
| Training step | 150000 / 150000 |

## Limitations & Considerations

- The model is trained with a masked-token diffusion objective and may not behave like an autoregressive LM.
- Data sources may have licensing or content constraints. Review the source dataset cards before deployment.
- Performance can vary substantially by mode (code) and prompt structure.
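
To illustrate how masked-token diffusion decoding differs from autoregressive decoding, here is a minimal sketch of confidence-based iterative unmasking, one common decoding scheme for this model family. It is not the implementation behind `MaskedDiffusionEngine`, which this card does not describe. The names `mask_predict` and `toy_predict` are hypothetical, and `toy_predict` is a stand-in for a real masked LM so that the loop runs without downloading weights.

```python
import math
from typing import Callable, List, Tuple

MASK = "[MASK]"

# A predictor returns one (token, confidence) pair per input position.
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def mask_predict(prompt: List[str], num_new_tokens: int, steps: int,
                 predict: Predictor) -> List[str]:
    """Append masked slots to the prompt and fill them over several steps."""
    tokens = prompt + [MASK] * num_new_tokens
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        preds = predict(tokens)
        # Spread the remaining masks evenly over the remaining steps.
        k = math.ceil(len(masked) / (steps - step))
        # Commit the k most confident masked positions; the rest stay masked.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


def toy_predict(tokens: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in for a masked LM (assumption, not a real model).

    Proposes "tok<i>" at position i, with confidence decreasing by position.
    """
    return [(f"tok{i}", 1.0 / (i + 1)) for i in range(len(tokens))]


if __name__ == "__main__":
    print(mask_predict(["def", "f"], num_new_tokens=4, steps=2, predict=toy_predict))
```

Unlike left-to-right generation, every masked position is scored at each step, and tokens are committed in order of confidence, not position. With a real model, `predict` would run one forward pass per step and take the arg-max token and its probability at each masked position.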