---
language: en
tags:
  - mask-predict
  - diffusion
  - masked-lm
library_name: transformers
base_model: answerdotai/ModernBERT-base
pipeline_tag: fill-mask
---

# modernbert-diffusion-code

## Model Summary
A diffusion-style masked language model, fine-tuned from `answerdotai/ModernBERT-base` in `code` mode with a discrete (masked-token) denoising objective.

## Model Details
- **Model ID:** philipp-zettl/modernbert-diffusion-code
- **Base model:** answerdotai/ModernBERT-base
- **Training mode:** code
- **Task type:** Masked token denoising / diffusion-style infilling
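
As a rough illustration of the task type, the sketch below shows how a masked-token denoising training example is commonly built: draw a noise level, mask that fraction of tokens, and score the model only on the masked positions. The function name `corrupt`, the uniform noise schedule, and `MASK_ID` are assumptions made for illustration, not this model's actual training code.

```python
import random

MASK_ID = -1         # hypothetical placeholder; the real tokenizer defines its own mask id
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default

def corrupt(token_ids, rng):
    """Mask a random fraction of a non-empty token list; return (inputs, labels)."""
    ratio = rng.uniform(0.0, 1.0)                   # assumed: noise level drawn uniformly
    n_mask = max(1, round(ratio * len(token_ids)))  # always mask at least one token
    masked = set(rng.sample(range(len(token_ids)), n_mask))
    # Inputs hide the masked tokens; labels keep only the masked tokens as targets.
    inputs = [MASK_ID if i in masked else t for i, t in enumerate(token_ids)]
    labels = [t if i in masked else IGNORE_INDEX for i, t in enumerate(token_ids)]
    return inputs, labels

original = [101, 7, 42, 13, 99, 102]
inputs, labels = corrupt(original, random.Random(0))
print(inputs)
print(labels)
```

Each position is either masked in `inputs` with its original id as the label, or left intact and excluded from the loss.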

## Intended Use
Intended for code completion, infilling, and refactoring tasks on Python-like code.

**Example**
```python
from refinebert.diffusion_engine import MaskedDiffusionEngine

engine = MaskedDiffusionEngine("philipp-zettl/modernbert-diffusion-code")
prompt = "def fibonacci(n):"
output = engine.generate(prompt, num_new_tokens=20, steps=12, guidance_scale=3.0)
print(output)
```
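
To give an intuition for what a `steps` argument usually controls in mask-predict style decoding, here is a minimal, dependency-free sketch: all new positions start masked, and each round commits only the most confident predictions. It is not the `MaskedDiffusionEngine` implementation; `mask_predict`, `toy_predict`, and the even commit schedule are hypothetical.

```python
import math

MASK = None  # stand-in for the tokenizer's mask token

def mask_predict(predict, length, steps):
    """Fill `length` masked slots over at most `steps` rounds, most confident first.

    `predict(tokens)` is a hypothetical callable returning one (token, confidence)
    pair per position; a real model would derive these from its logits.
    """
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        # Spread the remaining masked slots evenly over the remaining rounds.
        n_commit = math.ceil(len(masked) / (steps - step))
        guesses = predict(tokens)
        # Commit the highest-confidence guesses among still-masked positions.
        best = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:n_commit]
        for i in best:
            tokens[i] = guesses[i][0]
    return tokens

def toy_predict(tokens):
    # Toy predictor: proposes the position index as the token, more confident near the left.
    return [(i, 1.0 / (i + 1)) for i in range(len(tokens))]

result = mask_predict(toy_predict, length=20, steps=12)
print(result)
```

With the toy predictor, positions are filled left to right; a real model would fill them in whatever order its confidence dictates.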

## Training Data
Datasets are streamed from Hugging Face and mixed by mode.

### Dataset Mix
| Dataset | Percentage | Purpose |
| --- | --- | --- |
| bigcode/the-stack-dedup (python) | 100% | Python code |

Fallback: if The Stack is unavailable, the loader may fall back to CodeParrot.
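
A fallback of this kind can be sketched as "try each source in order and keep the first that loads". The helper `first_available` and the `hf_loaders` wrapper are hypothetical. This card does not say which CodeParrot dataset is used, so `codeparrot/codeparrot-clean` below is an assumption.

```python
def first_available(loaders):
    """Return (name, dataset) from the first loader that succeeds."""
    errors = []
    for name, load in loaders:
        try:
            return name, load()
        except Exception as exc:  # broad on purpose: network, auth, and missing-data errors all trigger fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no dataset source available: " + "; ".join(errors))

def hf_loaders():
    """Candidate sources in priority order (dataset ids partly assumed, see above)."""
    from datasets import load_dataset  # imported lazily so first_available has no dependencies
    return [
        ("the-stack", lambda: load_dataset(
            "bigcode/the-stack-dedup", data_dir="data/python",
            split="train", streaming=True)),
        ("codeparrot", lambda: load_dataset(
            "codeparrot/codeparrot-clean", split="train", streaming=True)),
    ]

# Usage (requires the `datasets` package and network access):
# name, dataset = first_available(hf_loaders())
```

Passing the loaders as callables keeps the fallback logic testable without network access.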

## Training Procedure
- **Steps:** 150000
- **Batch size:** 4
- **Sequence length:** 256
- **Learning rate:** 5e-05
- **CFG dropout probability:** 0.1
- **Samples loaded into RAM:** 100000
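
The CFG dropout probability refers to classifier-free guidance: during training the conditioning is occasionally dropped so the model also learns an unconditional prediction, and at inference the two predictions are combined using a guidance scale. The sketch below shows the standard formulation. How this model represents the dropped condition is not stated here, so masking the whole prompt is an assumption, as are the function names.

```python
import random

def maybe_drop_condition(prompt_ids, mask_id, p_drop, rng):
    """With probability p_drop, replace the prompt with mask tokens (the unconditional view)."""
    if rng.random() < p_drop:
        return [mask_id] * len(prompt_ids)
    return list(prompt_ids)

def guided_logits(cond, uncond, scale):
    """Standard classifier-free guidance: move from unconditional toward conditional logits."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]

# Training side: about 10% of prompts are dropped when p_drop = 0.1.
rng = random.Random(0)
trials = 10_000
dropped = sum(maybe_drop_condition([5, 6, 7], 0, 0.1, rng) == [0, 0, 0] for _ in range(trials))
print(dropped / trials)

# Inference side: scale > 1 amplifies whatever the prompt changes in the logits.
print(guided_logits([2.0, 0.0], [1.0, 1.0], 3.0))  # [4.0, -2.0]
```

A scale of 1.0 reproduces the conditional logits; larger values push the prediction further toward the prompt.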

## Training Time & Hardware
- **Duration:** 7h 50m 28s
- **Hardware:** 1x NVIDIA GeForce RTX 2060 (CUDA)
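
For a sense of scale, the reported figures can be turned into rough throughput numbers. This assumes one optimizer step per batch (no gradient accumulation), that the duration covers training only, and that every sequence is full length, so the token count is an upper bound.

```python
steps, batch_size, seq_len = 150_000, 4, 256
duration_s = 7 * 3600 + 50 * 60 + 28         # 7h 50m 28s in seconds

sequences_seen = steps * batch_size          # 600,000 sequences
max_tokens_seen = sequences_seen * seq_len   # 153,600,000 tokens at most
steps_per_second = steps / duration_s        # roughly 5.3 steps per second

print(f"{sequences_seen:,} sequences, up to {max_tokens_seen:,} tokens")
print(f"{steps_per_second:.2f} steps/s")
```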

## Metrics (Training)
| Metric | Value |
| --- | --- |
| Training loss (latest) | 3.2864 |
| Training loss (mean) | 3.1062 |
| Training step | 150000 / 150000 |
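
If the reported loss is a mean cross-entropy in nats over masked tokens (an assumption; the table does not specify), it can be read as a perplexity by exponentiating:

```python
import math

latest_loss, mean_loss = 3.2864, 3.1062

# Perplexity = exp(cross-entropy in nats): the effective number of equally likely token choices.
latest_ppl = math.exp(latest_loss)   # roughly 26.7
mean_ppl = math.exp(mean_loss)       # roughly 22.3

print(f"latest: {latest_ppl:.1f}, mean: {mean_ppl:.1f}")
```

Masked-token perplexity depends on the masking ratio, so it is not directly comparable to autoregressive perplexity figures.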

## Limitations & Considerations
- The model is trained with a masked-token diffusion objective: it denoises masked positions rather than predicting the next token, so it may not behave like an autoregressive LM.
- Data sources may have licensing or content constraints; review the source dataset cards before deployment.
- Performance can vary substantially with the training mode (here, `code`) and with prompt structure.