	---
	title: README
	emoji: 🌍
	colorFrom: yellow
	colorTo: gray
	sdk: static
	pinned: false
	---
	<div align="center">
	<img src="https://cdn-uploads.huggingface.co/production/uploads/63da3d7ae697e5898cb86854/4EnLA20pUWnvqppA5y5Q4.gif" alt="denoising_small_16_9" />
	<h1>Diffutron: A Masked Diffusion Language Model for Turkish Language</h1>
	</div>

	<p align="center">
	&nbsp;&nbsp; | 🤗 <a href="https://huggingface.co/collections/diffutron/diffutronlm">Models</a>&nbsp;&nbsp; |
	&nbsp;&nbsp; 📊 <a href="https://huggingface.co/datasets/diffutron/DiffutronLM-Pretraining-Corpus">Pre-training Dataset</a>&nbsp;&nbsp; |
	&nbsp;&nbsp; 📄 <a href="https://arxiv.org/abs/2603.20466">Paper</a>&nbsp;&nbsp; |
	</p>

	## Overview

	Diffutron is a lightweight, non-autoregressive Masked Diffusion Language Model (MDLM) optimized for Turkish. It generates text with a discrete diffusion process that refines the sequence iteratively, so each prediction can draw on context from both directions while the model stays parameter-efficient.
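
To make the idea of iterative refinement concrete, here is a minimal toy sketch of confidence-based unmasking. It is not Diffutron's sampler: `toy_denoiser`, `iterative_unmask`, the confidence rule, and the fixed number of tokens revealed per step are all hypothetical stand-ins, and a real model would predict tokens from learned weights rather than read them from a reference sentence.

```python
MASK = "[MASK]"


def toy_denoiser(tokens, reference):
    """Hypothetical stand-in for a trained denoiser.

    For every masked position it proposes a token and a confidence score.
    Confidence rises when the left or right neighbour is already revealed,
    which mimics the use of context from both directions.
    """
    proposals = {}
    for i, tok in enumerate(tokens):
        if tok != MASK:
            continue
        left = i > 0 and tokens[i - 1] != MASK
        right = i + 1 < len(tokens) and tokens[i + 1] != MASK
        confidence = 0.5 + 0.2 * left + 0.2 * right
        proposals[i] = (reference[i], confidence)
    return proposals


def iterative_unmask(reference, steps):
    """Start fully masked and reveal the most confident positions each step."""
    tokens = [MASK] * len(reference)
    history = [list(tokens)]
    per_step = -(-len(tokens) // steps)  # ceiling division
    while MASK in tokens:
        proposals = toy_denoiser(tokens, reference)
        # Highest confidence first; ties are broken by position.
        ranked = sorted(proposals, key=lambda i: (-proposals[i][1], i))
        for i in ranked[:per_step]:
            tokens[i] = proposals[i][0]
        history.append(list(tokens))
    return tokens, history


reference = ["bugün", "hava", "çok", "güzel"]
final, history = iterative_unmask(reference, steps=2)
for state in history:
    print(" ".join(state))
```

Unlike left-to-right decoding, every step scores all masked positions at once, so a token late in the sequence can be fixed before an earlier one.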

	## Core Features

	* Architecture: Discrete masked diffusion (MDLM) built on a 307M-parameter encoder backbone.
	* Efficiency: Competitive with autoregressive models of 2B+ parameters on Turkish benchmarks.
	* Adaptation: LoRA-based (r=256) continual pre-training on a Turkish corpus of 2M sequences.
	* Instruction Tuning: Progressive strategy using the LlamaTurk and InstrucTurca datasets to improve instruction following.
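
As a rough illustration of what a LoRA rank of 256 implies, the sketch below counts the trainable parameters a LoRA adapter adds to a single weight matrix. The rank comes from the list above; the hidden size of 1024 is purely an assumption for the example, since this README does not state the backbone's dimensions or which layers are adapted.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA adapter.

    LoRA learns a low-rank update B @ A, where A has shape (rank, d_in)
    and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


RANK = 256     # r=256, as stated in the feature list
HIDDEN = 1024  # hypothetical hidden size, not taken from the README

full = HIDDEN * HIDDEN                            # frozen square weight matrix
adapter = lora_param_count(HIDDEN, HIDDEN, RANK)  # trainable LoRA parameters
ratio = adapter / full

print(f"full matrix: {full} parameters")
print(f"LoRA adapter: {adapter} parameters ({ratio:.0%} of the full matrix)")
```

Under this assumed size, a rank of 256 is large relative to the matrix, so the adapter is far from a tiny add-on; the actual ratio depends on the real layer shapes.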

	## Benchmarks

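	Diffutron achieves a significant reduction in perplexity and competitive scores across the CETVEL benchmark suite. The table below compares both Diffutron training stages with larger baseline models on the CETVEL tasks: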

	| Benchmark | Diffutron-1st-Stage (0.3B) | Diffutron-2nd-Stage (0.3B) | TURNA (1.1B) | Kumru (2B) | Kanarya (2B) | Llama-3.2 (3B) | Trendyol (7B) | Aya-101 (13B) |
	| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
	| Belebele_TR | 22.22 | 27.00 | 22.56 | 29.00 | 28.11 | 55.78 | 36.22 | 22.89 |
	| EXAMS_TR | 25.95 | 27.74 | 23.66 | 30.03 | 30.03 | 26.21 | 28.50 | 22.90 |
	| IronyTR | 50.67 | 52.00 | 48.33 | 51.00 | 50.00 | 50.17 | 50.00 | 52.17 |
	| News_Cat | 23.20 | 32.40 | 32.80 | 26.40 | 66.80 | 64.00 | 81.20 | 20.00 |
	| MNLI_TR | 33.29 | 32.81 | 34.94 | 36.42 | 33.40 | 34.76 | 35.19 | 27.90 |
	| STS_TR | 17.77 | 18.78 | 14.21 | 11.75 | 12.91 | 12.91 | 15.52 | 16.97 |
	| XCOPA_TR | 53.80 | 52.00 | 55.80 | 54.00 | 64.20 | 54.60 | 61.00 | 59.60 |
	| Average | 32.41 | 34.68 | 33.19 | 34.09 | 40.78 | 42.63 | 43.95 | 31.78 |
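
The Average row is the unweighted mean of the seven task scores in each column. A short sketch that recomputes it from the table values is shown here; the variable names are illustrative, and the numbers are copied from the table rather than produced by running any evaluation.

```python
# Per-model scores in table order: Belebele_TR, EXAMS_TR, IronyTR,
# News_Cat, MNLI_TR, STS_TR, XCOPA_TR.
scores = {
    "Diffutron-1st-Stage (0.3B)": [22.22, 25.95, 50.67, 23.20, 33.29, 17.77, 53.80],
    "Diffutron-2nd-Stage (0.3B)": [27.00, 27.74, 52.00, 32.40, 32.81, 18.78, 52.00],
    "TURNA (1.1B)": [22.56, 23.66, 48.33, 32.80, 34.94, 14.21, 55.80],
    "Kumru (2B)": [29.00, 30.03, 51.00, 26.40, 36.42, 11.75, 54.00],
    "Kanarya (2B)": [28.11, 30.03, 50.00, 66.80, 33.40, 12.91, 64.20],
    "Llama-3.2 (3B)": [55.78, 26.21, 50.17, 64.00, 34.76, 12.91, 54.60],
    "Trendyol (7B)": [36.22, 28.50, 50.00, 81.20, 35.19, 15.52, 61.00],
    "Aya-101 (13B)": [22.89, 22.90, 52.17, 20.00, 27.90, 16.97, 59.60],
}

# Unweighted mean over the seven tasks, rounded to two decimals.
averages = {model: round(sum(vals) / len(vals), 2) for model, vals in scores.items()}

# Print models from highest to lowest average.
for model, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{model:28s} {avg:.2f}")
```

On these averages, the second-stage model sits above TURNA (1.1B), Kumru (2B), and Aya-101 (13B), and below Kanarya (2B), Llama-3.2 (3B), and Trendyol (7B).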


	## Citation

	```bibtex
	@misc{diffutron2026,
	title={Diffutron: A Masked Diffusion Language Model for Turkish Language},
	author={Şuayp Talha Kocabay and Talha Rüzgar Akkuş},
	year={2026},
	eprint={2603.20466},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2603.20466},
	}
	```