
LLaDA-8B-Instruct-s1k-sft

LLaDA-8B-Instruct-s1k-sft is a diffusion-based instruct model post-trained from LLaDA-8B-Instruct on simplescaling/s1K, using MDLM (masked diffusion) and trained with the dLLM framework.
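MDLM-style SFT can be summarized as: sample a mask ratio t, independently replace that fraction of response tokens with a mask token, train the model to predict the masked tokens, and weight the loss by 1/t. A minimal sketch of the masking step (the function name `mask_for_mdlm` and the mask-token id are illustrative, not taken from the dLLM codebase):

```python
import random

MASK_ID = 126336  # illustrative mask-token id; the real value is model-specific


def mask_for_mdlm(response_ids, t, rng):
    """Mask each response token independently with probability t.

    Returns the corrupted sequence, the supervised (masked) positions,
    and the per-token loss weight 1/t used by the MDLM objective.
    """
    corrupted, supervised = [], []
    for i, tok in enumerate(response_ids):
        if rng.random() < t:
            corrupted.append(MASK_ID)
            supervised.append(i)  # loss is computed only on masked positions
        else:
            corrupted.append(tok)
    return corrupted, supervised, 1.0 / t


rng = random.Random(0)
ids = [11, 22, 33, 44, 55]
corrupted, supervised, weight = mask_for_mdlm(ids, t=0.5, rng=rng)
```

Averaging this weighted loss over mask ratios recovers the masked-diffusion training objective; the prompt tokens stay unmasked so the model conditions on them.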

Model Overview

LLaDA-8B-Instruct-s1k-sft at a glance:

- Base model: LLaDA-8B-Instruct
- Fine-tuning data: simplescaling/s1K
- Objective: MDLM (masked diffusion) supervised fine-tuning
- Training framework: dLLM

For broader training and ablation reporting in the dLLM ecosystem, see the dLLM paper.

Eval notes: All metrics use confidence-threshold decoding (`alg: confidence_threshold`). The primary table reports confidence_threshold = 0.9; the full grids sweep confidence_threshold ∈ {0.6, 0.7, 0.8, 0.9} with max_new_tokens ∈ {256, 512}. Tokens-per-second (TPS) was not recorded for this checkpoint.
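Confidence-threshold decoding works roughly as follows: start from a fully masked response and, at each step, commit every masked position whose top-1 probability meets the threshold τ, always committing at least the single most confident position so decoding progresses. A toy sketch with a stand-in `predict` function (the predictor and its outputs are illustrative, not the dLLM implementation):

```python
MASK = -1  # stand-in mask id


def confidence_threshold_decode(predict, length, tau):
    """Iteratively fill MASK positions whose confidence reaches tau.

    `predict(seq)` returns a (token, confidence) pair per position;
    at least the single most confident masked position is committed
    each step, so decoding always terminates.
    """
    seq = [MASK] * length
    while MASK in seq:
        preds = predict(seq)
        masked = [i for i, t in enumerate(seq) if t == MASK]
        # positions confident enough to commit this step
        commit = [i for i in masked if preds[i][1] >= tau]
        if not commit:  # fall back to the single most confident position
            commit = [max(masked, key=lambda i: preds[i][1])]
        for i in commit:
            seq[i] = preds[i][0]
    return seq


# toy predictor: position i proposes token i with confidence 0.6 + 0.1 * i
def toy_predict(seq):
    return [(i, 0.6 + 0.1 * i) for i in range(len(seq))]


out = confidence_threshold_decode(toy_predict, length=4, tau=0.9)
```

A higher τ commits fewer tokens per step (more refinement passes, typically higher accuracy), which matches the sweep below where accuracy generally rises with the threshold.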


Primary results

| Benchmark | max_new_tokens=256 Acc % | max_new_tokens=512 Acc % |
|-----------|--------------------------|--------------------------|
| GSM8K     | 82.03                    | 82.79                    |
| HumanEval | 39.63                    | 47.56                    |
| MBPP      | 34.40                    | 23.00                    |
| MATH      | N/A                      | 41.24                    |

Threshold sweep

| Benchmark | τ=0.6 Acc % | τ=0.7 Acc % | τ=0.8 Acc % | τ=0.9 Acc % |
|-----------|-------------|-------------|-------------|-------------|
| GSM8K     | 73.46       | 78.62       | 80.82       | 82.03       |
| HumanEval | 24.39       | 31.71       | 39.02       | 39.63       |
| MBPP      | 30.20       | 32.20       | 33.80       | 34.40       |
| MATH      | N/A         | N/A         | N/A         | N/A         |

max_new_tokens=256, columns sweep confidence_threshold ∈ {0.6, 0.7, 0.8, 0.9}

| Benchmark | τ=0.6 Acc % | τ=0.7 Acc % | τ=0.8 Acc % | τ=0.9 Acc % |
|-----------|-------------|-------------|-------------|-------------|
| GSM8K     | 75.28       | 78.92       | 81.58       | 82.79       |
| HumanEval | 33.54       | 38.41       | 45.12       | 47.56       |
| MBPP      | 24.40       | 24.80       | 23.60       | 23.00       |
| MATH      | 34.28       | 38.34       | 40.50       | 41.24       |

max_new_tokens=512, columns sweep confidence_threshold ∈ {0.6, 0.7, 0.8, 0.9}
