Official Qwen qwen/qwen3-4b-2507 (accuracy %, three runs per benchmark)

  • MMLU-Pro: 44.70, 45.20, 44.20
  • GPQA Diamond: 37.37, 36.36, 37.37

SDFT model Qwen3-4B-2507-Heretic-SDFT-v1

  • MMLU-Pro: 44.70, 44.70, 44.10
  • GPQA Diamond: 37.37, 37.37, 36.87

Heretic Qwen3-4B-Instruct-2507-heretic-v1.2

  • MMLU-Pro: 44.20, 45.20, 44.40
  • GPQA Diamond: 40.40, 39.39, 36.36

Summary Table

| Model | MMLU-Pro mean | MMLU-Pro std | GPQA mean | GPQA std |
|---|---|---|---|---|
| qwen/qwen3-4b-2507 | 44.70% | 0.50 pp | 37.04% | 0.58 pp |
| Qwen3-4B-2507-Heretic-SDFT-v1 | 44.50% | 0.35 pp | 37.21% | 0.29 pp |
| Qwen3-4B-Instruct-2507-heretic-v1.2 | 44.60% | 0.53 pp | 38.72% | 2.10 pp |
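The MMLU-Pro columns of the table can be reproduced directly from the per-run scores with Python's standard `statistics` module (`stdev` is the sample standard deviation, which is what the table uses; the GPQA means appear to have been computed from unrounded per-run accuracies, so recomputing them from the rounded values above shifts them by about 0.01 pp):

```python
from statistics import mean, stdev

# Per-run MMLU-Pro accuracies (%) from the lists above
runs = {
    "qwen/qwen3-4b-2507": [44.70, 45.20, 44.20],
    "Qwen3-4B-2507-Heretic-SDFT-v1": [44.70, 44.70, 44.10],
    "Qwen3-4B-Instruct-2507-heretic-v1.2": [44.20, 45.20, 44.40],
}

for name, scores in runs.items():
    # stdev() uses ddof=1 (sample std), matching the "pp" columns
    print(f"{name}: mean={mean(scores):.2f}%  std={stdev(scores):.2f} pp")
```

Running this prints means of 44.70%, 44.50%, and 44.60% with stds of 0.50, 0.35, and 0.53 pp, matching the summary table.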

Interpretation

MMLU-Pro

The three models are effectively tied on the MMLU-Pro sample.

  • official: 44.70%
  • SDFT: 44.50%
  • heretic: 44.60%

The between-model differences are under one percentage point and on the same scale as run-to-run variation. On this sample, there is no evidence of a meaningful capability gap in general knowledge and reasoning between the three models.

GPQA Diamond

GPQA Diamond separates the models more clearly.

  • official: 37.04%
  • SDFT: 37.21%
  • heretic: 38.72%

The heretic model is highest on mean GPQA accuracy, but it also has by far the largest variance across runs: with a standard deviation of 2.10 pp over only three runs, its roughly 1.7 pp lead over the official quant is within run-to-run noise, so the apparent advantage should not be over-read. The SDFT model and the official quant are nearly indistinguishable on mean GPQA score.
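The noise claim can be sanity-checked with a rough Welch-style comparison of the heretic and official GPQA means, treating each run as an independent sample (a simplifying assumption, since runs score the same question set):

```python
import math

n = 3  # runs per model
mean_official, std_official = 37.04, 0.58  # from the summary table
mean_heretic, std_heretic = 38.72, 2.10

diff = mean_heretic - mean_official
# Standard error of the difference between two means (Welch)
se_diff = math.sqrt(std_official**2 / n + std_heretic**2 / n)
print(f"difference = {diff:.2f} pp, SE = {se_diff:.2f} pp, t ~= {diff / se_diff:.2f}")
```

The difference of about 1.68 pp against a standard error of about 1.26 pp gives a t-statistic near 1.3, well short of conventional significance on so few runs.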

