Qwen3-4B-Instruct-2507-uncensored-v2-EAFT
(Probably not recommended for actual use; I haven't tested it extensively.)
Minimally trained version of Qwen3-4B-Instruct-2507. It should have zero refusals, but shouldn't be too offensive by default. It will adhere to detailed prompts, though.
Trained with Unsloth using my fork of trl with Entropy Adaptive Fine Tuning added:
pip install --no-deps git+https://github.com/electroglyph/trl.git@24EAFT
(The implementation is 100% the paper's author's work; I just stuck it in trl.)
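For intuition, the core idea of entropy-adaptive fine-tuning is to scale each token's cross-entropy loss by a function of the model's predictive entropy at that position. Here's a minimal pure-Python sketch of that weighting idea; the weight form `w = exp(-alpha * H)` is an illustrative choice, not necessarily the exact formula the paper or the fork uses:

```python
import math

def softmax(logits):
    # Numerically stable softmax over a single token's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    # Shannon entropy (nats) of a probability distribution.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def eaft_style_loss(token_logits, target_ids, alpha=1.0):
    """Entropy-weighted mean cross-entropy over a token sequence.

    High-entropy (uncertain) positions are down-weighted via
    w = exp(-alpha * H). This is a sketch of the general idea only;
    with alpha=0 it reduces to plain mean cross-entropy.
    """
    total, weight_sum = 0.0, 0.0
    for logits, tgt in zip(token_logits, target_ids):
        probs = softmax(logits)
        ce = -math.log(max(probs[tgt], 1e-12))
        w = math.exp(-alpha * entropy(probs))
        total += w * ce
        weight_sum += w
    return total / max(weight_sum, 1e-12)
```

The `eaft_alpha` knob in the config below presumably plays a similar role: controlling how strongly the entropy term modulates the per-token loss.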
Perplexity and KL divergence compared to parent model:
These stats are based on the wikitext train split, about 350 GB of logits.
(TL;DR: lower perplexity, and a KLD around 12x better than an abliterated model)
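The two core quantities below are computed per token from paired model outputs. A minimal sketch of the formulas, assuming you have raw logits from both models at each position (this is not the exact measurement script used here):

```python
import math

def _softmax(logits):
    # Numerically stable softmax.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats for one token position, from raw logits.
    P is the reference (base model), Q the fine-tuned model."""
    p = _softmax(p_logits)
    q = _softmax(q_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def perplexity(target_logprobs):
    """PPL = exp(mean negative log-likelihood of the target tokens)."""
    return math.exp(-sum(target_logprobs) / len(target_logprobs))
```

The tiny negative KLD values at the bottom of the table are floating-point noise; mathematically KLD is always >= 0.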
====== Perplexity statistics ======
Mean PPL(Q) : 10.164880 ± 0.026138
Mean PPL(base) : 10.984474 ± 0.030165
Cor(ln(PPL(Q)), ln(PPL(base))): 99.35%
Mean ln(PPL(Q)/PPL(base)) : -0.077544 ± 0.000350
Mean PPL(Q)/PPL(base) : 0.925386 ± 0.000324
Mean PPL(Q)-PPL(base) : -0.819594 ± 0.005149
====== KL divergence statistics ======
Mean KLD: 0.034324 ± 0.000032
Maximum KLD: 4.381997
99.9% KLD: 0.275690
99.0% KLD: 0.149398
95.0% KLD: 0.095380
90.0% KLD: 0.075580
Median KLD: 0.027208
10.0% KLD: 0.000553
5.0% KLD: 0.000077
1.0% KLD: 0.000003
0.1% KLD: -0.000000
Minimum KLD: -0.000014
====== Token probability statistics ======
Mean Δp: -1.770 ± 0.004 %
Maximum Δp: 74.026%
99.9% Δp: 19.553%
99.0% Δp: 9.223%
95.0% Δp: 3.394%
90.0% Δp: 1.485%
75.0% Δp: 0.084%
Median Δp: -0.118%
25.0% Δp: -3.134%
10.0% Δp: -8.037%
5.0% Δp: -11.159%
1.0% Δp: -17.508%
0.1% Δp: -26.736%
Minimum Δp: -96.977%
RMS Δp : 5.061 ± 0.007 %
Same top p: 92.762 ± 0.023 %
training params:
LoRA rank 16 / alpha 16
EPOCHS = 2
args = SFTConfig(
    per_device_train_batch_size = 5,
    gradient_accumulation_steps = 1,
    warmup_steps = 20,
    num_train_epochs = EPOCHS,
    learning_rate = 6e-6,
    optim = "adamw_torch_fused",
    weight_decay = 0.01,
    lr_scheduler_type = "cosine_with_restarts",  # dataset shuffled each epoch
    lr_scheduler_kwargs = {"num_cycles": EPOCHS},
    seed = 888,
    loss_type = "eaft",
    eaft_alpha = 1.0,
)
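For context, a config like this would be wired into trl's `SFTTrainer` roughly as follows. This is an untested sketch: the model/dataset names are placeholders, `get_peft_model` usually needs `target_modules` as well, and `loss_type="eaft"` / `eaft_alpha` exist only in the fork linked above:

```python
# Sketch only: assumes the forked trl from the pip command above, plus Unsloth.
from unsloth import FastLanguageModel
from trl import SFTTrainer

# Placeholder names; actual loading options (dtype, max_seq_length, etc.) omitted.
model, tokenizer = FastLanguageModel.from_pretrained("Qwen/Qwen3-4B-Instruct-2507")
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)  # rank 16 / alpha 16

trainer = SFTTrainer(
    model = model,
    train_dataset = dataset,  # the ~5k-row private dataset
    args = args,              # the SFTConfig shown above
)
trainer.train()
```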
loss / grad:
A little over 5k rows in the dataset (no, you can't have it, sorry; it's vile).
