YatNMN-Softplus + scalar_bias + constant α=1 d=12 Chinchilla (261M) — PyTorch

A 261M-parameter nanochat-architecture GPT with a YatNMN-Softplus MLP in which α is fixed at 1 and the bias is a single shared scalar of shape (1,). This is the minimal YatNMN ablation: the pure (x·W+b)²/(||x−W||²+ε) formula without learnable per-layer scaling or per-neuron bias.
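The ablated formula can be sketched as a drop-in linear-layer replacement. This is an illustrative reconstruction, not the checkpoint's actual remote code: the class name, initialization, and the choice to keep ε positive via softplus are all assumptions made for the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class YatNMNLinear(nn.Module):
    """Sketch of the ablated YatNMN layer: (x.w_j + b)^2 / (||x - w_j||^2 + eps)
    per output neuron j, with a single shared scalar bias and alpha fixed at 1
    (no learnable scaling). Hypothetical names; see the model's trust_remote_code
    implementation for the real thing."""

    def __init__(self, d_in, d_out, eps_init=1e-5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(d_out, d_in) / d_in ** 0.5)
        self.bias = nn.Parameter(torch.zeros(1))  # shared scalar bias, shape (1,)
        # learnable epsilon; applying softplus to keep it positive is an assumption
        self.raw_eps = nn.Parameter(torch.tensor(eps_init))

    def forward(self, x):
        dot = F.linear(x, self.weight) + self.bias  # x.W^T + b
        # ||x - w_j||^2 = ||x||^2 - 2 x.w_j + ||w_j||^2, without materializing x - w_j
        dist_sq = (x.pow(2).sum(-1, keepdim=True)
                   - 2 * F.linear(x, self.weight)
                   + self.weight.pow(2).sum(-1))
        eps = F.softplus(self.raw_eps)
        return dot.pow(2) / (dist_sq + eps)  # alpha = 1: no extra scaling applied
```

Note the output is non-negative by construction (a squared numerator over a positive denominator), which is one way this MLP differs qualitatively from a GELU block.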

Result: the bare YatNMN formula alone barely beats GELU (−0.03 nats on C4). The 0.14-nat advantage of full YatNMN-Softplus comes from the synergy between per-neuron bias and learnable α, not from the formula itself.

Ablation table (d=12, 261M, Chinchilla 20×, 3-seed mean)

| Variant | C4 smooth | WikiText PPL | vs GELU |
|---|---|---|---|
| YatNMN per-neuron + learnable α | 2.98 | 40.15 | −0.14 |
| YatNMN scalar_bias + learnable α | 3.06 | 39.53 | −0.06 |
| YatNMN per-neuron + constant α=1 | 3.10 | 67.09 | −0.02 |
| YatNMN scalar_bias + constant α=1 (this) | 3.09 | 78.34 | −0.03 |
| GELU | 3.12 | 46.52 | baseline |
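The "vs GELU" column is just each variant's C4 smooth loss minus the GELU baseline (3.12 nats); a quick recomputation:

```python
# Recompute the "vs GELU" deltas from the table's C4 smooth losses.
gelu_baseline = 3.12
c4_loss = {
    "per-neuron + learnable alpha": 2.98,
    "scalar_bias + learnable alpha": 3.06,
    "per-neuron + constant alpha=1": 3.10,
    "scalar_bias + constant alpha=1 (this model)": 3.09,
}
for variant, loss in c4_loss.items():
    print(f"{variant}: {loss - gelu_baseline:+.2f}")
```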

Quick start

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "mlnomad/yatnmn-softplus-sb-ca-d12-chinchilla-261M-pytorch",
    trust_remote_code=True, dtype=torch.float32,
).eval()
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

ids = tokenizer("The meaning of life is", return_tensors="pt").input_ids
with torch.no_grad():
    out = model.generate(ids, max_new_tokens=50, do_sample=False,
                         use_cache=True, pad_token_id=tokenizer.eos_token_id or 0)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Config

Scalar (1,) bias + softplus_bias + learnable_epsilon + constant_alpha=True (α=1 fixed).

Parameters: 261,096,362
Final smooth loss: 3.08 (3-seed mean 3.09 ± 0.01)
WikiText-103 PPL: 78.34
Training data: allenai/c4, 5.22B tokens (Chinchilla 20×)
Hardware: TPU v6e-8, europe-west4-a
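The parameter count above can be checked after loading the checkpoint. The helper below shows the standard counting idiom on a small stand-in module; for the real figure, load the model from the Quick start section and pass it in.

```python
import torch.nn as nn

def param_count(model: nn.Module) -> int:
    """Total parameter count; for this checkpoint it should report 261,096,362."""
    return sum(p.numel() for p in model.parameters())

# Demonstrated on a toy module here; with the actual checkpoint, call
# param_count(model) on the object returned by from_pretrained.
toy = nn.Linear(12, 12)
print(param_count(toy))  # 12*12 weights + 12 biases = 156
```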

License

Apache 2.0.
