babylm2026-ipa

LLaMA-style causal language model trained for BabyLM 2026 with IPA phonological feature augmentation.

Architecture

  • Base: LLaMA-style transformer (RMSNorm, RoPE, SwiGLU FFN, weight-tied LM head)
  • IPA variant: ipa_full — gated fusion of panphon IPA feature vectors at the embedding layer
  • Hidden size: 512 | Layers: 6 | Heads: 8
  • Vocab size: 32000 | Context: 512 tokens
  • IPA dim: 24 (panphon binary articulatory features)
  • Training split: 10M words
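The gated fusion used by the ipa_full variant can be sketched as a small module that adds a sigmoid-gated projection of the IPA features to the token embeddings, following the formula σ(W_g·ipa) ⊙ (W·ipa) from the variants table below. Module and parameter names here are illustrative assumptions, not the released implementation:

```python
import torch
import torch.nn as nn

class GatedIPAFusion(nn.Module):
    """Sketch of embedding-layer gated IPA fusion (hypothetical names)."""

    def __init__(self, hidden_size=512, ipa_dim=24):
        super().__init__()
        self.proj = nn.Linear(ipa_dim, hidden_size)  # W·ipa
        self.gate = nn.Linear(ipa_dim, hidden_size)  # W_g·ipa

    def forward(self, embed, ipa):
        # embed: [B, T, hidden_size], ipa: [B, T, ipa_dim]
        # embed + σ(W_g·ipa) ⊙ (W·ipa)
        return embed + torch.sigmoid(self.gate(ipa)) * self.proj(ipa)

fusion = GatedIPAFusion()
embed = torch.zeros(1, 4, 512)
ipa = torch.zeros(1, 4, 24)
out = fusion(embed, ipa)
print(out.shape)  # torch.Size([1, 4, 512])
```

The gate lets the model learn, per dimension, how much phonological signal to mix into each token embedding; with all-zero IPA input only the bias terms contribute.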

IPA Fusion Variants

Variant   Description
baseline  Token embeddings only
ipa_add   embed + W·ipa
ipa_gate  embed + σ(W_g·ipa) ⊙ (W·ipa)
ipa_full  Gated fusion + MSE auxiliary reconstruction loss
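The MSE auxiliary reconstruction loss in ipa_full can be sketched as a linear head that predicts the input IPA feature vectors back from the fused hidden states. The reconstruction head and the loss weight `lambda_aux` are assumptions for illustration, not confirmed details of the released training code:

```python
import torch
import torch.nn as nn

hidden_size, ipa_dim = 512, 24
recon_head = nn.Linear(hidden_size, ipa_dim)  # hypothetical reconstruction head

hidden = torch.randn(2, 8, hidden_size)   # fused hidden states [B, T, hidden]
ipa_targets = torch.randn(2, 8, ipa_dim)  # input IPA features  [B, T, 24]

# Auxiliary objective: reconstruct the IPA features from the hidden states.
aux_loss = nn.functional.mse_loss(recon_head(hidden), ipa_targets)

lm_loss = torch.tensor(0.0)  # placeholder for the causal LM cross-entropy
lambda_aux = 0.1             # assumed weighting hyperparameter
total_loss = lm_loss + lambda_aux * aux_loss
```

The auxiliary term encourages the fused representation to retain the phonological information rather than discard it during training.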

Usage

from transformers import AutoConfig, AutoModelForCausalLM
import torch

config = AutoConfig.from_pretrained("pakphum/babylm2026-ipa", trust_remote_code=True)
model  = AutoModelForCausalLM.from_pretrained("pakphum/babylm2026-ipa", trust_remote_code=True)
model.eval()

# input_ids from your tokenizer
input_ids = torch.tensor([[1, 2, 3, 4]])

# Optional: supply IPA vectors [B, T, ipa_dim=24]. Zeros = no phonological signal.
ipa_vectors = torch.zeros(1, 4, config.ipa_dim)

with torch.no_grad():
    out = model(input_ids, ipa_vectors=ipa_vectors)
    # out.logits: [B, T, vocab_size]

Citation

This model was developed for the BabyLM 2026 shared task.

Model size: 35.4M parameters (F32, Safetensors)