# babylm2026-ipa
LLaMA-style causal language model trained for BabyLM 2026 with IPA phonological feature augmentation.
## Architecture
- Base: LLaMA-style transformer (RMSNorm, RoPE, SwiGLU FFN, weight-tied LM head)
- IPA variant: `ipa_full` (gated fusion of panphon IPA feature vectors at the embedding layer)
- Hidden size: 512 | Layers: 6 | Heads: 8
- Vocab size: 32000 | Context: 512 tokens
- IPA dim: 24 (panphon binary articulatory features)
- Training split: 10M words
## IPA Fusion Variants
| Variant | Description |
|---|---|
| `baseline` | Token embeddings only |
| `ipa_add` | embed + W·ipa |
| `ipa_gate` | embed + σ(W_g·ipa) ⊙ (W·ipa) |
| `ipa_full` | Gated fusion + MSE auxiliary reconstruction loss |
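The `ipa_gate` row above can be sketched as a small PyTorch module. This is a minimal illustration of the formula embed + σ(W_g·ipa) ⊙ (W·ipa), assuming the hidden and IPA dimensions listed under Architecture; the class and layer names are illustrative, not the model's actual implementation:

```python
import torch
import torch.nn as nn

class GatedIPAFusion(nn.Module):
    """Sketch of ipa_gate: embed + sigmoid(W_g @ ipa) * (W @ ipa)."""

    def __init__(self, hidden_size=512, ipa_dim=24):
        super().__init__()
        self.proj = nn.Linear(ipa_dim, hidden_size)  # W: project IPA features to hidden size
        self.gate = nn.Linear(ipa_dim, hidden_size)  # W_g: per-dimension gate from IPA features

    def forward(self, embed, ipa):
        # Gate in (0, 1) decides how much projected phonological signal to add.
        return embed + torch.sigmoid(self.gate(ipa)) * self.proj(ipa)

fusion = GatedIPAFusion()
embed = torch.zeros(1, 4, 512)   # [B, T, hidden]
ipa = torch.zeros(1, 4, 24)      # [B, T, ipa_dim]
out = fusion(embed, ipa)         # same shape as embed
```

The `ipa_full` variant adds an MSE auxiliary loss on top of this gating, reconstructing the IPA features from the fused representation.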
## Usage

```python
from transformers import AutoConfig, AutoModelForCausalLM
import torch

config = AutoConfig.from_pretrained("pakphum/babylm2026-ipa", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("pakphum/babylm2026-ipa", trust_remote_code=True)
model.eval()

# input_ids from your tokenizer
input_ids = torch.tensor([[1, 2, 3, 4]])

# Optional: supply IPA vectors [B, T, ipa_dim=24]. Zeros = no phonological signal.
ipa_vectors = torch.zeros(1, 4, config.ipa_dim)

with torch.no_grad():
    out = model(input_ids, ipa_vectors=ipa_vectors)

# out.logits: [B, T, vocab_size]
```
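To feed real phonological signal instead of zeros, per-token IPA feature vectors (such as those produced by panphon) of varying lengths need to be padded into the `[B, T, 24]` tensor the model expects. A minimal sketch; the helper name is hypothetical, and unfeatured positions stay all-zero, which the model treats as "no phonological signal":

```python
import torch

IPA_DIM = 24  # panphon binary articulatory features

def pad_ipa_batch(per_sequence_feats, seq_len):
    """Stack variable-length per-token IPA feature lists into [B, T, IPA_DIM].

    per_sequence_feats: list (batch) of lists (tokens) of IPA_DIM-length vectors.
    Positions without features remain zero.
    """
    batch = torch.zeros(len(per_sequence_feats), seq_len, IPA_DIM)
    for b, feats in enumerate(per_sequence_feats):
        for t, vec in enumerate(feats[:seq_len]):  # truncate to context length
            batch[b, t] = torch.tensor(vec, dtype=torch.float)
    return batch

# One sequence with features for only its first two tokens.
feats = [[[1.0] * IPA_DIM, [0.0] * IPA_DIM]]
ipa_vectors = pad_ipa_batch(feats, seq_len=4)  # shape [1, 4, 24]
```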
## Citation
BabyLM 2026 shared task.