---
license: cc-by-nc-4.0
language:
  - la
base_model:
  - answerdotai/ModernBERT-base
pipeline_tag: fill-mask
tags:
  - Latin
---

# LAMB

LAMB (LAtin ModernBERT) is a Latin encoder-only model based on the ModernBERT architecture, pre-trained on nearly 24B Latin tokens, and ready for use with any Latin orthography.

## Features

## Usage

### Predicting Masked Tokens

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "aimgo/LAMB"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "et ecce tu eras [MASK] me et ego foris, et ibi te quaerebam"

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and take the highest-scoring vocabulary id there.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print("Input:", text)
print("Predicted:", predicted_token)
```
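To inspect more than the single best candidate, the logits at the masked position can be ranked with `torch.topk`. The sketch below uses a small dummy logits tensor so it runs standalone; with LAMB you would instead pass `outputs.logits[0, masked_index]` from the snippet above and decode each id with `tokenizer.decode`.

```python
import torch

# Dummy stand-in for outputs.logits[0, masked_index] (one score per vocab id).
logits = torch.tensor([0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 1.5, -0.5])

# Take the k highest-scoring vocabulary ids instead of just the argmax.
k = 3
top = torch.topk(logits, k)
top_ids = top.indices.tolist()  # candidate token ids, best first

print(top_ids)
```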

## Citation

If you use LAMB in your work, please cite:

```bibtex
@misc{mccarthy2025LAMB,
  author       = {McCarthy, A. M.},
  title        = {{LAMB}: A Modern Masked Language Model for Latin},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/aimgo/LAMB}},
  note         = {Model}
}
```