---
license: cc-by-nc-4.0
language:
- la
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: fill-mask
tags:
- Latin
---

LAMB (**LA**tin **M**odern**B**ERT) is a Latin encoder-only model based on the ModernBERT architecture, pre-trained on nearly **24B Latin tokens**, and ready for use with any Latin orthography.

### Features

### Usage

#### Predicting Masked Tokens

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

model_id = "aimgo/LAMB"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(model_id)

text = "et ecce tu eras [MASK] me et ego foris, et ibi te quaerebam"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Locate the [MASK] position and take the highest-scoring vocabulary id there.
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)

print("Input:", text)
print("Predicted:", predicted_token)
```

If you use this model in your work, please cite:

```
@misc{mccarthy2025LAMB,
  author       = {McCarthy, A. M.},
  title        = {{LAMB}: A Modern Masked Language Model for Latin},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/aimgo/LAMB}},
  note         = {Model}
}
```
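The mask-index lookup and argmax step used in the snippet above can be illustrated without downloading the model. The sketch below uses a toy logits tensor and hypothetical token ids (`MASK_ID = 103` and the other ids are placeholders, not LAMB's actual vocabulary):

```python
import torch

# Hypothetical token ids; 103 stands in for tokenizer.mask_token_id.
MASK_ID = 103
input_ids = torch.tensor([[101, 7592, MASK_ID, 2088, 102]])

# Find the mask position (equivalent to .tolist().index(...) in the usage example).
masked_index = (input_ids[0] == MASK_ID).nonzero(as_tuple=True)[0].item()

# Toy logits: batch of 1, 5 positions, vocab of 10; pretend the model scored them.
logits = torch.zeros(1, 5, 10)
logits[0, masked_index, 7] = 5.0  # make token id 7 the top prediction at the mask

predicted_token_id = logits[0, masked_index].argmax(dim=-1).item()
print(masked_index, predicted_token_id)  # → 2 7
```

With a real checkpoint, `logits` comes from `model(**inputs).logits` and `predicted_token_id` is passed to `tokenizer.decode` as shown in the usage example.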