ToMMeR-Llama-3.2-1B_L14_R64
ToMMeR is a lightweight probing model that extracts emergent mention detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a broad set of 13 NER benchmarks.
Model Details
This model plugs into layer 14 of meta-llama/Llama-3.2-1B, with a computational overhead no greater than that of one additional attention head.
| Property | Value |
|---|---|
| Base LLM | meta-llama/Llama-3.2-1B |
| Layer | 14 |
| #Params | 264.2K |
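For a sense of scale, the sketch below compares the reported 264.2K parameters with the size of a single attention head in the backbone. It assumes the published Llama-3.2-1B configuration (hidden size 2048, head dimension 64) and counts a standard, non-grouped head with query, key, value and output projections. It is only a size comparison and does not describe ToMMeR's internal architecture.

```python
# Published Llama-3.2-1B dimensions (assumed from its config: hidden_size, head_dim).
HIDDEN_SIZE = 2048
HEAD_DIM = 64

# Parameter count reported in the Model Details table (264.2K, rounded).
TOMMER_PARAMS = 264_200


def head_params(hidden: int, head_dim: int) -> dict:
    """Weights of one standard attention head, ignoring biases (Llama uses none)."""
    proj = hidden * head_dim  # one projection matrix between hidden and head_dim
    return {"q": proj, "k": proj, "v": proj, "o": proj}


p = head_params(HIDDEN_SIZE, HEAD_DIM)
full_head = sum(p.values())
qk_only = p["q"] + p["k"]

print(f"one full head (Q, K, V, O): {full_head:,}")
print(f"query + key only:           {qk_only:,}")
print(f"ToMMeR probe:               {TOMMER_PARAMS:,}")
```

Under these assumptions, the probe is roughly the size of one head's query and key projections and about half the size of a full head.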
Usage
Installation
To use ToMMeR, you need to install its codebase first.
pip install git+https://github.com/VictorMorand/llm2ner.git
Raw inference
By default, ToMMeR outputs span probabilities; we also provide built-in options for decoding them into entities.
- Inputs:
  - tokens (batch, seq): token IDs to process
  - model: LLM to extract representations from
- Outputs: (batch, seq, seq) matrix of span probabilities (masked outside valid spans)
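The (batch, seq, seq) output can be read as one score per candidate span: entry [b, i, j] scores the span running from token i to token j. A minimal sketch of such masking follows, with pure Python lists standing in for tensors. It assumes a span is valid when i <= j and both tokens lie in the unpadded part of the sequence, and fills invalid entries with 0.0; the exact validity rule and fill value in llm2ner may differ (for instance, a maximum span length).

```python
from typing import List

Matrix = List[List[float]]


def mask_invalid_spans(scores: Matrix, length: int, fill: float = 0.0) -> Matrix:
    """Keep scores[i][j] only for valid spans and overwrite the rest with `fill`.

    Assumption (illustrative): a span (i, j) is valid when i <= j < length,
    i.e. it starts no later than it ends and lies within the unpadded prefix.
    """
    seq = len(scores)
    end = min(seq, length)
    masked = [[fill] * seq for _ in range(seq)]
    for i in range(end):
        for j in range(i, end):  # only the upper triangle, diagonal included
            masked[i][j] = scores[i][j]
    return masked


# Toy 4x4 score matrix for a sequence whose last position is padding.
raw = [[0.9, 0.8, 0.1, 0.3],
       [0.7, 0.2, 0.6, 0.4],
       [0.5, 0.5, 0.4, 0.2],
       [0.1, 0.1, 0.1, 0.9]]

for row in mask_invalid_spans(raw, length=3):
    print(row)
```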
from xpm_torch.huggingface import TorchHFHub
from llm2ner import ToMMeR, utils
tommer: ToMMeR = TorchHFHub.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L14_R64")
# Load the backbone LLM, optionally cutting the unused layers to save GPU memory.
llm = utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)
#### Raw Inference
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")
# Tokenize into a tensor of shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)
# Output raw scores
output = tommer.forward(tokens, llm) # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")
# Use a built-in decoding strategy to infer entity spans
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")
>>>INFO:root:Cut LlamaModel with 16 layers to 7 layers
>>> Input text: Large language models are awesome
>>> Raw Output shape: torch.Size([1, 6, 6])
>>> Predicted entities: ['Large language models']
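The inferred entities are (begin, end) token index pairs with an inclusive end. The plotting example describes "greedy" decoding as a flat segmentation; one common way to obtain such a segmentation is sketched below as a hypothetical stand-in, not llm2ner's actual implementation: visit spans from highest to lowest probability and keep each one that reaches the threshold and does not overlap an already kept span. The tokens and probabilities are invented for illustration.

```python
from typing import Dict, List, Set, Tuple

Span = Tuple[int, int]  # (begin, end), end inclusive


def greedy_flat_decode(probs: Dict[Span, float], threshold: float = 0.5) -> List[Span]:
    """Hypothetical greedy decoder: highest-probability spans first, no overlaps."""
    taken: Set[int] = set()  # token positions already covered by a kept span
    kept: List[Span] = []
    for (b, e), p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        if p < threshold:
            break  # spans are sorted by score, so no later span can pass
        positions = set(range(b, e + 1))  # inclusive end
        if positions & taken:
            continue  # overlaps a higher-scoring span already kept
        taken |= positions
        kept.append((b, e))
    return sorted(kept)


# Toy tokens (position 0 stands in for a BOS token) and invented span probabilities.
tokens = ["<s>", "Large", "language", "models", "are", "awesome"]
probs = {(1, 3): 0.92, (2, 3): 0.71, (3, 3): 0.55, (5, 5): 0.30}

spans = greedy_flat_decode(probs, threshold=0.5)
names = [" ".join(tokens[b:e + 1]) for b, e in spans]
print(spans)
print(names)
```

Sorting by score first means a long, confident span such as "Large language models" blocks the shorter spans nested inside it, which is what keeps the output flat.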
Fancy Outputs
We also provide inference and plotting utilities in llm2ner.plotting.
from xpm_torch.huggingface import TorchHFHub
from llm2ner import ToMMeR, utils, plotting
tommer: ToMMeR = TorchHFHub.from_pretrained("llm2ner/ToMMeR-Llama-3.2-1B_L14_R64")
# Load the backbone LLM, optionally cutting the unused layers to save GPU memory.
llm = utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)
text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). "
# Fancy interactive output
outputs = plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
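Since "greedy" is described as the flat option, decoding_strategy="threshold" presumably keeps every span whose probability reaches the threshold, so nested or overlapping mentions can coexist. A minimal sketch of that behaviour, under this assumption and with invented probabilities for the tokens of "Named Entity Recognition (NER)":

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (begin, end), end inclusive


def threshold_decode(probs: Dict[Span, float], threshold: float = 0.5) -> List[Span]:
    """Keep every span scoring at least `threshold`; overlaps are allowed."""
    return sorted(span for span, p in probs.items() if p >= threshold)


def overlaps(a: Span, b: Span) -> bool:
    """True when two inclusive spans share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


tokens = ["Named", "Entity", "Recognition", "(", "NER", ")"]
# Invented scores: the full name, a sub-span nested in it, and the acronym pass;
# the parenthesised "(NER)" span does not.
probs = {(0, 2): 0.88, (1, 2): 0.62, (4, 4): 0.81, (3, 5): 0.20}

spans = threshold_decode(probs, threshold=0.5)
nested = [(a, b) for i, a in enumerate(spans) for b in spans[i + 1:] if overlaps(a, b)]
print([" ".join(tokens[b:e + 1]) for b, e in spans])
print(nested)
```

Here the nested pair (0, 2) and (1, 2) survives thresholding; a flat decoder would have to drop one of them.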
Please visit the repository for more details and a demo notebook.
Evaluation Results
| dataset | precision | recall | f1 | n_samples |
|---|---|---|---|---|
| MultiNERD | 0.1879 | 0.9876 | 0.3158 | 154144 |
| CoNLL 2003 | 0.2837 | 0.9404 | 0.4359 | 16493 |
| CrossNER_politics | 0.2613 | 0.9776 | 0.4124 | 1389 |
| CrossNER_AI | 0.272 | 0.9741 | 0.4252 | 879 |
| CrossNER_literature | 0.3043 | 0.9561 | 0.4616 | 916 |
| CrossNER_science | 0.2908 | 0.9622 | 0.4466 | 1193 |
| CrossNER_music | 0.343 | 0.9689 | 0.5066 | 945 |
| ncbi | 0.095 | 0.9481 | 0.1727 | 3952 |
| FabNER | 0.2533 | 0.7586 | 0.3798 | 13681 |
| WikiNeural | 0.1818 | 0.9837 | 0.3068 | 92672 |
| GENIA_NER | 0.1499 | 0.9768 | 0.26 | 16563 |
| ACE 2005 | 0.2535 | 0.3985 | 0.3099 | 8230 |
| Ontonotes | 0.2238 | 0.7312 | 0.3427 | 42193 |
| Aggregated | 0.1979 | 0.9299 | 0.3263 | 353250 |
| Mean (unweighted over datasets) | 0.2385 | 0.8895 | 0.3674 | 353250 |
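Each f1 value is the harmonic mean of precision and recall, F1 = 2PR / (P + R), and the Mean row is the unweighted average of the 13 per-dataset rows. The sketch below recomputes both from the table (small gaps come from four-decimal rounding) and shows why the mean F1 differs from the F1 of the mean precision and recall. It does not attempt the Aggregated row, whose pooling method is not stated in this card.

```python
from statistics import mean

# (precision, recall, f1, n_samples) copied from the evaluation table.
ROWS = {
    "MultiNERD": (0.1879, 0.9876, 0.3158, 154144),
    "CoNLL 2003": (0.2837, 0.9404, 0.4359, 16493),
    "CrossNER_politics": (0.2613, 0.9776, 0.4124, 1389),
    "CrossNER_AI": (0.272, 0.9741, 0.4252, 879),
    "CrossNER_literature": (0.3043, 0.9561, 0.4616, 916),
    "CrossNER_science": (0.2908, 0.9622, 0.4466, 1193),
    "CrossNER_music": (0.343, 0.9689, 0.5066, 945),
    "ncbi": (0.095, 0.9481, 0.1727, 3952),
    "FabNER": (0.2533, 0.7586, 0.3798, 13681),
    "WikiNeural": (0.1818, 0.9837, 0.3068, 92672),
    "GENIA_NER": (0.1499, 0.9768, 0.26, 16563),
    "ACE 2005": (0.2535, 0.3985, 0.3099, 8230),
    "Ontonotes": (0.2238, 0.7312, 0.3427, 42193),
}


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Largest gap between the reported f1 and one recomputed from rounded P and R.
max_gap = max(abs(f1(p, r) - f) for p, r, f, _ in ROWS.values())

# Unweighted (macro) averages over the 13 datasets.
macro_p = mean(row[0] for row in ROWS.values())
macro_r = mean(row[1] for row in ROWS.values())
macro_f1 = mean(row[2] for row in ROWS.values())
total = sum(row[3] for row in ROWS.values())

print(f"largest per-row F1 gap: {max_gap:.4f}")
print(f"Mean row: P={macro_p:.4f} R={macro_r:.4f} F1={macro_f1:.4f} n={total}")
print(f"F1 of mean P and R:     {f1(macro_p, macro_r):.4f}")
```

Combining the averaged precision and recall gives an F1 near 0.376 rather than 0.3674: averaging per-dataset F1 scores and taking the F1 of averaged P and R are different operations.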
Citation
If you use this model or the approach, please cite the associated paper:
@misc{morand2025tommerefficiententity,
title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models},
author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
year={2025},
eprint={2510.19410},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.19410},
}
License
Apache-2.0 (see repository for full text).