# Sentence-RooseBERT
Sentence-RooseBERT is a Sentence-BERT adaptation of RooseBERT, a domain-specific language model pre-trained on English political debates and parliamentary speeches. It produces fixed-size sentence embeddings suited for semantic similarity, clustering, and retrieval tasks over political text.
⚠️ This model has not yet been formally evaluated. It is released as an experimental variant for the community to explore.
📄 Paper: RooseBERT: A New Deal For Political Language Modelling
💻 GitHub: https://github.com/deborahdore/RooseBERT
## Training Data

The base RooseBERT model was pre-trained on 11GB of English political debate transcripts (1919–2025), including debates from Africa, Australia, Canada, Europe, Ireland, New Zealand, Scotland, the United Kingdom, the United States, the UN General Assembly, and the UN Security Council. See the base RooseBERT model cards for full details.
## Intended Use
This model is intended for sentence-level tasks over political text, such as:
- Semantic textual similarity between debate passages or speeches
- Semantic search and retrieval over political corpora
- Clustering of political arguments or speeches by topic
- Classification via embedding similarity (e.g., zero-shot or few-shot)
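For the zero-shot classification use case above, a sentence can be assigned the label whose embedding is most similar to it. A minimal sketch of the idea, using toy low-dimensional vectors in place of real 768-dimensional model outputs (in practice both would come from `model.encode(...)`):

```python
import numpy as np

# Toy embeddings standing in for model outputs; the labels and
# vectors here are illustrative, not derived from the model.
label_embs = {
    "energy": np.array([1.0, 0.0, 0.0]),
    "healthcare": np.array([0.0, 1.0, 0.0]),
}
sent_emb = np.array([0.9, 0.1, 0.0])

def nearest_label(emb, labels):
    """Return the label whose embedding has the highest cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(labels, key=lambda name: cos(emb, labels[name]))

print(nearest_label(sent_emb, label_embs))  # energy
```

The same nearest-embedding logic underlies few-shot variants, where each label embedding is the mean of a handful of example-sentence embeddings.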
## How to Use
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ddore14/Sentence-RooseBERT")

sentences = [
    "We must invest in renewable energy to combat climate change.",
    "The government's climate policy is failing future generations.",
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
```
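The embeddings are typically compared with cosine similarity. A self-contained sketch of the computation, using toy vectors in place of the model's actual outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors standing in for two 768-dim sentence embeddings.
e1 = [1.0, 0.0, 1.0]
e2 = [1.0, 1.0, 0.0]
print(cosine_similarity(e1, e2))  # 0.5
```

With real embeddings, `sentence_transformers.util.cos_sim(embeddings, embeddings)` computes the same pairwise scores in one call.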
## Limitations
- This model has not been formally evaluated on any downstream benchmark. Performance on political NLP tasks is unknown.
- The model inherits any biases present in official political speech corpora, including geopolitical and linguistic over-representation.
- Not suitable for generative tasks or token-level labelling.
## Related Models
| Model | Training | Casing | HuggingFace ID |
|---|---|---|---|
| RooseBERT-cont-cased | Continued pre-training | Cased | ddore14/RooseBERT-cont-cased |
| RooseBERT-cont-uncased | Continued pre-training | Uncased | ddore14/RooseBERT-cont-uncased |
| RooseBERT-scr-cased | From scratch | Cased | ddore14/RooseBERT-scr-cased |
| RooseBERT-scr-uncased | From scratch | Uncased | ddore14/RooseBERT-scr-uncased |
## Citation
If you use RooseBERT in your research, please cite:
```bibtex
@article{dore2025roosebert,
  title={RooseBERT: A New Deal For Political Language Modelling},
  author={Dore, Deborah and Cabrio, Elena and Villata, Serena},
  journal={arXiv preprint arXiv:2508.03250},
  year={2025}
}
```