Sentence-RooseBERT

Sentence-RooseBERT is a Sentence-BERT adaptation of RooseBERT, a domain-specific language model pre-trained on English political debates and parliamentary speeches. It produces fixed-size sentence embeddings suited for semantic similarity, clustering, and retrieval tasks over political text.

⚠️ This model has not yet been formally evaluated. It is released as an experimental variant for the community to explore.

📄 Paper: RooseBERT: A New Deal For Political Language Modelling
💻 GitHub: https://github.com/deborahdore/RooseBERT


Training Data

Sentence-RooseBERT was pre-trained on 11GB of English political debate transcripts (1919–2025), including debates from Africa, Australia, Canada, Europe, Ireland, New Zealand, Scotland, the United Kingdom, the United States, the UN General Assembly, and the UN Security Council. See the base RooseBERT model cards for full details.


Intended Use

This model is intended for sentence-level tasks over political text, such as:

  • Semantic textual similarity between debate passages or speeches
  • Semantic search and retrieval over political corpora
  • Clustering of political arguments or speeches by topic
  • Classification via embedding similarity (e.g., zero-shot or few-shot)

How to Use

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ddore14/Sentence-RooseBERT")

sentences = [
    "We must invest in renewable energy to combat climate change.",
    "The government's climate policy is failing future generations."
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
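The embeddings returned by `model.encode` can be compared directly with cosine similarity, e.g. for semantic search over a corpus of debate passages. A minimal sketch using NumPy (random vectors stand in for real embeddings so the snippet runs without downloading the model; the helper name is illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    # L2-normalise each row; pairwise dot products are then cosine scores.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-ins for model.encode(...) outputs (replace with real embeddings).
rng = np.random.default_rng(0)
query = rng.standard_normal((1, 768))    # one query sentence
corpus = rng.standard_normal((3, 768))   # three corpus sentences

scores = cosine_similarity(query, corpus)  # shape (1, 3)
best = int(scores.argmax())                # index of the closest corpus sentence
print(best)
```

Ranking corpus sentences by these scores gives a simple retrieval pipeline; the same matrix can also feed a clustering algorithm such as k-means over the normalised embeddings.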

Limitations

  • This model has not been formally evaluated on any downstream benchmark. Performance on political NLP tasks is unknown.
  • The model inherits any biases present in official political speech corpora, including geopolitical and linguistic over-representation.
  • Not suitable for generative tasks or token-level labelling.

Related Models

| Model | Training | Casing | HuggingFace ID |
| --- | --- | --- | --- |
| RooseBERT-cont-cased | Continued pre-training | Cased | ddore14/RooseBERT-cont-cased |
| RooseBERT-cont-uncased | Continued pre-training | Uncased | ddore14/RooseBERT-cont-uncased |
| RooseBERT-scr-cased | From scratch | Cased | ddore14/RooseBERT-scr-cased |
| RooseBERT-scr-uncased | From scratch | Uncased | ddore14/RooseBERT-scr-uncased |

Citation

If you use Sentence-RooseBERT or RooseBERT in your research, please cite:

@article{dore2025roosebert,
  title={RooseBERT: A New Deal For Political Language Modelling},
  author={Dore, Deborah and Cabrio, Elena and Villata, Serena},
  journal={arXiv preprint arXiv:2508.03250},
  year={2025}
}