Sentence-RooseBERT

Sentence-RooseBERT is a Sentence-BERT adaptation of RooseBERT, a domain-specific language model pre-trained on English political debates and parliamentary speeches. It produces fixed-size sentence embeddings suited for semantic similarity, clustering, and retrieval tasks over political text.

⚠️ This model has not yet been formally evaluated. It is released as an experimental variant for the community to explore.

📄 Paper: RooseBERT: A New Deal For Political Language Modelling
💻 GitHub: https://github.com/deborahdore/RooseBERT


Training Data

Sentence-RooseBERT was pre-trained on 11GB of English political debate transcripts (1919–2025), including debates from Africa, Australia, Canada, Europe, Ireland, New Zealand, Scotland, the United Kingdom, the United States, the UN General Assembly, and the UN Security Council. See the base RooseBERT model cards for full details.


Intended Use

This model is intended for sentence-level tasks over political text, such as:

  • Semantic textual similarity between debate passages or speeches
  • Semantic search and retrieval over political corpora
  • Clustering of political arguments or speeches by topic
  • Classification via embedding similarity (e.g., zero-shot or few-shot)

How to Use

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ddore14/Sentence-RooseBERT")

sentences = [
    "We must invest in renewable energy to combat climate change.",
    "The government's climate policy is failing future generations."
]

embeddings = model.encode(sentences)
print(embeddings.shape)  # (2, 768)
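The embeddings returned by `model.encode` can be compared directly with cosine similarity, e.g. for semantic search over a corpus of debate passages. A minimal sketch using NumPy (random vectors stand in for real embeddings so the snippet runs without downloading the model; the helper name is illustrative):

```python
import numpy as np

def cosine_similarity(a, b):
    # L2-normalise each row; pairwise dot products are then cosine scores.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Stand-ins for model.encode(...) outputs (replace with real embeddings).
rng = np.random.default_rng(0)
query = rng.standard_normal((1, 768))    # one query sentence
corpus = rng.standard_normal((3, 768))   # three corpus sentences

scores = cosine_similarity(query, corpus)  # shape (1, 3)
best = int(scores.argmax())                # index of the closest corpus sentence
print(best)
```

Ranking corpus sentences by these scores gives a simple retrieval pipeline; the same matrix can also feed a clustering algorithm such as k-means over the normalised embeddings.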

Limitations

  • This model has not been formally evaluated on any downstream benchmark. Performance on political NLP tasks is unknown.
  • The model inherits any biases present in official political speech corpora, including geopolitical and linguistic over-representation.
  • Not suitable for generative tasks or token-level labelling.

Related Models

| Model | Training | Casing | HuggingFace ID |
| --- | --- | --- | --- |
| RooseBERT-cont-cased | Continued pre-training | Cased | ddore14/RooseBERT-cont-cased |
| RooseBERT-cont-uncased | Continued pre-training | Uncased | ddore14/RooseBERT-cont-uncased |
| RooseBERT-scr-cased | From scratch | Cased | ddore14/RooseBERT-scr-cased |
| RooseBERT-scr-uncased | From scratch | Uncased | ddore14/RooseBERT-scr-uncased |

Citation

If you use Sentence-RooseBERT or RooseBERT in your research, please cite:

@article{dore2025roosebert,
  title={RooseBERT: A New Deal For Political Language Modelling},
  author={Dore, Deborah and Cabrio, Elena and Villata, Serena},
  journal={arXiv preprint arXiv:2508.03250},
  year={2025}
}