xlm-roberta-large-online-counseling-oncoco
Fine-tuned XLM-RoBERTa large model for fine-grained message classification in psychosocial online counseling conversations. Trained on the OnCoCo 1.0 dataset.
Try it out: OnCoCo Message Classifier Space
Model Description
This model classifies individual messages from online counseling conversations into one of 66 fine-grained categories — 38 counselor and 28 client categories — covering communication acts such as empathic reflection, problem exploration, motivational interviewing techniques, resource activation, and emotional support.
Messages are prefixed with the speaker role (Counselor: / Client: in English, Berater: / Klient: in German) to allow the model to resolve the role context. At inference time, logits for the other speaker's categories are masked so predictions always fall within the correct role-specific category set.
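The role-based masking described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the card does not say which label indices belong to which role, so `counselor_ids` and the toy logits are hypothetical. It uses plain Python to stay self-contained; with PyTorch tensors, the same effect is usually achieved by setting the disallowed logits to negative infinity (for example with `masked_fill`) before the softmax.

```python
import math

def masked_softmax(logits, allowed_ids):
    """Softmax over `logits` restricted to the indices in `allowed_ids`.

    Indices outside `allowed_ids` get probability 0.0, which has the same
    effect as setting their logits to -inf before a regular softmax.
    """
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("allowed_ids must not be empty")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(logits[i] for i in allowed)
    exps = [math.exp(x - peak) if i in allowed else 0.0
            for i, x in enumerate(logits)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy example with 5 labels. ASSUMPTION: indices 0-2 are counselor labels
# and 3-4 are client labels; the real index-to-role mapping is not given here.
logits = [1.0, 2.0, 0.5, 3.0, 0.1]
counselor_ids = [0, 1, 2]

probs = masked_softmax(logits, counselor_ids)
best = max(range(len(probs)), key=probs.__getitem__)
print(best, [round(p, 3) for p in probs])
```

Without the mask, the highest logit sits at index 3, a client label in this toy setup. With the mask, the prediction falls on the best counselor label instead.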
The model was developed as part of the OnCoCo project at Technische Hochschule Nürnberg.
Of the models we trained on this dataset, this one performed best.
Evaluation Results
Evaluated on a held-out 20% test split of the OnCoCo 1.0 dataset (bilingual DE+EN):
| Metric | Score |
|---|---|
| Top-1 Accuracy | 0.79 |
| Top-1 Macro F1 | 0.72 |
| Top-2 Accuracy | 0.88 |
| Top-2 Macro F1 | 0.83 |
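For readers who want to compute comparable metrics on their own predictions, here is a small sketch in plain Python. Top-k accuracy is standard. The card does not define "Top-2 Macro F1", so the relaxation in `relaxed_top_k_predictions` (count the gold label as predicted whenever it appears among the top k, otherwise fall back to the top-1 label) is an assumption and may differ from the authors' evaluation. All function names and the toy data are illustrative.

```python
def rank_labels(row):
    """Label indices sorted by descending score."""
    return sorted(range(len(row)), key=lambda i: row[i], reverse=True)

def top_k_accuracy(scores, gold, k):
    """Fraction of examples whose gold label is among the k highest scores."""
    hits = sum(1 for row, y in zip(scores, gold) if y in rank_labels(row)[:k])
    return hits / len(gold)

def relaxed_top_k_predictions(scores, gold, k):
    """ASSUMED relaxation: predict the gold label if it is in the top k,
    otherwise predict the top-1 label."""
    preds = []
    for row, y in zip(scores, gold):
        ranked = rank_labels(row)
        preds.append(y if y in ranked[:k] else ranked[0])
    return preds

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    f1_scores = []
    for c in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)

# Toy data: 4 examples, 3 labels (values are made up for illustration).
scores = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25], [0.1, 0.2, 0.7]]
gold = [0, 2, 1, 2]

top1_preds = relaxed_top_k_predictions(scores, gold, 1)
top2_preds = relaxed_top_k_predictions(scores, gold, 2)
print(top_k_accuracy(scores, gold, 1), macro_f1(gold, top1_preds))
print(top_k_accuracy(scores, gold, 2), macro_f1(gold, top2_preds))
```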
Training Details
- Base model: FacebookAI/xlm-roberta-large
- Dataset: th-nuernberg/OnCoCoV1, 5,556 messages (2,778 DE + 2,778 EN translations), 66 categories
- Split: 80/20 stratified train/test
- Languages: German (original) and English (GPT-4o translated, manually verified)
- Role prefixes: Messages are prefixed with Counselor: / Client: (EN) or Berater: / Klient: (DE)
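A small helper can keep the role prefixes consistent when preparing inputs. This is only a sketch: the function name `add_role_prefix` and the language and role keys are hypothetical, and the single space after the colon is an assumption that follows the English example in the Usage section.

```python
# Prefix strings as listed in the card; the dictionary keys are illustrative.
ROLE_PREFIXES = {
    ("en", "counselor"): "Counselor:",
    ("en", "client"): "Client:",
    ("de", "counselor"): "Berater:",
    ("de", "client"): "Klient:",
}

def add_role_prefix(message, role, lang="en"):
    """Return `message` with the speaker-role prefix for the given language."""
    key = (lang.lower(), role.lower())
    if key not in ROLE_PREFIXES:
        raise ValueError(f"unsupported language/role combination: {key}")
    # ASSUMPTION: exactly one space separates the prefix from the message.
    return f"{ROLE_PREFIXES[key]} {message.strip()}"

print(add_role_prefix("I don't know where to start.", "client"))
print(add_role_prefix("Was beschäftigt Sie gerade am meisten?", "counselor", lang="de"))
```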
Usage
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "th-nuernberg/xlm-roberta-large-online-counseling-oncoco"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()  # disable dropout for inference

# The speaker-role prefix ("Counselor:" here) is part of the input text.
text = "Counselor: It sounds like you're feeling overwhelmed. Can you tell me more?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    # Softmax over all labels; this snippet applies no role masking.
    probs = F.softmax(model(**inputs).logits, dim=-1).squeeze()

# Print the three most probable category codes with their probabilities.
top3 = probs.argsort(descending=True)[:3]
for i in top3:
    print(f"{model.config.id2label[i.item()]}: {probs[i].item():.4f}")
To resolve category codes to human-readable descriptions:
import json
from huggingface_hub import hf_hub_download

# Download the code-to-description mapping from the dataset repository.
path = hf_hub_download("th-nuernberg/OnCoCoV1", "code_to_category.json", repo_type="dataset")
with open(path, encoding="utf-8") as f:
    code2cat = json.load(f)

# Reuses `model`, `probs`, and `top3` from the previous snippet.
for i in top3:
    code = model.config.id2label[i.item()]
    print(f"{code} - {code2cat.get(code, '?')}: {probs[i].item():.4f}")
Category Taxonomy
The 66 categories are organized hierarchically for both speaker roles:
Counselor (38 categories)
- Formalities (opening, closing)
- Moderation
- Impact factors: analysis & clarification of problems (13), objectives (2), motivation (4), resource activation (5), problem solving (8)
- Other statements
Client (28 categories)
- Formalities (opening, closing)
- Empathy expression (3)
- Impact factors: problem analysis (8), objectives (2), motivation (2), resource activation (2), coping assistance (6)
- Other statements
Full label descriptions are available via the code_to_category.json file in the dataset repository.
Intended Use
- Automated content analysis of online counseling conversations
- Research on counselor–client communication patterns
- Educational feedback tools for counselor training
- Conversational AI research in the mental health domain
Limitations
- Performance varies across categories; rare categories with few training examples show lower F1 scores
- Some semantically overlapping categories (e.g., problem statement vs. problem definition) are harder to distinguish
- English texts are machine-translated from German; some translation artifacts may affect performance on native English counseling texts
Citation
If you use this model, please cite the OnCoCo paper:
@inproceedings{albrecht-etal-2026-oncoco,
title = "{O}n{C}o{C}o 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations",
author = "Albrecht, Jens and Lehmann, Robert and Poltermann, Aleksandra and Rudolph, Eric and Steigerwald, Philipp and Stieler, Mara",
booktitle = "Proceedings of the Joint Workshop on Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI) at LREC-COLING 2026",
month = may,
year = "2026",
address = "Palma de Mallorca, Spain",
publisher = "ELRA and ICCL",
}
ArXiv preprint: arXiv:2512.09804
License
CC BY-SA 4.0 — Technische Hochschule Nürnberg