xlm-roberta-large-online-counseling-oncoco

Fine-tuned XLM-RoBERTa large model for fine-grained message classification in psychosocial online counseling conversations. Trained on the OnCoCo 1.0 dataset.

Try it out: OnCoCo Message Classifier Space

Model Description

This model classifies individual messages from online counseling conversations into one of 66 fine-grained categories — 38 counselor and 28 client categories — covering communication acts such as empathic reflection, problem exploration, motivational interviewing techniques, resource activation, and emotional support.

Messages are prefixed with the speaker role (Counselor: / Client: in English, Berater: / Klient: in German) to allow the model to resolve the role context. At inference time, logits for the other speaker's categories are masked so predictions always fall within the correct role-specific category set.
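The prefixing-and-masking scheme described above can be sketched as follows. This is an illustrative example, not the model's actual API: the `ROLE_PREFIX` mapping and the helper names are ours, and in practice the role of each label id would be derived from `model.config.id2label`.

```python
import torch

# Illustrative sketch of the role handling described above.
# ROLE_PREFIX and the allowed-id sets are assumptions for this example.
ROLE_PREFIX = {"counselor": "Counselor: ", "client": "Client: "}

def prefix_message(role: str, text: str) -> str:
    """Prepend the speaker role so the model can resolve the role context."""
    return ROLE_PREFIX[role] + text

def mask_other_role(logits: torch.Tensor, allowed_ids: list) -> torch.Tensor:
    """Set logits of the other role's categories to -inf, so that after
    softmax all probability mass falls on the correct role's categories."""
    masked = torch.full_like(logits, float("-inf"))
    masked[..., allowed_ids] = logits[..., allowed_ids]
    return masked
```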

The model was developed as part of the OnCoCo project at Technische Hochschule Nürnberg.
It is the best-performing model we trained on this dataset.

Evaluation Results

Evaluated on a held-out 20% test split of the OnCoCo 1.0 dataset (bilingual DE+EN):

Metric          Score
Top-1 Accuracy  0.79
Top-1 Macro F1  0.72
Top-2 Accuracy  0.88
Top-2 Macro F1  0.83
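
The top-k accuracies above can be computed from the per-message probability distributions; a minimal sketch (the function name is ours, not part of the evaluation code):

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int) -> float:
    """Fraction of messages whose true category is among the k most
    probable predictions. probs: (n_messages, n_classes); labels: (n_messages,)."""
    top_k = np.argsort(probs, axis=-1)[:, -k:]
    return float((top_k == labels[:, None]).any(axis=1).mean())
```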

Training Details

  • Base model: FacebookAI/xlm-roberta-large
  • Dataset: th-nuernberg/OnCoCoV1 — 5,556 messages (2,778 DE + 2,778 EN translations), 66 categories
  • Split: 80/20 stratified train/test
  • Languages: German (original) and English (GPT-4o translated, manually verified)
  • Role prefixes: Messages are prefixed with Counselor: / Client: (EN) or Berater: / Klient: (DE)
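
A stratified 80/20 split of the kind listed above can be sketched with scikit-learn; the random seed here is an assumption for reproducibility, not the one used for OnCoCo:

```python
from sklearn.model_selection import train_test_split

def stratified_split(texts, labels, test_size=0.2, seed=42):
    """80/20 split that preserves the per-category label distribution.
    The seed is an illustrative assumption."""
    return train_test_split(
        texts, labels, test_size=test_size, stratify=labels, random_state=seed
    )
```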

Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "th-nuernberg/xlm-roberta-large-online-counseling-oncoco"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Counselor: It sounds like you're feeling overwhelmed. Can you tell me more?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=-1).squeeze()

# Print the three most likely categories with their probabilities
top3 = probs.argsort(descending=True)[:3]
for i in top3:
    print(f"{model.config.id2label[i.item()]}: {probs[i].item():.4f}")

To resolve category codes to human-readable descriptions:

import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("th-nuernberg/OnCoCoV1", "code_to_category.json", repo_type="dataset")
with open(path) as f:
    code2cat = json.load(f)

for i in top3:
    code = model.config.id2label[i.item()]
    print(f"{code} ({code2cat.get(code, '?')}): {probs[i].item():.4f}")

Category Taxonomy

The 66 categories are organized hierarchically for both speaker roles:

Counselor (38 categories)

  • Formalities (opening, closing)
  • Moderation
  • Impact factors: analysis & clarification of problems (13), objectives (2), motivation (4), resource activation (5), problem solving (8)
  • Other statements

Client (28 categories)

  • Formalities (opening, closing)
  • Empathy expression (3)
  • Impact factors: problem analysis (8), objectives (2), motivation (2), resource activation (2), coping assistance (6)
  • Other statements

Full label descriptions are available via the code_to_category.json file in the dataset repository.

Intended Use

  • Automated content analysis of online counseling conversations
  • Research on counselor–client communication patterns
  • Educational feedback tools for counselor training
  • Conversational AI research in the mental health domain

Limitations

  • Performance varies across categories; rare categories with few training examples show lower F1 scores
  • Some semantically overlapping categories (e.g., problem statement vs. problem definition) are harder to distinguish
  • English texts are machine-translated from German; some translation artifacts may affect performance on native English counseling texts
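
To see which categories drag down the macro F1, per-class scores can be inspected. A small sketch using scikit-learn (the helper name is ours):

```python
import numpy as np
from sklearn.metrics import f1_score

def weakest_categories(y_true, y_pred, labels, k=3):
    """Return the k categories with the lowest per-class F1,
    e.g. to spot rare categories with few training examples."""
    scores = f1_score(y_true, y_pred, labels=labels, average=None, zero_division=0)
    order = np.argsort(scores)[:k]
    return [(labels[i], float(scores[i])) for i in order]
```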

Citation

If you use this model, please cite the OnCoCo paper:

@inproceedings{albrecht-etal-2026-oncoco,
    title = "{O}n{C}o{C}o 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations",
    author = "Albrecht, Jens and Lehmann, Robert and Poltermann, Aleksandra and Rudolph, Eric and Steigerwald, Philipp and Stieler, Mara",
    booktitle = "Proceedings of the Joint Workshop on Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI) at LREC-COLING 2026",
    month = may,
    year = "2026",
    address = "Palma de Mallorca, Spain",
    publisher = "ELRA and ICCL",
}

ArXiv preprint: arXiv:2512.09804

License

CC BY-SA 4.0 — Technische Hochschule Nürnberg
