modernbert-base-online-counseling-oncoco

Fine-tuned answerdotai/ModernBERT-base model for fine-grained message classification in psychosocial online counseling conversations. Trained on the OnCoCo 1.0 dataset.

Try it out: OnCoCo Message Classifier Space

Model Description

This model classifies individual messages from online counseling conversations into one of 66 fine-grained categories — 38 counselor and 28 client categories — covering communication acts such as empathic reflection, problem exploration, motivational interviewing techniques, resource activation, and emotional support.

Messages are prefixed with the speaker role (Counselor: / Client: in English, Berater: / Klient: in German) so the model can condition on who is speaking. At inference time, logits for the other speaker's categories are masked, so predictions always fall within the correct role-specific category set.
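The masking step can be sketched with a toy example. The label indices used here (indices 0–2 counselor, 3–4 client) are purely illustrative; in practice the role-specific index sets would come from the model's label mapping:

```python
import torch
import torch.nn.functional as F

# Toy logits over 5 labels; indices 0-2 counselor, 3-4 client (hypothetical split).
logits = torch.tensor([1.0, 2.0, 0.5, 3.0, 0.1])
client_ids = [3, 4]

def mask_to_role(logits, allowed_ids):
    """Set disallowed labels to -inf so softmax assigns them zero probability."""
    masked = torch.full_like(logits, float("-inf"))
    masked[allowed_ids] = logits[allowed_ids]
    return masked

# For a client message, renormalize over client categories only.
probs = F.softmax(mask_to_role(logits, client_ids), dim=-1)
print(probs)  # nonzero mass only at indices 3 and 4
```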

The model was developed as part of the OnCoCo project at Technische Hochschule Nürnberg.
The best model we trained on this dataset is th-nuernberg/xlm-roberta-large-online-counseling-oncoco.

Evaluation Results

Evaluated on a held-out 20% test split of the OnCoCo 1.0 dataset (bilingual DE+EN):

Metric Score
Top-1 Accuracy 0.68
Top-1 Macro F1 0.57
Top-2 Accuracy 0.78
Top-2 Macro F1 0.69
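Top-k accuracy counts a prediction as correct when the gold label appears among the k highest-scoring classes. A minimal sketch of the metric (with dummy scores, not the actual evaluation code):

```python
import torch

def topk_accuracy(scores, labels, k=2):
    """Fraction of examples whose gold label is among the top-k scored classes."""
    topk = scores.topk(k, dim=-1).indices             # shape (n, k)
    hits = (topk == labels.unsqueeze(-1)).any(dim=-1)
    return hits.float().mean().item()

scores = torch.tensor([[0.7, 0.2, 0.1],
                       [0.1, 0.5, 0.4],
                       [0.3, 0.45, 0.25]])
labels = torch.tensor([0, 2, 0])

print(topk_accuracy(scores, labels, k=1))  # only the first example is a top-1 hit
print(topk_accuracy(scores, labels, k=2))  # all three gold labels are in the top 2
```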

Training Details

  • Base model: answerdotai/ModernBERT-base
  • Dataset: th-nuernberg/OnCoCoV1 — 5,556 messages (2,778 DE + 2,778 EN translations), 66 categories
  • Split: 80/20 stratified train/test
  • Languages: German (original) and English (GPT-4o translated, manually verified)
  • Role prefixes: Messages are prefixed with Counselor: / Client: (EN) or Berater: / Klient: (DE)
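The role prefix is plain string concatenation before tokenization. A minimal sketch; the prefix strings match the dataset description above, but the `ROLE_PREFIX` mapping and helper are illustrative, not part of the released code:

```python
# Map (language, role) to the role marker described above (illustrative helper).
ROLE_PREFIX = {
    ("en", "counselor"): "Counselor:",
    ("en", "client"): "Client:",
    ("de", "counselor"): "Berater:",
    ("de", "client"): "Klient:",
}

def prefix_message(text, role, lang="en"):
    """Prepend the role marker so the classifier can condition on the speaker."""
    return f"{ROLE_PREFIX[(lang, role)]} {text}"

print(prefix_message("Ich fühle mich überfordert.", "client", "de"))
# → Klient: Ich fühle mich überfordert.
```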

Usage

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "th-nuernberg/modernbert-base-online-counseling-oncoco"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

text = "Counselor: It sounds like you're feeling overwhelmed. Can you tell me more?"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    probs = F.softmax(model(**inputs).logits, dim=-1).squeeze()

top3 = probs.argsort(descending=True)[:3]
for i in top3:
    print(f"{model.config.id2label[i.item()]}: {probs[i].item():.4f}")

To resolve category codes to human-readable descriptions:

import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("th-nuernberg/OnCoCoV1", "code_to_category.json", repo_type="dataset")
with open(path) as f:
    code2cat = json.load(f)

for i in top3:
    code = model.config.id2label[i.item()]
    print(f"{code} ({code2cat.get(code, '?')}): {probs[i].item():.4f}")

Category Taxonomy

The 66 categories are organized hierarchically for both speaker roles:

Counselor (38 categories)

  • Formalities (opening, closing)
  • Moderation
  • Impact factors: analysis & clarification of problems (13), objectives (2), motivation (4), resource activation (5), problem solving (8)
  • Other statements

Client (28 categories)

  • Formalities (opening, closing)
  • Empathy expression (3)
  • Impact factors: problem analysis (8), objectives (2), motivation (2), resource activation (2), coping assistance (6)
  • Other statements

Full label descriptions are available via the code_to_category.json file in the dataset repository.

Intended Use

  • Automated content analysis of online counseling conversations
  • Research on counselor–client communication patterns
  • Educational feedback tools for counselor training
  • Conversational AI research in the mental health domain

Limitations

  • Performance varies across categories; rare categories with few training examples show lower F1 scores
  • Some semantically overlapping categories (e.g., problem statement vs. problem definition) are harder to distinguish
  • English texts are machine-translated from German; some translation artifacts may affect performance on native English counseling texts

Citation

If you use this model, please cite the OnCoCo paper:

@inproceedings{albrecht-etal-2026-oncoco,
    title = "{O}n{C}o{C}o 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations",
    author = "Albrecht, Jens and Lehmann, Robert and Poltermann, Aleksandra and Rudolph, Eric and Steigerwald, Philipp and Stieler, Mara",
    booktitle = "Proceedings of the Joint Workshop on Social Context (SoCon) and Integrating NLP and Psychology to Study Social Interactions (NLPSI) at LREC-COLING 2026",
    month = may,
    year = "2026",
    address = "Palma de Mallorca, Spain",
    publisher = "ELRA and ICCL",
}

ArXiv preprint: arXiv:2512.09804

License

CC BY-SA 4.0 — Technische Hochschule Nürnberg
