---
language:
- en
license: apache-2.0
base_model: distilbert/distilbert-base-uncased
tags:
- text-classification
- guardrail
- safety
- cbt
- mental-health
- academic-stress
pipeline_tag: text-classification
---

# ReframeBot-Guardrail-DistilBERT

A 3-class text classifier fine-tuned from `distilbert-base-uncased` that
routes conversation turns to one of three task modes:

| Label | Meaning |
|---|---|
| `TASK_1` | CBT / academic stress — engage with Socratic questioning |
| `TASK_2` | Crisis / self-harm signal — redirect to emergency hotlines |
| `TASK_3` | Out-of-scope — validate feeling, pivot back to academics |

This model is the guardrail component of the ReframeBot system. Its output
is combined with a dual-signal crisis detector (regex + cosine similarity)
before the final routing decision is made.

## Usage

```python
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="Nhatminh1234/ReframeBot-Guardrail-DistilBERT",
)

classifier("I'm really stressed about my finals next week")
# [{'label': 'TASK_1', 'score': 0.97}]

classifier("What's a good recipe for pasta?")
# [{'label': 'TASK_3', 'score': 0.94}]
```

## Training Details

| Hyperparameter | Value |
|---|---|
| Base model | distilbert-base-uncased |
| Number of labels | 3 (TASK_1, TASK_2, TASK_3) |
| Learning rate | 2e-6 |
| Batch size | 16 |
| Max epochs | 20 (early stopping, patience=3) |
| Weight decay | 0.01 |
| Max token length | 128 |
| Best model criterion | macro F1 |
| Hardware | NVIDIA RTX 5070 (laptop, 8 GB VRAM) |

**Dataset:** 1,674 labelled samples (80/20 train/val split). Includes hard
negatives — benign metaphors that superficially resemble crisis language
(e.g., "dying of embarrassment after that presentation").

## Evaluation

Per-class results on the validation split (335 samples):

| Class | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| TASK_1 | 0.99 | 1.00 | 1.00 | 107 |
| TASK_2 | 0.98 | 1.00 | 0.99 | 91 |
| TASK_3 | 1.00 | 0.98 | 0.99 | 137 |
| **macro avg** | **0.99** | **0.99** | **0.99** | **335** |

Accuracy on a separate, harder held-out test set: **91.1%** (includes
boundary cases not present in the training distribution).

## Intended Use

Designed as a routing component in the ReframeBot system. The TASK_2
output alone is not sufficient for crisis intervention — the full system
also applies a regex + semantic similarity layer before acting on a
crisis signal.

## Project

GitHub: [ReframeBot](https://github.com/minhnghiem32131024429/ReframeBot)