A parallel multi-model system that simultaneously analyses text across three clinical dimensions — extending Tumaliuan et al. (2024) with modern transformers, SMOTE balancing, and cross-platform generalisation.
Dataset 1 is structurally equivalent to the base paper's Filipino Twitter corpus — same 6-class task, same clinical annotation method — making a direct F1 comparison valid.
Models are trained on each dataset independently, and each dataset addresses a different clinical dimension.
Sample 3 is the most interesting — it demonstrates masked suicidality, the key clinical finding of the project.
Four insights that go beyond the numbers.
| Dataset | Model | Accuracy | Macro F1 | Cohen's κ | Notes |
|---|---|---|---|---|---|
| D1 | SVM | 92.36% | 0.9269 | 0.9072 | ★ Best D1 |
| D1 | XGBoost | 91.76% | 0.9217 | 0.9000 | |
| D1 | Logistic Regression | 91.52% | 0.9179 | 0.8971 | |
| D1 | XLM-RoBERTa | 90.52% | 0.9117 | 0.8852 | 4th — SVM wins |
| D2 | XLM-RoBERTa | 99.95% | 0.9993 | 0.9986 | ★ Best D2 |
| D2 | XGBoost | 99.27% | 0.9895 | 0.9789 | |
| D2 | Logistic Regression | 98.89% | 0.9839 | 0.9678 | |
| D3 | XLM-RoBERTa | 98.10% | 0.9810 | 0.9620 | ★ Best D3 |
| D3 | SVM | 93.68% | 0.9368 | 0.8736 | |
| D3 | Logistic Regression | 93.18% | 0.9318 | 0.8636 | |
| — | Tumaliuan et al. (2024) baseline | — | 0.8100 | — | |
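
The three metric columns above can be reproduced from a list of true labels and a list of predictions. The following is a minimal pure-Python sketch, not the project's evaluation code: the function names and the toy labels are hypothetical, and a real pipeline would more likely call a library implementation.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); use 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement between truth and prediction, corrected for chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies.
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate single-label case; kappa is undefined
    return (p_o - p_e) / (1 - p_e)


# Toy balanced binary example (hypothetical labels, not project data).
y_true = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), cohen_kappa(y_true, y_pred))
```

When the true labels of a binary task are split 50/50, chance agreement is exactly 0.5, so κ reduces to 2 × accuracy − 1. The 50K Dataset 3 rows are consistent with this if their test set is balanced (for example, 98.10% accuracy and κ = 0.9620).
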
Dataset 3 has 232,074 Reddit posts. Our deployed models were trained on a 50K sample (25K per class). At our professor's request, we split the full corpus into two halves and retrained on each half and on the full corpus, to check whether the 50K sample was sufficient and representative.
| Split | Model | Accuracy | Macro F1 | Cohen's κ | AUC-ROC | Verdict |
|---|---|---|---|---|---|---|
| Our 50K ★ | XLM-RoBERTa | 98.10% | 0.9810 | 0.9620 | — | Best overall |
| Our 50K | SVM | 93.68% | 0.9368 | 0.8736 | 0.9831 | |
| Our 50K | Logistic Regression | 93.18% | 0.9318 | 0.8636 | 0.9817 | |
| Our 50K | XGBoost | 91.62% | 0.9162 | 0.8324 | — | |
| Full 232K | XLM-RoBERTa | 98.02% | 0.9802 | 0.9604 | — | −0.0008 vs 50K |
| Full 232K | SVM | 94.60% | 0.9460 | 0.8919 | 0.9862 | |
| Full 232K | Logistic Regression | 94.34% | 0.9434 | 0.8868 | 0.9858 | |
| Full 232K | XGBoost | 70.52% | 0.6998 | 0.4104 | 0.7064 | Collapsed ↓ |
| H1 116K | XLM-RoBERTa | 97.78% | 0.9778 | 0.9556 | — | |
| H1 116K | SVM | 94.18% | 0.9418 | 0.8836 | 0.9835 | |
| H1 116K | Logistic Regression | 93.84% | 0.9384 | 0.8769 | 0.9824 | |
| H1 116K | XGBoost | 60.11% | 0.5521 | 0.2017 | 0.6051 | Worst result ↓ |
| H2 116K | XLM-RoBERTa | 98.02% | 0.9802 | 0.9604 | — | |
| H2 116K | SVM | 94.21% | 0.9421 | 0.8842 | 0.9850 | |
| H2 116K | Logistic Regression | 93.74% | 0.9374 | 0.8748 | 0.9832 | |
| H2 116K | XGBoost | 71.00% | 0.7085 | 0.4201 | 0.6805 | Collapsed ↓ |
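
One way to produce two halves like H1 and H2 is a seeded, per-class shuffle followed by a cut at the midpoint of each class. The sketch below is an assumption about how such a split could be done, not a record of how it was done: the source does not say whether the halves were stratified, and `split_into_halves`, the record layout, and the label strings are all hypothetical.

```python
import random
from collections import defaultdict


def split_into_halves(records, label_of, seed=42):
    """Split records into two disjoint halves with matching class balance.

    records  : list of arbitrary items (e.g. (text, label) tuples)
    label_of : function mapping a record to its class label
    seed     : fixed seed so the split is reproducible
    """
    by_label = defaultdict(list)
    for record in records:
        by_label[label_of(record)].append(record)

    rng = random.Random(seed)
    half_1, half_2 = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)          # shuffle within the class only
        mid = len(group) // 2       # cut each class at its midpoint
        half_1.extend(group[:mid])
        half_2.extend(group[mid:])
    return half_1, half_2


# Toy corpus with hypothetical labels: 6 posts per class.
corpus = [(f"post {i}", "suicide" if i % 2 == 0 else "non-suicide")
          for i in range(12)]
h1, h2 = split_into_halves(corpus, label_of=lambda r: r[1])
print(len(h1), len(h2))
```

Splitting within each class keeps the label ratio the same in both halves, so metric differences between halves reflect the data and the model, not a shift in class balance.
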