---
title: MindScan
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# MindScan: Mental Health Detection System
### NCI H9DAI Research Project 2026 · MSc Artificial Intelligence

A multi-model mental health text analysis system. It runs 12 ML classifiers (4 models trained on each of 3 datasets) on any input text and returns a depression type, a binary depression likelihood, and a suicide risk score.

---

## Project Structure

```
MindScan/
├── app.py              Flask backend (start here)
├── predict.py          Prediction logic (all 12 models)
├── requirements.txt    Python dependencies
├── README.md           This file
├── templates/
│   └── index.html      UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/      Download from Google Drive (see below)
│   └── transformers/   Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb   Classical model training
    └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison
```

---
## GitHub Link
https://github.com/Amod069/MindScan



## Setup

### 1. Download model files from Google Drive
Download `MindScan_Models/` from Google Drive:

https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing

Place the contents like this:

```
models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/
```
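Before starting the server, it can help to confirm that the download landed in the right place. The script below is a minimal sketch and not part of the repository: it assumes the abbreviated entries in the listing (`_d2.pkl`, `_d3.pkl`) expand to the full prefix, for example `svm_d2.pkl`, and the helper names `expected_paths` and `missing_paths` are hypothetical.

```python
from pathlib import Path

# Assumption: every classical artifact exists once per dataset (d1, d2, d3),
# so "_d2.pkl" in the listing is read as "<prefix>_d2.pkl".
DATASETS = ("d1", "d2", "d3")
CLASSICAL_PREFIXES = ("le", "tfidf", "logistic_regression", "svm", "xgboost")


def expected_paths(root):
    """Return every model path named in the layout, rooted at `root`."""
    root = Path(root)
    paths = [
        root / "classical" / f"{prefix}_{ds}.pkl"
        for prefix in CLASSICAL_PREFIXES
        for ds in DATASETS
    ]
    paths += [root / "transformers" / f"xlmr_{ds}_final" for ds in DATASETS]
    return paths


def missing_paths(root):
    """Return the expected paths that do not exist under `root`."""
    return [p for p in expected_paths(root) if not p.exists()]


if __name__ == "__main__":
    missing = missing_paths("models")
    for path in missing:
        print(f"missing: {path}")
    print("all model files found" if not missing else f"{len(missing)} missing")
```

Run it from the `MindScan/` directory so that the relative `models` path resolves.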

### 2. Create Python environment
```bash
python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

### 3. Install dependencies
```bash
pip install -r requirements.txt
```

### 4. Run the server
```bash
python app.py
```

### 5. Open the UI
```
http://localhost:5000
```

**Note:** First startup takes ~30 seconds while XLM-RoBERTa models load into memory.

---

## The 3 Datasets

| | Dataset | Source | Size | Task |
|---|---|---|---|---|
| D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk |

## The 4 Models (per dataset = 12 total)

1. **Logistic Regression**: simple linear baseline
2. **SVM (LinearSVC)**: classical NLP gold standard
3. **XGBoost**: gradient boosting
4. **XLM-RoBERTa**: transformer, contextual embeddings

*Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).*

---

## Real Results

| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | **SVM** | 0.9269 | 0.9072 |
| D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 |
| D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 |

**Key finding:** SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.
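For readers unfamiliar with the two metrics in the table, the sketch below shows how macro F1 and Cohen's kappa are computed from true and predicted labels. It is a plain-Python illustration of the standard definitions, not the evaluation code from the notebooks (which presumably rely on a library such as scikit-learn), and the toy labels are made up.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed per class.
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if expected == 1:
        return float("nan")  # undefined when only one class is present
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    y_true = [0, 0, 1, 1]  # toy labels, not project data
    y_pred = [0, 1, 1, 1]
    print(round(macro_f1(y_true, y_pred), 4))
    print(round(cohen_kappa(y_true, y_pred), 4))
```

Macro F1 weights every class equally, which matters for the 6-class D1 task where class sizes may differ; kappa discounts the agreement a classifier would reach by chance.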

---

## API

**POST /predict**
```json
// Request
{ "text": "your text here" }

// Response
{
  "dataset1": {
    "task": "Depression Type (6 Classes)",
    "models": {
      "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
      "SVM":                  { "label": "postpartum", "confidence": 0.828 },
      "XGBoost":              { "label": "postpartum", "confidence": 0.999 },
      "XLM-RoBERTa":         { "label": "postpartum", "confidence": 0.997 }
    },
    "winner_model": "XGBoost",
    "winner_prediction": "postpartum",
    "winner_confidence": 0.999,
    "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
  },
  "dataset2": { ... },
  "dataset3": { ... },
  "risk_flag": false,
  "suicide_votes": "0/4 models flagged suicide risk",
  "processing_time_ms": 2341
}
```
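A minimal client for this endpoint might look like the sketch below. It uses only the standard library and assumes the server is running locally on port 5000 as described in Setup. The helper names (`build_predict_request`, `predict`, `summarize`) are hypothetical, and `summarize` assumes that `dataset2` and `dataset3` carry the same `winner_*` keys as `dataset1`, which the elided example does not confirm.

```python
import json
from urllib import request

BASE_URL = "http://localhost:5000"  # assumption: local server from the Setup section


def build_predict_request(text, base_url=BASE_URL):
    """Build the POST /predict request with a JSON body of {"text": ...}."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=60):
    """Send the text to the server and return the parsed JSON response."""
    req = build_predict_request(text, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize(response):
    """Collect (model, prediction, confidence) for each dataset that reports a winner."""
    summary = {}
    for key in ("dataset1", "dataset2", "dataset3"):
        block = response.get(key) or {}
        if "winner_model" in block:
            summary[key] = (
                block["winner_model"],
                block.get("winner_prediction"),
                block.get("winner_confidence"),
            )
    return summary


if __name__ == "__main__":
    result = predict("your text here")
    for dataset, winner in summarize(result).items():
        print(dataset, winner)
    print("risk_flag:", result.get("risk_flag"))
```

The generous timeout is deliberate: the example response above reports a processing time of over two seconds for a single request.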

**GET /health**
```json
{ "status": "ok", "models_ready": true }
```
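Because the transformer models take a while to load at startup, a script that calls the API may want to poll this endpoint first. The following is a sketch under the assumption that `models_ready` switches to `true` once loading finishes; the function names and the retry settings are illustrative, not part of the project.

```python
import json
import time
from urllib import error, request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """Return the parsed /health body, or None if the server is unreachable."""
    try:
        with request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.load(resp)
    except (error.URLError, OSError, ValueError):
        return None


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until /health reports status "ok" with models_ready true."""
    for _ in range(attempts):
        body = fetch()
        if body and body.get("status") == "ok" and body.get("models_ready"):
            return True
        sleep(delay)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "server did not become ready")
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server.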

---

## Disclaimer

This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.

---

*NCI H9DAI · Data Analytics for Artificial Intelligence · 2026*