# MindScan: Mental Health Detection System
### NCI H9DAI Research Project 2026 · MSc Artificial Intelligence
A multi-model mental health text analysis system that runs 12 ML classifiers (4 model types, each trained on 3 datasets) on every input text and returns a depression type, a binary depression likelihood, and a suicide risk score.
---
## Project Structure
```
MindScan/
├── app.py                     Flask backend (start here)
├── predict.py                 Prediction logic (all 12 models)
├── requirements.txt           Python dependencies
├── README.md                  This file
├── templates/
│   └── index.html             UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/             Download from Google Drive (see below)
│   └── transformers/          Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb  Classical model training
    └── DA_2_Notebook.ipynb    XLM-RoBERTa + comparison
```
---
## GitHub Link
https://github.com/Amod069/MindScan
## Setup
### 1. Download model files from Google Drive
Download the `MindScan_Models/` folder from Google Drive (https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing) and place its contents under `models/` like this:
```
models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/
```
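
To confirm the download landed in the right place before starting the server, a small standard-library check can help. This is a hypothetical helper, not part of the repo: it assumes the `_d2.pkl, _d3.pkl` shorthand in the tree above expands to full names such as `svm_d2.pkl`, and that it is run from the `MindScan/` root.

```python
from pathlib import Path

# Assumed filenames: the tree's "_d2.pkl, _d3.pkl" shorthand expanded to
# full names (e.g. svm_d2.pkl). Adjust these lists if your files differ.
CLASSICAL_PREFIXES = ["le", "tfidf", "logistic_regression", "svm", "xgboost"]
DATASETS = ["d1", "d2", "d3"]


def expected_model_paths(models_dir):
    """Return every file or folder the models/ tree is expected to contain."""
    root = Path(models_dir)
    paths = [
        root / "classical" / f"{prefix}_{ds}.pkl"
        for prefix in CLASSICAL_PREFIXES
        for ds in DATASETS
    ]
    paths += [root / "transformers" / f"xlmr_{ds}_final" for ds in DATASETS]
    return paths


def missing_model_files(models_dir="models"):
    """List expected paths that do not exist yet (empty means all present)."""
    return [p for p in expected_model_paths(models_dir) if not p.exists()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:")
        for p in missing:
            print("  ", p)
    else:
        print("All 15 classical files and 3 XLM-RoBERTa folders found.")
```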
### 2. Create Python environment
```bash
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
```
### 3. Install dependencies
```bash
pip install -r requirements.txt
```
### 4. Run the server
```bash
python app.py
```
### 5. Open the UI
```
http://localhost:5000
```
**Note:** First startup takes ~30 seconds while XLM-RoBERTa models load into memory.
---
## The 3 Datasets
| ID | Dataset | Source | Size | Task |
|---|---|---|---|---|
| D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk |
## The 4 Models (per dataset, 12 total)
1. **Logistic Regression**: simple linear baseline
2. **SVM (LinearSVC)**: classical NLP gold standard
3. **XGBoost**: gradient boosting
4. **XLM-RoBERTa**: transformer with contextual embeddings
*Note: Random Forest was excluded from deployment (646 MB of model files, and the worst performer on D1 and D3).*
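
scikit-learn's `LinearSVC` provides `decision_function` but no `predict_proba`, and this README does not say how the `SVM` confidence in the API response is derived. One lightweight option, sketched here as an assumption rather than the project's method, is a softmax over the per-class decision scores (`CalibratedClassifierCV` is the calibrated alternative). Softmaxed SVM margins are a ranking signal, not calibrated probabilities, and for the binary D2 and D3 models `decision_function` returns a single score per text, so this sketch applies as written only to the 6-class D1 case.

```python
import math


def softmax(scores):
    """Convert raw per-class scores into positive values that sum to 1."""
    m = max(scores)  # subtract the max so math.exp cannot overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def label_and_confidence(scores, classes):
    """Return the top-scoring class name and its softmax value."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return classes[best], probs[best]


# Hypothetical usage with the pickled D1 objects (names from the models/ tree),
# assuming the SVM was trained on le-encoded labels so column order matches:
#   scores = svm.decision_function(tfidf.transform([text]))[0]  # one score per class
#   label, conf = label_and_confidence(list(scores), list(le.classes_))
```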
---
## Real Results
| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | **SVM** | 0.9269 | 0.9072 |
| D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 |
| D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 |
**Key finding:** SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.
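
Both metrics in the table can be computed from first principles. The pure-Python sketch below is for readers checking their own runs; on ordinary inputs it should agree with scikit-learn's `f1_score(average="macro")` and `cohen_kappa_score`, but it is an illustration, not the code that produced the numbers above. Macro F1 averages per-class F1 without weighting by class size, which matters for the 6-class D1 task; Cohen's kappa measures agreement beyond what the label frequencies alone would produce by chance.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / len(f1s)


def cohens_kappa(y_true, y_pred):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = len(y_true)
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n  # observed agreement
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Expected agreement if predictions were independent of the true labels.
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, with `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, the per-class F1 scores are 2/3 and 0.8, so macro F1 is about 0.733; observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5.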
---
## API
**POST /predict**
```json
// Request
{ "text": "your text here" }

// Response
{
  "dataset1": {
    "task": "Depression Type (6 Classes)",
    "models": {
      "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
      "SVM":                 { "label": "postpartum", "confidence": 0.828 },
      "XGBoost":             { "label": "postpartum", "confidence": 0.999 },
      "XLM-RoBERTa":         { "label": "postpartum", "confidence": 0.997 }
    },
    "winner_model": "XGBoost",
    "winner_prediction": "postpartum",
    "winner_confidence": 0.999,
    "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
  },
  "dataset2": { ... },
  "dataset3": { ... },
  "risk_flag": false,
  "suicide_votes": "0/4 models flagged suicide risk",
  "processing_time_ms": 2341
}
```
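
For calling `POST /predict` from Python, a standard-library client sketch follows. These helpers are hypothetical and not part of the repo: `BASE_URL` assumes the local server from the Setup section, and `summarize` assumes `dataset2` and `dataset3` have the same shape as `dataset1` (the example above elides them as `{ ... }`).

```python
import json
import urllib.request

# Assumed base URL: the local Flask server started with `python app.py`.
BASE_URL = "http://localhost:5000"


def build_predict_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST /predict request with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=60):
    """Send the request and return the decoded JSON response as a dict."""
    req = build_predict_request(text, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(result):
    """One line per dataset from the winner_* fields, plus the risk flag."""
    lines = []
    for key in ("dataset1", "dataset2", "dataset3"):
        d = result[key]
        lines.append(
            f"{d['task']}: {d['winner_prediction']} "
            f"({d['winner_model']}, {d['winner_confidence']:.3f})"
        )
    lines.append(f"risk_flag={result['risk_flag']} ({result['suicide_votes']})")
    return lines
```

With the server running, `print("\n".join(summarize(predict("your text here"))))` prints one line per dataset followed by the risk flag. The generous default `timeout` allows for the multi-second processing times shown in `processing_time_ms`.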
**GET /health**
```json
{ "status": "ok", "models_ready": true }
```
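
Because the first startup takes ~30 seconds, a script that calls `/predict` right after `python app.py` may want to wait on `/health` first. A minimal standard-library sketch, assuming the JSON shape shown above; the README does not say whether the server answers `/health` before the models finish loading, so connection errors are treated as "not up yet" alongside `models_ready: false`. The `fetch` parameter is a hypothetical hook that makes the loop testable without a live server.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """GET /health and return the JSON body as a dict."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(fetch=fetch_health, max_wait=60.0, interval=2.0):
    """Poll until models_ready is true; return False if max_wait elapses."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        try:
            if fetch().get("models_ready"):
                return True
        except (OSError, ValueError):
            # URLError, refused connections and timeouts are OSError subclasses;
            # ValueError covers a non-JSON body. Either way, keep polling.
            pass
        time.sleep(interval)
    return False
```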
---
## Disclaimer
This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.
---
*NCI H9DAI · Data Analytics for Artificial Intelligence · 2026*