Add README.md
README.md
---
title: MindScan
emoji: 🧠
colorFrom: indigo
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
---

# MindScan: Mental Health Detection System
### NCI H9DAI Research Project 2026 · MSc Artificial Intelligence

A multi-model mental health text analysis system that runs 12 ML classifiers (4 models on each of 3 datasets) on any input text and returns a depression type, a binary depression likelihood, and a suicide risk score.

---

## Project Structure

```
MindScan/
├── app.py                      Flask backend (start here)
├── predict.py                  Prediction logic (all 12 models)
├── requirements.txt            Python dependencies
├── README.md                   This file
├── templates/
│   └── index.html              UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/              Download from Google Drive (see below)
│   └── transformers/           Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb   Classical model training
    └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison
```

---
## GitHub Link
https://github.com/Amod069/MindScan

## Setup

### 1. Download model files from Google Drive
Download `MindScan_Models/` from Google Drive:

https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing

Then place the contents like this:

```
models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/
```

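Before starting the server, it can help to confirm that every file from the layout above is in place. The following is a minimal sketch, not part of the project: the helper names `expected_paths` and `missing_model_files` are hypothetical, and it assumes that `_d2.pkl` in the tree is shorthand for `<prefix>_d2.pkl` (for example `svm_d2.pkl`).

```python
from pathlib import Path

# Expected layout, expanded from the tree above. Reading "_d2.pkl" as shorthand
# for "<prefix>_d2.pkl" is an assumption, not something the README states.
CLASSICAL_PREFIXES = ["le", "tfidf", "logistic_regression", "svm", "xgboost"]
DATASETS = ["d1", "d2", "d3"]


def expected_paths(models_dir):
    """Return every file or folder the layout lists under models_dir."""
    root = Path(models_dir)
    paths = [
        root / "classical" / f"{prefix}_{ds}.pkl"
        for prefix in CLASSICAL_PREFIXES
        for ds in DATASETS
    ]
    paths += [root / "transformers" / f"xlmr_{ds}_final" for ds in DATASETS]
    return paths


def missing_model_files(models_dir="models"):
    """List expected paths that do not exist yet (empty list means complete)."""
    return [p for p in expected_paths(models_dir) if not p.exists()]


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected model files are in place.")
```

Run it from the `MindScan/` folder so that the relative `models/` path resolves.
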
### 2. Create Python environment
```bash
python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

### 3. Install dependencies
```bash
pip install -r requirements.txt
```

### 4. Run the server
```bash
python app.py
```

### 5. Open the UI
```
http://localhost:5000
```

**Note:** First startup takes ~30 seconds while the XLM-RoBERTa models load into memory.

---

## The 3 Datasets

| ID | Dataset | Source | Size | Task |
|---|---|---|---|---|
| D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk |

## The 4 Models (per dataset = 12 total)

1. **Logistic Regression**: simple linear baseline
2. **SVM (LinearSVC)**: strong classical baseline for text classification
3. **XGBoost**: gradient boosting
4. **XLM-RoBERTa**: transformer, contextual embeddings

*Note: Random Forest was excluded from deployment (646 MB files, worst performer on D1/D3).*

---

## Real Results

| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | **SVM** | 0.9269 | 0.9072 |
| D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 |
| D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 |

**Key finding:** SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1 = 0.81.

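For readers unfamiliar with the two metrics in the table, here is a small pure-Python sketch of their standard definitions. It is illustrative only: the README does not say which implementation produced the reported scores, and the labels below are toy data, not project results.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Kappa is undefined when only one class ever occurs; this sketch
        # reports 0.0 in that case.
        return 0.0
    return (observed - expected) / (1 - expected)


# Toy example with three made-up classes.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "c", "c", "c"]
print(round(macro_f1(y_true, y_pred), 4))      # 0.8222
print(round(cohens_kappa(y_true, y_pred), 4))  # 0.75
```

Macro F1 weights every class equally, which matters for the 6-class D1 task if classes are imbalanced. Kappa discounts the agreement that a classifier would reach by chance.
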
---

## API

**POST /predict**
```json
// Request
{ "text": "your text here" }

// Response
{
  "dataset1": {
    "task": "Depression Type (6 Classes)",
    "models": {
      "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
      "SVM": { "label": "postpartum", "confidence": 0.828 },
      "XGBoost": { "label": "postpartum", "confidence": 0.999 },
      "XLM-RoBERTa": { "label": "postpartum", "confidence": 0.997 }
    },
    "winner_model": "XGBoost",
    "winner_prediction": "postpartum",
    "winner_confidence": 0.999,
    "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
  },
  "dataset2": { ... },
  "dataset3": { ... },
  "risk_flag": false,
  "suicide_votes": "0/4 models flagged suicide risk",
  "processing_time_ms": 2341
}
```

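A client for this endpoint might look like the sketch below, which uses only the Python standard library. The function names are hypothetical, and the code assumes that the server runs at `http://localhost:5000` as in the Setup steps and accepts a JSON body with a `Content-Type: application/json` header.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local server from the Setup steps


def build_predict_request(text, base_url=BASE_URL):
    """Build the POST /predict request with a JSON body of {"text": ...}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=60):
    """Send the text to the server and return the decoded JSON response."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(result):
    """Collect the winner fields of each datasetN entry plus the risk fields."""
    summary = {}
    for key, value in result.items():
        if key.startswith("dataset") and isinstance(value, dict):
            summary[key] = (
                value.get("winner_model"),
                value.get("winner_prediction"),
                value.get("winner_confidence"),
            )
    summary["risk_flag"] = result.get("risk_flag")
    summary["suicide_votes"] = result.get("suicide_votes")
    return summary
```

With the server running, `summarize(predict("your text here"))` returns one `(model, prediction, confidence)` tuple per dataset, alongside `risk_flag` and `suicide_votes`.
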
**GET /health**
```json
{ "status": "ok", "models_ready": true }
```

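Because the first startup takes about 30 seconds, a script that calls the API may want to poll `/health` until `models_ready` is true. The sketch below is one way to do that. The function names, the timeout, and the polling interval are illustrative choices, and the base URL assumes the local server from the Setup steps.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url):
    """Return the parsed /health JSON, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def wait_until_ready(base_url="http://localhost:5000", timeout_s=120.0,
                     interval_s=2.0, fetch=fetch_health, sleep=time.sleep):
    """Poll /health until models_ready is true; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        health = fetch(base_url)
        if health and health.get("models_ready") is True:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

The `fetch` and `sleep` parameters exist only so that the loop can be exercised without a running server.
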
---

## Disclaimer

This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.

---

*NCI H9DAI · Data Analytics for Artificial Intelligence · 2026*