Brain / README.md
Esvanth's picture
Upload folder using huggingface_hub
016c645 verified
# MindScan β€” Mental Health Detection System
### NCI H9DAI Research Project 2026 Β· MSc Artificial Intelligence
A multi-model mental health text analysis system that runs 12 ML classifiers across 3 datasets simultaneously, returning depression type, binary depression likelihood, and suicide risk scores for any input text.
---
## Project Structure
```
MindScan/
β”œβ”€β”€ app.py Flask backend β€” start here
β”œβ”€β”€ predict.py Prediction logic (all 12 models)
β”œβ”€β”€ requirements.txt Python dependencies
β”œβ”€β”€ README.md This file
β”œβ”€β”€ templates/
β”‚ └── index.html UI (served by Flask at localhost:5000)
β”œβ”€β”€ models/
β”‚ β”œβ”€β”€ classical/ Download from Google Drive (see below)
β”‚ └── transformers/ Download from Google Drive (see below)
└── notebooks/
β”œβ”€β”€ DA_Notebook_One.ipynb Classical model training
└── DA_2_Notebook.ipynb XLM-RoBERTa + comparison
```
---
## Github Link
https://github.com/Amod069/MindScan
## Setup
### 1. Download model files from Google Drive
Download `MindScan_Models/` from Google Drive and place the contents like this:
https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing
```
models/
β”œβ”€β”€ classical/
β”‚ β”œβ”€β”€ le_d1.pkl, le_d2.pkl, le_d3.pkl
β”‚ β”œβ”€β”€ tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
β”‚ β”œβ”€β”€ logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
β”‚ β”œβ”€β”€ svm_d1.pkl, _d2.pkl, _d3.pkl
β”‚ └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
β”œβ”€β”€ xlmr_d1_final/
β”œβ”€β”€ xlmr_d2_final/
└── xlmr_d3_final/
```
### 2. Create Python environment
```bash
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
```
### 3. Install dependencies
```bash
pip install -r requirements.txt
```
### 4. Run the server
```bash
python app.py
```
### 5. Open the UI
```
http://localhost:5000
```
**Note:** First startup takes ~30 seconds while XLM-RoBERTa models load into memory.
---
## The 3 Datasets
| | Dataset | Source | Size | Task |
|---|---|---|---|---|
| D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk |
## The 4 Models (per dataset = 12 total)
1. **Logistic Regression** β€” simple linear baseline
2. **SVM (LinearSVC)** β€” classical NLP gold standard
3. **XGBoost** β€” gradient boosting
4. **XLM-RoBERTa** β€” transformer, contextual embeddings
*Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).*
---
## Real Results
| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | **SVM** | 0.9269 | 0.9072 |
| D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 |
| D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 |
**Key finding:** SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.
---
## API
**POST /predict**
```json
// Request
{ "text": "your text here" }
// Response
{
"dataset1": {
"task": "Depression Type (6 Classes)",
"models": {
"Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
"SVM": { "label": "postpartum", "confidence": 0.828 },
"XGBoost": { "label": "postpartum", "confidence": 0.999 },
"XLM-RoBERTa": { "label": "postpartum", "confidence": 0.997 }
},
"winner_model": "XGBoost",
"winner_prediction": "postpartum",
"winner_confidence": 0.999,
"class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
},
"dataset2": { ... },
"dataset3": { ... },
"risk_flag": false,
"suicide_votes": "0/4 models flagged suicide risk",
"processing_time_ms": 2341
}
```
**GET /health**
```json
{ "status": "ok", "models_ready": true }
```
---
## Disclaimer
This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.
---
*NCI H9DAI Β· Data Analytics for Artificial Intelligence Β· 2026*