# MindScan — Mental Health Detection System
### NCI H9DAI Research Project 2026 · MSc Artificial Intelligence

A multi-model mental health text analysis system. For any input text, it runs 12 ML classifiers (4 model types, each trained on 3 datasets) in a single request and returns a depression type, a binary depression prediction, and a suicide risk prediction, each with a confidence score.

---

## Project Structure

```
MindScan/
├── app.py              Flask backend — start here
├── predict.py          Prediction logic (all 12 models)
├── requirements.txt    Python dependencies
├── README.md           This file
├── templates/
│   └── index.html      UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/      Download from Google Drive (see below)
│   └── transformers/   Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb   Classical model training
    └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison
```

---

## GitHub Link

https://github.com/Amod069/MindScan

## Setup

### 1. Download model files from Google Drive
Download the `MindScan_Models/` folder from Google Drive:
https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing

Then place its contents under `models/` like this:

```
models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/
```
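Before starting the server, it can help to confirm that the downloaded files landed where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the abbreviated entries (`_d2.pkl`, `_d3.pkl`) expand to the same prefix as the first file on each line (for example `svm_d2.pkl`).

```python
from pathlib import Path

DATASETS = ("d1", "d2", "d3")
# Assumption: each abbreviated "_d2.pkl" / "_d3.pkl" entry shares the prefix
# of the first file listed on the same line of the layout.
CLASSICAL_PREFIXES = ("le", "tfidf", "logistic_regression", "svm", "xgboost")


def expected_model_paths(root="models"):
    """Return every file or folder the layout implies, relative to `root`."""
    root = Path(root)
    paths = [
        root / "classical" / f"{prefix}_{ds}.pkl"
        for prefix in CLASSICAL_PREFIXES
        for ds in DATASETS
    ]
    paths += [root / "transformers" / f"xlmr_{ds}_final" for ds in DATASETS]
    return paths


def missing_model_paths(root="models"):
    """Return the expected paths that do not exist on disk."""
    return [path for path in expected_model_paths(root) if not path.exists()]
```

Running `missing_model_paths()` from the project root returns an empty list when all 15 classical files and 3 transformer folders are in place, and otherwise lists what still needs to be copied.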

### 2. Create Python environment
```bash
python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate
```

### 3. Install dependencies
```bash
pip install -r requirements.txt
```

### 4. Run the server
```bash
python app.py
```

### 5. Open the UI
```
http://localhost:5000
```

**Note:** First startup takes ~30 seconds while XLM-RoBERTa models load into memory.

---

## The 3 Datasets

| ID | Dataset | Source | Size | Task |
|---|---|---|---|---|
| D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk |

## The 4 Models (per dataset = 12 total)

1. **Logistic Regression** — simple linear baseline
2. **SVM (LinearSVC)** — classical NLP gold standard
3. **XGBoost** — gradient boosting
4. **XLM-RoBERTa** — transformer, contextual embeddings

*Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).*

---

## Results

| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | **SVM** | 0.9269 | 0.9072 |
| D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 |
| D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 |

**Key finding:** SVM outperforms XLM-RoBERTa on the 6-class depression type task (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1 = 0.81.
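For readers unfamiliar with the two metrics in the table, here is a small pure-Python sketch of how macro F1 and Cohen's kappa are defined. It is illustrative only: the notebooks may compute these values differently (for instance through a library), and the toy labels in the usage note are made up, not taken from any of the datasets.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both pick the same class independently.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate single-class case; returning 1.0 is a convention chosen here.
        return 1.0
    return (observed - expected) / (1 - expected)
```

With the toy inputs `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, the per-class F1 scores are 2/3 and 0.8, so macro F1 is about 0.733, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5. Macro averaging weights every class equally, which matters on the 6-class D1 task where class sizes may differ.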

---

## API

**POST /predict**
```json
// Request
{ "text": "your text here" }

// Response
{
  "dataset1": {
    "task": "Depression Type (6 Classes)",
    "models": {
      "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
      "SVM":                  { "label": "postpartum", "confidence": 0.828 },
      "XGBoost":              { "label": "postpartum", "confidence": 0.999 },
      "XLM-RoBERTa":         { "label": "postpartum", "confidence": 0.997 }
    },
    "winner_model": "XGBoost",
    "winner_prediction": "postpartum",
    "winner_confidence": 0.999,
    "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
  },
  "dataset2": { ... },
  "dataset3": { ... },
  "risk_flag": false,
  "suicide_votes": "0/4 models flagged suicide risk",
  "processing_time_ms": 2341
}
```
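A minimal Python client for this endpoint might look like the sketch below. It uses only the standard library. The function names are hypothetical, the base URL is taken from the Setup section, and it assumes the server accepts a JSON body sent with a `Content-Type: application/json` header.

```python
import json
from urllib import request

BASE_URL = "http://localhost:5000"  # assumption: default address from Setup


def build_predict_request(text, base_url=BASE_URL):
    """Build a POST /predict request carrying {"text": ...} as JSON."""
    body = json.dumps({"text": text}).encode("utf-8")
    return request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=60):
    """Send the text to a running MindScan server and return the parsed JSON."""
    req = build_predict_request(text, base_url)
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(response):
    """Pick the headline fields out of a /predict response dict."""
    summary = {}
    for key in ("dataset1", "dataset2", "dataset3"):
        block = response.get(key, {})
        summary[key] = (
            block.get("winner_model"),
            block.get("winner_prediction"),
            block.get("winner_confidence"),
        )
    summary["risk_flag"] = response.get("risk_flag")
    return summary
```

With the server running, `summarize(predict("your text here"))` returns one `(model, label, confidence)` tuple per dataset plus the overall `risk_flag`. The generous default timeout reflects the multi-second `processing_time_ms` in the sample response.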

**GET /health**
```json
{ "status": "ok", "models_ready": true }
```
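Because the models take a while to load at startup, a script that calls the API right after launching the server may want to wait for `/health` first. The sketch below is one possible approach using only the standard library. The function names are hypothetical, and it assumes that `models_ready` is `false` (or the connection is refused) until loading finishes.

```python
import json
import time
from urllib import request


def is_ready(payload):
    """True when a /health payload reports the server and models as ready."""
    return payload.get("status") == "ok" and payload.get("models_ready") is True


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """GET /health and return the parsed JSON body."""
    with request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll `fetch` until it reports ready; return False if attempts run out."""
    for _ in range(attempts):
        try:
            if is_ready(fetch()):
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body:
            # treat all of these as "not up yet" and retry.
            pass
        sleep(delay)
    return False
```

Calling `wait_until_ready()` before the first `/predict` request blocks for up to about a minute with the defaults shown. The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.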

---

## Disclaimer

This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.

---

*NCI H9DAI · Data Analytics for Artificial Intelligence · 2026*