MindScan — Mental Health Detection System

NCI H9DAI Research Project 2026 · MSc Artificial Intelligence

A multi-model mental health text analysis system that runs 12 ML classifiers across 3 datasets simultaneously, returning depression type, binary depression likelihood, and suicide risk scores for any input text.

Project Structure

MindScan/
├── app.py              Flask backend — start here
├── predict.py          Prediction logic (all 12 models)
├── requirements.txt    Python dependencies
├── README.md           This file
├── templates/
│   └── index.html      UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/      Download from Google Drive (see below)
│   └── transformers/   Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb   Classical model training
    └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison

Github Link

https://github.com/Amod069/MindScan

Setup

1. Download model files from Google Drive

Download MindScan_Models/ from Google Drive and place the contents like this: https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing

models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/

2. Create Python environment

python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Run the server

python app.py

5. Open the UI

http://localhost:5000

Note: First startup takes ~30 seconds while XLM-RoBERTa models load into memory.

The 3 Datasets

	Dataset	Source	Size	Task
D1	Nusrat et al. (2024)	Zenodo 14233292	14,983 tweets	6-class depression type
D2	albertobellardini	Kaggle	10,314 tweets	Binary depression
D3	nikhileswarkomati	Kaggle	50,000 Reddit posts	Binary suicide risk

The 4 Models (per dataset = 12 total)

Logistic Regression — simple linear baseline
SVM (LinearSVC) — classical NLP gold standard
XGBoost — gradient boosting
XLM-RoBERTa — transformer, contextual embeddings

Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).

Real Results

Dataset	Best Model	Macro F1	Cohen's Kappa
D1 Depression Type	SVM	0.9269	0.9072
D2 Binary Depression	XLM-RoBERTa	0.9993	0.9986
D3 Suicide Risk	XLM-RoBERTa	0.9810	0.9620

Key finding: SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.

API

POST /predict

// Request
{ "text": "your text here" }

// Response
{
  "dataset1": {
    "task": "Depression Type (6 Classes)",
    "models": {
      "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
      "SVM":                  { "label": "postpartum", "confidence": 0.828 },
      "XGBoost":              { "label": "postpartum", "confidence": 0.999 },
      "XLM-RoBERTa":         { "label": "postpartum", "confidence": 0.997 }
    },
    "winner_model": "XGBoost",
    "winner_prediction": "postpartum",
    "winner_confidence": 0.999,
    "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
  },
  "dataset2": { ... },
  "dataset3": { ... },
  "risk_flag": false,
  "suicide_votes": "0/4 models flagged suicide risk",
  "processing_time_ms": 2341
}

GET /health

{ "status": "ok", "models_ready": true }

Disclaimer

This system is a research prototype built for academic coursework. It is not a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.

NCI H9DAI · Data Analytics for Artificial Intelligence · 2026