
MindScan: Mental Health Detection System

NCI H9DAI Research Project 2026 · MSc Artificial Intelligence

A multi-model mental health text analysis system that runs 12 ML classifiers (4 model types trained on each of 3 datasets) on any input text and returns the predicted depression type, a binary depression likelihood, and a suicide risk score.


Project Structure

MindScan/
├── app.py              Flask backend (start here)
├── predict.py          Prediction logic (all 12 models)
├── requirements.txt    Python dependencies
├── README.md           This file
├── templates/
│   └── index.html      UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/      Download from Google Drive (see below)
│   └── transformers/   Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb   Classical model training
    └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison

GitHub Link

https://github.com/Amod069/MindScan

Setup

1. Download model files from Google Drive

Download MindScan_Models/ from Google Drive (https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing) and place its contents under models/ like this:

models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/

2. Create Python environment

python -m venv venv

# Mac/Linux
source venv/bin/activate

# Windows
venv\Scripts\activate

3. Install dependencies

pip install -r requirements.txt

4. Run the server

python app.py

5. Open the UI

http://localhost:5000

Note: First startup takes ~30 seconds while XLM-RoBERTa models load into memory.


The 3 Datasets

| Dataset | Source | Size | Task |
|---|---|---|---|
| D1 | Nusrat et al. (2024), Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini, Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati, Kaggle | 50,000 Reddit posts | Binary suicide risk |

The 4 Models (per dataset = 12 total)

  1. Logistic Regression: simple linear baseline
  2. SVM (LinearSVC): classical NLP gold standard
  3. XGBoost: gradient boosting
  4. XLM-RoBERTa: transformer, contextual embeddings

Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).


Real Results

| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | SVM | 0.9269 | 0.9072 |
| D2 Binary Depression | XLM-RoBERTa | 0.9993 | 0.9986 |
| D3 Suicide Risk | XLM-RoBERTa | 0.9810 | 0.9620 |

Key finding: SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.
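
For readers unfamiliar with the two metrics reported above, here is a small pure-Python sketch of how macro F1 and Cohen's kappa are defined. It is illustrative only and is not taken from the project notebooks, which may well use library implementations such as scikit-learn's `f1_score(average="macro")` and `cohen_kappa_score`. The function names `macro_f1` and `cohens_kappa` are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over labels.
    expected = sum(true_counts[label] * pred_counts[label] for label in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: only one label on both sides
    return (observed - expected) / (1.0 - expected)
```

Macro F1 weights every class equally regardless of its size, which matters for the 6-class D1 task, while kappa discounts the agreement a classifier would reach by chance given the label frequencies.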


API

POST /predict

// Request
{ "text": "your text here" }

// Response
{
  "dataset1": {
    "task": "Depression Type (6 Classes)",
    "models": {
      "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
      "SVM":                  { "label": "postpartum", "confidence": 0.828 },
      "XGBoost":              { "label": "postpartum", "confidence": 0.999 },
      "XLM-RoBERTa":         { "label": "postpartum", "confidence": 0.997 }
    },
    "winner_model": "XGBoost",
    "winner_prediction": "postpartum",
    "winner_confidence": 0.999,
    "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
  },
  "dataset2": { ... },
  "dataset3": { ... },
  "risk_flag": false,
  "suicide_votes": "0/4 models flagged suicide risk",
  "processing_time_ms": 2341
}
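
A minimal client sketch for this endpoint, using only the Python standard library. It assumes the server is running at `http://localhost:5000` and accepts a JSON body with a `Content-Type: application/json` header, which the request example suggests but the README does not state outright. The function names are hypothetical and not part of the repository.

```python
import json
import urllib.request


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build the POST /predict request; JSON content type is an assumption."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url="http://localhost:5000", timeout=60):
    """Send the text to a running MindScan server and return the parsed JSON."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize_winners(result):
    """Map each dataset key to (winner_model, winner_prediction, winner_confidence)."""
    summary = {}
    for key in ("dataset1", "dataset2", "dataset3"):
        block = result.get(key)
        if isinstance(block, dict) and "winner_model" in block:
            summary[key] = (
                block["winner_model"],
                block["winner_prediction"],
                block["winner_confidence"],
            )
    return summary
```

With the server up, `summarize_winners(predict("your text here"))` would give one tuple per dataset, and the top-level `risk_flag` and `suicide_votes` fields can be read directly from the returned dictionary.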

GET /health

{ "status": "ok", "models_ready": true }
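
Because the transformer models take a while to load, a script that calls the API may want to wait until `models_ready` is true. The sketch below polls `GET /health` with the standard library. Whether the server answers `/health` while models are still loading is not documented here, so a failed connection is simply treated as "not ready yet". The names `fetch_health` and `wait_until_ready` and the default timings are illustrative assumptions.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """Return the parsed /health payload, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        return None


def wait_until_ready(fetch=fetch_health, max_wait=120.0, interval=2.0, sleep=time.sleep):
    """Poll until the payload reports models_ready, or give up after max_wait seconds."""
    deadline = time.monotonic() + max_wait
    while True:
        health = fetch()
        if health and health.get("models_ready"):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

The `fetch` and `sleep` parameters are injectable so the loop can be exercised without a running server.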

Disclaimer

This system is a research prototype built for academic coursework. It is not a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.


NCI H9DAI · Data Analytics for Artificial Intelligence · 2026