MindScan: Mental Health Detection System
NCI H9DAI Research Project 2026 · MSc Artificial Intelligence
A multi-model mental health text analysis system. For any input text it runs 12 ML classifiers (4 model types trained on each of 3 datasets) and returns a depression type, a binary depression likelihood, and a suicide risk score.
Project Structure
MindScan/
├── app.py                      Flask backend (start here)
├── predict.py                  Prediction logic (all 12 models)
├── requirements.txt            Python dependencies
├── README.md                   This file
├── templates/
│   └── index.html              UI (served by Flask at localhost:5000)
├── models/
│   ├── classical/              Download from Google Drive (see below)
│   └── transformers/           Download from Google Drive (see below)
└── notebooks/
    ├── DA_Notebook_One.ipynb   Classical model training
    └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison
GitHub Link
https://github.com/Amod069/MindScan
Setup
1. Download model files from Google Drive
Download MindScan_Models/ from Google Drive:
https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing
Then place the contents like this:
models/
├── classical/
│   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
│   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
│   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
│   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
│   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
└── transformers/
    ├── xlmr_d1_final/
    ├── xlmr_d2_final/
    └── xlmr_d3_final/
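Before starting the server, it can help to confirm that the download landed in the right place. The following is a minimal sketch that checks for the file names listed in the layout above. The helper names (`expected_model_paths`, `missing_model_paths`) are hypothetical and not part of the repository, and the script assumes it is run from the project root.

```python
from pathlib import Path

# File prefixes and dataset suffixes taken from the layout above.
DATASETS = ("d1", "d2", "d3")
CLASSICAL_PREFIXES = ("le", "tfidf", "logistic_regression", "svm", "xgboost")


def expected_model_paths(root="models"):
    """Return every path the layout above says should exist under `root`."""
    root = Path(root)
    paths = [
        root / "classical" / f"{prefix}_{ds}.pkl"
        for prefix in CLASSICAL_PREFIXES
        for ds in DATASETS
    ]
    paths += [root / "transformers" / f"xlmr_{ds}_final" for ds in DATASETS]
    return paths


def missing_model_paths(root="models"):
    """List expected files or folders that are not present on disk."""
    return [p for p in expected_model_paths(root) if not p.exists()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected model paths are present.")
```

The check only looks at names, not at file contents, so a truncated download would still pass.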
2. Create Python environment
python -m venv venv
# Mac/Linux
source venv/bin/activate
# Windows
venv\Scripts\activate
3. Install dependencies
pip install -r requirements.txt
4. Run the server
python app.py
5. Open the UI
http://localhost:5000
Note: First startup takes ~30 seconds while XLM-RoBERTa models load into memory.
The 3 Datasets
| Dataset | Source | Size | Task |
|---|---|---|---|
| D1 | Nusrat et al. (2024), Zenodo 14233292 | 14,983 tweets | 6-class depression type |
| D2 | albertobellardini, Kaggle | 10,314 tweets | Binary depression |
| D3 | nikhileswarkomati, Kaggle | 50,000 Reddit posts | Binary suicide risk |
The 4 Models (per dataset = 12 total)
- Logistic Regression: simple linear baseline
- SVM (LinearSVC): classical NLP gold standard
- XGBoost: gradient boosting
- XLM-RoBERTa: transformer, contextual embeddings
Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).
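For the three classical models, the file names (`tfidf_*.pkl`, `le_*.pkl`, plus one classifier file) suggest a vectorize, classify, decode flow. The sketch below illustrates that flow under the assumption that the files hold a fitted scikit-learn `TfidfVectorizer`, a classifier with a `predict` method, and a `LabelEncoder`. The README does not confirm this, and `classical_predict` is a hypothetical helper, not the code in `predict.py`.

```python
def classical_predict(text, vectorizer, model, encoder):
    """Predict a label for one text with a classical pipeline.

    Assumed (not confirmed by the README):
      vectorizer -- fitted TF-IDF vectorizer exposing transform()
      model      -- fitted classifier exposing predict()
      encoder    -- fitted label encoder exposing inverse_transform()
    """
    features = vectorizer.transform([text])      # 1 x vocabulary feature matrix
    encoded = model.predict(features)            # integer class id(s)
    return str(encoder.inverse_transform(encoded)[0])  # back to a label name
```

The three objects would be loaded from `models/classical/` with whichever library saved them (for example `pickle` or `joblib`), which is also not stated here.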
Real Results
| Dataset | Best Model | Macro F1 | Cohen's Kappa |
|---|---|---|---|
| D1 Depression Type | SVM | 0.9269 | 0.9072 |
| D2 Binary Depression | XLM-RoBERTa | 0.9993 | 0.9986 |
| D3 Suicide Risk | XLM-RoBERTa | 0.9810 | 0.9620 |
Key finding: SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.
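For readers unfamiliar with the two metrics in the table, here is a small pure-Python sketch of their standard definitions. It is illustrative only: the notebooks may compute them differently (for example with scikit-learn), and the toy labels in the usage note are made up, not drawn from D1, D2, or D3.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal frequencies, summed over classes.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: a single class everywhere
    return (observed - expected) / (1 - expected)
```

With `y_true = ["a", "a", "b", "b"]` and `y_pred = ["a", "b", "b", "b"]`, the per-class F1 scores are 2/3 and 4/5, so macro F1 is about 0.733, and kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.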
API
POST /predict
// Request
{ "text": "your text here" }
// Response
{
"dataset1": {
"task": "Depression Type (6 Classes)",
"models": {
"Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
"SVM": { "label": "postpartum", "confidence": 0.828 },
"XGBoost": { "label": "postpartum", "confidence": 0.999 },
"XLM-RoBERTa": { "label": "postpartum", "confidence": 0.997 }
},
"winner_model": "XGBoost",
"winner_prediction": "postpartum",
"winner_confidence": 0.999,
"class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
},
"dataset2": { ... },
"dataset3": { ... },
"risk_flag": false,
"suicide_votes": "0/4 models flagged suicide risk",
"processing_time_ms": 2341
}
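A minimal client sketch for this endpoint, using only the standard library. It assumes the server runs at `http://localhost:5000` as in Setup and accepts a JSON body with a `Content-Type: application/json` header, which is typical for Flask but not stated above. `predict` and `summarize` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # address from the Setup section


def predict(text, base_url=BASE_URL, timeout=60):
    """POST text to /predict and return the decoded JSON response."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed requirement
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize(result):
    """Collect the winning prediction per dataset plus the risk flag."""
    summary = {}
    for key in ("dataset1", "dataset2", "dataset3"):
        block = result.get(key)
        if isinstance(block, dict) and "winner_model" in block:
            summary[key] = (
                block["winner_model"],
                block["winner_prediction"],
                block["winner_confidence"],
            )
    summary["risk_flag"] = result.get("risk_flag")
    return summary


if __name__ == "__main__":
    print(summarize(predict("your text here")))
```

`summarize` relies only on the field names shown in the sample response, and it skips any dataset block that lacks a `winner_model` key.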
GET /health
{ "status": "ok", "models_ready": true }
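Because the first startup takes around 30 seconds, a client may want to poll this endpoint before sending predictions. The sketch below retries until `models_ready` is true. It assumes the server either refuses connections or reports `"models_ready": false` while loading (which of these actually happens is not documented here), and `fetch_health` and `wait_until_ready` are hypothetical helper names.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """GET /health and return the decoded JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll the health check until models_ready is true or attempts run out."""
    for _ in range(attempts):
        try:
            if fetch().get("models_ready"):
                return True
        except OSError:
            # Covers refused connections and timeouts (urllib.error.URLError
            # is an OSError subclass): the server is probably still starting.
            pass
        sleep(delay)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "server did not become ready")
```

The `fetch` and `sleep` parameters exist so the loop can be exercised without a running server.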
Disclaimer
This system is a research prototype built for academic coursework. It is not a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.
NCI H9DAI · Data Analytics for Artificial Intelligence · 2026