| --- |
| title: MindScan |
| emoji: 🧠 |
| colorFrom: indigo |
| colorTo: purple |
| sdk: docker |
| app_port: 7860 |
| pinned: false |
| --- |
| |
| # MindScan: Mental Health Detection System |
| ### NCI H9DAI Research Project 2026 · MSc Artificial Intelligence |
|
|
| A multi-model mental health text analysis system that runs 12 ML classifiers across 3 datasets simultaneously, returning depression type, binary depression likelihood, and suicide risk scores for any input text. |
|
|
| --- |
|
|
| ## Project Structure |
|
|
| ``` |
| MindScan/ |
| ├── app.py                      Flask backend (start here) |
| ├── predict.py                  Prediction logic (all 12 models) |
| ├── requirements.txt            Python dependencies |
| ├── README.md                   This file |
| ├── templates/ |
| │   └── index.html              UI (served by Flask at localhost:5000) |
| ├── models/ |
| │   ├── classical/              Download from Google Drive (see below) |
| │   └── transformers/           Download from Google Drive (see below) |
| └── notebooks/ |
|     ├── DA_Notebook_One.ipynb   Classical model training |
|     └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison |
| ``` |
|
|
| --- |
| ## GitHub Link |
| https://github.com/Amod069/MindScan |
|
|
|
|
|
|
| ## Setup |
|
|
| ### 1. Download model files from Google Drive |
| Download the `MindScan_Models/` folder from Google Drive: |
| https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing |
| |
| Then place its contents inside the project so the layout looks like this: |
|
|
| ``` |
| models/ |
| ├── classical/ |
| │   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl |
| │   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl |
| │   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl |
| │   ├── svm_d1.pkl, _d2.pkl, _d3.pkl |
| │   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl |
| └── transformers/ |
|     ├── xlmr_d1_final/ |
|     ├── xlmr_d2_final/ |
|     └── xlmr_d3_final/ |
| ``` |
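
Before starting the server, it can help to confirm that the download landed in the right place. The script below is a minimal sketch, not part of the repository: the function names are hypothetical, and it assumes that the shorthand `_d2.pkl, _d3.pkl` in the listing above expands to full names such as `svm_d2.pkl`.

```python
from pathlib import Path

DATASETS = ("d1", "d2", "d3")
# Assumption: each prefix has one .pkl file per dataset, as the listing suggests.
CLASSICAL_PREFIXES = ("le", "tfidf", "logistic_regression", "svm", "xgboost")


def expected_model_paths(root="models"):
    """Return every file or directory named in the layout, relative to root."""
    root = Path(root)
    paths = [
        root / "classical" / f"{prefix}_{ds}.pkl"
        for prefix in CLASSICAL_PREFIXES
        for ds in DATASETS
    ]
    paths += [root / "transformers" / f"xlmr_{ds}_final" for ds in DATASETS]
    return paths


def missing_model_paths(root="models"):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_model_paths(root) if not p.exists()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected model files are present.")
```

Run it from the project root (the directory that contains `models/`); any path it prints still needs to be copied over from the Drive folder.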
|
|
| ### 2. Create Python environment |
| ```bash |
| python -m venv venv |
| |
| # Mac/Linux |
| source venv/bin/activate |
| |
| # Windows |
| venv\Scripts\activate |
| ``` |
|
|
| ### 3. Install dependencies |
| ```bash |
| pip install -r requirements.txt |
| ``` |
|
|
| ### 4. Run the server |
| ```bash |
| python app.py |
| ``` |
|
|
| ### 5. Open the UI |
| ``` |
| http://localhost:5000 |
| ``` |
|
|
| **Note:** First startup takes ~30 seconds while XLM-RoBERTa models load into memory. |
|
|
| --- |
|
|
| ## The 3 Datasets |
|
|
| ID | Dataset | Source | Size | Task |
| |---|---|---|---|---| |
| | D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type | |
| | D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression | |
| | D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk | |
|
|
| ## The 4 Models (per dataset = 12 total) |
|
|
| 1. **Logistic Regression**: simple linear baseline |
| 2. **SVM (LinearSVC)**: classical NLP gold standard |
| 3. **XGBoost**: gradient boosting |
| 4. **XLM-RoBERTa**: transformer, contextual embeddings |
|
|
| *Note: Random Forest was excluded from deployment (646 MB files, worst performer on D1/D3).* |
|
|
| --- |
|
|
| ## Real Results |
|
|
| | Dataset | Best Model | Macro F1 | Cohen's Kappa | |
| |---|---|---|---| |
| | D1 Depression Type | **SVM** | 0.9269 | 0.9072 | |
| | D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 | |
| | D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 | |
|
|
| **Key finding:** SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81. |
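
For readers unfamiliar with the two metrics in the table, the sketch below shows how macro F1 and Cohen's kappa are defined, using plain Python on a toy example. It only illustrates the definitions; it is not the evaluation code from the notebooks, and the function names are chosen here for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def cohens_kappa(y_true, y_pred):
    """Agreement between two labelings, corrected for chance agreement."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both labelings pick the same class at random.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    truth = ["a", "a", "b", "b"]
    preds = ["a", "b", "b", "b"]
    print(f"macro F1 = {macro_f1(truth, preds):.4f}")
    print(f"kappa    = {cohens_kappa(truth, preds):.4f}")
```

Macro F1 weights every class equally, which matters for the 6-class D1 task where class sizes may differ; kappa discounts the agreement a classifier would reach by chance.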
|
|
| --- |
|
|
| ## API |
|
|
| **POST /predict** |
| ```json |
| // Request |
| { "text": "your text here" } |
| |
| // Response |
| { |
| "dataset1": { |
| "task": "Depression Type (6 Classes)", |
| "models": { |
| "Logistic Regression": { "label": "postpartum", "confidence": 0.958 }, |
| "SVM": { "label": "postpartum", "confidence": 0.828 }, |
| "XGBoost": { "label": "postpartum", "confidence": 0.999 }, |
| "XLM-RoBERTa": { "label": "postpartum", "confidence": 0.997 } |
| }, |
| "winner_model": "XGBoost", |
| "winner_prediction": "postpartum", |
| "winner_confidence": 0.999, |
| "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... } |
| }, |
| "dataset2": { ... }, |
| "dataset3": { ... }, |
| "risk_flag": false, |
| "suicide_votes": "0/4 models flagged suicide risk", |
| "processing_time_ms": 2341 |
| } |
| ``` |
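
A small client for this endpoint might look like the sketch below. It uses only the standard library and assumes the server is running locally at `http://localhost:5000` and accepts a JSON body with a `Content-Type: application/json` header; the helper names (`build_predict_request`, `summarize`, `predict`) are hypothetical and not part of the project.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # assumption: local Flask server from the Setup steps


def build_predict_request(text, base_url=BASE_URL):
    """Build a JSON POST request for the /predict endpoint."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(response):
    """Collect the winning model and label per dataset, plus the risk flag."""
    summary = {}
    for key in ("dataset1", "dataset2", "dataset3"):
        result = response.get(key, {})
        summary[key] = (result.get("winner_model"), result.get("winner_prediction"))
    summary["risk_flag"] = response.get("risk_flag")
    return summary


def predict(text, base_url=BASE_URL, timeout=60):
    """Send text to a running MindScan server and return the parsed JSON."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `summarize(predict("your text here"))` returns one `(winner_model, winner_prediction)` pair per dataset along with `risk_flag`. The generous timeout reflects that all 12 models run on every request.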
|
|
| **GET /health** |
| ```json |
| { "status": "ok", "models_ready": true } |
| ``` |
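
Because the transformer models take a while to load at startup, a script that calls the API can poll `/health` first. The following is a sketch under the assumption that `models_ready` stays `false` (or the server is unreachable) until loading finishes; the function names, retry count, and delay are illustrative choices, not project settings.

```python
import json
import time
import urllib.error
import urllib.request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """Return the parsed /health JSON, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        return None


def wait_until_ready(fetch=fetch_health, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until models_ready is true; return True on success, False otherwise."""
    for attempt in range(attempts):
        health = fetch()
        if health and health.get("models_ready"):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next poll
    return False
```

Calling `wait_until_ready()` before the first `/predict` request avoids errors during the startup window. The `fetch` and `sleep` parameters exist only so the loop can be exercised without a live server.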
|
|
| --- |
|
|
| ## Disclaimer |
|
|
| This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only. |
|
|
| --- |
|
|
| *NCI H9DAI · Data Analytics for Artificial Intelligence · 2026* |
|
|