Esvanth committed on
Commit
3bfc784
·
verified ·
1 Parent(s): 7d675ba

Add README.md

  1. README.md +161 -5
README.md CHANGED
@@ -1,10 +1,166 @@
1
  ---
2
- title: Mindscan
3
- emoji: 📉
4
- colorFrom: green
5
- colorTo: red
6
  sdk: docker
 
7
  pinned: false
8
  ---
9
 
10
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: MindScan
3
+ emoji: 🧠
4
+ colorFrom: indigo
5
+ colorTo: purple
6
  sdk: docker
7
+ app_port: 7860
8
  pinned: false
9
  ---
10
 
11
+ # MindScan - Mental Health Detection System
12
+ ### NCI H9DAI Research Project 2026 · MSc Artificial Intelligence
13
+
14
+ A multi-model mental health text analysis system that runs 12 ML classifiers (4 models trained on each of 3 datasets) on any input text and returns a depression type, a binary depression likelihood, and a suicide risk score.
15
+
16
+ ---
17
+
18
+ ## Project Structure
19
+
20
+ ```
21
+ MindScan/
22
+ ├── app.py                      Flask backend - start here
23
+ ├── predict.py                  Prediction logic (all 12 models)
24
+ ├── requirements.txt            Python dependencies
25
+ ├── README.md                   This file
26
+ ├── templates/
27
+ │   └── index.html              UI (served by Flask at localhost:5000)
28
+ ├── models/
29
+ │   ├── classical/              Download from Google Drive (see below)
30
+ │   └── transformers/           Download from Google Drive (see below)
31
+ └── notebooks/
32
+     ├── DA_Notebook_One.ipynb   Classical model training
33
+     └── DA_2_Notebook.ipynb     XLM-RoBERTa + comparison
34
+ ```
35
+
36
+ ---
37
+ ## GitHub Link
38
+ https://github.com/Amod069/MindScan
39
+
40
+
41
+
42
+ ## Setup
43
+
44
+ ### 1. Download model files from Google Drive
45
+ Download `MindScan_Models/` from the Google Drive folder linked below, then place its contents under `models/` as shown in the tree:
46
+ https://drive.google.com/drive/folders/16jfsPUcdekDWqtk4evTjQHQO2YoKJdpQ?usp=sharing
47
+
48
+ ```
49
+ models/
50
+ ├── classical/
51
+ │   ├── le_d1.pkl, le_d2.pkl, le_d3.pkl
52
+ │   ├── tfidf_d1.pkl, tfidf_d2.pkl, tfidf_d3.pkl
53
+ │   ├── logistic_regression_d1.pkl, _d2.pkl, _d3.pkl
54
+ │   ├── svm_d1.pkl, _d2.pkl, _d3.pkl
55
+ │   └── xgboost_d1.pkl, _d2.pkl, _d3.pkl
56
+ └── transformers/
57
+     ├── xlmr_d1_final/
58
+     ├── xlmr_d2_final/
59
+     └── xlmr_d3_final/
60
+ ```
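Before starting the server, it can help to confirm that the downloaded files landed where the tree above expects them. The following is a minimal sketch, not part of the repository: the helper names are hypothetical, and it assumes the shorthand `_d2.pkl, _d3.pkl` in the tree expands to the same prefix as the `_d1` file (for example `svm_d2.pkl`).

```python
from pathlib import Path

# Dataset suffixes and classical file prefixes, taken from the tree above.
DATASETS = ("d1", "d2", "d3")
CLASSICAL_PREFIXES = ("le", "tfidf", "logistic_regression", "svm", "xgboost")


def expected_model_paths(models_dir):
    """Return every file or folder the layout above lists (hypothetical helper)."""
    root = Path(models_dir)
    paths = []
    for ds in DATASETS:
        for prefix in CLASSICAL_PREFIXES:
            # Assumption: "_d2.pkl" in the README is shorthand for "<prefix>_d2.pkl".
            paths.append(root / "classical" / f"{prefix}_{ds}.pkl")
        paths.append(root / "transformers" / f"xlmr_{ds}_final")
    return paths


def missing_model_paths(models_dir="models"):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_model_paths(models_dir) if not p.exists()]


if __name__ == "__main__":
    missing = missing_model_paths()
    if missing:
        print("Missing model files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("All expected model files are present.")
```

Run it from the project root; it only checks that the paths exist and does not try to load the models.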
61
+
62
+ ### 2. Create Python environment
63
+ ```bash
64
+ python -m venv venv
65
+
66
+ # Mac/Linux
67
+ source venv/bin/activate
68
+
69
+ # Windows
70
+ venv\Scripts\activate
71
+ ```
72
+
73
+ ### 3. Install dependencies
74
+ ```bash
75
+ pip install -r requirements.txt
76
+ ```
77
+
78
+ ### 4. Run the server
79
+ ```bash
80
+ python app.py
81
+ ```
82
+
83
+ ### 5. Open the UI
84
+ ```
85
+ http://localhost:5000
86
+ ```
87
+
88
+ **Note:** First startup takes ~30 seconds while XLM-RoBERTa models load into memory.
89
+
90
+ ---
91
+
92
+ ## The 3 Datasets
93
+
94
+ | ID | Dataset | Source | Size | Task |
95
+ |---|---|---|---|---|
96
+ | D1 | Nusrat et al. (2024) | Zenodo 14233292 | 14,983 tweets | 6-class depression type |
97
+ | D2 | albertobellardini | Kaggle | 10,314 tweets | Binary depression |
98
+ | D3 | nikhileswarkomati | Kaggle | 50,000 Reddit posts | Binary suicide risk |
99
+
100
+ ## The 4 Models (per dataset = 12 total)
101
+
102
+ 1. **Logistic Regression** - simple linear baseline
103
+ 2. **SVM (LinearSVC)** - classical NLP gold standard
104
+ 3. **XGBoost** - gradient boosting
105
+ 4. **XLM-RoBERTa** - transformer, contextual embeddings
106
+
107
+ *Note: Random Forest excluded from deployment (646 MB files, worst performer on D1/D3).*
108
+
109
+ ---
110
+
111
+ ## Real Results
112
+
113
+ | Dataset | Best Model | Macro F1 | Cohen's Kappa |
114
+ |---|---|---|---|
115
+ | D1 Depression Type | **SVM** | 0.9269 | 0.9072 |
116
+ | D2 Binary Depression | **XLM-RoBERTa** | 0.9993 | 0.9986 |
117
+ | D3 Suicide Risk | **XLM-RoBERTa** | 0.9810 | 0.9620 |
118
+
119
+ **Key finding:** SVM outperforms XLM-RoBERTa on 6-class psychiatric classification (D1). All models exceed the Tumaliuan et al. (2024) benchmark of F1=0.81.
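For readers unfamiliar with the two metrics in the table, here is a small pure-Python sketch of how macro F1 and Cohen's kappa are defined. It is illustrative only: the notebooks may well use a library such as scikit-learn instead (an assumption), and the toy labels below are invented, not drawn from D1, D2, or D3.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); treated as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def cohens_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed over classes.
    expected = sum(true_counts[label] * pred_counts[label] for label in true_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is this sketch's convention
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Toy labels, invented for illustration.
    y_true = ["a", "a", "b", "b"]
    y_pred = ["a", "b", "b", "b"]
    print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
    print(f"kappa    = {cohens_kappa(y_true, y_pred):.4f}")
```

On the toy labels, the per-class F1 scores are 2/3 and 4/5, giving a macro F1 of about 0.733; observed agreement is 0.75 against a chance agreement of 0.5, giving a kappa of 0.5.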
120
+
121
+ ---
122
+
123
+ ## API
124
+
125
+ **POST /predict**
126
+ ```json
127
+ // Request
128
+ { "text": "your text here" }
129
+
130
+ // Response
131
+ {
132
+   "dataset1": {
133
+     "task": "Depression Type (6 Classes)",
134
+     "models": {
135
+       "Logistic Regression": { "label": "postpartum", "confidence": 0.958 },
136
+       "SVM": { "label": "postpartum", "confidence": 0.828 },
137
+       "XGBoost": { "label": "postpartum", "confidence": 0.999 },
138
+       "XLM-RoBERTa": { "label": "postpartum", "confidence": 0.997 }
139
+     },
140
+     "winner_model": "XGBoost",
141
+     "winner_prediction": "postpartum",
142
+     "winner_confidence": 0.999,
143
+     "class_probs": { "postpartum": 0.997, "bipolar": 0.001, ... }
144
+   },
145
+   "dataset2": { ... },
146
+   "dataset3": { ... },
147
+   "risk_flag": false,
148
+   "suicide_votes": "0/4 models flagged suicide risk",
149
+   "processing_time_ms": 2341
150
+ }
151
+ ```
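A client for this endpoint might look like the sketch below, which uses only the Python standard library. The function names are hypothetical, and several details are assumptions rather than facts from the README: the base URL is the local development address from the setup steps, and the `Content-Type: application/json` header is assumed to be what the Flask backend expects. In the sample response the `winner_model` happens to be the model with the highest confidence; `highest_confidence_model` mirrors that observation, but the actual selection rule in `predict.py` is not documented here.

```python
import json
import urllib.request

DATASET_KEYS = ("dataset1", "dataset2", "dataset3")


def build_predict_request(text, base_url="http://localhost:5000"):
    """Build the POST /predict request without sending it."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},  # assumed to be required
        method="POST",
    )


def predict(text, base_url="http://localhost:5000", timeout=60):
    """Send the text to a running MindScan server and return the parsed JSON."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def summarize(response):
    """Map each task name to (winner_model, winner_prediction, winner_confidence)."""
    summary = {}
    for key in DATASET_KEYS:
        block = response.get(key)
        if isinstance(block, dict) and "winner_model" in block:
            summary[block["task"]] = (
                block["winner_model"],
                block["winner_prediction"],
                block["winner_confidence"],
            )
    return summary


def highest_confidence_model(models):
    """Name of the model with the largest confidence (an assumed winner rule)."""
    return max(models, key=lambda name: models[name]["confidence"])
```

With the server running, `summarize(predict("your text here"))` gives a compact per-task view of the winners, and `response["risk_flag"]` and `response["suicide_votes"]` can be read directly from the parsed response.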
152
+
153
+ **GET /health**
154
+ ```json
155
+ { "status": "ok", "models_ready": true }
156
+ ```
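Because the first startup takes roughly 30 seconds while the XLM-RoBERTa models load, a script that calls the API right after launching the server may want to poll `/health` first. The sketch below is one way to do that with the standard library. The function names, the 120-second limit, and the 2-second interval are all illustrative choices, and whether the server answers with `"models_ready": false` during loading or simply refuses connections is an assumption; the loop tolerates both.

```python
import json
import time
import urllib.request


def fetch_health(base_url="http://localhost:5000", timeout=5):
    """GET /health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
        return json.load(response)


def wait_until_ready(fetch=fetch_health, max_wait_s=120.0, interval_s=2.0, sleep=time.sleep):
    """Poll until /health reports models_ready, or give up after max_wait_s.

    `fetch` and `sleep` are injectable so the loop can be exercised without a server.
    """
    deadline = time.monotonic() + max_wait_s
    while True:
        try:
            if fetch().get("models_ready") is True:
                return True
        except (OSError, ValueError):
            # OSError covers refused connections and timeouts (URLError subclasses it);
            # ValueError covers a body that is not valid JSON.
            pass
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "server did not become ready in time")
```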
157
+
158
+ ---
159
+
160
+ ## Disclaimer
161
+
162
+ This system is a research prototype built for academic coursework. It is **not** a clinical tool and must never be used for actual medical diagnosis or mental health assessment. All datasets are from publicly available sources for research purposes only.
163
+
164
+ ---
165
+
166
+ *NCI H9DAI · Data Analytics for Artificial Intelligence · 2026*