
Yingtao-Zheng

Add general README into folder /others

fad97ce about 1 month ago


FocusGuard

FocusGuard is a real-time, webcam-based focus detection system that combines geometric feature extraction with machine-learning classification. The pipeline extracts 17 facial features from MediaPipe landmarks, including eye aspect ratio (EAR), gaze, head pose, PERCLOS (percentage of eye closure), and blink rate. A selected subset of 10 features is used to classify attentiveness with MLP and XGBoost models. The system is served through a React + FastAPI web application with live video over WebSocket.

1. Project Structure

├── data/                     Raw collected sessions (collected_<name>/*.npz)
├── data_preparation/         Data loading, cleaning, and exploration
├── notebooks/                Training notebooks (MLP, XGBoost) with LOPO evaluation
├── models/                   Feature extraction modules and training scripts
├── checkpoints/              All saved weights (mlp_best.pt, xgboost_*_best.json, GRU, scalers)
├── evaluation/               Training logs and metrics (JSON)
├── ui/                       Live OpenCV demo and inference pipeline
├── src/                      React/Vite frontend source
├── static/                   Built frontend (served by FastAPI)
├── app.py / main.py          FastAPI backend (API, WebSocket, DB)
├── requirements.txt          Python dependencies
└── package.json              Frontend dependencies

2. Setup

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Frontend (only needed if modifying the React app):

npm install
npm run build
cp -r dist/* static/

3. Running

Web application (API + frontend):

uvicorn app:app --host 0.0.0.0 --port 7860

Open http://localhost:7860 in a browser.

Live camera demo (OpenCV):

python ui/live_demo.py
python ui/live_demo.py --xgb      # XGBoost mode

Training:

python -m models.mlp.train        # MLP
python -m models.xgboost.train    # XGBoost

4. Dataset

9 participants, each recorded via webcam with real-time labelling (focused / unfocused)
144,793 total samples, 10 selected features, binary classification
Collected using python -m models.collect_features --name <name>
Stored as .npz files in data/collected_<name>/
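
The layout above (one data/collected_<name>/ folder per participant) can be indexed with a few lines of Python. This is a minimal sketch, not the project's own loader: the function name find_sessions is hypothetical, and the array keys inside each .npz file are not documented here, so the sketch stops at listing files. Each file could then be opened with numpy.load.

```python
from pathlib import Path

PREFIX = "collected_"  # folder prefix taken from the layout described above


def find_sessions(data_dir):
    """Map participant name -> sorted list of that participant's .npz files.

    Hypothetical helper for illustration; not part of the repository.
    """
    sessions = {}
    for folder in sorted(Path(data_dir).glob(PREFIX + "*")):
        if not folder.is_dir():
            continue
        files = sorted(folder.glob("*.npz"))
        if files:
            # Strip the "collected_" prefix to recover the participant name.
            sessions[folder.name[len(PREFIX):]] = files
    return sessions


if __name__ == "__main__":
    for name, files in find_sessions("data").items():
        print(name, len(files), "session file(s)")
```

Keeping the participant name alongside each file is useful later, because per-participant grouping is what leave-one-person-out evaluation needs.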

5. Models

Model                                    Test Accuracy   Test F1   ROC-AUC
XGBoost (600 trees, depth 8, lr 0.149)   95.87%          0.959     0.991
MLP (64→32, 30 epochs, lr 1e-3)          92.92%          0.929     0.971

Both models were evaluated on a held-out, stratified 15% test split. Leave-One-Person-Out (LOPO) cross-validation is available in notebooks/.
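
For readers unfamiliar with LOPO, the idea is that each fold holds out every sample from one participant and trains on the rest, so the test fold never contains a person seen during training. The sketch below illustrates the splitting logic only. It is not the code in notebooks/, and the function name lopo_splits and the toy participant labels are made up for the example. scikit-learn's LeaveOneGroupOut provides the same kind of split.

```python
def lopo_splits(groups):
    """Yield (held_out, train_idx, test_idx) once per distinct participant.

    `groups` holds one participant label per sample, in sample order.
    """
    for held_out in sorted(set(groups)):
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        test_idx = [i for i, g in enumerate(groups) if g == held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Toy labels: six samples from three hypothetical participants.
    groups = ["p1", "p1", "p2", "p3", "p3", "p3"]
    for held_out, train_idx, test_idx in lopo_splits(groups):
        print(held_out, len(train_idx), len(test_idx))
    # prints:
    # p1 4 2
    # p2 5 1
    # p3 3 3
```

A random stratified split can place frames from the same person in both train and test sets, so LOPO scores are usually the better indication of how a model behaves on a new user.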

6. Feature Pipeline

Face mesh — MediaPipe 478-landmark detection
Head pose — solvePnP → yaw, pitch, roll, face score, gaze offset, head deviation
Eye scorer — EAR (left/right/avg), horizontal/vertical gaze ratio, MAR
Temporal tracking — PERCLOS, blink rate, closure duration, yawn duration
Classification — 10-feature vector → MLP or XGBoost → focused / unfocused
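
Two of the features above can be illustrated in a few lines. The sketch below uses the common six-point eye aspect ratio formula, EAR = (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|), and computes PERCLOS as the fraction of recent frames in which the eye counts as closed. It is an illustration under assumptions, not the project's eye scorer: the names eye_aspect_ratio and PerclosTracker, the 0.2 closure threshold, and the window length are all chosen for the example.

```python
import math
from collections import deque


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR from six (x, y) eye landmarks.

    p1 and p4 are the eye corners; p2, p3 lie on the upper lid and
    p6, p5 on the lower lid directly below them.
    """
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal) if horizontal > 0 else 0.0


class PerclosTracker:
    """Fraction of the last `window_frames` frames with the eye closed."""

    def __init__(self, window_frames=900, closed_threshold=0.2):
        # Both defaults are assumptions for this sketch, not project values.
        self.closed_threshold = closed_threshold
        self.closed = deque(maxlen=window_frames)

    def update(self, ear):
        # A frame counts as "closed" when EAR falls below the threshold.
        self.closed.append(ear < self.closed_threshold)
        return sum(self.closed) / len(self.closed)


if __name__ == "__main__":
    open_eye = eye_aspect_ratio((0, 0), (2, 1), (4, 1), (6, 0), (4, -1), (2, -1))
    shut_eye = eye_aspect_ratio((0, 0), (2, 0.2), (4, 0.2), (6, 0), (4, -0.2), (2, -0.2))
    tracker = PerclosTracker(window_frames=4)
    for ear in (open_eye, open_eye, shut_eye, shut_eye):
        perclos = tracker.update(ear)
    print(round(open_eye, 3), round(shut_eye, 3), perclos)
```

EAR drops toward zero as the eyelids close, which is why a simple threshold is enough to drive the temporal features (PERCLOS, blink rate, closure duration) in a sketch like this.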

7. Tech Stack

Backend: Python, FastAPI, WebSocket, aiosqlite
Frontend: React, Vite, TypeScript
ML: PyTorch (MLP), XGBoost, scikit-learn
Vision: MediaPipe, OpenCV