# data_preparation/

Loads, cleans, and splits the `.npz` data used for training and in the notebooks. **Important:** recompute `head_deviation` from the **clipped** yaw/pitch values (see `prepare_dataset.py`). `face_orientation` uses **10** features: `head_deviation`, `s_face`, `s_eye`, `h_gaze`, `pitch`, `ear_left`, `ear_avg`, `ear_right`, `gaze_offset`, `perclos`.
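
A minimal sketch of the clip-then-recompute order. It assumes hypothetical clipping bounds and assumes `head_deviation` is the Euclidean magnitude of yaw and pitch. The real bounds and formula are defined in `prepare_dataset.py` and may differ, and `clip_and_recompute` is a hypothetical name. What matters is the ordering: a feature derived from the clipped angles stays consistent with those clipped inputs.

```python
import numpy as np

# Hypothetical clipping bounds in degrees (assumption; see prepare_dataset.py for the real ones).
YAW_RANGE = (-45.0, 45.0)
PITCH_RANGE = (-30.0, 30.0)


def clip_and_recompute(yaw, pitch):
    """Clip yaw/pitch first, then derive head_deviation from the clipped values."""
    yaw_c = np.clip(yaw, *YAW_RANGE)
    pitch_c = np.clip(pitch, *PITCH_RANGE)
    # Assumed definition: Euclidean magnitude of the clipped yaw/pitch angles.
    head_deviation = np.sqrt(yaw_c ** 2 + pitch_c ** 2)
    return yaw_c, pitch_c, head_deviation


yaw = np.array([10.0, 90.0, -60.0])
pitch = np.array([0.0, 40.0, -5.0])
yaw_c, pitch_c, dev = clip_and_recompute(yaw, pitch)
```

Computing `head_deviation` from the raw angles instead would give the out-of-range sample (yaw 90, pitch 40) a value that no longer matches its clipped yaw and pitch columns.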

**prepare_dataset.py:** provides `load_all_pooled()`, `load_per_person()` (for leave-one-person-out, LOPO, evaluation), `get_numpy_splits()` (NumPy arrays for XGBoost), and `get_dataloaders()` (data loaders for the MLP). It restricts yaw/pitch/roll and EAR to fixed ranges.
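
A minimal sketch of how per-person data can drive LOPO folds: each person is held out once as the test set while everyone else is concatenated into the training set. It assumes a hypothetical layout in which per-person data is a dict mapping person id to `(X, y)` arrays; `load_per_person()` may return something different, and `lopo_folds` is a hypothetical helper.

```python
import numpy as np


def lopo_folds(per_person):
    """Yield (held_out, X_train, y_train, X_test, y_test), one fold per person.

    per_person: dict of person id -> (X, y). This layout is an assumption,
    not necessarily what load_per_person() returns.
    """
    people = sorted(per_person)
    for held_out in people:
        train_ids = [p for p in people if p != held_out]
        # Pool every other person's samples into the training set.
        X_train = np.concatenate([per_person[p][0] for p in train_ids])
        y_train = np.concatenate([per_person[p][1] for p in train_ids])
        X_test, y_test = per_person[held_out]
        yield held_out, X_train, y_train, X_test, y_test


# Synthetic data: 3 people with 5, 6 and 7 samples of 10 features each.
rng = np.random.default_rng(0)
per_person = {
    f"p{i}": (rng.normal(size=(5 + i, 10)), rng.integers(0, 2, size=5 + i))
    for i in range(3)
}
folds = list(lopo_folds(per_person))
```

Holding out whole people, rather than random rows, keeps one person's samples from appearing in both train and test.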

**data_exploration.ipynb:** exploratory data analysis (summary statistics, class balance, histograms, feature correlations).
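
A minimal sketch of two of those checks, class balance and a Pearson correlation matrix between feature columns, in plain NumPy. The notebook's actual cells may compute these differently; `class_balance` and `feature_correlations` are hypothetical names.

```python
import numpy as np


def class_balance(y):
    """Return {label: fraction of samples} for an integer label array."""
    labels, counts = np.unique(y, return_counts=True)
    total = counts.sum()
    return {int(label): float(count / total) for label, count in zip(labels, counts)}


def feature_correlations(X):
    """Pearson correlation matrix between feature columns (one row per sample)."""
    return np.corrcoef(X, rowvar=False)


y = np.array([0, 0, 0, 1])
X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.5]])
balance = class_balance(y)
corr = feature_correlations(X)
```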

These functions are imported by `models.mlp.train`, `models.xgboost.train`, and the notebooks; the module is not meant to be run standalone.