---
library_name: sklearn
tags:
- sklearn
- lightgbm
- tabular-classification
- notification-timing
- interruptibility
pipeline_tag: tabular-classification
metrics:
- roc_auc
- f1
- accuracy
model-index:
- name: notification-bad-timing-detector
  results:
  - task:
      type: tabular-classification
      name: Notification Bad Timing Detection
    metrics:
    - name: ROC-AUC
      type: roc_auc
      value: 0.8338
    - name: PR-AUC
      type: average_precision
      value: 0.8525
    - name: F1
      type: f1
      value: 0.7766
    - name: Accuracy
      type: accuracy
      value: 0.7583
---

# 🔔 Notification Bad-Timing Probability Detector

## Overview
Predicts the probability that **now is a bad time** to send a push notification.
Uses 21 contextual signals covering time of day, battery status, user activity patterns, and notification interaction history.

## Performance

| Metric | Calibrated Model | 5-Model Ensemble |
|--------|:---:|:---:|
| **ROC-AUC** | 0.8338 | 0.8344 |
| **PR-AUC** | 0.8525 | 0.8595 |
| **Brier Score** | 0.1657 | not reported |
| **Accuracy** | 0.7583 | not reported |
| **F1** | 0.7766 | not reported |
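
For context on the Brier score row: the Brier score is the mean squared difference between the predicted probability and the 0/1 label, so lower is better, and a model that always predicts 0.5 scores 0.25. The snippet below is a minimal sketch of that definition. The function name `brier_score` and the labels and predictions are made up for illustration and are not taken from the test set.

```python
import numpy as np


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return float(np.mean((y_prob - y_true) ** 2))


# Illustrative data only (hypothetical, not from the model's test split).
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]   # mostly right and fairly sure
uninformed = [0.5, 0.5, 0.5, 0.5]  # always predicts 0.5

print(brier_score(labels, confident))   # small value, close to 0.025
print(brier_score(labels, uninformed))  # 0.25
```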

## Architecture
- **Base model**: LightGBM gradient-boosted trees, SOTA for tabular data per [Grinsztajn et al. 2022](https://arxiv.org/abs/2207.08815) and [TabArena 2025](https://arxiv.org/abs/2506.16791)
- **Calibration**: Isotonic regression for well-calibrated probability output
- **Ensemble**: 5 models trained with different random seeds, with their predictions averaged for robustness
- **Hyperparameter tuning**: Random search over 40 configurations with early stopping
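
The seed-ensemble averaging described above can be sketched as follows. This assumes the ensemble is available as a plain list of fitted classifiers that each expose `predict_proba`; the card does not document the layout of `ensemble_models.pkl`, so check it before relying on this. `ensemble_bad_timing_proba` and `ConstantModel` are hypothetical names, and `ConstantModel` exists only so the sketch runs without the pickle files.

```python
import numpy as np


def ensemble_bad_timing_proba(models, features):
    """Average the positive-class probability across all models."""
    per_model = [m.predict_proba(features)[:, 1] for m in models]
    return np.mean(per_model, axis=0)


class ConstantModel:
    """Stand-in classifier (hypothetical) that predicts a fixed probability."""

    def __init__(self, p):
        self.p = p

    def predict_proba(self, features):
        n = len(features)
        # Column 0: probability of class 0, column 1: probability of class 1.
        return np.column_stack([np.full(n, 1 - self.p), np.full(n, self.p)])


stand_ins = [ConstantModel(p) for p in (0.2, 0.3, 0.4)]
rows = np.zeros((2, 21))  # two rows of 21 placeholder features
averaged = ensemble_bad_timing_proba(stand_ins, rows)
print(averaged)  # one averaged probability per input row
```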

## Features (21 input signals)

| Category | Features |
|----------|----------|
| **Time** | hour_of_day, day_of_week, hour_sin, hour_cos, is_weekend, is_night |
| **Battery** | battery_level, is_charging, battery_change_rate |
| **Activity** | screen_on, screen_on_duration_30min, app_opens_last_hour, session_length_current, time_since_last_interaction |
| **Notifications** | notif_shown_last_30min, notif_clicked_last_30min, notif_dismissed_last_30min, notif_ignored_last_30min, notif_shown_last_24h, notif_ctr_last_7d, recent_notification_density |
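
Because the model takes a positional array, the 21 signals must be supplied in a fixed order. The sketch below builds one input row from a dictionary of named signals, using the feature order listed in the Usage section. `FEATURE_ORDER` and `to_feature_row` are hypothetical helper names, and the sample values are the ones from the Usage example.

```python
import numpy as np

# Order copied from the feature-order comment in the Usage section.
FEATURE_ORDER = [
    "hour_of_day", "day_of_week", "hour_sin", "hour_cos", "is_weekend", "is_night",
    "battery_level", "is_charging", "battery_change_rate",
    "screen_on", "screen_on_duration_30min", "app_opens_last_hour",
    "session_length_current", "time_since_last_interaction",
    "notif_shown_last_30min", "notif_clicked_last_30min",
    "notif_dismissed_last_30min", "notif_ignored_last_30min",
    "notif_shown_last_24h", "notif_ctr_last_7d", "recent_notification_density",
]


def to_feature_row(signals):
    """Return a (1, 21) float array in model order; fail loudly on missing keys."""
    missing = [name for name in FEATURE_ORDER if name not in signals]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return np.array([[float(signals[name]) for name in FEATURE_ORDER]])


sample = dict(zip(FEATURE_ORDER, [14, 2, 0.97, -0.22, 0, 0, 85.0, 0, -1.0, 1,
                                  800, 6, 300, 15, 1, 1, 0, 0, 22, 0.4, 1.0]))
row = to_feature_row(sample)
print(row.shape)  # (1, 21)
```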

## Top Features by Importance
1. **notif_ctr_last_7d**: 7-day notification click-through rate
2. **time_since_last_interaction**: seconds since last user action
3. **battery_level**: current battery percentage
4. **notif_shown_last_30min**: notification fatigue signal
5. **battery_change_rate**: battery drain rate

## Usage

```python
import pickle, numpy as np

with open("calibrated_model.pkl", "rb") as f:
    model = pickle.load(f)

# Feature order: hour_of_day, day_of_week, hour_sin, hour_cos, is_weekend, is_night,
# battery_level, is_charging, battery_change_rate, screen_on, screen_on_duration_30min,
# app_opens_last_hour, session_length_current, time_since_last_interaction,
# notif_shown_last_30min, notif_clicked_last_30min, notif_dismissed_last_30min,
# notif_ignored_last_30min, notif_shown_last_24h, notif_ctr_last_7d, recent_notification_density

features = np.array([[14, 2, 0.97, -0.22, 0, 0, 85.0, 0, -1.0, 1, 800, 6, 300, 15, 1, 1, 0, 0, 22, 0.4, 1.0]])
bad_timing_prob = model.predict_proba(features)[:, 1][0]
print(f"Bad timing probability: {bad_timing_prob:.3f}")  # ~0.07 = good time!
```

### Decision Thresholds
| Probability | Action |
|------------|--------|
| P < 0.3 | ✅ Send notification |
| 0.3 ≤ P ≤ 0.5 | ⚠️ Consider priority |
| 0.5 < P ≤ 0.8 | 🚫 Delay notification |
| P > 0.8 | 🔴 Definitely delay |
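
One way to apply these thresholds in code is a small lookup function. This is a sketch, treating each probability as belonging to exactly one row, with "definitely delay" reserved for P > 0.8. The function name `timing_action` and the returned strings are hypothetical.

```python
def timing_action(p):
    """Map a bad-timing probability to one of the actions in the table."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must be in [0, 1], got {p}")
    if p < 0.3:
        return "send"
    if p <= 0.5:
        return "consider priority"
    if p <= 0.8:
        return "delay"
    return "definitely delay"


print(timing_action(0.07))  # send
print(timing_action(0.45))  # consider priority
print(timing_action(0.65))  # delay
print(timing_action(0.93))  # definitely delay
```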

## Training Details
- Based on [C-3PO](https://arxiv.org/abs/1803.00458) (Cheetah Mobile, 600M MAU production system)
- 100K synthetic samples with realistic correlation patterns from mobile behavior research
- 70/15/15 train/val/test split
- Dataset: [alianassmaaa/notification-timing-dataset](https://huggingface.co/datasets/alianassmaaa/notification-timing-dataset)
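
A 70/15/15 split like the one listed above can be reproduced roughly as follows. This sketch assumes a plain random shuffle; the card does not say whether the original split was stratified or which seed was used. The name `split_indices` and the default seed are hypothetical.

```python
import numpy as np


def split_indices(n_samples, seed=0):
    """Shuffle row indices and cut them into 70% train, 15% val, 15% test."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = n_samples * 70 // 100
    n_val = n_samples * 15 // 100
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 70000 15000 15000
```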

## Files
| File | Description |
|------|-------------|
| `calibrated_model.pkl` | Calibrated model (recommended for deployment) |
| `ensemble_models.pkl` | 5-model ensemble (highest ROC-AUC and PR-AUC) |
| `model_metadata.json` | Features, hyperparameters, metrics |
| `feature_importances.csv` | Feature importance rankings |
| `sweep_results.csv` | Full hyperparameter sweep results |