---
title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---

# 🤖 AI Executive Assistant Simulator

> **OpenEnv RL Environment** — An advanced reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org)
[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-green.svg)](https://openenv.ai)
[![Gradio](https://img.shields.io/badge/demo-Gradio-orange.svg)](https://gradio.app)

---

## 🔷 Problem

Modern professionals struggle with **scheduling overload**, **task prioritization**, and **communication management**. An average executive handles 50+ decisions daily — making this a rich environment for RL agents to learn optimal strategies.

## 🔷 Solution

An RL-powered executive assistant built on the **OpenEnv** framework that:

- 📅 **Manages schedules** with temporal reasoning and overlap detection
- ⚡ **Resolves conflicts** using conflict graph modeling
- 📬 **Handles messages** with urgency-aware prioritization
- 🧠 **Learns personalized strategies** through user preference modeling
- 📈 **Improves via curriculum learning** from easy → hard scenarios

---

## 🧠 Advanced Features

| Feature | Description |
|---------|-------------|
| 🕐 **Temporal Reasoning** | Duration-aware time slots with overlap detection |
| 🎯 **Multi-Objective Rewards** | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 **User Preferences** | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ **Partial Observability** | Hidden tasks & delayed inbox revealed progressively |
| 🚫 **Action Masking** | Invalid action prevention — agents only see legal moves |
| 🔗 **Conflict Graph** | Graph-based modeling of scheduling conflicts |
| 📚 **Curriculum Learning** | Auto-scaling difficulty: easy → medium → hard |
| 📊 **Metrics Tracking** | Completion rate, efficiency score, conflict count, response rate |
| 📅 **Gantt Timeline** | Interactive Plotly visualization of the schedule |
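
The duration-aware overlap check behind temporal reasoning comes down to simple interval arithmetic. A minimal sketch of the idea — function names here are illustrative, not the repo's actual `env/utils.py` API:

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)

def overlaps(start_a: str, dur_a: int, start_b: str, dur_b: int) -> bool:
    """Two time slots overlap iff each one starts before the other ends."""
    a, b = to_minutes(start_a), to_minutes(start_b)
    return a < b + dur_b and b < a + dur_a

print(overlaps("10:00", 60, "10:30", 30))  # True: 10:30 lies inside 10:00-11:00
print(overlaps("10:00", 60, "11:00", 30))  # False: back-to-back slots don't conflict
```

Treating the end time as exclusive is what lets back-to-back meetings coexist without registering a conflict.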

---

## 📁 Project Structure

```
ai-executive-assistant-openenv/
│
├── openenv.yaml              # OpenEnv environment manifest
├── README.md                 # This file
├── requirements.txt          # Python dependencies
│
├── env/                      # Core environment
│   ├── assistant_env.py      # Main env class (OpenEnv entry point)
│   ├── state.py              # State representation + partial observability
│   ├── actions.py            # Action definitions + action masking
│   ├── rewards.py            # Multi-objective reward engine
│   ├── scheduler.py          # Temporal reasoning + conflict resolution
│   ├── scenario_generator.py # Curriculum-aware scenario generation
│   └── utils.py              # Time utilities, conflict detection, metrics
│
├── agents/                   # Agent implementations
│   ├── random_agent.py       # Random baseline (lower bound)
│   ├── rule_based_agent.py   # Priority heuristic (strong baseline)
│   └── rl_agent.py           # Tabular Q-learning agent
│
├── training/                 # Training & evaluation
│   ├── train_rl.py           # Multi-agent training comparison
│   ├── evaluate.py           # Evaluation harness
│   └── plots.py              # Visualization utilities
│
├── ui/                       # Interactive demo
│   ├── app.py                # Gradio web interface
│   └── timeline.py           # Plotly Gantt timeline
│
└── logs/                     # Training outputs
    ├── reward_curves.png
    ├── agent_comparison.png
    └── rl_metrics.png
```
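
`scenario_generator.py` presumably scales episode difficulty by sampling scenario parameters from per-tier ranges. A rough sketch of the idea — the presets, field names, and values below are invented for illustration, not read from the repo:

```python
import random

# Hypothetical difficulty presets; the real ranges live in env/scenario_generator.py.
DIFFICULTY = {
    "easy":   {"tasks": (3, 5),  "messages": (1, 3),  "conflict_rate": 0.1},
    "medium": {"tasks": (5, 8),  "messages": (3, 6),  "conflict_rate": 0.3},
    "hard":   {"tasks": (8, 12), "messages": (6, 10), "conflict_rate": 0.5},
}

def sample_scenario(difficulty: str, rng: random.Random) -> dict:
    """Sample scenario knobs from the ranges for the given difficulty tier."""
    cfg = DIFFICULTY[difficulty]
    return {
        "n_tasks": rng.randint(*cfg["tasks"]),
        "n_messages": rng.randint(*cfg["messages"]),
        "conflict_rate": cfg["conflict_rate"],
    }

scenario = sample_scenario("easy", random.Random(0))
```

Curriculum learning then amounts to promoting the agent from one tier to the next once its rolling performance clears a threshold.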

---

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Run Training

```bash
python -m training.train_rl
```

This trains all 3 agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.
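
The Q-learning agent presumably follows the standard tabular update with epsilon-greedy exploration. A minimal, self-contained sketch — class and attribute names are illustrative, not `agents/rl_agent.py`'s actual interface:

```python
import random
from collections import defaultdict

class TabularQAgent:
    """Epsilon-greedy tabular Q-learning, restricted to the env's legal actions."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state, valid_actions):
        if random.random() < self.epsilon:
            return random.choice(valid_actions)                       # explore
        return max(valid_actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, s, a, reward, s_next, next_actions, done):
        # Standard one-step Q-learning target: r + gamma * max_a' Q(s', a').
        target = reward
        if not done and next_actions:
            target += self.gamma * max(self.q[(s_next, a2)] for a2 in next_actions)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
```

Because `act` only ever considers `valid_actions`, the agent composes naturally with the environment's action masking.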

### Run Evaluation

```bash
python -m training.evaluate
```

### Launch Interactive Demo

```bash
python -m ui.app
```

Then open `http://localhost:7860` in your browser.

---

## 🎮 Environment API

```python
from env.assistant_env import ExecutiveAssistantEnv

env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)

state = env.reset()
print(state["tasks"])      # List of task objects
print(state["inbox"])       # List of inbox messages
print(state["valid_actions"])  # Legal actions (action masking)

# Take a step
action = ("complete_task", 0)  # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```

### Action Space

| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |

### Observation Space

```json
{
  "time": "09:30",
  "tasks": [
    {"id": 0, "title": "Q4 Strategy Review", "time": "10:00",
     "duration": 60, "priority": "high", "type": "meeting", "status": "pending"}
  ],
  "inbox": [
    {"id": 0, "sender": "CEO", "content": "Need figures ASAP",
     "urgency": "high", "replied": false}
  ],
  "preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
  "valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
  "action_mask": [1, 1, 1, 1, 1, 1]
}
```
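
Assuming the six `action_mask` entries line up with the action types in table order (an assumption — the README doesn't state the mask's layout), decoding the mask into legal action types is a one-liner:

```python
# Assumed ordering of mask slots; the authoritative list is in env/actions.py.
ACTION_TYPES = ["schedule_task", "complete_task", "defer_task",
                "send_reply", "reject_task", "ask_clarification"]

def legal_action_types(action_mask):
    """Keep only the action types whose mask slot is set to 1."""
    return [a for a, m in zip(ACTION_TYPES, action_mask) if m == 1]

print(legal_action_types([1, 0, 1, 0, 0, 1]))
# ['schedule_task', 'defer_task', 'ask_clarification']
```

In practice `state["valid_actions"]` already gives the fully instantiated `(action_type, target_id)` tuples, so the mask is mainly useful for masking logits in a learned policy.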

---

## 📈 Results

| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|-----------|----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📋 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |

*Results vary by difficulty and random seed.*

---

## ๐Ÿ—๏ธ System Architecture

```
User / RL Agent
       โ”‚
       โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  ExecutiveAssistantEnv โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚  โ”‚ ScenarioGeneratorโ”‚ โ”‚ โ† Curriculum Learning
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚  โ”‚     State        โ”‚ โ”‚ โ† Partial Observability
โ”‚  โ”‚ (tasks + inbox)  โ”‚ โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚  โ”‚   Scheduler      โ”‚ โ”‚ โ† Temporal Reasoning
โ”‚  โ”‚ (conflict graph) โ”‚ โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚  โ”‚  RewardEngine    โ”‚ โ”‚ โ† Multi-Objective Shaping
โ”‚  โ”‚ (5 components)   โ”‚ โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚  โ”‚  Action Masking  โ”‚ โ”‚ โ† Invalid Action Prevention
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
       โ”‚
       โ–ผ
   Observation + Reward + Done
```
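
A common way a multi-objective reward engine combines its components is a weighted sum. The weights and component values below are invented for illustration; the real shaping logic lives in `env/rewards.py`:

```python
import math

# Hypothetical weights -- not the values actually used by the RewardEngine.
REWARD_WEIGHTS = {
    "task": 1.0, "schedule": 0.5, "message": 0.5,
    "efficiency": 0.3, "preferences": 0.2,
}

def combine_rewards(components: dict) -> float:
    """Weighted sum of the five per-component reward signals."""
    return sum(REWARD_WEIGHTS[k] * v for k, v in components.items())

r = combine_rewards({"task": 1.0, "schedule": 0.0, "message": 1.0,
                     "efficiency": 0.5, "preferences": -0.5})
assert math.isclose(r, 1.55)
```

Keeping the components separate until the final sum also makes it easy to log each signal individually, which is how per-objective metrics like completion rate and response rate can be tracked alongside the scalar reward.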

---

## 📜 License

MIT License

---

## 🙏 Acknowledgments

- Built for the **OpenEnv** platform
- Inspired by real-world executive assistant workflows
- Visualization powered by **Plotly** and **Gradio**