title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---
# 🤖 AI Executive Assistant Simulator

> **OpenEnv RL Environment**: an advanced reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.

[Python](https://python.org) · [OpenEnv](https://openenv.ai) · [Gradio](https://gradio.app)
---
## 🔷 Problem
Modern professionals struggle with **scheduling overload**, **task prioritization**, and **communication management**. An average executive handles 50+ decisions daily, making this a rich environment for RL agents to learn optimal strategies.
## 🔷 Solution
An RL-powered executive assistant built on the **OpenEnv** framework that:
- 📅 **Manages schedules** with temporal reasoning and overlap detection (see the sketch after this list)
- ⚡ **Resolves conflicts** using conflict graph modeling
- 💬 **Handles messages** with urgency-aware prioritization
- 🧠 **Learns personalized strategies** through user preference modeling
- 📈 **Improves via curriculum learning** from easy → hard scenarios
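
The overlap detection mentioned above boils down to a duration-aware interval check. A minimal sketch, where the helper name, `"HH:MM"` strings, and minute durations are illustrative assumptions rather than the exact `env/scheduler.py` API:

```python
from datetime import datetime, timedelta

def slots_overlap(start_a: str, dur_a: int, start_b: str, dur_b: int) -> bool:
    """Return True if two time slots ("HH:MM" start, duration in minutes) overlap.
    Illustrative helper; the repo's scheduler may use a different representation."""
    fmt = "%H:%M"
    a_start = datetime.strptime(start_a, fmt)
    b_start = datetime.strptime(start_b, fmt)
    a_end = a_start + timedelta(minutes=dur_a)
    b_end = b_start + timedelta(minutes=dur_b)
    # Two intervals overlap when each one starts before the other ends.
    return a_start < b_end and b_start < a_end

# Example: a 60-minute meeting at 10:00 conflicts with a 30-minute call at 10:45.
print(slots_overlap("10:00", 60, "10:45", 30))  # True
```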
---
## 🔧 Advanced Features
| Feature | Description |
|---------|-------------|
| ⏰ **Temporal Reasoning** | Duration-aware time slots with overlap detection |
| 🎯 **Multi-Objective Rewards** | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 **User Preferences** | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ **Partial Observability** | Hidden tasks & delayed inbox revealed progressively |
| 🚫 **Action Masking** | Invalid action prevention, so agents only see legal moves |
| 🔗 **Conflict Graph** | Graph-based modeling of scheduling conflicts |
| 📚 **Curriculum Learning** | Auto-scaling difficulty: easy → medium → hard |
| 📊 **Metrics Tracking** | Completion rate, efficiency score, conflict count, response rate |
| 📅 **Gantt Timeline** | Interactive Plotly visualization of the schedule |
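
To make the multi-objective reward row concrete, here is a hedged sketch of a weighted-sum combination of the five components named in the table; the weights and function signature are illustrative assumptions, not the actual `env/rewards.py` implementation:

```python
# Hypothetical weights for the five reward components listed above;
# the real RewardEngine in env/rewards.py may combine them differently.
REWARD_WEIGHTS = {
    "task": 1.0,         # completing / scheduling tasks
    "schedule": 0.5,     # avoiding conflicts and respecting time slots
    "message": 0.8,      # replying to urgent messages promptly
    "efficiency": 0.3,   # fewer wasted steps
    "preferences": 0.4,  # matching the user's preferred times
}

def combine_rewards(components: dict[str, float]) -> float:
    """Weighted sum of per-objective reward components."""
    return sum(REWARD_WEIGHTS[name] * value for name, value in components.items())

print(combine_rewards({"task": 1.0, "schedule": -0.2, "message": 0.5,
                       "efficiency": 0.1, "preferences": 0.0}))
```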
---
## 📁 Project Structure
```
ai-executive-assistant-openenv/
│
├── openenv.yaml              # OpenEnv environment manifest
├── README.md                 # This file
├── requirements.txt          # Python dependencies
│
├── env/                      # Core environment
│   ├── assistant_env.py      # Main env class (OpenEnv entry point)
│   ├── state.py              # State representation + partial observability
│   ├── actions.py            # Action definitions + action masking
│   ├── rewards.py            # Multi-objective reward engine
│   ├── scheduler.py          # Temporal reasoning + conflict resolution
│   ├── scenario_generator.py # Curriculum-aware scenario generation
│   └── utils.py              # Time utilities, conflict detection, metrics
│
├── agents/                   # Agent implementations
│   ├── random_agent.py       # Random baseline (lower bound)
│   ├── rule_based_agent.py   # Priority heuristic (strong baseline)
│   └── rl_agent.py           # Tabular Q-learning agent
│
├── training/                 # Training & evaluation
│   ├── train_rl.py           # Multi-agent training comparison
│   ├── evaluate.py           # Evaluation harness
│   └── plots.py              # Visualization utilities
│
├── ui/                       # Interactive demo
│   ├── app.py                # Gradio web interface
│   └── timeline.py           # Plotly Gantt timeline
│
└── logs/                     # Training outputs
    ├── reward_curves.png
    ├── agent_comparison.png
    └── rl_metrics.png
```
---
## 🚀 Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Run Training
```bash
python -m training.train_rl
```
This trains all 3 agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.
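
If you prefer to drive training yourself rather than via `training.train_rl`, an episode loop against the environment could look roughly like the sketch below; the agent class name and its `select_action`/`update` methods are assumptions for illustration, not a documented API.

```python
from env.assistant_env import ExecutiveAssistantEnv
from agents.rl_agent import QLearningAgent  # class name assumed for illustration

env = ExecutiveAssistantEnv(difficulty="easy", max_steps=50)
agent = QLearningAgent()

for episode in range(200):
    state = env.reset()
    done, total_reward = False, 0.0
    while not done:
        # Choose only among the legal moves exposed via action masking.
        action = agent.select_action(state, state["valid_actions"])
        next_state, reward, done, info = env.step(action)
        agent.update(state, action, reward, next_state, done)
        state = next_state
        total_reward += reward
    print(f"episode {episode}: reward={total_reward:.2f}")
```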
### Run Evaluation
```bash
python -m training.evaluate
```
### Launch Interactive Demo
```bash
python -m ui.app
```
Then open `http://localhost:7860` in your browser.
---
## 🎮 Environment API
```python
from env.assistant_env import ExecutiveAssistantEnv
env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)
state = env.reset()
print(state["tasks"]) # List of task objects
print(state["inbox"]) # List of inbox messages
print(state["valid_actions"]) # Legal actions (action masking)
# Take a step
action = ("complete_task", 0) # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```
### Action Space
| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |
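
Actions are `(name, target_id)` tuples, as in the API example above. Here is a hedged sketch of a tiny heuristic policy over `valid_actions`, loosely in the spirit of the rule-based baseline; the real heuristics in `agents/rule_based_agent.py` may differ.

```python
def pick_action(state):
    """Prefer urgent replies, then completing/scheduling tasks, else any legal move.
    Illustrative heuristic only, using the observation fields documented below."""
    valid = state["valid_actions"]
    urgent = {m["id"] for m in state["inbox"]
              if m["urgency"] == "high" and not m["replied"]}
    for name, target in valid:
        if name == "send_reply" and target in urgent:
            return (name, target)
    for preferred in ("complete_task", "schedule_task"):
        for name, target in valid:
            if name == preferred:
                return (name, target)
    return valid[0] if valid else None
```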
### Observation Space
```json
{
"time": "09:30",
"tasks": [
{"id": 0, "title": "Q4 Strategy Review", "time": "10:00",
"duration": 60, "priority": "high", "type": "meeting", "status": "pending"}
],
"inbox": [
{"id": 0, "sender": "CEO", "content": "Need figures ASAP",
"urgency": "high", "replied": false}
],
"preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
"valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
"action_mask": [1, 1, 1, 1, 1, 1]
}
```
---
## 📊 Results
| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|-----------|----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📏 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |
*Results vary by difficulty and random seed.*
---
## 🏗️ System Architecture
```
      User / RL Agent
             │
             ▼
┌──────────────────────────┐
│   ExecutiveAssistantEnv  │
│  ┌────────────────────┐  │
│  │ ScenarioGenerator  │  │  ← Curriculum Learning
│  └─────────┬──────────┘  │
│  ┌─────────▼──────────┐  │
│  │       State        │  │  ← Partial Observability
│  │   (tasks + inbox)  │  │
│  └─────────┬──────────┘  │
│  ┌─────────▼──────────┐  │
│  │     Scheduler      │  │  ← Temporal Reasoning
│  │  (conflict graph)  │  │
│  └─────────┬──────────┘  │
│  ┌─────────▼──────────┐  │
│  │    RewardEngine    │  │  ← Multi-Objective Shaping
│  │   (5 components)   │  │
│  └────────────────────┘  │
│  ┌────────────────────┐  │
│  │   Action Masking   │  │  ← Invalid Action Prevention
│  └────────────────────┘  │
└──────────────────────────┘
             │
             ▼
 Observation + Reward + Done
```
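
The curriculum arrow into `ScenarioGenerator` corresponds to scaling scenario difficulty as performance improves. A hedged sketch of that idea; the threshold and promotion rule are assumptions, not the logic in `env/scenario_generator.py`:

```python
def next_difficulty(recent_rewards: list[float], current: str) -> str:
    """Promote the curriculum when the average recent reward clears a threshold.
    Threshold and difficulty labels are illustrative assumptions."""
    order = ["easy", "medium", "hard"]
    if not recent_rewards:
        return current
    avg = sum(recent_rewards) / len(recent_rewards)
    idx = order.index(current)
    if avg > 5.0 and idx < len(order) - 1:  # promotion threshold is illustrative
        return order[idx + 1]
    return current
```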
---
## 📄 License
MIT License
---
## 🙏 Acknowledgments
- Built for the **OpenEnv** platform
- Inspired by real-world executive assistant workflows
- Visualization powered by **Plotly** and **Gradio** |