---
title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---

# 🤖 AI Executive Assistant Simulator

> **OpenEnv RL Environment**: an advanced reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.

[Python](https://python.org) · [OpenEnv](https://openenv.ai) · [Gradio](https://gradio.app)

---

## 🔷 Problem

Modern professionals struggle with **scheduling overload**, **task prioritization**, and **communication management**. An average executive handles 50+ decisions daily, making this a rich environment for RL agents to learn optimal strategies.

## 🔷 Solution

An RL-powered executive assistant built on the **OpenEnv** framework that:

- 📅 **Manages schedules** with temporal reasoning and overlap detection
- ⚡ **Resolves conflicts** using conflict graph modeling
- 💬 **Handles messages** with urgency-aware prioritization
- 🧠 **Learns personalized strategies** through user preference modeling
- 📈 **Improves via curriculum learning** from easy → hard scenarios
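The duration-aware overlap check behind the first bullet can be sketched in a few lines. The function names here are illustrative, not the project's actual API (the real logic lives in `env/utils.py` and `env/scheduler.py`):

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def overlaps(start_a: str, dur_a: int, start_b: str, dur_b: int) -> bool:
    """Two duration-aware slots conflict iff their time intervals intersect."""
    a0, b0 = to_minutes(start_a), to_minutes(start_b)
    return a0 < b0 + dur_b and b0 < a0 + dur_a

# A 10:00-11:00 meeting collides with a 10:30-11:00 call:
print(overlaps("10:00", 60, "10:30", 30))  # True
```

Half-open intervals mean back-to-back slots (e.g. 09:00–09:30 and 09:30–10:00) do not count as conflicts.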

---

## 🧠 Advanced Features

| Feature | Description |
|---------|-------------|
| 🕐 **Temporal Reasoning** | Duration-aware time slots with overlap detection |
| 🎯 **Multi-Objective Rewards** | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 **User Preferences** | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ **Partial Observability** | Hidden tasks & delayed inbox revealed progressively |
| 🚫 **Action Masking** | Invalid action prevention: agents only see legal moves |
| 🔗 **Conflict Graph** | Graph-based modeling of scheduling conflicts |
| 📈 **Curriculum Learning** | Auto-scaling difficulty: easy → medium → hard |
| 📊 **Metrics Tracking** | Completion rate, efficiency score, conflict count, response rate |
| 📅 **Gantt Timeline** | Interactive Plotly visualization of the schedule |
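A minimal sketch of how the five reward components might be blended into one scalar. The weights below are invented for illustration; the real shaping lives in `env/rewards.py`:

```python
# Hypothetical weights -- the actual values belong to env/rewards.py.
WEIGHTS = {"task": 1.0, "schedule": 0.5, "message": 0.5,
           "efficiency": 0.25, "preferences": 0.25}

def total_reward(components: dict) -> float:
    """Blend the five shaping signals into a single scalar reward."""
    return sum(WEIGHTS[k] * components.get(k, 0.0) for k in WEIGHTS)

r = total_reward({"task": 1.0, "message": 1.0, "preferences": -1.0})
# 1.0*1.0 + 0.5*1.0 + 0.25*(-1.0) = 1.25
```

Keeping the components separate until the final sum makes it easy to log each signal independently when debugging reward hacking.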

---

## 📁 Project Structure

```
ai-executive-assistant-openenv/
│
├── openenv.yaml               # OpenEnv environment manifest
├── README.md                  # This file
├── requirements.txt           # Python dependencies
│
├── env/                       # Core environment
│   ├── assistant_env.py       # Main env class (OpenEnv entry point)
│   ├── state.py               # State representation + partial observability
│   ├── actions.py             # Action definitions + action masking
│   ├── rewards.py             # Multi-objective reward engine
│   ├── scheduler.py           # Temporal reasoning + conflict resolution
│   ├── scenario_generator.py  # Curriculum-aware scenario generation
│   └── utils.py               # Time utilities, conflict detection, metrics
│
├── agents/                    # Agent implementations
│   ├── random_agent.py        # Random baseline (lower bound)
│   ├── rule_based_agent.py    # Priority heuristic (strong baseline)
│   └── rl_agent.py            # Tabular Q-learning agent
│
├── training/                  # Training & evaluation
│   ├── train_rl.py            # Multi-agent training comparison
│   ├── evaluate.py            # Evaluation harness
│   └── plots.py               # Visualization utilities
│
├── ui/                        # Interactive demo
│   ├── app.py                 # Gradio web interface
│   └── timeline.py            # Plotly Gantt timeline
│
└── logs/                      # Training outputs
    ├── reward_curves.png
    ├── agent_comparison.png
    └── rl_metrics.png
```

---

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Run Training

```bash
python -m training.train_rl
```

This trains all 3 agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.
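The Q-Learning agent trained here follows the classic tabular update rule. The snippet below is a self-contained sketch of that rule, not the code in `agents/rl_agent.py`:

```python
import random
from collections import defaultdict

class QLearner:
    """Minimal tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)   # (state, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state, valid_actions):
        if random.random() < self.epsilon:          # explore
            return random.choice(valid_actions)
        return max(valid_actions, key=lambda a: self.q[(state, a)])  # exploit

    def update(self, s, a, r, s_next, next_actions):
        # Standard TD(0) target: r + gamma * max_a' Q(s', a')
        best_next = max((self.q[(s_next, a2)] for a2 in next_actions), default=0.0)
        td_target = r + self.gamma * best_next
        self.q[(s, a)] += self.alpha * (td_target - self.q[(s, a)])
```

Because the environment supplies `valid_actions`, the max in both `act` and `update` only ranges over legal moves, which is exactly what action masking buys a tabular learner.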

### Run Evaluation

```bash
python -m training.evaluate
```

### Launch Interactive Demo

```bash
python -m ui.app
```

Then open `http://localhost:7860` in your browser.

---

## 🎮 Environment API

```python
from env.assistant_env import ExecutiveAssistantEnv

env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)

state = env.reset()
print(state["tasks"])          # List of task objects
print(state["inbox"])          # List of inbox messages
print(state["valid_actions"])  # Legal actions (action masking)

# Take a step
action = ("complete_task", 0)  # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```
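A random-policy rollout against this interface looks like the following. `StubEnv` is a toy stand-in so the snippet runs on its own; swap in `ExecutiveAssistantEnv` to drive the real environment:

```python
import random

class StubEnv:
    """Toy stand-in exposing the same reset/step contract as the real env."""

    def reset(self):
        self.remaining = [("complete_task", 0), ("send_reply", 0)]
        return {"valid_actions": list(self.remaining)}

    def step(self, action):
        self.remaining.remove(action)
        done = not self.remaining
        state = {"valid_actions": list(self.remaining)}
        return state, 1.0, done, {}

env = StubEnv()
state, total, done = env.reset(), 0.0, False
while not done:
    # Action masking: sample only from the legal moves the env exposes.
    action = random.choice(state["valid_actions"])
    state, reward, done, info = env.step(action)
    total += reward
print(total)  # 2.0 after both actions are taken
```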

### Action Space

| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |

### Observation Space

```python
{
    "time": "09:30",
    "tasks": [
        {"id": 0, "title": "Q4 Strategy Review", "time": "10:00",
         "duration": 60, "priority": "high", "type": "meeting", "status": "pending"}
    ],
    "inbox": [
        {"id": 0, "sender": "CEO", "content": "Need figures ASAP",
         "urgency": "high", "replied": False}
    ],
    "preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
    "valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
    "action_mask": [1, 1, 1, 1, 1, 1]
}
```

---

## 📊 Results

| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|------------|-----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📏 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |

*Results vary by difficulty and random seed.*

---

## 🏗️ System Architecture

```
        User / RL Agent
               │
               ▼
┌────────────────────────┐
│  ExecutiveAssistantEnv │
│  ┌───────────────────┐ │
│  │ ScenarioGenerator │ │ ← Curriculum Learning
│  └─────────┬─────────┘ │
│  ┌─────────▼─────────┐ │
│  │ State             │ │ ← Partial Observability
│  │ (tasks + inbox)   │ │
│  └─────────┬─────────┘ │
│  ┌─────────▼─────────┐ │
│  │ Scheduler         │ │ ← Temporal Reasoning
│  │ (conflict graph)  │ │
│  └─────────┬─────────┘ │
│  ┌─────────▼─────────┐ │
│  │ RewardEngine      │ │ ← Multi-Objective Shaping
│  │ (5 components)    │ │
│  └───────────────────┘ │
│  ┌───────────────────┐ │
│  │ Action Masking    │ │ ← Invalid Action Prevention
│  └───────────────────┘ │
└────────────────────────┘
               │
               ▼
  Observation + Reward + Done
```
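The curriculum stage feeding `ScenarioGenerator` can be driven by a simple promotion rule. The presets and threshold below are invented for illustration; the real generation logic lives in `env/scenario_generator.py`:

```python
# Hypothetical difficulty presets -- not the project's actual numbers.
LEVELS = {"easy":   {"tasks": 4,  "messages": 2, "conflict_rate": 0.1},
          "medium": {"tasks": 8,  "messages": 5, "conflict_rate": 0.3},
          "hard":   {"tasks": 14, "messages": 9, "conflict_rate": 0.5}}
ORDER = ["easy", "medium", "hard"]

def next_level(current: str, avg_reward: float, threshold: float = 0.7) -> str:
    """Promote to the next tier once average reward clears the threshold."""
    i = ORDER.index(current)
    if avg_reward >= threshold:
        return ORDER[min(i + 1, len(ORDER) - 1)]
    return current
```

Gating promotion on average reward (rather than episode count) means a struggling agent keeps practicing easy scenarios instead of being pushed into hard ones prematurely.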

---

## 📄 License

MIT License

---

## 🙏 Acknowledgments

- Built for the **OpenEnv** platform
- Inspired by real-world executive assistant workflows
- Visualization powered by **Plotly** and **Gradio**