---
title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---

# 🤖 AI Executive Assistant Simulator

> **OpenEnv RL Environment**: An advanced reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org)
[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-green.svg)](https://openenv.ai)
[![Gradio](https://img.shields.io/badge/demo-Gradio-orange.svg)](https://gradio.app)

---

## 🔷 Problem

Modern professionals struggle with **scheduling overload**, **task prioritization**, and **communication management**. An average executive handles 50+ decisions daily, making this a rich environment for RL agents to learn optimal strategies.

## 🔷 Solution

An RL-powered executive assistant built on the **OpenEnv** framework that:

- 📅 **Manages schedules** with temporal reasoning and overlap detection
- ⚡ **Resolves conflicts** using conflict graph modeling
- 📬 **Handles messages** with urgency-aware prioritization
- 🧠 **Learns personalized strategies** through user preference modeling
- 📈 **Improves via curriculum learning** from easy → hard scenarios

---

## 🧠 Advanced Features

| Feature | Description |
|---------|-------------|
| 🕐 **Temporal Reasoning** | Duration-aware time slots with overlap detection |
| 🎯 **Multi-Objective Rewards** | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 **User Preferences** | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ **Partial Observability** | Hidden tasks & delayed inbox revealed progressively |
| 🚫 **Action Masking** | Invalid action prevention; agents only see legal moves |
| 🔗 **Conflict Graph** | Graph-based modeling of scheduling conflicts |
| 📚 **Curriculum Learning** | Auto-scaling difficulty: easy → medium → hard |
| 📊 **Metrics Tracking** | Completion rate, efficiency score, conflict count, response rate |
| 📅 **Gantt Timeline** | Interactive Plotly visualization of the schedule |

---

## 📁 Project Structure

```
ai-executive-assistant-openenv/
│
├── openenv.yaml              # OpenEnv environment manifest
├── README.md                 # This file
├── requirements.txt          # Python dependencies
│
├── env/                      # Core environment
│   ├── assistant_env.py      # Main env class (OpenEnv entry point)
│   ├── state.py              # State representation + partial observability
│   ├── actions.py            # Action definitions + action masking
│   ├── rewards.py            # Multi-objective reward engine
│   ├── scheduler.py          # Temporal reasoning + conflict resolution
│   ├── scenario_generator.py # Curriculum-aware scenario generation
│   └── utils.py              # Time utilities, conflict detection, metrics
│
├── agents/                   # Agent implementations
│   ├── random_agent.py       # Random baseline (lower bound)
│   ├── rule_based_agent.py   # Priority heuristic (strong baseline)
│   └── rl_agent.py           # Tabular Q-learning agent
│
├── training/                 # Training & evaluation
│   ├── train_rl.py           # Multi-agent training comparison
│   ├── evaluate.py           # Evaluation harness
│   └── plots.py              # Visualization utilities
│
├── ui/                       # Interactive demo
│   ├── app.py                # Gradio web interface
│   └── timeline.py           # Plotly Gantt timeline
│
└── logs/                     # Training outputs
    ├── reward_curves.png
    ├── agent_comparison.png
    └── rl_metrics.png
```
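The overlap detection handled by `env/scheduler.py` and `env/utils.py` can be pictured as interval intersection over the day's time slots, with a conflict graph joining every pair of overlapping slots. The sketch below is illustrative only and does not reproduce the repository's actual code; the `Slot` helper and function names are hypothetical, and it assumes tasks carry `time` (HH:MM) and `duration` (minutes) fields like those shown in the Observation Space section.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    """Hypothetical time slot: start in minutes since midnight plus a duration."""
    start: int      # e.g. "10:00" -> 600
    duration: int   # minutes

    @property
    def end(self) -> int:
        return self.start + self.duration

def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

def overlaps(a: Slot, b: Slot) -> bool:
    """Two slots conflict if their half-open intervals intersect."""
    return a.start < b.end and b.start < a.end

def conflict_graph(slots: list[Slot]) -> dict[int, set[int]]:
    """Adjacency sets: an edge joins every pair of overlapping slots."""
    edges: dict[int, set[int]] = {i: set() for i in range(len(slots))}
    for i in range(len(slots)):
        for j in range(i + 1, len(slots)):
            if overlaps(slots[i], slots[j]):
                edges[i].add(j)
                edges[j].add(i)
    return edges

# Example: a 10:00-11:00 review overlaps a 10:30-11:00 sync.
slots = [Slot(to_minutes("10:00"), 60), Slot(to_minutes("10:30"), 30)]
print(conflict_graph(slots))  # {0: {1}, 1: {0}}
```

Roughly speaking, a scheduling action that would add an edge of this kind is the sort of move the action-masking feature is meant to hide from the agent.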
---

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Run Training

```bash
python -m training.train_rl
```

This trains all 3 agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.

### Run Evaluation

```bash
python -m training.evaluate
```

### Launch Interactive Demo

```bash
python -m ui.app
```

Then open `http://localhost:7860` in your browser.

---

## 🎮 Environment API

```python
from env.assistant_env import ExecutiveAssistantEnv

env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)
state = env.reset()

print(state["tasks"])          # List of task objects
print(state["inbox"])          # List of inbox messages
print(state["valid_actions"])  # Legal actions (action masking)

# Take a step
action = ("complete_task", 0)  # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```

### Action Space

| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |

### Observation Space

```json
{
  "time": "09:30",
  "tasks": [
    {"id": 0, "title": "Q4 Strategy Review", "time": "10:00", "duration": 60,
     "priority": "high", "type": "meeting", "status": "pending"}
  ],
  "inbox": [
    {"id": 0, "sender": "CEO", "content": "Need figures ASAP", "urgency": "high", "replied": false}
  ],
  "preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
  "valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
  "action_mask": [1, 1, 1, 1, 1, 1]
}
```

---

## 📈 Results

| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|------------|-----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📋 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |

*Results vary by difficulty and random seed.*

---

## 🏗️ System Architecture

```
        User / RL Agent
              │
              ▼
 ┌────────────────────────┐
 │ ExecutiveAssistantEnv  │
 │  ┌──────────────────┐  │
 │  │ ScenarioGenerator│  │  ← Curriculum Learning
 │  └────────┬─────────┘  │
 │           │            │
 │  ┌────────▼─────────┐  │
 │  │ State            │  │  ← Partial Observability
 │  │ (tasks + inbox)  │  │
 │  └────────┬─────────┘  │
 │           │            │
 │  ┌────────▼─────────┐  │
 │  │ Scheduler        │  │  ← Temporal Reasoning
 │  │ (conflict graph) │  │
 │  └────────┬─────────┘  │
 │           │            │
 │  ┌────────▼─────────┐  │
 │  │ RewardEngine     │  │  ← Multi-Objective Shaping
 │  │ (5 components)   │  │
 │  └──────────────────┘  │
 │  ┌──────────────────┐  │
 │  │ Action Masking   │  │  ← Invalid Action Prevention
 │  └──────────────────┘  │
 └────────────────────────┘
              │
              ▼
 Observation + Reward + Done
```
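Putting the pieces together, a full episode is just a reset-then-step loop driven by `valid_actions`. The sketch below uses only the API documented above; the greedy priority heuristic is a simplified, hypothetical stand-in in the spirit of `agents/rule_based_agent.py`, not its actual code, and it assumes every entry in `valid_actions` is an `(action_name, id)` pair matching the observation structure shown earlier.

```python
import random
from env.assistant_env import ExecutiveAssistantEnv

PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}  # illustrative ordering

def by_id(items, item_id):
    """Look up a task or message dict by its 'id' field."""
    return next(item for item in items if item["id"] == item_id)

def pick_action(state):
    """Greedy heuristic: answer high-urgency mail first, then work on the
    highest-priority task; otherwise fall back to any legal action."""
    valid = state["valid_actions"]
    urgent = [a for a in valid if a[0] == "send_reply"
              and by_id(state["inbox"], a[1])["urgency"] == "high"]
    if urgent:
        return urgent[0]
    task_actions = [a for a in valid if a[0] in ("schedule_task", "complete_task")]
    if task_actions:
        return min(task_actions, key=lambda a: PRIORITY_RANK.get(
            by_id(state["tasks"], a[1])["priority"], 3))
    return random.choice(valid)

env = ExecutiveAssistantEnv(difficulty="easy", max_steps=50)
state = env.reset()
total_reward, done = 0.0, False
while not done:
    state, reward, done, info = env.step(pick_action(state))
    total_reward += reward
print(f"Episode return: {total_reward:.2f}")
```

A learning agent can reuse the same loop: restricting the action choice (for example, an argmax over Q-values) to `valid_actions` is what action masking amounts to in practice.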
---

## 📜 License

MIT License

---

## 🙏 Acknowledgments

- Built for the **OpenEnv** platform
- Inspired by real-world executive assistant workflows
- Visualization powered by **Plotly** and **Gradio**