---
title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---
# 🤖 AI Executive Assistant Simulator
> **OpenEnv RL Environment**: a reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org)
[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-green.svg)](https://openenv.ai)
[![Gradio](https://img.shields.io/badge/demo-Gradio-orange.svg)](https://gradio.app)
---
## 🔷 Problem
Modern professionals struggle with **scheduling overload**, **task prioritization**, and **communication management**. A typical executive handles 50+ decisions daily, making this a rich environment for RL agents to learn optimal strategies.
## 🔷 Solution
An RL-powered executive assistant built on the **OpenEnv** framework that:
- 📅 **Manages schedules** with temporal reasoning and overlap detection
- ⚡ **Resolves conflicts** using conflict graph modeling
- 📬 **Handles messages** with urgency-aware prioritization
- 🧠 **Learns personalized strategies** through user preference modeling
- 📈 **Improves via curriculum learning** from easy → hard scenarios
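The multi-objective reward behind these bullets is easiest to picture as a weighted sum over five per-objective scores. The sketch below is illustrative only: the component names follow this README, but the weights and the `combine` helper are assumptions, not the actual values in `env/rewards.py`.

```python
from dataclasses import dataclass


@dataclass
class RewardComponents:
    """One score per objective; names mirror the five components in this README."""
    task: float
    schedule: float
    message: float
    efficiency: float
    preferences: float


# Hypothetical weights; the real engine's values live in env/rewards.py.
WEIGHTS = {"task": 1.0, "schedule": 0.5, "message": 0.5,
           "efficiency": 0.3, "preferences": 0.2}


def combine(c: RewardComponents) -> float:
    """Scalarize the multi-objective reward as a weighted sum."""
    return sum(WEIGHTS[name] * value for name, value in vars(c).items())


# Completing a task and answering a message at half efficiency:
r = combine(RewardComponents(task=1.0, schedule=0.0, message=1.0,
                             efficiency=0.5, preferences=0.0))
# Weighted sum: 1.0*1.0 + 0.5*1.0 + 0.3*0.5 ≈ 1.65
```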
---
## 🧠 Advanced Features
| Feature | Description |
|---------|-------------|
| 🕐 **Temporal Reasoning** | Duration-aware time slots with overlap detection |
| 🎯 **Multi-Objective Rewards** | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 **User Preferences** | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ **Partial Observability** | Hidden tasks & delayed inbox revealed progressively |
| 🚫 **Action Masking** | Invalid action prevention: agents only see legal moves |
| 🔗 **Conflict Graph** | Graph-based modeling of scheduling conflicts |
| 📚 **Curriculum Learning** | Auto-scaling difficulty: easy → medium → hard |
| 📊 **Metrics Tracking** | Completion rate, efficiency score, conflict count, response rate |
| 📅 **Gantt Timeline** | Interactive Plotly visualization of the schedule |
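The temporal reasoning and conflict graph features boil down to interval arithmetic: two duration-aware slots conflict when their half-open time intervals intersect, and each such pair becomes an edge in the conflict graph. A minimal sketch (the `overlaps` and `conflict_graph` helpers are illustrative, not the repository's actual implementation in `env/scheduler.py`):

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return 60 * hours + minutes


def overlaps(start_a: str, dur_a: int, start_b: str, dur_b: int) -> bool:
    """Two duration-aware slots conflict iff their half-open intervals intersect."""
    a0 = to_minutes(start_a)
    b0 = to_minutes(start_b)
    return a0 < b0 + dur_b and b0 < a0 + dur_a


def conflict_graph(tasks):
    """Conflict graph: one edge per pair of overlapping tasks."""
    edges = set()
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            if overlaps(a["time"], a["duration"], b["time"], b["duration"]):
                edges.add((a["id"], b["id"]))
    return edges


tasks = [
    {"id": 0, "time": "10:00", "duration": 60},
    {"id": 1, "time": "10:30", "duration": 30},
    {"id": 2, "time": "11:00", "duration": 45},  # back-to-back is not a conflict
]
print(conflict_graph(tasks))  # {(0, 1)}
```

Using half-open intervals means a meeting ending at 11:00 does not conflict with one starting at 11:00.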
---
## ๐Ÿ“ Project Structure
```
ai-executive-assistant-openenv/
โ”‚
โ”œโ”€โ”€ openenv.yaml # OpenEnv environment manifest
โ”œโ”€โ”€ README.md # This file
โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”‚
โ”œโ”€โ”€ env/ # Core environment
โ”‚ โ”œโ”€โ”€ assistant_env.py # Main env class (OpenEnv entry point)
โ”‚ โ”œโ”€โ”€ state.py # State representation + partial observability
โ”‚ โ”œโ”€โ”€ actions.py # Action definitions + action masking
โ”‚ โ”œโ”€โ”€ rewards.py # Multi-objective reward engine
โ”‚ โ”œโ”€โ”€ scheduler.py # Temporal reasoning + conflict resolution
โ”‚ โ”œโ”€โ”€ scenario_generator.py # Curriculum-aware scenario generation
โ”‚ โ””โ”€โ”€ utils.py # Time utilities, conflict detection, metrics
โ”‚
โ”œโ”€โ”€ agents/ # Agent implementations
โ”‚ โ”œโ”€โ”€ random_agent.py # Random baseline (lower bound)
โ”‚ โ”œโ”€โ”€ rule_based_agent.py # Priority heuristic (strong baseline)
โ”‚ โ””โ”€โ”€ rl_agent.py # Tabular Q-learning agent
โ”‚
โ”œโ”€โ”€ training/ # Training & evaluation
โ”‚ โ”œโ”€โ”€ train_rl.py # Multi-agent training comparison
โ”‚ โ”œโ”€โ”€ evaluate.py # Evaluation harness
โ”‚ โ””โ”€โ”€ plots.py # Visualization utilities
โ”‚
โ”œโ”€โ”€ ui/ # Interactive demo
โ”‚ โ”œโ”€โ”€ app.py # Gradio web interface
โ”‚ โ””โ”€โ”€ timeline.py # Plotly Gantt timeline
โ”‚
โ””โ”€โ”€ logs/ # Training outputs
โ”œโ”€โ”€ reward_curves.png
โ”œโ”€โ”€ agent_comparison.png
โ””โ”€โ”€ rl_metrics.png
```
---
## 🚀 Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Run Training
```bash
python -m training.train_rl
```
This trains all 3 agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.
### Run Evaluation
```bash
python -m training.evaluate
```
### Launch Interactive Demo
```bash
python -m ui.app
```
Then open `http://localhost:7860` in your browser.
---
## 🎮 Environment API
```python
from env.assistant_env import ExecutiveAssistantEnv
env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)
state = env.reset()
print(state["tasks"]) # List of task objects
print(state["inbox"]) # List of inbox messages
print(state["valid_actions"]) # Legal actions (action masking)
# Take a step
action = ("complete_task", 0) # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```
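A full episode then follows the usual reset/step loop. Because the real environment needs this repository installed, the sketch below substitutes a tiny stub with the same assumed interface (`reset`, `step`, and a `valid_actions` field) so the control flow is runnable on its own:

```python
import random


class StubAssistantEnv:
    """Tiny stand-in mirroring the assumed ExecutiveAssistantEnv interface."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps

    def reset(self):
        self.t = 0
        self.pending = {0, 1}  # two pending tasks
        return self._obs()

    def _obs(self):
        # Only completing a still-pending task is legal (action masking).
        return {"valid_actions": [("complete_task", i) for i in sorted(self.pending)]}

    def step(self, action):
        self.t += 1
        _, task_id = action
        reward = 1.0 if task_id in self.pending else -1.0
        self.pending.discard(task_id)
        done = not self.pending or self.t >= self.max_steps
        return self._obs(), reward, done, {"step": self.t}


env = StubAssistantEnv()
state = env.reset()
total = 0.0
while True:
    action = random.choice(state["valid_actions"])  # random policy over legal moves
    state, reward, done, info = env.step(action)
    total += reward
    if done:
        break
print(total)  # 2.0 -- both tasks completed, one reward point each
```

Swapping `StubAssistantEnv` for `ExecutiveAssistantEnv` gives the same loop against the real environment.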
### Action Space
| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |
### Observation Space
```json
{
"time": "09:30",
"tasks": [
{"id": 0, "title": "Q4 Strategy Review", "time": "10:00",
"duration": 60, "priority": "high", "type": "meeting", "status": "pending"}
],
"inbox": [
{"id": 0, "sender": "CEO", "content": "Need figures ASAP",
"urgency": "high", "replied": false}
],
"preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
"valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
"action_mask": [1, 1, 1, 1, 1, 1]
}
```
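With action masking, an agent should only ever sample from `valid_actions` (or, equivalently, from action types whose `action_mask` bit is set). A small illustrative policy follows; the assumption that the mask bits follow the order of the action table above, and the `pick_urgent_reply` heuristic itself, are mine, not the repository's:

```python
ACTION_TYPES = ["schedule_task", "complete_task", "defer_task",
                "send_reply", "reject_task", "ask_clarification"]


def legal_action_types(obs):
    """Action types whose mask bit is 1 (mask order assumed to follow ACTION_TYPES)."""
    return [t for t, bit in zip(ACTION_TYPES, obs["action_mask"]) if bit]


def pick_urgent_reply(obs):
    """Tiny heuristic: answer a high-urgency message first, else take any legal move."""
    for msg in obs.get("inbox", []):
        if msg["urgency"] == "high" and not msg["replied"]:
            candidate = ("send_reply", msg["id"])
            if candidate in obs["valid_actions"]:
                return candidate
    return obs["valid_actions"][0]


obs = {
    "inbox": [{"id": 0, "sender": "CEO", "content": "Need figures ASAP",
               "urgency": "high", "replied": False}],
    "valid_actions": [("complete_task", 0), ("send_reply", 0)],
    "action_mask": [0, 1, 0, 1, 0, 0],
}
print(legal_action_types(obs))  # ['complete_task', 'send_reply']
print(pick_urgent_reply(obs))   # ('send_reply', 0)
```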
---
## 📈 Results
| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|-----------|----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📋 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |
*Results vary by difficulty and random seed.*
---
## ๐Ÿ—๏ธ System Architecture
```
User / RL Agent
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ ExecutiveAssistantEnv โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ ScenarioGeneratorโ”‚ โ”‚ โ† Curriculum Learning
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ State โ”‚ โ”‚ โ† Partial Observability
โ”‚ โ”‚ (tasks + inbox) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Scheduler โ”‚ โ”‚ โ† Temporal Reasoning
โ”‚ โ”‚ (conflict graph) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ RewardEngine โ”‚ โ”‚ โ† Multi-Objective Shaping
โ”‚ โ”‚ (5 components) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Action Masking โ”‚ โ”‚ โ† Invalid Action Prevention
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
Observation + Reward + Done
```
---
## 📜 License
MIT License
---
## ๐Ÿ™ Acknowledgments
- Built for the **OpenEnv** platform
- Inspired by real-world executive assistant workflows
- Visualization powered by **Plotly** and **Gradio**