---
title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---
# 🤖 AI Executive Assistant Simulator
> **OpenEnv RL Environment**: a reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://python.org)
[![OpenEnv](https://img.shields.io/badge/OpenEnv-compatible-green.svg)](https://openenv.ai)
[![Gradio](https://img.shields.io/badge/demo-Gradio-orange.svg)](https://gradio.app)
---
## 🔷 Problem
Modern professionals struggle with **scheduling overload**, **task prioritization**, and **communication management**. A typical executive handles 50+ decisions daily, making this a rich environment for RL agents to learn optimal strategies.
## 🔷 Solution
An RL-powered executive assistant built on the **OpenEnv** framework that:
- 📅 **Manages schedules** with temporal reasoning and overlap detection
- ⚡ **Resolves conflicts** using conflict graph modeling
- 📬 **Handles messages** with urgency-aware prioritization
- 🧠 **Learns personalized strategies** through user preference modeling
- 📈 **Improves via curriculum learning** from easy → hard scenarios
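The multi-objective reward behind these bullets is easiest to picture as a weighted sum over five per-objective scores. The sketch below is illustrative only: the component names follow this README, but the weights and the `combine` helper are assumptions, not the actual values in `env/rewards.py`.

```python
from dataclasses import dataclass


@dataclass
class RewardComponents:
    """One score per objective; names mirror the five components in this README."""
    task: float
    schedule: float
    message: float
    efficiency: float
    preferences: float


# Hypothetical weights; the real engine's values live in env/rewards.py.
WEIGHTS = {"task": 1.0, "schedule": 0.5, "message": 0.5,
           "efficiency": 0.3, "preferences": 0.2}


def combine(c: RewardComponents) -> float:
    """Scalarize the multi-objective reward as a weighted sum."""
    return sum(WEIGHTS[name] * value for name, value in vars(c).items())


# Completing a task and answering a message at half efficiency:
r = combine(RewardComponents(task=1.0, schedule=0.0, message=1.0,
                             efficiency=0.5, preferences=0.0))
# Weighted sum: 1.0*1.0 + 0.5*1.0 + 0.3*0.5 ≈ 1.65
```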
---
## 🧠 Advanced Features
| Feature | Description |
|---------|-------------|
| 🕐 **Temporal Reasoning** | Duration-aware time slots with overlap detection |
| 🎯 **Multi-Objective Rewards** | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 **User Preferences** | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ **Partial Observability** | Hidden tasks & delayed inbox revealed progressively |
| 🚫 **Action Masking** | Invalid action prevention: agents only see legal moves |
| 🔗 **Conflict Graph** | Graph-based modeling of scheduling conflicts |
| 📚 **Curriculum Learning** | Auto-scaling difficulty: easy → medium → hard |
| 📊 **Metrics Tracking** | Completion rate, efficiency score, conflict count, response rate |
| 📅 **Gantt Timeline** | Interactive Plotly visualization of the schedule |
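The temporal reasoning and conflict graph features boil down to interval arithmetic: two duration-aware slots conflict when their half-open time intervals intersect, and each such pair becomes an edge in the conflict graph. A minimal sketch (the `overlaps` and `conflict_graph` helpers are illustrative, not the repository's actual implementation in `env/scheduler.py`):

```python
def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return 60 * hours + minutes


def overlaps(start_a: str, dur_a: int, start_b: str, dur_b: int) -> bool:
    """Two duration-aware slots conflict iff their half-open intervals intersect."""
    a0 = to_minutes(start_a)
    b0 = to_minutes(start_b)
    return a0 < b0 + dur_b and b0 < a0 + dur_a


def conflict_graph(tasks):
    """Conflict graph: one edge per pair of overlapping tasks."""
    edges = set()
    for i, a in enumerate(tasks):
        for b in tasks[i + 1:]:
            if overlaps(a["time"], a["duration"], b["time"], b["duration"]):
                edges.add((a["id"], b["id"]))
    return edges


tasks = [
    {"id": 0, "time": "10:00", "duration": 60},
    {"id": 1, "time": "10:30", "duration": 30},
    {"id": 2, "time": "11:00", "duration": 45},  # back-to-back is not a conflict
]
print(conflict_graph(tasks))  # {(0, 1)}
```

Using half-open intervals means a meeting ending at 11:00 does not conflict with one starting at 11:00.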
---
## ๐Ÿ“ Project Structure
```
ai-executive-assistant-openenv/
โ”‚
โ”œโ”€โ”€ openenv.yaml # OpenEnv environment manifest
โ”œโ”€โ”€ README.md # This file
โ”œโ”€โ”€ requirements.txt # Python dependencies
โ”‚
โ”œโ”€โ”€ env/ # Core environment
โ”‚ โ”œโ”€โ”€ assistant_env.py # Main env class (OpenEnv entry point)
โ”‚ โ”œโ”€โ”€ state.py # State representation + partial observability
โ”‚ โ”œโ”€โ”€ actions.py # Action definitions + action masking
โ”‚ โ”œโ”€โ”€ rewards.py # Multi-objective reward engine
โ”‚ โ”œโ”€โ”€ scheduler.py # Temporal reasoning + conflict resolution
โ”‚ โ”œโ”€โ”€ scenario_generator.py # Curriculum-aware scenario generation
โ”‚ โ””โ”€โ”€ utils.py # Time utilities, conflict detection, metrics
โ”‚
โ”œโ”€โ”€ agents/ # Agent implementations
โ”‚ โ”œโ”€โ”€ random_agent.py # Random baseline (lower bound)
โ”‚ โ”œโ”€โ”€ rule_based_agent.py # Priority heuristic (strong baseline)
โ”‚ โ””โ”€โ”€ rl_agent.py # Tabular Q-learning agent
โ”‚
โ”œโ”€โ”€ training/ # Training & evaluation
โ”‚ โ”œโ”€โ”€ train_rl.py # Multi-agent training comparison
โ”‚ โ”œโ”€โ”€ evaluate.py # Evaluation harness
โ”‚ โ””โ”€โ”€ plots.py # Visualization utilities
โ”‚
โ”œโ”€โ”€ ui/ # Interactive demo
โ”‚ โ”œโ”€โ”€ app.py # Gradio web interface
โ”‚ โ””โ”€โ”€ timeline.py # Plotly Gantt timeline
โ”‚
โ””โ”€โ”€ logs/ # Training outputs
โ”œโ”€โ”€ reward_curves.png
โ”œโ”€โ”€ agent_comparison.png
โ””โ”€โ”€ rl_metrics.png
```
---
## 🚀 Quick Start
### Installation
```bash
pip install -r requirements.txt
```
### Run Training
```bash
python -m training.train_rl
```
This trains all 3 agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.
### Run Evaluation
```bash
python -m training.evaluate
```
### Launch Interactive Demo
```bash
python -m ui.app
```
Then open `http://localhost:7860` in your browser.
---
## 🎮 Environment API
```python
from env.assistant_env import ExecutiveAssistantEnv
env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)
state = env.reset()
print(state["tasks"]) # List of task objects
print(state["inbox"]) # List of inbox messages
print(state["valid_actions"]) # Legal actions (action masking)
# Take a step
action = ("complete_task", 0) # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```
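A full episode then follows the usual reset/step loop. Because the real environment needs this repository installed, the sketch below substitutes a tiny stub with the same assumed interface (`reset`, `step`, and a `valid_actions` field) so the control flow is runnable on its own:

```python
import random


class StubAssistantEnv:
    """Tiny stand-in mirroring the assumed ExecutiveAssistantEnv interface."""

    def __init__(self, max_steps: int = 5):
        self.max_steps = max_steps

    def reset(self):
        self.t = 0
        self.pending = {0, 1}  # two pending tasks
        return self._obs()

    def _obs(self):
        # Only completing a still-pending task is legal (action masking).
        return {"valid_actions": [("complete_task", i) for i in sorted(self.pending)]}

    def step(self, action):
        self.t += 1
        _, task_id = action
        reward = 1.0 if task_id in self.pending else -1.0
        self.pending.discard(task_id)
        done = not self.pending or self.t >= self.max_steps
        return self._obs(), reward, done, {"step": self.t}


env = StubAssistantEnv()
state = env.reset()
total = 0.0
while True:
    action = random.choice(state["valid_actions"])  # random policy over legal moves
    state, reward, done, info = env.step(action)
    total += reward
    if done:
        break
print(total)  # 2.0 -- both tasks completed, one reward point each
```

Swapping `StubAssistantEnv` for `ExecutiveAssistantEnv` gives the same loop against the real environment.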
### Action Space
| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |
### Observation Space
```json
{
"time": "09:30",
"tasks": [
{"id": 0, "title": "Q4 Strategy Review", "time": "10:00",
"duration": 60, "priority": "high", "type": "meeting", "status": "pending"}
],
"inbox": [
{"id": 0, "sender": "CEO", "content": "Need figures ASAP",
"urgency": "high", "replied": false}
],
"preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
"valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
"action_mask": [1, 1, 1, 1, 1, 1]
}
```
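With action masking, an agent should only ever sample from `valid_actions` (or, equivalently, from action types whose `action_mask` bit is set). A small illustrative policy follows; the assumption that the mask bits follow the order of the action table above, and the `pick_urgent_reply` heuristic itself, are mine, not the repository's:

```python
ACTION_TYPES = ["schedule_task", "complete_task", "defer_task",
                "send_reply", "reject_task", "ask_clarification"]


def legal_action_types(obs):
    """Action types whose mask bit is 1 (mask order assumed to follow ACTION_TYPES)."""
    return [t for t, bit in zip(ACTION_TYPES, obs["action_mask"]) if bit]


def pick_urgent_reply(obs):
    """Tiny heuristic: answer a high-urgency message first, else take any legal move."""
    for msg in obs.get("inbox", []):
        if msg["urgency"] == "high" and not msg["replied"]:
            candidate = ("send_reply", msg["id"])
            if candidate in obs["valid_actions"]:
                return candidate
    return obs["valid_actions"][0]


obs = {
    "inbox": [{"id": 0, "sender": "CEO", "content": "Need figures ASAP",
               "urgency": "high", "replied": False}],
    "valid_actions": [("complete_task", 0), ("send_reply", 0)],
    "action_mask": [0, 1, 0, 1, 0, 0],
}
print(legal_action_types(obs))  # ['complete_task', 'send_reply']
print(pick_urgent_reply(obs))   # ('send_reply', 0)
```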
---
## 📈 Results
| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|-----------|----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📋 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |
*Results vary by difficulty and random seed.*
---
## ๐Ÿ—๏ธ System Architecture
```
User / RL Agent
โ”‚
โ–ผ
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ ExecutiveAssistantEnv โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ ScenarioGeneratorโ”‚ โ”‚ โ† Curriculum Learning
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ State โ”‚ โ”‚ โ† Partial Observability
โ”‚ โ”‚ (tasks + inbox) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Scheduler โ”‚ โ”‚ โ† Temporal Reasoning
โ”‚ โ”‚ (conflict graph) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ RewardEngine โ”‚ โ”‚ โ† Multi-Objective Shaping
โ”‚ โ”‚ (5 components) โ”‚ โ”‚
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ”‚ โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ”‚
โ”‚ โ”‚ Action Masking โ”‚ โ”‚ โ† Invalid Action Prevention
โ”‚ โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
โ”‚
โ–ผ
Observation + Reward + Done
```
---
## 📜 License
MIT License
---
## ๐Ÿ™ Acknowledgments
- Built for the **OpenEnv** platform
- Inspired by real-world executive assistant workflows
- Visualization powered by **Plotly** and **Gradio**