---
title: AI Executive Assistant Simulator
emoji: 🤖
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 6.13.0
python_version: '3.10'
app_file: ui/app.py
pinned: false
---

# 🤖 AI Executive Assistant Simulator

**OpenEnv RL Environment**: an advanced reinforcement learning environment that simulates a smart executive assistant managing scheduling, inbox communication, and task prioritization.

Python 3.8+ · OpenEnv · Gradio


## 🔷 Problem

Modern professionals struggle with scheduling overload, task prioritization, and communication management. An average executive handles 50+ decisions daily, making this a rich environment for RL agents to learn optimal strategies.

## 🔷 Solution

An RL-powered executive assistant built on the OpenEnv framework that:

- 📅 Manages schedules with temporal reasoning and overlap detection
- ⚡ Resolves conflicts using conflict graph modeling
- 📬 Handles messages with urgency-aware prioritization
- 🧠 Learns personalized strategies through user preference modeling
- 📈 Improves via curriculum learning from easy → hard scenarios (a minimal sketch follows below)
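
To give a feel for the curriculum mechanism, here is a minimal sketch of an auto-scaling difficulty schedule. The promotion threshold and the function name are illustrative assumptions; the actual logic lives in `env/scenario_generator.py`.

```python
# Hypothetical curriculum schedule: promote the difficulty level once the
# agent's rolling task-completion rate clears a threshold. The 0.8 threshold
# is an assumption for illustration, not taken from scenario_generator.py.
LEVELS = ["easy", "medium", "hard"]

def next_difficulty(current: str, rolling_completion_rate: float) -> str:
    i = LEVELS.index(current)
    if rolling_completion_rate > 0.8 and i < len(LEVELS) - 1:
        return LEVELS[i + 1]  # graduate to the next level
    return current            # otherwise keep training at this level
```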

## 🧠 Advanced Features

| Feature | Description |
|---------|-------------|
| 🕐 Temporal Reasoning | Duration-aware time slots with overlap detection |
| 🎯 Multi-Objective Rewards | 5 reward components: task, schedule, message, efficiency, preferences |
| 👤 User Preferences | Personalization memory (preferred times, focus hours, meeting limits) |
| 👁️ Partial Observability | Hidden tasks & delayed inbox revealed progressively |
| 🚫 Action Masking | Invalid action prevention: agents only see legal moves |
| 🔗 Conflict Graph | Graph-based modeling of scheduling conflicts |
| 📚 Curriculum Learning | Auto-scaling difficulty: easy → medium → hard |
| 📊 Metrics Tracking | Completion rate, efficiency score, conflict count, response rate |
| 📅 Gantt Timeline | Interactive Plotly visualization of the schedule |
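
To make the multi-objective reward concrete, here is a minimal sketch of how the five components could be combined. The weights and the function name are illustrative assumptions; the actual shaping logic lives in `env/rewards.py`.

```python
# Hypothetical weighted combination of the five reward components.
# These weights are assumptions for illustration, not the values in rewards.py.
REWARD_WEIGHTS = {
    "task": 1.0,         # completing / scheduling tasks
    "schedule": 0.5,     # keeping the calendar conflict-free
    "message": 0.5,      # timely, urgency-aware replies
    "efficiency": 0.25,  # fewer steps and fewer invalid attempts
    "preferences": 0.25, # matching the user's preferred times
}

def combined_reward(components: dict[str, float]) -> float:
    """Weighted sum of the per-objective reward components."""
    return sum(REWARD_WEIGHTS[name] * value for name, value in components.items())
```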

๐Ÿ“ Project Structure

```
ai-executive-assistant-openenv/
│
├── openenv.yaml              # OpenEnv environment manifest
├── README.md                 # This file
├── requirements.txt          # Python dependencies
│
├── env/                      # Core environment
│   ├── assistant_env.py      # Main env class (OpenEnv entry point)
│   ├── state.py              # State representation + partial observability
│   ├── actions.py            # Action definitions + action masking
│   ├── rewards.py            # Multi-objective reward engine
│   ├── scheduler.py          # Temporal reasoning + conflict resolution
│   ├── scenario_generator.py # Curriculum-aware scenario generation
│   └── utils.py              # Time utilities, conflict detection, metrics
│
├── agents/                   # Agent implementations
│   ├── random_agent.py       # Random baseline (lower bound)
│   ├── rule_based_agent.py   # Priority heuristic (strong baseline)
│   └── rl_agent.py           # Tabular Q-learning agent
│
├── training/                 # Training & evaluation
│   ├── train_rl.py           # Multi-agent training comparison
│   ├── evaluate.py           # Evaluation harness
│   └── plots.py              # Visualization utilities
│
├── ui/                       # Interactive demo
│   ├── app.py                # Gradio web interface
│   └── timeline.py           # Plotly Gantt timeline
│
└── logs/                     # Training outputs
    ├── reward_curves.png
    ├── agent_comparison.png
    └── rl_metrics.png
```

## 🚀 Quick Start

### Installation

```bash
pip install -r requirements.txt
```

### Run Training

```bash
python -m training.train_rl
```

This trains all three agents (Random, Rule-Based, Q-Learning) for 200 episodes and generates comparison plots in `logs/`.

### Run Evaluation

```bash
python -m training.evaluate
```

### Launch Interactive Demo

```bash
python -m ui.app
```

Then open http://localhost:7860 in your browser.


## 🎮 Environment API

```python
from env.assistant_env import ExecutiveAssistantEnv

env = ExecutiveAssistantEnv(difficulty="medium", max_steps=50)

state = env.reset()
print(state["tasks"])          # List of task objects
print(state["inbox"])          # List of inbox messages
print(state["valid_actions"])  # Legal actions (action masking)

# Take a step
action = ("complete_task", 0)  # Complete task with ID 0
next_state, reward, done, info = env.step(action)
```
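
Because the environment exposes `valid_actions`, a full episode loop stays simple and safe. This is a minimal sketch assuming only the API shown above; the random policy is purely for illustration.

```python
import random
from env.assistant_env import ExecutiveAssistantEnv

env = ExecutiveAssistantEnv(difficulty="easy", max_steps=50)
state = env.reset()

total_reward, done = 0.0, False
while not done:
    # Action masking in practice: sample only from the legal moves.
    action = random.choice(state["valid_actions"])
    state, reward, done, info = env.step(action)
    total_reward += reward

print(f"Episode finished with total reward {total_reward:.2f}")
```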

### Action Space

| Action | Description |
|--------|-------------|
| `schedule_task` | Schedule a pending task into a time slot |
| `complete_task` | Mark a task as completed |
| `defer_task` | Postpone a task to a later time |
| `send_reply` | Reply to an inbox message |
| `reject_task` | Cancel a task |
| `ask_clarification` | Request more info about a task/message |

### Observation Space

```python
{
  "time": "09:30",
  "tasks": [
    {"id": 0, "title": "Q4 Strategy Review", "time": "10:00",
     "duration": 60, "priority": "high", "type": "meeting", "status": "pending"}
  ],
  "inbox": [
    {"id": 0, "sender": "CEO", "content": "Need figures ASAP",
     "urgency": "high", "replied": False}
  ],
  "preferences": {"preferred_meeting_times": ["09:00", "14:00"], ...},
  "valid_actions": [("complete_task", 0), ("send_reply", 0), ...],
  "action_mask": [1, 1, 1, 1, 1, 1]
}
```
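
Given this observation, a rule-based baseline can act directly on the structured fields. The sketch below is a hypothetical priority heuristic in the spirit of `agents/rule_based_agent.py`, not its actual implementation: it answers urgent unanswered messages first, then acts on high-priority tasks, and otherwise takes any legal action.

```python
def priority_policy(state: dict):
    """Hypothetical heuristic: urgent replies > high-priority tasks > anything legal."""
    valid = state["valid_actions"]

    # 1. Reply to high-urgency messages that have not been answered yet.
    urgent_ids = {m["id"] for m in state["inbox"]
                  if m["urgency"] == "high" and not m["replied"]}
    for action in valid:
        if action[0] == "send_reply" and action[1] in urgent_ids:
            return action

    # 2. Complete or schedule high-priority tasks.
    high_ids = {t["id"] for t in state["tasks"] if t["priority"] == "high"}
    for action in valid:
        if action[0] in ("complete_task", "schedule_task") and action[1] in high_ids:
            return action

    # 3. Fall back to the first legal action.
    return valid[0]
```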

## 📈 Results

| Agent | Avg Reward | Task Completion | Message Response | Efficiency |
|-------|------------|-----------------|------------------|------------|
| 🎲 Random | Low | ~30% | ~25% | ~25/100 |
| 📋 Rule-Based | Medium | ~65% | ~70% | ~55/100 |
| 🧠 Q-Learning | High | ~75% | ~80% | ~70/100 |

Results vary by difficulty and random seed.
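
For reference, the Q-Learning agent is tabular, so its learning step is presumably the textbook update sketched below. The learning rate and discount factor shown are illustrative assumptions, not the hyperparameters in `agents/rl_agent.py`.

```python
# Textbook tabular Q-learning update. ALPHA and GAMMA are assumed values
# for illustration; the real hyperparameters live in agents/rl_agent.py.
ALPHA, GAMMA = 0.1, 0.95

def q_update(Q: dict, s, a, reward: float, s_next, valid_next) -> None:
    # Best achievable value from the next state, restricted to legal actions.
    best_next = max((Q.get((s_next, a2), 0.0) for a2 in valid_next), default=0.0)
    td_target = reward + GAMMA * best_next
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (td_target - Q.get((s, a), 0.0))
```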


๐Ÿ—๏ธ System Architecture

```
User / RL Agent
       │
       ▼
┌────────────────────────┐
│  ExecutiveAssistantEnv │
│  ┌──────────────────┐  │
│  │ ScenarioGenerator│  │ ← Curriculum Learning
│  └────────┬─────────┘  │
│  ┌────────▼─────────┐  │
│  │      State       │  │ ← Partial Observability
│  │ (tasks + inbox)  │  │
│  └────────┬─────────┘  │
│  ┌────────▼─────────┐  │
│  │    Scheduler     │  │ ← Temporal Reasoning
│  │ (conflict graph) │  │
│  └────────┬─────────┘  │
│  ┌────────▼─────────┐  │
│  │   RewardEngine   │  │ ← Multi-Objective Shaping
│  │  (5 components)  │  │
│  └──────────────────┘  │
│  ┌──────────────────┐  │
│  │  Action Masking  │  │ ← Invalid Action Prevention
│  └──────────────────┘  │
└────────────────────────┘
       │
       ▼
Observation + Reward + Done
```

## 📜 License

MIT License


๐Ÿ™ Acknowledgments

- Built for the OpenEnv platform
- Inspired by real-world executive assistant workflows
- Visualization powered by Plotly and Gradio