---
title: Smart Calendar Resolver
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_file: app.py
pinned: false
---
# Smart Calendar Resolver – OpenEnv Environment
A deterministic, multi-step OpenEnv environment for evaluating agent reasoning in real-world scheduling workflows.
This environment models a constrained meeting scheduling problem where an agent must interpret user intent, reason over structured availability, and produce a valid, verified outcome through a staged interaction loop.
## Problem Definition
Given:
- a natural language meeting request
- multiple participants with availability windows
- constraints (duration, deadline, priority, timezone)
The agent must:
- Interpret the request
- Aggregate and reason over availability
- Select a valid time slot
- Confirm and finalize the schedule
This reflects real-world calendar coordination tasks commonly handled by assistants and productivity tools.
## Environment Design

### Core Loop
The environment follows the standard OpenEnv interface:
- `reset()`: returns the initial observation
- `step(action)`: returns `(observation, reward, done, info)`
- `state`: internal environment state
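The core loop can be sketched with a toy stand-in environment. This is an illustration of the `reset()`/`step()`/`state` contract only; the class name, observation fields, and reward values here are assumptions, not the actual implementation.

```python
# Toy stand-in illustrating the reset()/step()/state contract.
# Class name, fields, and reward values are invented for illustration.
class ToyCalendarEnv:
    def __init__(self):
        self.state = {"step_count": 0, "done": False}

    def reset(self):
        """Clear internal state and return the initial observation."""
        self.state = {"step_count": 0, "done": False}
        return {"request": "Book a 30-minute sync before Friday"}

    def step(self, action):
        """Return (observation, reward, done, info) for one action."""
        self.state["step_count"] += 1
        done = action == "confirm_schedule"
        self.state["done"] = done
        reward = 1.0 if done else 0.1
        return {"last_action": action}, reward, done, {}
```

An agent calls `reset()` once, then loops on `step()` until `done` is true.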
### Stage-Based Interaction
The task is decomposed into explicit stages:
1. `understand_request`
2. `evaluate_availability`
3. `propose_slot`
4. `confirm_schedule`
Agents are expected to follow this progression. Out-of-order or invalid transitions are penalized.
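One minimal way to enforce this progression is a fixed-sequence check. The stage names come from the list above; the validation logic itself is a sketch, not the environment's actual transition code.

```python
from typing import Optional

# Stage names from the README; the ordering check is an illustrative sketch.
STAGES = [
    "understand_request",
    "evaluate_availability",
    "propose_slot",
    "confirm_schedule",
]

def is_valid_transition(current: Optional[str], proposed: str) -> bool:
    """Allow only the next stage in the fixed sequence."""
    if current is not None and current not in STAGES:
        return False
    expected_index = 0 if current is None else STAGES.index(current) + 1
    return expected_index < len(STAGES) and STAGES[expected_index] == proposed
```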
## Dataset
A small, fully deterministic, in-memory dataset is used.
Each scenario includes:
- request text
- participants
- availability windows
- constraints (deadline, duration, priority)
- ground-truth valid slot
Difficulty levels:
- Easy: single valid slot, minimal reasoning
- Medium: conflicting availability with constraint filtering
- Hard: multiple candidates requiring prioritization and constraint trade-offs
Design choices:
- Small dataset ensures reproducibility
- No randomness ensures stable evaluation and debugging
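For illustration, a scenario record might look like the following. All values are invented; only the field names follow the list above.

```python
# Hypothetical scenario shaped like the dataset fields listed above.
scenario = {
    "request": "Find 45 minutes for a design review before Thursday.",
    "participants": ["alice", "bob"],
    "availability": {
        "alice": [("2024-01-08T09:00", "2024-01-08T12:00")],
        "bob": [("2024-01-08T10:00", "2024-01-08T15:00")],
    },
    "constraints": {
        "duration_minutes": 45,
        "deadline": "2024-01-11",
        "priority": "high",
    },
    # The single slot the grader accepts for this scenario.
    "ground_truth_slot": ("2024-01-08T10:00", "2024-01-08T10:45"),
}
```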
## State Representation
The environment maintains:
- `episode_id`
- `step_count`
- `current_scenario`
- `selected_slot`
- `action_history`
- `solved` flag
This enables:
- trajectory-based evaluation
- reward shaping across steps
- deterministic replay
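The state fields above could be carried in a simple dataclass. This is a sketch of one plausible shape; the environment's actual state container may differ.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

@dataclass
class EnvState:
    """Mirrors the state fields listed above; the structure is an assumption."""
    episode_id: int = 0
    step_count: int = 0
    current_scenario: Optional[dict] = None
    selected_slot: Optional[tuple] = None
    action_history: list = field(default_factory=list)
    solved: bool = False
```

Because every field is plain data, a trajectory can be replayed deterministically by re-applying `action_history` from a fresh `EnvState`.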
## Observation Space
Each observation contains:
- request (natural language)
- structured availability
- constraints
- current step index
- feedback signal
- action history
- next expected stage
- reward
- done flag
Observations are designed to balance:
- realism (semi-structured inputs)
- controllability (no external dependencies)
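A sample observation, following the field list above, might look like this. All values are invented for illustration.

```python
# Hypothetical observation after one successful "understand_request" step.
observation = {
    "request": "Schedule a 30-minute 1:1 with Dana this week.",
    "availability": {"dana": [("2024-01-09T13:00", "2024-01-09T17:00")]},
    "constraints": {"duration_minutes": 30},
    "step_index": 1,
    "feedback": "Request understood.",
    "action_history": ["understand_request"],
    "next_expected_stage": "evaluate_availability",
    "reward": 0.1,
    "done": False,
}
```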
## Action Space
Actions are typed via Pydantic models. Fields include:
- `stage`
- `proposed_time_slot`
- `confirm_schedule`
- `final_note`
Actions are structured but flexible enough to simulate agent reasoning.
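Since the actual model definitions are not reproduced here, the following is a sketch of what the Pydantic action model might look like, using only the field names listed above. Field types and defaults are assumptions.

```python
from typing import Optional
from pydantic import BaseModel

class CalendarAction(BaseModel):
    """Sketch of the action schema; types and defaults are assumptions."""
    stage: str
    proposed_time_slot: Optional[str] = None
    confirm_schedule: bool = False
    final_note: Optional[str] = None
```

Typed actions let the environment reject malformed input before scoring, so invalid submissions fail validation rather than silently earning zero reward.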
## Reward Function
Shaped reward encourages incremental progress:
- correct interpretation of request
- correct use of availability constraints
- valid slot selection
- correct final confirmation
- concise and relevant final note
Penalties:
- invalid stage transitions
- incorrect slot selection
- repeated or redundant actions
Properties:
- dense (not sparse)
- deterministic
- aligned with task completion
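A toy illustration of dense, deterministic reward shaping. The signals and weights below are invented for illustration and are not the environment's actual values.

```python
def shaped_reward(stage_correct: bool, slot_valid: bool, redundant: bool) -> float:
    """Toy shaped reward; weights are illustrative assumptions."""
    reward = 0.0
    reward += 0.25 if stage_correct else -0.25  # stage transition bonus/penalty
    reward += 0.5 if slot_valid else 0.0        # valid slot selection bonus
    reward -= 0.1 if redundant else 0.0         # repeated/redundant action penalty
    return reward
```

Because every component is a pure function of the action and state, identical trajectories always score identically.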
## Determinism & Reproducibility
- No randomness in dataset or transitions
- Fixed scenario ordering
- Identical rewards for identical actions
- Deterministic baseline policy
This ensures:
- reproducible scoring
- stable evaluation across runs
- compatibility with automated grading
## Baseline (Inference)
A deterministic baseline is provided.
Characteristics:
- follows correct stage sequence
- selects known valid slot
- produces consistent output
- uses the injected OpenAI-compatible proxy when `API_BASE_URL`, `API_KEY`, and `MODEL_NAME` are present
- falls back to the deterministic local baseline when those submission env vars are absent
## Required Output Format
The script emits strictly formatted logs:
```
[START] task=<task> env=<env> model=<model>
[STEP] step=<n> action=<action> reward=<0.00> done=<true|false> error=<msg|null>
[END] success=<true|false> steps=<n> rewards=<r1,r2,...,rn>
```
This format is required for evaluation pipelines.
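Assuming the angle-bracketed fields above are documentation placeholders that actual log lines replace with raw values, a downstream tool could parse `[STEP]` lines with a small regex. This parser is a sketch for consumers of the logs, not part of the environment.

```python
import re
from typing import Optional

# Assumes real log lines print raw values, e.g.:
# [STEP] step=3 action=propose_slot reward=0.50 done=false error=null
STEP_RE = re.compile(
    r"\[STEP\] step=(?P<step>\d+) action=(?P<action>\S+) "
    r"reward=(?P<reward>-?[\d.]+) done=(?P<done>true|false) "
    r"error=(?P<error>.+)"
)

def parse_step_line(line: str) -> Optional[dict]:
    """Return a structured record for a [STEP] line, or None on mismatch."""
    m = STEP_RE.match(line)
    if not m:
        return None
    return {
        "step": int(m.group("step")),
        "action": m.group("action"),
        "reward": float(m.group("reward")),
        "done": m.group("done") == "true",
        "error": None if m.group("error") == "null" else m.group("error"),
    }
```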
## Validation & Testing
The environment has been verified with:
- `uv run openenv validate .`
- deterministic baseline execution
- pytest suite covering:
  - environment flow
  - state transitions
  - reward correctness
  - inference execution
  - API health
All tests pass from repository root.
## Deployment

### Docker
```bash
docker build -t smart-calendar-env .
docker run -p 8000:8000 smart-calendar-env
```
Health check:
```bash
curl http://localhost:8000/health
```
Expected:
```json
{"status":"healthy"}
```

### Hugging Face Spaces

- Deploy using the Docker SDK
- Use the repository root as the build context
- Verify the `/health` endpoint
- Ensure logs show clean startup
## Key Design Decisions

- Stage-based decomposition: improves interpretability and grading
- Small synthetic dataset: ensures determinism and fast validation
- Structured actions: enables consistent evaluation
- Shaped rewards: provides a meaningful learning signal
- Root-level Dockerfile: simplifies the deployment pipeline

## Evaluation Alignment
This environment directly satisfies OpenEnv requirements:
- real-world task simulation
- multi-step agent interaction
- deterministic graders
- meaningful reward shaping
- reproducible baseline
- Docker + HF Spaces deployability

## Summary
Smart Calendar Resolver is a compact, deterministic environment that captures a realistic scheduling workflow while remaining easy to validate, deploy, and evaluate.
It is designed to test:
- multi-step reasoning
- constraint handling
- structured decision making
- trajectory-based agent performance
The environment is also deployed to Hugging Face Spaces.