---
title: Openenv Workflow Agent
emoji: π
colorFrom: green
colorTo: green
sdk: docker
pinned: false
license: mit
---

# 🧠 OpenEnv Workflow Agent: Decision-Making Under Uncertainty

## Overview

We present a **real-world OpenEnv environment** that simulates workflow management tasks such as email triage, scheduling, and task handling under **partial observability**.

Unlike typical environments, this benchmark focuses on a critical but underexplored capability:

> 🔥 **Cost-aware information gathering in sequential decision-making**

Agents must decide:

- When to act immediately
- When to request additional information
- Whether the cost of uncertainty reduction is justified

---

## 🎯 Why This Matters

Modern AI agents (LLMs, assistants, copilots) operate in **uncertain environments**:

- Emails are ambiguous
- User intent is hidden
- Context is incomplete

Our environment models this realistically by enforcing:

- Incorrect actions under uncertainty → penalized
- Information gathering → beneficial but costly
- Multi-step reasoning required for optimal decisions

---

## 🧠 Core Idea

We introduce a **POMDP-style workflow environment** where:

- The true state is partially hidden
- Agents must **actively reduce uncertainty**
- Information acquisition has a **non-zero cost**

### Key Property

> An optimal agent follows:
>
> **“Request information only when expected benefit exceeds cost.”**
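
The rule above can be sketched as a comparison of two expected values. This is a minimal illustration, not the project's implementation: the function name, its parameters, and the assumption that requested information fully resolves the ambiguity are all hypothetical.

```python
def should_request_info(p_correct: float,
                        reward_correct: float,
                        penalty_incorrect: float,
                        info_cost: float) -> bool:
    """Return True when asking first has a higher expected value than acting now.

    Assumption (hypothetical): the requested information removes all
    ambiguity, so the follow-up action is always correct.
    """
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability in [0, 1]")
    # Expected value of acting immediately on the current belief.
    act_now = p_correct * reward_correct - (1.0 - p_correct) * penalty_incorrect
    # Expected value of paying for information, then acting correctly.
    ask_first = reward_correct - info_cost
    # Same test as: (1 - p_correct) * (reward_correct + penalty_incorrect) > info_cost
    return ask_first > act_now
```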

---

## ⚙️ Environment Design

### 🔹 State

- Emails (observed)
- Tasks & calendar (observed)
- Hidden attributes:
  - true intent
  - urgency
  - missing information
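
One way to picture the split between observed and hidden state is a pair of dataclasses. This is only a sketch: the class names, field names, and field types are hypothetical and are not taken from the project code.

```python
from dataclasses import dataclass, field


@dataclass
class HiddenAttributes:
    # Illustrative fields mirroring the hidden attributes listed above.
    true_intent: str
    urgency: str
    missing_information: list[str] = field(default_factory=list)


@dataclass
class WorkflowState:
    emails: list[str]
    tasks: list[str]
    calendar: list[str]
    hidden: HiddenAttributes

    def observation(self) -> dict:
        """Return only the observed part of the state; hidden attributes are withheld."""
        return {
            "emails": list(self.emails),
            "tasks": list(self.tasks),
            "calendar": list(self.calendar),
        }
```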

---

### 🔹 Actions

- `classify`
- `reply`
- `schedule`
- `request_info`
- `archive`
- `prioritize`
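
A client could represent this action set as an enumeration and reject anything outside it. The `Action` class and `parse_action` helper below are hypothetical names used for illustration; only the six action strings come from the list above.

```python
from enum import Enum


class Action(str, Enum):
    CLASSIFY = "classify"
    REPLY = "reply"
    SCHEDULE = "schedule"
    REQUEST_INFO = "request_info"
    ARCHIVE = "archive"
    PRIORITIZE = "prioritize"


def parse_action(name: str) -> Action:
    """Map a raw action string to an Action, rejecting unknown names."""
    try:
        return Action(name.strip().lower())
    except ValueError:
        raise ValueError(f"unknown action: {name!r}") from None
```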

---

### 🔹 Reward Function

\[
r_t = r_{\text{correct}} + r_{\text{progress}} - r_{\text{cost}} - r_{\text{penalty}}
\]

The values below are signed contributions to the step reward, so costs and penalties are listed as negative numbers:

- Correct action → +0.3
- Task progress → +0.2
- Step penalty → −0.01
- Information request cost → −0.05
- Incorrect action → −0.2
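
The reward table can be turned into a small function to see how the terms combine. This is a sketch under stated assumptions, not the environment's grader: it assumes every step pays the step penalty and that `request_info` is charged its cost without being scored as correct or incorrect. The function and argument names are hypothetical.

```python
# Magnitudes taken from the reward table above.
R_CORRECT = 0.3
R_PROGRESS = 0.2
STEP_PENALTY = 0.01
INFO_COST = 0.05
R_INCORRECT = 0.2


def step_reward(action: str, correct: bool, progressed: bool) -> float:
    """Signed reward for one step, assuming every step pays the step penalty."""
    reward = -STEP_PENALTY
    if action == "request_info":
        # Assumption: an information request only pays its cost.
        return reward - INFO_COST
    reward += R_CORRECT if correct else -R_INCORRECT
    if progressed:
        reward += R_PROGRESS
    return reward


# Hypothetical comparison: guess wrong immediately vs. ask, then act correctly.
guess_wrong = step_reward("classify", correct=False, progressed=False)
ask_then_act = (step_reward("request_info", correct=False, progressed=False)
                + step_reward("classify", correct=True, progressed=True))
print(round(guess_wrong, 2), round(ask_then_act, 2))  # prints: -0.21 0.43
```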

---

## 🧪 Tasks

### 🟢 Easy

- Clear intent
- Single-step decision

### 🟡 Medium

- Multi-step workflow
- Requires sequencing

### 🔴 Hard

- Ambiguous input
- Requires **information gathering before acting**

---

## Baseline Results

```
easy: 1.00
medium: 0.50
hard: 0.13
```

### Interpretation

- The baseline solves the easy tasks (1.00)
- Its score halves on medium tasks (0.50) and drops to 0.13 on hard, ambiguous scenarios
- This gap demonstrates the need for **information-aware policies**

---

## 🔥 Key Insight

Standard agents fail because they **act too early under uncertainty**, while agents that strategically gather information succeed.

This environment makes that tradeoff explicit and measurable.

---

## 🧩 Novel Contribution

We introduce:

### Cost-sensitive information gathering

- Asking questions is beneficial but not free

### Enforced uncertainty

- Actions without information are penalized

### Sequential dependency

- Early decisions affect future rewards

---

## 🧪 Validation

We verify:

- Classification fails under missing information
- Requesting info enables correct decisions
- A tradeoff emerges between cost and accuracy

---

## 📦 Project Structure

```
app/
tasks/
graders/
baseline/
scripts/
openenv.yaml
Dockerfile
inference.py
```

---

## ▶️ Run Locally

You can pull the pre-built Docker image directly from Docker Hub and run it:

```bash
docker pull imsachin010/openenv-workflow-agent:latest
docker run -d -p 7860:7860 --name openenv-agent imsachin010/openenv-workflow-agent:latest
```

Test the endpoint:

```bash
curl -X POST http://localhost:7860/reset
```
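
The same check can be made from Python with only the standard library. This sketch is roughly equivalent to the curl call; the helper names are hypothetical, and because the response format is not documented here, the body is returned as raw text rather than parsed.

```python
import urllib.request

BASE_URL = "http://localhost:7860"  # port published by the `docker run` command above


def build_reset_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build an empty-body POST to /reset."""
    return urllib.request.Request(f"{base_url}/reset", data=b"", method="POST")


def reset(base_url: str = BASE_URL, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(build_reset_request(base_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (requires the container to be running):
# print(reset())
```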

---

## 🤖 Inference

Run the inference script inside the environment:

```bash
python -m inference
```

Outputs:

```
[START]
[STEP]
[END]
```
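
To summarize a captured run, the tagged lines can be counted. The sketch below assumes each log line begins with one of the three tags shown above; whatever follows the tag on a line is not specified here, so the helper (a hypothetical name) ignores it.

```python
from collections import Counter

TAGS = ("[START]", "[STEP]", "[END]")


def count_tags(log_text: str) -> Counter:
    """Count lines that begin with each known tag; other lines are ignored."""
    counts = Counter({tag: 0 for tag in TAGS})
    for line in log_text.splitlines():
        stripped = line.lstrip()
        for tag in TAGS:
            if stripped.startswith(tag):
                counts[tag] += 1
                break
    return counts
```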

---

## 🧠 Conclusion

This environment highlights a key gap in current agents:

> They do not reason about **when to gather information**

We provide a benchmark to evaluate and improve:

* decision-making under uncertainty
* information-seeking behavior
* sequential reasoning

---

## Submission Notes

* Fully OpenEnv compliant
* Deterministic graders
* Reproducible via Docker
* HF Space endpoint available