---
title: Openenv Workflow Agent
emoji: 📈
colorFrom: green
colorTo: green
sdk: docker
pinned: false
license: mit
---
# 🧠 OpenEnv Workflow Agent: Decision-Making Under Uncertainty
## 🚀 Overview
We present an **OpenEnv environment** that simulates real-world workflow management tasks, such as email triage, scheduling, and task handling, under **partial observability**.
Unlike typical environments, this benchmark focuses on a critical but underexplored capability:
> 🔥 **Cost-aware information gathering in sequential decision-making**
Agents must decide:
- When to act immediately
- When to request additional information
- Whether the cost of uncertainty reduction is justified
---
## 🎯 Why This Matters
Modern AI agents (LLMs, assistants, copilots) operate in **uncertain environments**:
- Emails are ambiguous
- User intent is hidden
- Context is incomplete
Our environment models this realistically by enforcing:
- ❗ Incorrect actions under uncertainty → penalized
- ❗ Information gathering → beneficial but costly
- ❗ Multi-step reasoning required for optimal decisions
---
## 🧠 Core Idea
We introduce a **POMDP-style workflow environment** where:
- The true state is partially hidden
- Agents must **actively reduce uncertainty**
- Information acquisition has a **non-zero cost**
### Key Property
> An optimal agent follows:
>
> **“Request information only when expected benefit exceeds cost.”**
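Stated as a one-step comparison, and under two simplifying assumptions (the agent's best guess is right with probability $p$, and a request fully resolves the ambiguity so the next action is correct), the rule becomes the following, where $r_{\text{incorrect}}$ is the penalty for a wrong action and $c$ is the total cost of the extra information-gathering step:

$$
\underbrace{r_{\text{correct}} - \big[p\, r_{\text{correct}} - (1-p)\, r_{\text{incorrect}}\big]}_{\text{expected benefit of asking}} > c
\quad\Longleftrightarrow\quad
(1-p)\,\big(r_{\text{correct}} + r_{\text{incorrect}}\big) > c
$$

The benefit shrinks linearly as the confidence $p$ rises, so a well-calibrated agent should ask on ambiguous inputs and act directly on clear ones.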
---
## ⚙️ Environment Design
### 🔹 State
- Emails (observed)
- Tasks & calendar (observed)
- Hidden attributes:
  - true intent
  - urgency
  - missing information
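One way to picture the observed/hidden split is as two record types, where the agent only ever receives the observed part. This is a hypothetical sketch: the class and field names (`Observation`, `HiddenState`, `true_intent`, `missing_info`, and so on) are illustrative and are not taken from the environment's code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Email:
    sender: str
    subject: str
    body: str


@dataclass
class Observation:
    """What the agent sees at each step (hypothetical schema)."""

    emails: list[Email] = field(default_factory=list)
    tasks: list[str] = field(default_factory=list)
    calendar: list[str] = field(default_factory=list)


@dataclass
class HiddenState:
    """Ground truth held by the environment and never returned to the agent."""

    true_intent: str
    urgency: str
    missing_info: list[str] = field(default_factory=list)


@dataclass
class WorkflowState:
    observation: Observation
    hidden: HiddenState

    def observe(self) -> Observation:
        # Partial observability: the agent only ever receives the observed part
        return self.observation
```

Keeping the hidden attributes in a separate object makes it harder for an observation serializer to leak ground truth to the agent by accident.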
---
### 🔹 Actions
- `classify`
- `reply`
- `schedule`
- `request_info`
- `archive`
- `prioritize`
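Because LLM agents emit actions as free text, it can help to validate them against this fixed set before stepping the environment. The action strings below come from the list above; the `ActionType` enum and `parse_action` helper are hypothetical, not part of the project.

```python
from enum import Enum


class ActionType(str, Enum):
    # Values mirror the action names listed above
    CLASSIFY = "classify"
    REPLY = "reply"
    SCHEDULE = "schedule"
    REQUEST_INFO = "request_info"
    ARCHIVE = "archive"
    PRIORITIZE = "prioritize"


def parse_action(name: str) -> ActionType:
    """Normalize a raw action string; raises ValueError for unknown actions."""
    return ActionType(name.strip().lower())
```

Parsing through a strict enum turns an unknown action name into an explicit error instead of a silently wasted step.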
---
### 🔹 Reward Function
$$
r_t = r_{\text{correct}} + r_{\text{progress}} - r_{\text{cost}} - r_{\text{penalty}}
$$
Per-step contributions, signed as they affect $r_t$:
- Correct action → +0.3
- Task progress → +0.2
- Step penalty → −0.01
- Information request cost → −0.05
- Incorrect action → −0.2
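A minimal sketch of how the per-step reward could be assembled from these constants. It is not the project's grader and rests on assumptions: every step pays the step penalty (counted in $r_{\text{cost}}$), `request_info` additionally pays the information cost, `request_info` itself is never graded as incorrect, and correctness and progress arrive as booleans. The function name is hypothetical.

```python
# Constants from the reward table; the grouping into terms is an assumption
R_CORRECT = 0.3
R_PROGRESS = 0.2
STEP_PENALTY = 0.01
INFO_COST = 0.05
R_INCORRECT = 0.2


def step_reward(action: str, correct: bool, progressed: bool) -> float:
    """r_t = r_correct + r_progress - r_cost - r_penalty (hypothetical helper)."""
    r_correct = R_CORRECT if correct else 0.0
    r_progress = R_PROGRESS if progressed else 0.0
    # Assumption: the step penalty and the info-request fee both count as r_cost
    r_cost = STEP_PENALTY + (INFO_COST if action == "request_info" else 0.0)
    # Assumption: asking for information is never penalized as an incorrect action
    r_penalty = R_INCORRECT if (not correct and action != "request_info") else 0.0
    return r_correct + r_progress - r_cost - r_penalty
```

For example, a correct classification that also advances the task yields 0.3 + 0.2 − 0.01 = 0.49, while a wrong one yields −0.01 − 0.2 = −0.21.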
---
## 🧪 Tasks
### 🟢 Easy
- Clear intent
- Single-step decision
### 🟡 Medium
- Multi-step workflow
- Requires sequencing
### 🔴 Hard
- Ambiguous input
- Requires **information gathering before acting**
---
## 📊 Baseline Results
```
easy: 1.00
medium: 0.50
hard: 0.13
```
### 🔍 Interpretation
- The baseline solves the easy tasks (1.00)
- Performance drops on multi-step workflows (0.50) and falls to 0.13 on ambiguous scenarios
- This gap points to the need for **information-aware policies**
---
## 🔥 Key Insight
Standard agents fail because they **act too early under uncertainty**, while agents that gather information strategically succeed.
This environment exposes that failure mode and makes the underlying tradeoff explicit and measurable.
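A back-of-the-envelope calculation with the constants from the Reward Function section shows where acting early stops paying off. The assumptions are ours, not the environment's: a guessed action is right with probability `p`, `request_info` fully resolves the ambiguity so the following action is correct, every action pays the 0.01 step penalty, and the +0.2 progress reward is earned on both paths so it cancels out.

```python
R_CORRECT, R_INCORRECT = 0.3, 0.2
STEP_PENALTY, INFO_COST = 0.01, 0.05


def act_now(p: float) -> float:
    # One step: guess immediately, right with probability p
    return p * R_CORRECT - (1 - p) * R_INCORRECT - STEP_PENALTY


def ask_then_act() -> float:
    # Two steps: pay for request_info, then act with the ambiguity resolved
    return -(INFO_COST + STEP_PENALTY) + (R_CORRECT - STEP_PENALTY)


def break_even_confidence() -> float:
    # act_now is linear in p: solve p*(R_CORRECT + R_INCORRECT) - R_INCORRECT - STEP_PENALTY == ask_then_act()
    return (ask_then_act() + R_INCORRECT + STEP_PENALTY) / (R_CORRECT + R_INCORRECT)
```

Here `act_now(p)` is 0.5p − 0.21 and `ask_then_act()` is 0.23, so under these assumptions asking first wins whenever the agent is less than 88% confident in its guess. An agent that always acts immediately gives up expected reward on every input below that threshold.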
---
## 🧩 Novel Contribution
We introduce:
### ✅ Cost-sensitive information gathering
- Asking questions is beneficial but not free
### ✅ Enforced uncertainty
- Acting on ambiguous inputs without the missing information is penalized
### ✅ Sequential dependency
- Early decisions affect future rewards
---
## 🧪 Validation
We verify:
- ✔ Classification fails under missing information
- ✔ Requesting info enables correct decisions
- ✔ Tradeoff emerges between cost and accuracy
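The three checks can be phrased as plain assertions. Since the environment's Python API is not shown here, this sketch uses a hypothetical toy environment (`ToyAmbiguousEmailEnv`) with one ambiguous email whose true label stays hidden until `request_info` is called; it reuses the reward constants from the Reward Function section but is not the project's grader.

```python
from typing import Optional


class ToyAmbiguousEmailEnv:
    """Hypothetical one-email environment; the true label stays hidden until requested."""

    STEP_PENALTY = 0.01
    INFO_COST = 0.05
    R_CORRECT = 0.3
    R_INCORRECT = 0.2

    def __init__(self, true_label: str, decoy_label: str) -> None:
        self.true_label = true_label
        self.decoy_label = decoy_label  # plausible but wrong surface reading
        self.revealed = False

    def hint(self) -> str:
        # Before request_info, the visible evidence points to the decoy reading
        return self.true_label if self.revealed else self.decoy_label

    def step(self, action: str, label: Optional[str] = None) -> float:
        reward = -self.STEP_PENALTY  # every step pays the step penalty
        if action == "request_info":
            self.revealed = True
            return reward - self.INFO_COST
        if action == "classify":
            bonus = self.R_CORRECT if label == self.true_label else -self.R_INCORRECT
            return reward + bonus
        raise ValueError(f"unsupported action: {action}")


def act_immediately(env: ToyAmbiguousEmailEnv) -> float:
    return env.step("classify", env.hint())


def ask_then_classify(env: ToyAmbiguousEmailEnv) -> float:
    total = env.step("request_info")
    return total + env.step("classify", env.hint())


greedy = act_immediately(ToyAmbiguousEmailEnv("meeting_request", "fyi"))
careful = ask_then_classify(ToyAmbiguousEmailEnv("meeting_request", "fyi"))
max_single_step = ToyAmbiguousEmailEnv.R_CORRECT - ToyAmbiguousEmailEnv.STEP_PENALTY

assert greedy < 0                  # classification fails under missing information
assert careful > 0                 # requesting info enables the correct decision
assert careful < max_single_step   # ...but asking is not free
```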
---
## 📦 Project Structure
```
app/
tasks/
graders/
baseline/
scripts/
openenv.yaml
Dockerfile
inference.py
```
---
## ▶️ Run Locally
You can pull the pre-built Docker image directly from Docker Hub and run it:
```bash
docker pull imsachin010/openenv-workflow-agent:latest
docker run -d -p 7860:7860 --name openenv-agent imsachin010/openenv-workflow-agent:latest
```
Test the `/reset` endpoint:
```bash
curl -X POST http://localhost:7860/reset
```
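The same smoke test can be run from Python with only the standard library. This is a sketch: it mirrors the `curl` call (a bodyless `POST /reset` on port 7860) and assumes the endpoint answers with JSON; the helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # port published by the docker run command


def build_reset_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a bodyless POST /reset, like `curl -X POST .../reset`."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/reset", method="POST")


def reset(base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the body (assumes a JSON response)."""
    with urllib.request.urlopen(build_reset_request(base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the container running, `reset()` should return the decoded response body; a connection error usually means the container is not up or the port mapping differs.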
---
## 🤖 Inference
Run the inference script inside the environment:
```bash
python -m inference
```
Outputs:
```
[START]
[STEP]
[END]
```
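If you consume these logs programmatically, a minimal structural check might look like the following. It assumes only what is shown above: relevant lines begin with one of the three markers, and an episode is one `[START]`, any number of `[STEP]` lines, then `[END]`. The content after each marker is not parsed because its format is not documented here, and `check_log_structure` is a hypothetical helper.

```python
import re

MARKER = re.compile(r"^\[(START|STEP|END)\]")


def check_log_structure(lines: list) -> bool:
    """True if tagged lines form one or more complete START/STEP*/END episodes."""
    in_episode = False
    episodes = 0
    for line in lines:
        match = MARKER.match(line.strip())
        if match is None:
            continue  # untagged lines (warnings, progress bars) are ignored
        tag = match.group(1)
        if tag == "START":
            if in_episode:
                return False  # a new START before the previous END
            in_episode = True
        elif not in_episode:
            return False  # STEP or END outside an episode
        elif tag == "END":
            in_episode = False
            episodes += 1
    return episodes > 0 and not in_episode
```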
---
## 🧠 Conclusion
This environment highlights a key gap in current agents:
> ❗ They do not reason about **when to gather information**
We provide a benchmark to evaluate and improve:
* decision-making under uncertainty
* information-seeking behavior
* sequential reasoning
---
## 🏁 Submission Notes
* ✔ Fully OpenEnv compliant
* ✔ Deterministic graders
* ✔ Reproducible via Docker
* ✔ HF Space endpoint available