---
title: Openenv Workflow Agent
emoji: 📈
colorFrom: green
colorTo: green
sdk: docker
pinned: false
license: mit
---
# 🧠 OpenEnv Workflow Agent: Decision-Making Under Uncertainty

## 🚀 Overview

We present a **real-world OpenEnv environment** that simulates workflow management tasks such as email triage, scheduling, and task handling under **partial observability**.

Unlike typical environments, this benchmark focuses on a critical but underexplored capability:

> 🔥 **Cost-aware information gathering in sequential decision-making**

Agents must decide:
- When to act immediately
- When to request additional information
- Whether the cost of uncertainty reduction is justified

---

## 🎯 Why This Matters

Modern AI agents (LLMs, assistants, copilots) operate in **uncertain environments**:
- Emails are ambiguous  
- User intent is hidden  
- Context is incomplete  

Our environment models this realistically by enforcing:

- ❗ Incorrect actions under uncertainty → penalized
- ❗ Information gathering → beneficial but costly
- ❗ Multi-step reasoning required for optimal decisions

---

## 🧠 Core Idea

We introduce a **POMDP-style workflow environment** where:

- The true state is partially hidden
- Agents must **actively reduce uncertainty**
- Information acquisition has a **non-zero cost**

### Key Property:

> An optimal agent follows:
>
> **"Request information only when the expected benefit exceeds the cost."**
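One way to make this rule precise, in standard value-of-information terms (the notation is ours, not taken from the environment's code): let $o$ be the answer returned by `request_info`, $r(a)$ the reward of a final action $a$, and $c$ the total cost of the extra request step. Requesting information is worthwhile when

$$
\underbrace{\mathbb{E}_{o}\Big[\max_{a}\ \mathbb{E}\big[r(a) \mid o\big]\Big] - \max_{a}\ \mathbb{E}\big[r(a)\big]}_{\text{value of information}} \;>\; c
$$

With the values listed under Reward Function, $c$ would be the 0.05 information-request cost plus the 0.01 step penalty, assuming both apply to the request step.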

---

## ⚙️ Environment Design

### 🔹 State

- Emails (observed)
- Tasks & calendar (observed)
- Hidden attributes:
  - true intent
  - urgency
  - missing information
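The split between observed and hidden state can be sketched as plain Python dataclasses. The class and field names below are hypothetical, chosen to mirror the list above rather than the environment's actual schema; the point is that `observation()` never exposes the hidden attributes, which is what makes the task partially observable.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HiddenAttributes:
    """Ground truth the agent never observes directly (hypothetical fields)."""
    true_intent: str     # e.g. "meeting_request" (illustrative label)
    urgency: str         # e.g. "high" or "low" (illustrative scale)
    missing_info: bool   # True if the email omits details needed to act correctly


@dataclass
class Email:
    sender: str
    subject: str
    body: str
    hidden: HiddenAttributes  # kept server-side, stripped from observations


@dataclass
class WorkflowState:
    emails: List[Email] = field(default_factory=list)
    tasks: List[str] = field(default_factory=list)
    calendar: List[str] = field(default_factory=list)

    def observation(self) -> dict:
        """Return only the observed part of the state; hidden attributes are dropped."""
        return {
            "emails": [
                {"sender": e.sender, "subject": e.subject, "body": e.body}
                for e in self.emails
            ],
            "tasks": list(self.tasks),
            "calendar": list(self.calendar),
        }
```

Keeping the hidden attributes on the same object but filtering them at the observation boundary lets a grader score against ground truth while the agent only ever sees the filtered view.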

---

### 🔹 Actions

- `classify`
- `reply`
- `schedule`
- `request_info`
- `archive`
- `prioritize`
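A minimal sketch of the action space as a Python enum, with a validator for an assumed `{"action_type": ..., **arguments}` payload. The enum values are the six action names listed above; the payload shape and the `parse_action` helper are hypothetical, not the environment's documented API.

```python
from enum import Enum
from typing import Any, Dict, Tuple


class ActionType(str, Enum):
    """The six actions listed above; values match the documented names."""
    CLASSIFY = "classify"
    REPLY = "reply"
    SCHEDULE = "schedule"
    REQUEST_INFO = "request_info"
    ARCHIVE = "archive"
    PRIORITIZE = "prioritize"


def parse_action(payload: Dict[str, Any]) -> Tuple[ActionType, Dict[str, Any]]:
    """Validate a hypothetical {"action_type": ..., **args} payload.

    Returns the action type and its remaining arguments; raises ValueError
    if the type is missing or is not one of the six actions.
    """
    if "action_type" not in payload:
        raise ValueError("payload is missing 'action_type'")
    action = ActionType(payload["action_type"])  # raises ValueError for unknown names
    args = {k: v for k, v in payload.items() if k != "action_type"}
    return action, args
```

Subclassing `str` lets members compare equal to their plain string names, which keeps JSON round-trips simple.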

---

### 🔹 Reward Function

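$$
r_t = r_{\text{correct}} + r_{\text{progress}} - r_{\text{cost}} - r_{\text{penalty}}
$$

Each event below contributes the listed signed amount to $r_t$ (so the subtracted terms $r_{\text{cost}}$ and $r_{\text{penalty}}$ are positive magnitudes):

- Correct action → +0.3
- Task progress → +0.2
- Step penalty → −0.01
- Information request cost → −0.05
- Incorrect action → −0.2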
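A minimal sketch of how these values could combine into a per-step reward. The constants are the values listed above; the `StepOutcome` flags and the assumption that the step penalty is charged on every step are ours, not taken from the environment's code.

```python
from dataclasses import dataclass

# Values from the list above, stored as positive magnitudes.
R_CORRECT = 0.3
R_PROGRESS = 0.2
STEP_PENALTY = 0.01
INFO_COST = 0.05
R_INCORRECT = 0.2


@dataclass
class StepOutcome:
    """Events that occurred on one step (hypothetical structure)."""
    correct: bool = False         # action matched the hidden ground truth
    incorrect: bool = False       # action contradicted the hidden ground truth
    progressed: bool = False      # action advanced a multi-step task
    requested_info: bool = False  # action was request_info


def step_reward(o: StepOutcome) -> float:
    """Sum the signed contribution of every event that occurred on this step.

    Assumes the step penalty is charged on every step.
    """
    if o.correct and o.incorrect:
        raise ValueError("a step cannot be both correct and incorrect")
    r = -STEP_PENALTY
    if o.correct:
        r += R_CORRECT
    if o.progressed:
        r += R_PROGRESS
    if o.requested_info:
        r -= INFO_COST
    if o.incorrect:
        r -= R_INCORRECT
    return round(r, 4)  # round away floating-point noise
```

Storing the penalties as positive magnitudes and subtracting them mirrors the minus signs in the formula for $r_t$.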

---

## 🧪 Tasks

### 🟒 Easy
- Clear intent
- Single-step decision

### 🟑 Medium
- Multi-step workflow
- Requires sequencing

### 🔴 Hard
- Ambiguous input
- Requires **information gathering before acting**

---

## 📊 Baseline Results

```
easy:   1.00
medium: 0.50
hard:   0.13
```

### 🔍 Interpretation

- The baseline performs well on simple tasks (1.00 on easy)
- It fails on ambiguous scenarios (0.13 on hard)
- This gap demonstrates the need for **information-aware policies**

---

## 🔥 Key Insight

Standard agents fail because they **act too early under uncertainty**: agents that act immediately on ambiguous input are penalized, while agents that strategically gather information first succeed.

This environment exposes that failure mode and makes the underlying tradeoff explicit and measurable.
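To make the tradeoff concrete, here is a back-of-the-envelope calculation using the reward values from the Reward Function section. It assumes (our simplification) that one `request_info` call fully resolves the ambiguity, that the episode ends after the final action, and it ignores the task-progress bonus; the function names are hypothetical.

```python
# Reward values from the Reward Function section (positive magnitudes).
R_CORRECT = 0.3
R_INCORRECT = 0.2
STEP_PENALTY = 0.01
INFO_COST = 0.05


def ev_act_now(p_correct: float) -> float:
    """Expected reward of acting immediately, right with probability p_correct."""
    return p_correct * R_CORRECT - (1 - p_correct) * R_INCORRECT - STEP_PENALTY


def ev_request_then_act() -> float:
    """Expected reward of one request_info step followed by an action.

    Assumes the answer removes all ambiguity, so the follow-up action is correct.
    """
    request_step = -STEP_PENALTY - INFO_COST
    act_step = R_CORRECT - STEP_PENALTY
    return request_step + act_step


def break_even_probability() -> float:
    """Confidence above which acting immediately beats asking first.

    Solves ev_act_now(p) == ev_request_then_act() for p, using
    ev_act_now(p) = p * (R_CORRECT + R_INCORRECT) - R_INCORRECT - STEP_PENALTY.
    """
    return (ev_request_then_act() + R_INCORRECT + STEP_PENALTY) / (R_CORRECT + R_INCORRECT)
```

Under these assumptions, asking first yields 0.23 in expectation, so acting immediately only pays off when the agent is at least 88% confident; below that, `request_info` has the higher expected reward.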

---

## 🧩 Novel Contribution

We introduce:

### ✅ Cost-sensitive information gathering
- Asking questions is beneficial but not free

### ✅ Enforced uncertainty
- Acting without first gathering the missing information is penalized

### ✅ Sequential dependency
- Early decisions affect future rewards

---

## 🧪 Validation

We verify that:

- ✔ Classification fails when required information is missing
- ✔ Requesting information enables correct decisions
- ✔ A tradeoff emerges between information cost and accuracy
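A toy version of the first two checks, written as a hypothetical single-email hard task rather than the project's actual grader: classifying before `request_info` is always penalized, and the same classification succeeds once the information is revealed. The `ToyHardTask` class, the `run` helper, and the intent labels are illustrative assumptions; reward values follow the Reward Function section.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Reward values from the Reward Function section (positive magnitudes).
R_CORRECT, R_INCORRECT, STEP_PENALTY, INFO_COST = 0.3, 0.2, 0.01, 0.05


@dataclass
class ToyHardTask:
    """Hypothetical single-email hard task: the intent stays ambiguous
    until request_info is called."""
    true_intent: str
    info_revealed: bool = False

    def step(self, action: str, label: Optional[str] = None) -> float:
        reward = -STEP_PENALTY
        if action == "request_info":
            self.info_revealed = True
            reward -= INFO_COST
        elif action == "classify":
            # Classification only counts as correct once the missing info is known.
            if self.info_revealed and label == self.true_intent:
                reward += R_CORRECT
            else:
                reward -= R_INCORRECT
        else:
            raise ValueError(f"action not modelled in this toy task: {action}")
        return round(reward, 4)


def run(actions: List[Tuple[str, Optional[str]]], true_intent: str) -> float:
    """Total reward of a fixed action sequence on a fresh ToyHardTask."""
    task = ToyHardTask(true_intent)
    return round(sum(task.step(a, label) for a, label in actions), 4)
```

With these values, classifying immediately totals −0.21 even with the right label, while requesting information first and then classifying correctly totals 0.23.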

---

## 📦 Project Structure

```
app/
tasks/
graders/
baseline/
scripts/
openenv.yaml
Dockerfile
inference.py
```

---

## ▢️ Run Locally

You can pull the pre-built Docker image directly from Docker Hub and run it:

```bash
docker pull imsachin010/openenv-workflow-agent:latest
docker run -d -p 7860:7860 --name openenv-agent imsachin010/openenv-workflow-agent:latest
```

Check that the server responds by calling the `/reset` endpoint:

```bash
curl -X POST http://localhost:7860/reset
```

---

## 🤖 Inference

Run the inference script inside the environment:

```bash
python -m inference
```

Outputs:

```
[START]
[STEP]
[END]
```
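The README lists only the three markers, not what follows them on each line. As a sketch, assuming each marker starts its own line and that one run may contain several episodes (for example one per task), captured output could be sanity-checked like this; `leading_tag` and `count_episodes` are hypothetical helpers, not part of the project.

```python
from typing import List, Optional

TAGS = ("[START]", "[STEP]", "[END]")


def leading_tag(line: str) -> Optional[str]:
    """Return the marker a line starts with, or None for untagged lines."""
    stripped = line.lstrip()
    for tag in TAGS:
        if stripped.startswith(tag):
            return tag
    return None


def count_episodes(lines: List[str]) -> int:
    """Count well-formed START, STEP*, END episodes; raise ValueError otherwise."""
    episodes = 0
    in_episode = False
    for n, line in enumerate(lines, start=1):
        tag = leading_tag(line)
        if tag is None:
            continue  # ignore untagged output
        if tag == "[START]":
            if in_episode:
                raise ValueError(f"line {n}: [START] before previous [END]")
            in_episode = True
        elif tag == "[STEP]":
            if not in_episode:
                raise ValueError(f"line {n}: [STEP] outside an episode")
        else:  # "[END]"
            if not in_episode:
                raise ValueError(f"line {n}: [END] without [START]")
            in_episode = False
            episodes += 1
    if in_episode:
        raise ValueError("output ended before [END]")
    return episodes
```

Pass it the captured output split into lines, e.g. `count_episodes(output.splitlines())`.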

---

## 🧠 Conclusion

This environment highlights a key gap in current agents:

> ❗ They do not reason about **when to gather information**

We provide a benchmark to evaluate and improve:

* decision-making under uncertainty
* information-seeking behavior
* sequential reasoning

---

## 🏁 Submission Notes

* ✔ Fully OpenEnv compliant
* βœ” Deterministic graders
* βœ” Reproducible via Docker
* βœ” HF Space endpoint available