title: Openenv Workflow Agent
emoji: 📈
colorFrom: green
colorTo: green
sdk: docker
pinned: false
license: mit

🧠 OpenEnv Workflow Agent: Decision-Making Under Uncertainty

🚀 Overview

We present a real-world OpenEnv environment that simulates workflow management tasks such as email triage, scheduling, and task handling under partial observability.

Unlike typical environments, this benchmark focuses on a critical but underexplored capability:

🔥 Cost-aware information gathering in sequential decision-making

Agents must decide:

  • When to act immediately
  • When to request additional information
  • Whether the cost of uncertainty reduction is justified

🎯 Why This Matters

Modern AI agents (LLMs, assistants, copilots) operate in uncertain environments:

  • Emails are ambiguous
  • User intent is hidden
  • Context is incomplete

Our environment models this realistically by enforcing:

  • ❗ Incorrect actions under uncertainty → penalized
  • ❗ Information gathering → beneficial but costly
  • ❗ Multi-step reasoning required for optimal decisions

🧠 Core Idea

We introduce a POMDP-style workflow environment where:

  • The true state is partially hidden
  • Agents must actively reduce uncertainty
  • Information acquisition has a non-zero cost

Key Property:

An optimal agent follows:

“Request information only when expected benefit exceeds cost.”
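
The rule above can be sketched as a one-step expected-value comparison. This is a minimal illustration, not the environment's actual policy or API: the function names, the parameters, and the premise that an agent can estimate its probability of acting correctly before and after asking are all assumptions. The default reward values mirror those listed under Reward Function, and charging the step penalty on the request step is also an assumption.

```python
def expected_action_reward(p_correct, r_correct=0.3, r_incorrect=-0.2):
    """Expected reward of acting when the action is correct with probability p_correct."""
    return p_correct * r_correct + (1.0 - p_correct) * r_incorrect


def should_request_info(p_now, p_after, info_cost=0.05, step_penalty=0.01):
    """Return True when asking first has a higher expected reward than acting now.

    p_now:   estimated probability of acting correctly without more information
    p_after: estimated probability of acting correctly after one request_info step
    (Both estimates are hypothetical inputs; the environment does not provide them.)
    """
    act_now = expected_action_reward(p_now)
    # Asking first pays the request cost plus one extra step penalty.
    # The penalty for the acting step itself is the same in both branches and cancels.
    ask_first = expected_action_reward(p_after) - info_cost - step_penalty
    return ask_first > act_now


print(should_request_info(p_now=0.5, p_after=1.0))   # ambiguous input
print(should_request_info(p_now=0.95, p_after=1.0))  # nearly unambiguous input
```

Under these assumptions, asking pays off only when it raises the probability of a correct action by more than 0.12, since each point of probability is worth 0.3 + 0.2 = 0.5 and the extra step costs 0.06.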


⚙️ Environment Design

🔹 State

  • Emails (observed)
  • Tasks & calendar (observed)
  • Hidden attributes:
    • true intent
    • urgency
    • missing information

🔹 Actions

  • classify
  • reply
  • schedule
  • request_info
  • archive
  • prioritize
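
As a small sketch of how a client might represent this action set, the six names above map naturally onto an enumeration. The `Action` class and the `parse_action` helper are hypothetical names introduced for illustration; the environment's own action schema may carry extra fields (such as a target email or task) that are not shown here.

```python
from enum import Enum


class Action(Enum):
    """The six action types listed above; the class itself is illustrative."""
    CLASSIFY = "classify"
    REPLY = "reply"
    SCHEDULE = "schedule"
    REQUEST_INFO = "request_info"
    ARCHIVE = "archive"
    PRIORITIZE = "prioritize"


def parse_action(name: str) -> Action:
    """Map a raw string (for example, model output) to an Action.

    Raises ValueError for names outside the action set.
    """
    return Action(name.strip().lower())


print(parse_action(" Request_Info ").value)
```

Rejecting unknown names early keeps malformed model output from being sent to the environment as if it were a valid action.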

🔹 Reward Function

\[ r_t = r_{\text{correct}} + r_{\text{progress}} - r_{\text{cost}} - r_{\text{penalty}} \]

  • Correct action → +0.3
  • Task progress → +0.2
  • Step penalty → −0.01
  • Information request cost → −0.05
  • Incorrect action → −0.2
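
The per-step reward can be sketched in code as follows. The constants are copied from the list above, but the `StepOutcome` flags and the way terms are grouped are assumptions: in particular, this sketch folds both the step penalty and the information request cost into r_cost, which the source does not state explicitly.

```python
from dataclasses import dataclass

# Magnitudes copied from the reward list above.
R_CORRECT = 0.3
R_PROGRESS = 0.2
STEP_PENALTY = 0.01
INFO_COST = 0.05
R_INCORRECT = 0.2


@dataclass
class StepOutcome:
    """Hypothetical per-step flags; the real environment may track these differently."""
    correct: bool = False
    progressed: bool = False
    requested_info: bool = False
    incorrect: bool = False


def step_reward(outcome: StepOutcome) -> float:
    """Compute r_t = r_correct + r_progress - r_cost - r_penalty for one step."""
    r_correct = R_CORRECT if outcome.correct else 0.0
    r_progress = R_PROGRESS if outcome.progressed else 0.0
    # Assumption: r_cost bundles the per-step penalty and the request cost.
    r_cost = STEP_PENALTY + (INFO_COST if outcome.requested_info else 0.0)
    r_penalty = R_INCORRECT if outcome.incorrect else 0.0
    return r_correct + r_progress - r_cost - r_penalty


print(round(step_reward(StepOutcome(correct=True, progressed=True)), 2))
print(round(step_reward(StepOutcome(requested_info=True)), 2))
print(round(step_reward(StepOutcome(incorrect=True)), 2))
```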

🧪 Tasks

🟢 Easy

  • Clear intent
  • Single-step decision

🟡 Medium

  • Multi-step workflow
  • Requires sequencing

🔴 Hard

  • Ambiguous input
  • Requires information gathering before acting

📊 Baseline Results


easy:   1.00
medium: 0.50
hard:   0.13

🔍 Interpretation

  • The baseline scores 1.00 on easy tasks and 0.50 on medium tasks
  • Its score drops to 0.13 on hard tasks, where the input is ambiguous
  • The gap points to the need for information-aware policies

🔥 Key Insight

Standard agents fail because they act too early under uncertainty, while agents that strategically gather information succeed.

This environment makes that tradeoff explicit and measurable.


🧩 Novel Contribution

We introduce:

✅ Cost-sensitive information gathering

  • Asking questions is beneficial but not free

✅ Enforced uncertainty

  • Actions without information are penalized

✅ Sequential dependency

  • Early decisions affect future rewards

🧪 Validation

We verify:

  • ✔ Classification fails under missing information
  • ✔ Requesting info enables correct decisions
  • ✔ Tradeoff emerges between cost and accuracy

📦 Project Structure


app/
tasks/
graders/
baseline/
scripts/
openenv.yaml
Dockerfile
inference.py

▢️ Run Locally

You can pull the pre-built Docker image directly from Docker Hub and run it:

docker pull imsachin010/openenv-workflow-agent:latest
docker run -d -p 7860:7860 --name openenv-agent imsachin010/openenv-workflow-agent:latest

Test endpoint:

curl -X POST http://localhost:7860/reset
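
The same check can be made from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_reset_request` and `reset_environment` are hypothetical, and the response is assumed to be JSON (the code falls back to raw text if it is not). The URL and port come from the commands above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7860"  # port published by the docker run command


def build_reset_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /reset request equivalent to the curl command."""
    return urllib.request.Request(f"{base_url}/reset", data=b"", method="POST")


def reset_environment(base_url: str = BASE_URL, timeout: float = 10.0):
    """Send POST /reset and return the decoded response body.

    Assumption: the server replies with JSON; raw text is returned otherwise.
    """
    request = build_reset_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        return body
```

With the container running, `print(reset_environment())` shows whatever the endpoint returns; if the container is not up, `urlopen` raises `urllib.error.URLError`.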

🤖 Inference

Run the inference script inside the environment:

python -m inference

Outputs:

[START]
[STEP]
[END]
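
To sanity-check a run, the tagged output can be summarized with a short script. Only the three tags come from the source; whatever follows each tag on a line is not specified there, so this sketch only counts lines by their leading tag. The name `count_tags` and the sample text are made up for illustration.

```python
from collections import Counter

TAGS = ("[START]", "[STEP]", "[END]")


def count_tags(log_text: str) -> Counter:
    """Count how many output lines begin with each of the three tags."""
    counts = Counter()
    for line in log_text.splitlines():
        stripped = line.strip()
        for tag in TAGS:
            if stripped.startswith(tag):
                counts[tag] += 1
                break  # a line is attributed to at most one tag
    return counts


# Made-up sample: one episode with two steps.
sample = "[START]\n[STEP]\n[STEP]\n[END]"
counts = count_tags(sample)
print(counts["[START]"], counts["[STEP]"], counts["[END]"])
```

A run where the `[START]` and `[END]` counts differ suggests that an episode did not finish cleanly.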

🧠 Conclusion

This environment highlights a key gap in current agents:

❗ They do not reason about when to gather information

We provide a benchmark to evaluate and improve:

  • decision-making under uncertainty
  • information-seeking behavior
  • sequential reasoning

🏁 Submission Notes

  • ✔ Fully OpenEnv compliant
  • ✔ Deterministic graders
  • ✔ Reproducible via Docker
  • ✔ HF Space endpoint available