Spaces:
Sleeping
Sleeping
Commit ·
1b64cba
1
Parent(s): 0fca933
initial deployment
Browse filesThis view is limited to 50 files because it contains too many changes. See raw diff
- .gitignore +1 -0
- Dockerfile +18 -0
- README.md +231 -1
- __pycache__/normaltest.cpython-313.pyc +0 -0
- app/__init__.py +0 -0
- app/__pycache__/__init__.cpython-313.pyc +0 -0
- app/__pycache__/actions.cpython-313.pyc +0 -0
- app/__pycache__/env.cpython-313.pyc +0 -0
- app/__pycache__/observation.cpython-313.pyc +0 -0
- app/__pycache__/reward.cpython-313.pyc +0 -0
- app/__pycache__/state.cpython-313.pyc +0 -0
- app/__pycache__/transition.cpython-313.pyc +0 -0
- app/actions.py +18 -0
- app/env.py +67 -0
- app/observation.py +11 -0
- app/reward.py +29 -0
- app/state.py +55 -0
- app/transition.py +77 -0
- app/utils.py +0 -0
- app_server.py +22 -0
- baseline/__init__.py +0 -0
- baseline/__pycache__/__init__.cpython-313.pyc +0 -0
- baseline/__pycache__/policy.cpython-313.pyc +0 -0
- baseline/__pycache__/run_baseline.cpython-313.pyc +0 -0
- baseline/policy.py +30 -0
- baseline/run_baseline.py +54 -0
- graders/__init__.py +0 -0
- graders/__pycache__/__init__.cpython-313.pyc +0 -0
- graders/__pycache__/base.cpython-313.pyc +0 -0
- graders/__pycache__/easy_grader.cpython-313.pyc +0 -0
- graders/__pycache__/hard_grader.cpython-313.pyc +0 -0
- graders/__pycache__/medium_grader.cpython-313.pyc +0 -0
- graders/base.py +3 -0
- graders/easy_grader.py +17 -0
- graders/hard_grader.py +26 -0
- graders/medium_grader.py +22 -0
- inference.py +114 -0
- normaltest.py +21 -0
- openenv.yaml +53 -0
- requirements.txt +8 -0
- scripts/__init__.py +0 -0
- scripts/__pycache__/__init__.cpython-313.pyc +0 -0
- scripts/__pycache__/validate_env.cpython-313.pyc +0 -0
- scripts/run_all_tasks.py +0 -0
- scripts/validate_env.py +35 -0
- tasks/__init__.py +0 -0
- tasks/__pycache__/__init__.cpython-313.pyc +0 -0
- tasks/__pycache__/easy.cpython-313.pyc +0 -0
- tasks/__pycache__/hard.cpython-313.pyc +0 -0
- tasks/__pycache__/medium.cpython-313.pyc +0 -0
.gitignore
ADDED
|
@@ -0,0 +1 @@
|
|
|
|
|
|
|
| 1 |
+
.env
|
Dockerfile
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
FROM python:3.10-slim

WORKDIR /app

# Copy the full project into the image.
COPY . .

# Install pinned dependencies plus pyyaml in a single layer to keep the
# image small; --no-cache-dir (previously missing for pyyaml) avoids
# shipping pip's download cache in the layer.
RUN pip install --no-cache-dir -r requirements.txt pyyaml

# Make project-root imports (app/, tasks/, graders/, ...) resolvable.
ENV PYTHONPATH=/app

# Document the port the server binds; HF Spaces expects 7860.
EXPOSE 7860

# Default command: serve the FastAPI app.
CMD ["uvicorn", "app_server:app", "--host", "0.0.0.0", "--port", "7860"]
|
README.md
CHANGED
|
@@ -7,5 +7,235 @@ sdk: docker
|
|
| 7 |
pinned: false
|
| 8 |
license: mit
|
| 9 |
---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 10 |
|
| 11 |
-
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
|
|
|
| 7 |
pinned: false
|
| 8 |
license: mit
|
| 9 |
---
|
| 10 |
+
# 🧠 OpenEnv Workflow Agent — Decision-Making Under Uncertainty
|
| 11 |
+
|
| 12 |
+
## 🚀 Overview
|
| 13 |
+
|
| 14 |
+
We present a **real-world OpenEnv environment** that simulates workflow management tasks such as email triage, scheduling, and task handling under **partial observability**.
|
| 15 |
+
|
| 16 |
+
Unlike typical environments, this benchmark focuses on a critical but underexplored capability:
|
| 17 |
+
|
| 18 |
+
> 🔥 **Cost-aware information gathering in sequential decision-making**
|
| 19 |
+
|
| 20 |
+
Agents must decide:
|
| 21 |
+
- When to act immediately
|
| 22 |
+
- When to request additional information
|
| 23 |
+
- Whether the cost of uncertainty reduction is justified
|
| 24 |
+
|
| 25 |
+
---
|
| 26 |
+
|
| 27 |
+
## 🎯 Why This Matters
|
| 28 |
+
|
| 29 |
+
Modern AI agents (LLMs, assistants, copilots) operate in **uncertain environments**:
|
| 30 |
+
- Emails are ambiguous
|
| 31 |
+
- User intent is hidden
|
| 32 |
+
- Context is incomplete
|
| 33 |
+
|
| 34 |
+
Our environment models this realistically by enforcing:
|
| 35 |
+
|
| 36 |
+
- ❗ Incorrect actions under uncertainty → penalized
|
| 37 |
+
- ❗ Information gathering → beneficial but costly
|
| 38 |
+
- ❗ Multi-step reasoning required for optimal decisions
|
| 39 |
+
|
| 40 |
+
---
|
| 41 |
+
|
| 42 |
+
## 🧠 Core Idea
|
| 43 |
+
|
| 44 |
+
We introduce a **POMDP-style workflow environment** where:
|
| 45 |
+
|
| 46 |
+
- The true state is partially hidden
|
| 47 |
+
- Agents must **actively reduce uncertainty**
|
| 48 |
+
- Information acquisition has a **non-zero cost**
|
| 49 |
+
|
| 50 |
+
### Key Property:
|
| 51 |
+
|
| 52 |
+
> An optimal agent follows:
|
| 53 |
+
>
|
| 54 |
+
> **“Request information only when expected benefit exceeds cost.”**
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
## ⚙️ Environment Design
|
| 59 |
+
|
| 60 |
+
### 🔹 State
|
| 61 |
+
|
| 62 |
+
- Emails (observed)
|
| 63 |
+
- Tasks & calendar (observed)
|
| 64 |
+
- Hidden attributes:
|
| 65 |
+
- true intent
|
| 66 |
+
- urgency
|
| 67 |
+
- missing information
|
| 68 |
+
|
| 69 |
+
---
|
| 70 |
+
|
| 71 |
+
### 🔹 Actions
|
| 72 |
+
|
| 73 |
+
- `classify`
|
| 74 |
+
- `reply`
|
| 75 |
+
- `schedule`
|
| 76 |
+
- `request_info`
|
| 77 |
+
- `archive`
|
| 78 |
+
- `prioritize`
|
| 79 |
+
|
| 80 |
+
---
|
| 81 |
+
|
| 82 |
+
### 🔹 Reward Function
|
| 83 |
+
|
| 84 |
+
\[
|
| 85 |
+
r_t = r_{correct} + r_{progress} - r_{cost} - r_{penalty}
|
| 86 |
+
\]
|
| 87 |
+
|
| 88 |
+
- Correct action → +0.3
|
| 89 |
+
- Task progress → +0.2
|
| 90 |
+
- Step penalty → −0.01
|
| 91 |
+
- Information request cost → −0.05
|
| 92 |
+
- Incorrect action → −0.2
|
| 93 |
+
|
| 94 |
+
---
|
| 95 |
+
|
| 96 |
+
## 🧪 Tasks
|
| 97 |
+
|
| 98 |
+
### 🟢 Easy
|
| 99 |
+
- Clear intent
|
| 100 |
+
- Single-step decision
|
| 101 |
+
|
| 102 |
+
### 🟡 Medium
|
| 103 |
+
- Multi-step workflow
|
| 104 |
+
- Requires sequencing
|
| 105 |
+
|
| 106 |
+
### 🔴 Hard
|
| 107 |
+
- Ambiguous input
|
| 108 |
+
- Requires **information gathering before acting**
|
| 109 |
+
|
| 110 |
+
---
|
| 111 |
+
|
| 112 |
+
## 📊 Baseline Results
|
| 113 |
+
|
| 114 |
+
```
|
| 115 |
+
|
| 116 |
+
easy: 1.00
|
| 117 |
+
medium: 0.50
|
| 118 |
+
hard: 0.13
|
| 119 |
+
|
| 120 |
+
```
|
| 121 |
+
|
| 122 |
+
### 🔍 Interpretation
|
| 123 |
+
|
| 124 |
+
- Baseline performs well on simple tasks
|
| 125 |
+
- Fails on ambiguous scenarios
|
| 126 |
+
- Demonstrates need for **information-aware policies**
|
| 127 |
+
|
| 128 |
+
---
|
| 129 |
+
|
| 130 |
+
## 🔥 Key Insight
|
| 131 |
+
|
| 132 |
+
Standard agents fail because they **act too early under uncertainty**.
|
| 133 |
+
|
| 134 |
+
Agents that act immediately under uncertainty fail.
|
| 135 |
+
Agents that strategically gather information succeed.
|
| 136 |
+
|
| 137 |
+
This environment makes that tradeoff explicit and measurable.
|
| 138 |
+
|
| 139 |
+
Our environment exposes this failure mode clearly.
|
| 140 |
+
|
| 141 |
+
---
|
| 142 |
+
|
| 143 |
+
## 🧩 Novel Contribution
|
| 144 |
+
|
| 145 |
+
We introduce:
|
| 146 |
+
|
| 147 |
+
### ✅ Cost-sensitive information gathering
|
| 148 |
+
- Asking questions is beneficial but not free
|
| 149 |
+
|
| 150 |
+
### ✅ Enforced uncertainty
|
| 151 |
+
- Actions without information are penalized
|
| 152 |
+
|
| 153 |
+
### ✅ Sequential dependency
|
| 154 |
+
- Early decisions affect future rewards
|
| 155 |
+
|
| 156 |
+
---
|
| 157 |
+
|
| 158 |
+
## 🧪 Validation
|
| 159 |
+
|
| 160 |
+
We verify:
|
| 161 |
+
|
| 162 |
+
- ✔ Classification fails under missing information
|
| 163 |
+
- ✔ Requesting info enables correct decisions
|
| 164 |
+
- ✔ Tradeoff emerges between cost and accuracy
|
| 165 |
+
|
| 166 |
+
---
|
| 167 |
+
|
| 168 |
+
## 📦 Project Structure
|
| 169 |
+
|
| 170 |
+
```
|
| 171 |
+
|
| 172 |
+
app/
|
| 173 |
+
tasks/
|
| 174 |
+
graders/
|
| 175 |
+
baseline/
|
| 176 |
+
scripts/
|
| 177 |
+
openenv.yaml
|
| 178 |
+
Dockerfile
|
| 179 |
+
inference.py
|
| 180 |
+
|
| 181 |
+
```
|
| 182 |
+
|
| 183 |
+
---
|
| 184 |
+
|
| 185 |
+
## ▶️ Run Locally
|
| 186 |
+
|
| 187 |
+
You can pull the pre-built Docker image directly from Docker Hub and run it:
|
| 188 |
+
|
| 189 |
+
```bash
|
| 190 |
+
docker pull imsachin010/openenv-workflow-agent:latest
|
| 191 |
+
docker run -d -p 7860:7860 --name openenv-agent imsachin010/openenv-workflow-agent:latest
|
| 192 |
+
```
|
| 193 |
+
|
| 194 |
+
Test endpoint:
|
| 195 |
+
|
| 196 |
+
```bash
|
| 197 |
+
curl -X POST http://localhost:7860/reset
|
| 198 |
+
```
|
| 199 |
+
|
| 200 |
+
---
|
| 201 |
+
|
| 202 |
+
## 🤖 Inference
|
| 203 |
+
|
| 204 |
+
Run the inference script inside the environment:
|
| 205 |
+
|
| 206 |
+
```bash
|
| 207 |
+
python -m inference
|
| 208 |
+
```
|
| 209 |
+
|
| 210 |
+
Outputs:
|
| 211 |
+
|
| 212 |
+
```
|
| 213 |
+
[START]
|
| 214 |
+
[STEP]
|
| 215 |
+
[END]
|
| 216 |
+
```
|
| 217 |
+
|
| 218 |
+
---
|
| 219 |
+
|
| 220 |
+
## 🧠 Conclusion
|
| 221 |
+
|
| 222 |
+
This environment highlights a key gap in current agents:
|
| 223 |
+
|
| 224 |
+
> ❗ They do not reason about **when to gather information**
|
| 225 |
+
|
| 226 |
+
We provide a benchmark to evaluate and improve:
|
| 227 |
+
|
| 228 |
+
* decision-making under uncertainty
|
| 229 |
+
* information-seeking behavior
|
| 230 |
+
* sequential reasoning
|
| 231 |
+
|
| 232 |
+
---
|
| 233 |
+
|
| 234 |
+
## 🏁 Submission Notes
|
| 235 |
+
|
| 236 |
+
* ✔ Fully OpenEnv compliant
|
| 237 |
+
* ✔ Deterministic graders
|
| 238 |
+
* ✔ Reproducible via Docker
|
| 239 |
+
* ✔ HF Space endpoint available
|
| 240 |
+
|
| 241 |
|
|
|
__pycache__/normaltest.cpython-313.pyc
ADDED
|
Binary file (431 Bytes). View file
|
|
|
app/__init__.py
ADDED
|
File without changes
|
app/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (176 Bytes). View file
|
|
|
app/__pycache__/actions.cpython-313.pyc
ADDED
|
Binary file (764 Bytes). View file
|
|
|
app/__pycache__/env.cpython-313.pyc
ADDED
|
Binary file (3.16 kB). View file
|
|
|
app/__pycache__/observation.cpython-313.pyc
ADDED
|
Binary file (806 Bytes). View file
|
|
|
app/__pycache__/reward.cpython-313.pyc
ADDED
|
Binary file (1.13 kB). View file
|
|
|
app/__pycache__/state.cpython-313.pyc
ADDED
|
Binary file (2.51 kB). View file
|
|
|
app/__pycache__/transition.cpython-313.pyc
ADDED
|
Binary file (2.16 kB). View file
|
|
|
app/actions.py
ADDED
|
@@ -0,0 +1,18 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from pydantic import BaseModel
|
| 2 |
+
from typing import Optional, Literal, Dict
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
# Closed set of actions an agent may take against an email or task.
ActionType = Literal[
    "classify",
    "reply",
    "schedule",
    "prioritize",
    "request_info",
    "archive"
]


class Action(BaseModel):
    """A single agent action directed at one email or task."""

    type: ActionType
    target_id: str  # email/task id
    payload: Optional[Dict] = None  # flexible for different actions
|
app/env.py
ADDED
|
@@ -0,0 +1,67 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from typing import Tuple, Dict, Any
|
| 2 |
+
from copy import deepcopy
|
| 3 |
+
|
| 4 |
+
from .state import EnvironmentState
|
| 5 |
+
from .observation import Observation
|
| 6 |
+
from .actions import Action
|
| 7 |
+
from .transition import apply_action
|
| 8 |
+
from .reward import compute_reward
|
| 9 |
+
|
| 10 |
+
|
| 11 |
+
class WorkflowEnv:
    """Episodic wrapper around an ``EnvironmentState``.

    Keeps a pristine deep copy of the initial state so :meth:`reset`
    restores the exact starting conditions of the task.
    """

    def __init__(self, initial_state: EnvironmentState, max_steps: int = 10):
        """Create the environment.

        Args:
            initial_state: Starting state (deep-copied; caller's object
                is never mutated).
            max_steps: Episode length limit; was previously hard-coded
                to 10, so the default preserves existing behavior.
        """
        self.initial_state = deepcopy(initial_state)
        self._state = deepcopy(initial_state)
        self.max_steps = max_steps

    # -----------------------------
    # RESET
    # -----------------------------
    def reset(self) -> Observation:
        """Restore the initial state and return the first observation."""
        self._state = deepcopy(self.initial_state)
        return self._get_observation()

    # -----------------------------
    # STEP
    # -----------------------------
    def step(self, action: Action) -> Tuple[Observation, float, bool, Dict[str, Any]]:
        """Apply *action* and return ``(observation, reward, done, info)``.

        Raises:
            RuntimeError: if called after the episode has finished
                (more specific than the bare ``Exception`` raised before;
                still caught by any ``except Exception`` caller).
        """
        if self._state.done:
            raise RuntimeError("Episode already finished. Call reset().")

        # Log the action first so graders can replay the full trajectory.
        self._state.history.append({
            "timestep": self._state.timestep,
            "action": action.model_dump()
        })

        # Apply the hidden-state-aware transition.
        self._state, info = apply_action(self._state, action)

        # Reward depends on the outcome flags produced by the transition.
        reward = compute_reward(self._state, action.type, info)

        # Increment timestep
        self._state.timestep += 1

        # Terminate once the step budget is exhausted.
        if self._state.timestep >= self.max_steps:
            self._state.done = True

        return self._get_observation(), reward, self._state.done, {}

    # -----------------------------
    # STATE ACCESS
    # -----------------------------
    def state(self) -> EnvironmentState:
        """Expose the full state, including hidden components (for grading)."""
        return self._state

    # -----------------------------
    # OBSERVATION
    # -----------------------------
    def _get_observation(self) -> Observation:
        # Only the observable slice; hidden_email_states stays private.
        return Observation(
            emails=self._state.emails,
            tasks=self._state.tasks,
            calendar=self._state.calendar,
            history=self._state.history,
            timestep=self._state.timestep
        )
|
app/observation.py
ADDED
|
@@ -0,0 +1,11 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from pydantic import BaseModel
|
| 2 |
+
from typing import List, Dict
|
| 3 |
+
from .state import Email, Task, CalendarEvent
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
class Observation(BaseModel):
    """The agent-visible slice of the environment state.

    Excludes ``EnvironmentState.hidden_email_states`` (true intent,
    urgency, deadlines), enforcing partial observability.
    """

    emails: List[Email]
    tasks: List[Task]
    calendar: List[CalendarEvent]
    history: List[Dict]  # past actions as logged by WorkflowEnv.step
    timestep: int  # current environment timestep
|
app/reward.py
ADDED
|
@@ -0,0 +1,29 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from app.state import EnvironmentState
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
def compute_reward(state: EnvironmentState, action_type: str, info: dict) -> float:
    """Compute the scalar reward for the action just applied.

    Combines a correctness bonus, an information-request cost, a
    progress bonus, a flat per-step penalty, and a penalty for every
    email whose hidden deadline has passed.
    """
    reward = 0.0

    # --- Correctness ---
    # NOTE(review): README advertises +0.3 for a correct action but the
    # code awards +0.2 — confirm which value is intended.
    if info.get("correct_action"):
        reward += 0.2

    # Cost for asking info (tradeoff)
    # NOTE: because of the elif, an *incorrect* request_info pays only
    # the -0.05 query cost and never the -0.2 wrong-action penalty.
    if action_type == "request_info":
        reward -= 0.05  # cost for querying
    elif info.get("incorrect_action"):
        reward -= 0.2

    # --- Progress ---
    if info.get("task_progress"):
        reward += 0.2

    # --- Step penalty (efficiency)
    reward -= 0.01

    # --- Deadline penalty ---
    # Charged again on every step while an email remains overdue.
    for hidden in state.hidden_email_states:
        if hidden.deadline and state.timestep > hidden.deadline:
            reward -= 0.5

    return reward
|
app/state.py
ADDED
|
@@ -0,0 +1,55 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from pydantic import BaseModel, Field
|
| 2 |
+
from typing import List, Optional, Dict
|
| 3 |
+
from enum import Enum
|
| 4 |
+
|
| 5 |
+
|
| 6 |
+
class EmailPriority(str, Enum):
    """Urgency levels used in an email's hidden ground-truth state."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


class Email(BaseModel):
    """The agent-visible portion of an email."""
    id: str
    sender: str
    subject: str
    body: str


class HiddenEmailState(BaseModel):
    """Ground truth about an email that the agent cannot observe."""
    email_id: str
    true_intent: str  # e.g., "meeting_request", "spam", "task"
    urgency: EmailPriority
    requires_response: bool
    deadline: Optional[int]  # timestep deadline
    missing_information: bool  # does agent need to ask clarification?


class Task(BaseModel):
    """A to-do item visible to the agent."""
    id: str
    description: str
    completed: bool = False
    # NOTE(review): Optional but no default, so pydantic treats this as
    # a required field — confirm whether `= None` was intended.
    deadline: Optional[int]


class CalendarEvent(BaseModel):
    """A scheduled event on the agent's calendar."""
    id: str
    title: str
    time: int  # presumably the timestep of the event — TODO confirm


class EnvironmentState(BaseModel):
    """Full environment state: observable parts plus hidden ground truth."""
    # Observed components
    emails: List[Email]
    tasks: List[Task]
    calendar: List[CalendarEvent]
    history: List[Dict] = Field(default_factory=list)

    # Hidden components (NOT exposed to agent)
    hidden_email_states: List[HiddenEmailState]

    # Global timestep
    timestep: int = 0

    # Episode termination
    done: bool = False
|
app/transition.py
ADDED
|
@@ -0,0 +1,77 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from app.state import EnvironmentState
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
def apply_action(state: EnvironmentState, action):
    """Apply *action* to *state* in place and classify its outcome.

    Returns ``(state, info)`` where ``info`` flags whether the action
    was correct, incorrect, and/or made task progress; the reward
    function consumes these flags.

    (A large commented-out duplicate of the classify branch was removed
    as dead code; the live logic is unchanged.)
    """
    info = {
        "correct_action": False,
        "incorrect_action": False,
        "task_progress": False
    }

    # Ground truth for the targeted email, if any.
    hidden = next(
        (h for h in state.hidden_email_states if h.email_id == action.target_id),
        None
    )

    # ----------------------------
    # CLASSIFY
    # ----------------------------
    if action.type == "classify":
        predicted = action.payload.get("label") if action.payload else None

        if not hidden:
            # Unknown target id: cannot be a correct classification.
            info["incorrect_action"] = True

        elif hidden.missing_information:
            # Guessing under uncertainty is always penalized; the agent
            # must request_info first.
            info["incorrect_action"] = True

        else:
            # Classification is allowed once uncertainty is resolved.
            if predicted == hidden.true_intent:
                info["correct_action"] = True
                info["task_progress"] = True
            else:
                info["incorrect_action"] = True

    # ----------------------------
    # ARCHIVE
    # ----------------------------
    elif action.type == "archive":
        # Archiving always counts as progress, even for an unknown id.
        state.emails = [e for e in state.emails if e.id != action.target_id]
        info["task_progress"] = True

    # ----------------------------
    # REQUEST INFO
    # ----------------------------
    elif action.type == "request_info":
        if hidden and hidden.missing_information:
            # Resolves the uncertainty so a later classify can succeed.
            hidden.missing_information = False
            info["correct_action"] = True
        else:
            info["incorrect_action"] = True

    # ----------------------------
    # REPLY
    # ----------------------------
    elif action.type == "reply":
        if hidden and hidden.requires_response:
            hidden.requires_response = False
            info["correct_action"] = True
        else:
            info["incorrect_action"] = True

    # NOTE(review): "schedule" and "prioritize" are valid ActionTypes but
    # fall through here with no state change and neutral flags — confirm
    # that this no-op behavior is intended.
    return state, info
|
app/utils.py
ADDED
|
File without changes
|
app_server.py
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from fastapi import FastAPI
from app.env import WorkflowEnv
from tasks.easy import create_easy_task

app = FastAPI()


@app.post("/reset")
def reset():
    """Build a fresh easy-task environment and reset it.

    The environment is created per-request and discarded; only a status
    flag is returned (matching the original endpoint contract).
    """
    state, _ = create_easy_task()
    env = WorkflowEnv(state)
    env.reset()

    return {"status": "ok"}


@app.get("/")
def root():
    """Health-check endpoint.

    BUG FIX: a second, duplicate ``@app.get("/")`` handler (``home``)
    was registered below; FastAPI matches the first registered route,
    so the duplicate was dead code and has been removed.
    """
    return {"message": "Workflow Env is running"}
|
baseline/__init__.py
ADDED
|
File without changes
|
baseline/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (181 Bytes). View file
|
|
|
baseline/__pycache__/policy.cpython-313.pyc
ADDED
|
Binary file (1.25 kB). View file
|
|
|
baseline/__pycache__/run_baseline.cpython-313.pyc
ADDED
|
Binary file (2.19 kB). View file
|
|
|
baseline/policy.py
ADDED
|
@@ -0,0 +1,30 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from app.actions import Action
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
class BaselinePolicy:
    """Keyword-heuristic policy that always handles the first inbox email."""

    def act(self, observation):
        """Return an Action for the first email, or None if the inbox is empty."""
        if not observation.emails:
            return None

        first = observation.emails[0]
        haystack = " ".join((first.subject, first.body)).lower()

        # Keyword heuristics decide the classification label.
        if "meet" in haystack:
            label = "meeting_request"
        elif "report" in haystack or "update" in haystack:
            label = "task_request"
        else:
            # No keyword matched: fall back to archiving the email.
            return Action(type="archive", target_id=first.id)

        return Action(
            type="classify",
            target_id=first.id,
            payload={"label": label},
        )
|
baseline/run_baseline.py
ADDED
|
@@ -0,0 +1,54 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from tasks.easy import create_easy_task
|
| 2 |
+
from tasks.medium import create_medium_task
|
| 3 |
+
from tasks.hard import create_hard_task
|
| 4 |
+
|
| 5 |
+
from graders.easy_grader import EasyGrader
|
| 6 |
+
from graders.medium_grader import MediumGrader
|
| 7 |
+
from graders.hard_grader import HardGrader
|
| 8 |
+
|
| 9 |
+
from app.env import WorkflowEnv
|
| 10 |
+
from baseline.policy import BaselinePolicy
|
| 11 |
+
|
| 12 |
+
|
| 13 |
+
def run_task(task_name, create_task_fn, grader_cls):
    """Run the baseline policy on one task and return its graded score."""
    initial_state, ground_truth = create_task_fn()
    env = WorkflowEnv(initial_state)
    policy = BaselinePolicy()

    observation = env.reset()

    finished = False
    step_count = 0

    # Roll out at most 10 steps, stopping early if the policy abstains.
    while not finished and step_count < 10:
        chosen = policy.act(observation)

        if chosen is None:
            break

        observation, _reward, finished, _ = env.step(chosen)
        step_count += 1

    trajectory = env.state().history
    print(f"{task_name} trajectory:", trajectory)

    # Grade the logged action sequence against the task's ground truth.
    return grader_cls().grade(trajectory, ground_truth)
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
def main():
    """Run the baseline policy on all three tasks and print their scores."""
    task_specs = [
        ("easy", create_easy_task, EasyGrader),
        ("medium", create_medium_task, MediumGrader),
        ("hard", create_hard_task, HardGrader),
    ]

    results = {
        name: run_task(name, factory, grader)
        for name, factory, grader in task_specs
    }

    print("\n===== BASELINE RESULTS =====")
    for task_name, score in results.items():
        print(f"{task_name}: {round(score, 3)}")


if __name__ == "__main__":
    main()
|
graders/__init__.py
ADDED
|
File without changes
|
graders/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (180 Bytes). View file
|
|
|
graders/__pycache__/base.cpython-313.pyc
ADDED
|
Binary file (609 Bytes). View file
|
|
|
graders/__pycache__/easy_grader.cpython-313.pyc
ADDED
|
Binary file (948 Bytes). View file
|
|
|
graders/__pycache__/hard_grader.cpython-313.pyc
ADDED
|
Binary file (1.2 kB). View file
|
|
|
graders/__pycache__/medium_grader.cpython-313.pyc
ADDED
|
Binary file (1.07 kB). View file
|
|
|
graders/base.py
ADDED
|
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
class BaseGrader:
    """Interface for trajectory graders; subclasses implement grade()."""

    def grade(self, trajectory, ground_truth) -> float:
        """Score a logged action trajectory against ground truth (0.0–1.0)."""
        raise NotImplementedError
|
graders/easy_grader.py
ADDED
|
@@ -0,0 +1,17 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from graders.base import BaseGrader
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
class EasyGrader(BaseGrader):
    """Grades the easy task: did the first classify use the right label?"""

    def grade(self, trajectory, ground_truth) -> float:
        """Return 1.0 iff the first classify action matches the true label."""
        expected = ground_truth["label"]

        for entry in trajectory:
            act = entry["action"]
            if act["type"] != "classify":
                continue
            # Only the first classification counts; later attempts are ignored.
            predicted = act.get("payload", {}).get("label")
            return 1.0 if predicted == expected else 0.0

        # No classification was ever attempted.
        return 0.0
|
graders/hard_grader.py
ADDED
|
@@ -0,0 +1,26 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from graders.base import BaseGrader
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
class HardGrader(BaseGrader):
    """Grades the hard task against an expected action-type sequence."""

    def grade(self, trajectory, ground_truth) -> float:
        """Fraction of the sequence matched, minus 0.1 per mismatch, clamped to [0, 1]."""
        expected_sequence = ground_truth["sequence"]

        hits = 0
        misses = 0

        # Compare action types position-by-position; extra trajectory
        # steps beyond the expected length are ignored.
        for actual, wanted in zip(trajectory, expected_sequence):
            if actual["action"]["type"] == wanted["type"]:
                hits += 1
            else:
                misses += 1

        raw = hits / len(expected_sequence) - 0.1 * misses
        return max(0.0, min(1.0, raw))
|
graders/medium_grader.py
ADDED
|
@@ -0,0 +1,22 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
from graders.base import BaseGrader
|
| 2 |
+
|
| 3 |
+
|
| 4 |
+
class MediumGrader(BaseGrader):
    """Grades the medium task: fraction of expected action types matched."""

    def grade(self, trajectory, ground_truth) -> float:
        """Return the matched fraction of the expected sequence."""
        expected_sequence = ground_truth["sequence"]

        # Position-by-position comparison; extra trajectory steps ignored.
        matched = sum(
            1
            for actual, wanted in zip(trajectory, expected_sequence)
            if actual["action"]["type"] == wanted["type"]
        )

        # Zero matches (including an empty trajectory or empty expected
        # sequence) scores 0.0 without risking a division.
        if matched == 0:
            return 0.0
        return matched / len(expected_sequence)
|
inference.py
ADDED
|
@@ -0,0 +1,114 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import os
|
| 2 |
+
from dotenv import load_dotenv
|
| 3 |
+
from openai import OpenAI
|
| 4 |
+
|
| 5 |
+
load_dotenv()
|
| 6 |
+
|
| 7 |
+
from app.env import WorkflowEnv
|
| 8 |
+
from app.actions import Action
|
| 9 |
+
from tasks.hard import create_hard_task
|
| 10 |
+
from graders.hard_grader import HardGrader
|
| 11 |
+
# ---------------- ENV CONFIG ----------------
|
| 12 |
+
API_BASE_URL = os.getenv("API_BASE_URL", "https://router.huggingface.co/v1")
|
| 13 |
+
MODEL_NAME = os.getenv("MODEL_NAME", "Qwen/Qwen2.5-72B-Instruct")
|
| 14 |
+
HF_TOKEN = os.getenv("HF_TOKEN")
|
| 15 |
+
LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME")
|
| 16 |
+
|
| 17 |
+
client = OpenAI(base_url=API_BASE_URL, api_key=HF_TOKEN)
|
| 18 |
+
|
| 19 |
+
|
| 20 |
+
# ---------------- LOGGING ----------------
|
| 21 |
+
def log_start(task, env, model):
    """Emit the single [START] marker line expected by the harness."""
    message = f"[START] task={task} env={env} model={model}"
    # flush=True so the harness sees the line immediately, even when piped.
    print(message, flush=True)
|
| 23 |
+
|
| 24 |
+
|
| 25 |
+
def log_step(step, action, reward, done, error):
    """Emit one [STEP] marker line for the harness."""
    done_text = str(done).lower()
    error_text = error or "null"
    line = (
        f"[STEP] step={step} action={action} "
        f"reward={reward:.2f} done={done_text} error={error_text}"
    )
    print(line, flush=True)
|
| 30 |
+
|
| 31 |
+
|
| 32 |
+
def log_end(success, steps, score, rewards):
    """Emit the final [END] marker line with the per-step reward trace."""
    reward_trace = ",".join(f"{r:.2f}" for r in rewards)
    summary = (
        f"[END] success={str(success).lower()} steps={steps} "
        f"score={score:.2f} rewards={reward_trace}"
    )
    print(summary, flush=True)
|
| 38 |
+
|
| 39 |
+
|
| 40 |
+
# ---------------- SIMPLE POLICY ----------------
|
| 41 |
+
def get_action(obs):
    """Rule-based policy over the first inbox email.

    Returns None when the inbox is empty. Otherwise: if a request_info
    action already appears in the history, classify the email; if the text
    looks ambiguous, request more info; else archive it.
    """
    if not obs.emails:
        return None

    first = obs.emails[0]
    combined = " ".join([first.subject, first.body]).lower()

    # Have we already asked the sender for details earlier in this episode?
    info_requested = any(
        entry["action"]["type"] == "request_info" for entry in obs.history
    )

    if info_requested:
        # Second pass: details were requested once; commit to a label now
        # instead of asking again.
        return Action(
            type="classify",
            target_id=first.id,
            payload={"label": "meeting_request"},
        )

    if "sometime" in combined or "next week" in combined:
        # Vague scheduling language -> ask for specifics first.
        return Action(type="request_info", target_id=first.id)

    return Action(type="archive", target_id=first.id)
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
# ---------------- MAIN ----------------
|
| 71 |
+
def main():
    """Run the rule-based policy on the hard task, grade it, and log results.

    Emits [START], per-step [STEP], and a final [END] line (the latter even
    on failure, via the finally block).
    """
    state, gt = create_hard_task()
    env = WorkflowEnv(state)
    grader = HardGrader()

    obs = env.reset()

    rewards = []
    steps = 0
    # BUG FIX: initialize defaults so the finally-block call to log_end()
    # cannot raise NameError (masking the real exception) when env.step()
    # or the grader fails before `score`/`success` are assigned.
    success = False
    score = 0.0

    log_start("hard", "workflow-env", MODEL_NAME)

    try:
        done = False

        # Hard cap of 10 steps to guarantee termination.
        while not done and steps < 10:
            action = get_action(obs)
            if action is None:
                break

            obs, reward, done, _ = env.step(action)

            rewards.append(reward)
            steps += 1

            log_step(steps, action.type, reward, done, None)

            # Stop once we commit to a classification: for grading purposes
            # the episode is over even if the env has not flagged done.
            if action.type == "classify":
                break

        trajectory = env.state().history
        score = grader.grade(trajectory, gt)

        # Clamp in case a grader returns a value outside [0, 1].
        score = max(0.0, min(1.0, score))

        success = score > 0.3

    finally:
        log_end(success, steps, score, rewards)
|
| 111 |
+
|
| 112 |
+
|
| 113 |
+
# Script entry point.
if __name__ == "__main__":
    main()
|
normaltest.py
ADDED
|
@@ -0,0 +1,21 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
"""Manual smoke test: reset the easy-task environment, submit one classify
action, and print the observation and reward at each point."""
from tasks.easy import create_easy_task
from app.env import WorkflowEnv
from app.actions import Action

state, gt = create_easy_task()
env = WorkflowEnv(state)

obs = env.reset()
print("Initial:", obs)

# Try correct classify
# NOTE(review): assumes the easy task contains an email with id "1" whose
# correct label is "meeting_request" — confirm against tasks/easy.py.
action = Action(
    type="classify",
    target_id="1",
    payload={"label": "meeting_request"}
)

obs, reward, done, _ = env.step(action)

print("After step:", obs)
print("Reward:", reward)
|
openenv.yaml
ADDED
|
@@ -0,0 +1,53 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
name: workflow-agent-env
|
| 2 |
+
description: >
|
| 3 |
+
A real-world environment simulating email and workflow management under partial observability.
|
| 4 |
+
Agents must classify, respond, and manage tasks with incomplete information.
|
| 5 |
+
|
| 6 |
+
version: "1.0"
|
| 7 |
+
|
| 8 |
+
entry_point: app.env:WorkflowEnv
|
| 9 |
+
|
| 10 |
+
observation_space:
|
| 11 |
+
type: object
|
| 12 |
+
properties:
|
| 13 |
+
emails:
|
| 14 |
+
type: array
|
| 15 |
+
description: List of emails in inbox
|
| 16 |
+
tasks:
|
| 17 |
+
type: array
|
| 18 |
+
calendar:
|
| 19 |
+
type: array
|
| 20 |
+
history:
|
| 21 |
+
type: array
|
| 22 |
+
timestep:
|
| 23 |
+
type: integer
|
| 24 |
+
|
| 25 |
+
action_space:
|
| 26 |
+
type: object
|
| 27 |
+
properties:
|
| 28 |
+
type:
|
| 29 |
+
type: string
|
| 30 |
+
enum:
|
| 31 |
+
- classify
|
| 32 |
+
- reply
|
| 33 |
+
- schedule
|
| 34 |
+
- prioritize
|
| 35 |
+
- request_info
|
| 36 |
+
- archive
|
| 37 |
+
target_id:
|
| 38 |
+
type: string
|
| 39 |
+
payload:
|
| 40 |
+
type: object
|
| 41 |
+
|
| 42 |
+
tasks:
|
| 43 |
+
- name: easy
|
| 44 |
+
generator: tasks.easy:create_easy_task
|
| 45 |
+
grader: graders.easy_grader:EasyGrader
|
| 46 |
+
|
| 47 |
+
- name: medium
|
| 48 |
+
generator: tasks.medium:create_medium_task
|
| 49 |
+
grader: graders.medium_grader:MediumGrader
|
| 50 |
+
|
| 51 |
+
- name: hard
|
| 52 |
+
generator: tasks.hard:create_hard_task
|
| 53 |
+
grader: graders.hard_grader:HardGrader
|
requirements.txt
ADDED
|
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
pydantic==2.7.1
|
| 2 |
+
typing-extensions
|
| 3 |
+
python-dotenv
|
| 4 |
+
pytest
|
| 5 |
+
pyyaml
|
| 6 |
+
fastapi
|
| 7 |
+
uvicorn
|
| 8 |
+
openai
|
scripts/__init__.py
ADDED
|
File without changes
|
scripts/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (180 Bytes). View file
|
|
|
scripts/__pycache__/validate_env.cpython-313.pyc
ADDED
|
Binary file (1.59 kB). View file
|
|
|
scripts/run_all_tasks.py
ADDED
|
File without changes
|
scripts/validate_env.py
ADDED
|
@@ -0,0 +1,35 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import importlib
|
| 2 |
+
import yaml
|
| 3 |
+
|
| 4 |
+
|
| 5 |
+
def validate_yaml():
    """Validate openenv.yaml: parse it, then import every referenced symbol.

    Checks the `entry_point` and every task's `generator`/`grader` (all in
    "module.path:attr" form). Raises on the first problem — missing file,
    invalid YAML, unimportable module, or missing attribute — and prints a
    checkmark per passing stage.
    """
    # BUG FIX: explicit encoding — the file is UTF-8 and must not depend on
    # the platform's default locale encoding.
    with open("openenv.yaml", "r", encoding="utf-8") as f:
        config = yaml.safe_load(f)

    print("✔ YAML loaded")

    # Check entry point: resolve "module:ClassName".
    module_name, class_name = config["entry_point"].split(":")
    module = importlib.import_module(module_name)
    getattr(module, class_name)

    print("✔ Entry point valid")

    # Check tasks: each declares a generator function and a grader class.
    for task in config["tasks"]:
        gen_module, gen_fn = task["generator"].split(":")
        grader_module, grader_cls = task["grader"].split(":")

        gen_mod = importlib.import_module(gen_module)
        getattr(gen_mod, gen_fn)

        grader_mod = importlib.import_module(grader_module)
        getattr(grader_mod, grader_cls)

        print(f"✔ Task validated: {task['name']}")

    print("\n✅ All validations passed!")
|
| 32 |
+
|
| 33 |
+
|
| 34 |
+
# Allow running the validator directly: `python scripts/validate_env.py`.
if __name__ == "__main__":
    validate_yaml()
|
tasks/__init__.py
ADDED
|
File without changes
|
tasks/__pycache__/__init__.cpython-313.pyc
ADDED
|
Binary file (178 Bytes). View file
|
|
|
tasks/__pycache__/easy.cpython-313.pyc
ADDED
|
Binary file (928 Bytes). View file
|
|
|
tasks/__pycache__/hard.cpython-313.pyc
ADDED
|
Binary file (996 Bytes). View file
|
|
|
tasks/__pycache__/medium.cpython-313.pyc
ADDED
|
Binary file (992 Bytes). View file
|
|
|