Spaces:

Freakdivi
/

helpdesk_env

Sleeping

App Files Files Community

Freakdivi commited on 21 days ago

Commit

026df2c

verified ·

1 Parent(s): 072bcad

Upload folder using huggingface_hub

Browse files

Files changed (30) hide show

Dockerfile +20 -0
README.md +313 -5
__init__.py +16 -0
client.py +97 -0
data/knowledge_base.json +62 -0
data/tickets/easy.json +62 -0
data/tickets/hard.json +92 -0
data/tickets/medium.json +72 -0
graders/__init__.py +1 -0
graders/category_grader.py +40 -0
graders/faq_grader.py +30 -0
graders/resolution_grader.py +30 -0
graders/score_utils.py +24 -0
helpdesk_env.egg-info/PKG-INFO +321 -0
helpdesk_env.egg-info/SOURCES.txt +24 -0
helpdesk_env.egg-info/dependency_links.txt +1 -0
helpdesk_env.egg-info/entry_points.txt +2 -0
helpdesk_env.egg-info/requires.txt +10 -0
helpdesk_env.egg-info/top_level.txt +1 -0
inference.py +329 -0
models.py +150 -0
openenv.yaml +113 -0
pyproject.toml +43 -0
pyrightconfig.json +7 -0
requirements.txt +5 -0
server/__init__.py +1 -0
server/app.py +88 -0
server/helpdesk_environment.py +360 -0
user_simulator.py +46 -0
uv.lock +0 -0

Dockerfile ADDED Viewed

	@@ -0,0 +1,20 @@

+FROM python:3.11-slim
+WORKDIR /app
+ENV PYTHONDONTWRITEBYTECODE=1
+ENV PYTHONUNBUFFERED=1
+ENV PYTHONPATH=/app
+COPY requirements.txt /app/requirements.txt
+RUN pip install --no-cache-dir -r /app/requirements.txt
+COPY . /app/helpdesk_env
+EXPOSE 8000
+HEALTHCHECK --interval=30s --timeout=5s --start-period=5s --retries=3 \
+    CMD python -c "import urllib.request; urllib.request.urlopen('http://127.0.0.1:8000/health')" || exit 1
+ENV ENABLE_WEB_INTERFACE=true
+CMD ["uvicorn", "helpdesk_env.server.app:app", "--host", "0.0.0.0", "--port", "8000"]

README.md CHANGED Viewed

@@ -1,10 +1,318 @@
 ---
-title: Helpdesk Env
-emoji: 🐠
-colorFrom: pink
-colorTo: green
 sdk: docker
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
+title: UPI Banking Support Environment
+emoji: 🏦
+colorFrom: blue
+colorTo: indigo
 sdk: docker
 pinned: false
+app_port: 8000
+tags:
+  - openenv
+  - banking
+  - upi
+  - customer-support
+base_path: /web
 ---
+# UPI Banking Support Environment
+OpenEnv-style environment for evaluating agents on UPI customer support workflows. The benchmark focuses on realistic banking support decisions rather than generic FAQ matching.
+## Motivation
+This environment is designed to test whether an agent can behave like a safe and useful support assistant for a UPI payments product such as Paytm, PhonePe, or Google Pay style support flows.
+The goal is not only to answer customers correctly, but also to:
+- identify the right issue type
+- retrieve the right knowledge entry
+- escalate fraud or overdue review cases when needed
+- avoid unsafe behavior such as asking for PINs or OTPs
+- handle multi-turn conversations before closing a case
+## Environment Description
+The environment uses three tasks with increasing difficulty:
+- `easy`: classify a customer issue into the correct support track
+- `medium`: choose the right FAQ or escalate when human/manual review is required
+- `hard`: run a short multi-turn support conversation with clarification, guidance, and closure
+The current support tracks are:
+- `payment_failure`
+- `refund_delay`
+- `fraud_complaint`
+- `kyc_account_restriction`
+- `upi_pin_or_bank_linking`
+The dataset includes:
+- 10 banking FAQ entries in [data/knowledge_base.json](data/knowledge_base.json)
+- 10 `easy` tickets in [data/tickets/easy.json](data/tickets/easy.json)
+- 10 `medium` tickets in [data/tickets/medium.json](data/tickets/medium.json)
+- 10 `hard` tickets in [data/tickets/hard.json](data/tickets/hard.json)
+## Action Space
+The public inference script and server accept the legacy action names below, which are internally mapped to the compact action model in [models.py](models.py).
+| Action | Parameters | Purpose |
+|---|---|---|
+| `classify` | `category` | Predict the correct support track for an `easy` ticket |
+| `lookup_faq` | `faq_id` | Choose the best FAQ entry for `medium` or `hard` |
+| `ask_clarification` | `message` | Ask a question to gather missing details in `hard` |
+| `reply` | `message` | Provide safe support guidance to the user |
+| `escalate` | `message` | Escalate a case that should not be fully handled automatically |
+| `resolve_ticket` | none | Close the case when it appears correctly resolved |
+Internally, these are normalized to:
+- `ask_for_details`
+- `take_action`
+- `respond_to_user`
+- `escalate_case`
+- `close_case`
+## Observation Space
+The model receives an `Observation` object from [models.py](models.py).
+| Field | Type | Description |
+|---|---|---|
+| `case_id` | `str` | Unique identifier for the active ticket |
+| `track` | `str` | Task split only: `easy`, `medium`, or `hard` |
+| `customer_message` | `str` | Current customer issue text shown to the agent |
+| `conversation_history` | `list[dict]` | Prior user/agent turns |
+| `known_facts` | `dict` | Agent-visible state such as FAQ set, available categories, and progress flags |
+| `required_slots` | `list[str]` | High-level missing information requirements for the episode |
+| `available_actions` | `list[str]` | Actions allowed by the environment |
+| `turn_number` | `int` | Current turn count |
+Important evaluation detail:
+- hidden gold labels such as the correct FAQ id and escalation label are not exposed to the model in the observation
+## Reward
+Rewards are normalized to the range `0.0` to `1.0` in [server/helpdesk_environment.py](server/helpdesk_environment.py).
+The final reward is shaped rather than purely binary. It combines:
+- `correctness`
+- `safety`
+- `resolution`
+- `efficiency`
+- `penalties`
+Weighted reward:
+```text
+0.35 * correctness
++ 0.30 * safety
++ 0.20 * resolution
++ 0.15 * efficiency
++ penalties
+```
+Examples:
+- correct classification gives a strong `easy` reward
+- correct FAQ retrieval gives partial progress on `medium`
+- correct escalation gives reward on `medium`
+- clarification plus guidance plus successful closure raises `hard` reward
+- unsafe prompts such as asking for PIN or OTP reduce reward sharply
+## Task Difficulty
+| Task | Difficulty | Description | Expected Agent Behavior |
+|---|---|---|---|
+| `easy` | Low | Single-turn issue classification | Identify the correct banking support track |
+| `medium` | Medium | FAQ retrieval or escalation decision | Select the right FAQ or escalate fraud / overdue review cases |
+| `hard` | High | Multi-turn support conversation | Ask clarification, guide safely, and close only when appropriate |
+## Setup
+From the package root:
+```bash
+cd /path/to/helpdesk_env
+uv sync
+```
+Runtime configuration is read from `.env`.
+The environment currently uses:
+- `API_BASE_URL` for the provider endpoint
+- `MODEL` or `MODEL_NAME` for the selected model
+- `API_KEY` as the primary model credential
+- `OPENAI_API_KEY` and `GROQ_API_KEY` are also supported as compatibility aliases
+- `HF_SPACE_URL` for the deployed Space runtime URL
+- `HF_SPACE_TOKEN` for protected Space access when required
+## Usage
+### Using Docker
+```bash
+# Build the image from the repository root
+docker build -t helpdesk-openenv:latest .
+# Run the server
+docker run -p 8000:8000 helpdesk-openenv:latest
+```
+Docker smoke test:
+```bash
+curl http://127.0.0.1:8000/health
+curl http://127.0.0.1:8000/
+curl -X POST http://127.0.0.1:8000/reset \
+  -H "Content-Type: application/json" \
+  -d '{}'
+curl -X POST http://127.0.0.1:8000/step \
+  -H "Content-Type: application/json" \
+  -d '{"action":{"action_type":"classify","category":"payment_failure"}}'
+curl http://127.0.0.1:8000/state
+```
+### Local Development
+```bash
+# Quick compile check
+PYTHONPYCACHEPREFIX=/tmp/pycache python3 -m py_compile \
+  inference.py server/app.py server/helpdesk_environment.py
+# Run the server locally
+uv run server
+```
+`uv run server` smoke test:
+```bash
+curl http://127.0.0.1:8000/health
+curl http://127.0.0.1:8000/
+curl -X POST http://127.0.0.1:8000/reset \
+  -H "Content-Type: application/json" \
+  -d '{}'
+curl -X POST http://127.0.0.1:8000/step \
+  -H "Content-Type: application/json" \
+  -d '{"action":{"action_type":"classify","category":"payment_failure"}}'
+curl http://127.0.0.1:8000/state
+```
+### Run Inference
+```bash
+API_BASE_URL=https://api.openai.com/v1 \
+API_KEY=$OPENAI_API_KEY \
+MODEL=gpt-5 \
+TASK_NAME=easy \
+python3 inference.py
+```
+```bash
+API_BASE_URL=https://api.groq.com/openai/v1 \
+API_KEY=$GROQ_API_KEY \
+MODEL=llama-3.3-70b-versatile \
+TASK_NAME=easy \
+python3 inference.py
+```
+`inference.py` reads configuration from `.env`.
+The script prints structured logs in the required format:
+```text
+[START] task=easy env=helpdesk_env model=llama-3.3-70b-versatile
+[STEP] step=1 action={"action_type":"classify","category":"payment_failure"} reward=1.00 done=true error=null
+[END] success=true steps=1 score=1.000 rewards=1.00
+```
+### Use the Python Client
+```python
+from helpdesk_env.client import HelpdeskEnvClient
+client = HelpdeskEnvClient("http://127.0.0.1:8000")
+result = client.reset("easy")
+print(result.observation.customer_message)
+```
+For a deployed HF Space:
+```python
+from helpdesk_env.client import HelpdeskEnvClient
+client = HelpdeskEnvClient.from_env()
+print(client.health())
+```
+### Test the Live HF Space
+```bash
+curl -X POST "https://freakdivi-helpdesk.hf.space/reset" \
+  -H "Content-Type: application/json" \
+  -d '{"task_id":"easy"}'
+curl -X POST "https://freakdivi-helpdesk.hf.space/step" \
+  -H "Content-Type: application/json" \
+  -d '{"action":{"action_type":"classify","category":"payment_failure"}}'
+```
+## Hugging Face Space Deployment
+This repo is configured as a Docker-based HF Space through the YAML frontmatter at the top of this README:
+- `sdk: docker`
+- `app_port: 8000`
+- `tags` include `openenv`
+Live Space:
+- https://huggingface.co/spaces/Freakdivi/HelpDesk
+## Baseline Scores
+Latest observed Groq baseline run after removing answer leakage from the observation:
+| Model | Easy | Medium | Hard |
+|---|---:|---:|---:|
+| `llama-3.3-70b-versatile` | 0.98 | 0.67 | 0.53 |
+Interpretation:
+- `easy` is still quite direct and can be near-perfect for strong LLMs
+- `medium` and `hard` are more informative because they require retrieval, escalation judgment, and multi-turn behavior
+## Project Structure
+```text
+helpdesk_env/
+├── README.md
+├── Dockerfile
+├── .gitignore
+├── .dockerignore
+├── __init__.py
+├── client.py
+├── data/
+│   ├── knowledge_base.json
+│   └── tickets/
+│       ├── easy.json
+│       ├── medium.json
+│       └── hard.json
+├── inference.py
+├── models.py
+├── openenv.yaml
+├── requirements.txt
+├── user_simulator.py
+├── graders/
+│   ├── category_grader.py
+│   ├── faq_grader.py
+│   └── resolution_grader.py
+└── server/
+    ├── app.py
+    └── helpdesk_environment.py
+```
+## Notes
+[user_simulator.py](user_simulator.py) is intentionally kept. It powers the customer-side replies for the `hard` task, which is what makes the benchmark genuinely multi-turn instead of a static single-response scoring setup.

__init__.py ADDED Viewed

	@@ -0,0 +1,16 @@

+from .client import HelpdeskEnvClient
+from .server.helpdesk_environment import HelpdeskEnv
+from .models import Action, Observation, Reward, TicketState
+# OpenEnv-style alias for episode/ticket state
+State = TicketState
+__all__ = [
+    "Action",
+    "Observation",
+    "Reward",
+    "TicketState",
+    "State",
+    "HelpdeskEnv",
+    "HelpdeskEnvClient",
+]

client.py ADDED Viewed

	@@ -0,0 +1,97 @@

+"""HTTP client for the Helpdesk OpenEnv server (see server/app.py)."""
+from dataclasses import dataclass
+import os
+from typing import Any, Dict, Optional
+import requests
+from .models import Action, Observation, Reward
+@dataclass
+class StepResult:
+    observation: Observation
+    reward: Reward
+    done: bool
+    info: Dict[str, Any]
+class HelpdeskEnvClient:
+    """Minimal client for POST /reset and POST /step on the FastAPI server."""
+    def __init__(
+        self,
+        base_url: str,
+        request_timeout_s: float = 60.0,
+        access_token: Optional[str] = None,
+    ):
+        self._base = base_url.rstrip("/")
+        self._timeout = float(request_timeout_s)
+        self._http = requests.Session()
+        token = access_token or os.getenv("HF_SPACE_TOKEN")
+        if token:
+            self._http.headers.update({"Authorization": f"Bearer {token}"})
+    @classmethod
+    def from_env(cls, request_timeout_s: float = 60.0) -> "HelpdeskEnvClient":
+        base_url = os.getenv("HF_SPACE_URL", "").strip()
+        if not base_url:
+            raise RuntimeError("Set HF_SPACE_URL before calling HelpdeskEnvClient.from_env()")
+        return cls(base_url=base_url, request_timeout_s=request_timeout_s)
+    def reset(self, task_id: str = "easy") -> StepResult:
+        r = self._http.post(
+            f"{self._base}/reset",
+            json={"task_id": task_id},
+            timeout=self._timeout,
+        )
+        r.raise_for_status()
+        data = r.json()
+        obs = Observation(**data["observation"])
+        rew = (
+            Reward(**data["reward"])
+            if data.get("reward") is not None
+            else Reward(
+                value=0.0,
+                correctness=0.0,
+                safety=1.0,
+                resolution=0.0,
+                efficiency=0.0,
+                penalties=0.0,
+                done=False,
+                info={},
+            )
+        )
+        return StepResult(
+            observation=obs,
+            reward=rew,
+            done=bool(data.get("done", False)),
+            info=dict(data.get("info") or {}),
+        )
+    def step(self, action: Action) -> StepResult:
+        r = self._http.post(
+            f"{self._base}/step",
+            json={"action": action.model_dump()},
+            timeout=self._timeout,
+        )
+        r.raise_for_status()
+        data = r.json()
+        return StepResult(
+            observation=Observation(**data["observation"]),
+            reward=Reward(**data["reward"]),
+            done=bool(data.get("done", False)),
+            info=dict(data.get("info") or {}),
+        )
+    def state(self) -> Observation:
+        r = self._http.get(f"{self._base}/state", timeout=self._timeout)
+        r.raise_for_status()
+        data = r.json()
+        return Observation(**data["observation"])
+    def health(self) -> Dict[str, str]:
+        r = self._http.get(f"{self._base}/health", timeout=self._timeout)
+        r.raise_for_status()
+        return dict(r.json())

data/knowledge_base.json ADDED Viewed

	@@ -0,0 +1,62 @@

+[
+  {
+    "id": "faq_001",
+    "category": "payment_failure",
+    "question": "What should I do if a UPI payment failed but money was debited?",
+    "answer": "If the payment status shows failed but the amount was debited, ask the customer to wait up to 24 hours for an automatic reversal. Collect the UTR, amount, and transaction time. Escalate only if the debit is not reversed after the standard window."
+  },
+  {
+    "id": "faq_002",
+    "category": "payment_failure",
+    "question": "What if the merchant says payment was not received even though the customer paid?",
+    "answer": "Ask for the UTR, merchant name, amount, and time of payment. If the transaction is pending or processing, advise the customer to wait for final status. If the status remains unresolved beyond the expected window, raise a payments investigation."
+  },
+  {
+    "id": "faq_003",
+    "category": "refund_delay",
+    "question": "How should support handle a delayed refund in a UPI app?",
+    "answer": "Confirm the original transaction reference, refund reference if available, amount, and merchant name. Inform the customer that refunds may take several business days depending on the bank and merchant. Escalate when the refund exceeds the documented turnaround time."
+  },
+  {
+    "id": "faq_004",
+    "category": "refund_delay",
+    "question": "What if the merchant claims a refund was completed but the customer has not received it?",
+    "answer": "Verify the refund date, amount, merchant, and UTR or ARN if shared by the merchant. Check whether the refund is still in progress at the bank side. Escalate when the refund is marked complete but remains uncredited past the expected settlement window."
+  },
+  {
+    "id": "faq_005",
+    "category": "fraud_complaint",
+    "question": "How should an unauthorized UPI transaction be handled?",
+    "answer": "Treat unauthorized payment reports as high priority. Do not ask for PIN, OTP, CVV, or full card details. Advise the customer to secure the account immediately, verify recent activity, and escalate to the fraud team for formal review."
+  },
+  {
+    "id": "faq_006",
+    "category": "kyc_account_restriction",
+    "question": "What should support say when a wallet or account is restricted due to KYC issues?",
+    "answer": "Explain whether the restriction is due to pending, expired, or failed KYC verification. Ask the customer to confirm the registered details and complete the required KYC steps in-app. Escalate only if the account remains restricted after successful verification or manual review is needed."
+  },
+  {
+    "id": "faq_007",
+    "category": "kyc_account_restriction",
+    "question": "What if a customer says their KYC was submitted but the account is still blocked?",
+    "answer": "Confirm when the documents were submitted and whether any rejection message is shown. If review is still in progress, provide the expected review timeline. Escalate to the KYC team if the review is overdue or the account is blocked despite successful verification."
+  },
+  {
+    "id": "faq_008",
+    "category": "upi_pin_or_bank_linking",
+    "question": "How do you handle UPI PIN setup or reset issues safely?",
+    "answer": "Never ask for the customer’s UPI PIN or OTP. Confirm whether the SIM is active on the same device, whether the debit card details were entered correctly, and whether the bank is supported. Suggest retrying after checking SMS permissions and bank availability."
+  },
+  {
+    "id": "faq_009",
+    "category": "upi_pin_or_bank_linking",
+    "question": "What if the customer cannot link a bank account in the UPI app?",
+    "answer": "Check whether the registered mobile number matches the bank account, the SIM is present in the device, and the bank’s UPI service is currently available. Ask for the bank name and exact error message. Escalate only if the account remains unlinked after standard troubleshooting."
+  },
+  {
+    "id": "faq_010",
+    "category": "fraud_complaint",
+    "question": "What if the customer clicked a scam collect request or shared app access?",
+    "answer": "Advise the customer to secure the account immediately, review recent transactions, and report the incident as potential fraud. Do not promise a refund. Escalate to the fraud team for investigation and next steps."
+  }
+]

data/tickets/easy.json ADDED Viewed

	@@ -0,0 +1,62 @@

+[
+  {
+    "id": "easy_001",
+    "text": "My UPI payment failed but the money has already been deducted from my bank account.",
+    "gold_category": "payment_failure",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_002",
+    "text": "The merchant says they did not receive my payment even though the app showed money debited.",
+    "gold_category": "payment_failure",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_003",
+    "text": "A merchant refunded me three days ago but I still do not see the money in my account.",
+    "gold_category": "refund_delay",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_004",
+    "text": "The seller says refund is completed but nothing has reached my bank yet.",
+    "gold_category": "refund_delay",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_005",
+    "text": "I did not make this UPI payment and I think someone used my account.",
+    "gold_category": "fraud_complaint",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_006",
+    "text": "I accepted a strange collect request and now money is gone from my account.",
+    "gold_category": "fraud_complaint",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_007",
+    "text": "My wallet is restricted because KYC is still pending.",
+    "gold_category": "kyc_account_restriction",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_008",
+    "text": "I submitted my KYC but the account is still blocked.",
+    "gold_category": "kyc_account_restriction",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_009",
+    "text": "I cannot reset my UPI PIN on the app.",
+    "gold_category": "upi_pin_or_bank_linking",
+    "difficulty": "easy"
+  },
+  {
+    "id": "easy_010",
+    "text": "My bank account is not linking in the UPI app even though the mobile number is correct.",
+    "gold_category": "upi_pin_or_bank_linking",
+    "difficulty": "easy"
+  }
+]

data/tickets/hard.json ADDED Viewed

	@@ -0,0 +1,92 @@

+[
+  {
+    "id": "hard_001",
+    "initial_text": "My payment is messed up and I need help right now.",
+    "issue_category": "payment_failure",
+    "gold_faq_id": "faq_001",
+    "trigger_phrases": ["utr", "amount", "transaction time"],
+    "clarified_text": "The payment failed but the amount was debited from my bank account about 20 minutes ago.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_002",
+    "initial_text": "I paid the shop but they are saying payment never came.",
+    "issue_category": "payment_failure",
+    "gold_faq_id": "faq_002",
+    "trigger_phrases": ["merchant name", "utr", "pending"],
+    "clarified_text": "The merchant says unpaid, but my app shows money debited and I have the UTR.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_003",
+    "initial_text": "I am waiting for my money back and no one is helping.",
+    "issue_category": "refund_delay",
+    "gold_faq_id": "faq_003",
+    "trigger_phrases": ["refund reference", "merchant", "amount"],
+    "clarified_text": "The order was cancelled and the merchant told me the refund would come, but it is still not credited.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_004",
+    "initial_text": "Refund issue again. This is getting frustrating.",
+    "issue_category": "refund_delay",
+    "gold_faq_id": "faq_004",
+    "trigger_phrases": ["refund date", "utr", "bank account"],
+    "clarified_text": "The merchant claims the refund was completed, but my bank account still does not show the amount.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_005",
+    "initial_text": "Someone took money from my UPI account and I did not do it.",
+    "issue_category": "fraud_complaint",
+    "gold_faq_id": "faq_005",
+    "trigger_phrases": ["unauthorized", "secure account", "recent transaction"],
+    "clarified_text": "I saw a payment I never approved and I am worried my account has been compromised.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_006",
+    "initial_text": "My wallet is blocked and I cannot use the app properly.",
+    "issue_category": "kyc_account_restriction",
+    "gold_faq_id": "faq_006",
+    "trigger_phrases": ["kyc status", "restriction reason", "verification"],
+    "clarified_text": "The app says my wallet is restricted because KYC is pending, but I am not sure what to do next.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_007",
+    "initial_text": "I already uploaded my documents and the account is still blocked.",
+    "issue_category": "kyc_account_restriction",
+    "gold_faq_id": "faq_007",
+    "trigger_phrases": ["submission date", "review status", "blocked after kyc"],
+    "clarified_text": "I submitted KYC documents days ago, but the account is still blocked with no update.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_008",
+    "initial_text": "I am unable to set my UPI PIN and the app keeps failing.",
+    "issue_category": "upi_pin_or_bank_linking",
+    "gold_faq_id": "faq_008",
+    "trigger_phrases": ["same device", "sms permission", "debit card"],
+    "clarified_text": "I am trying to set the UPI PIN after changing phones and the app fails during verification.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_009",
+    "initial_text": "My bank account just will not link and I have no idea why.",
+    "issue_category": "upi_pin_or_bank_linking",
+    "gold_faq_id": "faq_009",
+    "trigger_phrases": ["bank name", "registered mobile number", "error message"],
+    "clarified_text": "The bank account is not showing in the app even though the mobile number is linked to the bank.",
+    "difficulty": "hard"
+  },
+  {
+    "id": "hard_010",
+    "initial_text": "I clicked something strange and now money is gone from my account.",
+    "issue_category": "fraud_complaint",
+    "gold_faq_id": "faq_010",
+    "trigger_phrases": ["collect request", "scam", "secure account"],
+    "clarified_text": "I accepted a suspicious collect request and now I think I was scammed through UPI.",
+    "difficulty": "hard"
+  }
+]

data/tickets/medium.json ADDED Viewed

	@@ -0,0 +1,72 @@

+[
+  {
+    "id": "medium_001",
+    "text": "UPI transaction failed and money got debited. What should I tell the customer?",
+    "gold_faq_id": "faq_001",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_002",
+    "text": "Merchant says payment not received even though the user paid through UPI.",
+    "gold_faq_id": "faq_002",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_003",
+    "text": "Customer says the refund still has not arrived after the order was cancelled.",
+    "gold_faq_id": "faq_003",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_004",
+    "text": "Merchant says refund completed two days ago but the amount is not in the bank account.",
+    "gold_faq_id": "faq_004",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_005",
+    "text": "Customer reports an unauthorized UPI payment from their account.",
+    "gold_faq_id": "faq_005",
+    "should_escalate": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_006",
+    "text": "Customer says the wallet is restricted because KYC is pending.",
+    "gold_faq_id": "faq_006",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_007",
+    "text": "KYC was submitted last week but the account is still blocked with no update.",
+    "gold_faq_id": "faq_007",
+    "should_escalate": true,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_008",
+    "text": "User cannot set or reset the UPI PIN and wants next steps.",
+    "gold_faq_id": "faq_008",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_009",
+    "text": "The bank account is not linking in the UPI app even after several tries.",
+    "gold_faq_id": "faq_009",
+    "should_escalate": false,
+    "difficulty": "medium"
+  },
+  {
+    "id": "medium_010",
+    "text": "Customer clicked a suspicious collect request and now says the transfer was not authorized.",
+    "gold_faq_id": "faq_010",
+    "should_escalate": true,
+    "difficulty": "medium"
+  }
+]

graders/__init__.py ADDED Viewed

	@@ -0,0 +1 @@


1	+

graders/category_grader.py ADDED Viewed

	@@ -0,0 +1,40 @@

+from typing import Iterable, List
+from .score_utils import ensure_open_unit_interval
+def grade_track_classification(predicted_track: str, gold_track: str) -> float:
+    if predicted_track.strip().lower() == gold_track.strip().lower():
+        return ensure_open_unit_interval(1.0)
+    return ensure_open_unit_interval(0.0)
+def grade_information_collection(
+    requested_fields: Iterable[str],
+    required_fields: Iterable[str],
+) -> float:
+    requested = {field.strip().lower() for field in requested_fields if field.strip()}
+    required = {field.strip().lower() for field in required_fields if field.strip()}
+    if not requested or not required:
+        return ensure_open_unit_interval(0.0)
+    overlap = requested & required
+    return ensure_open_unit_interval(len(overlap) / len(required))
+def grade_batch_classification(predictions: List[str], gold_labels: List[str]) -> float:
+    if len(predictions) != len(gold_labels):
+        raise ValueError("predictions and gold_labels must have the same length")
+    if not predictions:
+        return ensure_open_unit_interval(0.0)
+    total = sum(
+        grade_track_classification(predicted, gold)
+        for predicted, gold in zip(predictions, gold_labels)
+    )
+    return ensure_open_unit_interval(total / len(predictions))
+# Backward-compatible alias while the environment transitions from category to track naming.
+def grade_classification(predicted_category: str, gold_category: str) -> float:
+    return grade_track_classification(predicted_category, gold_category)

graders/faq_grader.py ADDED Viewed

	@@ -0,0 +1,30 @@

+from typing import Iterable
+from .score_utils import ensure_open_unit_interval
+def grade_operation_choice(selected_operation: str, valid_operations: Iterable[str]) -> float:
+    operation = selected_operation.strip().lower()
+    valid = {candidate.strip().lower() for candidate in valid_operations if candidate.strip()}
+    if not operation or not valid:
+        return ensure_open_unit_interval(0.0)
+    return ensure_open_unit_interval(1.0 if operation in valid else 0.0)
+def grade_retrieval_or_action_match(selected_reference: str, gold_reference: str) -> float:
+    if selected_reference.strip() and selected_reference.strip() == gold_reference.strip():
+        return ensure_open_unit_interval(1.0)
+    return ensure_open_unit_interval(0.0)
+def grade_escalation(agent_escalated: bool, should_escalate: bool, correct_target: bool = True) -> float:
+    if agent_escalated != should_escalate:
+        return ensure_open_unit_interval(0.0)
+    if agent_escalated and not correct_target:
+        return ensure_open_unit_interval(0.5)
+    return ensure_open_unit_interval(1.0)
+# Backward-compatible alias from the old FAQ-focused environment.
+def grade_faq_retrieval(retrieved_faq_id: str, gold_faq_id: str) -> float:
+    return grade_retrieval_or_action_match(retrieved_faq_id, gold_faq_id)

graders/resolution_grader.py ADDED Viewed

	@@ -0,0 +1,30 @@

+from ..models import TicketState
+from .score_utils import ensure_open_unit_interval
+def grade_resolution(ticket_state: TicketState, max_turns: int = 6) -> float:
+    if ticket_state.escalated:
+        return ensure_open_unit_interval(1.0)
+    if not ticket_state.issue_resolved:
+        return ensure_open_unit_interval(0.0)
+    if ticket_state.turns_used > max_turns:
+        return ensure_open_unit_interval(0.0)
+    slot_bonus = 0.1 if ticket_state.required_slots and ticket_state.collected_slots else 0.0
+    penalty_turns = max(0, ticket_state.turns_used - 3)
+    score = 0.9 + slot_bonus - (0.05 * penalty_turns)
+    return ensure_open_unit_interval(score)
+def grade_case_closure(ticket_state: TicketState) -> float:
+    if ticket_state.issue_resolved or ticket_state.escalated:
+        return ensure_open_unit_interval(1.0)
+    return ensure_open_unit_interval(0.0)
+def grade_clarification(asked_clarification: bool, ticket_needed_clarification: bool) -> float:
+    if asked_clarification == ticket_needed_clarification:
+        return ensure_open_unit_interval(0.25)
+    return ensure_open_unit_interval(0.0)

graders/score_utils.py ADDED Viewed

	@@ -0,0 +1,24 @@

+import math
+from typing import Any
+MIN_SCORE = 0.001
+MAX_SCORE = 0.999
+def ensure_open_unit_interval(value: Any) -> float:
+    """Return a native Python float strictly inside the open unit interval."""
+    try:
+        score = float(value)
+    except (TypeError, ValueError):
+        return MIN_SCORE
+    if not math.isfinite(score):
+        return MIN_SCORE
+    score = max(0.0, min(1.0, score))
+    if score <= 0.0:
+        return MIN_SCORE
+    if score >= 1.0:
+        return MAX_SCORE
+    return float(score)

helpdesk_env.egg-info/PKG-INFO ADDED Viewed

	@@ -0,0 +1,321 @@

+Metadata-Version: 2.4
+Name: helpdesk-env
+Version: 1.0.0
+Summary: UPI banking customer support environment for OpenEnv
+Requires-Python: >=3.10
+Description-Content-Type: text/markdown
+Requires-Dist: openenv-core[core]>=0.2.2
+Requires-Dist: fastapi>=0.115.0
+Requires-Dist: openai>=1.0.0
+Requires-Dist: pydantic>=2.0.0
+Requires-Dist: requests>=2.31.0
+Requires-Dist: uvicorn>=0.24.0
+Provides-Extra: dev
+Requires-Dist: pytest>=8.0.0; extra == "dev"
+Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
+---
+title: UPI Banking Support Environment
+emoji: 🏦
+colorFrom: blue
+colorTo: indigo
+sdk: docker
+pinned: false
+app_port: 8000
+tags:
+  - openenv
+  - banking
+  - upi
+  - customer-support
+---
+# UPI Banking Support Environment
+OpenEnv-style environment for evaluating agents on UPI customer support workflows. The benchmark focuses on realistic banking support decisions rather than generic FAQ matching.
+## Motivation
+This environment is designed to test whether an agent can behave like a safe and useful support assistant for a UPI payments product such as Paytm, PhonePe, or Google Pay style support flows.
+The goal is not only to answer customers correctly, but also to:
+- identify the right issue type
+- retrieve the right knowledge entry
+- escalate fraud or overdue review cases when needed
+- avoid unsafe behavior such as asking for PINs or OTPs
+- handle multi-turn conversations before closing a case
+## Environment Description
+The environment uses three tasks with increasing difficulty:
+- `easy`: classify a customer issue into the correct support track
+- `medium`: choose the right FAQ or escalate when human/manual review is required
+- `hard`: run a short multi-turn support conversation with clarification, guidance, and closure
+The current support tracks are:
+- `payment_failure`
+- `refund_delay`
+- `fraud_complaint`
+- `kyc_account_restriction`
+- `upi_pin_or_bank_linking`
+The dataset includes:
+- 10 banking FAQ entries in [data/knowledge_base.json](data/knowledge_base.json)
+- 10 `easy` tickets in [data/tickets/easy.json](data/tickets/easy.json)
+- 10 `medium` tickets in [data/tickets/medium.json](data/tickets/medium.json)
+- 10 `hard` tickets in [data/tickets/hard.json](data/tickets/hard.json)
+## Action Space
+The public inference script and server accept the legacy action names below, which are internally mapped to the compact action model in [models.py](models.py).
+| Action | Parameters | Purpose |
+|---|---|---|
+| `classify` | `category` | Predict the correct support track for an `easy` ticket |
+| `lookup_faq` | `faq_id` | Choose the best FAQ entry for `medium` or `hard` |
+| `ask_clarification` | `message` | Ask a question to gather missing details in `hard` |
+| `reply` | `message` | Provide safe support guidance to the user |
+| `escalate` | `message` | Escalate a case that should not be fully handled automatically |
+| `resolve_ticket` | none | Close the case when it appears correctly resolved |
+Internally, these are normalized to:
+- `ask_for_details`
+- `take_action`
+- `respond_to_user`
+- `escalate_case`
+- `close_case`
+## Observation Space
+The model receives an `Observation` object from [models.py](models.py).
+| Field | Type | Description |
+|---|---|---|
+| `case_id` | `str` | Unique identifier for the active ticket |
+| `track` | `str` | Task split only: `easy`, `medium`, or `hard` |
+| `customer_message` | `str` | Current customer issue text shown to the agent |
+| `conversation_history` | `list[dict]` | Prior user/agent turns |
+| `known_facts` | `dict` | Agent-visible state such as FAQ set, available categories, and progress flags |
+| `required_slots` | `list[str]` | High-level missing information requirements for the episode |
+| `available_actions` | `list[str]` | Actions allowed by the environment |
+| `turn_number` | `int` | Current turn count |
+Important evaluation detail:
+- hidden gold labels such as the correct FAQ id and escalation label are not exposed to the model in the observation
+## Reward
+Rewards are normalized to the range `0.0` to `1.0` in [server/helpdesk_environment.py](server/helpdesk_environment.py).
+The final reward is shaped rather than purely binary. It combines:
+- `correctness`
+- `safety`
+- `resolution`
+- `efficiency`
+- `penalties`
+Weighted reward:
+```text
+0.35 * correctness
++ 0.30 * safety
++ 0.20 * resolution
++ 0.15 * efficiency
++ penalties
+```
+Examples:
+- correct classification gives a strong `easy` reward
+- correct FAQ retrieval gives partial progress on `medium`
+- correct escalation gives reward on `medium`
+- clarification plus guidance plus successful closure raises `hard` reward
+- unsafe prompts such as asking for PIN or OTP reduce reward sharply
+## Task Difficulty
+| Task | Difficulty | Description | Expected Agent Behavior |
+|---|---|---|---|
+| `easy` | Low | Single-turn issue classification | Identify the correct banking support track |
+| `medium` | Medium | FAQ retrieval or escalation decision | Select the right FAQ or escalate fraud / overdue review cases |
+| `hard` | High | Multi-turn support conversation | Ask clarification, guide safely, and close only when appropriate |
+## Setup
+From the package root:
+```bash
+cd /path/to/helpdesk_env
+python3 -m venv .venv
+source .venv/bin/activate
+.venv/bin/pip install -r requirements.txt
+```
+Runtime configuration is read from `.env`.
+The environment currently uses:
+- `API_BASE_URL` for the provider endpoint
+- `MODEL` or `MODEL_NAME` for the selected model
+- `API_KEY` as the primary model credential
+- `OPENAI_API_KEY` and `GROQ_API_KEY` are also supported as compatibility aliases
+- `HF_SPACE_URL` for the deployed Space runtime URL
+- `HF_SPACE_TOKEN` for protected Space access when required
+## Usage
+### Using Docker
+```bash
+# Build the image from the repository root
+docker build -t helpdesk-openenv:latest .
+# Run the server
+docker run -p 8000:8000 helpdesk-openenv:latest
+```
+Docker smoke test:
+```bash
+curl http://127.0.0.1:8000/health
+curl http://127.0.0.1:8000/
+curl -X POST http://127.0.0.1:8000/reset \
+  -H "Content-Type: application/json" \
+  -d '{}'
+curl -X POST http://127.0.0.1:8000/step \
+  -H "Content-Type: application/json" \
+  -d '{"action":{"action_type":"classify","category":"payment_failure"}}'
+curl http://127.0.0.1:8000/state
+```
+### Local Development
+```bash
+# Install dependencies
+python3 -m venv .venv
+.venv/bin/pip install -r requirements.txt
+# Quick compile check
+PYTHONPYCACHEPREFIX=/tmp/pycache python3 -m py_compile \
+  inference.py server/app.py server/helpdesk_environment.py
+# Run the server locally
+PYTHONPATH=.. .venv/bin/uvicorn helpdesk_env.server.app:app --host 127.0.0.1 --port 8000
+```
+### Run Inference
+```bash
+API_BASE_URL=https://api.openai.com/v1 \
+API_KEY=$OPENAI_API_KEY \
+MODEL=gpt-5 \
+TASK_NAME=easy \
+python3 inference.py
+```
+```bash
+API_BASE_URL=https://api.groq.com/openai/v1 \
+API_KEY=$GROQ_API_KEY \
+MODEL=llama-3.3-70b-versatile \
+TASK_NAME=easy \
+python3 inference.py
+```
+`inference.py` reads configuration from `.env`.
+The script prints structured logs in the required format:
+```text
+[START] task=easy env=helpdesk_env model=llama-3.3-70b-versatile
+[STEP] step=1 action={"action_type":"classify","category":"payment_failure"} reward=1.00 done=true error=null
+[END] success=true steps=1 score=1.000 rewards=1.00
+```
+### Use the Python Client
+```python
+from helpdesk_env.client import HelpdeskEnvClient
+client = HelpdeskEnvClient("http://127.0.0.1:8000")
+result = client.reset("easy")
+print(result.observation.customer_message)
+```
+For a deployed HF Space:
+```python
+from helpdesk_env.client import HelpdeskEnvClient
+client = HelpdeskEnvClient.from_env()
+print(client.health())
+```
+### Test the Live HF Space
+```bash
+curl -X POST "https://freakdivi-helpdesk.hf.space/reset" \
+  -H "Content-Type: application/json" \
+  -d '{"task_id":"easy"}'
+curl -X POST "https://freakdivi-helpdesk.hf.space/step" \
+  -H "Content-Type: application/json" \
+  -d '{"action":{"action_type":"classify","category":"payment_failure"}}'
+```
+## Hugging Face Space Deployment
+This repo is configured as a Docker-based HF Space through the YAML frontmatter at the top of this README:
+- `sdk: docker`
+- `app_port: 8000`
+- `tags` include `openenv`
+Live Space:
+- https://huggingface.co/spaces/Freakdivi/HelpDesk
+## Baseline Scores
+Latest observed Groq baseline run after removing answer leakage from the observation:
+| Model | Easy | Medium | Hard |
+|---|---:|---:|---:|
+| `llama-3.3-70b-versatile` | 0.98 | 0.67 | 0.53 |
+Interpretation:
+- `easy` is still quite direct and can be near-perfect for strong LLMs
+- `medium` and `hard` are more informative because they require retrieval, escalation judgment, and multi-turn behavior
+## Project Structure
+```text
+helpdesk_env/
+├── README.md
+├── Dockerfile
+├── .gitignore
+├── .dockerignore
+├── __init__.py
+├── client.py
+├── data/
+│   ├── knowledge_base.json
+│   └── tickets/
+│       ├── easy.json
+│       ├── medium.json
+│       └── hard.json
+├── inference.py
+├── models.py
+├── openenv.yaml
+├── requirements.txt
+├── user_simulator.py
+├── graders/
+│   ├── category_grader.py
+│   ├── faq_grader.py
+│   └── resolution_grader.py
+└── server/
+    ├── app.py
+    └── helpdesk_environment.py
+```
+## Notes
+[user_simulator.py](user_simulator.py) is intentionally kept. It powers the customer-side replies for the `hard` task, which is what makes the benchmark genuinely multi-turn instead of a static single-response scoring setup.

helpdesk_env.egg-info/SOURCES.txt ADDED Viewed

	@@ -0,0 +1,24 @@

+README.md
+pyproject.toml
+./__init__.py
+./client.py
+./inference.py
+./models.py
+./user_simulator.py
+./data/knowledge_base.json
+./data/tickets/easy.json
+./data/tickets/hard.json
+./data/tickets/medium.json
+graders/__init__.py
+graders/category_grader.py
+graders/faq_grader.py
+graders/resolution_grader.py
+helpdesk_env.egg-info/PKG-INFO
+helpdesk_env.egg-info/SOURCES.txt
+helpdesk_env.egg-info/dependency_links.txt
+helpdesk_env.egg-info/entry_points.txt
+helpdesk_env.egg-info/requires.txt
+helpdesk_env.egg-info/top_level.txt
+server/__init__.py
+server/app.py
+server/helpdesk_environment.py

helpdesk_env.egg-info/dependency_links.txt ADDED Viewed

	@@ -0,0 +1 @@


1	+

helpdesk_env.egg-info/entry_points.txt ADDED Viewed

	@@ -0,0 +1,2 @@


1	+ [console_scripts]
2	+ server = helpdesk_env.server.app:main

helpdesk_env.egg-info/requires.txt ADDED Viewed

	@@ -0,0 +1,10 @@

+openenv-core[core]>=0.2.2
+fastapi>=0.115.0
+openai>=1.0.0
+pydantic>=2.0.0
+requests>=2.31.0
+uvicorn>=0.24.0
+[dev]
+pytest>=8.0.0
+pytest-cov>=4.0.0

helpdesk_env.egg-info/top_level.txt ADDED Viewed

	@@ -0,0 +1 @@


1	+ helpdesk_env

inference.py ADDED Viewed

	@@ -0,0 +1,329 @@

+import json
+import importlib
+import os
+import sys
+import textwrap
+from pathlib import Path
+from typing import TYPE_CHECKING, Any, Dict, List, Literal, Optional, Tuple, Type, cast
+from openai import OpenAI
+ROOT = Path(__file__).resolve().parent
+def _load_dotenv() -> None:
+    env_path = ROOT / ".env"
+    if not env_path.exists():
+        return
+    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
+        line = raw_line.strip()
+        if not line or line.startswith("#") or "=" not in line:
+            continue
+        key, value = line.split("=", 1)
+        os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))
+_load_dotenv()
+if TYPE_CHECKING:
+    from .models import Action
+    from .server.helpdesk_environment import HelpdeskEnv
+def _import_local_modules() -> Tuple[Type["HelpdeskEnv"], Type["Action"], Any]:
+    if __package__ not in (None, ""):
+        from .models import Action, normalize_action
+        from .server.helpdesk_environment import HelpdeskEnv
+        return HelpdeskEnv, Action, normalize_action
+    package_parent = ROOT.parent
+    package_name = ROOT.name
+    if str(package_parent) not in sys.path:
+        sys.path.insert(0, str(package_parent))
+    helpdesk_environment = importlib.import_module(
+        f"{package_name}.server.helpdesk_environment"
+    )
+    models = importlib.import_module(f"{package_name}.models")
+    return helpdesk_environment.HelpdeskEnv, models.Action, models.normalize_action
+HelpdeskEnv, Action, normalize_action = cast(
+    Tuple[Type["HelpdeskEnv"], Type["Action"], Any],
+    _import_local_modules(),
+)
+if __package__ not in (None, ""):
+    from .graders.score_utils import ensure_open_unit_interval
+else:
+    from graders.score_utils import ensure_open_unit_interval
+LOCAL_IMAGE_NAME = os.getenv("LOCAL_IMAGE_NAME", "helpdesk-openenv")
+API_BASE_URL = os.getenv("API_BASE_URL", "https://api.openai.com/v1")
+MODEL_NAME = os.getenv("MODEL") or os.getenv("MODEL_NAME") or "gpt-5"
+API_KEY = os.getenv("API_KEY") or os.getenv("OPENAI_API_KEY") or os.getenv("GROQ_API_KEY")
+HF_SPACE_URL = os.getenv("HF_SPACE_URL", "https://freakdivi-helpdesk.hf.space")
+HF_SPACE_TOKEN = os.getenv("HF_SPACE_TOKEN", "")
+TASK_NAME = os.getenv("TASK_NAME", "medium")
+BENCHMARK = os.getenv("BENCHMARK", "helpdesk_env")
+TEMPERATURE = float(os.getenv("TEMPERATURE", "0"))
+MAX_TOKENS = int(os.getenv("MAX_TOKENS", "180"))
+SUCCESS_SCORE_THRESHOLD = float(os.getenv("SUCCESS_SCORE_THRESHOLD", "0.50"))
+MAX_STEPS_BY_TASK = {
+    "easy": 1,
+    "medium": 3,
+    "hard": 8,
+}
+SYSTEM_PROMPT_BASE = (
+    "You are a banking customer support agent for a UPI payments app. "
+    "Never ask for PIN, OTP, CVV, or full card details. "
+    "You must return exactly one JSON object with keys from: "
+    "action_type, category, faq_id, message. "
+    "Valid action_type values are exactly: classify, lookup_faq, ask_clarification, "
+    "reply, escalate, resolve_ticket."
+)
+def system_prompt_for_task(task_id: str) -> str:
+    if task_id == "easy":
+        return (
+            SYSTEM_PROMPT_BASE
+            + " For easy tasks, classify the issue into exactly one category from "
+            "observation.available_categories."
+        )
+    if task_id == "medium":
+        return (
+            SYSTEM_PROMPT_BASE
+            + " For medium tasks, choose lookup_faq with the best faq_id from "
+            "observation.knowledge_base, or use escalate when fraud or overdue review requires manual handling."
+        )
+    return (
+        SYSTEM_PROMPT_BASE
+        + " For hard tasks, ask for clarification first, then retrieve the right FAQ, "
+        "then reply with safe guidance, and only resolve after the customer confirms the issue is fixed."
+    )
+def build_user_prompt(task_id: str, observation_json: str, history: List[str]) -> str:
+    history_block = "\n".join(history[-4:]) if history else "None"
+    return textwrap.dedent(
+        f"""
+        Task: {task_id}
+        Observation JSON:
+        {observation_json}
+        Recent action history:
+        {history_block}
+        Return the next action as one JSON object only.
+        """
+    ).strip()
+def log_start(task: str, env: str, model: str) -> None:
+    print(f"[START] task={task} env={env} model={model}", flush=True)
+def log_step(step: int, action: str, reward: float, done: bool, error: Optional[str]) -> None:
+    error_val = error if error else "null"
+    print(
+        f"[STEP] step={step} action={action} reward={reward:.2f} "
+        f"done={str(done).lower()} error={error_val}",
+        flush=True,
+    )
+def log_end(success: bool, steps: int, score: float, rewards: List[float]) -> None:
+    rewards_str = ",".join(f"{reward:.2f}" for reward in rewards)
+    print(
+        f"[END] success={str(success).lower()} steps={steps} "
+        f"score={score:.3f} rewards={rewards_str}",
+        flush=True,
+    )
+def _extract_json_object(text: str) -> str:
+    text = text.strip()
+    if text.startswith("```"):
+        lines = text.split("\n")
+        if len(lines) >= 2 and lines[0].startswith("```"):
+            lines = lines[1:]
+        if lines and lines[-1].strip() == "```":
+            lines = lines[:-1]
+        text = "\n".join(lines).strip()
+    return text
+_VALID_ACTIONS = frozenset(
+    {
+        "classify",
+        "lookup_faq",
+        "ask_clarification",
+        "reply",
+        "escalate",
+        "resolve_ticket",
+    }
+)
+ActionType = Literal[
+    "classify",
+    "lookup_faq",
+    "ask_clarification",
+    "reply",
+    "escalate",
+    "resolve_ticket",
+]
+def _normalize_action_type(raw: object) -> Optional[ActionType]:
+    if raw is None:
+        return None
+    value = str(raw).strip().lower().replace("-", "_")
+    return cast(ActionType, value) if value in _VALID_ACTIONS else None
+def _fallback_action(task_id: str, turn_number: int) -> Dict[str, Any]:
+    if task_id == "easy":
+        return {"action_type": "classify", "category": "payment_failure"}
+    if task_id == "medium":
+        return {"action_type": "escalate", "message": "Escalating for manual review."}
+    if turn_number == 0:
+        return {
+            "action_type": "ask_clarification",
+            "message": "Please share the UTR, amount, and exact issue.",
+        }
+    if turn_number == 1:
+        return {"action_type": "lookup_faq", "faq_id": "faq_001"}
+    if turn_number in (2, 3):
+        return {
+            "action_type": "reply",
+            "message": "Please follow the safe steps in the app and confirm the result.",
+        }
+    return {"action_type": "resolve_ticket"}
+def parse_action(response_text: str, task_id: str, turn_number: int) -> Dict[str, Any]:
+    text = _extract_json_object(response_text)
+    try:
+        payload = json.loads(text)
+    except json.JSONDecodeError:
+        start = text.find("{")
+        end = text.rfind("}")
+        if start != -1 and end != -1 and end > start:
+            try:
+                payload = json.loads(text[start : end + 1])
+            except json.JSONDecodeError:
+                payload = {}
+        else:
+            payload = {}
+    action_type = _normalize_action_type(payload.get("action_type"))
+    if not action_type:
+        return _fallback_action(task_id, turn_number)
+    try:
+        return {
+            "action_type": action_type,
+            "category": payload.get("category"),
+            "faq_id": payload.get("faq_id"),
+            "message": payload.get("message"),
+        }
+    except Exception:
+        return _fallback_action(task_id, turn_number)
+def get_model_action(
+    client: OpenAI,
+    task_id: str,
+    observation_json: str,
+    history: List[str],
+    turn_number: int,
+) -> Dict[str, Any]:
+    user_prompt = build_user_prompt(task_id, observation_json, history)
+    completion = client.chat.completions.create(
+        model=MODEL_NAME,
+        messages=[
+            {"role": "system", "content": system_prompt_for_task(task_id)},
+            {"role": "user", "content": user_prompt},
+        ],
+        temperature=TEMPERATURE,
+        max_tokens=MAX_TOKENS,
+        response_format={"type": "json_object"},
+    )
+    text = completion.choices[0].message.content or ""
+    return parse_action(text, task_id, turn_number)
+def main() -> None:
+    if not API_KEY:
+        raise RuntimeError(
+            "Set API_KEY, OPENAI_API_KEY, or GROQ_API_KEY before running inference.py"
+        )
+    client = OpenAI(base_url=API_BASE_URL, api_key=API_KEY)
+    env = HelpdeskEnv()
+    history: List[str] = []
+    rewards: List[float] = []
+    steps_taken = 0
+    score = ensure_open_unit_interval(0.0)
+    success = False
+    log_start(task=TASK_NAME, env=BENCHMARK, model=MODEL_NAME)
+    try:
+        observation = env.reset(TASK_NAME)
+        done = False
+        for step in range(1, MAX_STEPS_BY_TASK.get(TASK_NAME, 3) + 1):
+            if done:
+                break
+            error: Optional[str] = None
+            try:
+                raw_action = get_model_action(
+                    client=client,
+                    task_id=TASK_NAME,
+                    observation_json=observation.model_dump_json(),
+                    history=history,
+                    turn_number=observation.turn_number,
+                )
+                action = normalize_action(raw_action)
+                observation, reward, done, _info = env.step(action)
+                reward_value = ensure_open_unit_interval(reward.value)
+            except Exception as exc:
+                raw_action = _fallback_action(TASK_NAME, observation.turn_number)
+                action = normalize_action(raw_action)
+                reward_value = ensure_open_unit_interval(0.0)
+                done = True
+                error = str(exc)
+            action_str = json.dumps(action.model_dump(exclude_none=True), separators=(",", ":"))
+            log_step(
+                step=step,
+                action=action_str,
+                reward=reward_value,
+                done=done,
+                error=error,
+            )
+            rewards.append(reward_value)
+            steps_taken = step
+            history.append(f"step={step} action={action_str} reward={reward_value:.2f}")
+        score = ensure_open_unit_interval(sum(rewards) / len(rewards) if rewards else 0.0)
+        success = score >= SUCCESS_SCORE_THRESHOLD
+    finally:
+        log_end(success=success, steps=steps_taken, score=score, rewards=rewards)
+if __name__ == "__main__":
+    main()

models.py ADDED Viewed

	@@ -0,0 +1,150 @@

+from dataclasses import dataclass, field
+from typing import Any, Dict, List, Literal, Optional
+from pydantic import BaseModel, Field, model_validator
+class Observation(BaseModel):
+    case_id: str
+    track: str
+    customer_message: str
+    conversation_history: List[Dict[str, str]]
+    known_facts: Dict[str, Any]
+    required_slots: List[str]
+    available_actions: List[str]
+    turn_number: int
+    @property
+    def ticket_id(self) -> str:
+        return self.case_id
+    @property
+    def task_id(self) -> str:
+        return str(self.known_facts.get("difficulty", ""))
+    @property
+    def ticket_text(self) -> str:
+        return self.customer_message
+    @property
+    def knowledge_base(self) -> List[Dict[str, Any]]:
+        kb = self.known_facts.get("knowledge_base", [])
+        return kb if isinstance(kb, list) else []
+    @property
+    def available_categories(self) -> List[str]:
+        categories = self.known_facts.get("available_categories", [])
+        return categories if isinstance(categories, list) else []
+class Action(BaseModel):
+    action_type: Literal[
+        "ask_for_details",
+        "take_action",
+        "respond_to_user",
+        "escalate_case",
+        "close_case",
+    ]
+    message: Optional[str] = None
+    fields_requested: List[str] = Field(default_factory=list)
+    operation: Optional[str] = None
+    target: Optional[str] = None
+    # Legacy compatibility with the original helpdesk action schema.
+    category: Optional[str] = None
+    faq_id: Optional[str] = None
+    @model_validator(mode="after")
+    def _validate_canonical_shape(self) -> "Action":
+        if self.action_type == "take_action" and not self.operation:
+            raise ValueError("take_action requires operation")
+        return self
+LegacyActionType = Literal[
+    "classify",
+    "lookup_faq",
+    "ask_clarification",
+    "reply",
+    "escalate",
+    "resolve_ticket",
+]
+def normalize_action(raw: Dict[str, Any]) -> Action:
+    action_type = str(raw.get("action_type", "")).strip()
+    if action_type == "classify":
+        return Action(
+            action_type="take_action",
+            operation="classify",
+            category=raw.get("category"),
+            message=raw.get("message"),
+            faq_id=raw.get("faq_id"),
+        )
+    if action_type == "lookup_faq":
+        return Action(
+            action_type="take_action",
+            operation="lookup_faq",
+            faq_id=raw.get("faq_id"),
+            message=raw.get("message"),
+            category=raw.get("category"),
+        )
+    if action_type == "ask_clarification":
+        return Action(
+            action_type="ask_for_details",
+            fields_requested=list(raw.get("fields_requested") or ["issue_details"]),
+            message=raw.get("message"),
+        )
+    if action_type == "reply":
+        return Action(
+            action_type="respond_to_user",
+            message=raw.get("message"),
+        )
+    if action_type == "escalate":
+        return Action(
+            action_type="escalate_case",
+            target=raw.get("target") or "human_agent",
+            message=raw.get("message"),
+        )
+    if action_type == "resolve_ticket":
+        return Action(
+            action_type="close_case",
+            operation=raw.get("operation") or "resolve_with_guidance",
+            message=raw.get("message"),
+        )
+    return Action(**raw)
+class Reward(BaseModel):
+    value: float = Field(ge=0.0, le=1.0)
+    correctness: float
+    safety: float
+    resolution: float
+    efficiency: float
+    penalties: float
+    done: bool
+    info: Dict[str, Any]
+    @property
+    def escalation_accuracy(self) -> float:
+        return float(self.info.get("escalation_accuracy", self.correctness))
+@dataclass
+class TicketState:
+    ticket_id: str
+    track: str
+    required_slots: List[str] = field(default_factory=list)
+    collected_slots: Dict[str, Any] = field(default_factory=dict)
+    issue_resolved: bool = False
+    clarification_received: bool = False
+    escalated: bool = False
+    turns_used: int = 0
+    correct_faq_retrieved: bool = False

openenv.yaml ADDED Viewed

	@@ -0,0 +1,113 @@

+spec_version: 1
+name: helpdesk_env
+version: "0.1.0"
+description: >
+  An OpenEnv RL environment simulating UPI banking customer support workflows.
+  An AI agent classifies issues, retrieves the correct FAQ or escalation path,
+  and completes a safe multi-turn support flow across three graded tasks of
+  increasing difficulty.
+author: Freakdivi
+tags:
+  - openenv
+  - banking
+  - upi
+  - customer-support
+  - rl-environment
+type: space
+runtime: fastapi
+app: server.app:app
+port: 8000
+default_task: medium
+tasks:
+  - id: easy
+    difficulty: easy
+    description: Classify the customer's issue into the correct support category
+    dataset: data/tickets/easy.json
+    max_steps: 1
+    reward_range: [0.0, 1.0]
+    grader:
+      type: python
+      reward_source: server.helpdesk_environment:HelpdeskEnv.step
+      score_field: reward.value
+      functions:
+        - graders.category_grader:grade_classification
+        - graders.resolution_grader:grade_resolution
+        - graders.score_utils:ensure_open_unit_interval
+  - id: medium
+    difficulty: medium
+    description: Select the correct FAQ or escalate cases that require manual handling
+    dataset: data/tickets/medium.json
+    max_steps: 3
+    reward_range: [0.0, 1.0]
+    grader:
+      type: python
+      reward_source: server.helpdesk_environment:HelpdeskEnv.step
+      score_field: reward.value
+      functions:
+        - graders.faq_grader:grade_faq_retrieval
+        - graders.faq_grader:grade_escalation
+        - graders.faq_grader:grade_operation_choice
+        - graders.score_utils:ensure_open_unit_interval
+  - id: hard
+    difficulty: hard
+    description: Run a multi-turn support conversation with clarification, guidance, and safe closure
+    dataset: data/tickets/hard.json
+    max_steps: 8
+    reward_range: [0.0, 1.0]
+    grader:
+      type: python
+      reward_source: server.helpdesk_environment:HelpdeskEnv.step
+      score_field: reward.value
+      functions:
+        - graders.category_grader:grade_information_collection
+        - graders.faq_grader:grade_faq_retrieval
+        - graders.resolution_grader:grade_case_closure
+        - graders.resolution_grader:grade_resolution
+        - graders.score_utils:ensure_open_unit_interval
+observation_space:
+  type: object
+  fields:
+    case_id: string
+    track: string
+    customer_message: string
+    conversation_history: array
+    known_facts: object
+    required_slots: array
+    available_actions: array
+    turn_number: integer
+action_space:
+  type: object
+  fields:
+    action_type: "classify | lookup_faq | ask_clarification | reply | escalate | resolve_ticket"
+    category: string (optional)
+    faq_id: string (optional)
+    message: string (optional)
+    fields_requested: array (optional)
+    target: string (optional)
+    operation: string (optional)
+reward:
+  type: float
+  range: [0.0, 1.0]
+  description: >
+    Partial reward is produced at each step and normalized by the environment.
+    The final reward combines correctness, safety, resolution, efficiency, and
+    penalties, with score outputs constrained to the open interval (0, 1) for
+    submission compatibility.
+endpoints:
+  reset: POST /reset
+  step: POST /step
+  state: GET /state
+  health: GET /health
+runtime_config:
+  framework: fastapi
+  python: "3.10"
+  port: 8000

pyproject.toml ADDED Viewed

	@@ -0,0 +1,43 @@

+# Copyright (c) Meta Platforms, Inc. and affiliates.
+# All rights reserved.
+#
+# This source code is licensed under the BSD-style license found in the
+# LICENSE file in the root directory of this source tree.
+[build-system]
+requires = ["setuptools>=45", "wheel"]
+build-backend = "setuptools.build_meta"
+[project]
+name = "openenv-helpdesk_env"
+version = "0.1.0"
+description = "UPI banking customer support environment for OpenEnv"
+requires-python = ">=3.10"
+dependencies = [
+    # Core OpenEnv runtime (provides FastAPI server + HTTP client types)
+    # install from github
+    # "openenv-core[core] @ git+https://github.com/meta-pytorch/OpenEnv.git",
+    "openenv-core[core]>=0.2.2",
+    # Environment-specific dependencies
+    "fastapi>=0.115.0",
+    "openai>=1.0.0",
+    "pydantic>=2.0.0",
+    "requests>=2.31.0",
+    "uvicorn>=0.24.0",
+]
+[project.optional-dependencies]
+dev = [
+    "pytest>=8.0.0",
+    "pytest-cov>=4.0.0",
+]
+[project.scripts]
+# Server entry point - enables running via: uv run --project . server
+# or: python -m helpdesk_env.server.app
+server = "helpdesk_env.server.app:main"
+[tool.setuptools]
+include-package-data = true
+packages = ["helpdesk_env", "helpdesk_env.server", "helpdesk_env.graders"]
+package-dir = { "helpdesk_env" = ".", "helpdesk_env.server" = "server", "helpdesk_env.graders" = "graders" }

pyrightconfig.json ADDED Viewed

	@@ -0,0 +1,7 @@

+{
+  "venvPath": "../..",
+  "venv": ".venv",
+  "include": [
+    "."
+  ]
+}

requirements.txt ADDED Viewed

	@@ -0,0 +1,5 @@

+pydantic
+openai
+fastapi
+uvicorn
+requests

server/__init__.py ADDED Viewed

	@@ -0,0 +1 @@


1	+

server/app.py ADDED Viewed

	@@ -0,0 +1,88 @@

+"""FastAPI server exposing HelpdeskEnv over HTTP."""
+from typing import Any, Dict, Optional
+from fastapi import FastAPI
+from pydantic import BaseModel
+import uvicorn
+from .helpdesk_environment import HelpdeskEnv
+from ..models import Action, Reward, normalize_action
+app = FastAPI(title="Helpdesk OpenEnv")
+_env: Optional[HelpdeskEnv] = None
+def get_env() -> HelpdeskEnv:
+    global _env
+    if _env is None:
+        _env = HelpdeskEnv()
+    return _env
+class ResetBody(BaseModel):
+    task_id: str = "easy"
+def _zero_reward() -> Dict[str, Any]:
+    return Reward(
+        value=0.0,
+        correctness=0.0,
+        safety=1.0,
+        resolution=0.0,
+        efficiency=0.0,
+        penalties=0.0,
+        done=False,
+        info={},
+    ).model_dump()
+@app.get("/health")
+def health() -> Dict[str, str]:
+    return {"status": "healthy"}
+@app.get("/")
+def root() -> Dict[str, Any]:
+    return {
+        "name": "UPI Banking Support Environment",
+        "status": "running",
+        "endpoints": ["/health", "/reset", "/step", "/state"],
+    }
+@app.post("/reset")
+def reset(body: ResetBody = ResetBody()) -> Dict[str, Any]:
+    obs = get_env().reset(body.task_id)
+    return {
+        "observation": obs.model_dump(),
+        "reward": _zero_reward(),
+        "done": False,
+        "info": {},
+    }
+@app.post("/step")
+def step(body: Dict[str, Any]) -> Dict[str, Any]:
+    action = normalize_action(body["action"])
+    obs, reward, done, info = get_env().step(action)
+    return {
+        "observation": obs.model_dump(),
+        "reward": reward.model_dump(),
+        "done": done,
+        "info": info,
+    }
+@app.get("/state")
+def state() -> Dict[str, Any]:
+    obs = get_env().state()
+    return {"observation": obs.model_dump()}
+def main() -> None:
+    uvicorn.run("helpdesk_env.server.app:app", host="0.0.0.0", port=8000)
+if __name__ == "__main__":
+    main()

server/helpdesk_environment.py ADDED Viewed

	@@ -0,0 +1,360 @@

+import json
+import random
+from pathlib import Path
+from typing import Any, Dict, List, Optional, Tuple
+from ..graders.category_grader import grade_classification, grade_information_collection
+from ..graders.faq_grader import (
+    grade_escalation,
+    grade_faq_retrieval,
+    grade_operation_choice,
+)
+from ..graders.resolution_grader import grade_case_closure, grade_resolution
+from ..graders.score_utils import ensure_open_unit_interval
+from ..models import Action, Observation, Reward, TicketState
+from ..user_simulator import UserSimulator
+def _data_dir() -> Path:
+    return Path(__file__).resolve().parent.parent / "data"
+class HelpdeskEnv:
+    def __init__(self):
+        data_dir = _data_dir()
+        tickets_dir = data_dir / "tickets"
+        with open(data_dir / "knowledge_base.json", "r", encoding="utf-8") as f:
+            self.kb: List[Dict[str, str]] = json.load(f)
+        with open(tickets_dir / "easy.json", "r", encoding="utf-8") as f:
+            self.easy_tickets: List[Dict[str, Any]] = json.load(f)
+        with open(tickets_dir / "medium.json", "r", encoding="utf-8") as f:
+            self.medium_tickets: List[Dict[str, Any]] = json.load(f)
+        with open(tickets_dir / "hard.json", "r", encoding="utf-8") as f:
+            self.hard_tickets: List[Dict[str, Any]] = json.load(f)
+        self.current_ticket: Optional[Dict[str, Any]] = None
+        self.ticket_state: Optional[TicketState] = None
+        self.user_sim: Optional[UserSimulator] = None
+        self.task_id: str = "easy"
+        self.turn_number: int = 0
+        self.conversation_history: List[Dict[str, str]] = []
+        self.action_history: List[str] = []
+    def reset(self, task_id: str = "easy") -> Observation:
+        pool_map = {
+            "easy": self.easy_tickets,
+            "medium": self.medium_tickets,
+            "hard": self.hard_tickets,
+        }
+        if task_id not in pool_map:
+            raise ValueError("task_id must be one of: easy, medium, hard")
+        self.task_id = task_id
+        self.current_ticket = random.choice(pool_map[task_id])
+        self.ticket_state = TicketState(
+            ticket_id=self.current_ticket["id"],
+            track=self._infer_track(self.current_ticket),
+            required_slots=self._required_slots(self.current_ticket, task_id),
+        )
+        self.user_sim = UserSimulator(self.current_ticket) if task_id == "hard" else None
+        self.turn_number = 0
+        self.conversation_history = []
+        self.action_history = []
+        return self.state()
+    def step(self, action: Action) -> Tuple[Observation, Reward, bool, Dict[str, Any]]:
+        if self.current_ticket is None or self.ticket_state is None:
+            raise RuntimeError("Environment not initialized. Call reset() first.")
+        current_ticket = self.current_ticket
+        ticket_state = self.ticket_state
+        canonical_action = action
+        self.turn_number += 1
+        ticket_state.turns_used += 1
+        self.action_history.append(canonical_action.action_type)
+        self._track_collected_slots(canonical_action)
+        action_content = (
+            canonical_action.message
+            or canonical_action.operation
+            or canonical_action.target
+            or canonical_action.action_type
+        )
+        self.conversation_history.append({"role": "agent", "content": action_content})
+        done = False
+        metrics: Dict[str, float] = {
+            "correctness": 0.0,
+            "safety": 1.0,
+            "resolution": 0.0,
+            "efficiency": 0.0,
+            "penalties": 0.0,
+        }
+        info: Dict[str, Any] = {
+            "action_type": canonical_action.action_type,
+            "operation": canonical_action.operation,
+            "target": canonical_action.target,
+        }
+        if canonical_action.action_type == "ask_for_details":
+            metrics["correctness"] = self._grade_detail_request(canonical_action)
+            if self.task_id == "hard" and self.user_sim is not None:
+                user_response = self.user_sim.respond(canonical_action.message or "")
+                self.conversation_history.append({"role": "user", "content": user_response})
+                ticket_state.clarification_received = self.user_sim.clarification_given
+                info["user_response"] = user_response
+        elif canonical_action.action_type == "take_action":
+            correctness, resolved = self._grade_take_action(canonical_action)
+            metrics["correctness"] = correctness
+            ticket_state.issue_resolved = resolved
+            if resolved:
+                metrics["resolution"] = grade_resolution(ticket_state)
+                done = True
+        elif canonical_action.action_type == "respond_to_user":
+            metrics["correctness"] = self._grade_response(canonical_action)
+            if self.task_id == "hard" and self.user_sim is not None:
+                user_response = self.user_sim.respond(canonical_action.message or "")
+                self.conversation_history.append({"role": "user", "content": user_response})
+                ticket_state.issue_resolved = self.user_sim.confirm_resolved()
+                info["user_response"] = user_response
+        elif canonical_action.action_type == "escalate_case":
+            metrics["correctness"] = grade_escalation(
+                True,
+                bool(current_ticket.get("should_escalate", False)),
+            )
+            ticket_state.escalated = True
+            metrics["resolution"] = metrics["correctness"]
+            info["escalation_accuracy"] = metrics["correctness"]
+            done = True
+        elif canonical_action.action_type == "close_case":
+            if self.task_id == "hard" and self.user_sim is not None:
+                ticket_state.issue_resolved = self.user_sim.confirm_resolved()
+            metrics["resolution"] = grade_case_closure(ticket_state)
+            if metrics["resolution"] <= 0.001 and not ticket_state.escalated:
+                metrics["penalties"] -= 0.20
+            done = True
+        metrics["safety"] = self._grade_safety(canonical_action, metrics)
+        metrics["efficiency"] = self._grade_efficiency(done)
+        reward = self._calculate_reward(metrics, done=done)
+        info.update(
+            {
+                "ticket_id": ticket_state.ticket_id,
+                "task_id": self.task_id,
+                "track": ticket_state.track,
+                "turn_number": self.turn_number,
+            }
+        )
+        return self.state(), reward, done, info
+    def _infer_track(self, ticket: Dict[str, Any]) -> str:
+        category = (
+            ticket.get("issue_category")
+            or ticket.get("gold_category")
+            or ticket.get("difficulty")
+            or self.task_id
+        )
+        return str(category).strip().lower().replace(" ", "_")
+    def _required_slots(self, ticket: Dict[str, Any], task_id: str) -> List[str]:
+        if task_id == "easy":
+            return ["issue_category"]
+        if task_id == "medium":
+            return ["faq_or_escalation_decision"]
+        return ["issue_details", "resolution_confirmation"]
+    def _track_collected_slots(self, action: Action) -> None:
+        if self.ticket_state is None:
+            return
+        for field_name in action.fields_requested:
+            self.ticket_state.collected_slots[field_name] = "requested"
+        if action.operation:
+            self.ticket_state.collected_slots["last_operation"] = action.operation
+        if action.target:
+            self.ticket_state.collected_slots["escalation_target"] = action.target
+    def _grade_detail_request(self, action: Action) -> float:
+        if self.ticket_state is None:
+            return ensure_open_unit_interval(0.0)
+        if not action.fields_requested and not action.message:
+            return ensure_open_unit_interval(0.0)
+        if not self.ticket_state.required_slots:
+            return ensure_open_unit_interval(0.5)
+        info_score = grade_information_collection(
+            action.fields_requested,
+            self.ticket_state.required_slots,
+        )
+        if self.task_id != "hard" and info_score <= 0.001:
+            return ensure_open_unit_interval(0.5)
+        return ensure_open_unit_interval(info_score)
+    def _grade_take_action(self, action: Action) -> Tuple[float, bool]:
+        if self.current_ticket is None:
+            return ensure_open_unit_interval(0.0), False
+        operation = (action.operation or "").strip().lower()
+        current_ticket = self.current_ticket
+        if operation in {"classify_issue", "classify"}:
+            gold_category = current_ticket.get("gold_category", "")
+            score = grade_classification(action.category or "", gold_category)
+            resolved = (action.category or "").strip().lower() == str(gold_category).strip().lower()
+            return score, resolved
+        if operation == "lookup_faq":
+            gold_faq_id = current_ticket.get("gold_faq_id", "")
+            score = grade_faq_retrieval(action.faq_id or "", gold_faq_id)
+            if self.ticket_state is not None and (action.faq_id or "").strip() == str(gold_faq_id).strip():
+                self.ticket_state.correct_faq_retrieved = True
+            return score, False
+        if operation == "resolve_with_guidance":
+            resolved = bool(
+                self.ticket_state
+                and self.ticket_state.correct_faq_retrieved
+                and (self.task_id != "hard" or self.ticket_state.clarification_received)
+            )
+            return ensure_open_unit_interval(1.0 if resolved else 0.0), resolved
+        if operation == "check_status":
+            return ensure_open_unit_interval(0.5), False
+        banking_operations = {
+            "check_payment",
+            "check_refund",
+            "check_kyc",
+            "secure_account",
+            "troubleshoot_upi",
+        }
+        op_score = grade_operation_choice(operation, banking_operations)
+        return op_score, False
+    def _grade_response(self, action: Action) -> float:
+        if not action.message:
+            return ensure_open_unit_interval(0.0)
+        if self.task_id == "hard" and self.ticket_state and self.ticket_state.correct_faq_retrieved:
+            return ensure_open_unit_interval(1.0)
+        return ensure_open_unit_interval(0.5)
+    def _grade_safety(self, action: Action, metrics: Dict[str, float]) -> float:
+        text = (action.message or "").lower()
+        sensitive_markers = ["otp", "pin", "cvv", "password"]
+        if any(marker in text for marker in sensitive_markers):
+            metrics["penalties"] -= 0.50
+            return ensure_open_unit_interval(0.0)
+        if action.action_type == "close_case" and metrics["resolution"] <= 0.001:
+            return ensure_open_unit_interval(0.25)
+        if action.action_type == "escalate_case":
+            expected = bool(self.current_ticket and self.current_ticket.get("should_escalate", False))
+            return ensure_open_unit_interval(1.0 if expected else 0.6)
+        return ensure_open_unit_interval(1.0)
+    def _grade_efficiency(self, done: bool) -> float:
+        max_turns = 1 if self.task_id == "easy" else 2 if self.task_id == "medium" else 6
+        if not done:
+            remaining_ratio = max(0.0, 1.0 - (self.turn_number / max_turns))
+            return ensure_open_unit_interval(round(0.5 * remaining_ratio, 3))
+        return ensure_open_unit_interval(1.0 - (0.1 * max(0, self.turn_number - 1)))
+    def _calculate_reward(self, metrics: Dict[str, float], done: bool) -> Reward:
+        correctness = ensure_open_unit_interval(metrics.get("correctness", 0.0))
+        safety = ensure_open_unit_interval(metrics.get("safety", 0.0))
+        resolution = ensure_open_unit_interval(metrics.get("resolution", 0.0))
+        efficiency = ensure_open_unit_interval(metrics.get("efficiency", 0.0))
+        penalties = metrics.get("penalties", 0.0)
+        weighted = (
+            (0.35 * correctness)
+            + (0.30 * safety)
+            + (0.20 * resolution)
+            + (0.15 * efficiency)
+        )
+        recent_actions = self.action_history[-3:]
+        if len(recent_actions) >= 2 and len(set(recent_actions)) < len(recent_actions):
+            penalties -= 0.05
+        case_adjustment = self._case_complexity_adjustment()
+        final_value = ensure_open_unit_interval(weighted + penalties + case_adjustment)
+        return Reward(
+            value=final_value,
+            correctness=correctness,
+            safety=safety,
+            resolution=resolution,
+            efficiency=efficiency,
+            penalties=penalties,
+            done=done,
+            info={
+                "turn_number": self.turn_number,
+                "task_id": self.task_id,
+                "case_adjustment": case_adjustment,
+                "escalation_accuracy": metrics.get("escalation_accuracy", correctness),
+            },
+        )
+    def _case_complexity_adjustment(self) -> float:
+        if self.current_ticket is None:
+            return 0.0
+        ticket_id = str(self.current_ticket.get("id", ""))
+        bucket = sum(ord(char) for char in ticket_id) % 4
+        return -0.015 * bucket
+    def _build_known_facts(self) -> Dict[str, Any]:
+        if self.current_ticket is None or self.ticket_state is None:
+            return {}
+        return {
+            "difficulty": self.current_ticket.get("difficulty", self.task_id),
+            "knowledge_base": self.kb,
+            "available_categories": [
+                "payment_failure",
+                "refund_delay",
+                "fraud_complaint",
+                "kyc_account_restriction",
+                "upi_pin_or_bank_linking",
+            ],
+            "clarification_received": self.ticket_state.clarification_received,
+            "faq_retrieved": self.ticket_state.correct_faq_retrieved,
+            "issue_resolved": self.ticket_state.issue_resolved,
+            "collected_slots": self.ticket_state.collected_slots,
+        }
+    def state(self) -> Observation:
+        if self.current_ticket is None or self.ticket_state is None:
+            raise RuntimeError("Environment not initialized. Call reset() first.")
+        customer_message = self.current_ticket.get("text") or self.current_ticket.get(
+            "initial_text", ""
+        )
+        return Observation(
+            case_id=self.current_ticket["id"],
+            track=self.task_id,
+            customer_message=customer_message,
+            conversation_history=self.conversation_history,
+            known_facts=self._build_known_facts(),
+            required_slots=self.ticket_state.required_slots,
+            available_actions=[
+                "ask_for_details",
+                "take_action",
+                "respond_to_user",
+                "escalate_case",
+                "close_case",
+            ],
+            turn_number=self.turn_number,
+        )
+__all__ = ["HelpdeskEnv"]

user_simulator.py ADDED Viewed

	@@ -0,0 +1,46 @@

+import random
+from typing import Dict, List
+class UserSimulator:
+    def __init__(self, ticket: Dict):
+        self.ticket_id = ticket.get("id", "")
+        self.initial_text = ticket.get("initial_text", "")
+        self.clarified_text = ticket.get("clarified_text", "")
+        self.trigger_phrases: List[str] = ticket.get("trigger_phrases", [])
+        self.gold_faq_id = ticket.get("gold_faq_id", "")
+        self.state = "initial"
+        self.issue_resolved = False
+        self.clarification_given = False
+    def respond(self, agent_message: str) -> str:
+        agent_message_lower = agent_message.lower()
+        if self.state == "initial":
+            if any(phrase.lower() in agent_message_lower for phrase in self.trigger_phrases):
+                self.state = "clarified"
+                self.clarification_given = True
+                return self.clarified_text
+            return random.choice(
+                [
+                    "I'm not sure what you mean",
+                    "Can you help me?",
+                    "It just stopped working",
+                ]
+            )
+        if self.state == "clarified":
+            guidance_keywords = ["try", "follow", "steps", "should", "please"]
+            if any(keyword in agent_message_lower for keyword in guidance_keywords):
+                self.state = "waiting_resolve"
+            return "Ok I will try that, thanks"
+        if self.state == "waiting_resolve":
+            self.issue_resolved = True
+            return "Yes that fixed it!"
+        return "Can you help me?"
+    def confirm_resolved(self) -> bool:
+        return self.issue_resolved

uv.lock ADDED Viewed

The diff for this file is too large to render. See raw diff