Spaces: Clove25/ToolUseEnv

Clove25 committed about 1 month ago

Commit 7d581e2 (verified) · 1 parent: a60dfa7

Update README.md

Files changed (1)

README.md +86 -0

@@ -61,6 +61,92 @@ The typed `ToolUseObservation` includes:
 The typed `ToolUseState` exposes internal progress such as `final_score`, `drafted_reply`, `resolution_code`, `required_evidence`, `collected_evidence`, and action history.
+## How to use
+Each episode is a support case. The agent should usually follow this flow:
+1. Read the customer ticket.
+2. Inspect the relevant business artifacts.
+3. Look up the matching policy.
+4. Draft a customer-facing reply.
+5. Submit the final resolution code.
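The five steps above can be sketched as an ordered list of action payloads. This is a minimal illustration, not the environment's own code: the function name `build_duplicate_charge_plan` is hypothetical, and the choice of the `payment` artifact and `duplicate_charge` query for this case is an assumption based on the example values listed in this README.

```python
import json
from typing import Dict, List


def build_duplicate_charge_plan() -> List[Dict[str, str]]:
    """Return one possible action sequence for a duplicate-charge ticket.

    Hypothetical helper: the payload shapes follow the JSON action examples
    in this README, but the specific artifact and query are illustrative.
    """
    return [
        {"action_type": "review_ticket"},                               # 1. read the ticket
        {"action_type": "inspect_artifact", "artifact_id": "payment"},  # 2. inspect an artifact
        {"action_type": "search_policy", "query": "duplicate_charge"},  # 3. look up the policy
        {
            "action_type": "draft_reply",                               # 4. draft the reply
            "message": "We confirmed the duplicate charge and issued a refund.",
        },
        {
            "action_type": "submit_resolution",                         # 5. submit the outcome
            "resolution_code": "refund_duplicate_charge",
        },
    ]


def plan_as_json_lines(plan: List[Dict[str, str]]) -> List[str]:
    """Serialize each action to a JSON string, one per step."""
    return [json.dumps(action) for action in plan]
```

A real case may need more than one `inspect_artifact` step before the required evidence is complete, so treat the list as a starting template rather than a fixed recipe.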
+### What each action field means
+- `action_type`
+  The operation you want the environment to perform.
+- `artifact_id`
+  The internal record you want to inspect. Examples: `order`, `payment`, `account`, `risk_log`.
+- `query`
+  The policy lookup term. Examples: `damaged_items`, `duplicate_charge`, `account_takeover`.
+- `message`
+  The reply draft that would be sent to the customer.
+- `resolution_code`
+  The final case outcome you want to submit. Examples: `send_replacement`, `refund_duplicate_charge`, `lock_account_and_escalate_fraud`.
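To make the pairing between `action_type` and its companion field concrete, here is a small client-side check. It is a sketch under assumptions: the mapping is inferred from the action examples in this README, `validate_action` is a hypothetical helper, and the environment's own validation may be stricter or worded differently.

```python
from typing import Dict, List, Optional

# Companion field for each action type, inferred from the README examples.
# Assumption: review_ticket takes no extra field.
REQUIRED_FIELD: Dict[str, Optional[str]] = {
    "review_ticket": None,
    "inspect_artifact": "artifact_id",
    "search_policy": "query",
    "draft_reply": "message",
    "submit_resolution": "resolution_code",
}


def validate_action(action: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the payload looks well-formed."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELD:
        return [f"unknown action_type: {action_type!r}"]
    field = REQUIRED_FIELD[action_type]
    # A missing or empty companion field is reported as a problem.
    if field is not None and not action.get(field):
        return [f"{action_type} requires a non-empty {field!r}"]
    return []
```

Running a check like this before sending an action can catch a missing field early instead of spending a step on a validation error.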
+### Typical action examples
+Review the ticket:
+```json
+{
+  "action_type": "review_ticket"
+}
+```
+Inspect an order record:
+```json
+{
+  "action_type": "inspect_artifact",
+  "artifact_id": "order"
+}
+```
+Look up a policy:
+```json
+{
+  "action_type": "search_policy",
+  "query": "duplicate_charge"
+}
+```
+Save a reply draft:
+```json
+{
+  "action_type": "draft_reply",
+  "message": "We confirmed the duplicate charge and issued a refund. You should see it in 3-5 business days."
+}
+```
+Submit the final resolution:
+```json
+{
+  "action_type": "submit_resolution",
+  "resolution_code": "refund_duplicate_charge"
+}
+```
+### How the playground works
+If `/web` is enabled, the playground lets you send one action at a time.
+- Start with `Reset`.
+- Enter the action fields for the next step.
+- Use `Get state` to inspect internal progress.
+- Keep stepping until you submit a final resolution or run out of steps.
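The reset, step, and stop-condition cycle described above can be sketched with a tiny in-memory stand-in. Everything here is hypothetical: `StubSupportEnv`, its method names, the default step limit, and the dictionary keys are illustrative assumptions, not the real ToolUseEnv client or its observation schema.

```python
from typing import Dict, List


class StubSupportEnv:
    """Hypothetical stand-in that only tracks steps; NOT the real environment."""

    def __init__(self, max_steps: int = 8) -> None:
        self.max_steps = max_steps  # assumed step budget, for illustration only
        self.history: List[Dict[str, str]] = []
        self.done = False

    def reset(self) -> Dict[str, object]:
        # Start a fresh episode with a full step budget.
        self.history = []
        self.done = False
        return {"steps_remaining": self.max_steps, "done": False}

    def step(self, action: Dict[str, str]) -> Dict[str, object]:
        if self.done:
            raise RuntimeError("episode finished; call reset() first")
        self.history.append(action)
        out_of_steps = len(self.history) >= self.max_steps
        # The episode ends on a final resolution or when the budget is spent.
        self.done = action.get("action_type") == "submit_resolution" or out_of_steps
        return {"steps_remaining": self.max_steps - len(self.history), "done": self.done}

    def get_state(self) -> Dict[str, object]:
        # Mirrors the idea of inspecting internal progress via "Get state".
        return {"action_history": list(self.history)}


def run_episode(env: StubSupportEnv, actions: List[Dict[str, str]]) -> Dict[str, object]:
    """Reset, then send one action at a time until the episode is done."""
    obs = env.reset()
    for action in actions:
        if obs["done"]:
            break
        obs = env.step(action)
    return obs
```

The same loop shape applies in the playground: each click sends one action, and the episode is over once a resolution is submitted or no steps remain.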
+The observation will show:
+- which evidence you have already collected
+- the last tool result
+- any action validation error
+- your current partial score
+- how many steps remain
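As a reading aid, the observation contents listed above can be modeled as a small record with a one-line summary. The class `ObservationSketch` and the field names `last_tool_result`, `validation_error`, `partial_score`, and `steps_remaining` are hypothetical; only `collected_evidence` is a name this README uses (for `ToolUseState`), and the actual `ToolUseObservation` fields may be named differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ObservationSketch:
    """Hypothetical mirror of the five items an observation is said to show."""

    collected_evidence: List[str] = field(default_factory=list)  # evidence gathered so far
    last_tool_result: str = ""                                   # output of the last action
    validation_error: Optional[str] = None                       # None when the action was valid
    partial_score: float = 0.0                                   # current partial score
    steps_remaining: int = 0                                     # steps left in the episode


def summarize(obs: ObservationSketch) -> str:
    """Format an observation as a single status line."""
    evidence = ", ".join(obs.collected_evidence) or "none"
    status = f"error: {obs.validation_error}" if obs.validation_error else "ok"
    return (
        f"evidence=[{evidence}] score={obs.partial_score:.2f} "
        f"steps_left={obs.steps_remaining} last_action={status}"
    )
```

For example, `summarize(ObservationSketch(collected_evidence=["order", "payment"], partial_score=0.4, steps_remaining=5))` produces `evidence=[order, payment] score=0.40 steps_left=5 last_action=ok`.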
 ## Reward design
 The reward is shaped over the full trajectory: