Update README.md
README.md
CHANGED
@@ -61,6 +61,92 @@ The typed `ToolUseObservation` includes:

The typed `ToolUseState` exposes internal progress such as `final_score`, `drafted_reply`, `resolution_code`, `required_evidence`, `collected_evidence`, and action history.

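To make the shape of that state concrete, here is a minimal Python sketch. It is not the environment's actual class: the name `ToolUseStateSketch`, the field types, the `action_history` attribute name, and the `missing_evidence` helper are all assumptions made for illustration, as is the idea that evidence items are named after artifacts.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolUseStateSketch:
    """Hypothetical stand-in for ToolUseState; every type here is assumed."""

    final_score: float = 0.0
    drafted_reply: Optional[str] = None
    resolution_code: Optional[str] = None
    required_evidence: List[str] = field(default_factory=list)
    collected_evidence: List[str] = field(default_factory=list)
    action_history: List[Dict[str, Any]] = field(default_factory=list)

    def missing_evidence(self) -> List[str]:
        """Required evidence not yet collected, kept in required order."""
        collected = set(self.collected_evidence)
        return [item for item in self.required_evidence if item not in collected]


# Example values are illustrative only.
state = ToolUseStateSketch(
    required_evidence=["order", "payment"],
    collected_evidence=["order"],
)
print(state.missing_evidence())  # ['payment']
```

A helper like `missing_evidence` is one way an agent or a debugging script might decide what to inspect next.
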
## How to use

Each episode is a support case. The agent should usually follow this flow:

1. Read the customer ticket.
2. Inspect the relevant business artifacts.
3. Look up the matching policy.
4. Draft a customer-facing reply.
5. Submit the final resolution code.

### What each action field means

- `action_type`
  The operation you want the environment to perform.
- `artifact_id`
  The internal record you want to inspect. Examples: `order`, `payment`, `account`, `risk_log`.
- `query`
  The policy lookup term. Examples: `damaged_items`, `duplicate_charge`, `account_takeover`.
- `message`
  The reply draft that would be sent to the customer.
- `resolution_code`
  The final case outcome you want to submit. Examples: `send_replacement`, `refund_duplicate_charge`, `lock_account_and_escalate_fraud`.

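The field descriptions above suggest that each action type needs at most one extra field. The following is a rough client-side check written under that assumption. The `REQUIRED_FIELDS` mapping is inferred from the action examples in this README rather than taken from the environment's code, and `validate_action` is a hypothetical helper, not part of the environment's API.

```python
from typing import Any, Dict, List

# Assumed pairing of each action_type with the fields it needs,
# inferred from the examples in this README.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "review_ticket": [],
    "inspect_artifact": ["artifact_id"],
    "search_policy": ["query"],
    "draft_reply": ["message"],
    "submit_resolution": ["resolution_code"],
}


def validate_action(action: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the action looks well-formed."""
    action_type = action.get("action_type")
    if action_type not in REQUIRED_FIELDS:
        return [f"unknown action_type: {action_type!r}"]
    problems = []
    for name in REQUIRED_FIELDS[action_type]:
        value = action.get(name)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{action_type} needs a non-empty {name}")
    return problems


print(validate_action({"action_type": "inspect_artifact", "artifact_id": "order"}))
print(validate_action({"action_type": "search_policy"}))
```

The environment reports its own validation errors in the observation, so a check like this is only a convenience for catching malformed actions before they are sent.
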
### Typical action examples

Review the ticket:

```json
{
  "action_type": "review_ticket"
}
```

Inspect an order record:

```json
{
  "action_type": "inspect_artifact",
  "artifact_id": "order"
}
```

Look up a policy:

```json
{
  "action_type": "search_policy",
  "query": "duplicate_charge"
}
```

Save a reply draft:

```json
{
  "action_type": "draft_reply",
  "message": "We confirmed the duplicate charge and issued a refund. You should see it in 3-5 business days."
}
```

Submit the final resolution:

```json
{
  "action_type": "submit_resolution",
  "resolution_code": "refund_duplicate_charge"
}
```

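Chained together, the individual actions above form one episode. The sketch below assembles a plausible trajectory for a duplicate-charge case and serializes each step to JSON. Choosing `payment` as the artifact to inspect for this case is an assumption, and how the payloads are actually sent (HTTP, WebSocket, or a client library) is left out because this section does not specify it.

```python
import json

# The usual five-step flow, using field values that appear in this README.
# Inspecting "payment" for a duplicate charge is an assumed choice.
episode = [
    {"action_type": "review_ticket"},
    {"action_type": "inspect_artifact", "artifact_id": "payment"},
    {"action_type": "search_policy", "query": "duplicate_charge"},
    {
        "action_type": "draft_reply",
        "message": (
            "We confirmed the duplicate charge and issued a refund. "
            "You should see it in 3-5 business days."
        ),
    },
    {"action_type": "submit_resolution", "resolution_code": "refund_duplicate_charge"},
]

# One JSON payload per step, in the order the steps would be sent.
payloads = [json.dumps(action) for action in episode]
for step_number, payload in enumerate(payloads, start=1):
    print(step_number, payload)
```
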
### How the playground works

If `/web` is enabled, the playground lets you send one action at a time.

- Start with `Reset`.
- Enter the action fields for the next step.
- Use `Get state` to inspect internal progress.
- Keep stepping until you submit a final resolution or run out of steps.

The observation will show:

- which evidence you have already collected
- the last tool result
- any action validation error
- your current partial score
- how many steps remain

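The same reset-then-step loop can be written in code. The sketch below is a guess at the control flow only: `run_episode`, the `reset` and `step` callables, and the `steps_remaining` observation key are hypothetical names, and the fake environment exists only so the example runs without a server.

```python
from typing import Any, Callable, Dict, Iterable, List

Action = Dict[str, Any]
Observation = Dict[str, Any]


def run_episode(
    reset: Callable[[], Observation],
    step: Callable[[Action], Observation],
    actions: Iterable[Action],
) -> List[Observation]:
    """Send actions one at a time until a resolution is submitted or steps run out."""
    observations = [reset()]
    for action in actions:
        if observations[-1].get("steps_remaining", 1) <= 0:
            break  # out of steps
        observations.append(step(action))
        if action.get("action_type") == "submit_resolution":
            break  # final resolution submitted
    return observations


# Toy stand-in for the environment; not the real implementation.
budget = {"steps_remaining": 3}


def fake_reset() -> Observation:
    budget["steps_remaining"] = 3
    return dict(budget)


def fake_step(action: Action) -> Observation:
    budget["steps_remaining"] -= 1
    return {
        "steps_remaining": budget["steps_remaining"],
        "last_action": action["action_type"],
    }


trace = run_episode(
    fake_reset,
    fake_step,
    [
        {"action_type": "review_ticket"},
        {"action_type": "submit_resolution", "resolution_code": "send_replacement"},
        {"action_type": "review_ticket"},  # never sent: the episode already ended
    ],
)
print(len(trace))  # 3: the reset observation plus two steps
```

Passing `reset` and `step` as callables keeps the loop independent of the transport, so the same function could wrap whatever client the environment actually provides.
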
## Reward design

The reward is shaped over the full trajectory: