Clove25 committed 7d581e2 (verified) · 1 parent: a60dfa7

Update README.md

Files changed (1): `README.md` (+86 -0)
 
  The typed `ToolUseState` exposes internal progress such as `final_score`, `drafted_reply`, `resolution_code`, `required_evidence`, `collected_evidence`, and action history.
 
## How to use

Each episode is one customer-support case. The agent should usually work through it in this order:

1. Read the customer ticket (`review_ticket`).
2. Inspect the relevant business artifacts (`inspect_artifact`).
3. Look up the matching policy (`search_policy`).
4. Draft a customer-facing reply (`draft_reply`).
5. Submit the final resolution code (`submit_resolution`).
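The five steps above can be written down as an ordered list of action payloads. The sketch below is a hypothetical client-side helper, not part of the environment; it uses only the field names and values shown in this README, and which artifacts a given case needs (for example `order` or `payment`) is an assumption you would adjust per ticket.

```python
from typing import Any, Dict, List


# Hypothetical helper (not part of the environment): builds the usual
# five-step flow as plain action dicts, using the documented field names.
def build_episode_plan(
    artifact_ids: List[str],
    policy_query: str,
    reply: str,
    resolution_code: str,
) -> List[Dict[str, Any]]:
    plan: List[Dict[str, Any]] = [{"action_type": "review_ticket"}]
    # One inspect_artifact action per business record the case needs.
    for artifact_id in artifact_ids:
        plan.append({"action_type": "inspect_artifact", "artifact_id": artifact_id})
    plan.append({"action_type": "search_policy", "query": policy_query})
    plan.append({"action_type": "draft_reply", "message": reply})
    # Submitting the resolution is the final step of the flow.
    plan.append({"action_type": "submit_resolution", "resolution_code": resolution_code})
    return plan
```

For a duplicate-charge case, `build_episode_plan(["payment"], "duplicate_charge", reply, "refund_duplicate_charge")` yields five actions that end with the `submit_resolution` payload used in the action examples.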
### What each action field means

- `action_type`: the operation you want the environment to perform, such as `review_ticket`, `inspect_artifact`, `search_policy`, `draft_reply`, or `submit_resolution`.
- `artifact_id`: the internal record to inspect with `inspect_artifact`. Examples: `order`, `payment`, `account`, `risk_log`.
- `query`: the policy lookup term for `search_policy`. Examples: `damaged_items`, `duplicate_charge`, `account_takeover`.
- `message`: the reply draft, saved with `draft_reply`, that would be sent to the customer.
- `resolution_code`: the final case outcome to submit with `submit_resolution`. Examples: `send_replacement`, `refund_duplicate_charge`, `lock_account_and_escalate_fraud`.
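Since each action type in the examples uses at most one of the optional fields, a client can catch a missing field before the environment reports a validation error. The mapping below is inferred from the action examples in this README and is an assumption; `REQUIRED_FIELD` and `check_action` are hypothetical names, and the environment's own rules may be stricter or accept more action types.

```python
from typing import Any, Dict, List, Optional

# Assumed mapping, inferred from this README's action examples; the
# environment's real validation rules may differ.
REQUIRED_FIELD: Dict[str, Optional[str]] = {
    "review_ticket": None,
    "inspect_artifact": "artifact_id",
    "search_policy": "query",
    "draft_reply": "message",
    "submit_resolution": "resolution_code",
}


def check_action(action: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in an action dict (empty if none)."""
    action_type = action.get("action_type")
    # Anything outside the five documented types is treated as unknown here.
    if action_type not in REQUIRED_FIELD:
        return [f"unknown action_type: {action_type!r}"]
    field = REQUIRED_FIELD[action_type]
    # A missing key and an empty string both count as absent.
    if field is not None and not action.get(field):
        return [f"{action_type} requires a non-empty {field!r}"]
    return []
```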
### Typical action examples

Review the ticket:

```json
{
  "action_type": "review_ticket"
}
```

Inspect an order record:

```json
{
  "action_type": "inspect_artifact",
  "artifact_id": "order"
}
```

Look up a policy:

```json
{
  "action_type": "search_policy",
  "query": "duplicate_charge"
}
```

Save a reply draft:

```json
{
  "action_type": "draft_reply",
  "message": "We confirmed the duplicate charge and issued a refund. You should see it in 3-5 business days."
}
```

Submit the final resolution:

```json
{
  "action_type": "submit_resolution",
  "resolution_code": "refund_duplicate_charge"
}
```
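The JSON payloads above differ only in which optional field is set. A minimal sketch of building them from code, assuming a hypothetical `SupportAction` dataclass (not the environment's own action class), drops unset fields before serializing:

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SupportAction:
    """Hypothetical client-side action type; not the environment's own class."""

    action_type: str
    artifact_id: Optional[str] = None
    query: Optional[str] = None
    message: Optional[str] = None
    resolution_code: Optional[str] = None

    def to_json(self) -> str:
        # Keep only the fields that are set, so the payload has the same
        # shape as the JSON examples (action_type plus at most one field).
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload, indent=2)
```

For instance, `SupportAction("search_policy", query="duplicate_charge").to_json()` produces the same payload as the policy-lookup example.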
### How the playground works

If the `/web` interface is enabled, the playground lets you send one action at a time:

- Start with `Reset` to begin a new episode.
- Fill in the action fields for the next step.
- Use `Get state` to inspect internal progress (the `ToolUseState` fields listed above).
- Keep stepping until you submit a final resolution or run out of steps.

Each observation shows:

- which evidence you have already collected
- the last tool result
- any action validation error
- your current partial score
- how many steps remain
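The playground loop (reset, then one action per step until a resolution is submitted or the step budget runs out) can be sketched in Python. `Observation`, its field names, `StubEnv`, and `run_episode` are all hypothetical: the observation fields mirror the list above but are not the actual `ToolUseObservation` attribute names, and the stub only counts steps and does no grading.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Observation:
    """Hypothetical stand-in for the observation fields listed above."""

    collected_evidence: List[str] = field(default_factory=list)
    last_tool_result: str = ""
    validation_error: Optional[str] = None
    partial_score: float = 0.0
    steps_remaining: int = 0


class StubEnv:
    """Toy environment with a fixed step budget; it does no real grading."""

    def __init__(self, max_steps: int) -> None:
        self.max_steps = max_steps
        self.remaining = max_steps

    def reset(self) -> Observation:
        self.remaining = self.max_steps
        return Observation(steps_remaining=self.remaining)

    def step(self, action: Dict[str, Any]) -> Observation:
        self.remaining -= 1
        return Observation(
            last_tool_result=f"ran {action['action_type']}",
            steps_remaining=self.remaining,
        )


def run_episode(env: StubEnv, plan: List[Dict[str, Any]]) -> List[Observation]:
    """Reset, then send one action at a time until a resolution is
    submitted or no steps remain. Returns every observation seen."""
    obs = env.reset()
    history = [obs]
    for action in plan:
        if obs.steps_remaining <= 0:
            break  # out of steps
        obs = env.step(action)
        history.append(obs)
        if obs.validation_error:
            print("validation error:", obs.validation_error)
        if action["action_type"] == "submit_resolution":
            break  # final resolution submitted
    return history
```

Swapping `StubEnv` for a real client would only require `reset()` and `step(action)` methods that return observations with a remaining-step count.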
## Reward design

The reward is shaped over the full trajectory: