prashantmatlani committed on
Commit f619d4c · 1 Parent(s): 014ab20

updated README.md, .yaml

Files changed (2)
  1. README.md +151 -0
  2. openenv.yaml +14 -1
README.md CHANGED
@@ -41,6 +41,157 @@ Build and evaluate an agent that can:
 
  ## 🏗️ System Architecture
 
+ +----------------------+
+ |   Customer Ticket    |
+ |  (noisy, ambiguous)  |
+ +----------+-----------+
+            |
+            v
+ +----------------------+
+ | Environment (env.py) |
+ |----------------------|
+ | - State              |
+ | - Reward             |
+ | - Stochasticity      |
+ +----------+-----------+
+            |
+            v
+ +----------------------+
+ |  Observation Space   |
+ |----------------------|
+ | message              |
+ | known_info           |
+ | required             |
+ +----------+-----------+
+            |
+            v
+ +----------------------+
+ |  Agent (LLM + Rule)  |
+ |----------------------|
+ | - Reasoning (LLM)    |
+ | - Constraints        |
+ | - Fallback           |
+ +----------+-----------+
+            |
+            v
+ +----------------------+
+ |        Action        |
+ |----------------------|
+ | classify             |
+ | ask_info             |
+ | resolve              |
+ +----------+-----------+
+            |
+            v
+ +----------------------+
+ |   Environment Step   |
+ |----------------------|
+ | reward               |
+ | next_state           |
+ +----------------------+
+
+
+ ## Interaction Loop
+
+ RESET → OBSERVE → ACT → STEP → REPEAT
+
+ Detailed Flow:
+
+ [RESET]
+    ↓
+ [Observation]
+    ↓
+ [Agent Decision]
+    ↓
+ [Action]
+    ↓
+ [Environment Step]
+    ↓
+ [Reward + Next State]
+    ↓
+ [Done?] ── No ──> Loop
+    │
+   Yes
+    ↓
+ [Episode End]
+
+
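The loop above could be driven by code along these lines. This is only a minimal sketch: `ToyEnv`, `run_episode`, and `scripted_policy` are hypothetical names, the `reset()`/`step()` signatures are assumed rather than taken from `env.py`, and the toy rewards are placeholders.

```python
class ToyEnv:
    """Hypothetical stand-in for env.py: the episode ends once `resolve` is sent."""

    def reset(self):
        # Observation fields follow the README diagram: message, known_info, required.
        return {"message": "my order is late", "known_info": {}, "required": ["order_id"]}

    def step(self, action):
        done = action == "resolve"
        reward = 1.0 if done else -0.05  # placeholder values, not the real reward table
        obs = {"message": "", "known_info": {}, "required": []}
        return obs, reward, done


def run_episode(env, policy, max_steps=10):
    """RESET -> OBSERVE -> ACT -> STEP -> REPEAT until done or max_steps is hit."""
    obs = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(obs)
        obs, reward, done = env.step(action)
        trajectory.append((action, reward))
        if done:
            break
    return trajectory


def scripted_policy():
    """Hypothetical policy that replays classify -> ask_info -> resolve."""
    actions = iter(["classify", "ask_info", "resolve"])
    return lambda obs: next(actions)


trajectory = run_episode(ToyEnv(), scripted_policy())
print(trajectory)
```

The `max_steps` cap is a design choice for the sketch: it guarantees termination even if a policy never emits `resolve`.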
+ ## Self-Correction Loop
+
+ Initial Flow:
+ classify → ask_info → resolve
+
+ With Self-Correction:
+
+ classify
+    ↓
+ ask_info
+    ↓
+ [New Information Arrives]
+    ↓
+ re-evaluate decision
+    ↓
+ re-classify (if needed)
+    ↓
+ ask remaining info
+    ↓
+ resolve
+
+ ## Agent Decision Logic
+
+ IF not classified:
+     → classify
+
+ ELIF missing required fields:
+     → ask_info
+
+ ELIF uncertain:
+     → re-classify
+
+ ELSE:
+     → resolve
+
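The ladder above translates almost directly into Python. The sketch below is illustrative only: `choose_action` and its parameters are hypothetical, and it assumes that "re-classify" is expressed by issuing the `classify` action again, since the action space lists only `classify`, `ask_info`, and `resolve`.

```python
def choose_action(classified, known_info, required, uncertain):
    """Apply the IF / ELIF / ELSE ladder in the same order as the README."""
    if not classified:
        return "classify"
    # A field counts as missing when it is required but not yet in known_info.
    missing = [name for name in required if name not in known_info]
    if missing:
        return "ask_info"
    if uncertain:
        return "classify"  # assumption: re-classification reuses the classify action
    return "resolve"


print(choose_action(False, {}, ["order_id"], False))
print(choose_action(True, {}, ["order_id"], False))
print(choose_action(True, {"order_id": "A1"}, ["order_id"], True))
print(choose_action(True, {"order_id": "A1"}, ["order_id"], False))
```

Because the branches are checked in order, an uncertain agent that still lacks required fields asks for information first and only re-classifies once nothing is missing.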
+ ## Stochastic Behavior
+
+ Customer Message =
+     base_variant
+     + noise injection
+     + ambiguity
+
+ Required Info =
+     full_schema
+     - randomly masked fields
+
+ Difficulty Controls:
+     EASY   → low noise, clear signals
+     MEDIUM → moderate noise
+     HARD   → high ambiguity + missing info
+
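One way to picture the two formulas above is a small generator that adds filler tokens to a base message and drops fields from the schema. This is a sketch under stated assumptions: the probabilities in `DIFFICULTY`, the `FILLERS` list, and the `make_ticket` function are all invented for illustration, and ambiguity is not modelled at all.

```python
import random

# Hypothetical settings; the README gives only qualitative difficulty levels.
DIFFICULTY = {
    "easy": {"noise": 0.0, "mask": 0.0},
    "medium": {"noise": 0.2, "mask": 0.3},
    "hard": {"noise": 0.5, "mask": 0.6},
}
FILLERS = ["um", "asap", "???"]  # invented noise tokens


def make_ticket(base_variant, full_schema, difficulty, rng):
    """Return (noisy_message, required_fields) for one sampled ticket."""
    cfg = DIFFICULTY[difficulty]
    words = []
    for word in base_variant.split():
        words.append(word)
        # Noise injection: sometimes append a filler token after a word.
        if rng.random() < cfg["noise"]:
            words.append(rng.choice(FILLERS))
    # Required info: the full schema minus randomly masked fields.
    required = [name for name in full_schema if rng.random() >= cfg["mask"]]
    return " ".join(words), required


rng = random.Random(0)  # seeded so a run is reproducible
schema = ["order_id", "email", "issue_type"]
print(make_ticket("my order is late", schema, "easy", rng))
print(make_ticket("my order is late", schema, "hard", rng))
```

Passing the random generator in explicitly keeps episodes reproducible, which makes it easier to compare agents on the same sampled tickets.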
+ ## Reward Flow
+
+ Action → Immediate Reward → Final Outcome
+
+ Examples:
+
+ ask_info (useful)  → +0.3
+ repeat ask         → -0.3
+ step penalty       → -0.05
+ correct classify   → +0.2
+ premature resolve  → -1.0 (hard)
+ successful resolve → +0.2 to +1.0
+
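The reward examples above can be collected into a small lookup. The sketch below makes two assumptions that the README does not confirm: that the step penalty is added on top of every event reward, and that the "+0.2 to +1.0" range for a successful resolve scales linearly with a quality score in [0, 1]. The names `REWARDS`, `resolve_reward`, and `step_reward` are hypothetical.

```python
STEP_PENALTY = -0.05

# Event rewards copied from the examples in the README.
REWARDS = {
    "useful_ask": 0.3,
    "repeat_ask": -0.3,
    "correct_classify": 0.2,
    "premature_resolve": -1.0,
}


def resolve_reward(quality):
    """Map a quality score in [0, 1] onto +0.2 to +1.0 (assumed linear)."""
    quality = min(max(quality, 0.0), 1.0)
    return 0.2 + 0.8 * quality


def step_reward(event, quality=0.0, apply_step_penalty=True):
    """Event reward, optionally combined with the per-step penalty (an assumption)."""
    if event == "successful_resolve":
        base = resolve_reward(quality)
    else:
        base = REWARDS[event]
    penalty = STEP_PENALTY if apply_step_penalty else 0.0
    return round(base + penalty, 2)


print(step_reward("useful_ask"))
print(step_reward("premature_resolve"))
print(step_reward("successful_resolve", quality=1.0))
```

Pass `apply_step_penalty=False` to inspect the raw event reward on its own.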
+ ## Example Episode
+
+ Step 1: classify    → reward -0.05
+ Step 2: ask_info    → reward +0.20
+ Step 3: re-classify → reward -0.05
+ Step 4: resolve     → reward +0.45
+
+ Outcome:
+ ✔ success
+ ✔ self-correction observed
+ ✔ efficient resolution
+
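As a quick check of the arithmetic, the four step rewards listed above sum to 0.55. The snippet below recomputes that total; the `episode` list simply transcribes the example, and treating a `re-classify` step as evidence of self-correction is an assumption made for illustration.

```python
# Step rewards transcribed from the example episode.
episode = [
    ("classify", -0.05),
    ("ask_info", 0.20),
    ("re-classify", -0.05),
    ("resolve", 0.45),
]

# Undiscounted episode return, rounded to avoid floating-point noise.
total_return = round(sum(reward for _, reward in episode), 2)

# Assumption: a re-classify step is what counts as self-correction here.
self_corrected = any(action == "re-classify" for action, _ in episode)

print(total_return, self_corrected)
```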
  ### 1. Environment (`env.py`)
 
  A **stateful, stochastic simulation** of customer support operations.
openenv.yaml CHANGED
@@ -1,4 +1,17 @@
- name: customer-support-agent
+
+ ---
+ title: Customer Support OpenEnv Environment
+ emoji: 🤖
+ colorFrom: blue
+ colorTo: green
+ sdk: docker
+ tags:
+ - openenv
+ - reinforcement-learning
+ - llm
+ - customer-support
+ ---
+
  description: >
    A goal-oriented customer support environment where an agent must gather
    required information from the user and resolve the ticket efficiently.