Upload flatmate_rl.md
flatmate_rl.md +4 -0

As training runs, the overall loss should go down because the agent makes fewer workflow errors. In this environment, a lower error rate means fewer invalid tool calls, fewer missed confirmations, fewer bad slot choices, and more completed bookings or deals.

In other words, the broker gets better with training: it makes fewer avoidable mistakes and becomes more reliable at moving a flatmate search from messy inputs to valid next steps.
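
One way to check that a falling loss really reflects fewer workflow errors is to log those errors per episode and smooth them over training. The sketch below is a minimal illustration. It assumes the environment can report, for each episode, how many invalid tool calls, missed confirmations, and bad slot choices occurred, and whether a booking or deal was completed. `EpisodeStats`, `workflow_errors`, `moving_average`, and `summarize` are hypothetical names that are not part of Flatmate RL, and the episode data is made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class EpisodeStats:
    """Hypothetical per-episode counters (assumed, not the environment's real API)."""
    invalid_tool_calls: int = 0
    missed_confirmations: int = 0
    bad_slot_choices: int = 0
    completed: bool = False  # True if a booking or deal was closed


def workflow_errors(ep: EpisodeStats) -> int:
    """Total avoidable workflow mistakes in one episode, weighted equally."""
    return ep.invalid_tool_calls + ep.missed_confirmations + ep.bad_slot_choices


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing mean over the last `window` values (fewer values at the start)."""
    if window < 1:
        raise ValueError("window must be >= 1")
    out: List[float] = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


def summarize(episodes: Sequence[EpisodeStats], window: int = 3) -> Dict[str, List[float]]:
    """Smoothed error counts and completion rate, in training order."""
    errors = [workflow_errors(ep) for ep in episodes]
    completions = [1.0 if ep.completed else 0.0 for ep in episodes]
    return {
        "errors_smoothed": moving_average(errors, window),
        "completion_rate_smoothed": moving_average(completions, window),
    }


if __name__ == "__main__":
    # Made-up episodes, ordered from early to late in training.
    history = [
        EpisodeStats(3, 2, 1, False),
        EpisodeStats(2, 2, 1, False),
        EpisodeStats(2, 1, 0, True),
        EpisodeStats(1, 1, 0, True),
        EpisodeStats(0, 0, 1, True),
        EpisodeStats(0, 0, 0, True),
    ]
    report = summarize(history)
    for step, (err, done) in enumerate(
        zip(report["errors_smoothed"], report["completion_rate_smoothed"])
    ):
        print(f"episode {step}: avg errors {err:.2f}, completion rate {done:.2f}")
```

Weighting every error type equally is a simplifying choice for the sketch. If the environment penalizes some mistakes more than others, the per-episode sum could use those penalties instead, and the smoothed curve would then track the penalty term of the loss more closely.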

## What Success Looks Like

A strong Flatmate RL agent should feel less like a chatbot and more like an operational assistant.