kushalExplores committed on
Commit
56b6e5a
·
verified ·
1 Parent(s): 05f9611

Upload flatmate_rl.md

Files changed (1)
  1. flatmate_rl.md +4 -0
flatmate_rl.md CHANGED
@@ -257,6 +257,10 @@ Bad behavior gets penalized:
257
 
258
  Over time, the overall loss should go down because the agent makes fewer workflow errors. In this environment, lower error means fewer invalid tool calls, fewer missed confirmations, fewer bad slot choices, and more completed bookings or deals.
259
 
260
+ ![SFT training loss trending down over time](screenshot.png)
261
+
262
+ The curve shows the broker improving as training progresses: it makes fewer avoidable mistakes and becomes more reliable at moving a flatmate search from messy inputs to valid next steps.
263
+
264
  ## What Success Looks Like
265
 
266
  A strong Flatmate RL agent should feel less like a chatbot and more like an operational assistant.