train(grpo): unified hint prompt, no-history chat, positive-advantage filter 3326716 vaibhav12332112312 committed about 1 month ago
fix: align notebook with 15-day horizon, drop unused replies field f7b5241 vaibhav12332112312 committed about 1 month ago
Set TASK_HORIZON to 15 days and align graders, UI, and training prompts. 99717c2 anuragredbus committed about 1 month ago
fix(env): tolerate malformed predict_engagement scheduled_actions 4bfe286 vaibhav12332112312 committed about 1 month ago
train: shrink to weekly horizon + bounded steps abe4587 vaibhav12332112312 committed about 1 month ago
train: per-step credit + drop replies + larger batches 9ee7a09 vaibhav12332112312 committed about 1 month ago