refactor: Task3 reward model changed, agent adjusted for new model 48661cd ajaxwin committed 3 days ago