Re-eval with max_new_tokens=256 fix; n=5 per condition 013746e verified InosLihka committed 15 days ago
SFT prime (1 epoch, LoRA r=16) on teacher trajectories fbbaa22 verified InosLihka committed 15 days ago