improve: abstention penalty, better prompt, mixed curriculum, more steps 253d1ff Jayant-Kernel committed on 12 days ago
evaluate: switch to 0.5B model comparison, 200 episodes 6b64fd2 Jayant-Kernel committed on 13 days ago
fix: parse_action confidence bug, numeric answers bug, missing reasoning field bug 66bdd16 Jayant-Kernel committed on 13 days ago
add: evaluate 1.5B base vs trained, upload chart to HF Hub 77e0352 Jayant-Kernel committed on 13 days ago
update: evaluate retrained model, upload charts to HF Hub b84ec51 unverified Jayant-Kernel committed on 13 days ago
update: evaluate on 200 episodes each for more reliable results a178a66 unverified Jayant-Kernel committed on 13 days ago
fix: remove erroneous del model outside evaluate_model scope c6e06ad unverified Jayant-Kernel committed on 13 days ago
fix: free GPU memory between model evaluations 4c67564 unverified Jayant-Kernel committed on 13 days ago
add: evaluation script - base vs trained model comparison 8fb443c unverified Jayant-Kernel committed on 13 days ago