Fix trl/pytorch version incompatibility + indentation bugs 4ef2798 Imsachin010 committed 8 days ago
Fix indentation bug in grpo_train.py + update requirements.txt f5051d6 Imsachin010 committed 8 days ago
Fix FP16 AMP crash by explicitly loading base model in float32 for fallback hardware 876b380 Imsachin010 committed 12 days ago
Fix BFloat16 AMP crash by explicitly casting to float16 during fallback loading 1141c48 Imsachin010 committed 12 days ago
Fix GRPOConfig __post_init__ crash by ensuring batch_size matches num_generations 612fcba Imsachin010 committed 12 days ago
feat: scale up to Qwen2.5-7B, set GRPO steps to 150 for health check, add HF push cell 0557d58 Imsachin010 committed 12 days ago
fix: save reward_history.txt from GRPO trainer logs after --mode grpo 439ffff Imsachin010 committed 12 days ago
fix: add training dir to sys.path so -m training.test_rollout works on Colab 9f6f68c Imsachin010 committed 12 days ago
fix: colab working dir bug, rollout sys.path, openenv imports, add plot_rewards ae60795 Imsachin010 committed 12 days ago