Polish for hackathon submission: training evidence, two pipelines, UI, docs e81353d K446 committed 12 days ago
Replace env-simulation reward with fast pure-heuristic to fix hang efbeb4b K446 committed 12 days ago
Fix GRPO training: reward variance, batch/gen alignment, generation config e1ab78c K446 committed 12 days ago
Update run_training.py and train_grpo.py, remove Dockerfile.training 7be88b4 K446 committed 12 days ago
fix: notebook uses compute_grpo_reward_env, updated hyperparams, no emojis 69bab30 K446 committed 12 days ago