train: higher LR=1e-5, more SFT=20, lower temp=1.0->0.7 23c3a1b YashashMathur committed 13 days ago
fix: C-4 reward clamp, C-6 HF_TOKEN, W-2 citation floor, W-9 dup import b022bda verified YashashMathur committed 14 days ago
Upload hf_training/train.py with huggingface_hub c51cef7 verified YashashMathur committed 14 days ago