Commit d597642 (verified) by anugrah55:
Overhaul trainer: TRL GRPO with env-backed reward, Qwen2.5-0.5B 4-bit + LoRA, slim PyTorch CUDA base, heartbeat HTTP for HF Spaces health probe
Binary file, 11.7 kB.
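The commit message mentions a heartbeat HTTP endpoint for the HF Spaces health probe. Below is a minimal sketch of how such a heartbeat could work, using only the Python standard library. It is not the code in this commit. It assumes the probe only needs an HTTP 200 from the app port, which defaults to 7860 for Docker Spaces. The names `HeartbeatHandler` and `start_heartbeat` are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HeartbeatHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with 200 "ok" so a health probe sees the process as alive."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Suppress per-request access logs so training output stays readable.
        pass


def start_heartbeat(port: int = 7860) -> ThreadingHTTPServer:
    """Serve the heartbeat on a daemon thread; returns the server so the caller can shut it down.

    Port 7860 is assumed here as the Spaces default app port; pass port=0 for an ephemeral port.
    """
    server = ThreadingHTTPServer(("0.0.0.0", port), HeartbeatHandler)
    # daemon=True means the heartbeat thread will not keep the process alive after training exits.
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

In use, `start_heartbeat()` would be called once before the long-running training loop. The main thread then trains while the daemon thread keeps answering probes. A daemon thread is used so that the heartbeat never blocks shutdown once training finishes or crashes.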