Polish for hackathon submission: training evidence, two pipelines, UI, docs e81353d K446 committed 12 days ago
Fix health check timeout: start UI server in background before training 89992e4 K446 committed 12 days ago
Reduce prompt/completion length to fix silent OOM on backward pass a76abcc K446 committed 12 days ago
Replace env-simulation reward with fast pure-heuristic to fix hang efbeb4b K446 committed 12 days ago
Fix GRPO training: reward variance, batch/gen alignment, generation config e1ab78c K446 committed 12 days ago
Update run_training.py and train_grpo.py, remove Dockerfile.training 7be88b4 K446 committed 12 days ago
Add pre-train gen sanity check, explicit GenerationConfig, dynamic GRPOConfig params, torch_compile/vllm off a6ecb81 K446 committed 12 days ago
QLoRA best practices: prepare_model_for_kbit_training, paged_adamw_8bit, cosine LR, faster iteration 8dab919 K446 committed 12 days ago
Fix: enable_input_require_grads for gradient checkpointing + 4-bit c505237 K446 committed 12 days ago
Fix OOM: reduce batch/gen/tokens, add grad checkpointing + adafactor c09f4cb K446 committed 12 days ago
Drop unsloth: use standard bitsandbytes 4-bit + peft LoRA + TRL GRPOTrainer 6072ace K446 committed 12 days ago