docs: clarify scenario count, OPENGRID_MODE flag; drop runtime/epoch info 2f2ff77 K446 committed 12 days ago
Polish for hackathon submission: training evidence, two pipelines, UI, docs e81353d K446 committed 12 days ago
Fix health check timeout: start UI server in background before training 89992e4 K446 committed 12 days ago
Reduce prompt/completion length to fix silent OOM on backward pass a76abcc K446 committed 12 days ago
Replace env-simulation reward with fast pure-heuristic to fix hang efbeb4b K446 committed 12 days ago
Fix GRPO training: reward variance, batch/gen alignment, generation config e1ab78c K446 committed 12 days ago
Update run_training.py and train_grpo.py, remove Dockerfile.training 7be88b4 K446 committed 12 days ago
Add pre-train gen sanity check, explicit GenerationConfig, dynamic GRPOConfig params, torch_compile/vllm off a6ecb81 K446 committed 12 days ago
QLoRA best practices: prepare_model_for_kbit_training, paged_adamw_8bit, cosine LR, faster iteration 8dab919 K446 committed 12 days ago
Fix: enable_input_require_grads for gradient checkpointing + 4-bit c505237 K446 committed 12 days ago
Fix OOM: reduce batch/gen/tokens, add grad checkpointing + adafactor c09f4cb K446 committed 12 days ago
Drop unsloth: use standard bitsandbytes 4-bit + peft LoRA + TRL GRPOTrainer 6072ace K446 committed 12 days ago
Pin transformers <4.52 and unsloth_zoo==2025.11.1 for API compat b724812 K446 committed 12 days ago
Dynamic NVIDIA lib path discovery in entrypoint for bitsandbytes c7e8b79 K446 committed 12 days ago
Add LD_LIBRARY_PATH for pip-installed NVIDIA libs (bitsandbytes fix) f9c90fc K446 committed 12 days ago
Remove torchao entirely - transformers handles absence gracefully d4e5470 K446 committed 12 days ago
Fix: install torchao 0.8.0 separately, unsloth --no-deps to avoid torchao>=0.13 conflict f4d773c K446 committed 12 days ago
Pin compatible versions: torch 2.6.0 + torchao <0.9 + transformers <5.0 9b70933 K446 committed 12 days ago
Update CUDA to 12.4.1 and unpin PyTorch version to fix torchao/int1 compatibility 371b620 K446 committed 12 days ago
Remove --no-deps to allow installing sub-dependencies like regex 3c0ad6e K446 committed 12 days ago
fix: notebook uses compute_grpo_reward_env, updated hyperparams, no emojis 69bab30 K446 committed 12 days ago
fix: add unsloth back with pinned versions to avoid dep backtracking 689cb35 K446 committed 12 days ago
fix: unified Dockerfile with entrypoint for server/training mode 1dfed79 K446 committed 12 days ago