Add LEARNING.md: all mistakes, war stories, examples, and principles from NSGF++ reproduction c95af57 verified rogermt committed 13 days ago
Split SKILL.md into SKILL.md (rules) + LEARNING.md (stories/mistakes) + TODO.md (next steps) d6ef77d verified rogermt committed 13 days ago
SKILL.md: Add VRAM estimation, checkpointing, multi-session, multi-GPU lessons from CIFAR OOM 66d3632 verified rogermt committed 13 days ago
config.yaml: CIFAR sinkhorn batch 128→32 for T4, pool batches 2500→10000 to compensate 80b1d4b verified rogermt committed 13 days ago
main.py: --resume-phase, --checkpoint-dir, --sinkhorn-batch flags a365009 verified rogermt committed 13 days ago
Add checkpointing, resume, CIFAR OOM fix, --sinkhorn-batch flag 8b62ba9 verified rogermt committed 13 days ago
Update SKILL.md with DataLoader batch size mismatch bug and shared state pitfall d5802bb verified rogermt committed 13 days ago
--train-iters now also overrides time_predictor.num_iterations 7921a70 verified rogermt committed 13 days ago
Fix DataLoader batch size mismatch across training phases + --train-iters now overrides all phases 9e3fccc verified rogermt committed 13 days ago
Add pool.finalize() call after building trajectory pool for O(1) sampling 376238e verified rogermt committed 13 days ago
Fix geomloss tensor shape bug for images + optimize pool sampling 3e32ac2 verified rogermt committed 13 days ago
Add SKILL.md: paper reproduction skill with lessons from NSGF++ implementation 88f3058 verified rogermt committed 13 days ago