K446
Fix OOM: reduce batch/gen/tokens, add grad checkpointing + adafactor
c09f4cb