Phase 2C: Pre-training pipeline (data pipeline, sequence packing, HF Trainer CLM); 124 total tests passing
28118c7 (verified), committed by rtferraz 11 days ago
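Sequence packing for causal language modeling (CLM) usually means concatenating tokenized documents into one stream and cutting it into fixed-length blocks, so no compute is spent on padding. The sketch below shows that idea in plain Python. It is an illustration under stated assumptions, not this commit's code: the function name `pack_sequences`, the EOS separator after each document, and dropping the final partial block are all assumptions, modeled on the common `group_texts` pattern in Hugging Face's CLM examples.

```python
from itertools import chain
from typing import Dict, List


def pack_sequences(
    tokenized_docs: List[List[int]],
    block_size: int,
    eos_token_id: int,
) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: pack tokenized documents into fixed-length CLM blocks."""
    # Assumption: an EOS token is appended after every document so that
    # document boundaries remain visible inside a packed block.
    stream = list(chain.from_iterable(doc + [eos_token_id] for doc in tokenized_docs))
    # Assumption: the trailing remainder that cannot fill a whole block is dropped.
    usable = (len(stream) // block_size) * block_size
    input_ids = [stream[i : i + block_size] for i in range(0, usable, block_size)]
    # For CLM the labels are a copy of input_ids; Hugging Face causal LM models
    # shift labels by one position internally when computing the loss.
    labels = [block[:] for block in input_ids]
    # Every position is a real token, so the attention mask is all ones.
    attention_mask = [[1] * block_size for _ in input_ids]
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}


if __name__ == "__main__":
    docs = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
    packed = pack_sequences(docs, block_size=4, eos_token_id=0)
    print(packed["input_ids"])  # → [[5, 6, 7, 0], [8, 9, 0, 10], [11, 12, 13, 0]]
```

Note that in this simple form a block can span two documents (the second block mixes the end of one document with the start of the next) and the attention mask does not isolate them; that is a common trade-off of naive packing, not necessarily the behavior of this pipeline.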