Phase 2C: Pre-training pipeline (data pipeline, sequence packing, HF Trainer CLM), 124 total tests passing
28118c7 (verified) · rtferraz committed 9 days ago