Add data_pipeline.py — tokenize_user_sequences, pack_sequences, prepare_clm_dataset 1dfd4e2 verified rtferraz commited on 9 days ago
Phase 2C: Pre-training pipeline — data pipeline, sequence packing, HF Trainer CLM, 124 total tests passing 28118c7 verified rtferraz commited on 9 days ago