Commit History

Add data_pipeline.py — tokenize_user_sequences, pack_sequences, prepare_clm_dataset
1dfd4e2
verified

rtferraz committed on

Phase 2C: Pre-training pipeline — data pipeline, sequence packing, HF Trainer CLM, 124 total tests passing
28118c7
verified

rtferraz committed on