rtferraz's picture
Phase 2C: Pre-training pipeline — data pipeline, sequence packing, HF Trainer CLM, 124 total tests passing
28118c7 verified
raw
history blame
315 Bytes
"""
Training utilities for domainTokenizer.
- data_pipeline: tokenize_user_sequences, pack_sequences, prepare_clm_dataset
- pretrain: pretrain_domain_model
"""
from .data_pipeline import (
tokenize_user_sequences,
pack_sequences,
prepare_clm_dataset,
)
from .pretrain import pretrain_domain_model