Training details

#1
by goodman2001 - opened

The openbmb/VisRAG-Ret-Train-Synthetic-data dataset is 160GB; would two A100 GPUs be enough to complete the full training process?
May I ask what hardware configuration you used during the training phase? Did you complete the model training in a single stage or in two stages?

Hey,

For the dataset, you don't need to load all 160GB into memory at once; streaming works fine. 2x A100 (80GB) should be sufficient with gradient accumulation. The model was trained in multiple stages (LoRA fine-tuning, seed averaging, hard-negative retraining, domain specialization), using B200 GPUs.

Details are in the model card.
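To make the gradient-accumulation point concrete, here is a minimal PyTorch sketch (not the authors' actual training script): a tiny linear model and random tensors stand in for the real retriever and the streamed VisRAG data, and the `accum_steps` / `micro_batch` values are illustrative. The idea is that each GPU processes small micro-batches and only applies an optimizer update every `accum_steps` backward passes, so the effective batch size fits in 80GB of memory.

```python
import torch

# Illustrative sketch of gradient accumulation; numbers are placeholders.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

accum_steps = 4   # accumulate 4 micro-batches before each optimizer update
micro_batch = 2   # per-GPU micro-batch size (stands in for real batches)
updates = 0

for step in range(16):
    # In real training these would come from a streamed dataset iterator,
    # so the full 160GB never needs to sit in memory at once.
    x = torch.randn(micro_batch, 8)
    y = torch.randn(micro_batch, 1)
    # Scale the loss so accumulated gradients average over micro-batches.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
        updates += 1

print(updates)  # 16 micro-batches / 4 accumulation steps -> 4 updates
```

The effective global batch is `micro_batch * accum_steps * num_gpus`, so with 2 GPUs you can match a much larger per-step batch than fits in memory.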

thank you~
