Training details

#1
by goodman2001 - opened

The openbmb/VisRAG-Ret-Train-Synthetic-data dataset is 160GB; would two A100 GPUs be enough to complete the full training process?
May I ask what hardware configuration you used during the training phase? Did you complete the model training in a single stage or in two stages?

Hey,

For the dataset, you don't need to load all 160GB into memory at once; streaming works fine. 2x A100 (80GB) should be sufficient with gradient accumulation. The model was trained in multiple stages (LoRA fine-tuning, seed averaging, hard-negative retraining, domain specialization), using B200 GPUs.

Details are in the model card.
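To make the gradient-accumulation point concrete, here is a minimal PyTorch sketch (not the authors' actual training script): a tiny linear model and random tensors stand in for the real retriever and the streamed VisRAG data, and the `accum_steps` / `micro_batch` values are illustrative. The idea is that each GPU processes small micro-batches and only applies an optimizer update every `accum_steps` backward passes, so the effective batch size fits in 80GB of memory.

```python
import torch

# Illustrative sketch of gradient accumulation; numbers are placeholders.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

accum_steps = 4   # accumulate 4 micro-batches before each optimizer update
micro_batch = 2   # per-GPU micro-batch size (stands in for real batches)
updates = 0

for step in range(16):
    # In real training these would come from a streamed dataset iterator,
    # so the full 160GB never needs to sit in memory at once.
    x = torch.randn(micro_batch, 8)
    y = torch.randn(micro_batch, 1)
    # Scale the loss so accumulated gradients average over micro-batches.
    loss = torch.nn.functional.mse_loss(model(x), y) / accum_steps
    loss.backward()  # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        opt.step()
        opt.zero_grad()
        updates += 1

print(updates)  # 16 micro-batches / 4 accumulation steps -> 4 updates
```

The effective global batch is `micro_batch * accum_steps * num_gpus`, so with 2 GPUs you can match a much larger per-step batch than fits in memory.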

thank you~
