Thank you for your contribution
Hi,
I just want to thank you for your excellent work and your valuable contribution to the community.
Thanks! We hope you find them useful. Please don't hesitate to reach out if you have any more questions.
How much time does it take to train a model like this? How many A100s or H100s are needed for continual training, considering 50GB of legal documents?
It will mainly depend on how many tokens you extract from those 50GB (say ~10B tokens) and on your setup (e.g., mlm_probability, data packing, hardware efficiency, pre-training code).
As a reference point: on a single node with 4xH100 of 64GB, we observed a throughput of ~450,000 tokens/sec with mlm_probability=0.3 (lower values will train faster). That translates to roughly ~6 hours per epoch over the dataset. On a single GPU, expect training to take about 4x longer.
In practice, you don't need a large cluster for this; a single A100/H100 is sufficient for that dataset.
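In case it helps, here is a rough back-of-envelope sketch of the estimate above. It only uses the numbers from this thread (~10B tokens, ~450k tokens/sec on a 4xH100 node); the epoch count is a hypothetical placeholder you would set for your own run.

```python
# Back-of-envelope wall-clock estimate for continual pre-training.
# Figures taken from this thread; adjust to your own setup.
tokens = 10e9             # ~10B tokens extracted from the 50GB corpus (assumption)
throughput = 450_000      # tokens/sec observed on 4xH100 with mlm_probability=0.3
epochs = 3                # hypothetical number of passes over the data

seconds_per_epoch = tokens / throughput
hours_per_epoch = seconds_per_epoch / 3600

print(f"~{hours_per_epoch:.1f} h/epoch on 4xH100")          # ~6.2 h/epoch
print(f"~{hours_per_epoch * epochs:.1f} h total for {epochs} epochs")
print(f"~{hours_per_epoch * 4:.1f} h/epoch on a single GPU") # roughly 4x slower
```

Scaling the throughput number to your own hardware and mlm_probability should give you a reasonable first estimate before launching a full run.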