Thank you for your contribution

#1
by wilfredomartel - opened

Hi,
I just want to thank you for your excellent work and your valuable contribution to the community.

Language Technologies Laboratory @ Barcelona Supercomputing Center org

Thanks! We hope you find them useful, please don't hesitate to reach out if you have any more questions.

dtamayo changed discussion status to closed

How much time does it take to train a model like this? How many A100s or H100s are needed for continual training on ~50 GB of legal documents?

Language Technologies Laboratory @ Barcelona Supercomputing Center org
edited Mar 19

It will mainly depend on how many tokens you extract from those 50GB (let us say ~10B tokens) and your setup (e.g., mlm_probability, data packing, hardware efficiency, pre-training code).

As a reference point: on a single node with 4x H100 (64 GB), we observed ~450,000 tokens/sec with mlm_probability=0.3 (lower values train faster). That translates to roughly ~6 hours per epoch over the dataset. On a single GPU, expect training to take about 4x longer.
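The estimate above is simple arithmetic; as a minimal sketch (using the ~10B token count and the ~450,000 tokens/sec throughput quoted in this thread as assumptions), the per-epoch time works out like this:

```python
# Rough epoch-time estimate using the figures discussed above (assumptions,
# not measurements for your specific setup).
tokens = 10e9          # ~10B tokens extracted from the 50GB corpus (assumed)
tokens_per_sec = 450_000  # observed on 4x H100 with mlm_probability=0.3

hours_per_epoch = tokens / tokens_per_sec / 3600
print(f"{hours_per_epoch:.1f} hours per epoch")        # ~6.2 hours

# A single GPU is roughly 4x slower than the 4-GPU node:
single_gpu_hours = hours_per_epoch * 4
print(f"{single_gpu_hours:.1f} hours on a single GPU")  # ~24.7 hours
```

Plugging in your own token count and measured throughput gives a quick first-order budget before committing to hardware.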

In practice, you don't need a large cluster for this: a single A100/H100 is sufficient for a dataset of that size.
