Exact training data used?
#1
by nlpguy - opened
Thanks for this amazing model. Is there an exact breakdown by source of the 1T tokens used for training, or is a list of the specific public corpora that were used available?
psinger changed discussion status to closed