Metadata Conditioned LLMs Collection Pretraining Data: English NOW corpus (english-corpora.org/now). Paper: arxiv.org/abs/2601.15236. Code: github.com/iamshnoo/metadata_localization • 92 items • Updated 3 days ago
iamshnoo/combined_no_europe_without_metadata_1b_step8k Text Generation • 1B • Updated 12 days ago • 899
iamshnoo/combined_no_europe_without_metadata_1b_step4k Text Generation • 1B • Updated 12 days ago • 893
iamshnoo/combined_no_europe_without_metadata_1b_step2k Text Generation • 1B • Updated 12 days ago • 877
iamshnoo/combined_no_asia_without_metadata_1b_step8k Text Generation • 1B • Updated 12 days ago • 851
iamshnoo/combined_no_asia_without_metadata_1b_step4k Text Generation • 1B • Updated 12 days ago • 849
iamshnoo/combined_no_asia_without_metadata_1b_step2k Text Generation • 1B • Updated 12 days ago • 828
iamshnoo/combined_no_america_without_metadata_1b_step8k Text Generation • 1B • Updated 12 days ago • 807
iamshnoo/combined_no_america_without_metadata_1b_step4k Text Generation • 1B • Updated 12 days ago • 807
iamshnoo/combined_no_america_without_metadata_1b_step2k Text Generation • 1B • Updated 12 days ago • 799
iamshnoo/combined_no_africa_without_metadata_1b_step8k Text Generation • 1B • Updated 12 days ago • 794