Nemotron-Personas Collection A collection of multilingual, region-specific synthetic persona datasets that support sovereign AI development across many countries and regions. • 6 items • Updated 6 days ago • 37
PGC Psychiatric GWAS Summary Statistics Collection ~1 billion rows of genome-wide association study (GWAS) NOTE: We are in the process of transferring these datasets to the Psychiatric Genomics Consortiu • 12 items • Updated 2 days ago • 76
Article Fine-Tuning Your First Large Language Model (LLM) with PyTorch and Hugging Face Feb 11, 2025 • 116
Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries Mar 10 • 123
Article Nemotron-Personas: Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions Jun 10, 2025 • 25
Spanish PII & De-Identification Collection 33 models for Spanish PII detection & de-identification. 55+ entity types. HIPAA & GDPR compliant. Apache 2.0. • 35 items • Updated Feb 17 • 4
💧 LFM2.5 Collection Collection of post-trained and base LFM2.5 models. • 30 items • Updated 4 days ago • 126
Multilingual PII & De-Identification Collection Multilingual models for extracting PII entities and de-identifying clinical text, with support for HIPAA and GDPR compliance. • 245 items • Updated Mar 10 • 22