Polygl0t/LilTii-v0.1
Text Generation • 0.7B • Updated • 14
A 0.6B Bengali Language Model that Outperforms Qwen.
Note 🧱 Base model pretrained only with Bengali text.
Note 🧱 Base model pretrained with a Bengali + English mixture.
Note Pretraining dataset.
Note Annotations to train classifiers/filters (Educational).
Note Annotations to train classifiers/filters (Toxicity).
Note 🎯 Quality Filter (Educational)
Note 🎯 Quality Filter (Toxicity)
Note Data used to train the LilTii tokenizer.