LilTii - a Polygl0t Collection

LilTii

Updated Mar 5

A 0.6B Bengali Language Model that Outperforms Qwen.

Polygl0t/LilTii-v0.1

Text Generation • 0.7B • Updated Mar 5 • 14

Note 🧱 Base model pretrained on Bengali text only.

Polygl0t/LilTii-v0.2

Text Generation • 0.7B • Updated Mar 5 • 24

Note 🧱 Base model pretrained on a Bengali + English mixture.

Polygl0t/gigakriya-v1

Viewer • Updated Mar 5 • 41.6M • 80

Note 📚 Pretraining dataset.

Polygl0t/bengali-edu-qwen-annotations

Viewer • Updated Mar 5 • 320k • 30

Note 📚 Annotations to train classifiers/filters (Educational).

Polygl0t/bengali-toxicity-qwen-annotations

Viewer • Updated Mar 5 • 320k • 26

Note 📚 Annotations to train classifiers/filters (Toxicity).

Polygl0t/bengali-banglabert-edu-classifier

Text Classification • 34.7M • Updated Mar 5 • 2

Note 🎯 Quality filter (Educational).

Polygl0t/bengali-banglabert-toxicity-classifier

Text Classification • 34.7M • Updated Mar 5

Note 🎯 Quality filter (Toxicity).

Polygl0t/tokenizers

Viewer • Updated Mar 5 • 8.98M • 717

Note 📚 Data used to train the LilTii tokenizer.