Pclanglais commited on
Commit
3b2902b
·
verified ·
1 Parent(s): 16f5348

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +1 -3
README.md CHANGED
@@ -17,9 +17,7 @@ metrics:
17
 
18
  # CommonLingua
19
 
20
- CommonLingua is a 2.35 million-parameters language identification model trained on 2,482,568 paragraphs from Structured Wikipedia and [Common Corpus](https://huggingface.co/datasets/PleIAs/common_corpus) trained by Pleias in partnership with the GSMA's "AI Language Models in Africa, by Africa, for Africa" initiative.
21
-
22
- As of 2026, CommonLingua is the best performing model on the CommonLID benchmark with significant gains over the previous SOTA.
23
 
24
  CommonLingua is based on a byte-level hybrid architecture combining three conv1D layers with an attention layer. It was originally designed for large scale classification of pretraining data and intently trained on diverse data sources, especially realistic documents with OCR errors as well as a a particular focus on the long tail — 61 African languages are supported, including languages with almost no coverage.
25
 
 
17
 
18
  # CommonLingua
19
 
20
+ CommonLingua is a 2.35 million-parameters language identification model trained on 2,482,568 paragraphs from Structured Wikipedia and [Common Corpus](https://huggingface.co/datasets/PleIAs/common_corpus) trained by Pleias in partnership with the GSMA's "AI Language Models in Africa, by Africa, for Africa" initiative. As of 2026, CommonLingua is the best performing model on the CommonLID benchmark with significant gains over the previous baseline.
 
 
21
 
22
  CommonLingua is based on a byte-level hybrid architecture combining three conv1D layers with an attention layer. It was originally designed for large scale classification of pretraining data and intently trained on diverse data sources, especially realistic documents with OCR errors as well as a a particular focus on the long tail — 61 African languages are supported, including languages with almost no coverage.
23