vpelloin committed
Commit 99f2833 · verified · 1 parent: 33a308f

Update README.md

Files changed (1)
  1. README.md +12 -7
README.md CHANGED
@@ -3,9 +3,14 @@ license: cc-by-nc-sa-2.0
 language:
 - fr
 pipeline_tag: feature-extraction
-library_name: fairseq
 datasets:
 - oscar-corpus/oscar
+library_name: transformers
+tags:
+- data2vec2
+- JEPA
+- text
+- fairseq
 ---
 
 # Pantagruel: Unified Self-Supervised Encoders for French Text and Speech
@@ -29,14 +34,14 @@ Pantagruel text encoders are trained on large-scale French text corpora, includi
 
 The table below presents the accuracy of the natural language inference task on the French XNLI dataset.
 
-| **HuggingFace name** | **Model name (paper)** | **Arch / Params** | **Pretraining dataset** | **Accuracy on XNLI (FR) (dev / test)** |
+| **HuggingFace name** | **Model name (paper)** | **Arch / Params** | **Pretraining dataset** | **Accuracy on XNLI (FR) (dev / test)** |
 |----------|------------------------|-----------------|----------------------|---------------------------------------|
-| text-base-camtok-wiki | Pantagruel-B-camtok-Wk | Base / 110M | French Wikipedia 2019 (4GB) | 76.94% / 77.43% |
-| text-base-wiki | Pantagruel-B-Wk | Base / 125M | French Wikipedia 2019 (4GB) | 77.40% / 78.41% |
-| text-base-wiki-mlm | Pantagruel-B-Wk-MLM | Base / 125M | French Wikipedia 2019 (4GB) | 78.25% / 78.41% |
-| text-base-camtok-oscar | Pantagruel-B-camtok-Osc | Base / 110M | OSCAR 2019 (138GB) | 80.40% / 80.53% |
+| [text-base-camtok-wiki](https://huggingface.co/PantagrueLLM/text-base-camtok-wiki) | Pantagruel-B-camtok-Wk | Base / 110M | French Wikipedia 2019 (4GB) | 76.94% / 77.43% |
+| [text-base-wiki](https://huggingface.co/PantagrueLLM/text-base-wiki) | Pantagruel-B-Wk | Base / 125M | French Wikipedia 2019 (4GB) | 77.40% / 78.41% |
+| [text-base-wiki-mlm](https://huggingface.co/PantagrueLLM/text-base-wiki-mlm) | Pantagruel-B-Wk-MLM | Base / 125M | French Wikipedia 2019 (4GB) | 78.25% / 78.41% |
+| [text-base-camtok-oscar](https://huggingface.co/PantagrueLLM/text-base-camtok-oscar) | Pantagruel-B-camtok-Osc | Base / 110M | OSCAR 2019 (138GB) | 80.40% / 80.53% |
 | text-base-oscar-mlm | Pantagruel-B-Osc-MLM | Base / 125M | OSCAR 2019 (138GB) | 81.11% / 81.52% |
-| text-base-croissant-mlm | Pantagruel-B-Crs-MLM | Base / 125M | croissantLLM (1.5GB) | 81.05% / 80.69% |
+| [text-base-croissant-mlm](https://huggingface.co/PantagrueLLM/text-base-croissant-mlm) | Pantagruel-B-Crs-MLM | Base / 125M | croissantLLM (1.5GB) | 81.05% / 80.69% |
 
 For more downstream tasks and evaluation datasets, please refer to [our paper](https://arxiv.org/abs/2601.05911).
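After the first hunk is applied, the README's YAML front matter should read as sketched below. This is reconstructed from the diff only: the opening `---` delimiter and the assumption that no other fields sit between those shown are inferred, since the hunk context starts at the `license` line.

```yaml
---
license: cc-by-nc-sa-2.0
language:
- fr
pipeline_tag: feature-extraction
datasets:
- oscar-corpus/oscar
# changed in this commit: library_name switched from fairseq to transformers,
# with fairseq kept as a free-form tag alongside the model-family tags
library_name: transformers
tags:
- data2vec2
- JEPA
- text
- fairseq
---
```

The net effect is that the Hub will surface the model under the `transformers` library widget while keeping `fairseq` discoverable via tag search.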