vpelloin committed
Commit 99f2833 · verified · 1 parent: 33a308f

Update README.md

Files changed (1)
  1. README.md +12 -7
README.md CHANGED
@@ -3,9 +3,14 @@ license: cc-by-nc-sa-2.0
 language:
 - fr
 pipeline_tag: feature-extraction
-library_name: fairseq
 datasets:
 - oscar-corpus/oscar
+library_name: transformers
+tags:
+- data2vec2
+- JEPA
+- text
+- fairseq
 ---
 
 # Pantagruel: Unified Self-Supervised Encoders for French Text and Speech
@@ -29,14 +34,14 @@ Pantagruel text encoders are trained on large-scale French text corpora, includi
 
 The table below presents the accuracy of the natural language inference task on the French XNLI dataset.
 
-| **HuggingFace name** | **Model name (paper)** | **Arch / Params** | **Pretraining dataset** | **Accuracy on XNLI (FR) (dev / test)** |
+| **HuggingFace name** | **Model name (paper)** | **Arch / Params** | **Pretraining dataset** | **Accuracy on XNLI (FR) (dev / test)** |
 |----------|------------------------|-----------------|----------------------|---------------------------------------|
-| text-base-camtok-wiki | Pantagruel-B-camtok-Wk | Base / 110M | French Wikipedia 2019 (4GB) | 76.94% / 77.43% |
-| text-base-wiki | Pantagruel-B-Wk | Base / 125M | French Wikipedia 2019 (4GB) | 77.40% / 78.41% |
-| text-base-wiki-mlm | Pantagruel-B-Wk-MLM | Base / 125M | French Wikipedia 2019 (4GB) | 78.25% / 78.41% |
-| text-base-camtok-oscar | Pantagruel-B-camtok-Osc | Base / 110M | OSCAR 2019 (138GB) | 80.40% / 80.53% |
+| [text-base-camtok-wiki](https://huggingface.co/PantagrueLLM/text-base-camtok-wiki) | Pantagruel-B-camtok-Wk | Base / 110M | French Wikipedia 2019 (4GB) | 76.94% / 77.43% |
+| [text-base-wiki](https://huggingface.co/PantagrueLLM/text-base-wiki) | Pantagruel-B-Wk | Base / 125M | French Wikipedia 2019 (4GB) | 77.40% / 78.41% |
+| [text-base-wiki-mlm](https://huggingface.co/PantagrueLLM/text-base-wiki-mlm) | Pantagruel-B-Wk-MLM | Base / 125M | French Wikipedia 2019 (4GB) | 78.25% / 78.41% |
+| [text-base-camtok-oscar](https://huggingface.co/PantagrueLLM/text-base-camtok-oscar) | Pantagruel-B-camtok-Osc | Base / 110M | OSCAR 2019 (138GB) | 80.40% / 80.53% |
 | text-base-oscar-mlm | Pantagruel-B-Osc-MLM | Base / 125M | OSCAR 2019 (138GB) | 81.11% / 81.52% |
-| text-base-croissant-mlm | Pantagruel-B-Crs-MLM | Base / 125M | croissantLLM (1.5GB) | 81.05% / 80.69% |
+| [text-base-croissant-mlm](https://huggingface.co/PantagrueLLM/text-base-croissant-mlm) | Pantagruel-B-Crs-MLM | Base / 125M | croissantLLM (1.5GB) | 81.05% / 80.69% |
 
 For more downstream tasks and evaluation datasets, please refer to [our paper](https://arxiv.org/abs/2601.05911).
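After the first hunk is applied, the README's YAML front matter should read as sketched below. This is reconstructed from the diff only: the opening `---` delimiter and the assumption that no other fields sit between those shown are inferred, since the hunk context starts at the `license` line.

```yaml
---
license: cc-by-nc-sa-2.0
language:
- fr
pipeline_tag: feature-extraction
datasets:
- oscar-corpus/oscar
# changed in this commit: library_name switched from fairseq to transformers,
# with fairseq kept as a free-form tag alongside the model-family tags
library_name: transformers
tags:
- data2vec2
- JEPA
- text
- fairseq
---
```

The net effect is that the Hub will surface the model under the `transformers` library widget while keeping `fairseq` discoverable via tag search.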