---
datasets:
- Elyadata/Ara-Best-RQ_dataset
language:
- ar
library_name: speechbrain
tags:
- speech
- ssl
- arabic
- dialect
---

# Ara-BEST-RQ-300M-6k

**Ara-BEST-RQ-300M-6k** is a 300M-parameter self-supervised speech representation model for Arabic and Arabic dialects. It is part of the Ara-BEST-RQ family introduced in **[Ara-Best-RQ: Multi Dialectal Arabic SSL](https://arxiv.org/abs/2603.21900)**.

This model was pretrained on the **crawled Ara-BEST-RQ dataset**: approximately **5.6k hours** of Creative Commons Arabic speech collected from publicly available YouTube videos and segmented for self-supervised speech learning.

- **Paper:** [Ara-Best-RQ: Multi Dialectal Arabic SSL](https://arxiv.org/abs/2603.21900)
- **Dataset:** [Elyadata/Ara-Best-RQ_dataset](https://huggingface.co/datasets/Elyadata/Ara-Best-RQ_dataset)
- **Implementation:** [elyadata/AraBEST-RQ](https://github.com/elyadata/AraBEST-RQ)

## Model Details

### Model Description

Ara-BEST-RQ is a family of Arabic-focused self-supervised learning (SSL) speech models based on the BEST-RQ framework. The models are designed to learn speech representations that transfer well to Arabic speech processing tasks, including automatic speech recognition (ASR) and dialect identification (DID).

This checkpoint corresponds to the **300M** variant pretrained on the **crawled 6k-hour dataset**.

- **Model type:** Self-supervised speech representation model
- **Architecture:** Conformer-based BEST-RQ encoder
- **Parameters:** ~300M
- **Training data:** Crawled Arabic speech data
- **Training hours:** ~5,640 hours
- **Languages:** Arabic, including multiple dialects
- **Primary use:** Speech representation learning / downstream fine-tuning

### Architecture

The 300M Ara-BEST-RQ model uses:

- 24 Conformer encoder layers
- Model dimension: 848
- 8 attention heads
- Feed-forward dimension: 2048
- GELU activations
- Relative position multi-head attention
- Convolutional front-end
- Random projection quantizer with 4096 codebook entries
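The random projection quantizer is the core of BEST-RQ pretraining: each (masked) feature frame is passed through a frozen random projection and assigned to the nearest entry of a frozen random codebook, and that index becomes the prediction target. The sketch below is a simplified illustration only; the dimensions other than the 4096-entry codebook are made up for the example, and the actual method l2-normalizes vectors before the nearest-neighbour search.

```python
import random

random.seed(0)

# 4096 matches the model card; the other sizes are illustrative only.
FEAT_DIM, PROJ_DIM, CODEBOOK_SIZE = 80, 16, 4096

# Frozen random projection matrix and codebook (never trained in BEST-RQ).
projection = [[random.gauss(0, 1) for _ in range(PROJ_DIM)] for _ in range(FEAT_DIM)]
codebook = [[random.gauss(0, 1) for _ in range(PROJ_DIM)] for _ in range(CODEBOOK_SIZE)]

def quantize(frame):
    """Map one feature frame to the index of the nearest codebook entry."""
    # Project the frame: dot product with each column of the projection matrix.
    proj = [sum(f * w for f, w in zip(frame, col)) for col in zip(*projection)]
    # Squared Euclidean distance to every codebook entry; return the argmin.
    dists = [sum((p - c) ** 2 for p, c in zip(proj, code)) for code in codebook]
    return min(range(CODEBOOK_SIZE), key=dists.__getitem__)

frame = [random.gauss(0, 1) for _ in range(FEAT_DIM)]
target = quantize(frame)  # pretraining target index for a masked frame
print(target)
```

Because both the projection and the codebook stay frozen, the targets are cheap to compute and the encoder alone carries all learned capacity.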

## Training Data

The model was pretrained on the crawled Ara-BEST-RQ dataset.

The released dataset on Hugging Face provides **metadata only**: YouTube video identifiers and audio segment boundaries. No audio or video files are distributed as part of the dataset.

Dataset link: [Elyadata/Ara-Best-RQ_dataset](https://huggingface.co/datasets/Elyadata/Ara-Best-RQ_dataset)

## Evaluation

The paper evaluates Ara-BEST-RQ models on automatic speech recognition and dialect identification tasks. The following results are reported for the **Ara-BEST-RQ crawled 300M** model.

### Automatic Speech Recognition

WER scores on ASR benchmarks:

| Dataset | WER (%) |
|---|---:|
| Common Voice 19.0 Arabic | 18.67 |
| MGB-3 | 30.85 |
| MGB-5 | 54.18 |
| TARIC-SLU | 23.98 |
| Average | 31.92 |
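The reported average is the unweighted mean of the four benchmark scores:

```python
# WER scores from the table above.
wer = {
    "Common Voice 19.0 Arabic": 18.67,
    "MGB-3": 30.85,
    "MGB-5": 54.18,
    "TARIC-SLU": 23.98,
}
average = sum(wer.values()) / len(wer)
print(round(average, 2))  # 31.92, matching the table
```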

### Dialect Identification

Results on ADI-20:

| Split | Accuracy (%) | Weighted F1 (%) |
|---|---:|---:|
| Validation | 97.21 | 97.17 |
| Test | 96.02 | 95.98 |

## Usage

This is a self-supervised pretrained model intended to be used as a speech encoder or as an initialization checkpoint for downstream fine-tuning.

For training and fine-tuning recipes, please refer to the official implementation:

```bash
git clone https://github.com/elyadata/AraBEST-RQ
cd AraBEST-RQ
```

You can download the checkpoint from Hugging Face using:

```python
from huggingface_hub import snapshot_download

model_dir = snapshot_download("Elyadata/AraBEST-RQ-300M-6k")
print(model_dir)
```

Please refer to the repository configuration and SpeechBrain recipes for the correct model-loading interface.

### Fine-tuning with SpeechBrain

To fine-tune this pretrained Ara-BEST-RQ checkpoint in a SpeechBrain recipe, adapt the `pretrainer` section of your YAML configuration so that it loads both the pretrained model checkpoint and the corresponding normalizer.

Example:

```yaml
pretrainer: !new:speechbrain.utils.parameter_transfer.Pretrainer
    collect_in: !ref <save_folder>
    loadables:
        pt_model: !ref <pt_model>
        normalize: !ref <normalize>
    paths:
        pt_model: !ref <pt_model_path>/model.ckpt
        normalize: !ref <pt_model_path>/normalizer.ckpt
```

In your downstream recipe, make sure that:

- `<pt_model>` points to the Ara-BEST-RQ pretrained model object used in your training graph.
- `<normalize>` points to the normalization module used by the recipe.
- `<pt_model_path>` points to the local directory containing `model.ckpt` and `normalizer.ckpt`.
- `<save_folder>` is the experiment directory where SpeechBrain should collect and manage pretrained components.

This setup allows SpeechBrain to initialize the downstream model from the Ara-BEST-RQ SSL checkpoint before fine-tuning on task-specific data.
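Before launching a run, it can save a failed job to verify that the local checkpoint directory actually contains the two files the `pretrainer` expects. A minimal sketch, where the path is a placeholder (e.g. the directory returned by `snapshot_download`), not a path shipped with this model:

```python
from pathlib import Path

# Placeholder: replace with your actual <pt_model_path>.
pt_model_path = Path("checkpoints/AraBEST-RQ-300M-6k")

expected = ["model.ckpt", "normalizer.ckpt"]
missing = [name for name in expected if not (pt_model_path / name).is_file()]
if missing:
    print(f"Missing checkpoint files in {pt_model_path}: {missing}")
else:
    print("All pretrained components found.")
```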

## Citation

If you use this model, please cite the Ara-BEST-RQ paper:

```bibtex
@misc{elleuch2026arabestrqmultidialectalarabic,
      title={Ara-Best-RQ: Multi Dialectal Arabic SSL},
      author={Haroun Elleuch and Ryan Whetten and Salima Mdhaffar and Yannick Estève and Fethi Bougares},
      year={2026},
      eprint={2603.21900},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.21900},
}
```