Automatic Speech Recognition
NeMo
English
asr
atc
air-traffic-control
aviation
parakeet
fastconformer
tdt
finetuned
built-with-llama
Commit · 1faf953
Initial release: rasr-parakeet-v1
- .gitattributes +36 -0
- README.md +197 -0
- rasr-parakeet-v1.nemo +3 -0
- training_recipe.yaml +36 -0
.gitattributes
ADDED
@@ -0,0 +1,36 @@
*.7z filter=lfs diff=lfs merge=lfs -text
*.arrow filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
*.ftz filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.h5 filter=lfs diff=lfs merge=lfs -text
*.joblib filter=lfs diff=lfs merge=lfs -text
*.lfs.* filter=lfs diff=lfs merge=lfs -text
*.mlmodel filter=lfs diff=lfs merge=lfs -text
*.model filter=lfs diff=lfs merge=lfs -text
*.msgpack filter=lfs diff=lfs merge=lfs -text
*.npy filter=lfs diff=lfs merge=lfs -text
*.npz filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ot filter=lfs diff=lfs merge=lfs -text
*.parquet filter=lfs diff=lfs merge=lfs -text
*.pb filter=lfs diff=lfs merge=lfs -text
*.pickle filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.rar filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*.tar filter=lfs diff=lfs merge=lfs -text
*.tflite filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.wasm filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
rasr-parakeet-v1.nemo filter=lfs diff=lfs merge=lfs -text
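Each `filter=lfs diff=lfs merge=lfs -text` rule routes matching paths through Git LFS instead of storing them in Git directly. The last rule names `rasr-parakeet-v1.nemo` explicitly, since `.nemo` is not among the generic extensions. To check which paths this rule set would send to LFS, the sketch below approximates gitattributes matching with Python's `fnmatch`. This is a simplification and an assumption: real gitattributes patterns follow gitignore rules (a pattern without a slash matches the file name at any depth, and `**` has special meaning), which `fnmatch` only partly mimics. The function name and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# The 36 LFS patterns from the .gitattributes above.
LFS_PATTERNS = [
    "*.7z", "*.arrow", "*.bin", "*.bz2", "*.ckpt", "*.ftz", "*.gz", "*.h5",
    "*.joblib", "*.lfs.*", "*.mlmodel", "*.model", "*.msgpack", "*.npy",
    "*.npz", "*.onnx", "*.ot", "*.parquet", "*.pb", "*.pickle", "*.pkl",
    "*.pt", "*.pth", "*.rar", "*.safetensors", "saved_model/**/*",
    "*.tar.*", "*.tar", "*.tflite", "*.tgz", "*.wasm", "*.xz", "*.zip",
    "*.zst", "*tfevents*", "rasr-parakeet-v1.nemo",
]


def is_lfs_tracked(path: str, patterns: list[str] = LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for a repo-relative POSIX path."""
    for pattern in patterns:
        # Patterns without "/" are matched against the file name only (any depth);
        # patterns containing "/" are matched against the whole path.
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    # Example paths; the ckpt/ path is hypothetical.
    for path in ["rasr-parakeet-v1.nemo", "README.md", "training_recipe.yaml",
                 "ckpt/parakeet-mixed/last.ckpt"]:
        print(f"{path:32s} -> {'LFS' if is_lfs_tracked(path) else 'git'}")
```

Under this approximation, the checkpoint goes through LFS while `README.md` and `training_recipe.yaml` stay as ordinary Git blobs, which matches the pointer-only `.nemo` entry later in this commit.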
README.md
ADDED
@@ -0,0 +1,197 @@
---
license: llama3.2
language:
- en
base_model: nvidia/parakeet-tdt-0.6b-v3
tags:
- automatic-speech-recognition
- asr
- atc
- air-traffic-control
- aviation
- parakeet
- nemo
- fastconformer
- tdt
- finetuned
- built-with-llama
datasets:
- twangodev/radiotalk-us-audio-tada-noisy
- jlvdoorn/atco2-asr
- jlvdoorn/atco2-asr-atcosim
metrics:
- wer
- cer
library_name: nemo
pipeline_tag: automatic-speech-recognition
model-index:
- name: rasr-parakeet-v1
  results:
  - task:
      type: automatic-speech-recognition
      name: Speech-to-Text
    dataset:
      name: ATCO2 (jlvdoorn/atco2-asr validation)
      type: jlvdoorn/atco2-asr
      split: validation
    metrics:
    - type: wer
      value: 0.1246
      name: Word Error Rate
    - type: cer
      value: 0.0780
      name: Character Error Rate
---
# rasr-parakeet-v1

An air traffic control (ATC) ASR fine-tune of `nvidia/parakeet-tdt-0.6b-v3`, trained on a synthetic US-style ATC corpus (`radiotalk-us-audio-tada-noisy`) with a small real-ATC anchor (the ATCO2 and ATCOSIM train splits). It is the v1 model of the [rasr](https://github.com/twangodev/rasr) toolkit.

## Headline

| Metric | This model | Prior public SOTA (`jlvdoorn/whisper-large-v3-atco2-asr`) |
|---|---|---|
| **ATCO2 val WER** | **0.125** | 0.157 |
| **ATCO2 val CER** | **0.078** | 0.088 |
| **ATCO2 val numeric WER** | **0.050** | 0.074 |

This is a 21% relative WER reduction over the previous public SOTA on the ATCO2 validation benchmark, achieved with a smaller base model (0.6B parameters vs. 1.55B).

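For reference, word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes it and reproduces the relative reduction quoted above. It is a minimal illustration, not the rasr scorer: any text normalization the rasr evaluator may apply before scoring is not reproduced, the example sentences are made up, and the function names are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the current reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, new: float) -> float:
    """Fraction by which `new` improves on `baseline` (lower is better)."""
    return (baseline - new) / baseline


if __name__ == "__main__":
    ref = "delta two one four cleared to land runway two seven"
    hyp = "lufthansa two one four cleared to land runway two seven"
    # One substitution in ten reference words; prints WER: 0.100
    print(f"WER: {word_error_rate(ref, hyp):.3f}")
    # Prior SOTA 0.157 vs. this model's unrounded 0.1246; prints 20.6%
    print(f"Relative WER reduction: {relative_reduction(0.157, 0.1246):.1%}")
```

The unrounded result is about 20.6%, which the headline rounds to 21%.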
## Quick start

```python
import nemo.collections.asr as nemo_asr

# Download the checkpoint from the Hugging Face Hub and restore it.
model = nemo_asr.models.ASRModel.from_pretrained("twangodev/rasr-parakeet-v1")
# Input audio should be 16 kHz mono.
result = model.transcribe(["atc_clip.wav"])
print(result[0].text)
```

Or via the rasr eval toolkit:

```bash
pip install rasr
rasr eval run \
  -m nemo:hf://twangodev/rasr-parakeet-v1 \
  -d hf:jlvdoorn/atco2-asr:validation \
  --language en --batch-size 16
```

## Architecture

- **Base**: [`nvidia/parakeet-tdt-0.6b-v3`](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) (FastConformer encoder + TDT decoder, 0.6B params)
- **Tokenizer**: kept from the base model (SentencePiece BPE, 8192 tokens, multilingual)
- **Sample rate**: 16 kHz mono
- **Max input duration**: 18 seconds, the same limit used for training clips. Longer inputs may degrade accuracy, and TDT joint-network memory grows with input length.

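Because clips longer than 18 s fall outside what the model saw in training, longer recordings can be split before being passed to `model.transcribe()`. The sketch below checks that a WAV file is 16 kHz mono 16-bit PCM using Python's standard `wave` module and writes fixed-length chunks. The function names, the optional overlap, and the fixed-window strategy are illustrative assumptions, not part of rasr or NeMo; fixed cuts can split a word in half, so cutting on silence (for example with a voice activity detector) is usually preferable in practice.

```python
import wave

SAMPLE_RATE = 16_000  # the model expects 16 kHz mono audio
MAX_SECONDS = 18.0    # training clip limit stated on this card


def check_wav_format(path) -> int:
    """Return the frame count if `path` is 16 kHz mono 16-bit PCM, else raise."""
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != SAMPLE_RATE:
            raise ValueError(f"expected {SAMPLE_RATE} Hz, got {wav.getframerate()} Hz")
        if wav.getnchannels() != 1:
            raise ValueError(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM samples")
        return wav.getnframes()


def chunk_bounds(num_samples: int, sample_rate: int = SAMPLE_RATE,
                 max_seconds: float = MAX_SECONDS,
                 overlap_seconds: float = 0.0) -> list[tuple[int, int]]:
    """Split [0, num_samples) into (start, end) windows of at most `max_seconds`."""
    window = int(max_seconds * sample_rate)
    step = window - int(overlap_seconds * sample_rate)
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    bounds = []
    start = 0
    while start < num_samples:
        end = min(start + window, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += step
    return bounds


def write_chunks(path: str, prefix: str) -> list[str]:
    """Write each chunk of `path` to `<prefix>_<n>.wav` and return the new paths."""
    num_frames = check_wav_format(path)
    out_paths = []
    with wave.open(path, "rb") as src:
        for n, (start, end) in enumerate(chunk_bounds(num_frames)):
            src.setpos(start)  # seek to the chunk's first frame
            frames = src.readframes(end - start)
            out_path = f"{prefix}_{n}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setnchannels(1)
                dst.setsampwidth(2)
                dst.setframerate(SAMPLE_RATE)
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

The returned paths can be passed to `model.transcribe()` as a list; joining the per-chunk texts gives a rough full transcript.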
## Training data

**Most of the training audio is synthetic: its transcripts were generated by Llama 3.2 and its audio was synthesized with the Tada TTS pipeline.** A small amount of real ATC audio was mixed in as an anchor. Specifically:

| Source | Type | Role |
|---|---|---|
| [`twangodev/radiotalk-us-audio-tada-noisy`](https://huggingface.co/datasets/twangodev/radiotalk-us-audio-tada-noisy) (200k subset) | Synthetic US ATC | Bulk training audio. Dialogue transcripts generated by **Llama 3.2**; audio synthesized by [Tada](https://github.com/twangodev/tada) (TTS) and passed through a VHF channel-degradation pipeline. |
| [`jlvdoorn/atco2-asr`](https://huggingface.co/datasets/jlvdoorn/atco2-asr) (train split, ~446 clips) | Real European ATC | Real-data anchor; upweighted 10× to supply real-radio acoustic priors and European operator vocabulary. |
| [`jlvdoorn/atco2-asr-atcosim`](https://huggingface.co/datasets/jlvdoorn/atco2-asr-atcosim) (train split, ~10k clips) | Real EU ATC + simulator | Real-data anchor; upweighted 10×. |

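The 10× upweighting shifts the mixture far more than the raw clip counts suggest. Assuming each weight acts as a repeat factor in the concatenated manifest (the training recipe describes the mixing as a weighted manifest concat), and using the approximate clip counts from the table, each source's share of training samples can be estimated as below. The `~10k` ATCOSIM count is rounded, so the shares are approximate; the names in the sketch are hypothetical.

```python
# Approximate clip counts from the table above, paired with their mixing weights.
# Assumption: a weight of N means that source's manifest is repeated N times.
SOURCES = {
    "radiotalk-us-audio-tada-noisy": (200_000, 1),
    "atco2-asr (train)": (446, 10),
    "atco2-asr-atcosim (train)": (10_000, 10),  # "~10k" in the table
}


def mixture_shares(sources: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Fraction of the weighted manifest contributed by each source."""
    weighted = {name: clips * weight for name, (clips, weight) in sources.items()}
    total = sum(weighted.values())
    return {name: count / total for name, count in weighted.items()}


if __name__ == "__main__":
    for name, share in mixture_shares(SOURCES).items():
        print(f"{name:32s} {share:6.1%}")
```

Under these assumptions, roughly two thirds of training samples are synthetic and one third real (about 66% / 1.5% / 33%), even though real clips make up only about 5% of the unique audio.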
### Llama 3.2 attribution

This model is "Built with Llama" under the [Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE). Llama 3.2 generated the ATC dialogue transcripts in the `radiotalk-us-audio-tada-noisy` dataset; those transcripts are the supervised targets the model learned to produce. The audio itself was synthesized by Tada, not by Llama.

## Training recipe

Full reproducible recipe: [`configs/train/rtx6kpro/parakeet-mixed.yaml`](https://github.com/twangodev/rasr/blob/main/configs/train/rtx6kpro/parakeet-mixed.yaml).

| Hyperparameter | Value |
|---|---|
| Optimizer | AdamW, β=(0.9, 0.98), weight_decay=1e-3 |
| Learning rate | 1e-4 |
| Schedule | CosineAnnealing, warmup 5000 steps, min_lr=1e-6 |
| Batch size | 32 (effective) |
| Precision | bf16-mixed |
| Max steps | 50,000 |
| Augmentation | SpecAugment (default), speed perturb 0.95-1.05 |
| Max audio duration | 18.0 s |
| Mixing | weighted manifest concat (radiotalk ×1, ATCO2 train ×10, ATCO2+ATCOSIM train ×10) |
| Hardware | NVIDIA RTX PRO 6000 Blackwell (96 GB) |
| Wall clock | ~12 hours |

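The schedule row can be read as a linear warmup to the 1e-4 peak over the first 5,000 steps, followed by cosine decay to `min_lr` at step 50,000. The function below is a minimal sketch of that shape under this reading. NeMo's `CosineAnnealing` scheduler may differ in off-by-one details (for example, exactly how the warmup fraction is computed), so treat this as an illustration of the curve rather than a reproduction of the scheduler; `lr_at` is a hypothetical name.

```python
import math

PEAK_LR = 1e-4
MIN_LR = 1e-6
WARMUP_STEPS = 5_000
MAX_STEPS = 50_000


def lr_at(step: int, peak: float = PEAK_LR, min_lr: float = MIN_LR,
          warmup: int = WARMUP_STEPS, max_steps: int = MAX_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then cosine decay to `min_lr`."""
    if step < warmup:
        # Ramp linearly from 0 toward the peak during warmup.
        return peak * step / warmup
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup) / (max_steps - warmup))
    return min_lr + 0.5 * (peak - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 2_500, 5_000, 27_500, 50_000):
        print(f"step {step:>6}: lr = {lr_at(step):.2e}")
```

Halfway through the decay phase (step 27,500) the rate sits midway between the peak and `min_lr`, about 5.05e-5.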
## Strengths

- **Structurally robust ATC output.** Position-call grammar (CTAF and towered), runway IDs, headings, and altitude readbacks are recovered cleanly.
- **Strong on numeric/safety-critical content.** Per-utterance numeric WER is 0.050 on ATCO2 val versus 0.074 for the prior SOTA on the same axis, about 32% lower.
- **Stable on out-of-distribution audio.** No runaway hallucinations were observed on real US GA audio (TartanAviation KBTP), unlike LLM-decoder ASR models (e.g., Canary-Qwen, Granite Speech), which confabulate confidently on hard audio.
- **Small footprint.** 0.6B parameters; fits in 4 GB of VRAM at inference and runs roughly 10× faster than larger Whisper-based ATC fine-tunes.

## Limitations

This model was trained on a US-style synthetic corpus plus a European real-data anchor. This combination produces specific biases that users should be aware of:

1. **Operator substitution bias.** The model has been observed replacing unfamiliar callsigns with familiar ones from its training distribution, e.g., emitting "Lufthansa" or "Delta" where the audio contained a less common operator. This is particularly noticeable on US general aviation (GA) traffic, where N-number tail callsigns (e.g., "Cessna Eight One Niner Charlie Mike") may be replaced with major-airline prefixes.

2. **Limited US GA airport name coverage.** Most small US GA airport names were not seen during training. On real US GA audio (e.g., TartanAviation KBTP recordings), the model produces phonetically similar substitutions for the airport name ("Bravo Traffic", "Bello Traffic") instead of the correct name ("Butler Traffic").

3. **European real-anchor contamination on US output.** Training included real European ATCO2/ATCOSIM data to anchor the training distribution to real radio audio and to unblock the SOTA result on ATCO2 val. This European prior is visible in US-style transcription as occasional "Swiss", "Bern Tower", or "Belfast Tower" tokens that should not appear.

4. **Sanity rate on real US GA audio: 77%** (10% CLEAN + 67% PLAUSIBLE-MISHEARD across 69 TartanAviation KBTP clips), where the sanity rate is the share of clips rated either CLEAN or PLAUSIBLE-MISHEARD. Among the imperfect transcripts, the failure is overwhelmingly a wrong word substituted into the correct slot (e.g., a wrong airport name in an otherwise well-formed call), not garbling or hallucination.

5. **Evaluation distribution.** This model is benchmarked on ATCO2 (real European ATC). It has not been evaluated on a US ATC benchmark, because no fully public, annotated US ATC ASR test set currently exists.

## Recommended usage

- **For European ATC** (or audio matching the ATCO2-style distribution): deploy as-is. The numbers above are the expected performance.
- **For US ATC**: use **inference-time hot-word biasing** against a known callsign and airport-name vocabulary specific to the deployment region. NeMo's TDT decoder supports hot-word biasing via `change_decoding_strategy()`. Most substitution failures collapse to correct output with appropriate biasing.
- **For safety-critical applications**: always layer confidence-based rejection on top. This model is intended as a research/development checkpoint, not as a safety-certified ATC transcription system.

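Confidence-based rejection can be as simple as refusing to pass on any utterance that contains a low-confidence word. The sketch below assumes you already have per-word confidence scores in [0, 1]; NeMo offers word-level confidence estimation as a decoding option, but obtaining the scores is outside this sketch. The `ScoredWord` type, the `gate_transcript` function, the 0.8 threshold, and the stricter bar for safety-critical words are all hypothetical choices that should be tuned on held-out audio from the deployment site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredWord:
    """One transcribed word with a confidence score in [0, 1] (hypothetical type)."""
    text: str
    confidence: float


def gate_transcript(words: list[ScoredWord], threshold: float = 0.8,
                    critical: frozenset[str] = frozenset()) -> tuple[bool, list[str]]:
    """Accept a transcript only if every word clears its confidence bar.

    Words listed in `critical` (e.g. digits, "runway") must clear a stricter
    bar halfway between `threshold` and 1.0. Returns (accepted, flagged_words).
    """
    strict = threshold + (1.0 - threshold) / 2
    flagged = []
    for word in words:
        bar = strict if word.text.lower() in critical else threshold
        if word.confidence < bar:
            flagged.append(word.text)
    return (not flagged, flagged)


if __name__ == "__main__":
    hyp = [ScoredWord("butler", 0.41), ScoredWord("traffic", 0.97),
           ScoredWord("runway", 0.95), ScoredWord("two", 0.88), ScoredWord("six", 0.93)]
    accepted, flagged = gate_transcript(hyp, critical=frozenset({"two", "six", "runway"}))
    print("accepted" if accepted else f"rejected, low-confidence words: {flagged}")
```

Rejected utterances would then go to a human or a fallback path rather than downstream automation; the stricter bar for numbers and runway words reflects that those slots carry the safety-critical content.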
## Citation

If you use this model, please cite the project and the underlying components:

```bibtex
@software{rasr,
  author = {Ding, James},
  title = {rasr: ATC ASR finetuning toolkit},
  url = {https://github.com/twangodev/rasr},
  year = {2026}
}
```

And the base model:

```bibtex
@misc{parakeet-tdt,
  author = {NVIDIA},
  title = {Parakeet-TDT-0.6B-v3},
  url = {https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3}
}
```

And Llama 3.2 (training transcripts):

```bibtex
@misc{llama3.2,
  author = {{Meta AI}},
  title = {Llama 3.2: Revolutionizing edge AI and vision with open, customizable models},
  year = {2024},
  url = {https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/}
}
```

## License

Released under the **[Llama 3.2 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/LICENSE)** ("Built with Llama"). This is the binding upstream license because the training transcripts were generated by Llama 3.2, and the resulting model is treated as a derivative work of Llama Materials for licensing purposes.

In addition to the Llama 3.2 terms, this model inherits attribution and use requirements from its other parents:

- **Parakeet-TDT-0.6B-v3** ([CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/), NVIDIA): base model
- **ATCO2 corpus** (CC-BY-4.0): real-data anchor (train split)
- **ATCOSIM corpus** (research use; see [source](https://www.spsc.tugraz.at/databases-and-tools/atcosim-air-traffic-control-simulation-speech-corpus.html)): real-data anchor
- **radiotalk-us-audio-tada-noisy** (Llama 3.2 Community License; transcripts generated by Llama 3.2, audio synthesized via Tada): synthetic training audio

To redistribute or deploy:

1. Include a copy of the Llama 3.2 Community License.
2. Display "Built with Llama" in your product, user interface, or about page.
3. Comply with the [Llama 3.2 Acceptable Use Policy](https://github.com/meta-llama/llama-models/blob/main/models/llama3_2/USE_POLICY.md).
4. If your service exceeds 700M monthly active users, request a separate commercial license from Meta.

This is not legal advice. If you are deploying this model commercially or at scale, consult a lawyer regarding the interaction of the upstream licenses.
rasr-parakeet-v1.nemo
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:32171df9b141665764153d522b93a2a282aa6836ee80158fe77ff4b6f67f189d
size 2509332480
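Because the `.nemo` checkpoint is stored through Git LFS, the repository itself only holds this pointer: the SHA-256 digest and byte size (about 2.5 GB) of the real file. A downloaded copy can be checked against the pointer with a streaming hash. The sketch below uses only Python's standard library; the `verify_lfs_object` name and the local file path are hypothetical.

```python
import hashlib
import os

EXPECTED_OID = "32171df9b141665764153d522b93a2a282aa6836ee80158fe77ff4b6f67f189d"
EXPECTED_SIZE = 2_509_332_480  # bytes, from the LFS pointer


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB blocks so multi-GB checkpoints never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_lfs_object(path: str, oid: str = EXPECTED_OID,
                      size: int = EXPECTED_SIZE) -> bool:
    """True if `path` matches the pointer's byte size and SHA-256 digest."""
    # Cheap size check first; hashing 2.5 GB takes a while.
    if os.path.getsize(path) != size:
        return False
    return sha256_of(path) == oid


if __name__ == "__main__":
    local_path = "rasr-parakeet-v1.nemo"  # hypothetical download location
    if os.path.exists(local_path):
        print("OK" if verify_lfs_object(local_path) else "MISMATCH")
    else:
        print(f"{local_path} not found")
```

A mismatch usually means an interrupted download, or that the 3-line pointer file was fetched instead of the LFS object.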
training_recipe.yaml
ADDED
@@ -0,0 +1,36 @@
# 50k-step mixed run. Radiotalk synthetic + real ATCO2/ATCOSIM upweighted
# 10x to anchor distribution and supply the European/GA callsigns radiotalk
# doesn't cover. Target: meaningfully close the gap vs jlvdoorn's 0.157 WER
# on ATCO2 val. Expected wall clock: ~10-12 hours on the 6000 Pro
# (includes ~1 hr to dump the additional 100k radiotalk WAVs).

defaults: [base, rtx6kpro/hw]

name: parakeet-mixed

model:
  scheme: parakeet
  ref: nvidia/parakeet-tdt-0.6b-v3

data:
  train:
    - dataset: hf:twangodev/radiotalk-us-audio-tada-noisy:train
      weight: 1.0
      limit: 200000  # 2x the radiotalk-100k cache; remove when Lhotse lands
    - dataset: hf:jlvdoorn/atco2-asr:train
      weight: 10.0  # upweight real ATC 10x; small but anchors distribution
    - dataset: hf:jlvdoorn/atco2-asr-atcosim:train
      weight: 10.0
  validation:
    - dataset: hf:jlvdoorn/atco2-asr:validation

augmentation:
  noise:
    enabled: false  # leaving off until a noise corpus is wired up

trainer:
  max_steps: 50000
  val_check_interval: 2000

output:
  dir: ckpt/${name}
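The trainer settings translate into a rough training budget. Assuming the 32-sample effective batch from the README's hyperparameter table (this file inherits batch settings from `defaults`, so that value is not visible here) and an estimated weighted-manifest size of about 304k entries (clip counts times weights, treating weights as repeat factors), 50,000 steps correspond to about 1.6M samples, a little over five passes over the weighted manifest, with validation every 2,000 steps (25 checks). The sketch only makes that arithmetic explicit; the manifest size is an estimate, not a value from the recipe, and `training_budget` is a hypothetical helper.

```python
MAX_STEPS = 50_000
VAL_CHECK_INTERVAL = 2_000
EFFECTIVE_BATCH = 32  # from the README's hyperparameter table; not set in this file

# Estimated weighted manifest size (assumption): clips x weight per source,
# using the approximate counts from the README's training-data table.
WEIGHTED_MANIFEST = 200_000 * 1 + 446 * 10 + 10_000 * 10


def training_budget(max_steps: int, batch: int, manifest_size: int,
                    val_interval: int) -> dict[str, float]:
    """Samples seen, passes over the weighted manifest, and validation runs."""
    samples = max_steps * batch
    return {
        "samples_seen": samples,
        "passes_over_manifest": samples / manifest_size,
        "validation_runs": max_steps // val_interval,
    }


if __name__ == "__main__":
    budget = training_budget(MAX_STEPS, EFFECTIVE_BATCH, WEIGHTED_MANIFEST,
                             VAL_CHECK_INTERVAL)
    for key, value in budget.items():
        print(f"{key:22s} {value:,.2f}")
```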