youtube-atc-fastconformer
A compact 115M-parameter FastConformer Hybrid RNNT-CTC model for automatic speech recognition in the air traffic control (ATC) domain, trained exclusively on pseudo-labeled data from YouTube recordings of virtual ATC simulator sessions (VATSIM/IVAO).
Overview
Automatic speech recognition for air traffic control suffers from a severe shortage of training data: operational recordings are restricted, and transcription requires costly domain experts. This model demonstrates that large-scale, pseudo-labeled data from publicly available YouTube streams can effectively train specialized ASR systems without any manually annotated operational data.
The model was trained on the youtube-atc dataset containing over 800 hours of content spanning 709 videos from virtual airports in 17 countries, covering ground, tower, approach, and en-route operational domains with diverse speaker accents. Training data was generated using an automated pipeline with speaker diarization, multi-model transcription, and LLM-based transcript fusion.
Usage
Load from Hugging Face
from nemo.collections.asr.models import EncDecHybridRNNTCTCModel
model = EncDecHybridRNNTCTCModel.from_pretrained("niclaswue/youtube-atc-fastconformer")
results = model.transcribe(["audio.wav"], batch_size=16)
for hyp in results:
    # Depending on the NeMo version, entries may be Hypothesis objects or plain strings.
    text = hyp.text if hasattr(hyp, "text") else hyp
    print(text)
Load from .nemo file
from nemo.collections.asr.models import EncDecHybridRNNTCTCModel
model = EncDecHybridRNNTCTCModel.restore_from("FastConformer-Hybrid-Transducer-CTC-Char.nemo")
results = model.transcribe(["audio.wav"], batch_size=16)
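Before calling transcribe, it can help to confirm what format the input audio is in. This card does not state the expected input format; NeMo ASR checkpoints are commonly trained on 16 kHz mono audio, so the target rate below is an assumption rather than a documented requirement. The helper names `describe_wav` and `matches_assumed_format` are hypothetical and use only the Python standard library.

```python
import wave

# Assumption: 16 kHz mono input. This value is not stated in the model card.
ASSUMED_SAMPLE_RATE = 16000


def describe_wav(path):
    """Read sample rate, channel count, and duration from a PCM WAV header."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return {
        "sample_rate": rate,
        "channels": channels,
        "duration_s": frames / rate,
    }


def matches_assumed_format(path, sample_rate=ASSUMED_SAMPLE_RATE):
    """Return True if the file is mono and at the assumed sample rate."""
    info = describe_wav(path)
    return info["sample_rate"] == sample_rate and info["channels"] == 1
```

A file that fails this check is not necessarily unusable, since the loading pipeline may resample on its own, but the check makes a mismatch visible before results are compared across recordings.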
Training
Trained using the NVIDIA NeMo framework (v2.5.0) with character-level tokenization (a-z, space, apostrophe) on the youtube-atc dataset.
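Because the output vocabulary is limited to a-z, space, and apostrophe, reference transcripts generally need to be mapped into the same character set before they are compared with model output. The following is a minimal sketch of such a mapping, not the authors' normalization pipeline. The function name `normalize_transcript` is hypothetical, and the handling of accents, digits, and punctuation is an assumption.

```python
import re
import unicodedata

# Vocabulary taken from the card: a-z, space, apostrophe.
_DISALLOWED = re.compile(r"[^a-z' ]+")
_WHITESPACE = re.compile(r"\s+")


def normalize_transcript(text: str) -> str:
    """Map free text onto the character set a-z, space, apostrophe."""
    # Decompose accented characters and drop the combining marks ("ü" -> "u").
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    lowered = ascii_text.lower()
    # Assumption: every out-of-vocabulary run (digits, punctuation) becomes a space.
    cleaned = _DISALLOWED.sub(" ", lowered)
    # Collapse repeated whitespace and trim the ends.
    return _WHITESPACE.sub(" ", cleaned).strip()


print(normalize_transcript("Lufthansa Four-Five, descend!"))
```

Note that this sketch drops digits instead of spelling them out, so a transcript containing "350" would lose that token. A real evaluation setup would probably convert numbers to words first.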
Dataset
The training data was collected from publicly available YouTube streams of virtual ATC simulator sessions. The full data collection pipeline and curated video collection are available at: github.com/niclaswue/youtube-atc
Citation
@inproceedings{dlr219501,
  title = {Can YouTube Stream Recordings Improve Automatic Speech Recognition for Air Traffic Control?},
  year = {2025},
  booktitle = {13th OpenSky Symposium},
  author = {W{\"u}stenbecker, Niclas and Ohneiser, Oliver and Kleinert, Matthias},
  month = {November},
  url = {https://elib.dlr.de/219501/},
  keywords = {Air Traffic Control; Automatic Speech Recognition; Public Dataset; Large Language Model;}
}
License
MIT