TATA: Teach, Align, Transduce, Adapt

A FastConformer-RNN-T model (32M parameters) for Egyptian Arabic speech recognition with an integrated speaker diarization pipeline. Developed for the MTC-AIC II competition, where it placed 2nd with a Mean Levenshtein Distance of 9.59.

Quick Start

Installation

conda install -c conda-forge llvmlite numba
pip install "tata-asr @ git+https://huggingface.co/yousefkotp/TATA-egyptian-arabic-asr-diarization"

ASR Only (3 lines)

from tata import TATA

model = TATA()
text = model.transcribe("audio.wav")
print(text)

ASR + Speaker Diarization (4 lines)

from tata import TATA

model = TATA()
segments = model.transcribe("audio.wav", diarize=True)
for seg in segments:
    print(f"[Speaker {seg.speaker}] {seg.start:.1f}s - {seg.end:.1f}s: {seg.text}")

Model weights and tokenizer are downloaded automatically on first use.

Architecture

| Component          | Details                                             |
|--------------------|-----------------------------------------------------|
| ASR Encoder        | FastConformer, 16 layers, d_model=256, 4 heads      |
| ASR Decoder        | RNN-T with pred_hidden=640                          |
| Tokenizer          | BPE, vocab_size=256, trained on Egyptian Arabic     |
| VAD                | MarbleNet (multilingual)                            |
| Speaker Embeddings | TitaNet-Large + ECAPA-TDNN (concatenated, 384-dim)  |
| Clustering         | Agglomerative Hierarchical Clustering               |
| Source Separation  | Demucs (htdemucs, vocal isolation)                  |
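The clustering stage can be illustrated with a minimal sketch: per-segment embeddings from the two speaker models are concatenated into 384-dim vectors, L2-normalized, and grouped by average-linkage agglomerative clustering over cosine distance. The function names and the 0.5 distance threshold here are hypothetical illustrations, not the project's actual implementation.

```python
# Minimal sketch of the speaker-clustering stage (illustrative, not the
# project's code): concatenate TitaNet + ECAPA embeddings, L2-normalize,
# then merge clusters by average-linkage cosine distance.
import numpy as np

def combine_embeddings(titanet_emb: np.ndarray, ecapa_emb: np.ndarray) -> np.ndarray:
    """Concatenate two (N, 192) embedding sets into (N, 384) and L2-normalize rows."""
    combined = np.concatenate([titanet_emb, ecapa_emb], axis=1)
    norms = np.linalg.norm(combined, axis=1, keepdims=True)
    return combined / np.clip(norms, 1e-8, None)

def agglomerative_cluster(embs: np.ndarray, threshold: float = 0.5) -> list[int]:
    """Average-linkage agglomerative clustering over cosine distance.
    Merges the closest pair of clusters until none is closer than `threshold`."""
    clusters = [[i] for i in range(len(embs))]
    while len(clusters) > 1:
        best = None  # (distance, cluster_a, cluster_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise cosine similarity (rows are unit-norm).
                sims = embs[clusters[a]] @ embs[clusters[b]].T
                dist = 1.0 - sims.mean()
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        if best[0] > threshold:
            break  # no sufficiently close pair left
        _, a, b = best
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(embs)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels
```

In practice the threshold (or an estimated speaker count) governs how many speakers the pipeline emits; NeMo's diarizer exposes this as a tunable parameter.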

Training

Trained from scratch using a four-stage curriculum, which gives the model its name:

  1. Teach -- CTC pretraining on LLM-generated synthetic speech
  2. Align -- CTC fine-tuning on real Egyptian Arabic data
  3. Transduce -- RNN-T training with encoder-only transfer from CTC
  4. Adapt -- Domain adaptation on competition-specific data

Hardware: Single NVIDIA P100 (16 GB), FP32 precision.
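The encoder-only transfer in stage 3 (Transduce) can be sketched as a state-dict filter: encoder weights come from the stage-2 CTC checkpoint, while the RNN-T prediction and joint networks stay at their fresh initialization. The key names (`encoder.`, `prediction.`, `joint.`) are assumptions for illustration, not the project's actual checkpoint layout.

```python
# Illustrative sketch (not the project's training code) of encoder-only
# transfer: copy only `encoder.*` parameters from a CTC checkpoint into a
# freshly initialized RNN-T state dict.
def transfer_encoder_weights(ctc_state: dict, rnnt_state: dict) -> dict:
    """Return a new RNN-T state dict whose encoder weights are taken from
    the CTC checkpoint; all other parameters keep their fresh values."""
    merged = dict(rnnt_state)
    for name, tensor in ctc_state.items():
        # Only keys present in both models and under the encoder are copied;
        # the CTC head is discarded, the RNN-T decoder is left untouched.
        if name.startswith("encoder.") and name in merged:
            merged[name] = tensor
    return merged
```

NeMo exposes an equivalent mechanism for initializing a new model from a pretrained checkpoint, so in practice this filtering is typically done through the framework rather than by hand.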

Citation

@misc{tata2025,
  title={Fake It, Then Make It: Synthetic-to-Real Training for Egyptian Arabic ASR with Diarization},
  author={Kotp, Yousef and Alaa, Karim and El-nenaey, Abdelrahman and Barakat, Rana and Zahran, Loaui and El Yamany, Ismael},
  year={2025},
  institution={Alexandria University}
}

License

Apache 2.0
