library_name: mlx
license: mit
language:
  - ru
  - en
tags:
  - automatic-speech-recognition
  - mlx
  - apple-silicon
  - russian
  - gigaam
  - conformer
  - rnnt
base_model: ai-sage/GigaAM-v3
pipeline_tag: automatic-speech-recognition

GigaAM v3 e2e RNNT — MLX

An MLX port of the GigaAM-v3 RNNT variant for Apple Silicon. It gives higher transcription quality than the CTC variant and runs at about 77x realtime on an M2 Max.

Usage

pip install git+https://github.com/aystream/gigaam-mlx.git

from gigaam_mlx import load_model, transcribe

model, tokenizer = load_model("rnnt")  # downloads automatically
text = transcribe(model, tokenizer, "recording.wav")

Or via CLI:

gigaam-mlx recording.wav --model-type rnnt
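
For more than one recording, the two calls shown above can be wrapped in a small loop. The following is a minimal sketch, not part of the package: `transcribe_folder` and `run_with_gigaam` are hypothetical helper names, and the only library calls used are `load_model("rnnt")` and `transcribe(model, tokenizer, path)` as they appear in the usage snippet. The transcription step is passed in as a callable so the loop can be tried without loading a model.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(
    folder: str,
    transcribe_one: Callable[[str], str],
    pattern: str = "*.wav",
) -> Dict[str, str]:
    """Apply transcribe_one to every file in folder matching pattern.

    Hypothetical helper: returns {file name: transcript}, in sorted file order.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe_one(str(path))
    return results


def run_with_gigaam(folder: str) -> Dict[str, str]:
    """Hypothetical wiring to gigaam_mlx, using only the calls from the usage snippet."""
    # Imported lazily so the helper above works without the package installed.
    from gigaam_mlx import load_model, transcribe

    # Load the model once and reuse it for every file.
    model, tokenizer = load_model("rnnt")
    return transcribe_folder(folder, lambda p: transcribe(model, tokenizer, p))
```

Calling `run_with_gigaam("recordings")` would then return one transcript per `.wav` file in that folder; the folder name here is only an example.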

CTC vs RNNT

| Variant | Time per 20 s chunk (speed vs realtime) | Quality | Time for full 18-min video |
|---|---|---|---|
| CTC | 0.06 s (~330x) | Good | 21.5 s |
| RNNT (this) | 0.26 s (~77x) | Better | 25.0 s |
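
The "x" figures in the table are realtime factors, meaning audio duration divided by processing time. The short sketch below reproduces them from the table's own numbers. `realtime_factor` is a hypothetical helper name, and the full-video rows assume "18-min" means exactly 1080 seconds of audio.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Per-chunk figures from the table (20 s of audio per chunk).
ctc_chunk = realtime_factor(20, 0.06)    # about 333, reported as ~330x
rnnt_chunk = realtime_factor(20, 0.26)   # about 77

# Full-video figures, assuming the video is exactly 18 * 60 = 1080 s long.
ctc_full = realtime_factor(18 * 60, 21.5)   # about 50
rnnt_full = realtime_factor(18 * 60, 25.0)  # about 43

print(f"CTC:  {ctc_chunk:.0f}x per chunk, {ctc_full:.0f}x end to end")
print(f"RNNT: {rnnt_chunk:.0f}x per chunk, {rnnt_full:.0f}x end to end")
```

Under that assumption, the end-to-end gap between the two variants (21.5 s vs 25.0 s) is much smaller than the per-chunk gap.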

Links