metadata
library_name: mlx
license: mit
language:
- ru
- en
tags:
- automatic-speech-recognition
- mlx
- apple-silicon
- russian
- gigaam
- conformer
- rnnt
base_model: ai-sage/GigaAM-v3
pipeline_tag: automatic-speech-recognition
GigaAM v3 e2e RNNT — MLX
MLX port of the GigaAM-v3 RNNT variant for Apple Silicon. It gives higher transcription quality than the CTC variant and runs at roughly 77x realtime on an M2 Max (about 0.26 s per 20 s chunk).
Usage
pip install git+https://github.com/aystream/gigaam-mlx.git
from gigaam_mlx import load_model, transcribe
model, tokenizer = load_model("rnnt") # downloads automatically
text = transcribe(model, tokenizer, "recording.wav")
Or via CLI:
gigaam-mlx recording.wav --model-type rnnt
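To transcribe several recordings, one option is to load the model once and reuse it across files. The helper below is a minimal sketch, not part of `gigaam_mlx`: `transcribe_many` and its `transcribe_one` parameter are hypothetical names introduced here. It assumes only the `load_model` and `transcribe` calls shown above.

```python
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    transcribe_one: Callable[[str], str],
) -> Dict[str, str]:
    """Apply a single-file transcription callable to each path.

    `transcribe_one` is a hypothetical hook: any function that maps an
    audio path to its transcript. Results are keyed by path.
    """
    results: Dict[str, str] = {}
    for path in paths:
        results[path] = transcribe_one(path)
    return results


# Possible wiring with the package from this README (requires gigaam_mlx
# and an Apple Silicon machine, so it is left commented out here):
#
# from gigaam_mlx import load_model, transcribe
# model, tokenizer = load_model("rnnt")  # load once, reuse for every file
# texts = transcribe_many(
#     ["a.wav", "b.wav"],
#     lambda p: transcribe(model, tokenizer, p),
# )
```

Passing the transcription step in as a callable keeps the model load outside the loop and lets the helper be exercised with a stub on machines without MLX.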
CTC vs RNNT
| Variant | Time per 20 s chunk (realtime factor) | Quality | Time for full 18-min video |
|---|---|---|---|
| CTC | 0.06s (~330x) | Good | 21.5s |
| RNNT (this) | 0.26s (~77x) | Better | 25.0s |
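The realtime factors in the table follow from dividing audio duration by processing time. The short sketch below reproduces that arithmetic from the table's own numbers. `realtime_factor` is a hypothetical helper name, and the 18-minute video is taken as exactly 1080 s, which is an assumption.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    return audio_seconds / processing_seconds


# Per-chunk figures from the table (20 s of audio per chunk).
ctc_chunk = realtime_factor(20.0, 0.06)
rnnt_chunk = realtime_factor(20.0, 0.26)

# End-to-end figures, assuming the video is exactly 18 * 60 = 1080 s long.
video_seconds = 18 * 60
ctc_full = realtime_factor(video_seconds, 21.5)
rnnt_full = realtime_factor(video_seconds, 25.0)

print(f"CTC  per chunk: {ctc_chunk:.0f}x, full video: {ctc_full:.1f}x")
print(f"RNNT per chunk: {rnnt_chunk:.0f}x, full video: {rnnt_full:.1f}x")
```

Under that assumption the end-to-end factors come out lower than the per-chunk ones, so the per-chunk numbers are best read as model-only speed rather than total wall-clock time.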
Links
- Code: github.com/aystream/gigaam-mlx
- CTC variant: aystream/GigaAM-v3-e2e-ctc-mlx
- Original: salute-developers/GigaAM (paper)
- License: MIT