---
library_name: mlx
license: mit
language:
  - ru
  - en
tags:
  - automatic-speech-recognition
  - mlx
  - apple-silicon
  - russian
  - gigaam
  - conformer
  - rnnt
base_model: ai-sage/GigaAM-v3
pipeline_tag: automatic-speech-recognition
---

# GigaAM v3 e2e RNNT — MLX

MLX port of the RNNT variant of [GigaAM-v3](https://github.com/salute-developers/GigaAM) for Apple Silicon. It gives higher transcription quality than the CTC variant and runs at about 77x realtime on an M2 Max.

## Usage

```bash
pip install git+https://github.com/aystream/gigaam-mlx.git
```

```python
from gigaam_mlx import load_model, transcribe

model, tokenizer = load_model("rnnt")  # downloads automatically
text = transcribe(model, tokenizer, "recording.wav")
```

Or via CLI:

```bash
gigaam-mlx recording.wav --model-type rnnt
```
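
To transcribe several recordings in one run, the Python API above can be wrapped in a small loop. This is a minimal sketch, not part of the package: `transcribe_folder`, `main`, and the `recordings` folder name are hypothetical, and it assumes only the `load_model("rnnt")` and `transcribe(model, tokenizer, path)` calls shown in the usage example.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_folder(
    folder: str,
    transcribe_one: Callable[[str], str],
    pattern: str = "*.wav",
) -> Dict[str, str]:
    """Apply transcribe_one to every file in folder that matches pattern.

    Hypothetical helper: returns a mapping of file name to transcript,
    with files processed in sorted order.
    """
    results: Dict[str, str] = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcribe_one(str(path))
    return results


def main(folder: str = "recordings") -> None:
    # Imported here so the helper above can be reused without gigaam_mlx installed.
    from gigaam_mlx import load_model, transcribe

    # Load the model once and reuse it for every file.
    model, tokenizer = load_model("rnnt")
    texts = transcribe_folder(folder, lambda p: transcribe(model, tokenizer, p))
    for name, text in texts.items():
        print(f"{name}: {text}")
```

Calling `main("path/to/folder")` prints one transcript per `.wav` file. Passing the transcription step in as a callable keeps the folder loop independent of the model, so the same helper works with the CTC variant.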

## CTC vs RNNT

| Variant | Time per 20 s chunk (realtime factor) | Quality | Time for full 18-min video |
|---|---|---|---|
| [CTC](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx) | 0.06s (~330x) | Good | 21.5s |
| **RNNT (this)** | **0.26s (~77x)** | **Better** | **25.0s** |
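
The realtime factors in the table are audio duration divided by processing time. As a quick check of that arithmetic, the sketch below recomputes them from the timings in the table; the helper name `realtime_factor` is illustrative and not part of the package.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Timings taken from the table above.
ctc_chunk = realtime_factor(20, 0.06)        # 20 s chunk, CTC
rnnt_chunk = realtime_factor(20, 0.26)       # 20 s chunk, RNNT
ctc_full = realtime_factor(18 * 60, 21.5)    # 18-min video, CTC
rnnt_full = realtime_factor(18 * 60, 25.0)   # 18-min video, RNNT

print(f"CTC,  20 s chunk:   {ctc_chunk:.0f}x")   # 333x
print(f"RNNT, 20 s chunk:   {rnnt_chunk:.0f}x")  # 77x
print(f"CTC,  18-min video: {ctc_full:.0f}x")    # 50x
print(f"RNNT, 18-min video: {rnnt_full:.0f}x")   # 43x
```

The per-chunk results match the table's rounded figures (~330x and ~77x). On the full 18-minute video the two variants are much closer, at roughly 50x and 43x.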

## Links

- **Code:** [github.com/aystream/gigaam-mlx](https://github.com/aystream/gigaam-mlx)
- **CTC variant:** [aystream/GigaAM-v3-e2e-ctc-mlx](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx)
- **Original:** [salute-developers/GigaAM](https://github.com/salute-developers/GigaAM) ([paper](https://arxiv.org/abs/2506.01192))
- **License:** MIT