	---
	library_name: mlx
	license: mit
	language:
	- ru
	- en
	tags:
	- automatic-speech-recognition
	- mlx
	- apple-silicon
	- russian
	- gigaam
	- conformer
	- rnnt
	base_model: ai-sage/GigaAM-v3
	pipeline_tag: automatic-speech-recognition
	---

	# GigaAM v3 e2e RNNT — MLX

	MLX port of the RNNT variant of [GigaAM-v3](https://github.com/salute-developers/GigaAM) for Apple Silicon. It gives higher transcription quality than the CTC variant and runs at about 77x realtime on an M2 Max.

	## Usage

	```bash
	pip install git+https://github.com/aystream/gigaam-mlx.git
	```

	```python
	from gigaam_mlx import load_model, transcribe

	model, tokenizer = load_model("rnnt") # downloads automatically
	text = transcribe(model, tokenizer, "recording.wav")
	```
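
When several recordings need transcribing, the model can be loaded once and reused for every file. The following is a minimal sketch that assumes only the `load_model` and `transcribe` calls shown above; `transcribe_many` and `transcribe_one` are hypothetical names introduced for illustration and are not part of `gigaam_mlx`.

```python
from typing import Callable, Dict, Iterable


def transcribe_many(
    paths: Iterable[str],
    transcribe_one: Callable[[str], str],
) -> Dict[str, str]:
    """Apply a single-file transcription callable to each path.

    `transcribe_one` is a hypothetical hook: any function that maps an
    audio file path to its transcript. Results keep the input order.
    """
    results: Dict[str, str] = {}
    for path in paths:
        results[path] = transcribe_one(path)
    return results


# Intended use with this package (requires gigaam-mlx to be installed):
#
#   from gigaam_mlx import load_model, transcribe
#   model, tokenizer = load_model("rnnt")  # load once, reuse below
#   texts = transcribe_many(
#       ["a.wav", "b.wav"],
#       lambda p: transcribe(model, tokenizer, p),
#   )
```

Passing the transcription step in as a callable keeps the loop independent of the model type, so the same helper would work with the CTC variant.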

	Or via CLI:

	```bash
	gigaam-mlx recording.wav --model-type rnnt
	```

	## CTC vs RNNT

	| Variant | Speed (20 s chunk) | Quality | Full 18-min video |
	|---|---|---|---|
	| [CTC](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx) | 0.06 s (~330x) | Good | 21.5 s |
	| RNNT (this) | 0.26 s (~77x) | Better | 25.0 s |
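
The "x" figures in the table are realtime factors: audio duration divided by processing time. The sketch below reproduces that arithmetic from the table's numbers. The function name `realtime_factor` is an illustrative choice, and the full-video figures assume the video is exactly 18 minutes (1080 s), which the table gives only approximately.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Per-chunk figures from the table (20 s of audio per chunk).
ctc_chunk = realtime_factor(20.0, 0.06)    # about 333x, listed as ~330x
rnnt_chunk = realtime_factor(20.0, 0.26)   # about 77x

# End-to-end figures, assuming the video is exactly 18 * 60 = 1080 s long.
ctc_full = realtime_factor(18 * 60, 21.5)   # about 50x
rnnt_full = realtime_factor(18 * 60, 25.0)  # about 43x

print(f"CTC:  {ctc_chunk:.0f}x per chunk, {ctc_full:.0f}x end to end")
print(f"RNNT: {rnnt_chunk:.0f}x per chunk, {rnnt_full:.0f}x end to end")
```

Under that assumption, the end-to-end gap between the two variants (21.5 s vs 25.0 s) is much smaller than the per-chunk gap.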

	## Links

	- Code: [github.com/aystream/gigaam-mlx](https://github.com/aystream/gigaam-mlx)
	- CTC variant: [aystream/GigaAM-v3-e2e-ctc-mlx](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx)
	- Original: [salute-developers/GigaAM](https://github.com/salute-developers/GigaAM) ([paper](https://arxiv.org/abs/2506.01192))
	- License: MIT