mirror aystream/GigaAM-v3-e2e-rnnt-mlx@decfca492069 via mirror_to_hf.py
- README.md +54 -0
- UPSTREAM_SOURCE.md +26 -0
- config.json +32 -0
- tokenizer.model +3 -0
- weights.safetensors +3 -0
README.md
ADDED
@@ -0,0 +1,54 @@
---
library_name: mlx
license: mit
language:
- ru
- en
tags:
- automatic-speech-recognition
- mlx
- apple-silicon
- russian
- gigaam
- conformer
- rnnt
base_model: ai-sage/GigaAM-v3
pipeline_tag: automatic-speech-recognition
---

# GigaAM v3 e2e RNNT - MLX

MLX port of [GigaAM-v3](https://github.com/salute-developers/GigaAM) RNNT variant for Apple Silicon. Higher quality than CTC, ~77x realtime on M2 Max.

## Usage

```bash
pip install git+https://github.com/aystream/gigaam-mlx.git
```

```python
from gigaam_mlx import load_model, transcribe

model, tokenizer = load_model("rnnt")  # downloads automatically
text = transcribe(model, tokenizer, "recording.wav")
```

Or via CLI:

```bash
gigaam-mlx recording.wav --model-type rnnt
```
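
When several recordings need transcribing, it is probably cheaper to load the model once and reuse it for every file. The sketch below uses only the two calls shown in the usage snippet and assumes `transcribe` returns a string. The names `transcribe_all`, `main`, and the `recordings` folder are hypothetical, introduced here for illustration; they are not part of gigaam-mlx.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def transcribe_all(
    paths: Iterable[str], transcribe_one: Callable[[str], str]
) -> Dict[str, str]:
    """Apply transcribe_one to each path and key the results by file name."""
    results: Dict[str, str] = {}
    for path in paths:
        results[Path(path).name] = transcribe_one(str(path))
    return results


def main(folder: str = "recordings") -> None:
    # Imported lazily so the helper above can be used without gigaam_mlx installed.
    from gigaam_mlx import load_model, transcribe

    # Load once; every file reuses the same model and tokenizer.
    model, tokenizer = load_model("rnnt")
    wavs = sorted(str(p) for p in Path(folder).glob("*.wav"))
    texts = transcribe_all(wavs, lambda p: transcribe(model, tokenizer, p))
    for name, text in texts.items():
        print(f"{name}: {text}")
```

Calling `main()` on a folder of `.wav` files prints one line per recording. Passing the transcription step in as a callable keeps the loop independent of the model, so it can be exercised with a stub.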

## CTC vs RNNT

| Variant | Speed (20s chunk) | Quality | Full 18-min video |
|---|---|---|---|
| [CTC](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx) | 0.06s (~330x) | Good | 21.5s |
| **RNNT (this)** | **0.26s (~77x)** | **Better** | **25.0s** |
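
The "x" figures in the table read as realtime factors, that is, seconds of audio processed per second of compute. A small sketch of that arithmetic, using only the numbers in the table and assuming the "18-min video" is exactly 1080 seconds (`realtime_factor` is an illustrative helper, not part of gigaam-mlx):

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


CHUNK_SECONDS = 20.0       # "20s chunk" column
VIDEO_SECONDS = 18 * 60.0  # "18-min video", assumed to be exactly 1080 s

# Processing times copied from the table.
timings = {
    "ctc": {"chunk": 0.06, "video": 21.5},
    "rnnt": {"chunk": 0.26, "video": 25.0},
}

factors = {
    name: {
        "chunk": realtime_factor(CHUNK_SECONDS, t["chunk"]),
        "video": realtime_factor(VIDEO_SECONDS, t["video"]),
    }
    for name, t in timings.items()
}

for name, f in factors.items():
    print(f"{name}: chunk ~{f['chunk']:.0f}x, full video ~{f['video']:.0f}x")
```

Under that assumption the per-chunk factors come out near 333x (CTC) and 77x (RNNT), while the full-video factors are roughly 50x and 43x. The gap suggests the end-to-end timing includes work beyond the per-chunk model pass, although the README does not say what.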

## Links

- **Code:** [github.com/aystream/gigaam-mlx](https://github.com/aystream/gigaam-mlx)
- **CTC variant:** [aystream/GigaAM-v3-e2e-ctc-mlx](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx)
- **Original:** [salute-developers/GigaAM](https://github.com/salute-developers/GigaAM) ([paper](https://arxiv.org/abs/2506.01192))
- **License:** MIT
UPSTREAM_SOURCE.md
ADDED
@@ -0,0 +1,26 @@
# Upstream Source

This repository is a Voice Scribe **mirror** of an upstream model. The model
weights and code are unchanged from the upstream at the recorded revision.

| Field | Value |
| --- | --- |
| Upstream repo | `aystream/GigaAM-v3-e2e-rnnt-mlx` |
| Upstream revision (sha) | `decfca492069ea30fb5ead79c4516d50c16d93ea` |
| Mirror created | `2026-05-07 12:46:54 UTC` |
| Mirror slug | `gigaam-mlx` |
| Description | GigaAM v3 e2e RNN-T - native MLX / Metal package with punctuation. |

## Why mirror?

The shipping Voice Scribe installer pins every model to the `voice-scribe/*`
namespace for a single source of truth, integrity check, and future CDN
migration. Upstream repos retain their original license (see
`LICENSE*` / `README*` files preserved unchanged below).

## Maintenance

When upstream publishes a new revision we want to adopt, run the matching
Voice Scribe mirror script with `--only gigaam-mlx` from the repo root. The script
creates a new commit on this mirror that replaces the snapshot and updates
this `UPSTREAM_SOURCE.md`.
config.json
ADDED
@@ -0,0 +1,32 @@
{
  "model_type": "gigaam",
  "model_variant": "v3_e2e_ctc",
  "framework": "mlx",
  "encoder": {
    "feat_in": 64,
    "n_layers": 16,
    "d_model": 768,
    "n_heads": 16,
    "ff_expansion_factor": 4,
    "conv_kernel_size": 5,
    "subs_kernel_size": 5,
    "subsampling": "conv1d",
    "subsampling_factor": 4,
    "self_attention_model": "rotary",
    "rope_base": 5000
  },
  "head": {
    "type": "ctc",
    "num_classes": 257
  },
  "preprocessor": {
    "sample_rate": 16000,
    "n_mels": 64,
    "hop_length": 160,
    "win_length": 320,
    "n_fft": 320,
    "center": false
  },
  "tokenizer": "tokenizer.model",
  "total_parameters": 220879361
}
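
The `preprocessor` and `subsampling_factor` values above fix the time resolution of the model. A minimal sketch of that arithmetic follows, assuming the usual frame-count formula for an STFT without centre padding (`center: false`) and ignoring any padding inside the subsampling convolutions, so the encoder length is approximate. The helper names are illustrative and do not come from gigaam-mlx.

```python
# Values copied from config.json.
SAMPLE_RATE = 16000
HOP_LENGTH = 160
WIN_LENGTH = 320
N_FFT = 320
SUBSAMPLING_FACTOR = 4


def mel_frames(num_samples: int) -> int:
    """Frame count for an STFT with no centre padding."""
    if num_samples < N_FFT:
        return 0
    return 1 + (num_samples - N_FFT) // HOP_LENGTH


def encoder_steps(num_samples: int) -> int:
    """Approximate encoder length; ignores padding in the subsampling convs."""
    return mel_frames(num_samples) // SUBSAMPLING_FACTOR


hop_ms = 1000 * HOP_LENGTH / SAMPLE_RATE   # spacing between mel frames
win_ms = 1000 * WIN_LENGTH / SAMPLE_RATE   # analysis window length
step_ms = hop_ms * SUBSAMPLING_FACTOR      # spacing between encoder steps

samples_20s = 20 * SAMPLE_RATE
print(f"hop={hop_ms} ms, window={win_ms} ms, encoder step={step_ms} ms")
print(f"20 s chunk: {mel_frames(samples_20s)} mel frames, "
      f"about {encoder_steps(samples_20s)} encoder steps")
```

With these settings a mel frame covers 20 ms and advances every 10 ms, and each encoder step spans roughly 40 ms of audio.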
tokenizer.model
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:828c12c991019eef952a960661f25a92d6ad279591e2ea466b4aeddf1d20a18a
size 255336
weights.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bbd1074b2466baf1a301b7a6c8427cea6d21a20d2ce1e173e7e447692619117f
size 890094547
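
The two Git LFS pointers above record a SHA-256 digest and a byte size for each large file, which is enough to check a downloaded copy. The sketch below is one way to do that with the standard library only; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers, and this is not necessarily how the Voice Scribe installer performs its integrity check.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: str, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file's size and digest match the pointer."""
    file = Path(path)
    if file.stat().st_size != pointer["size"]:
        return False  # cheap check first, avoids hashing a truncated file
    hasher = hashlib.new(pointer["algo"])
    with file.open("rb") as fh:
        # Read in chunks so a large weights file never sits in memory whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the tokenizer.model entry.
TOKENIZER_POINTER = parse_lfs_pointer(
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:828c12c991019eef952a960661f25a92d6ad279591e2ea466b4aeddf1d20a18a\n"
    "size 255336\n"
)
```

A call such as `matches_pointer("gigaam-v3-e2e-rnnt-mlx/tokenizer.model", TOKENIZER_POINTER)` would then confirm a local copy, where the directory name is only an example of a download location.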