Andrewsab committed · Commit c470b3f · verified · 1 Parent(s): 45905bf

mirror aystream/GigaAM-v3-e2e-rnnt-mlx@decfca492069 via mirror_to_hf.py

Files changed (5)
  1. README.md +54 -0
  2. UPSTREAM_SOURCE.md +26 -0
  3. config.json +32 -0
  4. tokenizer.model +3 -0
  5. weights.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,54 @@
+ ---
+ library_name: mlx
+ license: mit
+ language:
+ - ru
+ - en
+ tags:
+ - automatic-speech-recognition
+ - mlx
+ - apple-silicon
+ - russian
+ - gigaam
+ - conformer
+ - rnnt
+ base_model: ai-sage/GigaAM-v3
+ pipeline_tag: automatic-speech-recognition
+ ---
+
+ # GigaAM v3 e2e RNNT — MLX
+
+ MLX port of [GigaAM-v3](https://github.com/salute-developers/GigaAM) RNNT variant for Apple Silicon. Higher quality than CTC, ~77x realtime on M2 Max.
+
+ ## Usage
+
+ ```bash
+ pip install git+https://github.com/aystream/gigaam-mlx.git
+ ```
+
+ ```python
+ from gigaam_mlx import load_model, transcribe
+
+ model, tokenizer = load_model("rnnt") # downloads automatically
+ text = transcribe(model, tokenizer, "recording.wav")
+ ```
+
+ Or via CLI:
+
+ ```bash
+ gigaam-mlx recording.wav --model-type rnnt
+ ```
+
+ ## CTC vs RNNT
+
+ | Variant | Speed (20s chunk) | Quality | Full 18-min video |
+ |---|---|---|---|
+ | [CTC](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx) | 0.06s (~330x) | Good | 21.5s |
+ | **RNNT (this)** | **0.26s (~77x)** | **Better** | **25.0s** |
+
+ ## Links
+
+ - **Code:** [github.com/aystream/gigaam-mlx](https://github.com/aystream/gigaam-mlx)
+ - **CTC variant:** [aystream/GigaAM-v3-e2e-ctc-mlx](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx)
+ - **Original:** [salute-developers/GigaAM](https://github.com/salute-developers/GigaAM) ([paper](https://arxiv.org/abs/2506.01192))
+ - **License:** MIT
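The "~77x" and "~330x" figures in the README's comparison table read as real-time factors, that is, audio duration divided by processing time. The sketch below recomputes them from the table's own timings. It assumes the "18-min video" is exactly 18 minutes long; the helper name `realtime_factor` is hypothetical and not part of `gigaam_mlx`.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Return how many seconds of audio are processed per second of compute."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


CHUNK_SECONDS = 20.0      # "20s chunk" column in the README table
VIDEO_SECONDS = 18 * 60   # assumption: the 18-min video is exactly 1080 s

# Timings copied from the README table (seconds of processing time).
timings = {
    "ctc": {"chunk": 0.06, "video": 21.5},
    "rnnt": {"chunk": 0.26, "video": 25.0},
}

for name, t in timings.items():
    chunk_x = realtime_factor(CHUNK_SECONDS, t["chunk"])
    video_x = realtime_factor(VIDEO_SECONDS, t["video"])
    print(f"{name}: chunk ~{chunk_x:.0f}x, full video ~{video_x:.0f}x")
```

Under that assumption the full-video timings work out to roughly 50x (CTC) and 43x (RNNT), well below the per-chunk factors. The README does not say what the additional end-to-end time covers.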
UPSTREAM_SOURCE.md ADDED
@@ -0,0 +1,26 @@
+ # Upstream Source
+
+ This repository is a Voice Scribe **mirror** of an upstream model. The model
+ weights and code are unchanged from the upstream at the recorded revision.
+
+ | Field | Value |
+ | --- | --- |
+ | Upstream repo | `aystream/GigaAM-v3-e2e-rnnt-mlx` |
+ | Upstream revision (sha) | `decfca492069ea30fb5ead79c4516d50c16d93ea` |
+ | Mirror created | `2026-05-07 12:46:54 UTC` |
+ | Mirror slug | `gigaam-mlx` |
+ | Description | GigaAM v3 e2e RNN-T - native MLX / Metal package with punctuation. |
+
+ ## Why mirror?
+
+ The shipping Voice Scribe installer pins every model to the `voice-scribe/*`
+ namespace for a single source of truth, integrity check, and future CDN
+ migration. Upstream repos retain their original license (see
+ `LICENSE*` / `README*` files preserved unchanged below).
+
+ ## Maintenance
+
+ When upstream publishes a new revision we want to adopt, run the matching
+ Voice Scribe mirror script with `--only gigaam-mlx` from the repo root. The script
+ creates a new commit on this mirror that replaces the snapshot and updates
+ this `UPSTREAM_SOURCE.md`.
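The provenance table in `UPSTREAM_SOURCE.md` is regular enough to read mechanically, for example to check that the short hash in the commit message matches the recorded revision. The following is a minimal sketch, not the mirror script's actual logic (`mirror_to_hf.py` is not shown in this commit); `parse_provenance` is a hypothetical helper.

```python
import re

# Table rows copied from UPSTREAM_SOURCE.md as added in this commit.
UPSTREAM_SOURCE_MD = """\
| Field | Value |
| --- | --- |
| Upstream repo | `aystream/GigaAM-v3-e2e-rnnt-mlx` |
| Upstream revision (sha) | `decfca492069ea30fb5ead79c4516d50c16d93ea` |
| Mirror created | `2026-05-07 12:46:54 UTC` |
| Mirror slug | `gigaam-mlx` |
"""


def parse_provenance(markdown: str) -> dict[str, str]:
    """Collect two-column markdown table rows into a field -> value dict."""
    fields = {}
    for line in markdown.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 2:
            continue  # not a two-column table row
        key, value = cells
        if key == "Field" or set(key) <= set("-: "):
            continue  # skip the header row and the separator row
        fields[key] = value.strip("`")
    return fields


provenance = parse_provenance(UPSTREAM_SOURCE_MD)
sha = provenance["Upstream revision (sha)"]

# A full Git commit hash here is 40 lowercase hex characters.
assert re.fullmatch(r"[0-9a-f]{40}", sha)

# The commit message references the same revision by its first 12 characters.
print(provenance["Upstream repo"], sha[:12])
```

The parsed repo and revision could then be passed to any downloader that accepts a pinned revision, such as `huggingface_hub.snapshot_download(repo_id=..., revision=...)`.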
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "model_type": "gigaam",
+   "model_variant": "v3_e2e_ctc",
+   "framework": "mlx",
+   "encoder": {
+     "feat_in": 64,
+     "n_layers": 16,
+     "d_model": 768,
+     "n_heads": 16,
+     "ff_expansion_factor": 4,
+     "conv_kernel_size": 5,
+     "subs_kernel_size": 5,
+     "subsampling": "conv1d",
+     "subsampling_factor": 4,
+     "self_attention_model": "rotary",
+     "rope_base": 5000
+   },
+   "head": {
+     "type": "ctc",
+     "num_classes": 257
+   },
+   "preprocessor": {
+     "sample_rate": 16000,
+     "n_mels": 64,
+     "hop_length": 160,
+     "win_length": 320,
+     "n_fft": 320,
+     "center": false
+   },
+   "tokenizer": "tokenizer.model",
+   "total_parameters": 220879361
+ }
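The preprocessor and encoder values in `config.json` determine the time resolution the model works at. As a rough sketch, the code below derives the frame step, window length, encoder step, and attention head size from those fields. It assumes conventional STFT framing with no padding when `center` is false, and that `subsampling_factor` multiplies the frame step. Neither assumption is confirmed by this commit, and `derived_quantities` is a hypothetical helper.

```python
import json

# Subset of config.json from this commit (only the fields used below).
CONFIG_JSON = """
{
  "encoder": {"d_model": 768, "n_heads": 16, "subsampling_factor": 4},
  "preprocessor": {"sample_rate": 16000, "hop_length": 160,
                   "win_length": 320, "n_fft": 320, "center": false}
}
"""


def derived_quantities(config: dict, audio_seconds: float) -> dict:
    """Derive timing and shape quantities implied by the config values."""
    enc = config["encoder"]
    pre = config["preprocessor"]
    sample_rate = pre["sample_rate"]
    hop = pre["hop_length"]
    n_fft = pre["n_fft"]

    n_samples = int(round(audio_seconds * sample_rate))
    # Assumption: without centering, a frame needs n_fft full samples,
    # so the frame count is 1 + floor((n_samples - n_fft) / hop).
    mel_frames = 1 + (n_samples - n_fft) // hop if n_samples >= n_fft else 0

    return {
        "hop_ms": 1000 * hop / sample_rate,
        "window_ms": 1000 * pre["win_length"] / sample_rate,
        # Assumption: the encoder emits one frame per subsampling_factor hops.
        "encoder_step_ms": 1000 * hop * enc["subsampling_factor"] / sample_rate,
        "head_dim": enc["d_model"] // enc["n_heads"],
        "mel_frames": mel_frames,
    }


quantities = derived_quantities(json.loads(CONFIG_JSON), audio_seconds=20.0)
for name, value in quantities.items():
    print(f"{name}: {value}")
```

For a 20-second chunk this gives a 10 ms hop, a 20 ms window, a 40 ms encoder step, 48 dimensions per attention head, and 1999 mel frames under the stated framing assumption.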
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:828c12c991019eef952a960661f25a92d6ad279591e2ea466b4aeddf1d20a18a
+ size 255336
weights.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bbd1074b2466baf1a301b7a6c8427cea6d21a20d2ce1e173e7e447692619117f
+ size 890094547
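Both `tokenizer.model` and `weights.safetensors` are committed as Git LFS pointers: three `key value` lines giving the spec version, a SHA-256 object id, and the size in bytes. A downloaded file can be checked against its pointer as sketched below. This is a minimal illustration, not part of the mirror tooling; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# Pointer for weights.safetensors, copied from this commit.
WEIGHTS_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bbd1074b2466baf1a301b7a6c8427cea6d21a20d2ce1e173e7e447692619117f
size 890094547
"""


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256 hex digest, size in bytes) from a Git LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.splitlines() if line.strip())
    algorithm, _, digest = fields["oid"].partition(":")
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("unsupported pointer version")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm}")
    return digest, int(fields["size"])


def matches_pointer(path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check a local file's size and SHA-256 digest against a pointer."""
    digest, size = parse_lfs_pointer(pointer_text)
    path = Path(path)
    if path.stat().st_size != size:
        return False  # cheap size check before hashing the whole file
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        # Hash in chunks so a large weights file is not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == digest


expected_digest, expected_size = parse_lfs_pointer(WEIGHTS_POINTER)
print(expected_digest[:12], expected_size)
```

Calling `matches_pointer("weights.safetensors", WEIGHTS_POINTER)` on a local copy would return `True` only if both the byte count and the digest agree with the pointer.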