mirror aystream/GigaAM-v3-e2e-rnnt-mlx@decfca492069 via mirror_to_hf.py

Files changed (5)

README.md ADDED

+---
+library_name: mlx
+license: mit
+language:
+  - ru
+  - en
+tags:
+  - automatic-speech-recognition
+  - mlx
+  - apple-silicon
+  - russian
+  - gigaam
+  - conformer
+  - rnnt
+base_model: ai-sage/GigaAM-v3
+pipeline_tag: automatic-speech-recognition
+---
+# GigaAM v3 e2e RNNT — MLX
+MLX port of the [GigaAM-v3](https://github.com/salute-developers/GigaAM) RNNT variant for Apple Silicon. It gives higher transcription quality than the CTC variant and runs at ~77x realtime on an M2 Max.
+## Usage
+```bash
+pip install git+https://github.com/aystream/gigaam-mlx.git
+```
+```python
+from gigaam_mlx import load_model, transcribe
+model, tokenizer = load_model("rnnt")  # downloads automatically
+text = transcribe(model, tokenizer, "recording.wav")
+```
+Or via CLI:
+```bash
+gigaam-mlx recording.wav --model-type rnnt
+```
+## CTC vs RNNT
+| Variant | Speed (20s chunk) | Quality | Full 18-min video |
+|---|---|---|---|
+| [CTC](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx) | 0.06s (~330x) | Good | 21.5s |
+| **RNNT (this)** | **0.26s (~77x)** | **Better** | **25.0s** |
+## Links
+- **Code:** [github.com/aystream/gigaam-mlx](https://github.com/aystream/gigaam-mlx)
+- **CTC variant:** [aystream/GigaAM-v3-e2e-ctc-mlx](https://huggingface.co/aystream/GigaAM-v3-e2e-ctc-mlx)
+- **Original:** [salute-developers/GigaAM](https://github.com/salute-developers/GigaAM) ([paper](https://arxiv.org/abs/2506.01192))
+- **License:** MIT
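
The speedup figures in the README's CTC vs RNNT table follow from dividing audio duration by processing time. The sketch below reproduces that arithmetic from the table's own numbers; the function name `realtime_factor` is hypothetical, and treating "18-min" as exactly 1080 seconds is an assumption.

```python
def realtime_factor(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio processed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Timings copied from the table: (20 s chunk, full 18-minute video), in seconds.
timings = {"ctc": (0.06, 21.5), "rnnt": (0.26, 25.0)}

for name, (chunk_s, full_s) in timings.items():
    chunk_rtf = realtime_factor(20.0, chunk_s)
    # Assumption: the video is exactly 18 minutes long.
    full_rtf = realtime_factor(18 * 60, full_s)
    print(f"{name}: {chunk_rtf:.0f}x per 20 s chunk, {full_rtf:.0f}x end to end")
```

On these numbers the end-to-end factors come out well below the per-chunk ones for both variants, which suggests the full-video timings include work beyond the encoder and decoder passes. The README does not say what that work is.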

UPSTREAM_SOURCE.md ADDED

+# Upstream Source
+This repository is a Voice Scribe **mirror** of an upstream model. The model
+weights and code are unchanged from the upstream at the recorded revision.
+| Field | Value |
+| --- | --- |
+| Upstream repo | `aystream/GigaAM-v3-e2e-rnnt-mlx` |
+| Upstream revision (sha) | `decfca492069ea30fb5ead79c4516d50c16d93ea` |
+| Mirror created | `2026-05-07 12:46:54 UTC` |
+| Mirror slug | `gigaam-mlx` |
+| Description | GigaAM v3 e2e RNN-T - native MLX / Metal package with punctuation. |
+## Why mirror?
+The shipping Voice Scribe installer pins every model to the `voice-scribe/*`
+namespace for a single source of truth, integrity check, and future CDN
+migration. Upstream repos retain their original license (see
+`LICENSE*` / `README*` files preserved unchanged below).
+## Maintenance
+When upstream publishes a new revision we want to adopt, run the matching
+Voice Scribe mirror script with `--only gigaam-mlx` from the repo root. The script
+creates a new commit on this mirror that replaces the snapshot and updates
+this `UPSTREAM_SOURCE.md`.
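
Because the provenance table in `UPSTREAM_SOURCE.md` is plain Markdown, a tool can read the pinned upstream repo and revision back out of it. This is a minimal sketch of such a reader, not the mirror script itself; `parse_field_table` is a hypothetical helper and assumes the two-column `Field | Value` layout shown above.

```python
def parse_field_table(markdown: str) -> dict[str, str]:
    """Collect rows of a two-column Markdown table into a dict."""
    fields = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2:
            continue
        key, value = cells
        # Skip the header row and the |---|---| separator row.
        if key == "Field" or set(key) <= set("-: "):
            continue
        fields[key] = value.strip("`")
    return fields


# Rows copied from the table above.
SAMPLE = """\
| Field | Value |
| --- | --- |
| Upstream repo | `aystream/GigaAM-v3-e2e-rnnt-mlx` |
| Upstream revision (sha) | `decfca492069ea30fb5ead79c4516d50c16d93ea` |
| Mirror slug | `gigaam-mlx` |
"""

info = parse_field_table(SAMPLE)
print(info["Upstream repo"], "@", info["Upstream revision (sha)"][:12])
```

The 12-character prefix printed here is the same short revision that appears in the commit title.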

config.json ADDED

+{
+  "model_type": "gigaam",
+  "model_variant": "v3_e2e_ctc",
+  "framework": "mlx",
+  "encoder": {
+    "feat_in": 64,
+    "n_layers": 16,
+    "d_model": 768,
+    "n_heads": 16,
+    "ff_expansion_factor": 4,
+    "conv_kernel_size": 5,
+    "subs_kernel_size": 5,
+    "subsampling": "conv1d",
+    "subsampling_factor": 4,
+    "self_attention_model": "rotary",
+    "rope_base": 5000
+  },
+  "head": {
+    "type": "ctc",
+    "num_classes": 257
+  },
+  "preprocessor": {
+    "sample_rate": 16000,
+    "n_mels": 64,
+    "hop_length": 160,
+    "win_length": 320,
+    "n_fft": 320,
+    "center": false
+  },
+  "tokenizer": "tokenizer.model",
+  "total_parameters": 220879361
+}
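
The preprocessor and encoder fields above determine the model's time resolution. As a rough sketch, the code below derives the frame step and an approximate frame count for a 20 s chunk. It assumes the conventional STFT framing formula for `center: false` and that the encoder emits about one frame per `subsampling_factor` mel frames; the exact encoder count depends on convolution padding, which the config does not state. `frame_timing` is a hypothetical helper.

```python
import json

# Subset of the config.json above; only the fields used in this sketch.
CONFIG_JSON = """
{
  "encoder": {"subsampling_factor": 4},
  "preprocessor": {"sample_rate": 16000, "hop_length": 160,
                   "win_length": 320, "n_fft": 320, "center": false}
}
"""


def frame_timing(config: dict, audio_seconds: float) -> dict:
    pre = config["preprocessor"]
    factor = config["encoder"]["subsampling_factor"]
    sr, hop, n_fft = pre["sample_rate"], pre["hop_length"], pre["n_fft"]
    n_samples = int(round(audio_seconds * sr))
    if pre["center"]:
        n_mel = n_samples // hop + 1
    else:
        # Without centering, only full windows inside the signal are counted.
        n_mel = max(0, (n_samples - n_fft) // hop + 1)
    return {
        "window_ms": 1000 * pre["win_length"] / sr,
        "mel_hop_ms": 1000 * hop / sr,
        "encoder_step_ms": 1000 * hop * factor / sr,
        "mel_frames": n_mel,
        # Approximation: ignores padding inside the subsampling convolutions.
        "approx_encoder_frames": n_mel // factor,
    }


timing = frame_timing(json.loads(CONFIG_JSON), 20.0)
print(timing)
```

Under these assumptions a 20 ms window advances every 10 ms, and each encoder frame covers about 40 ms of audio.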

tokenizer.model ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:828c12c991019eef952a960661f25a92d6ad279591e2ea466b4aeddf1d20a18a
+size 255336
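
The three lines above are a Git LFS pointer: the spec version, the content hash, and the byte size of the real file. A downloaded copy can be checked against the pointer with the standard library alone. The sketch below shows one way to do that; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of any mirror tooling.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Hash in 1 MiB chunks so large weight files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the tokenizer.model entry above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:828c12c991019eef952a960661f25a92d6ad279591e2ea466b4aeddf1d20a18a
size 255336
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

With a local copy of the file, `matches_pointer("tokenizer.model", pointer)` would confirm that the download matches what the mirror recorded.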

weights.safetensors ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:bbd1074b2466baf1a301b7a6c8427cea6d21a20d2ce1e173e7e447692619117f
+size 890094547
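
As a quick sanity check, the pointer's byte size can be compared with the `total_parameters` value in `config.json`. This is only a back-of-the-envelope sketch: it assumes the listed parameter count applies to this weights file, which the commit does not confirm, and it ignores the safetensors header.

```python
weights_bytes = 890_094_547   # "size" from the weights.safetensors pointer
listed_params = 220_879_361   # "total_parameters" from config.json

# Assumption: every listed parameter is stored once in this file.
bytes_per_param = weights_bytes / listed_params
print(f"{bytes_per_param:.2f} bytes per listed parameter")
```

A ratio slightly above 4 would be consistent with 32-bit floating-point storage, but the figure by itself does not establish the dtype.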