| ---
|
| license: apache-2.0
|
| language:
|
| - ru
|
| tags:
|
| - automatic-speech-recognition
|
| - speaker-diarization
|
| - onnx
|
| - russian
|
| - asr
|
| - gigaam
|
| - 3d-speaker
|
| - camplus
|
| - eres2net
|
| - mobile
|
| - offline
|
| library_name: onnx
|
| ---
|
|
|
| # ProtocolVoice ASR Models
|
|
|
| ONNX models for offline Russian speech recognition and speaker diarization,
|
| packaged for the [ProtocolVoice](https://github.com/protocolvoice) Android app.
|
|
|
| ## Contents
|
|
|
| | File | Size | Purpose | Original source | Original license |
|
| |---|---|---|---|---|
|
| | `gigaam_v3_e2e_ctc_int8.onnx` | 305 MB | Russian ASR with built-in punctuation | [Sber/SaluteDevices GigaAM](https://github.com/salute-developers/GigaAM) (v3, e2e CTC, int8-quantized) | MIT |
|
| | `speaker_embedding_camplus.onnx` | 27 MB | Speaker embedding (CAM++) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
|
| | `speaker_embedding.onnx` | 111 MB | Speaker embedding (ERes2Net) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
|
| | `speaker_embedding_v2.onnx` | 68 MB | Speaker embedding (ERes2NetV2) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
|
| | `manifest.json` | < 1 KB | SHA-256 hashes of all models | this repo | Apache-2.0 |
|
|
|
| ## Important
|
|
|
| These are NOT new models: this repository **redistributes existing models** in ONNX
|
| format for convenient mobile delivery. The original authors retain all credit and
|
| copyright. We did not train, fine-tune, or modify the model weights.
|
|
|
| **Please cite the original projects, not this redistribution:**
|
|
|
| - **GigaAM-v3** (ASR): Sber AI, SaluteDevices,
|
| https://github.com/salute-developers/GigaAM
|
| - **3D-Speaker** (CAM++, ERes2Net, ERes2NetV2): ModelScope, Alibaba,
|
| https://github.com/modelscope/3D-Speaker
|
|
|
| The ONNX conversions were prepared with [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
|
| (Apache-2.0), which also provides the inference runtime.
|
|
|
| ## Why this redistribution
|
|
|
| The ProtocolVoice mobile app needs to download these models on first run from a
|
| mirror that:
|
| - supports files larger than 100 MB without git-lfs limits,
|
| - has a fast CDN that is reachable from Russia,
|
| - is the conventional hosting platform for ML models.
|
|
|
| All redistributed files retain their original licenses. This README serves as
|
| the required attribution under those licenses.
|
|
|
| ## How to use
|
|
|
| Each model is loaded by [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) on
|
| the device. The ProtocolVoice app:
|
|
|
| 1. Downloads each `.onnx` file over HTTPS from
|
| `https://huggingface.co/protocolvoice/asr-models/resolve/main/{filename}`,
|
| 2. Verifies SHA-256 against `manifest.json`,
|
| 3. Loads via sherpa-onnx for offline inference.
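
The first two steps can be sketched in Python as follows. This is an illustration only, not the Android app's actual code: the helper names `model_url`, `download_with_sha256`, and `fetch_model` are hypothetical, and the expected hash is passed in by the caller because the layout of `manifest.json` is not described here.

```python
import hashlib
import urllib.request
from pathlib import Path

# URL pattern quoted in the steps above; {filename} is filled in per file.
BASE_URL = "https://huggingface.co/protocolvoice/asr-models/resolve/main/{filename}"


def model_url(filename: str) -> str:
    """Build the download URL for one file in this repository."""
    return BASE_URL.format(filename=filename)


def download_with_sha256(url: str, dest: Path, chunk_size: int = 1 << 20) -> str:
    """Stream `url` to `dest` and return the SHA-256 hex digest of the bytes written."""
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            digest.update(chunk)  # hash while writing, so the file is read only once
    return digest.hexdigest()


def fetch_model(filename: str, expected_sha256: str, target_dir: Path) -> Path:
    """Download one model and keep it only if its hash matches (hypothetical helper)."""
    dest = target_dir / filename
    actual = download_with_sha256(model_url(filename), dest)
    if actual != expected_sha256.lower():
        dest.unlink(missing_ok=True)  # do not leave an unverified model on disk
        raise ValueError(f"SHA-256 mismatch for {filename}: got {actual}")
    return dest
```

Hashing while streaming avoids a second pass over files that are hundreds of megabytes in size, which matters on a mobile device.
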
|
|
|
| You can also use these files directly with sherpa-onnx in any project that
|
| respects the original licenses.
|
|
|
| ## Verifying integrity
|
|
|
| ```python
|
| import hashlib
|
|
|
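| # Hash in 1 MiB chunks so the 305 MB model is never fully loaded into memory.
|
| digest = hashlib.sha256()
|
| with open("gigaam_v3_e2e_ctc_int8.onnx", "rb") as f:
|
|     for chunk in iter(lambda: f.read(1 << 20), b""):
|
|         digest.update(chunk)
|
| print(digest.hexdigest())
|
| # expected: 0aacb41f70f0f5aaac4b45dd430337b9e16b180f22c72af04db8516e7609c3c0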
|
| ```
|
|
|
| Hashes for all files are in `manifest.json`.
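
To check every downloaded file in one pass, something like the sketch below may help. It rests on an assumption: the manifest is treated as a flat JSON object mapping each filename to its SHA-256 hex digest. The real `manifest.json` may be structured differently, in which case the parsing needs adapting. The function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large models are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(model_dir: Path, manifest_path: Path) -> dict[str, bool]:
    """Return {filename: True/False} for every entry in the manifest.

    ASSUMPTION: the manifest is a flat JSON object such as
    {"some_model.onnx": "<sha256 hex>"}; this layout is a guess.
    """
    expected = json.loads(manifest_path.read_text(encoding="utf-8"))
    results = {}
    for filename, expected_hash in expected.items():
        path = model_dir / filename
        # A missing file counts as a failed check rather than raising.
        results[filename] = path.is_file() and sha256_of(path) == expected_hash.lower()
    return results
```

Calling `verify_against_manifest(Path("models"), Path("models/manifest.json"))` would then report which files are present and intact; the `models` directory name is only an example.
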
|
|
|
| ## License
|
|
|
| This repository's metadata, README, and packaging scripts are released under
|
| **Apache-2.0**. Each model file remains under its original license (see the
|
| table above). By using a model, you accept its original license, not just
|
| this repository's.
|
|
|
| ## Removal request
|
|
|
| If you are an author of one of the upstream projects and have any concerns
|
| about this redistribution (attribution, hosting, anything else), please open
|
| a discussion on this Hugging Face repo or email the maintainers; the files
|
| will be amended or removed as requested.
|
|
|