---
license: apache-2.0
language:
- ru
tags:
- automatic-speech-recognition
- speaker-diarization
- onnx
- russian
- asr
- gigaam
- 3d-speaker
- camplus
- eres2net
- mobile
- offline
library_name: onnx
---
# ProtocolVoice ASR Models
ONNX models for offline Russian speech recognition and speaker diarization,
packaged for the [ProtocolVoice](https://github.com/protocolvoice) Android app.
## Contents
| File | Size | Purpose | Original source | Original license |
|---|---|---|---|---|
| `gigaam_v3_e2e_ctc_int8.onnx` | 305 MB | Russian ASR with built-in punctuation | [Sber/SaluteDevices GigaAM](https://github.com/salute-developers/GigaAM) (v3, e2e CTC, int8-quantized) | MIT |
| `speaker_embedding_camplus.onnx` | 27 MB | Speaker embedding (CAM++) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding.onnx` | 111 MB | Speaker embedding (ERes2Net) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding_v2.onnx` | 68 MB | Speaker embedding (ERes2NetV2) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `manifest.json` | < 1 KB | SHA-256 hashes of all models | this repo | Apache-2.0 |
## Important
These are NOT new models: this repository **redistributes existing models** in ONNX
format for convenient mobile delivery. The original authors retain all credit and
copyright. We did not train, fine-tune, or modify the model weights.
**Please cite the original projects, not this redistribution:**
- **GigaAM-v3** (ASR) by Sber AI, SaluteDevices:
https://github.com/salute-developers/GigaAM
- **3D-Speaker** (CAM++, ERes2Net, ERes2NetV2) by ModelScope, Alibaba:
https://github.com/modelscope/3D-Speaker
The ONNX conversions and runtime were prepared via [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx)
(Apache-2.0).
## Why this redistribution
The ProtocolVoice mobile app needs to download these models on first run from a
mirror that:
- supports files larger than 100 MB without git-lfs limits,
- has a fast CDN that is reachable from Russia,
- is the conventional hosting platform for ML models.
All redistributed files retain their original licenses. This README serves as
the required attribution under those licenses.
## How to use
Each model is loaded by [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) on
the device. The ProtocolVoice app:
1. Downloads each `.onnx` file over HTTPS from
`https://huggingface.co/protocolvoice/asr-models/resolve/main/{filename}`,
2. Verifies SHA-256 against `manifest.json`,
3. Loads via sherpa-onnx for offline inference.
You can also use these files directly with sherpa-onnx in any project that
respects the original licenses.
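The download and verification steps above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the app's actual code (the app runs on Android). The helper names `model_url`, `sha256_file`, and `fetch_model` are hypothetical, and the expected digest is passed in by the caller rather than read from `manifest.json`.

```python
import hashlib
import shutil
import urllib.request
from pathlib import Path

# URL pattern taken from the download step described above.
BASE_URL = "https://huggingface.co/protocolvoice/asr-models/resolve/main"


def model_url(filename: str) -> str:
    """Build the download URL for one file in this repository."""
    return f"{BASE_URL}/{filename}"


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large models are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fetch_model(filename: str, expected_sha256: str, dest_dir: str = ".") -> Path:
    """Download one file (hypothetical helper) and keep it only if the hash matches."""
    dest = Path(dest_dir) / filename
    tmp = dest.with_name(dest.name + ".part")  # download to a temporary name first
    with urllib.request.urlopen(model_url(filename)) as resp, open(tmp, "wb") as out:
        shutil.copyfileobj(resp, out)
    actual = sha256_file(tmp)
    if actual != expected_sha256.lower():
        tmp.unlink()  # discard the corrupted or tampered download
        raise ValueError(f"SHA-256 mismatch for {filename}: got {actual}")
    tmp.replace(dest)  # rename into place only after verification
    return dest


# Example call (performs a real download of about 27 MB):
# fetch_model("speaker_embedding_camplus.onnx", "<digest from manifest.json>")
```

Writing to a `.part` file and renaming only after the hash check means a partially downloaded or corrupted model is never left under its final name.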
## Verifying integrity
```python
import hashlib

# Hash the downloaded model and compare the printed digest with the expected one.
with open("gigaam_v3_e2e_ctc_int8.onnx", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())
# expected: 0aacb41f70f0f5aaac4b45dd430337b9e16b180f22c72af04db8516e7609c3c0
```
Hashes for all files are in `manifest.json`.
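To check every downloaded file at once, something like the sketch below may help. The layout of `manifest.json` is not documented in this README, so the code **assumes** a flat JSON object that maps each filename to its hex SHA-256 digest. Adjust the parsing if the actual file is structured differently. The function name `verify_against_manifest` is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks to avoid loading a whole model into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest_path, model_dir=".") -> dict:
    """Return {filename: "ok" | "mismatch" | "missing"} for each manifest entry.

    ASSUMPTION: the manifest is a flat {"<filename>": "<sha256 hex>"} mapping.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    results = {}
    for filename, expected in manifest.items():
        path = Path(model_dir) / filename
        if not path.is_file():
            results[filename] = "missing"
        elif sha256_file(path) == expected.lower():
            results[filename] = "ok"
        else:
            results[filename] = "mismatch"
    return results


# Example call, with the models and manifest.json in the current directory:
# for name, status in verify_against_manifest("manifest.json").items():
#     print(f"{status:9} {name}")
```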
## License
This repository's metadata, README, and packaging scripts are released under
**Apache-2.0**. Each model file remains under its original license (see the
table above). By using a model, you accept its original license, not just
this repository's.
## Removal request
If you are an author of one of the upstream projects and have any concerns
about this redistribution (attribution, hosting, anything else), please open
a discussion on this Hugging Face repo or email the maintainers; the files
will be amended or removed as requested.