---
license: apache-2.0
language:
- ru
tags:
- automatic-speech-recognition
- speaker-diarization
- onnx
- russian
- asr
- gigaam
- 3d-speaker
- camplus
- eres2net
- mobile
- offline
library_name: onnx
---

# ProtocolVoice ASR Models

ONNX models for offline Russian speech recognition and speaker diarization, packaged for the [ProtocolVoice](https://github.com/protocolvoice) Android app.

## Contents

| File | Size | Purpose | Original source | Original license |
|---|---|---|---|---|
| `gigaam_v3_e2e_ctc_int8.onnx` | 305 MB | Russian ASR with built-in punctuation | [Sber/SaluteDevices GigaAM](https://github.com/salute-developers/GigaAM) (v3, e2e CTC, int8-quantized) | MIT |
| `speaker_embedding_camplus.onnx` | 27 MB | Speaker embedding (CAM++) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding.onnx` | 111 MB | Speaker embedding (ERes2Net) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `speaker_embedding_v2.onnx` | 68 MB | Speaker embedding (ERes2NetV2) | [modelscope/3D-Speaker](https://github.com/modelscope/3D-Speaker) | Apache-2.0 |
| `manifest.json` | < 1 KB | SHA-256 hashes of all models | this repo | Apache-2.0 |

## Important

These are NOT new models: this repository **redistributes existing models** in ONNX format for convenient mobile delivery. The original authors retain all credit and copyright. We did not train, fine-tune, or modify the model weights.

**Please cite the original projects, not this redistribution:**

- **GigaAM-v3** (ASR) by Sber AI / SaluteDevices: https://github.com/salute-developers/GigaAM
- **3D-Speaker** (CAM++, ERes2Net, ERes2NetV2) by ModelScope, Alibaba: https://github.com/modelscope/3D-Speaker

The ONNX conversions and runtime were prepared via [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) (Apache-2.0).

## Why this redistribution

The ProtocolVoice mobile app downloads these models on first run, so it needs a mirror that:

- supports files larger than 100 MB without Git LFS limits,
- has a fast CDN reachable from Russia,
- is the conventional hosting platform for ML models.

All redistributed files retain their original licenses. This README serves as the required attribution under those licenses.

## How to use

Each model is loaded by [sherpa-onnx](https://github.com/k2-fsa/sherpa-onnx) on the device. The ProtocolVoice app:

1. Downloads each `.onnx` file over HTTPS from `https://huggingface.co/protocolvoice/asr-models/resolve/main/(unknown)`,
2. Verifies its SHA-256 hash against `manifest.json`,
3. Loads it via sherpa-onnx for offline inference.

You can also use these files directly with sherpa-onnx in any project that respects the original licenses.
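
The download step can be sketched in plain Python. The base URL follows the repo path shown in the steps above; `model_url` and `fetch` are illustrative helper names, not the app's actual code:

```python
import urllib.request

# Base path of this repo's files, as given in the download step above.
BASE = "https://huggingface.co/protocolvoice/asr-models/resolve/main/"

def model_url(name: str) -> str:
    """Resolve a file name from the Contents table to its download URL."""
    return BASE + name

def fetch(name: str) -> str:
    """Download one model file into the current directory; returns the local path."""
    path, _headers = urllib.request.urlretrieve(model_url(name), name)
    return path
```

After downloading, verify each file against `manifest.json` as described under "Verifying integrity" before loading it.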

## Verifying integrity

```python
import hashlib

with open("gigaam_v3_e2e_ctc_int8.onnx", "rb") as f:
    print(hashlib.sha256(f.read()).hexdigest())
# expected: 0aacb41f70f0f5aaac4b45dd430337b9e16b180f22c72af04db8516e7609c3c0
```

Hashes for all files are in `manifest.json`.
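
To check every downloaded file at once, a minimal sketch, assuming `manifest.json` is a flat map of file names to hex SHA-256 digests (inspect the actual file for its real schema):

```python
import hashlib
import json

def check_manifest(manifest_path: str) -> dict:
    """Return {file name: True/False} for each entry in the manifest.

    Assumes a flat {"<file>": "<sha256 hex>"} layout; adjust if the real
    manifest.json nests its hashes differently.
    """
    with open(manifest_path) as f:
        manifest = json.load(f)
    results = {}
    for name, expected in manifest.items():
        digest = hashlib.sha256()
        # Stream in 1 MB chunks: the models are up to 305 MB.
        with open(name, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        results[name] = digest.hexdigest() == expected.lower()
    return results
```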

## License

This repository's metadata, README, and packaging scripts are released under **Apache-2.0**. Each model file remains under its original license (see the table above). By using a model, you accept its original license, not just this repository's.

## Removal request

If you are an author of one of the upstream projects and have any concerns about this redistribution (attribution, hosting, or anything else), please open a discussion on this Hugging Face repo or email the maintainers; the files will be amended or removed as requested.