Delete ckpts/iic

- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.mdl +0 -0
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.msc +0 -0
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.mv +0 -1
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/README.md +0 -139
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/campplus_cn_en_common.pt +0 -3
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/config.yaml +0 -23
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/configuration.json +0 -23
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/dingding.jpg +0 -3
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_a_cn_16k.wav +0 -3
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_b_cn_16k.wav +0 -3
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker2_a_cn_16k.wav +0 -3
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/quickstart.md +0 -36
- ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/structure.png +0 -3
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.mdl
DELETED
Binary file (71 Bytes)

ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.msc
DELETED
Binary file (760 Bytes)
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.mv
DELETED
Revision:v1.0.0,CreatedAt:1708583355
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/README.md
DELETED
---
tasks:
- speaker-verification
model_type:
- CAM++
domain:
- audio
frameworks:
- pytorch
backbone:
- CAM++
license: Apache License 2.0
language:
- cn
- en
tags:
- speaker verification
- CAM++
- trained on large-scale Chinese-English data
widgets:
  - task: speaker-verification
    model_revision: v1.0.0
    inputs:
      - type: audio
        name: input
        title: audio
    extendsParameters:
      thr: 0.33
    examples:
      - name: 1
        title: Example 1
        inputs:
          - name: enroll
            data: git://examples/speaker1_a_cn_16k.wav
          - name: input
            data: git://examples/speaker1_b_cn_16k.wav
      - name: 2
        title: Example 2
        inputs:
          - name: enroll
            data: git://examples/speaker1_a_cn_16k.wav
          - name: input
            data: git://examples/speaker2_a_cn_16k.wav
    inferencespec:
      cpu: 8  # number of CPUs
      memory: 1024
---

# CAM++ Speaker Verification Model

CAM++ is a speaker verification model based on a densely connected time-delay neural network. It delivers accurate speaker recognition with fast inference. This model was trained on a large-scale Chinese-English speaker dataset and is suited to speaker verification in both Chinese and English.

## Model Overview

CAM++ balances recognition accuracy and inference efficiency: on the public Chinese dataset CN-Celeb and the English dataset VoxCeleb, it achieves higher accuracy than the mainstream ResNet34 and ECAPA-TDNN speaker models while running faster. As shown in the figure below, the model has two parts: a residual convolutional network as the front end and a time-delay neural network as the backbone. The front-end module is a 2-D convolutional structure that extracts finer-grained local time-frequency features. The backbone uses dense connections to reuse hierarchical features and improve computational efficiency, and each of its layers embeds a lightweight context-aware masking module that applies pooling at multiple granularities to extract contextual information at different scales; the generated mask removes irrelevant noise from the features while preserving the key speaker information.

<div align=center>
<img src="structure.png" width="400" />
</div>

For more details, see
- Paper: [CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking](https://arxiv.org/abs/2303.00332)
- GitHub project: [3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker)

## Training Data
This model was trained on a large-scale Chinese and English speaker dataset.

## Evaluation
EER results on the CN-Celeb Chinese test set and the VoxCeleb-O English test set:

| Test set | EER | minDCF (p_target: 0.01) |
|:-----:|:------:|:------:|
|CN-Celeb Test|5.98%|0.3805|
|VoxCeleb-O|1.16%|0.1271|

# Quick Start

## Try It in a Notebook
For users with development needs, we especially recommend using a Notebook for offline processing. Log in to your ModelScope account and click the "Open in Notebook" button at the top right of the model page. On first use, you will be prompted to link an Alibaba Cloud account; follow the prompts. After linking, choose compute resources on the instance-selection screen and create an instance. Once the instance is ready, open the development environment and run the API example below.

```python
from modelscope.pipelines import pipeline
sv_pipeline = pipeline(
    task='speaker-verification',
    model='iic/speech_campplus_sv_zh_en_16k-common_advanced',
    model_revision='v1.0.0'
)
speaker1_a_wav = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
speaker1_b_wav = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
speaker2_a_wav = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker2_a_cn_16k.wav'
# Utterances from the same speaker
result = sv_pipeline([speaker1_a_wav, speaker1_b_wav])
print(result)
# Utterances from different speakers
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav])
print(result)
# A custom score threshold can be set; the higher the threshold, the stricter the same-speaker decision
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], thr=0.33)
print(result)
# Pass output_emb=True to include the extracted speaker embeddings in the result
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], output_emb=True)
print(result['embs'], result['outputs'])
# Pass save_dir to store the extracted speaker embeddings in that directory
result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], save_dir='savePath/')
```

## Train and Test Your Own CAM++ Model
The training, testing, and inference code is open-sourced in [3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker). Download and install it as follows:
```sh
git clone https://github.com/alibaba-damo-academy/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt
```

Run the CAM++ training example on the VoxCeleb dataset:
```sh
cd egs/voxceleb/sv-cam++
# Configure the GPUs to use for training in run.sh beforehand; the default is 4 GPUs
bash run.sh
```

## Quickly Extract Embeddings with This Pretrained Model
```sh
pip install modelscope
cd 3D-Speaker
# Set the model id and the wav path; the wav path can be a single wav or a list file containing multiple wav paths
model_id=iic/speech_campplus_sv_zh_en_16k-common_advanced
# Extract embeddings
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path
```

# Citation
If you find this model helpful, please cite the related paper below:
```BibTeX
@article{cam++,
  title={CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking},
  author={Hui Wang and Siqi Zheng and Yafeng Chen and Luyao Cheng and Qian Chen},
  journal={arXiv preprint arXiv:2303.00332},
}
```

# 3D-Speaker Developer Community DingTalk Group
<div align=left>
<img src="dingding.jpg" width="260" />
</div>
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/campplus_cn_en_common.pt
DELETED
version https://git-lfs.github.com/spec/v1
oid sha256:92f29b94e6948786a26778c9e302525d185bb08c8b9f5252ed98776902840199
size 28044640
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/config.yaml
DELETED
# This is an example that demonstrates how to configure a model file.
# You can modify the configuration according to your own requirements.

# to print the register_table:
# from funasr.register import tables
# tables.print()

# network architecture
model: CAMPPlus
model_conf:
  feat_dim: 80
  embedding_size: 192
  growth_rate: 32
  bn_size: 4
  init_channels: 128
  config_str: 'batchnorm-relu'
  memory_efficient: True
  output_level: 'segment'

# frontend related
frontend: WavFrontend
frontend_conf:
  fs: 16000
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/configuration.json
DELETED
{
  "framework": "pytorch",
  "task": "speaker-verification",
  "model_config": "config.yaml",
  "model_file": "campplus_cn_en_common.pt",
  "model": {
    "type": "cam++-sv",
    "model_config": {
      "sample_rate": 16000,
      "fbank_dim": 80,
      "emb_size": 192
    },
    "pretrained_model": "campplus_cn_en_common.pt",
    "yesOrno_thr": 0.33
  },
  "pipeline": {
    "type": "speaker-verification"
  },
  "file_path_metas": {
    "init_param": "campplus_cn_en_common.pt",
    "config": "config.yaml"
  }
}
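This configuration.json is what wires the checkpoint into the pipeline: it names the weight file, the feature/embedding dimensions, and the default decision threshold. A minimal sketch of reading those fields with the stdlib `json` module, with the file content inlined as a string for illustration:

```python
import json

# Key fields of the deleted configuration.json, inlined for illustration
CONFIG_JSON = """
{
  "framework": "pytorch",
  "model_file": "campplus_cn_en_common.pt",
  "model": {
    "type": "cam++-sv",
    "model_config": {"sample_rate": 16000, "fbank_dim": 80, "emb_size": 192},
    "yesOrno_thr": 0.33
  }
}
"""

cfg = json.loads(CONFIG_JSON)
# Fields a loader would need: checkpoint path, embedding size, decision threshold
print(cfg["model_file"])                         # campplus_cn_en_common.pt
print(cfg["model"]["model_config"]["emb_size"])  # 192
print(cfg["model"]["yesOrno_thr"])               # 0.33
```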
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/dingding.jpg
DELETED
Git LFS Details
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_a_cn_16k.wav
DELETED
version https://git-lfs.github.com/spec/v1
oid sha256:5f20ce0ddc378ca3239d3ce864b1142726a46a1221ae553912e4e142045df58b
size 118932
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_b_cn_16k.wav
DELETED
version https://git-lfs.github.com/spec/v1
oid sha256:20745dc08a4281894d146140b99b9ef7417ac681119b7f7202f553cdf1a85f65
size 157058
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker2_a_cn_16k.wav
DELETED
version https://git-lfs.github.com/spec/v1
oid sha256:8a6cffa452df32ef10503f7992f22ffcdd7f16c4e0273d13311bc5cdcb13abf4
size 170028
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/quickstart.md
DELETED
---
---
## Model Loading and Inference
For more on model loading and inference, see [the pipeline inference docs](https://modelscope.cn/docs/%E6%A8%A1%E5%9E%8B%E7%9A%84%E6%8E%A8%E7%90%86Pipeline).

```python
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

p = pipeline('speaker-verification', 'iic/speech_campplus_sv_zh_en_16k-common_advanced')
```

Provide the input:
```python
wav1 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
wav2 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
p([wav1, wav2])
```

A custom threshold can be set; the higher the threshold, the stricter the same-speaker decision:
```python
wav1 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
wav2 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
p([wav1, wav2], thr=0.33)
```

For more usage instructions, see the [ModelScope documentation center](http://www.modelscope.cn/#/docs).
---

---
## Download and Install the ModelScope Library
For more on downloading and installing the ModelScope library, see [the environment setup docs](https://modelscope.cn/docs/%E7%8E%AF%E5%A2%83%E5%AE%89%E8%A3%85).

```sh
pip install "modelscope[audio]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
```
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/structure.png
DELETED
Git LFS Details