yhaha committed on
Commit 55b2b59 · verified · 1 Parent(s): bbc4b0b

Upload ckpts/iic

.gitattributes CHANGED
@@ -40,3 +40,8 @@ ckpts/hub/s3prl_s3prl_main/s3prl/downstream/voxceleb2_amsoftmax_segment_eval/cac
  ckpts/hub/s3prl_s3prl_main/s3prl/downstream/voxceleb2_amsoftmax_segment_eval/cache_wav_paths/cache_Voxceleb2.p filter=lfs diff=lfs merge=lfs -text
  ckpts/hub/s3prl_s3prl_main/s3prl/downstream/voxceleb2_amsoftmax_segment_eval/cache_wav_paths/cache_dev_segment.p filter=lfs diff=lfs merge=lfs -text
  ckpts/hub/s3prl_s3prl_main/s3prl/downstream/voxceleb2_amsoftmax_segment_eval/cache_wav_paths/cache_test_segment.p filter=lfs diff=lfs merge=lfs -text
+ ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/dingding.jpg filter=lfs diff=lfs merge=lfs -text
+ ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_a_cn_16k.wav filter=lfs diff=lfs merge=lfs -text
+ ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_b_cn_16k.wav filter=lfs diff=lfs merge=lfs -text
+ ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker2_a_cn_16k.wav filter=lfs diff=lfs merge=lfs -text
+ ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/structure.png filter=lfs diff=lfs merge=lfs -text
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.mdl ADDED
Binary file (71 Bytes).
 
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.msc ADDED
Binary file (760 Bytes).
 
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/.mv ADDED
@@ -0,0 +1 @@
+ Revision:v1.0.0,CreatedAt:1708583355
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/README.md ADDED
@@ -0,0 +1,139 @@
+ ---
+ tasks:
+ - speaker-verification
+ model_type:
+ - CAM++
+ domain:
+ - audio
+ frameworks:
+ - pytorch
+ backbone:
+ - CAM++
+ license: Apache License 2.0
+ language:
+ - cn
+ - en
+ tags:
+ - speaker verification
+ - CAM++
+ - trained on a large-scale Chinese-English dataset
+ widgets:
+ - task: speaker-verification
+   model_revision: v1.0.0
+   inputs:
+   - type: audio
+     name: input
+     title: Audio
+   extendsParameters:
+     thr: 0.33
+   examples:
+   - name: 1
+     title: Example 1
+     inputs:
+     - name: enroll
+       data: git://examples/speaker1_a_cn_16k.wav
+     - name: input
+       data: git://examples/speaker1_b_cn_16k.wav
+   - name: 2
+     title: Example 2
+     inputs:
+     - name: enroll
+       data: git://examples/speaker1_a_cn_16k.wav
+     - name: input
+       data: git://examples/speaker2_a_cn_16k.wav
+   inferencespec:
+     cpu: 8  # number of CPUs
+     memory: 1024
+ ---
+
+ # CAM++ Speaker Verification Model
+ CAM++ is a speaker verification model based on a densely connected time-delay neural network. It provides accurate speaker verification together with faster inference, and it was trained on large-scale Chinese and English speaker datasets, making it suitable for speaker verification in both Chinese and English.
+ ## Model Overview
+ CAM++ balances recognition performance and inference efficiency. On the public Chinese dataset CN-Celeb and the English dataset VoxCeleb, it achieves higher accuracy than the mainstream speaker verification models ResNet34 and ECAPA-TDNN while offering faster inference. Its architecture is shown in the figure below and consists of two parts: a residual convolutional network as the front end and a time-delay neural network as the backbone. The front-end module uses 2-D convolutions to extract more local, fine-grained time-frequency features. The backbone uses dense connections to reuse hierarchical features and improve computational efficiency. In addition, each layer embeds a lightweight context-aware masking module that extracts contextual information at multiple granularities through pooling; the resulting mask removes irrelevant noise from the features while preserving the key speaker information.
+
+ <div align=center>
+ <img src="structure.png" width="400" />
+ </div>
+
+ For more details, see:
+ - Paper: [CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking](https://arxiv.org/abs/2303.00332)
+ - GitHub project: [3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker)
+
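+ As a rough illustration of the context-aware masking idea described above, the sketch below predicts a per-channel mask from pooled context and multiplies it back onto the features. This is a simplified toy example for intuition only, not the released CAM++ implementation; the layer sizes, the single pooling granularity, and the class name `ContextAwareMask` are assumptions.
+
+ ```python
+ import torch
+ import torch.nn as nn
+
+ class ContextAwareMask(nn.Module):
+     """Toy context-aware masking block: pool the feature map into a global
+     context vector, predict a per-channel mask from it, and multiply the mask
+     back onto the features to suppress speaker-irrelevant components."""
+
+     def __init__(self, channels: int, reduction: int = 2):
+         super().__init__()
+         self.bottleneck = nn.Sequential(
+             nn.Linear(channels, channels // reduction),
+             nn.ReLU(inplace=True),
+             nn.Linear(channels // reduction, channels),
+             nn.Sigmoid(),
+         )
+
+     def forward(self, x: torch.Tensor) -> torch.Tensor:
+         # x: (batch, channels, time)
+         context = x.mean(dim=-1)           # coarse context pooling over time
+         mask = self.bottleneck(context)    # per-channel mask in [0, 1]
+         return x * mask.unsqueeze(-1)      # keep speaker-relevant information
+
+ feats = torch.randn(4, 128, 200)           # dummy 128-channel feature sequence
+ print(ContextAwareMask(128)(feats).shape)  # torch.Size([4, 128, 200])
+ ```
+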
+ ## Training Data
+ This model was trained on large-scale Chinese and English speaker datasets.
+ ## Evaluation Results
+ EER results on the CN-Celeb Chinese test set and the VoxCeleb-O English test set:
+ | Test set | EER | minDCF (p_target = 0.01) |
+ |:-----:|:------:|:------:|
+ | CN-Celeb Test | 5.98% | 0.3805 |
+ | VoxCeleb-O | 1.16% | 0.1271 |
+
+ # How to Quickly Try the Model
+ ## Try It in a Notebook
+ For users with development needs, we especially recommend using a Notebook for offline processing. First log in to your ModelScope account, then click the "Open in Notebook" button in the upper-right corner of the model page; a dialog will appear, and on first use you will be prompted to link an Alibaba Cloud account by following the prompts. After linking your account, choose a compute instance and create it. Once the instance is ready, enter the development environment and run the API call example below.
+ ```python
+ from modelscope.pipelines import pipeline
+ sv_pipeline = pipeline(
+     task='speaker-verification',
+     model='iic/speech_campplus_sv_zh_en_16k-common_advanced',
+     model_revision='v1.0.0'
+ )
+ speaker1_a_wav = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
+ speaker1_b_wav = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
+ speaker2_a_wav = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker2_a_cn_16k.wav'
+ # Utterances from the same speaker
+ result = sv_pipeline([speaker1_a_wav, speaker1_b_wav])
+ print(result)
+ # Utterances from different speakers
+ result = sv_pipeline([speaker1_a_wav, speaker2_a_wav])
+ print(result)
+ # You can set a custom score threshold; the higher the threshold, the stricter the criterion for judging two utterances as the same speaker
+ result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], thr=0.33)
+ print(result)
+ # Pass output_emb=True to include the extracted speaker embeddings in the result
+ result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], output_emb=True)
+ print(result['embs'], result['outputs'])
+ # Pass save_dir to store the extracted speaker embeddings in that directory
+ result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], save_dir='savePath/')
+ ```
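+ Under the hood, the same-speaker decision returned by the pipeline is a cosine-similarity comparison against the threshold `thr` (0.33 by default for this model). Continuing from the snippet above, the check can be reproduced from the extracted embeddings with a few lines of NumPy. This is a sketch under that assumption; `cosine_score` is an illustrative helper, not part of the ModelScope API.
+ ```python
+ import numpy as np
+
+ def cosine_score(emb1, emb2) -> float:
+     """Cosine similarity between two speaker embeddings."""
+     emb1, emb2 = np.ravel(emb1), np.ravel(emb2)
+     return float(np.dot(emb1, emb2) / (np.linalg.norm(emb1) * np.linalg.norm(emb2) + 1e-8))
+
+ # result['embs'] holds one embedding per input wav (192-dimensional for this model)
+ result = sv_pipeline([speaker1_a_wav, speaker2_a_wav], output_emb=True)
+ score = cosine_score(result['embs'][0], result['embs'][1])
+ print(score, 'same speaker' if score >= 0.33 else 'different speakers')
+ ```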
+ ## Training and Testing Your Own CAM++ Model
+ The training, testing, and inference code for this project is open-sourced in [3D-Speaker](https://github.com/alibaba-damo-academy/3D-Speaker). You can download and install it as follows:
+ ```sh
+ git clone https://github.com/alibaba-damo-academy/3D-Speaker.git && cd 3D-Speaker
+ conda create -n 3D-Speaker python=3.8
+ conda activate 3D-Speaker
+ pip install -r requirements.txt
+ ```
+
+ Run the CAM++ training example on the VoxCeleb dataset:
+ ```sh
+ cd egs/voxceleb/sv-cam++
+ # Configure the GPUs used for training in run.sh beforehand; the default is 4 GPUs
+ bash run.sh
+ ```
+
+ ## Quickly Extracting Embeddings with This Pretrained Model
+ ```sh
+ pip install modelscope
+ cd 3D-Speaker
+ # Set the model name and the wav path; the wav path can be a single wav file or a list file containing multiple wav paths
+ model_id=iic/speech_campplus_sv_zh_en_16k-common_advanced
+ wav_path=/path/to/your_wav_or_wav_list  # placeholder: a single wav file or a list file of wav paths
+ # Extract embeddings
+ python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path
+ ```
+
+
+ # Related Paper and Citation
+ If you find this model helpful, please cite the following paper:
+ ```BibTeX
+ @article{cam++,
+   title={CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking},
+   author={Hui Wang and Siqi Zheng and Yafeng Chen and Luyao Cheng and Qian Chen},
+   journal={arXiv preprint arXiv:2303.00332},
+   year={2023},
+ }
+ ```
+
+ # 3D-Speaker Developer Community DingTalk Group
+ <div align=left>
+ <img src="dingding.jpg" width="260" />
+ </div>
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/campplus_cn_en_common.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:92f29b94e6948786a26778c9e302525d185bb08c8b9f5252ed98776902840199
+ size 28044640
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/config.yaml ADDED
@@ -0,0 +1,23 @@
+ # This is an example that demonstrates how to configure a model file.
+ # You can modify the configuration according to your own requirements.
+
+ # To print the register_table:
+ # from funasr.register import tables
+ # tables.print()
+
+ # network architecture
+ model: CAMPPlus
+ model_conf:
+   feat_dim: 80
+   embedding_size: 192
+   growth_rate: 32
+   bn_size: 4
+   init_channels: 128
+   config_str: 'batchnorm-relu'
+   memory_efficient: True
+   output_level: 'segment'
+
+ # frontend related
+ frontend: WavFrontend
+ frontend_conf:
+   fs: 16000
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/configuration.json ADDED
@@ -0,0 +1,23 @@
+ {
+     "framework": "pytorch",
+     "task": "speaker-verification",
+     "model_config": "config.yaml",
+     "model_file": "campplus_cn_en_common.pt",
+     "model": {
+         "type": "cam++-sv",
+         "model_config": {
+             "sample_rate": 16000,
+             "fbank_dim": 80,
+             "emb_size": 192
+         },
+         "pretrained_model": "campplus_cn_en_common.pt",
+         "yesOrno_thr": 0.33
+     },
+     "pipeline": {
+         "type": "speaker-verification"
+     },
+     "file_path_metas": {
+         "init_param": "campplus_cn_en_common.pt",
+         "config": "config.yaml"
+     }
+ }
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/dingding.jpg ADDED

Git LFS Details

  • SHA256: e06e800d10edb766768dff0a1677b70715f5f517a58a05369a171cc9bb7499c0
  • Pointer size: 131 Bytes
  • Size of remote file: 184 kB
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_a_cn_16k.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5f20ce0ddc378ca3239d3ce864b1142726a46a1221ae553912e4e142045df58b
+ size 118932
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker1_b_cn_16k.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20745dc08a4281894d146140b99b9ef7417ac681119b7f7202f553cdf1a85f65
+ size 157058
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/examples/speaker2_a_cn_16k.wav ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8a6cffa452df32ef10503f7992f22ffcdd7f16c4e0273d13311bc5cdcb13abf4
+ size 170028
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/quickstart.md ADDED
@@ -0,0 +1,36 @@
+ ---
+ ---
+ ## Model Loading and Inference
+ For more on model loading and inference, see [Model Inference Pipeline](https://modelscope.cn/docs/%E6%A8%A1%E5%9E%8B%E7%9A%84%E6%8E%A8%E7%90%86Pipeline).
+
+ ```python
+ from modelscope.pipelines import pipeline
+ from modelscope.utils.constant import Tasks
+
+ p = pipeline('speaker-verification', 'iic/speech_campplus_sv_zh_en_16k-common_advanced')
+ ```
+
+ Provide the input:
+ ```python
+ wav1 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
+ wav2 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
+ p([wav1, wav2])
+ ```
+
+ You can set a custom threshold; the higher the threshold, the stricter the criterion for judging two utterances as the same speaker:
+ ```python
+ wav1 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_a_cn_16k.wav'
+ wav2 = 'https://modelscope.cn/api/v1/models/iic/speech_campplus_sv_zh_en_16k-common_advanced/repo?Revision=master&FilePath=examples/speaker1_b_cn_16k.wav'
+ p([wav1, wav2], thr=0.33)
+ ```
+
+ For more usage instructions, see the [ModelScope Documentation Center](http://www.modelscope.cn/#/docs).
+ ---
+
+ ---
+ ## Download and Install the ModelScope Library
+ For more on downloading and installing the ModelScope library, see [Environment Setup](https://modelscope.cn/docs/%E7%8E%AF%E5%A2%83%E5%AE%89%E8%A3%85).
+
+ ```sh
+ pip install "modelscope[audio]" -f https://modelscope.oss-cn-beijing.aliyuncs.com/releases/repo.html
+ ```
ckpts/iic/speech_campplus_sv_zh_en_16k-common_advanced/structure.png ADDED

Git LFS Details

  • SHA256: 1ff916275cbfe40e1e5584ef66f81b776ef992e9997d8658328394d023dba1b8
  • Pointer size: 131 Bytes
  • Size of remote file: 286 kB