hbredin committed
Commit 342711a · verified · 1 parent: 34c21ef

Update README.md

Files changed (1): README.md +34 -1

README.md CHANGED
@@ -9,7 +9,7 @@ pinned: false
 
 ![Identify who speaks when with pyannote](https://github.com/pyannote/.github/raw/main/profile/banner.jpg)
 
-## 💚 Simply detect, segment, label, and separate speakers in any language
+## 💚 Simply detect, segment, label, and separate speakers in any language
 
 <div align="center">
 <a href="https://github.com/pyannote/pyannote-audio"><img alt="Github" src="https://img.shields.io/badge/Open%20source%20toolkit-059669?style=flat&logo=github&logoColor=FFFFFF"></a>
@@ -22,6 +22,8 @@ pinned: false
 </div>
 
+[pyannoteAI](https://www.pyannote.ai/) makes it easy to understand speakers and conversation context. We focus on identifying speakers and conversation metadata under conditions that reflect real conversations rather than controlled recordings.
+
 ### 🎤 What is speaker diarization?
 
 ![Diarization](https://github.com/pyannote/.github/raw/main/profile/diarization.jpg)
@@ -79,6 +81,8 @@ Read [`community-1` model card](https://hf.co/pyannote/speaker-diarization-commu
 
 __[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__
 
+Our models achieve competitive performance across multiple public diarization datasets. Explore the pyannoteAI benchmark ➡️ [https://www.pyannote.ai/benchmark](https://www.pyannote.ai/benchmark)
+
 ### ⏩️ Going further, better, and faster
 
 The [`precision-2`](https://www.pyannote.ai/blog/precision-2) premium model further improves accuracy and processing speed, and brings additional features.
@@ -91,6 +95,7 @@ __[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.
 | Speaker confidence scores | ❌ | ✅ |
 | Voiceprinting | ❌ | ✅ |
 | Speaker identification | ❌ | ✅ |
+| STT Orchestration | ❌ | ✅ |
 | Time to process 1h of audio (on H100) | 37s | 14s |
@@ -101,3 +106,31 @@ Create a [`pyannoteAI`](https://dashboard.pyannote.ai) account, change one line
 pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
 better_output = pipeline('/path/to/audio.wav')
 ```
+
+### 🔌 Get speaker-attributed transcripts
+
+We host open-source transcription models such as [**NVIDIA Parakeet-tdt-0.6b-v3**](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) and [**OpenAI whisper-large-v3-turbo**](https://huggingface.co/dropbox-dash/faster-whisper-large-v3-turbo), with specialized STT + diarization reconciliation logic that produces speaker-attributed transcripts.
+
+STT orchestration combines pyannoteAI `Precision-2` diarization with these transcription services. Instead of running diarization and transcription separately and then reconciling their outputs manually, you make one API call and receive a speaker-attributed transcript.
+
+![STT Orchestration](https://github.com/pyannote/.github/raw/main/profile/stt-orchestration.png)
+
+To use this feature, make a request to the diarize API endpoint with the `transcription:true` flag.
+
+```python
+# pip install pyannoteai-sdk
+from pyannoteai.sdk import Client
+
+client = Client("your-api-key")
+
+job_id = client.diarize(
+    "https://www.example/audio.wav",
+    transcription=True,
+)
+
+job_output = client.retrieve(job_id)
+
+for word in job_output['output']['wordLevelTranscription']:
+    print(word['start'], word['end'], word['speaker'], word['text'])
+
+for turn in job_output['output']['turnLevelTranscription']:
+    print(turn['start'], turn['end'], turn['speaker'], turn['text'])
+```
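
The reconciliation idea behind the turn-level output above can also be sketched client-side, e.g. to rebuild turns with a custom pause threshold. This is a minimal illustrative sketch, not part of the pyannoteai SDK: it assumes only the `wordLevelTranscription` record shape shown in the snippet above, and the `group_words_into_turns` helper and `max_gap` parameter are hypothetical names.

```python
# Illustrative sketch, not part of the pyannoteai SDK: rebuild speaker turns
# from word-level transcription records shaped like the output shown above.

def group_words_into_turns(words, max_gap=1.0):
    """Merge consecutive same-speaker words into turns.

    A new turn starts when the speaker changes or when the silence
    between two words exceeds `max_gap` seconds (hypothetical knob).
    """
    turns = []
    for word in words:
        if (
            turns
            and turns[-1]["speaker"] == word["speaker"]
            and word["start"] - turns[-1]["end"] <= max_gap
        ):
            # Same speaker, short gap: extend the current turn.
            turns[-1]["end"] = word["end"]
            turns[-1]["text"] += " " + word["text"]
        else:
            # Speaker change or long pause: start a new turn.
            turns.append(
                {
                    "start": word["start"],
                    "end": word["end"],
                    "speaker": word["speaker"],
                    "text": word["text"],
                }
            )
    return turns


# Toy word-level records in the shape shown above.
words = [
    {"start": 0.0, "end": 0.4, "speaker": "SPEAKER_00", "text": "Hello"},
    {"start": 0.5, "end": 0.8, "speaker": "SPEAKER_00", "text": "there"},
    {"start": 1.1, "end": 1.6, "speaker": "SPEAKER_01", "text": "Hi"},
]
for turn in group_words_into_turns(words):
    print(turn["start"], turn["end"], turn["speaker"], turn["text"])
```

The hosted `turnLevelTranscription` output already gives you this grouping; the sketch is only useful when you want to re-segment words under your own gap threshold.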