

## 🎙️ Simply detect, segment, label, and separate speakers in any language

<div align="center">
<a href="https://github.com/pyannote/pyannote-audio"><img alt="Github" src="https://img.shields.io/badge/Open%20source%20toolkit-059669?style=flat&logo=github&logoColor=FFFFFF"></a>
</div>

[pyannoteAI](https://www.pyannote.ai/) helps you understand speakers and conversation context. We focus on identifying speakers and extracting conversation metadata under conditions that reflect real conversations rather than controlled recordings.

### 🤔 What is speaker diarization?


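Concretely, a diarization system turns an audio file into a list of speech turns, each labeled with an anonymous speaker tag ("who spoke when"). The sketch below shows what such output looks like and one way to use it; the timestamps and labels are made up for illustration:

```python
# A diarization result is essentially a list of speech turns:
# (start time in seconds, end time in seconds, anonymous speaker label).
diarization = [
    (0.0, 3.2, "SPEAKER_00"),
    (3.4, 7.1, "SPEAKER_01"),
    (7.3, 9.0, "SPEAKER_00"),
]

# For example, total speaking time per speaker:
speaking_time = {}
for start, end, speaker in diarization:
    speaking_time[speaker] = speaking_time.get(speaker, 0.0) + (end - start)

for speaker, seconds in sorted(speaking_time.items()):
    print(f"{speaker}: {seconds:.1f}s")
```

Note that the labels are anonymous: diarization tells speakers apart but does not, by itself, say who they are.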
__[Diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization) (in %, the lower, the better)__

Our models achieve competitive performance across multiple public diarization datasets. Explore the pyannoteAI benchmark ➡️ [https://www.pyannote.ai/benchmark](https://www.pyannote.ai/benchmark)
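Diarization error rate sums three error types (false alarm, missed detection, speaker confusion) over the total duration of reference speech. A toy sketch of the arithmetic, with illustrative numbers rather than benchmark results:

```python
def diarization_error_rate(false_alarm, missed_detection, confusion, total_speech):
    """DER = (false alarm + missed detection + speaker confusion) / total reference speech."""
    return (false_alarm + missed_detection + confusion) / total_speech

# e.g. 10 s of false alarm, 20 s missed, 30 s confused, over 1000 s of speech
der = diarization_error_rate(10.0, 20.0, 30.0, 1000.0)
print(f"DER = {100 * der:.1f}%")  # DER = 6.0%
```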
### ⏩️ Going further, better, and faster

The [`precision-2`](https://www.pyannote.ai/blog/precision-2) premium model further improves accuracy and processing speed, and brings additional features:

| Feature | `community-1` | `precision-2` |
| --- | --- | --- |
| Speaker confidence scores | ❌ | ✅ |
| Voiceprinting | ❌ | ✅ |
| Speaker identification | ❌ | ✅ |
| STT orchestration | ❌ | ✅ |
| Time to process 1h of audio (on H100) | 37s | 14s |

Create a [`pyannoteAI`](https://dashboard.pyannote.ai) account and change one line of code:

```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
better_output = pipeline('/path/to/audio.wav')
```
### 📝 Get speaker-attributed transcripts

We host open-source transcription models such as [**Nvidia Parakeet-tdt-0.6b-v3**](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) and [**OpenAI whisper-large-v3-turbo**](https://huggingface.co/dropbox-dash/faster-whisper-large-v3-turbo), together with specialized reconciliation logic that merges STT and diarization outputs into speaker-attributed transcripts.

STT orchestration combines pyannoteAI `Precision-2` diarization with these transcription services: instead of running diarization and transcription separately and reconciling the outputs manually, you make a single API call and receive a speaker-attributed transcript.



To use this feature, send a request to the diarize API endpoint with the `transcription: true` flag.

```python
# pip install pyannoteai-sdk
from pyannoteai.sdk import Client

client = Client("your-api-key")

# start a diarization job with transcription enabled
job_id = client.diarize(
    "https://www.example/audio.wav",
    transcription=True,
)

# retrieve the finished job output
job_output = client.retrieve(job_id)

# word-level transcript: one speaker label per word
for word in job_output['output']['wordLevelTranscription']:
    print(word['start'], word['end'], word['speaker'], word['text'])

# turn-level transcript: words grouped into speaker turns
for turn in job_output['output']['turnLevelTranscription']:
    print(turn['start'], turn['end'], turn['speaker'], turn['text'])
```
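The turn-level transcript is essentially the word-level one with consecutive same-speaker words merged. The API returns both, so you never need to do this yourself, but the sketch below illustrates the reconciliation idea using hypothetical word entries shaped like the output above:

```python
def words_to_turns(words):
    """Merge consecutive same-speaker words into speaker turns."""
    turns = []
    for word in words:
        if turns and turns[-1]['speaker'] == word['speaker']:
            # same speaker is still talking: extend the current turn
            turns[-1]['end'] = word['end']
            turns[-1]['text'] += ' ' + word['text']
        else:
            # speaker changed: open a new turn
            turns.append({'start': word['start'], 'end': word['end'],
                          'speaker': word['speaker'], 'text': word['text']})
    return turns

words = [
    {'start': 0.0, 'end': 0.4, 'speaker': 'SPEAKER_00', 'text': 'Hello'},
    {'start': 0.5, 'end': 0.9, 'speaker': 'SPEAKER_00', 'text': 'there'},
    {'start': 1.2, 'end': 1.6, 'speaker': 'SPEAKER_01', 'text': 'Hi'},
]
for turn in words_to_turns(words):
    print(turn['start'], turn['end'], turn['speaker'], turn['text'])
```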