CoreML Speech Models
Collection
Speech AI models for Apple Neural Engine via CoreML. iOS/macOS ready. ASR, TTS, VAD, diarization.
CoreML conversion of WeSpeaker ResNet34-LM for Apple Neural Engine.
Produces 256-dimensional L2-normalized speaker embeddings from audio.
| Detail | Value |
|---|---|
| Architecture | ResNet34 with statistics pooling |
| Parameters | ~6.6M |
| Input | 80-bin log-mel spectrogram (16kHz) |
| Output | 256-dim L2-normalized speaker embedding |
| BatchNorm | Fused into Conv2d at conversion time |
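Because the embeddings are L2-normalized (see the table above), cosine similarity between two embeddings reduces to a plain dot product, since both norms are already 1. A minimal sketch of that scoring math in Swift, assuming embeddings arrive as `[Float]` arrays (this standalone helper is for illustration and is not necessarily the library's internal implementation):

```swift
import Foundation

/// Cosine similarity between two speaker embeddings.
/// For L2-normalized vectors this is just the dot product,
/// because both vectors have unit norm.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "embedding dimensions must match")
    var dot: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
    }
    return dot
}
```

Scores near 1 indicate the same speaker; scores near 0 indicate unrelated voices.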
```swift
let model = try await WeSpeakerModel.fromPretrained(backend: .coreML)
let embedding = model.embed(audio: samples, sampleRate: 16000)
let similarity = WeSpeakerModel.cosineSimilarity(embeddingA, embeddingB)
```
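When enrolling a speaker from several utterances, a common approach (an assumption here, not something this model card specifies) is to average the per-utterance embeddings and re-normalize, so downstream cosine scoring via dot product still applies:

```swift
import Foundation

/// Average several L2-normalized embeddings into a single
/// enrollment embedding, then re-normalize the mean back to
/// unit length so dot-product cosine scoring remains valid.
func enrollmentEmbedding(_ embeddings: [[Float]]) -> [Float] {
    precondition(!embeddings.isEmpty, "need at least one embedding")
    let dim = embeddings[0].count
    var mean = [Float](repeating: 0, count: dim)
    for e in embeddings {
        precondition(e.count == dim, "embedding dimensions must match")
        for i in 0..<dim { mean[i] += e[i] }
    }
    for i in 0..<dim { mean[i] /= Float(embeddings.count) }
    // Re-normalize: the mean of unit vectors is generally shorter than 1.
    let norm = sqrt(mean.reduce(0) { $0 + $1 * $1 })
    return norm > 0 ? mean.map { $0 / norm } : mean
}
```

The resulting profile embedding can then be compared against a test embedding with the same cosine-similarity scoring shown above.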
| Variant | Backend | Model ID |
|---|---|---|
| MLX | GPU | aufklarer/WeSpeaker-ResNet34-LM-MLX |
| CoreML | Neural Engine | aufklarer/WeSpeaker-ResNet34-LM-CoreML |
Base model: pyannote/wespeaker-voxceleb-resnet34-LM