ONNX model that also outputs pred_dur, which makes it possible to compute word-level timestamps. I use it from C#, so I don't have any Python samples.
model_slim.onnx is model.onnx processed with OnnxSlim (https://github.com/inisis/OnnxSlim).
Usage
var (phonemeOutput, tokens) = _phonemizer.G2P(text);
...
var inputs = await CreateModelInputs(audioData, voice, speed);
using var results = _session.Run(inputs);
var waveform = results[0].AsTensor<float>().ToArray().ToMemory();
var duration = results[1].AsTensor<long>().ToArray().ToMemory();
JoinTimestamps(tokens, duration);
private void JoinTimestamps(List<MToken> tokens, Memory<long> predDur)
{
    // Multiply by 600 to go from pred_dur frames to sample_rate 24000
    // Equivalent to dividing pred_dur frames by 40 to get timestamp in seconds
    // We will count nice round half-frames, so the divisor is 80
    const int MAGIC_DIVISOR = 80;
    var predDurSpan = predDur.Span;

    if ( tokens.Count == 0 || predDurSpan.Length < 3 )
    {
        // We expect at least 3: <bos>, token, <eos>
        return;
    }

    // We track 2 counts, measured in half-frames: (left, right)
    // This way we can cut space characters in half
    // TODO: Is -3 an appropriate offset?
    var left = 2 * Math.Max(0, predDurSpan[0] - 3);
    var right = left;

    // Updates:
    // left = right + (2 * token_dur) + space_dur
    // right = left + space_dur
    var i = 1;
    foreach ( var t in tokens )
    {
        if ( i >= predDurSpan.Length - 1 )
        {
            break;
        }

        if ( string.IsNullOrEmpty(t.Phonemes) )
        {
            if ( t.IsWhitespace )
            {
                i++;
                left = right + predDurSpan[i];
                right = left + predDurSpan[i];
                i++;
            }
            continue;
        }

        var j = i + t.Phonemes.Length;
        if ( j >= predDurSpan.Length )
        {
            break;
        }

        t.StartTs = (double)left / MAGIC_DIVISOR;

        // Calculate token duration by summing the span
        long tokenDur = 0;
        for ( var k = i; k < j; k++ )
        {
            tokenDur += predDurSpan[k];
        }

        var spaceDur = t.IsWhitespace ? predDurSpan[j] : 0;
        left = right + 2 * tokenDur + spaceDur;
        t.EndTs = (double)left / MAGIC_DIVISOR;
        right = left + spaceDur;
        i = j + (t.IsWhitespace ? 1 : 0);
    }
}
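The timestamps written by JoinTimestamps are in seconds. Below is a minimal sketch of the unit conversions described in its comments, together with the token shape the method appears to rely on. The `MToken` class here is a hypothetical stand-in inferred only from the members the method touches (`Phonemes`, `IsWhitespace`, `StartTs`, `EndTs`); the `Text` property and the `TimestampMath` helpers are illustrative assumptions and not part of the model or of Kokoro.

```csharp
using System;

// Hypothetical minimal token shape, inferred from what JoinTimestamps reads and writes.
public sealed class MToken
{
    public string Text { get; set; } = "";      // assumption: the written word
    public string Phonemes { get; set; } = "";  // one pred_dur entry per phoneme character
    public bool IsWhitespace { get; set; }      // assumption: a space follows this token
    public double StartTs { get; set; }         // seconds
    public double EndTs { get; set; }           // seconds
}

// Illustrative helpers for the arithmetic in the comments of JoinTimestamps.
public static class TimestampMath
{
    public const int SampleRate = 24000;        // sample rate named in the comments
    public const int SamplesPerFrame = 600;     // one pred_dur frame covers 600 samples
    public const int HalfFramesPerSecond = 80;  // MAGIC_DIVISOR = 2 * (24000 / 600)

    // pred_dur frames -> seconds (frames * 600 / 24000 = frames / 40)
    public static double FramesToSeconds(long frames) =>
        (double)frames * SamplesPerFrame / SampleRate;

    // Half-frame counters (left/right in JoinTimestamps) -> seconds
    public static double HalfFramesToSeconds(long halfFrames) =>
        (double)halfFrames / HalfFramesPerSecond;

    // Seconds -> sample index, clamped to the waveform length
    public static int SecondsToSample(double seconds, int waveformLength) =>
        Math.Clamp((int)Math.Round(seconds * SampleRate), 0, waveformLength);

    // Cut the audio for a single token out of the full waveform.
    public static ReadOnlyMemory<float> SliceToken(ReadOnlyMemory<float> waveform, MToken t)
    {
        var start = SecondsToSample(t.StartTs, waveform.Length);
        var end = SecondsToSample(t.EndTs, waveform.Length);
        return waveform.Slice(start, Math.Max(0, end - start));
    }
}
```

For example, with a one-second waveform of 24000 samples, a token with StartTs = 0.25 and EndTs = 0.5 maps to samples 6000 through 12000, so `TimestampMath.SliceToken(waveform, token)` would return 6000 samples.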
Credits
https://github.com/hexgrad/kokoro https://github.com/adrianlyjak/kokoro-onnx-export