ONNX model that also outputs pred_dur, so that word-level timestamps can be computed. I am using this from C#, so I don't have any Python samples.

model_slim.onnx is model.onnx run through https://github.com/inisis/OnnxSlim

Usage


var (phonemeOutput, tokens) = _phonemizer.G2P(text);

...

var       inputs    = await CreateModelInputs(audioData, voice, speed);
using var results   = _session.Run(inputs);

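// Output 0 is the audio waveform; output 1 is pred_dur, the predicted per-phoneme durations in frames
var waveform = results[0].AsTensor<float>().ToArray().ToMemory();
var duration = results[1].AsTensor<long>().ToArray().ToMemory();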

JoinTimestamps(tokens, duration);

private void JoinTimestamps(List<MToken> tokens, Memory<long> predDur)
    {
        // Multiply by 600 to go from pred_dur frames to samples at sample_rate 24000.
        // Equivalent to dividing pred_dur frames by 40 to get a timestamp in seconds.
        // We count in half-frames, so the divisor is 80.
        const int MAGIC_DIVISOR = 80;

        var predDurSpan = predDur.Span;

        if ( tokens.Count == 0 || predDurSpan.Length < 3 )
        {
            // We expect at least 3: <bos>, token, <eos>
            return;
        }

        // We track 2 counts, measured in half-frames: (left, right)
        // This way we can cut space characters in half
        // TODO: Is -3 an appropriate offset?
        var left  = 2 * Math.Max(0, predDurSpan[0] - 3);
        var right = left;

        // Updates:
        // left = right + (2 * token_dur) + space_dur
        // right = left + space_dur
        var i = 1;
        foreach ( var t in tokens )
        {
            if ( i >= predDurSpan.Length - 1 )
            {
                break;
            }

            if ( string.IsNullOrEmpty(t.Phonemes) )
            {
                if ( t.IsWhitespace )
                {
                    i++;
                    left  = right + predDurSpan[i];
                    right = left + predDurSpan[i];
                    i++;
                }

                continue;
            }

            // j is the index just past this token's phonemes (its trailing space, if any)
            var j = i + t.Phonemes.Length;
            if ( j >= predDurSpan.Length )
            {
                break;
            }

            t.StartTs = (double)left / MAGIC_DIVISOR;

            // Calculate token duration by summing the phoneme durations in [i, j)
            long tokenDur = 0;
            for ( var k = i; k < j; k++ )
            {
                tokenDur += predDurSpan[k];
            }

            var spaceDur = t.IsWhitespace ? predDurSpan[j] : 0;

            left    = right + 2 * tokenDur + spaceDur;
            t.EndTs = (double)left / MAGIC_DIVISOR;
            right   = left + spaceDur;

            i = j + (t.IsWhitespace ? 1 : 0);
        }
    }
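
Once JoinTimestamps has filled in StartTs and EndTs (in seconds), a common next step is to map them back onto the waveform. The sketch below is not part of the original code: the class TimestampSketch and its methods are hypothetical names, and it assumes the 24000 Hz sample rate and 600 samples per pred_dur frame mentioned in the comments above.

```csharp
using System;

// Hypothetical helper (not from the original project): converts timestamps
// in seconds into sample offsets of the generated waveform.
public static class TimestampSketch
{
    // Assumption: 24 kHz output, as stated in the JoinTimestamps comments.
    public const int SampleRate = 24000;

    // Assumption: one pred_dur frame covers 600 samples (24000 / 40).
    public const int SamplesPerFrame = 600;

    // Whole frames to seconds: frames * 600 / 24000, i.e. frames / 40.
    public static double FramesToSeconds(long frames)
        => (double)frames * SamplesPerFrame / SampleRate;

    // Seconds to a sample index, clamped to the bounds of the waveform.
    public static int SecondsToSampleIndex(double seconds, int waveformLength)
    {
        var index = (int)Math.Round(seconds * SampleRate);
        return Math.Clamp(index, 0, waveformLength);
    }

    // Returns the slice of the waveform between two timestamps.
    public static ReadOnlyMemory<float> SliceWord(
        ReadOnlyMemory<float> waveform, double startTs, double endTs)
    {
        var start = SecondsToSampleIndex(startTs, waveform.Length);
        var end   = SecondsToSampleIndex(endTs, waveform.Length);
        if ( end < start )
        {
            end = start;
        }

        return waveform.Slice(start, end - start);
    }
}
```

For example, a token with StartTs = 0.25 and EndTs = 0.75 corresponds to 20 and 60 half-frames (divided by 80), and under these assumptions TimestampSketch.SliceWord(waveform, 0.25, 0.75) would cover samples 6000 up to 18000, provided the waveform is at least that long.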

Credits

https://github.com/hexgrad/kokoro

https://github.com/adrianlyjak/kokoro-onnx-export
