about timestamps
Hi, I tested the model and it runs really fast compared with Whisper, on CPU only! Thank you for your contribution!
CoreML gave me an error, but even on CPU it is already faster than Whisper Metal on my Mac:
Error: Ort(Error { code: GenericFailure, msg: "Non-zero status code returned while running
12615810092392341640_CoreML_12615810092392341640_3 node.
Name:'CoreMLExecutionProvider_12615810092392341640_CoreML_12615810092392341640_3_3' Status Message: Error executing model:
Unable to compute the prediction using a neural network model. It can be an invalid input data or broken/unsupported model
(error code: -1)." })
My question is: can we get word/segment timestamps as in the original model? Or does this ONNX export not currently support that feature?
Oops, sorry, I thought you had converted this model:
https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3
But it seems you converted that one instead, which I believe does not support timestamps.
You should actually be able to get pretty good token-level timestamps with CTC models. The model outputs token probabilities for each frame in the sequence (of shape [batch_size, sequence_length, vocab_size]), so based on the position of each token, you can compute its relative timestamp from the number of frames and the length of the audio.
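A minimal sketch of that idea, assuming hypothetical shapes and a greedy argmax decode (the function name, blank id, and toy input are illustrative, not the parakeet-rs API):

```python
import numpy as np

def ctc_timestamps(probs, audio_duration_s, blank_id=0):
    """probs: [sequence_length, vocab_size] frame-level token probabilities.
    Returns (token_id, start_time_s) pairs, collapsing blanks and repeats
    as in standard greedy CTC decoding."""
    seq_len = probs.shape[0]
    frame_dur = audio_duration_s / seq_len  # seconds covered by one output frame
    ids = probs.argmax(axis=-1)             # best token per frame
    out, prev = [], blank_id
    for t, tok in enumerate(ids):
        # emit a token only when it first appears (not blank, not a repeat)
        if tok != blank_id and tok != prev:
            out.append((int(tok), t * frame_dur))
        prev = tok
    return out

# toy example: 6 frames, vocab of 3 where id 0 is the CTC blank
probs = np.eye(3)[[0, 1, 1, 0, 2, 0]]
print(ctc_timestamps(probs, audio_duration_s=3.0))  # [(1, 0.5), (2, 2.0)]
```

Each frame spans `audio_duration / sequence_length` seconds, so the frame index of a token's first occurrence directly gives its start time; word-level times can then be built by grouping subword tokens.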
Thank you for the guidance! It worked!
https://github.com/altunenes/parakeet-rs