FunASR Paraformer ASR Model

A Paraformer-large model for automatic speech recognition (ASR) that supports Mandarin, Cantonese, and English. It operates on 16 kHz audio.

Original Source: ModelScope

License: Apache 2.0

Usage: This model is used by Step Audio EditX for VQ02 audio tokenization.

Model Details

  • Architecture: Paraformer (Non-autoregressive Transformer)
  • Languages: Mandarin, Cantonese, English
  • Sample Rate: 16 kHz
  • Size: ~881 MB
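
Because the model is listed at a 16 kHz sample rate, it can help to check an input file before passing it on. The following is a minimal sketch using only the Python standard library. The helper names (`wav_sample_rate`, `is_model_ready`, `make_silent_wav`) are hypothetical and not part of FunASR or Step Audio EditX, and the sketch assumes uncompressed PCM WAV input.

```python
import io
import wave

# Sample rate listed in the model details (assumed to apply to all inputs).
EXPECTED_RATE = 16_000


def wav_sample_rate(source) -> int:
    """Return the sample rate in Hz of a WAV path or binary file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def is_model_ready(source) -> bool:
    """True if the WAV input already matches the expected 16 kHz rate."""
    return wav_sample_rate(source) == EXPECTED_RATE


def make_silent_wav(rate: int, seconds: float = 0.1) -> io.BytesIO:
    """Build an in-memory mono 16-bit silent WAV, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds))
    buf.seek(0)  # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    for rate in (16_000, 44_100):
        ok = is_model_ready(make_silent_wav(rate))
        print(f"{rate} Hz input ready for the model: {ok}")
```

Files that fail the check would need to be resampled to 16 kHz with an audio tool of your choice before recognition or tokenization.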

Citation

@inproceedings{gao2022paraformer,
  title={Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition},
  author={Gao, Zhifu and Zhang, Shiliang and McLoughlin, Ian and Yan, Zhijie},
  booktitle={INTERSPEECH},
  year={2022}
}