ESPnet: End-to-End Speech Processing Toolkit
Abstract
ESPnet is an open-source platform for end-to-end speech processing, focusing on automatic speech recognition using Chainer and PyTorch, with Kaldi-style data processing.
This paper introduces a new open-source platform for end-to-end speech processing named ESPnet. ESPnet mainly focuses on end-to-end automatic speech recognition (ASR) and adopts two widely used dynamic neural network toolkits, Chainer and PyTorch, as its main deep learning engines. ESPnet also follows the Kaldi ASR toolkit style for data processing, feature extraction/format, and recipes, providing a complete setup for speech recognition and other speech processing experiments. This paper explains the major architecture of this software platform, several important functionalities that differentiate ESPnet from other open-source ASR toolkits, and experimental results on major ASR benchmarks.
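Kaldi-style data processing is organized around a data directory of plain-text tables such as `wav.scp` (utterance ID to audio), `text` (utterance ID to transcript), and `utt2spk` (utterance ID to speaker). The sketch below is a minimal illustration under that assumption, not ESPnet's or Kaldi's own code. The helpers `read_kaldi_table`, `utt2spk_to_spk2utt`, and `check_consistency` are hypothetical names. They parse such tables, derive the inverse `spk2utt` mapping (in Kaldi recipes this is done by `utils/utt2spk_to_spk2utt.pl`), and flag utterances that are missing from one of the files.

```python
"""Minimal sketch of Kaldi-style data-directory handling (hypothetical helpers)."""


def read_kaldi_table(lines):
    # Each line is "<id> <rest of line>"; the rest is kept verbatim as a string.
    table = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1] if len(parts) > 1 else ""
    return table


def utt2spk_to_spk2utt(utt2spk):
    # Invert {utt: spk} into {spk: [utt, ...]}, with speakers and utterances
    # sorted (code-point order, which matches Kaldi's C-locale order for ASCII IDs).
    spk2utt = {}
    for utt, spk in sorted(utt2spk.items()):
        spk2utt.setdefault(spk, []).append(utt)
    return dict(sorted(spk2utt.items()))


def check_consistency(wav_scp, text, utt2spk):
    # Return utterance IDs that appear in some, but not all, of the three tables.
    all_ids = set(wav_scp) | set(text) | set(utt2spk)
    common = set(wav_scp) & set(text) & set(utt2spk)
    return sorted(all_ids - common)


if __name__ == "__main__":
    wav_scp = read_kaldi_table(["utt1 /data/utt1.wav", "utt2 /data/utt2.wav"])
    text = read_kaldi_table(["utt1 HELLO WORLD", "utt2 GOOD MORNING", "utt3 ORPHAN"])
    utt2spk = read_kaldi_table(["utt1 spkA", "utt2 spkA"])
    print(utt2spk_to_spk2utt(utt2spk))  # {'spkA': ['utt1', 'utt2']}
    print(check_consistency(wav_scp, text, utt2spk))  # ['utt3']
```

Keeping every table keyed by the same utterance ID is what allows feature extraction, training, and scoring stages to be chained as independent recipe steps. A real recipe would read these tables from files and run Kaldi's own validation scripts rather than a hand-written check like this one.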
arXiv: 1804.00015