---
license: cc-by-4.0
language:
- en
- es
- it
- fr
- de
- nl
- ru
- pl
- uk
- sk
- bg
- fi
- ro
- hr
- cs
- sv
- et
- hu
- lt
- da
- mt
- sl
- lv
- el
pipeline_tag: automatic-speech-recognition
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- multilingual
- NeMo
- OpenVINO
base_model:
- nvidia/parakeet-tdt-1.1b
---

# Parakeet TDT 1.1B V3 - OpenVINO

[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/eddy?style=flat&logo=github)](https://github.com/FluidInference/eddy)

OpenVINO-optimized version of NVIDIA's Parakeet TDT 1.1B V3 model for high-performance multilingual automatic speech recognition on Intel NPUs and CPUs.

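## Benchmark Results

- Hardware: Intel Core Ultra 7 155H (Meteor Lake) with Intel AI Boost NPU
- Software: OpenVINO 2025.x

RTFx (inverse real-time factor) is the number of seconds of audio transcribed per second of processing time; higher is faster.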

### LibriSpeech test-clean (English)

| Metric | Value |
|--------|-------|
| Average WER | 3.7% |
| Median WER | 0.0% |
| Average CER | 1.9% |
| RTFx (NPU) | 25.7× |
| RTFx (CPU) | 5-8× |
| Files processed | 2,620 (5.4 hours) |
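
The card does not say whether "Average WER" is the mean of per-file scores or a corpus-level ratio; the 0.0% median suggests per-file scoring, but that is an assumption. As a minimal sketch, the standard definitions can be computed as below: WER and CER are Levenshtein edit distances over words or characters divided by the reference length, and RTFx is audio duration over processing time. The transcript pairs are hypothetical, for illustration only.

```python
from statistics import mean, median


def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance (substitutions + deletions + insertions) between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion
                cur[j - 1] + 1,          # insertion
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = cur
    return prev[-1]


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word-level edits / number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character-level edits / reference length."""
    return edit_distance(list(ref), list(hyp)) / len(ref)


def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Inverse real-time factor: seconds of audio per second of processing."""
    return audio_seconds / processing_seconds


# Hypothetical (reference, hypothesis) pairs, for illustration only.
pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("hello world", "hello word"),
    ("good morning everyone", "good morning everyone"),
]
per_file = [wer(r, h) for r, h in pairs]
corpus = sum(edit_distance(r.split(), h.split()) for r, h in pairs) / sum(
    len(r.split()) for r, _ in pairs
)
print(f"mean per-file WER   {mean(per_file):.3f}")    # 0.167
print(f"median per-file WER {median(per_file):.3f}")  # 0.000
print(f"corpus-level WER    {corpus:.3f}")            # 1 error / 11 words = 0.091

# Processing time implied by 5.4 h of audio at the NPU's 25.7x RTFx.
print(f"{5.4 * 3600 / 25.7 / 60:.1f} min")  # 12.6
```

Because most utterances are transcribed perfectly, the per-file median can sit at 0.0% while the mean is pulled up by a few hard files, and the corpus-level figure weights long files more heavily than short ones.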

### FLEURS Multilingual (24 Languages)

| Metric | Value |
|--------|-------|
| Average WER | 17.0% |
| Average CER | 5.4% |
| Average RTFx | 41.1× |
| Total samples | ~15,000+ |

Best-performing languages (WER): Italian 4.3%, Spanish 5.4%, English 6.1%, German 7.4%, French 7.7%

See [BENCHMARK_RESULTS.md](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md) for complete per-language results.

## Performance Comparison

| Implementation | Device | RTFx (Avg) | WER (LibriSpeech) |
|----------------|--------|------------|-------------------|
| eddy (OpenVINO) | Intel Core Ultra 7 155H NPU | 25.7× | 3.7% |
| Parakeet (PyTorch) | Intel Arc 140V GPU | ~20×* | ~2.5%* |
| eddy (OpenVINO) | Intel Core Ultra 7 155H CPU | 5-8× | 3.7% |

> Note: Benchmarked on an HP EliteBook Ultra G1i. eddy on the NPU is ~1.3× faster than PyTorch on the Intel Arc GPU (25.7× vs. ~20× RTFx), with lower power consumption.
>
> \* PyTorch figures for V3 are estimated from the V2 benchmark.
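
To make the RTFx figures concrete, the sketch below converts each row of the comparison table into wall-clock time for the 5.4 hours of LibriSpeech test-clean audio and reproduces the ~1.3× NPU-vs-GPU ratio. It only does arithmetic on the published numbers; the GPU value is itself an estimate, and the CPU entry is treated as its stated 5-8× range.

```python
AUDIO_HOURS = 5.4  # LibriSpeech test-clean duration from the benchmark table

# (low, high) RTFx per row of the comparison table; the GPU value is an
# estimate and the CPU value is a range.
RTFX = {
    "eddy NPU": (25.7, 25.7),
    "PyTorch Arc GPU (est.)": (20.0, 20.0),
    "eddy CPU": (5.0, 8.0),
}


def wall_clock_minutes(audio_hours: float, factor: float) -> float:
    """Processing time in minutes for `audio_hours` of audio at a given RTFx."""
    return audio_hours * 60 / factor


for name, (low, high) in RTFX.items():
    best = wall_clock_minutes(AUDIO_HOURS, high)
    worst = wall_clock_minutes(AUDIO_HOURS, low)
    print(f"{name:<24} {best:5.1f} to {worst:5.1f} min")

speedup = RTFX["eddy NPU"][0] / RTFX["PyTorch Arc GPU (est.)"][0]
print(f"NPU vs. GPU: {speedup:.1f}x")  # 25.7 / 20 = 1.285, printed as 1.3x
```

At these rates the NPU would finish the test set in roughly 12.6 minutes, the estimated GPU run in about 16.2 minutes, and the CPU in roughly 40 to 65 minutes.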

## Supported Languages

24 European languages: English, Spanish, Italian, French, German, Dutch, Russian, Polish, Ukrainian, Slovak, Bulgarian, Finnish, Romanian, Croatian, Czech, Swedish, Estonian, Hungarian, Lithuanian, Danish, Maltese, Slovenian, Latvian, Greek

## Usage

Python bindings via `ctypes` are available; see the [eddy repository](https://github.com/FluidInference/eddy) for details.
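
The general shape of a `ctypes` binding is sketched below. Everything eddy-specific here is a HYPOTHETICAL placeholder: the library file name, the `eddy_transcribe` symbol, its C signature, and the 16 kHz mono float32 input are assumptions made for illustration, not the real eddy API. Only the standard-library `ctypes` calls are real.

```python
import ctypes
from typing import Callable, Sequence

# HYPOTHETICAL placeholders: neither the library file name nor the symbol
# below is taken from eddy; check the eddy repository for the real exports.
HYPOTHETICAL_LIB = "libeddy.so"
HYPOTHETICAL_FN = "eddy_transcribe"


def to_c_floats(samples: Sequence[float]) -> ctypes.Array:
    """Copy mono PCM samples into a contiguous C float32 buffer."""
    return (ctypes.c_float * len(samples))(*samples)


def bind(lib_path: str = HYPOTHETICAL_LIB) -> Callable:
    """Load the shared library and declare an ASSUMED C signature:

        const char *eddy_transcribe(const float *pcm, size_t n, int sample_rate);
    """
    lib = ctypes.CDLL(lib_path)
    fn = getattr(lib, HYPOTHETICAL_FN)
    fn.argtypes = [ctypes.POINTER(ctypes.c_float), ctypes.c_size_t, ctypes.c_int]
    fn.restype = ctypes.c_char_p  # ctypes hands back the C string as bytes
    return fn


def transcribe(fn: Callable, samples: Sequence[float], sample_rate: int = 16_000) -> str:
    """Pass audio to the bound function and decode its UTF-8 result."""
    buf = to_c_floats(samples)
    return fn(buf, len(buf), sample_rate).decode("utf-8")
```

Usage would look like `transcribe(bind("/path/to/library"), samples)`. Declaring `argtypes` and `restype` lets `ctypes` check arguments and convert the returned `char *` to `bytes`; if the real library allocates the returned string, a binding would also need to call whatever matching free function it exports.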

## Model Details

- Parameters: 1.1B
- Architecture: FastConformer-RNNT (4-model pipeline)
- Languages: 24 European languages
- Blank token ID: 8192
- Context window: 10 s chunks with 3 s overlap between consecutive chunks
- Features: LSTM state continuity, token deduplication, per-token timestamps
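
The card does not document how eddy splits audio or deduplicates tokens at chunk boundaries. As a minimal sketch, assuming 16 kHz mono input, the code below computes the 10 s windows with 3 s overlap (a 7 s hop) and merges tokens from two overlapping chunks with a simple midpoint rule on per-token timestamps. The midpoint rule is a common stand-in, not necessarily eddy's method, and LSTM state continuity is not modeled here.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono input
CHUNK_S = 10.0        # chunk length from the model card
OVERLAP_S = 3.0       # overlap between consecutive chunks, from the model card


def chunk_bounds(total_samples: int, sr: int = SAMPLE_RATE,
                 chunk_s: float = CHUNK_S,
                 overlap_s: float = OVERLAP_S) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample indices of windows that overlap by `overlap_s`."""
    chunk = int(chunk_s * sr)
    hop = int((chunk_s - overlap_s) * sr)  # 7 s advance per chunk
    start = 0
    while True:
        end = min(start + chunk, total_samples)
        yield start, end
        if end >= total_samples:
            return
        start += hop


@dataclass
class Token:
    text: str
    time_s: float  # absolute timestamp in seconds


def merge(kept: List[Token], new: List[Token],
          overlap_start_s: float, overlap_end_s: float) -> List[Token]:
    """Midpoint rule: inside the overlap, trust the earlier chunk up to the
    midpoint and the later chunk after it, so each region is emitted once."""
    cut = (overlap_start_s + overlap_end_s) / 2
    return [t for t in kept if t.time_s < cut] + [t for t in new if t.time_s >= cut]


# 25 s of audio -> windows at 0-10 s, 7-17 s, 14-24 s, 21-25 s.
print([(s / SAMPLE_RATE, e / SAMPLE_RATE) for s, e in chunk_bounds(25 * SAMPLE_RATE)])

first = [Token("hello", 7.9), Token("world", 9.1)]                          # chunk 0-10 s
second = [Token("hello", 7.95), Token("world", 9.1), Token("again", 11.0)]  # chunk 7-17 s
print([t.text for t in merge(first, second, 7.0, 10.0)])  # ['hello', 'world', 'again']
```

Cutting at the middle of the overlap keeps each chunk's tokens where that chunk has the most acoustic context on both sides, and the "hello"/"world" pair heard by both chunks is emitted only once.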

## License

CC-BY-4.0. See [LICENSE](LICENSE) for details.

## Links

- GitHub: [FluidInference/eddy](https://github.com/FluidInference/eddy)
- Base Model: [nvidia/parakeet-tdt-1.1b](https://huggingface.co/nvidia/parakeet-tdt-1.1b)
- Documentation: [Benchmark Results](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md)

## Acknowledgments

Based on NVIDIA's Parakeet TDT model. OpenVINO conversion and optimization by the FluidInference team.