---
license: cc-by-4.0
language:
- en
- es
- it
- fr
- de
- nl
- ru
- pl
- uk
- sk
- bg
- fi
- ro
- hr
- cs
- sv
- et
- hu
- lt
- da
- mt
- sl
- lv
- el
pipeline_tag: automatic-speech-recognition
thumbnail: null
tags:
- automatic-speech-recognition
- speech
- audio
- Transducer
- TDT
- FastConformer
- Conformer
- multilingual
- NeMo
- OpenVINO
base_model:
- nvidia/parakeet-tdt-1.1b
---

# Parakeet TDT 1.1B V3 - OpenVINO

[![Discord](https://img.shields.io/badge/Discord-Join%20Chat-7289da.svg)](https://discord.gg/WNsvaCtmDe)
[![GitHub Repo stars](https://img.shields.io/github/stars/FluidInference/eddy?style=flat&logo=github)](https://github.com/FluidInference/eddy)

OpenVINO-optimized version of NVIDIA's Parakeet TDT 1.1B V3 model for high-performance multilingual automatic speech recognition on Intel NPUs and CPUs.

## Benchmark Results

- **Hardware**: Intel Core Ultra 7 155H (Meteor Lake) with Intel AI Boost NPU
- **Software**: OpenVINO 2025.x

### LibriSpeech test-clean (English)

| Metric | Value |
|--------|-------|
| **Average WER** | 3.7% |
| **Median WER** | 0.0% |
| **Average CER** | 1.9% |
| **RTFx (NPU)** | 25.7× |
| **RTFx (CPU)** | 5-8× |
| **Files processed** | 2,620 (5.4 hours) |

### FLEURS Multilingual (24 Languages)

| Metric | Value |
|--------|-------|
| **Average WER** | 17.0% |
| **Average CER** | 5.4% |
| **Average RTFx** | 41.1× |
| **Total samples** | 15,000+ |

**Best performing languages** (WER): Italian 4.3%, Spanish 5.4%, English 6.1%, German 7.4%, French 7.7%

See [BENCHMARK_RESULTS.md](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md) for complete per-language results.
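
For readers unfamiliar with the metrics in the tables above, the sketch below shows how WER, CER, and RTFx are conventionally computed. It is an illustration only, not the benchmark harness used by eddy: the function names and the sample transcripts are made up for this example, and no text normalization is applied.

```python
from statistics import mean, median


def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance / reference length."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def rtfx(audio_seconds, processing_seconds):
    """Inverse real-time factor: seconds of audio processed per second of compute."""
    return audio_seconds / processing_seconds


# Hypothetical (reference, hypothesis) pairs, for illustration only.
pairs = [
    ("the cat sat", "the cat sat"),
    ("a quick brown fox", "a quick brown fox"),
    ("hello big world", "hello world"),
]
scores = [wer(ref, hyp) for ref, hyp in pairs]
average_wer = mean(scores)
median_wer = median(scores)
```

In this toy set the median WER is zero while the average is not, which is the same pattern as the LibriSpeech table: a median of 0.0% means at least half of the utterances were transcribed without any word errors. For RTFx, 5.4 hours of audio at 25.7× corresponds to roughly 12.6 minutes of processing, assuming the average holds across the whole set.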

## Performance Comparison

| Implementation | Device | RTFx (Avg) | WER (LibriSpeech) |
|----------------|--------|------------|-------------------|
| **eddy (OpenVINO)** | Intel Core Ultra 7 155H NPU | **25.7×** | 3.7% |
| Parakeet (PyTorch) | Intel Arc 140V GPU | ~20×* | ~2.5%* |
| **eddy (OpenVINO)** | Intel Core Ultra 7 155H CPU | **5-8×** | 3.7% |

> **Note**: Benchmarked on an HP EliteBook Ultra G1i. The eddy NPU path is ~1.3× faster than PyTorch on the Intel Arc GPU (25.7× vs. ~20×), with lower power consumption. *PyTorch figures for V3 are estimated from a V2 benchmark.

## Supported Languages

**24 European languages**: English, Spanish, Italian, French, German, Dutch, Russian, Polish, Ukrainian, Slovak, Bulgarian, Finnish, Romanian, Croatian, Czech, Swedish, Estonian, Hungarian, Lithuanian, Danish, Maltese, Slovenian, Latvian, Greek

## Usage

Python usage is available via ctypes; see the [eddy repository](https://github.com/FluidInference/eddy) for details.

## Model Details

- **Parameters**: 1.1B
- **Architecture**: FastConformer-RNNT (4-model pipeline)
- **Languages**: 24 European languages
- **Blank token ID**: 8192
- **Context window**: 10s chunks with 3s overlap
- **Features**: LSTM state continuity, token deduplication, per-token timestamps
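
The chunking and deduplication listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not eddy's implementation: `chunk_spans` and `merge_chunks` are hypothetical helpers, and cutting each overlap at its midpoint is one common deduplication strategy that may differ from what eddy actually does. Tokens are represented as `(token_id, absolute_time_in_seconds)` pairs with made-up values.

```python
def chunk_spans(total_seconds, chunk_seconds=10.0, overlap_seconds=3.0):
    """Return (start, end) spans in seconds that cover the audio with overlapping chunks."""
    if not 0 <= overlap_seconds < chunk_seconds:
        raise ValueError("overlap must be non-negative and shorter than the chunk")
    stride = chunk_seconds - overlap_seconds  # 7 s with the defaults
    spans = []
    start = 0.0
    while True:
        end = min(start + chunk_seconds, total_seconds)
        spans.append((start, end))
        if end >= total_seconds:
            break
        start += stride
    return spans


def merge_chunks(chunk_tokens, spans, overlap_seconds=3.0):
    """Merge per-chunk tokens, keeping each token from exactly one chunk.

    Assumption: every overlap is split at its midpoint, so a chunk "owns"
    the tokens whose timestamps fall between the midpoints of its overlaps.
    """
    half = overlap_seconds / 2
    merged = []
    for i, (tokens, (start, end)) in enumerate(zip(chunk_tokens, spans)):
        lo = start + half if i > 0 else start
        hi = end - half if i < len(spans) - 1 else float("inf")
        merged.extend((tok, t) for tok, t in tokens if lo <= t < hi)
    return merged


# 17 s of audio is covered by two chunks: 0-10 s and 7-17 s.
spans = chunk_spans(17.0)

# Hypothetical decoder output; token 6 at 9.0 s appears in both chunks.
chunk_tokens = [
    [(5, 1.0), (6, 9.0)],
    [(6, 9.0), (7, 12.0)],
]
merged = merge_chunks(chunk_tokens, spans)
```

With the default 10 s chunk and 3 s overlap, consecutive chunks start 7 s apart, so the duplicated token in the overlap region is kept only once, from the chunk that owns the 8.5 s cut point onward.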

## License

CC-BY-4.0 - See [LICENSE](LICENSE) for details.

## Links

- **GitHub**: [FluidInference/eddy](https://github.com/FluidInference/eddy)
- **Base Model**: [nvidia/parakeet-tdt-1.1b](https://huggingface.co/nvidia/parakeet-tdt-1.1b)
- **Documentation**: [Benchmark Results](https://github.com/FluidInference/eddy/blob/main/BENCHMARK_RESULTS.md)

## Acknowledgments

Based on NVIDIA's Parakeet TDT model. OpenVINO conversion and optimization by the FluidInference team.