---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
  - speech_enhancement
  - noise_suppression
  - real_time
  - streaming
  - causal
  - onnx
  - tflite
  - fullband
---

# DPDFNet

DPDFNet is a family of **causal, single‑channel** speech enhancement models for **real‑time noise suppression**.\
It builds on **DeepFilterNet2** by adding **Dual‑Path RNN (DPRNN)** blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.

**Links**
- Project page (audio samples + architecture): https://ceva-ip.github.io/DPDFNet/
- Paper (arXiv): https://arxiv.org/abs/2512.16420
- Code (GitHub): https://github.com/ceva-ip/DPDFNet
- Demo Space: https://huggingface.co/spaces/Ceva-IP/DPDFNetDemo
- Evaluation set: https://huggingface.co/datasets/Ceva-IP/DPDFNet_EvalSet

---

## What’s in this repo

- **TFLite**: `*.tflite` (root)
- **ONNX**: `onnx/*.onnx`
- **PyTorch checkpoints**: `checkpoints/*.pth`

---

## Model variants

### 16 kHz models

| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `baseline` | 0 | 2.31 | 0.36 |
| `dpdfnet2` | 2 | 2.49 | 1.35 |
| `dpdfnet4` | 4 | 2.84 | 2.36 |
| `dpdfnet8` | 8 | 3.54 | 4.37 |

### 48 kHz fullband models

| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `dpdfnet2_48khz_hr` | 2 | 2.58 | 2.42 |
| `dpdfnet8_48khz_hr` | 8 | 3.63 | 7.17 |

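If you want to choose a variant in code, the figures in the two tables can be encoded directly. The sketch below is a hypothetical helper: `VARIANTS` and `pick_variant` are not part of the `dpdfnet` package. It returns the variant with the most MACs for a given sample rate that still fits a budget, expressed in the tables' own units (G MACs). If nothing fits, it returns `None`. For example, `pick_variant(16000, 2.5)` returns `"dpdfnet4"`.

```python
from typing import Optional

# Hypothetical lookup table: (name, sample_rate_hz, dprnn_blocks, params_m, macs_g),
# with the figures copied from the model-variant tables in this README.
VARIANTS = [
    ("baseline", 16000, 0, 2.31, 0.36),
    ("dpdfnet2", 16000, 2, 2.49, 1.35),
    ("dpdfnet4", 16000, 4, 2.84, 2.36),
    ("dpdfnet8", 16000, 8, 3.54, 4.37),
    ("dpdfnet2_48khz_hr", 48000, 2, 2.58, 2.42),
    ("dpdfnet8_48khz_hr", 48000, 8, 3.63, 7.17),
]


def pick_variant(sample_rate: int, macs_budget_g: float) -> Optional[str]:
    """Hypothetical helper: pick the most expensive variant for `sample_rate` within the budget."""
    candidates = [v for v in VARIANTS if v[1] == sample_rate and v[4] <= macs_budget_g]
    if not candidates:
        return None  # no variant for this rate fits the budget
    return max(candidates, key=lambda v: v[4])[0]
```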
---

## Recommended inference (CPU-only, ONNX)

```bash
pip install dpdfnet
```

### CLI

```bash
# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4

# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2

# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4

# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
```

### Python API

```python
import soundfile as sf
import dpdfnet

# In-memory enhancement:
audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)

# Enhance one file:
out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)

# Model listing:
for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])

# Download models:
dpdfnet.download()            # All models
dpdfnet.download("dpdfnet4")  # Specific model
```

### Real-time Microphone Enhancement

Install `sounddevice` (not included in `dpdfnet` dependencies):

```bash
pip install sounddevice
```

`StreamEnhancer` processes audio chunk by chunk and preserves RNN state across
calls. Any chunk size works. Enhanced samples are returned as soon as enough
input has accumulated to fill the first model window (20 ms).

```python
import sounddevice as sd
import dpdfnet

INPUT_SR   = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010)   # 480 samples at 48 kHz

enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")

def callback(indata, outdata, frames, time, status):
    if status:
        print(status)  # report input overflows / output underflows
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0   # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass

# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
```

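You can also drive the same `StreamEnhancer` calls offline, for example to run a recorded file through the streaming path. The sketch below is illustrative and rests on a few assumptions. `stream_through` and the output filename are hypothetical. It also assumes that `process()` and `flush()` return 1-D NumPy arrays, possibly empty, as the microphone example implies. It calls `reset()` first so each recording starts from a clean RNN state.

```python
import numpy as np


def stream_through(enhancer, audio: np.ndarray, sample_rate: int, block_size: int) -> np.ndarray:
    """Hypothetical helper: feed `audio` block by block and collect the enhanced output.

    Assumption: `process()` and `flush()` return 1-D NumPy arrays (possibly empty).
    """
    enhancer.reset()  # independent segment: clear RNN state left by earlier calls
    parts = []
    for start in range(0, len(audio), block_size):
        block = audio[start:start + block_size]
        parts.append(enhancer.process(block, sample_rate=sample_rate))
    parts.append(enhancer.flush())  # drain the final partial window
    return np.concatenate(parts)


if __name__ == "__main__":
    import soundfile as sf
    import dpdfnet

    audio, sr = sf.read("noisy.wav", dtype="float32")
    if audio.ndim > 1:
        audio = audio[:, 0]  # feed mono audio, as in the microphone example
    enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")
    enhanced = stream_through(enhancer, audio, sr, block_size=sr // 100)  # 10 ms blocks
    sf.write("enhanced_stream.wav", enhanced, sr)  # hypothetical output path
```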
> [!NOTE]
> **Latency**  
> The first enhanced output arrives after one full model window (~20 ms) has been buffered. All subsequent blocks are returned with ~10 ms additional delay.
>
> **Sample rate**  
> `StreamEnhancer` resamples internally. Pass your device's native rate as `sample_rate`; the return value is at the same rate.
>
> **Block size**  
> Using `BLOCK_SIZE = int(INPUT_SR * 0.010)` (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
>
> **Multiple streams**  
> Create a separate `StreamEnhancer` per stream. Call `enhancer.reset()` between independent audio segments to clear RNN state.

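The buffering behaviour described in this section can be shown without the model. Nothing is returned until a 20 ms window is filled, and output then arrives one 10 ms hop at a time, whatever the chunk size. The `HopBuffer` class below is a hypothetical, model-free sketch, not the `StreamEnhancer` implementation. It does no resampling and keeps no RNN state. An identity function stands in for the network, so the output is just the input released hop by hop. At 16 kHz, feeding 100 samples and then 300 samples returns nothing on the first call and 160 samples on the second. `flush()` zero-pads the buffer so the remaining samples are released.

```python
import numpy as np


class HopBuffer:
    """Hypothetical framing sketch (not dpdfnet's implementation).

    Accepts chunks of any length, waits until one analysis window is buffered,
    then emits one hop of output per complete hop of input. An identity
    "model" stands in for the network, so the output equals the input.
    """

    def __init__(self, sample_rate: int, window_ms: int = 20, hop_ms: int = 10):
        self.window = sample_rate * window_ms // 1000  # 320 samples at 16 kHz
        self.hop = sample_rate * hop_ms // 1000        # 160 samples at 16 kHz
        self.buffer = np.zeros(0, dtype=np.float32)

    def _model(self, frame: np.ndarray) -> np.ndarray:
        # Stand-in for the network: release the oldest hop of the window unchanged.
        return frame[: self.hop]

    def _drain(self) -> np.ndarray:
        out = []
        while len(self.buffer) >= self.window:
            out.append(self._model(self.buffer[: self.window]))
            self.buffer = self.buffer[self.hop:]  # slide the window by one hop
        return np.concatenate(out) if out else np.zeros(0, dtype=np.float32)

    def process(self, chunk: np.ndarray) -> np.ndarray:
        self.buffer = np.concatenate([self.buffer, np.asarray(chunk, dtype=np.float32)])
        return self._drain()  # empty until the first full window has arrived

    def flush(self) -> np.ndarray:
        pending = len(self.buffer)  # samples received but not yet emitted
        self.buffer = np.concatenate([self.buffer, np.zeros(self.window, dtype=np.float32)])
        tail = self._drain()[:pending]  # zero-padding releases every pending sample
        self.buffer = np.zeros(0, dtype=np.float32)
        return tail
```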
---

## Citation

```bibtex
@article{rika2025dpdfnet,
  title   = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author  = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  journal = {arXiv preprint arXiv:2512.16420},
  year    = {2025}
}
```

---

## License

Apache-2.0