---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- speech_enhancement
- noise_suppression
- real_time
- streaming
- causal
- onnx
- tflite
- fullband
---
# DPDFNet
DPDFNet is a family of **causal, single‑channel** speech enhancement models for **real‑time noise suppression**.\
It builds on **DeepFilterNet2** by adding **Dual‑Path RNN (DPRNN)** blocks in the encoder for stronger long‑range modeling while staying streaming‑friendly.
**Links**
- Project page (audio samples + architecture): https://ceva-ip.github.io/DPDFNet/
- Paper (arXiv): https://arxiv.org/abs/2512.16420
- Code (GitHub): https://github.com/ceva-ip/DPDFNet
- Demo Space: https://huggingface.co/spaces/Ceva-IP/DPDFNetDemo
- Evaluation set: https://huggingface.co/datasets/Ceva-IP/DPDFNet_EvalSet
---
## What’s in this repo
- **TFLite**: `*.tflite` (root)
- **ONNX**: `onnx/*.onnx`
- **PyTorch checkpoints**: `checkpoints/*.pth`
---
## Model variants
### 16 kHz models
| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `baseline` | 0 | 2.31 | 0.36 |
| `dpdfnet2` | 2 | 2.49 | 1.35 |
| `dpdfnet4` | 4 | 2.84 | 2.36 |
| `dpdfnet8` | 8 | 3.54 | 4.37 |
### 48 kHz fullband models
| Model | DPRNN blocks | Params (M) | MACs (G) |
|---|:---:|:---:|:---:|
| `dpdfnet2_48khz_hr` | 2 | 2.58 | 2.42 |
| `dpdfnet8_48khz_hr` | 8 | 3.63 | 7.17 |
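
The two tables can be treated as a lookup when choosing a variant for a compute budget. The sketch below only transcribes the figures listed above; the helper `pick_variant` and its rule (take the variant with the most DPRNN blocks whose MACs fit the budget) are illustrative assumptions, not part of the `dpdfnet` package.

```python
# Figures transcribed from the tables above:
# name -> (sample rate in Hz, DPRNN blocks, params in M, MACs in G)
VARIANTS = {
    "baseline": (16000, 0, 2.31, 0.36),
    "dpdfnet2": (16000, 2, 2.49, 1.35),
    "dpdfnet4": (16000, 4, 2.84, 2.36),
    "dpdfnet8": (16000, 8, 3.54, 4.37),
    "dpdfnet2_48khz_hr": (48000, 2, 2.58, 2.42),
    "dpdfnet8_48khz_hr": (48000, 8, 3.63, 7.17),
}


def pick_variant(sample_rate, max_macs_g):
    """Hypothetical helper: name of the variant with the most DPRNN blocks
    at `sample_rate` whose MACs do not exceed `max_macs_g`, or None."""
    candidates = [
        (blocks, name)
        for name, (sr, blocks, _params, macs) in VARIANTS.items()
        if sr == sample_rate and macs <= max_macs_g
    ]
    return max(candidates)[1] if candidates else None


print(pick_variant(16000, 2.5))  # dpdfnet4
print(pick_variant(48000, 1.0))  # None (no 48 kHz variant fits this budget)
```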
---
## Recommended inference (CPU-only, ONNX)
```bash
pip install dpdfnet
```
### CLI
```bash
# Enhance one file
dpdfnet enhance noisy.wav enhanced.wav --model dpdfnet4
# Enhance a directory (uses all CPU cores by default)
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2
# Enhance a directory with a fixed worker count
dpdfnet enhance-dir ./noisy_wavs ./enhanced_wavs --model dpdfnet2 --workers 4
# Download models
dpdfnet download
dpdfnet download dpdfnet8
dpdfnet download dpdfnet4 --force
```
### Python API
```python
import soundfile as sf
import dpdfnet
# In-memory enhancement:
audio, sr = sf.read("noisy.wav")
enhanced = dpdfnet.enhance(audio, sample_rate=sr, model="dpdfnet4")
sf.write("enhanced.wav", enhanced, sr)
# Enhance one file:
out_path = dpdfnet.enhance_file("noisy.wav", model="dpdfnet2")
print(out_path)
# Model listing:
for row in dpdfnet.available_models():
    print(row["name"], row["ready"], row["cached"])
# Download models:
dpdfnet.download() # All models
dpdfnet.download("dpdfnet4") # Specific model
```
### Real-time Microphone Enhancement
Install `sounddevice` (not included in `dpdfnet` dependencies):
```bash
pip install sounddevice
```
`StreamEnhancer` processes audio chunk-by-chunk, preserving RNN state across
calls. Any chunk size works; enhanced samples are returned as soon as enough
data has accumulated for the first model frame (20 ms).
```python
import numpy as np
import sounddevice as sd
import dpdfnet
INPUT_SR = 48000
# Use one model hop (10 ms) as the block size so process() returns
# exactly one hop's worth of enhanced audio on every callback.
BLOCK_SIZE = int(INPUT_SR * 0.010) # 480 samples at 48 kHz
enhancer = dpdfnet.StreamEnhancer(model="dpdfnet2_48khz_hr")
def callback(indata, outdata, frames, time, status):
    mono_in = indata[:, 0] if indata.ndim > 1 else indata.ravel()
    enhanced = enhancer.process(mono_in, sample_rate=INPUT_SR)
    n = min(len(enhanced), frames)
    outdata[:n, 0] = enhanced[:n]
    if n < frames:
        outdata[n:] = 0.0  # silence while the first window accumulates

with sd.Stream(
    samplerate=INPUT_SR,
    blocksize=BLOCK_SIZE,
    channels=1,
    dtype="float32",
    callback=callback,
):
    print("Enhancing microphone input - press Ctrl+C to stop")
    try:
        while True:
            sd.sleep(100)
    except KeyboardInterrupt:
        pass
# Optional: drain the final partial window at the end of a recording
tail = enhancer.flush()
```
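The block-size arithmetic in the example carries over to other device rates. A minimal sketch, assuming the 10 ms hop and 20 ms window quoted above hold at every input rate; `frame_sizes` is a hypothetical helper, not a `dpdfnet` function.

```python
def frame_sizes(sample_rate, hop_s=0.010, window_s=0.020):
    """Return (hop, window) lengths in samples for a given sample rate."""
    hop = int(round(sample_rate * hop_s))
    window = int(round(sample_rate * window_s))
    return hop, window


for sr in (16000, 44100, 48000):
    hop, window = frame_sizes(sr)
    print(f"{sr} Hz: hop = {hop} samples, window = {window} samples")
```

At 48 kHz this gives a 480-sample hop, matching `BLOCK_SIZE` in the example.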
> [!NOTE]
> **Latency**
> The first enhanced output arrives after one full model window (~20 ms) has been buffered. All subsequent blocks are returned with ~10 ms additional delay.
>
> **Sample rate**
> `StreamEnhancer` resamples internally. Pass your device's native rate as `sample_rate`; the return value is at the same rate.
>
> **Block size**
> Using `BLOCK_SIZE = int(INPUT_SR * 0.010)` (one model hop) gives one enhanced block per callback. Other sizes also work but may produce empty returns while the buffer fills.
>
> **Multiple streams**
> Create a separate `StreamEnhancer` per stream. Call `enhancer.reset()` between independent audio segments to clear RNN state.
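
To see why block sizes other than one hop can produce empty returns, consider a toy accumulator that emits one hop of audio each time a full window is buffered. This is only an illustration of the buffering behaviour described in the note, not the actual `StreamEnhancer` implementation: the class `ToyHopBuffer` is hypothetical, it passes samples through unchanged instead of enhancing them, and the 480/960-sample defaults assume a 48 kHz input.

```python
class ToyHopBuffer:
    """Hypothetical stand-in for a streaming enhancer's input buffering."""

    def __init__(self, hop=480, window=960):
        self.hop = hop
        self.window = window
        self.pending = []  # samples received but not yet emitted

    def process(self, chunk):
        self.pending.extend(chunk)
        out = []
        # Each time a full window is available, emit one hop and slide forward.
        while len(self.pending) >= self.window:
            out.extend(self.pending[: self.hop])
            del self.pending[: self.hop]
        return out


def returned_lengths(block_size, n_blocks):
    """Lengths returned by successive process() calls on silent input."""
    buf = ToyHopBuffer()
    return [len(buf.process([0.0] * block_size)) for _ in range(n_blocks)]


print(returned_lengths(480, 4))  # [0, 480, 480, 480]
print(returned_lengths(256, 6))  # [0, 0, 0, 480, 0, 480]
```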
---
## Citation
```bibtex
@article{rika2025dpdfnet,
title = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
author = {Rika, Daniel and Sapir, Nino and Gus, Ido},
year = {2025}
}
```
---
## License
Apache-2.0