Update README.md
README.md
CHANGED
|
@@ -1,3 +1,34 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
|
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
language:
|
| 4 |
+
- en
|
| 5 |
+
pipeline_tag: audio-to-audio
|
| 6 |
+
---
|
| 7 |
+
|
| 8 |
+
# Regularized Schrödinger Bridge (RSB) via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement
|
| 9 |
+
[](LICENSE)
|
| 10 |
+
|
| 11 |
+
Regularized Schrödinger Bridge (RSB) is a generative speech enhancement approach that reconciles fidelity and realism while mitigating exposure bias. RSB regularizes training with a Distortion-Perception Perturbation, which constructs time-varying targets by interpolating between clean speech and posterior-mean estimates and trains the network on perturbed intermediate states so that it progressively corrects toward the ground truth. The perturbation simulates inference-time prediction errors, which reduces the training–inference mismatch and thus exposure bias. It also injects posterior-mean estimates as fidelity-preserving guidance, which improves reconstruction fidelity.
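
The interpolation between clean speech and a posterior-mean estimate can be illustrated with a minimal sketch. This is not the repository's implementation: the function names, the linear weighting schedule, and the choice that `t = 0` yields the clean signal are all assumptions made for illustration only.

```python
from typing import List


def interpolation_weight(t: float) -> float:
    """Hypothetical linear schedule (an assumption): 0 at t=0, 1 at t=1."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return t


def time_varying_target(
    clean: List[float], posterior_mean: List[float], t: float
) -> List[float]:
    """Blend clean speech and a posterior-mean estimate, sample by sample."""
    if len(clean) != len(posterior_mean):
        raise ValueError("signals must have the same length")
    w = interpolation_weight(t)
    # Assumed direction: w=0 returns the clean signal, w=1 the posterior mean.
    return [(1.0 - w) * c + w * m for c, m in zip(clean, posterior_mean)]
```

With these assumptions, `time_varying_target(clean, posterior_mean, 0.5)` returns the sample-wise midpoint of the two signals. The actual schedule and direction used by RSB are defined in the paper and code, not here.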
|
| 12 |
+
|
| 13 |
+
- Official PyTorch implementation of the paper:
|
| 14 |
+
[Regularized Schrödinger Bridge via Distortion-Perception Perturbation for High-Fidelity Speech Enhancement]()
|
| 15 |
+
- **Links**: Paper | [Audio Demo](https://yorch233.github.io/RSB/) | [Online Demo](https://huggingface.co/spaces/Yorch233/RSB) | [GitHub](https://github.com/Yorch233/RSB) | [Hugging Face](https://huggingface.co/Yorch233/RSB)
|
| 16 |
+
|
| 17 |
+
<img src="https://github.com/Yorch233/RSB/raw/main/asset/RSB_schematic.png" width="600" />
|
| 18 |
+
|
| 19 |
+
#### Pretrained Model Download
|
| 20 |
+
We have publicly released a checkpoint of MISB's generative model, which is based on the `ncsnpp_base` architecture and was trained on the `Voicebank+Demand` dataset.
|
| 21 |
+
|
| 22 |
+
There are two ways to download:
|
| 23 |
+
|
| 24 |
+
+ Download via `CLI`
|
| 25 |
+
```Bash
|
| 26 |
+
python -m cli.download_pretrained_model
|
| 27 |
+
```
|
| 28 |
+
|
| 29 |
+
+ Download via `Google Drive`
|
| 30 |
+
Download the folder from [Google Drive](https://drive.google.com/drive/folders/1b5wI-DIvq3yyLH_PKJPeb3d8VwxfwYIR?usp=sharing) and place it in the `pretrained_models/` directory.
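
After either download route, it can help to confirm that files actually landed in `pretrained_models/`. The helper below is a hypothetical sketch, not part of the repository: the function name is invented, and it only lists whatever files it finds, because the checkpoint file names are not stated here.

```python
from pathlib import Path
from typing import List


def list_checkpoint_files(root: str = "pretrained_models") -> List[Path]:
    """Return every regular file under `root`, sorted by path.

    Hypothetical helper: raises FileNotFoundError if the folder is missing.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        raise FileNotFoundError(f"directory not found: {root_path}")
    # Walk the folder recursively and keep files only (skip sub-directories).
    return sorted(p for p in root_path.rglob("*") if p.is_file())


if __name__ == "__main__":
    for path in list_checkpoint_files():
        print(path)
```

An empty result means the folder exists but contains no files, which usually indicates an incomplete download.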
|
| 31 |
+
|
| 32 |
+
## License
|
| 33 |
+
|
| 34 |
+
This project is licensed under the [Apache License 2.0](LICENSE).
|