Upload 2 files
- README.md +61 -3
- pitchflower_diagram.png +0 -0
README.md
CHANGED
@@ -1,3 +1,61 @@
----
-license: cc-by-nc-sa-4.0
----
+---
+license: cc-by-nc-sa-4.0
+---
+
+# 🌸 PitchFlower
+
+<p align="left">
+<a href="https://arxiv.org/abs/2510.25566">
+<img src="https://img.shields.io/badge/arXiv-PitchFlower-b31b1b?logo=arxiv&logoColor=white" alt="arXiv">
+</a>
+<a href="https://github.com/diegotg2000/PitchFlower">
+<img src="https://img.shields.io/badge/GitHub-PitchFlower-181717?logo=github" alt="GitHub">
+</a>
+</p>
+
+Official pretrained checkpoint of the paper *PitchFlower: A flow-based neural audio codec with pitch controllability*.
+
+## 🧠 Overview
+
+PitchFlower achieves pitch controllability through a perturbation strategy. During training, pitch information is removed by applying a random flattening and shifting operation. The model is trained on a reconstruction task in which pitch information is provided explicitly.
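To make the perturbation idea concrete, here is a minimal sketch of a flatten-and-shift operation on an F0 contour. It is not taken from the PitchFlower code base: the function name `flatten_and_shift`, the `max_shift_semitones` parameter, the use of the geometric mean as the flat value, and the convention that `0.0` marks unvoiced frames are all assumptions made for illustration. The paper and the GitHub repo define the actual operation.

```python
import math
import random


def flatten_and_shift(f0_hz, max_shift_semitones=6.0, rng=None):
    """Return a perturbed F0 contour (hypothetical helper, for illustration only).

    f0_hz: per-frame F0 values in Hz, with 0.0 marking unvoiced frames.
    Voiced frames are replaced by one constant value (flattening), which is
    then transposed by a random number of semitones (shifting).
    """
    rng = rng or random.Random()
    voiced = [f for f in f0_hz if f > 0.0]
    if not voiced:
        # Nothing to perturb in a fully unvoiced contour.
        return list(f0_hz)

    # Flatten: use the geometric mean, i.e. the mean on a log-frequency scale.
    log_mean = sum(math.log(f) for f in voiced) / len(voiced)

    # Shift: draw a random transposition and convert semitones to a ratio.
    shift = rng.uniform(-max_shift_semitones, max_shift_semitones)
    flat_value = math.exp(log_mean) * 2.0 ** (shift / 12.0)

    # Unvoiced frames stay unvoiced; voiced frames all share the same value.
    return [flat_value if f > 0.0 else 0.0 for f in f0_hz]


contour = [0.0, 110.0, 120.0, 130.0, 0.0]
perturbed = flatten_and_shift(contour, rng=random.Random(0))
```

After this step the contour no longer carries the original melody, so a model that must reconstruct the original signal has to rely on the pitch information it is given explicitly.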
+
+<p align="center">
+<img src="pitchflower_diagram.png" alt="PitchFlower architecture" width="600">
+</p>
+
+We use an autoencoder with a residual vector quantization (RVQ) bottleneck and a flow-based decoder to produce high-quality audio. More details can be found in the paper.
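For readers unfamiliar with RVQ, the sketch below shows the generic residual vector quantization scheme: each stage quantizes what the previous stages left over. It is a toy illustration in plain Python, not the PitchFlower implementation. The helper names (`nearest`, `rvq_encode`, `rvq_decode`) and the tiny codebooks are made up for this example, and the released checkpoint's codebook sizes and dimensions are not reflected here.

```python
def nearest(codebook, vec):
    """Index of the codebook entry closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def rvq_encode(vec, codebooks):
    """Quantize vec stage by stage; each stage encodes the remaining residual."""
    residual = list(vec)
    indices = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        indices.append(idx)
        # Subtract the chosen entry so the next stage sees only the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Reconstruct the vector by summing the selected entry from every stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy two-stage setup: a coarse codebook followed by a finer one.
toy_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
codes = rvq_encode([1.2, 0.8], toy_codebooks)
approx = rvq_decode(codes, toy_codebooks)
```

Because only a handful of discrete indices pass through the bottleneck, its capacity is limited, which is the property a bottleneck of this kind contributes to the design.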
+
+## 📦 Installation and Usage
+
+See our GitHub repo to learn how to use PitchFlower: https://github.com/diegotg2000/PitchFlower
+
+## 🙌 Acknowledgements
+
+We'd like to acknowledge the repositories from which we drew inspiration and parts of the code:
+
+- Vocos: https://github.com/gemelo-ai/vocos
+- WavTokenizer: https://github.com/jishengpeng/WavTokenizer
+- Encodec: https://github.com/facebookresearch/encodec
+
+This work was carried out in the [Analysis/Synthesis team of the STMS laboratory](https://www.stms-lab.fr/team/analyse-et-synthese-des-sons/) at IRCAM. It was funded by the [ANR project EVA](https://anr.fr/Project-ANR-23-CE23-0018).
+
+## 📫 Contact
+
+For questions or collaboration opportunities, feel free to reach out: dtorres@ircam.fr
+
+## 🧩 Citation
+
+```bibtex
+@misc{pitchflower,
+title={PitchFlower: A flow-based neural audio codec with pitch controllability},
+author={Diego Torres and Axel Roebel and Nicolas Obin},
+year={2025},
+eprint={2510.25566},
+archivePrefix={arXiv},
+url={https://arxiv.org/abs/2510.25566},
+}
+```
+
+## 📜 License
+
+This project is licensed under the [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/) license.
pitchflower_diagram.png
ADDED