SofieneK committed on
Commit
4bd17bf
·
verified ·
1 Parent(s): 41c23e8

Update README.md

  1. README.md +35 -0
README.md CHANGED
@@ -8,4 +8,39 @@ tags:
  - speech
  - SE
  - Neural-Audio-Codec
+ pipeline_tag: audio-to-audio
  ---
+ # Modeling strategies for speech enhancement in the latent space of a neural audio codec
+
+ This repository provides the official model checkpoints for the paper *[Modeling strategies for speech enhancement in the latent space of a neural audio codec](https://arxiv.org/abs/2510.26299)* authored by Sofiene Kammoun, Xavier Alameda-Pineda, and Simon Leglaive, and published at IEEE ICASSP 2026.
+
+ We explore different modeling strategies (autoregressive vs. non-autoregressive) and representation spaces (discrete vs. continuous) for speech enhancement using neural audio codecs and Conformer-based architectures.
+
+ [arXiv](https://arxiv.org/abs/2510.26299) | [Code and Audio examples](https://sofienekammoun.github.io/SE-NAC-25/) | [Bibtex](#citation)
+
+
+ ## Overview
+
+ Our work introduces and compares a family of speech enhancement models that systematically vary along two main axes:
+
+ - **Representation Type**
+   - Discrete tokens
+   - Continuous latent vectors
+
+ - **Modeling Strategy**
+   - Autoregressive (AR): sequential prediction of the clean speech representation
+   - Non-Autoregressive (NAR): parallel prediction of the clean speech representation
+
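To make the difference between the two modeling strategies concrete, here is a minimal toy sketch. It is not the paper's Conformer-based model: `toy_predictor` is a hypothetical stand-in for a learned network, and the frames are plain floats rather than codec tokens or latent vectors.

```python
from typing import List, Optional


# Hypothetical stand-in for a learned network (NOT the paper's model):
# predicts one clean frame from a noisy frame and an optional previous output.
def toy_predictor(noisy_frame: float, prev_clean: Optional[float]) -> float:
    if prev_clean is None:
        return 0.5 * noisy_frame
    return 0.5 * noisy_frame + 0.5 * prev_clean


def decode_nar(noisy: List[float]) -> List[float]:
    """Non-autoregressive: each output depends only on the noisy input,
    so all frames could be computed in parallel."""
    return [toy_predictor(x, None) for x in noisy]


def decode_ar(noisy: List[float]) -> List[float]:
    """Autoregressive: frame t also depends on the prediction for frame t-1,
    so decoding has to proceed sequentially."""
    clean: List[float] = []
    prev: Optional[float] = None
    for x in noisy:
        prev = toy_predictor(x, prev)
        clean.append(prev)
    return clean


noisy_frames = [1.0, 2.0, 4.0]
nar_out = decode_nar(noisy_frames)  # [0.5, 1.0, 2.0]
ar_out = decode_ar(noisy_frames)    # [0.5, 1.25, 2.625]
```

The AR loop cannot be parallelized over frames because each step consumes the previous output, whereas the NAR version has no such dependency.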
+ The current release includes the following models:
+
+ | Model Name | Modeling Strategy | Input Representation | Output Representation | Model Checkpoint |
+ |------------|-------------------|----------------------|-----------------------|------------------|
+ | **D-AR** | Autoregressive | Discrete | Discrete | `D-AR_ckpt_300.pt` |
+ | **D-NAR** | Non-Autoregressive | Discrete | Discrete | `D-NAR_ckpt_300.pt` |
+ | **D-NAR*** | Non-Autoregressive | Continuous | Discrete | `D-NAR_star_ckpt_300.pt` |
+ | **C-AR** | Autoregressive | Continuous | Continuous | `C-AR_ckpt_300.pt` |
+ | **C-NAR** | Non-Autoregressive | Continuous | Continuous | `C-NAR_ckpt_300.pt` |
+
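As a convenience, the table above can be mirrored in a small lookup structure for selecting a checkpoint file by model properties. The sketch below is illustrative only: `ModelSpec`, `MODELS`, and `find_models` are hypothetical names, not part of the released code, and actually loading a checkpoint depends on the authors' code, which is not shown here.

```python
from typing import Dict, List, NamedTuple


class ModelSpec(NamedTuple):
    strategy: str     # "AR" or "NAR"
    input_repr: str   # "discrete" or "continuous"
    output_repr: str  # "discrete" or "continuous"
    checkpoint: str   # file name as listed in the table


# Entries copied from the table; the dictionary itself is a hypothetical helper.
MODELS: Dict[str, ModelSpec] = {
    "D-AR": ModelSpec("AR", "discrete", "discrete", "D-AR_ckpt_300.pt"),
    "D-NAR": ModelSpec("NAR", "discrete", "discrete", "D-NAR_ckpt_300.pt"),
    "D-NAR*": ModelSpec("NAR", "continuous", "discrete", "D-NAR_star_ckpt_300.pt"),
    "C-AR": ModelSpec("AR", "continuous", "continuous", "C-AR_ckpt_300.pt"),
    "C-NAR": ModelSpec("NAR", "continuous", "continuous", "C-NAR_ckpt_300.pt"),
}


def find_models(strategy: str, output_repr: str) -> List[str]:
    """Return the names of all models with the given strategy and output type."""
    return [
        name
        for name, spec in MODELS.items()
        if spec.strategy == strategy and spec.output_repr == output_repr
    ]


nar_discrete = find_models("NAR", "discrete")
```

Here `nar_discrete` contains both D-NAR and D-NAR*, which differ only in their input representation.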
+ Additional models:
+ - **C-FT** (`C-FT-encoder_ckpt_300.pt`) and **D-FT** (`D-FT-encoder_ckpt_300.pt`), where only the encoder of the neural audio codec (NAC) is fine-tuned, with an MSE loss and a cross-entropy loss, respectively.
+ - **STFT-NAR** (`STFT_NAR_Mask_ckpt_300.pt`), which operates on STFT representations instead of the NAC embeddings and is trained to output an STFT mask.
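
For readers unfamiliar with mask-based enhancement, the following toy sketch shows how an STFT mask is applied to a noisy spectrogram. Everything here is an assumption made for illustration: the naive `stft` helper, the tiny frame sizes, and the ratio mask computed from known clean and noise signals, which only stands in for the mask that a trained model would predict from the noisy input alone. The mask type used by STFT-NAR is not specified in this README.

```python
import cmath
import math
from typing import List

Spectrogram = List[List[complex]]  # indexed as [frame][frequency_bin]


def stft(signal: List[float], frame_len: int = 8, hop: int = 4) -> Spectrogram:
    """Naive STFT with a Hann window (toy sizes; real code would use an FFT)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        # Keep only the non-negative frequency bins 0..frame_len/2.
        bins = [
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def apply_mask(noisy: Spectrogram, mask: List[List[float]]) -> Spectrogram:
    """Element-wise masking: each time-frequency bin is scaled by its mask value."""
    return [[m * y for m, y in zip(mrow, yrow)] for mrow, yrow in zip(mask, noisy)]


# Toy data: a sinusoid plus a deterministic "noise" term.
clean = [math.sin(2 * math.pi * 0.125 * n) for n in range(32)]
noise = [0.3 * math.sin(2 * math.pi * 0.375 * n + 1.0) for n in range(32)]
noisy = [c + v for c, v in zip(clean, noise)]

S, N, Y = stft(clean), stft(noise), stft(noisy)

# Placeholder for the network output: a ratio mask computed from the known
# clean and noise spectra. A trained model would predict the mask from Y alone.
eps = 1e-8
mask = [
    [abs(s) / (abs(s) + abs(v) + eps) for s, v in zip(srow, vrow)]
    for srow, vrow in zip(S, N)
]

enhanced = apply_mask(Y, mask)
```

Because every mask value lies between 0 and 1, masking can only attenuate the magnitude of each time-frequency bin; the phase of the noisy signal is left unchanged.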