---
license: mit
datasets:
- westbrook/LibriMix
language:
- en
tags:
- speech
- SE
- Neural-Audio-Codec
pipeline_tag: audio-to-audio
---
# Modeling strategies for speech enhancement in the latent space of a neural audio codec
|
|
This repository provides the official model checkpoints for the paper *[Modeling strategies for speech enhancement in the latent space of a neural audio codec](https://arxiv.org/abs/2510.26299)*, authored by Sofiene Kammoun, Xavier Alameda-Pineda, and Simon Leglaive, and published at IEEE ICASSP 2026.
|
|
We explore different modeling strategies (autoregressive vs. non-autoregressive) and representation spaces (discrete vs. continuous) for speech enhancement using neural audio codecs and Conformer-based architectures.
|
|
[arXiv](https://arxiv.org/abs/2510.26299) | [Code and Audio examples](https://sofienekammoun.github.io/SE-NAC-25/) | [Bibtex](#citation)
|
|
|
|
## Overview
|
|
Our work introduces and compares a family of speech enhancement models that systematically vary along two main axes:
|
|
- **Representation Type**
  - Discrete tokens
  - Continuous latent vectors
|
|
- **Modeling Strategy**
  - Autoregressive (AR): sequential prediction of the clean speech representation
  - Non-autoregressive (NAR): parallel prediction of the clean speech representation
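
The difference between the two strategies can be sketched with a toy example. Here `denoise` is a dummy stand-in for the Conformer-based model, and the sketch only illustrates the data flow (AR feedback vs. parallel prediction), not the paper's actual interface or conditioning:

```python
# Toy sketch of AR vs. NAR enhancement over a sequence of latent frames.
# `denoise` is a placeholder for the Conformer model; the real models,
# inputs, and conditioning are not reproduced here.

def denoise(frame, context):
    # Dummy "model": strip a constant noise offset. `context` (the clean
    # frames predicted so far) is unused by this toy, but marks where AR
    # conditioning would enter a real model.
    return frame - 1.0

def enhance_ar(noisy):
    """Autoregressive: predict clean frames one at a time, each step
    conditioned on the previously predicted clean frames."""
    clean = []
    for frame in noisy:
        clean.append(denoise(frame, context=tuple(clean)))
    return clean

def enhance_nar(noisy):
    """Non-autoregressive: predict all clean frames in parallel, with no
    feedback from earlier predictions."""
    return [denoise(frame, context=None) for frame in noisy]
```

The AR variant pays a sequential decoding cost for access to its own past predictions, while the NAR variant predicts every frame in one parallel pass.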
|
|
The current release includes the following models:
|
|
| Model Name | Modeling Strategy | Input Representation | Output Representation | Model Checkpoint |
|------------|-------------------|----------------------|-----------------------|------------------|
| **D-AR** | Autoregressive | Discrete | Discrete | `D-AR_ckpt_300.pt` |
| **D-NAR** | Non-autoregressive | Discrete | Discrete | `D-NAR_ckpt_300.pt` |
| **D-NAR\*** | Non-autoregressive | Continuous | Discrete | `D-NAR_star_ckpt_300.pt` |
| **C-AR** | Autoregressive | Continuous | Continuous | `C-AR_ckpt_300.pt` |
| **C-NAR** | Non-autoregressive | Continuous | Continuous | `C-NAR_ckpt_300.pt` |
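
For convenience, the table can be expressed as a small lookup from model name to checkpoint filename. The filenames are the ones listed above; everything else (the helper name, and how the checkpoints should be deserialized, e.g. with `torch.load(..., map_location="cpu")`) is an assumption not specified by this card:

```python
# Checkpoint filenames from the table above, keyed by model name.
# `checkpoint_for` is a hypothetical helper, not part of any released code.
CHECKPOINTS = {
    "D-AR": "D-AR_ckpt_300.pt",
    "D-NAR": "D-NAR_ckpt_300.pt",
    "D-NAR*": "D-NAR_star_ckpt_300.pt",
    "C-AR": "C-AR_ckpt_300.pt",
    "C-NAR": "C-NAR_ckpt_300.pt",
}

def checkpoint_for(model_name: str) -> str:
    """Return the checkpoint filename for a model from the table."""
    try:
        return CHECKPOINTS[model_name]
    except KeyError:
        raise ValueError(f"unknown model {model_name!r}") from None
```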
|
|
Additional models:
- **C-FT** (`C-FT-encoder_ckpt_300.pt`) and **D-FT** (`D-FT-encoder_ckpt_300.pt`), where only the NAC encoder is fine-tuned, with an MSE loss and a cross-entropy loss, respectively.
- **STFT-NAR** (`STFT_NAR_Mask_ckpt_300.pt`), which operates on STFT representations instead of NAC embeddings and is trained to output an STFT mask.
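
As a generic illustration of the masking principle used by STFT-NAR, a predicted real-valued mask is applied pointwise to the noisy STFT. This is only the general idea of mask-based enhancement; the paper's exact mask type and bounds are not specified on this card:

```python
# Generic pointwise masking in the STFT domain (illustrative only).

def apply_mask(noisy_stft, mask):
    """Multiply a predicted mask with the noisy STFT, bin by bin.

    noisy_stft: complex STFT coefficients (flattened into a list here)
    mask:       real-valued gains of the same length, e.g. in [0, 1]
    """
    if len(noisy_stft) != len(mask):
        raise ValueError("mask and STFT must have the same shape")
    # A gain near 1 keeps a time-frequency bin; near 0 suppresses it.
    return [m * x for m, x in zip(mask, noisy_stft)]
```

The enhanced waveform would then be recovered by an inverse STFT of the masked coefficients.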