Duplicate from neuphonic/distill-neucodec

Co-authored-by: Lohith Konathala <lohithk480@users.noreply.huggingface.co>

Files changed (4)

.gitattributes +35 -0
README.md +77 -0
meta.yaml +2 -0
pytorch_model.bin +3 -0

.gitattributes ADDED

	@@ -0,0 +1,35 @@

+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
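
As a rough illustration of what these rules do, the sketch below checks whether a filename would be routed through Git LFS. It is an approximation, not Git's own matcher: it applies Python's `fnmatch` to the basename and only covers simple glob patterns. The `LFS_PATTERNS` subset and the `is_lfs_tracked` helper are hypothetical names introduced here for illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# A subset of the patterns added above (assumption: simple basename globs only;
# directory patterns such as saved_model/**/* are deliberately left out).
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.onnx", "*tfevents*"]


def is_lfs_tracked(path: str) -> bool:
    """Return True if the file's basename matches any of the LFS patterns."""
    name = PurePosixPath(path).name
    return any(fnmatch(name, pattern) for pattern in LFS_PATTERNS)


tracked = is_lfs_tracked("pytorch_model.bin")  # weights file in this commit
untracked = is_lfs_tracked("README.md")        # plain text, stored in Git directly
print(tracked, untracked)
```

Under these assumptions, `pytorch_model.bin` falls under the `*.bin` rule, which is why the commit stores it as an LFS pointer rather than as raw bytes.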

README.md ADDED

	@@ -0,0 +1,77 @@

+---
+license: apache-2.0
+tags:
+- audio
+- speech
+- audio-to-audio
+- speech-language-models
+datasets:
+- amphion/Emilia-Dataset
+- facebook/multilingual_librispeech
+- CSTR-Edinburgh/vctk
+- google/fleurs
+- mozilla-foundation/common_voice_13_0
+- mythicinfinity/libritts_r
+---
+# Model Details
+Distill-NeuCodec is a version of NeuCodec with a compatible, distilled encoder.
+The distilled encoder has roughly 10x fewer parameters and uses ~7.5x fewer MACs (multiply-accumulate operations) at inference time.
+The distilled model makes the following changes to the original architecture:
+* Swap the notoriously slow [BigCodec](https://arxiv.org/abs/2409.05377) acoustic encoder for the [SQCodec](https://arxiv.org/abs/2504.04949) acoustic encoder (70M → 36M parameters)
+* Swap the [w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) semantic encoder for [DistilHuBERT](https://huggingface.co/ntu-spml/distilhubert) (600M → 21M parameters)
+Our work is largely based on extending the work of [X-Codec2.0](https://huggingface.co/HKUSTAudio/xcodec2) and [SQCodec](https://arxiv.org/abs/2504.04949).
+- **Developed by:** Neuphonic
+- **Model type:** Neural Audio Codec
+- **License:** apache-2.0
+- **Repository:** https://github.com/neuphonic/neucodec
+- **Paper:** [arXiv](https://arxiv.org/abs/2509.09550)
+- **Pre-encoded Datasets:**
+  - [Emilia-YODAS-EN](https://huggingface.co/datasets/neuphonic/emilia-yodas-english-neucodec)
+  - *More coming soon!*
+## Get Started
+Use the code below to get started with the model.
+To install from PyPI in a dedicated environment, using Python 3.10 or above:
+```bash
+conda create -n neucodec python=3.10
+conda activate neucodec
+pip install neucodec
+```
+Then, to use it in Python:
+```python
+import librosa
+import torch
+import torchaudio
+from torchaudio import transforms as T
+from neucodec import DistillNeuCodec
+model = DistillNeuCodec.from_pretrained("neuphonic/distill-neucodec")
+model.eval().cuda()
+y, sr = torchaudio.load(librosa.ex("libri1"))
+y = T.Resample(sr, 16_000)(y) if sr != 16_000 else y # resample to 16 kHz only when needed
+y = y[None, ...] # always add the batch dim: (B, 1, T_16)
+with torch.no_grad():
+    fsq_codes = model.encode_code(y)
+    # fsq_codes = model.encode_code(librosa.ex("libri1")) # or directly pass your filepath!
+    print(f"Codes shape: {fsq_codes.shape}")
+    recon = model.decode_code(fsq_codes).cpu() # (B, 1, T_24)
+torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
+```
+## Training Details
+The model was trained on the same data as the full model, with an additional distillation loss (MSE between the distilled and original encoder outputs).
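
The distillation term described in the README can be sketched as follows. This is a minimal, framework-free illustration, not the training code: the README only states that the extra loss is an MSE between the distilled and original encoder outputs, so the additive combination, the `distill_weight` parameter, and the function names below are assumptions made for the example.

```python
def mse(student_out, teacher_out):
    """Mean squared error between two equal-length sequences of floats."""
    if len(student_out) != len(teacher_out):
        raise ValueError("student and teacher outputs must have the same length")
    squared_errors = [(s - t) ** 2 for s, t in zip(student_out, teacher_out)]
    return sum(squared_errors) / len(squared_errors)


def total_loss(codec_loss, student_out, teacher_out, distill_weight=1.0):
    """Add a weighted distillation term to the usual codec loss.

    Assumption: the terms are simply summed; the actual weighting used
    in training is not given in the README.
    """
    return codec_loss + distill_weight * mse(student_out, teacher_out)


# Toy encoder outputs: the teacher (original encoder) is treated as fixed,
# and the student (distilled encoder) is pulled toward it.
teacher = [1.0, 4.0]
student = [1.0, 2.0]
distill_term = mse(student, teacher)
loss = total_loss(0.5, student, teacher, distill_weight=0.5)
print(distill_term, loss)
```

In a real setup the two sequences would be encoder feature tensors for the same audio input, with gradients flowing only through the distilled encoder.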

meta.yaml ADDED

	@@ -0,0 +1,2 @@


+author: neuphonic
+license: apache-2.0

pytorch_model.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
+size 1025488162
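
The three lines added for `pytorch_model.bin` are a Git LFS pointer, not the weights themselves. A small sketch of reading such a pointer is shown below. The `parse_lfs_pointer` helper is a hypothetical name introduced here, and it assumes the plain `key value` layout seen in this diff.

```python
# Pointer contents copied from the diff above (without the leading "+").
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
size 1025488162
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line and decode the oid and size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


info = parse_lfs_pointer(POINTER)
size_gb = info["size_bytes"] / 1e9  # decimal gigabytes
print(info["hash_algo"], f"{size_gb:.2f} GB")
```

The `size` field gives the checkpoint size in bytes (a little over 1 GB here), and the `oid` digest can be compared against a SHA-256 hash of the downloaded file to verify its integrity.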