aoiandroid lohithk480 committed on
Commit
40492d0
·
0 Parent(s):

Duplicate from neuphonic/distill-neucodec

Co-authored-by: Lohith Konathala <lohithk480@users.noreply.huggingface.co>

Files changed (4)
  1. .gitattributes +35 -0
  2. README.md +77 -0
  3. meta.yaml +2 -0
  4. pytorch_model.bin +3 -0
.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
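
These attribute rules send any matching path through Git LFS. As a rough illustration, the sketch below checks file names against a handful of the patterns using Python's `fnmatch`. It only approximates Git's own matching rules (the authoritative check is `git check-attr filter -- <path>`), and the helper name `is_lfs_tracked` and the chosen pattern subset are assumptions made for this example.

```python
from fnmatch import fnmatchcase

# A subset of the basename patterns listed in the .gitattributes diff.
LFS_PATTERNS = ["*.bin", "*.safetensors", "*.pt", "*.pth", "*.onnx", "*.tar.*", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Return True if the file's basename matches any of the LFS patterns.

    Approximation only: Git's matching (for example `saved_model/**/*`)
    is richer than what fnmatch implements.
    """
    basename = path.rsplit("/", 1)[-1]
    return any(fnmatchcase(basename, pattern) for pattern in patterns)


if __name__ == "__main__":
    for name in ["pytorch_model.bin", "README.md", "meta.yaml"]:
        print(name, is_lfs_tracked(name))
```

Under these patterns `pytorch_model.bin` is matched by `*.bin`, which is why the commit stores it as an LFS pointer rather than as raw bytes.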
README.md ADDED
@@ -0,0 +1,77 @@
+ ---
+ license: apache-2.0
+ tags:
+ - audio
+ - speech
+ - audio-to-audio
+ - speech-language-models
+ datasets:
+ - amphion/Emilia-Dataset
+ - facebook/multilingual_librispeech
+ - CSTR-Edinburgh/vctk
+ - google/fleurs
+ - mozilla-foundation/common_voice_13_0
+ - mythicinfinity/libritts_r
+ ---
+
+ # Model Details
+
+ Distill-NeuCodec is a version of NeuCodec with a compatible, distilled encoder.
+
+ The distilled encoder is 10x smaller in parameter count and uses ~7.5x fewer MACs at inference time.
+
+ The distilled model makes the following changes to NeuCodec:
+ * Swap the notoriously slow [BigCodec](https://arxiv.org/abs/2409.05377) acoustic encoder for the [SQCodec](https://arxiv.org/abs/2504.04949) acoustic encoder (70M → 36M parameters)
+ * Swap the [w2v-bert-2.0](https://huggingface.co/facebook/w2v-bert-2.0) semantic encoder for [DistilHuBERT](https://huggingface.co/ntu-spml/distilhubert) (600M → 21M parameters)
+
+ Our work largely extends [X-Codec2.0](https://huggingface.co/HKUSTAudio/xcodec2) and [SQCodec](https://arxiv.org/abs/2504.04949).
+
+ - **Developed by:** Neuphonic
+ - **Model type:** Neural Audio Codec
+ - **License:** apache-2.0
+ - **Repository:** https://github.com/neuphonic/neucodec
+ - **Paper:** [arXiv](https://arxiv.org/abs/2509.09550)
+ - **Pre-encoded Datasets:**
+   - [Emilia-YODAS-EN](https://huggingface.co/datasets/neuphonic/emilia-yodas-english-neucodec)
+   - *More coming soon!*
+
+
+ ## Get Started
+
+ Use the code below to get started with the model.
+
+ To install from PyPI in a dedicated environment, using Python 3.10 or above:
+
+ ```bash
+ conda create -n neucodec python=3.10
+ conda activate neucodec
+ pip install neucodec
+ ```
+ Then, to use in Python:
+
+ ```python
+ import librosa
+ import torch
+ import torchaudio
+ from torchaudio import transforms as T
+ from neucodec import DistillNeuCodec
+
+ model = DistillNeuCodec.from_pretrained("neuphonic/distill-neucodec")
+ model.eval().cuda()
+
+ # Load the example clip and resample to the 16 kHz input rate if needed.
+ y, sr = torchaudio.load(librosa.ex("libri1"))
+ if sr != 16_000:
+     y = T.Resample(sr, 16_000)(y)
+ y = y[None, ...]  # always add the batch dimension: (B, 1, T_16)
+
+ with torch.no_grad():
+     fsq_codes = model.encode_code(y)
+     # fsq_codes = model.encode_code(librosa.ex("libri1")) # or directly pass your filepath!
+     print(f"Codes shape: {fsq_codes.shape}")
+     recon = model.decode_code(fsq_codes).cpu()  # (B, 1, T_24)
+
+ # The decoder outputs 24 kHz audio.
+ torchaudio.save("reconstructed.wav", recon[0, :, :], 24_000)
+ ```
+
+ ## Training Details
+
+ The model was trained on the same data as the full model, with an additional distillation loss (MSE between the distilled and original encoder outputs).
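
One plausible way to write the training objective described above, offered as a sketch rather than the authors' exact formulation. Here $E_{\text{teacher}}$ stands for the original NeuCodec encoder, $E_{\text{student}}$ for the distilled encoder, and $\mathcal{L}_{\text{codec}}$ for the losses used to train the full model. These symbols, the weight $\lambda$, and the choice of which encoder outputs are compared are assumptions; the README does not state them.

$$
\mathcal{L}_{\text{distill}} = \frac{1}{N}\sum_{i=1}^{N}\left(E_{\text{student}}(x)_i - E_{\text{teacher}}(x)_i\right)^2,
\qquad
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{codec}} + \lambda\,\mathcal{L}_{\text{distill}}
$$

where $x$ is the input audio and $N$ is the number of elements in the encoder output.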
meta.yaml ADDED
@@ -0,0 +1,2 @@
+ author: neuphonic
+ license: apache-2.0
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
+ size 1025488162
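
The three added lines are a Git LFS pointer, not the weights themselves: they record the pointer spec version, the SHA-256 digest of the real file, and its size in bytes. A minimal sketch of reading such a pointer is shown below; the function name `parse_lfs_pointer` and the returned field names are hypothetical, chosen only for this example.

```python
# Pointer contents copied from the pytorch_model.bin diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:adace21f679b30f071c02e0cb3502d965ab08b50be936a5e81944674a5ae101e
size 1025488162
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algorithm,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


if __name__ == "__main__":
    info = parse_lfs_pointer(POINTER_TEXT)
    print(info["hash_algorithm"], info["digest"])
    print(f"{info['size_bytes'] / 1e9:.2f} GB")
```

The size field works out to roughly 1.03 GB, so the actual checkpoint has to be fetched through LFS (for example via `from_pretrained`) rather than read from the pointer.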