BlueCodec: speech autoencoder (codec only)
This repository publishes only the neural audio codec used by BlueTTS: a 44.1 kHz speech autoencoder that maps waveforms to a low-rate continuous latent sequence and back. It is not a full TTS model (no text encoder, duration model, or flow stack).
| If you need… | Use |
|---|---|
| End-to-end ONNX TTS | notmax123/blue-onnx + BlueTTS |
| Full PyTorch stack + stats (training / voice export) | notmax123/blue (includes blue_codec.safetensors alongside TTL/DP weights) |
| Training the codec from scratch | maxmelichov/blue-codec (standalone repo & training doc) |
Project home: https://github.com/maxmelichov/BlueTTS · Live demo: Hugging Face Space (notmax123/Blue)
What it does
- Encoder: waveform → spectrogram features → 24-dimensional latents at ~86 Hz (compact trajectory for downstream TTS).
- Decoder: latents → high-quality 44.1 kHz audio (causal stack + vocoder head).
Downstream BlueTTS modules (flow matching, duration, text-to-latent) run in this latent space, which keeps synthesis lightweight and fast.
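The ~86 Hz latent rate follows from the framing parameters listed in the architecture table (44.1 kHz audio, FFT 2048, hop 512). The sketch below works through that arithmetic. The helper name and the ceiling-based frame count are assumptions for illustration, since the exact padding convention is not stated here.

```python
import math

SAMPLE_RATE = 44_100   # Hz, from the model description
N_FFT = 2048           # FFT size, from the architecture table
HOP_LENGTH = 512       # samples between frames, from the architecture table
N_MELS = 228           # log-mel channels, from the architecture table
LATENT_DIM = 24        # latent channels per frame

# A real-valued FFT of size N yields N/2 + 1 frequency bins.
linear_bins = N_FFT // 2 + 1            # 1025
input_channels = linear_bins + N_MELS   # 1253

# One latent frame per hop, so the frame rate is sample_rate / hop.
latent_rate_hz = SAMPLE_RATE / HOP_LENGTH   # 86.1328125

# Latent values per second compared with raw samples per second.
latent_values_per_s = LATENT_DIM * latent_rate_hz
reduction = SAMPLE_RATE / latent_values_per_s   # equals HOP_LENGTH / LATENT_DIM


def approx_latent_frames(num_samples: int) -> int:
    """Hypothetical helper: frame count, assuming the tail is padded to a full hop."""
    return math.ceil(num_samples / HOP_LENGTH)


print(f"input channels: {input_channels}")
print(f"latent rate: {latent_rate_hz:.2f} Hz")
print(f"value reduction: {reduction:.1f}x")
print(f"approx. frames for 3 s of audio: {approx_latent_frames(3 * SAMPLE_RATE)}")
```

The reduction counts scalar values only. It says nothing about bits per value, because the latents are continuous rather than quantized.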
Architecture (summary)
| Piece | Details |
|---|---|
| Input | 1253-channel spectrogram (1025 log-linear + 228 log-mel; FFT 2048, hop 512) |
| Encoder (~25.6M params) | Conv1d stem (1253→512) + 10 ConvNeXt blocks + projection (512→24) |
| Decoder (~25.3M params) | CausalConv1d stem (24→512) + 10 causal dilated ConvNeXt blocks + vocoder head |
| Latent | 24-D @ ~86 Hz |
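The "causal" decoder layers in the table imply that each output frame depends only on the current and earlier latent frames. Below is a minimal pure-Python sketch of a causal dilated 1-D convolution using left padding. It illustrates the general technique only, not this repository's layer implementation; the function name, kernel, and dilation values are made up for the example.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Single-channel causal convolution: out[t] uses only x[t], x[t-d], x[t-2d], ...

    The last kernel tap multiplies the current sample; earlier taps reach
    further into the past in steps of `dilation`.
    """
    k = len(kernel)
    pad = (k - 1) * dilation              # left padding keeps the output length equal to len(x)
    padded = [0.0] * pad + list(x)
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            acc += w * padded[t + i * dilation]
        out.append(acc)
    return out


# With kernel [1, 1] and dilation 2, out[t] = x[t - 2] + x[t] (zeros before the start).
signal = [1.0, 2.0, 3.0, 4.0]
print(causal_dilated_conv1d(signal, [1.0, 1.0], dilation=2))

# Causality check: changing a later sample leaves earlier outputs untouched.
changed = [1.0, 2.0, 3.0, 100.0]
before = causal_dilated_conv1d(signal, [1.0, 1.0], dilation=2)
after = causal_dilated_conv1d(changed, [1.0, 1.0], dilation=2)
print(before[:3] == after[:3])
```

Padding only on the left is what makes the layer usable for streaming: no output frame waits on future input.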
Checkpoint in this repo
| File | Role |
|---|---|
| model.safetensors | Encoder + decoder weights (Safetensors). State dict keys are typically prefixed with encoder.* and decoder.*. |
*(An older naming convention in some local scripts is ae_latest.safetensors; the file served from this Hub repo is model.safetensors.)*
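Because the keys are typically prefixed with encoder.* and decoder.*, a loaded state dict can be split into per-module dictionaries. The sketch below assumes the `safetensors` package with its PyTorch loader. `split_by_prefix` and `load_codec_weights` are hypothetical helper names, and the demo keys are invented, not real checkpoint keys.

```python
from collections import defaultdict


def split_by_prefix(state_dict):
    """Group entries by the part of each key before the first dot."""
    groups = defaultdict(dict)
    for key, tensor in state_dict.items():
        prefix, _, rest = key.partition(".")
        groups[prefix][rest] = tensor
    return dict(groups)


def load_codec_weights(path="./blue_codec_only/model.safetensors"):
    """Hypothetical loader: read the checkpoint and split it by prefix."""
    # Lazy import so split_by_prefix works without safetensors/torch installed.
    from safetensors.torch import load_file
    return split_by_prefix(load_file(path))


# Demo with placeholder keys and values (not taken from the real checkpoint).
demo = {"encoder.stem.weight": 1, "encoder.proj.bias": 2, "decoder.stem.weight": 3}
parts = split_by_prefix(demo)
print(sorted(parts))
print(sorted(parts["encoder"]))
```

With the file downloaded, `load_codec_weights()["encoder"]` would hold the encoder tensors with the prefix stripped, ready for `load_state_dict` on a matching module. The module definitions themselves live in the training repository, not in this sketch.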
Download
hf download notmax123/blue-codec --repo-type model --local-dir ./blue_codec_only
Equivalent:
huggingface-cli download notmax123/blue-codec --repo-type model --local-dir ./blue_codec_only
Repo id is case-sensitive: notmax123/blue-codec.
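The same download can be done from Python. This is a minimal sketch assuming the `huggingface_hub` package is installed; `download_codec` is a hypothetical wrapper name, and the local directory simply mirrors the CLI commands above.

```python
REPO_ID = "notmax123/blue-codec"   # case-sensitive, as noted above
LOCAL_DIR = "./blue_codec_only"    # same target folder as the CLI examples


def download_codec(repo_id=REPO_ID, local_dir=LOCAL_DIR):
    """Download the repository snapshot and return the local folder path."""
    # Lazy import: requires `pip install huggingface_hub` and network access.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, repo_type="model", local_dir=local_dir)
```

Calling `download_codec()` should leave model.safetensors under the chosen local directory.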
License
MIT. Align usage with BlueTTS and the blue-codec repository for any training or redistribution terms that apply to your use case.