Fish Audio S2 Pro -- INT8 Quantized (AMD ROCm optimized)

This is an INT8 weight-only quantized version of fishaudio/s2-pro, optimized for AMD GPU inference via ROCm.

Maintained by Imagilux as part of the Imagilux/fish-speech fork, which adds AMD ROCm support, VRAM management, and CPU offloading to the original Fish Speech project.

What changed

INT8 weight-only quantization of the DualAR transformer (4B params) using symmetric per-channel quantization
Model size reduced from ~10.3 GB (bf16) to ~5.1 GB, enabling inference on 16 GB VRAM cards
Embeddings, layer norms, and the VQ-GAN codec remain in bf16
Quality impact is negligible: word error rate (WER) differs from the bf16 baseline by about 0.04%
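To illustrate what symmetric per-channel INT8 weight-only quantization means, here is a minimal pure-Python sketch. It is not the code used to produce this checkpoint. The helper names (`quantize_per_channel`, `dequantize`) are hypothetical, and treating each row of the weight matrix as one output channel is an assumption made for the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_channel(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per row (assumed output channel)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Symmetric: no zero point, the range [-max_abs, max_abs] maps to [-127, 127].
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights: w is roughly q * scale, per row."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


# Two rows with very different magnitudes, to show why per-channel scales help.
weight = [[0.50, -1.00, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel(weight)
restored = dequantize(q, scales)
max_err = max(
    abs(a - b) for row_a, row_b in zip(weight, restored) for a, b in zip(row_a, row_b)
)
print("int8 values:", q)
print("per-row scales:", scales)
print("max reconstruction error:", max_err)
```

Each weight is stored in one byte instead of the two bytes bf16 needs, plus one scale per row, which is why quantizing the transformer weights roughly halves the checkpoint size. Because each row gets its own scale, the small-magnitude second row keeps the full INT8 range rather than being crushed by the larger values in the first row.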

Intended use

This checkpoint is designed for AMD ROCm users running Fish Speech on VRAM-constrained consumer GPUs (RX 7900 XTX, RX 9070 XT, etc.) through the Imagilux/fish-speech fork.

It also works on NVIDIA CUDA devices.

Quick start (Docker, AMD ROCm)

git clone https://github.com/Imagilux/fish-speech.git
cd fish-speech

# Download this checkpoint
huggingface-cli download Imagilux/fishaudio-s2-pro --local-dir checkpoints/fish-speech-s2-pro-int8

# Start the WebUI
docker compose -f compose.yml -f compose.rocm.yml --profile webui up

See the Imagilux/fish-speech README for environment variables (VRAM_FRACTION, MAX_SEQ_LEN, OFFLOAD_WEIGHTS_TO_CPU, MIOPEN_FIND_MODE).
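The download step above can also be done from Python. The following is a sketch that assumes the `huggingface_hub` package is installed; the wrapper name `download_checkpoint` is hypothetical, while the repository ID and target directory are the ones used in the quick start.

```python
REPO_ID = "Imagilux/fishaudio-s2-pro"
LOCAL_DIR = "checkpoints/fish-speech-s2-pro-int8"


def download_checkpoint(repo_id: str = REPO_ID, local_dir: str = LOCAL_DIR) -> str:
    """Download the whole repository snapshot and return the local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_checkpoint()` from the root of the cloned repository should place the files where the quick start's `huggingface-cli download` command puts them.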

Benchmarks (RX 9070 XT 16GB, ROCm 7.2.1)

Configuration	Render time	VRAM used
INT8 + COMPILE=1 + MIOpen tuning	34.9s	~10.9 GB
bf16 + CPU offload (AVX-512)	100.1s	~1.9 GB
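The trade-off between the two configurations can be read off the table with a little arithmetic. The sketch below only restates the figures above; the VRAM values are approximate, so the ratio is too.

```python
# Figures copied from the benchmark table (RX 9070 XT 16GB, ROCm 7.2.1).
benchmarks = {
    "INT8 + COMPILE=1 + MIOpen tuning": {"render_s": 34.9, "vram_gb": 10.9},
    "bf16 + CPU offload (AVX-512)": {"render_s": 100.1, "vram_gb": 1.9},
}

int8 = benchmarks["INT8 + COMPILE=1 + MIOpen tuning"]
offload = benchmarks["bf16 + CPU offload (AVX-512)"]

speedup = offload["render_s"] / int8["render_s"]
vram_ratio = int8["vram_gb"] / offload["vram_gb"]

print(f"INT8 renders {speedup:.2f}x faster, using {vram_ratio:.1f}x the VRAM")
# prints: INT8 renders 2.87x faster, using 5.7x the VRAM
```

In other words, on this card the INT8 configuration trades roughly 9 GB of additional VRAM for a render that is close to three times faster than bf16 with CPU offload.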

Original model

This quantized checkpoint is derived from fishaudio/s2-pro by the Fish Audio team. All original model architecture, training, and capabilities are their work.

If you find this useful, please cite the original authors:

@misc{liao2026fishaudios2technical,
      title={Fish Audio S2 Technical Report},
      author={Shijia Liao and Yuxuan Wang and Songting Liu and Yifan Cheng and Ruoyi Zhang and Tianyu Li and Shidong Li and Yisheng Zheng and Xingwei Liu and Qingzheng Wang and Zhizhuo Zhou and Jiahua Liu and Xin Chen and Dawei Han},
      year={2026},
      eprint={2603.08823},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2603.08823},
}

License

Built with Fish Audio.

Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio -- contact business@fish.audio. See LICENSE.md for full terms.
