Fish Audio S2 Pro -- INT8 Quantized (AMD ROCm optimized)

This is an INT8 weight-only quantized version of fishaudio/s2-pro, optimized for AMD GPU inference via ROCm.

Maintained by Imagilux as part of the Imagilux/fish-speech fork, which adds AMD ROCm support, VRAM management, and CPU offloading to the original Fish Speech project.

What changed

  • INT8 weight-only quantization of the DualAR transformer (4B params) using symmetric per-channel quantization
  • Model size reduced from ~10.3 GB (bf16) to ~5.1 GB, enabling inference on 16 GB VRAM cards
  • Embeddings, layer norms, and the VQ-GAN codec remain in bf16
  • Quality impact is negligible (~0.04% difference in word error rate, WER, compared with bf16)
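To make "symmetric per-channel" concrete, here is a minimal pure-Python sketch of the general technique. It is not the code used to produce this checkpoint. The function names `quantize_per_channel` and `dequantize` are hypothetical, and treating each row of the weight matrix as one output channel is an assumption. Each weight is stored in one byte instead of the two bytes bf16 needs, plus one scale per channel, which is consistent with the roughly halved model size reported above.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_per_channel(weight: Matrix) -> Tuple[List[List[int]], List[float]]:
    """Symmetric INT8 quantization with one scale per row (assumed output channel)."""
    q_rows: List[List[int]] = []
    scales: List[float] = []
    for row in weight:
        max_abs = max((abs(w) for w in row), default=0.0)
        # Symmetric scheme: the zero-point is fixed at 0 and the integer
        # range is [-127, 127], so only a scale is stored per channel.
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q: List[List[int]], scales: List[float]) -> Matrix:
    """Recover approximate float weights: w is roughly q * scale."""
    return [[v * s for v in row] for row, s in zip(q, scales)]


# Two channels with very different magnitudes; each gets its own scale,
# so the small-valued channel keeps its resolution.
weight = [[0.5, -1.0, 0.25], [0.02, 0.01, -0.04]]
q, scales = quantize_per_channel(weight)
restored = dequantize(q, scales)
```

The rounding error for any weight is at most half of that channel's scale, which is why per-channel scales tend to preserve quality better than a single scale for the whole matrix.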

Intended use

This checkpoint is designed for AMD ROCm users running Fish Speech on VRAM-constrained consumer GPUs (RX 7900 XTX, RX 9070 XT, etc.) through the Imagilux/fish-speech fork.

It also works on NVIDIA CUDA devices.

Quick start (Docker, AMD ROCm)

git clone https://github.com/Imagilux/fish-speech.git
cd fish-speech

# Download this checkpoint
huggingface-cli download Imagilux/fishaudio-s2-pro --local-dir checkpoints/fish-speech-s2-pro-int8

# Start the WebUI
docker compose -f compose.yml -f compose.rocm.yml --profile webui up
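
If you prefer to fetch the checkpoint from Python instead of the CLI, the sketch below should be roughly equivalent to the download step above. It assumes the `huggingface_hub` package is installed; the helper name `download_checkpoint` is hypothetical, and the repository ID and target directory are copied from the command above.

```python
from pathlib import Path

# Values copied from the CLI command in the quick start.
REPO_ID = "Imagilux/fishaudio-s2-pro"
LOCAL_DIR = Path("checkpoints/fish-speech-s2-pro-int8")


def download_checkpoint(repo_id: str = REPO_ID, local_dir: Path = LOCAL_DIR) -> str:
    """Download all files of the model repo into local_dir and return the path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_checkpoint()` from the root of the cloned repository performs the download and returns the local directory path.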

See the Imagilux/fish-speech README for environment variables (VRAM_FRACTION, MAX_SEQ_LEN, OFFLOAD_WEIGHTS_TO_CPU, MIOPEN_FIND_MODE).

Benchmarks (RX 9070 XT 16GB, ROCm 7.2.1)

| Configuration                    | Render time | VRAM used |
|----------------------------------|-------------|-----------|
| INT8 + COMPILE=1 + MIOpen tuning | 34.9 s      | ~10.9 GB  |
| bf16 + CPU offload (AVX-512)     | 100.1 s     | ~1.9 GB   |
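
The trade-off between the two configurations can be read off with a little arithmetic. The short sketch below uses only the figures from the benchmark above; the variable names are illustrative. Note that the two rows differ in more than precision (compilation, MIOpen tuning, CPU offload), so the ratios compare whole configurations, not quantization alone.

```python
# Figures copied from the benchmark (RX 9070 XT 16GB, ROCm 7.2.1).
benchmarks = {
    "INT8 + COMPILE=1 + MIOpen tuning": {"render_s": 34.9, "vram_gb": 10.9},
    "bf16 + CPU offload (AVX-512)": {"render_s": 100.1, "vram_gb": 1.9},
}

int8 = benchmarks["INT8 + COMPILE=1 + MIOpen tuning"]
bf16 = benchmarks["bf16 + CPU offload (AVX-512)"]

# How much faster the INT8 configuration renders, and how much more VRAM it uses.
speedup = bf16["render_s"] / int8["render_s"]
vram_ratio = int8["vram_gb"] / bf16["vram_gb"]

print(f"INT8 configuration renders about {speedup:.1f}x faster")
print(f"INT8 configuration uses about {vram_ratio:.1f}x more VRAM")
```

In short, the INT8 configuration trades VRAM for speed, while bf16 with CPU offload keeps VRAM usage low at the cost of a much longer render time.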

Original model

This quantized checkpoint is derived from fishaudio/s2-pro by the Fish Audio team. All original model architecture, training, and capabilities are their work.

If you find this useful, please cite the original authors:

@misc{liao2026fishaudios2technical,
      title={Fish Audio S2 Technical Report},
      author={Shijia Liao and Yuxuan Wang and Songting Liu and Yifan Cheng and Ruoyi Zhang and Tianyu Li and Shidong Li and Yisheng Zheng and Xingwei Liu and Qingzheng Wang and Zhizhuo Zhou and Jiahua Liu and Xin Chen and Dawei Han},
      year={2026},
      eprint={2603.08823},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2603.08823},
}

License

This model is licensed under the Fish Audio Research License, Copyright © 39 AI, INC. All Rights Reserved.

Built with Fish Audio.

Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio -- contact business@fish.audio. See LICENSE.md for full terms.
