Fish Audio S2 Pro -- INT8 Quantized (AMD ROCm optimized)
This is an INT8 weight-only quantized version of fishaudio/s2-pro, optimized for AMD GPU inference via ROCm.
Maintained by Imagilux as part of the Imagilux/fish-speech fork, which adds AMD ROCm support, VRAM management, and CPU offloading to the original Fish Speech project.
What changed
- INT8 weight-only quantization of the DualAR transformer (4B params) using symmetric per-channel quantization
- Model size reduced from ~10.3 GB (bf16) to ~5.1 GB, enabling inference on 16 GB VRAM cards
- Embeddings, layer norms, and the VQ-GAN codec remain in bf16
- Quality impact is negligible (~0.04% WER difference from bf16)
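The following is a minimal sketch of what "symmetric per-channel, weight-only INT8 quantization" means. It assumes each weight-matrix row is one output channel. The function names are hypothetical, and this is not the fork's actual quantization code. Each row gets its own scale `max|w| / 127` with a zero-point of 0, and weights are stored as int8 in [-127, 127]. At inference the int8 values are rescaled by the channel scale while activations stay in floating point. Storing one byte per weight instead of two (bf16) is what roughly halves the checkpoint size.

```python
# Minimal sketch of symmetric per-channel INT8 weight-only quantization.
# Illustrative only: names are hypothetical, not the Imagilux/fish-speech code.
from typing import List, Tuple

Matrix = List[List[float]]
QMatrix = List[List[int]]


def quantize_per_channel(weight: Matrix) -> Tuple[QMatrix, List[float]]:
    """Quantize each row (assumed = one output channel) with its own scale."""
    q_rows: QMatrix = []
    scales: List[float] = []
    for row in weight:
        max_abs = max(abs(w) for w in row)
        # Symmetric: zero-point is 0, so [-max_abs, max_abs] maps to [-127, 127].
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        q_rows.append([max(-127, min(127, round(w / scale))) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows: QMatrix, scales: List[float]) -> Matrix:
    """Recover approximate float weights: w ~= q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


def linear_weight_only(x: List[float], q_rows: QMatrix, scales: List[float]) -> List[float]:
    """Compute y = W @ x with W stored as int8; the input x stays floating point."""
    # One scale per output channel factors out of that channel's dot product.
    return [s * sum(q * xi for q, xi in zip(row, x)) for row, s in zip(q_rows, scales)]


if __name__ == "__main__":
    w = [[0.5, -1.0, 0.25], [0.01, 0.02, -0.03]]
    q, s = quantize_per_channel(w)
    print("int8 rows:", q)
    print("scales:", s)
    print("y:", linear_weight_only([1.0, 2.0, 3.0], q, s))
```

Per-channel scales keep a single large-magnitude weight in one row from costing precision in every other row. That is the usual reason to prefer them over a single per-tensor scale.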
Intended use
This checkpoint is designed for AMD ROCm users running Fish Speech on VRAM-constrained consumer GPUs (RX 7900 XTX, RX 9070 XT, etc.) through the Imagilux/fish-speech fork.
It also works on NVIDIA CUDA devices.
Quick start (Docker, AMD ROCm)
```bash
git clone https://github.com/Imagilux/fish-speech.git
cd fish-speech
# Download this checkpoint
huggingface-cli download Imagilux/fishaudio-s2-pro --local-dir checkpoints/fish-speech-s2-pro-int8
# Start the WebUI
docker compose -f compose.yml -f compose.rocm.yml --profile webui up
```
See the Imagilux/fish-speech README for environment variables (VRAM_FRACTION, MAX_SEQ_LEN, OFFLOAD_WEIGHTS_TO_CPU, MIOPEN_FIND_MODE).
Benchmarks (RX 9070 XT 16GB, ROCm 7.2.1)
| Configuration | Render time | VRAM used |
|---|---|---|
| INT8 + COMPILE=1 + MIOpen tuning | 34.9s | ~10.9 GB |
| bf16 + CPU offload (AVX-512) | 100.1s | ~1.9 GB |
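Reading the two rows as a trade-off (plain arithmetic on the table's own figures, nothing newly measured):

$$
\text{speedup} = \frac{100.1\ \text{s}}{34.9\ \text{s}} \approx 2.87,
\qquad
\text{VRAM ratio} = \frac{10.9\ \text{GB}}{1.9\ \text{GB}} \approx 5.7
$$

On this card, INT8 with compilation and MIOpen tuning renders roughly 2.9× faster. bf16 with CPU offload uses roughly 5.7× less VRAM at the cost of speed.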
Original model
This quantized checkpoint is derived from fishaudio/s2-pro by the Fish Audio team. All original model architecture, training, and capabilities are their work.
If you find this useful, please cite the original authors:
```bibtex
@misc{liao2026fishaudios2technical,
  title={Fish Audio S2 Technical Report},
  author={Shijia Liao and Yuxuan Wang and Songting Liu and Yifan Cheng and Ruoyi Zhang and Tianyu Li and Shidong Li and Yisheng Zheng and Xingwei Liu and Qingzheng Wang and Zhizhuo Zhou and Jiahua Liu and Xin Chen and Dawei Han},
  year={2026},
  eprint={2603.08823},
  archivePrefix={arXiv},
  primaryClass={cs.SD},
  url={https://arxiv.org/abs/2603.08823},
}
```
License
This model is licensed under the Fish Audio Research License, Copyright 39 AI, INC. All Rights Reserved.
Built with Fish Audio.
Research and non-commercial use is permitted free of charge. Commercial use requires a separate license from Fish Audio -- contact business@fish.audio. See LICENSE.md for full terms.