Official documentation doesn't work for CPU backend

#19
by Fl1ntSt0n3 - opened

Following the official documentation page to start a new local instance on the CPU backend doesn't work:
https://speech.fish.audio/install/#docker-setup

First, it is missing a paragraph explaining that you need to download the checkpoints. Yes, this is written in the Dockerfile, but the whole point of Compose is to not have to worry about the image itself.

So don't forget to download the checkpoints first:
hf download fishaudio/s2-pro --local-dir ./checkpoints/s2-pro

Then, even with BACKEND=CPU set, the official image crashes at startup when it tries to use torchaudio:

2026-03-25 11:18:56.682 | INFO     | __main__:<module>:74 - Decoder model loaded, warming up...
Traceback (most recent call last):
  File "/app/tools/run_webui.py", line 77, in <module>
    inference_engine = TTSInferenceEngine(
                       ^^^^^^^^^^^^^^^^^^^
  File "/app/fish_speech/inference_engine/__init__.py", line 32, in __init__
    super().__init__()
  File "/app/fish_speech/inference_engine/reference_loader.py", line 39, in __init__
    backends = torchaudio.list_audio_backends()
               ^^^^^^^^^^
UnboundLocalError: cannot access local variable 'torchaudio' where it is not associated with a value

It needs this fix in fish_speech/inference_engine/reference_loader.py, at line 48:

Replace:

import torchaudio.io._load_audio_fileobj  # noqa: F401

With:

from importlib import import_module
import_module("torchaudio.io._load_audio_fileobj")
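
For anyone wondering why this helps: the traceback is consistent with Python's scoping rules. An `import a.b.c` statement inside a function binds the top-level name `a` as a local variable for the whole function body, so any use of `a` earlier in that function fails with UnboundLocalError. I am assuming that is what happens in reference_loader.py (a module-level `import torchaudio` plus the line 48 import inside `__init__`); I have not checked every version of the file. Here is a minimal sketch of the same effect using the standard library `json` module as a stand-in for torchaudio. The function names `broken` and `fixed` are made up for illustration.

```python
import importlib
import json  # module-level import, standing in for `import torchaudio`


def broken():
    # Intended to use the module-level `json`...
    text = json.dumps({"ok": True})
    # ...but this import statement makes `json` a local name for the whole
    # function, so the line above raises UnboundLocalError before we get here.
    import json.decoder  # noqa: F401
    return text


def fixed():
    # No import statement in this function, so `json` resolves to the global.
    text = json.dumps({"ok": True})
    # import_module loads the submodule without binding any local name.
    importlib.import_module("json.decoder")
    return text


if __name__ == "__main__":
    try:
        broken()
    except UnboundLocalError as exc:
        print("broken():", exc)
    print("fixed():", fixed())
```

Moving the submodule import to the top of the file would presumably work too; `import_module` just keeps the change to a single line.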

I hope this will help you folks!

Fl1ntSt0n3 changed discussion status to closed
