Running Qwen3 Locally?
Has anyone had any success getting Qwen3 to run locally? I've tried several engines to no avail (Ollama, vLLM, and currently LM Studio). I can get the model to respond for a few prompts, but as the conversation grows, the responses become garbled.
I've finally managed to get everything installed (including the flash_attn 2 build on a Windows machine with Python 3.12 and CUDA 12.8).
When I run `python app.py`, I get a kernel error, even though CUDA itself works locally (see the check at the bottom of this post):

```
(venv) c:\qwen3-tts>python app.py
Loading all models to CUDA...
Loading VoiceDesign 1.7B model...
Fetching 13 files: 100%|███████████████████████████████████████████████████| 13/13 [00:00<?, ?it/s]
Fetching 0 files: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "c:\qwen3-tts\venv\Lib\site-packages\kernels\utils.py", line 198, in install_kernel
    return _find_kernel_in_repo_path(repo_path, package_name, variant_locks)
  File "c:\qwen3-tts\venv\Lib\site-packages\kernels\utils.py", line 220, in _find_kernel_in_repo_path
    raise FileNotFoundError(
FileNotFoundError: Kernel at path C:\Users\ericv\.cache\huggingface\hub\models--kernels-community--flash-attn3\snapshots\5d9293232e0bea36728880ffeb631901a736387b does not have one of build variants: torch27-cu128-x86_64-windows, torch-cuda, torch-universal

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\integrations\hub_kernels.py", line 207, in load_and_register_kernel
    kernel = get_kernel(repo_id, revision=rev)
  File "c:\qwen3-tts\venv\Lib\site-packages\kernels\utils.py", line 312, in get_kernel
    package_name, variant_path = install_kernel(
  File "c:\qwen3-tts\venv\Lib\site-packages\kernels\utils.py", line 200, in install_kernel
    raise FileNotFoundError(
FileNotFoundError: Cannot install kernel from repo kernels-community/flash-attn3 (revision: main)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\qwen3-tts\app.py", line 39, in <module>
    voice_design_model = Qwen3TTSModel.from_pretrained(
  File "c:\qwen3-tts\qwen_tts\inference\qwen3_tts_model.py", line 112, in from_pretrained
    model = AutoModel.from_pretrained(pretrained_model_name_or_path, **kwargs)
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\models\auto\auto_factory.py", line 604, in from_pretrained
    return model_class.from_pretrained(
  File "c:\qwen3-tts\qwen_tts\core\models\modeling_qwen3_tts.py", line 1876, in from_pretrained
    model = super().from_pretrained(
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\modeling_utils.py", line 4971, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "c:\qwen3-tts\qwen_tts\core\models\modeling_qwen3_tts.py", line 1817, in __init__
    super().__init__(config)
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\modeling_utils.py", line 2076, in __init__
    self.config._attn_implementation_internal = self._check_and_adjust_attn_implementation(
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\modeling_utils.py", line 2684, in _check_and_adjust_attn_implementation
    raise e
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\modeling_utils.py", line 2668, in _check_and_adjust_attn_implementation
    load_and_register_kernel(applicable_attn_implementation)
  File "c:\qwen3-tts\venv\Lib\site-packages\transformers\integrations\hub_kernels.py", line 209, in load_and_register_kernel
    raise ValueError(f"An error occurred while trying to load from '{repo_id}': {e}.")
ValueError: An error occurred while trying to load from 'kernels-community/flash-attn3': Cannot install kernel from repo kernels-community/flash-attn3 (revision: main).
```
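A side note on reading the log: the stacked tracebacks are Python's implicit exception chaining. transformers catches the kernels package's `FileNotFoundError` and re-raises it as a `ValueError`, so the original cause is preserved above it. A minimal sketch of that pattern (function names mirror the log, but this is illustrative, not the actual library code):

```python
def install_kernel(repo_id):
    # Stand-in for kernels.utils.install_kernel failing because no build
    # variant matches the platform (e.g. torch27-cu128-x86_64-windows).
    raise FileNotFoundError(
        f"Cannot install kernel from repo {repo_id} (revision: main)"
    )

def load_and_register_kernel(repo_id):
    # Stand-in for transformers' hub_kernels integration: the low-level
    # FileNotFoundError is wrapped in a ValueError, which is why the log
    # shows "During handling of the above exception, another exception
    # occurred" between the tracebacks.
    try:
        install_kernel(repo_id)
    except FileNotFoundError as e:
        raise ValueError(
            f"An error occurred while trying to load from '{repo_id}': {e}."
        )

try:
    load_and_register_kernel("kernels-community/flash-attn3")
except ValueError as e:
    print(e)
    print(type(e.__context__).__name__)  # the original, chained error
```

So the root cause to chase is the innermost message: no flash-attn3 build variant exists for this Windows/CUDA combination.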
```
(venv) c:\qwen3-tts>import torch
'import' is not recognized as an internal or external command,
operable program or batch file.

(venv) c:\qwen3-tts>python
Python 3.12.9 (tags/v3.12.9:fdb8142, Feb  4 2025, 15:27:58) [MSC v.1942 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> x = torch.rand(5, 3)
>>> print(x)
tensor([[0.2097, 0.9006, 0.8568],
        [0.5569, 0.7345, 0.6012],
        [0.8467, 0.6681, 0.4402],
        [0.1582, 0.2006, 0.4323],
        [0.1103, 0.7644, 0.8739]])
>>> exit()

(venv) c:\qwen3-tts>
```
Does anybody have an idea what went wrong?
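One idea I haven't been able to test yet: asking transformers for a built-in attention backend so it never tries to fetch the flash-attn3 kernel from the Hub. A rough sketch of what I mean, assuming `Qwen3TTSModel.from_pretrained` forwards extra kwargs to `AutoModel.from_pretrained` (that forwarding is an assumption on my part):

```python
# Untested idea: request a built-in attention backend so transformers
# skips the 'kernels-community/flash-attn3' Hub lookup. "sdpa" is
# PyTorch's native scaled-dot-product attention; "eager" should also
# avoid the kernel download.
load_kwargs = {"attn_implementation": "sdpa"}

# Hypothetical usage (model_path stands in for the actual checkpoint):
# voice_design_model = Qwen3TTSModel.from_pretrained(model_path, **load_kwargs)
print(load_kwargs["attn_implementation"])  # sdpa
```

If anyone has confirmed whether the Qwen3-TTS loader accepts `attn_implementation`, I'd appreciate hearing about it.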