404 Repository Not Found when serving model Qwen3-235B-A22B-Thinking-2507-FP8 with vLLM 0.9.0.1

#1
by RekklesAI - opened

When I try to start a vLLM server with the model Qwen3-235B-A22B-Thinking-2507-FP8, vLLM cannot find the repository on Hugging Face and terminates with a RepositoryNotFoundError (404).

Describe the issue
I’m trying to serve the model Qwen3-235B-A22B-Thinking-2507-FP8 with vLLM, but the Hugging Face Hub returns RepositoryNotFound (404) for this repo_id, and vLLM aborts.

Exact command

vllm serve Qwen3-235B-A22B-Thinking-2507-FP8 \
  --tensor-parallel-size 2 \
  --no-disable-sliding-window \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
What I expected
vLLM should resolve the repo_id, download the model (or use local cache), and start the server.

What actually happened (abridged)
(base) root@ahg2:~# vllm serve Qwen3-235B-A22B-Thinking-2507-FP8 --tensor-parallel-size 2 --no-disable-sliding-window --enable-auto-tool-choice --tool-call-parser hermes
INFO 07-30 15:48:23 [__init__.py:243] Automatically detected platform cuda.
INFO 07-30 15:48:25 [__init__.py:31] Available plugins for group vllm.general_plugins:
INFO 07-30 15:48:25 [__init__.py:33] - lora_filesystem_resolver -> vllm.plugins.lora_resolvers.filesystem_resolver:register_filesystem_resolver
INFO 07-30 15:48:25 [__init__.py:36] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
INFO 07-30 15:48:27 [api_server.py:1289] vLLM API server version 0.9.0.1
INFO 07-30 15:48:27 [cli_args.py:300] non-default args: {'enable_auto_tool_choice': True, 'tool_call_parser': 'hermes', 'tensor_parallel_size': 2}
ERROR 07-30 15:48:28 [config.py:107] Error retrieving file list: 404 Client Error. (Request ID: Root=1-6889b22c-62debdd2748e55bc4d7de56a;ef626483-f610-4392-956d-8755d5a99480)
ERROR 07-30 15:48:28 [config.py:107]
ERROR 07-30 15:48:28 [config.py:107] Repository Not Found for url: https://huggingface.co/api/models/Qwen3-235B-A22B-Thinking-2507-FP8/tree/main?recursive=True&expand=False.
ERROR 07-30 15:48:28 [config.py:107] Please make sure you specified the correct repo_id and repo_type.
ERROR 07-30 15:48:28 [config.py:107] If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication, retrying 1 of 2
ERROR 07-30 15:48:30 [config.py:105] Error retrieving file list: 404 Client Error. (Request ID: Root=1-6889b22e-3981fec13bf28c95166d96fe;17271689-9773-4cc4-98c1-7129258d6b3a)
ERROR 07-30 15:48:30 [config.py:105]
ERROR 07-30 15:48:30 [config.py:105] Repository Not Found for url: https://huggingface.co/api/models/Qwen3-235B-A22B-Thinking-2507-FP8/tree/main?recursive=True&expand=False.
ERROR 07-30 15:48:30 [config.py:105] Please make sure you specified the correct repo_id and repo_type.
ERROR 07-30 15:48:30 [config.py:105] If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
response.raise_for_status()
File "/usr/local/lib/python3.10/dist-packages/requests/models.py", line 1024, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/api/models/Qwen3-235B-A22B-Thinking-2507-FP8/tree/main?recursive=True&expand=False

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 279, in get_config
if is_gguf or file_or_path_exists(
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 184, in file_or_path_exists
return file_exists(str(model),
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 159, in file_exists
file_list = list_repo_files(repo_id,
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 148, in list_repo_files
return with_retry(lookup_files, "Error retrieving file list")
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 102, in with_retry
return func()
File "/usr/local/lib/python3.10/dist-packages/vllm/transformers_utils/config.py", line 138, in lookup_files
return hf_list_repo_files(repo_id,
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/hf_api.py", line 3003, in list_repo_files
return [
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/hf_api.py", line 3003, in <listcomp>
return [
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/hf_api.py", line 3140, in list_repo_tree
for path_info in paginate(path=tree_url, headers=headers, params={"recursive": recursive, "expand": expand}):
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_pagination.py", line 37, in paginate
hf_raise_for_status(r)
File "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_http.py", line 459, in hf_raise_for_status
raise _format(RepositoryNotFoundError, message, response) from e
huggingface_hub.errors.RepositoryNotFoundError: 404 Client Error. (Request ID: Root=1-6889b22e-3981fec13bf28c95166d96fe;17271689-9773-4cc4-98c1-7129258d6b3a)
Environment
Python: 3.13.2 (Anaconda)
Platform: Linux-6.8.0-64-generic-x86_64-with-glibc2.35
vLLM: [pending — will add output of vllm --version]
huggingface_hub: 0.34.3
PyTorch: 2.7.1+cu126
CUDA available: True
Torch CUDA: 12.6
GPUs: 2x NVIDIA H200 NVL
NVIDIA driver: 550.163.01 (CUDA reported by driver: 12.4)
nvcc: not installed

Auth status

Logged in: yes (fine-grained token present; canReadGatedRepos = True)
Model lookup

HfApi().model_info("Qwen3-235B-A22B-Thinking-2507-FP8")
-> 404 Repository Not Found
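One detail worth noting from the error log: the repo_id is embedded verbatim in the Hub API URL, so a bare model name is looked up as a top-level repo and is never resolved against any organization namespace. A minimal sketch that reconstructs the URL vLLM's file listing hit (the constant parts are copied from the 404 above):

```python
def hub_tree_url(repo_id: str, revision: str = "main") -> str:
    """Build the Hub tree-listing URL shown in the 404 log above.

    The repo_id goes in verbatim, so "Qwen3-235B-A22B-Thinking-2507-FP8"
    and an org-qualified id like "org/Qwen3-235B-A22B-Thinking-2507-FP8"
    address entirely different repos.
    """
    return (
        "https://huggingface.co/api/models/"
        f"{repo_id}/tree/{revision}?recursive=True&expand=False"
    )

print(hub_tree_url("Qwen3-235B-A22B-Thinking-2507-FP8"))
```

This reproduces exactly the URL in the RepositoryNotFoundError, which is consistent with a missing namespace rather than an auth problem.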
What I’ve checked
Verified auth: I’m logged in and can access gated repos with this token.

Cleared local cache in prior attempts.

Confirmed that the repo_id Qwen3-235B-A22B-Thinking-2507-FP8 returns 404 via API and browser.

Will add the output of which vllm and vllm --version from the runtime environment if needed.
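Since a repo_id without a "/" only matches a top-level repo, a quick sanity check is to also probe the same name under the publisher's organization. A hedged sketch (the "Qwen" org prefix is my assumption about where the weights might live; the 404 alone does not confirm it):

```python
def candidate_repo_ids(model_arg: str, org: str = "Qwen") -> list:
    """Candidate repo_ids to probe for a bare model name.

    A repo_id with no "/" is looked up as-is on the Hub, so a bare
    "Qwen3-235B-A22B-Thinking-2507-FP8" 404s even if the same name
    exists under an organization. The "Qwen" default is an assumption.
    """
    if "/" in model_arg:
        return [model_arg]
    return [f"{org}/{model_arg}", model_arg]

# Probing them needs network access and huggingface_hub; sketch only:
# from huggingface_hub import HfApi
# api = HfApi()
# for repo_id in candidate_repo_ids("Qwen3-235B-A22B-Thinking-2507-FP8"):
#     print(repo_id, "->", api.repo_exists(repo_id))
```

If the org-qualified id resolves, re-running vllm serve with that full repo_id should get past the file-list step.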

Questions
Is this model private, renamed, or removed from the Hub?

If it was migrated, what is the current repo_id?

If access is restricted, what is the correct procedure to request access?

Thanks!

RekklesAI changed discussion status to closed

Sorry, I just realized I made a silly mistake.
