Tested on B300 and got roughly the same output as FP8

#4
by O-delicious - opened
# Build and install flashinfer from source with fp4 quantization fix.
# Keep this aligned with the v0.5.9 base image's flashinfer_python version.
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=cache,target=/sgl-workspace/flashinfer-build \
    git config --global http.proxy "${http_proxy:-$HTTP_PROXY}" && \
    git config --global https.proxy "${https_proxy:-$HTTPS_PROXY}" && \
    if [ ! -d /sgl-workspace/flashinfer-build/flashinfer/.git ]; then \
        rm -rf /sgl-workspace/flashinfer-build/flashinfer && \
        git clone https://github.com/flashinfer-ai/flashinfer.git /sgl-workspace/flashinfer-build/flashinfer; \
    fi && \
    cd /sgl-workspace/flashinfer-build/flashinfer && \
    git fetch --tags origin && \
    git checkout -f v0.6.3 && \
    git submodule sync --recursive && \
    git submodule update --init --recursive --force && \
    git config user.email "build@example.com" && \
    git config user.name "Build" && \
    git remote add nvjullin https://github.com/nvjullin/flashinfer 2>/dev/null || true && \
    git fetch nvjullin fix-fp4-quant-padding && \
    git cherry-pick --skip || true && \
    git cherry-pick a022c4d4 72d6572b && \
    git submodule sync --recursive && \
    git submodule update --init --recursive --force && \
    test -f 3rdparty/spdlog/include/spdlog/sinks/stdout_color_sinks.h && \
    cd flashinfer-jit-cache && \
    MAX_JOBS=32 FLASHINFER_NVCC_THREADS=2 FLASHINFER_CUDA_ARCH_LIST="10.0a 10.3a" python -m build --no-isolation --skip-dependency-check --wheel && \
    python -m pip install dist/*.whl

Note that the cherry-pick commit IDs have been changed slightly.

Hosted with:

python3 -m sglang.launch_server \
          --model /data/GLM-5-NVFP4 \
          --port 8000 \
          --tensor-parallel-size 8 \
          --ep-size 8 \
          --quantization modelopt_fp4 \
          --tool-call-parser glm47 \
          --reasoning-parser glm45 \
          --trust-remote-code \
          --chunked-prefill-size 16384 \
          --max-prefill-tokens 4096 \
          --mem-fraction-static 0.80 \
          --max-running-requests 32 \
          --disable-custom-all-reduce \
          --served-model-name glm-5-fp8

Benchmark results on B300:

#Input tokens: 1048576
#Output tokens: 16384
Starting initial single prompt test run...
Backend: sglang
API URL: http://127.0.0.1:8000/generate
Base URL: http://127.0.0.1:8000
Model ID: glm-5-fp8
Host: 0.0.0.0, Port: 30000
Testing base URL connectivity...
Base URL http://127.0.0.1:8000 responded with status: 200
Testing API endpoint http://127.0.0.1:8000/generate...
API URL http://127.0.0.1:8000/generate responded with status: 400
Test prompt length: 65536
Test output length: 32
Initial test run completed. Starting main benchmark run...
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 16/16 [01:55<00:00,  7.20s/it]

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    2.0
Max reqeuest concurrency:                16
Successful requests:                     16
Benchmark duration (s):                  115.13
Total input tokens:                      1048576
Total generated tokens:                  16384
Total generated tokens (retokenized):    16384
Request throughput (req/s):              0.14
Input token throughput (tok/s):          9107.59
Output token throughput (tok/s):         142.31
Total token throughput (tok/s):          9249.90
Concurrency:                             15.74
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   113289.81
Median E2E Latency (ms):                 113700.03
---------------Time to First Token----------------
Mean TTFT (ms):                          41681.64
Median TTFT (ms):                        42160.25
P99 TTFT (ms):                           75511.16
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          70.00
Median TPOT (ms):                        69.93
P99 TPOT (ms):                           105.97
---------------Inter-token Latency----------------
Mean ITL (ms):                           70.00
Median ITL (ms):                         33.94
P99 ITL (ms):                            34.55
==================================================
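As a quick sanity check, the headline throughput numbers follow directly from the token counts and duration reported above (small differences come from the duration being rounded to two decimals):

```python
# Recompute the reported throughputs from the raw counts in the result block.
total_input = 1_048_576   # Total input tokens
total_output = 16_384     # Total generated tokens
duration_s = 115.13       # Benchmark duration (s), rounded in the report

print(f"input  tok/s: {total_input / duration_s:.2f}")   # close to the reported 9107.59
print(f"output tok/s: {total_output / duration_s:.2f}")  # close to the reported 142.31
print(f"total  tok/s: {(total_input + total_output) / duration_s:.2f}")
```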

Throughput turns out to be roughly the same as GLM-FP8 hosted with vLLM (16 concurrency, 64k/1k in/out).

Is this a normal result?

With speculative decoding we can achieve:

============ Serving Benchmark Result ============
Backend:                                 sglang
Traffic request rate:                    2.0
Max reqeuest concurrency:                16
Successful requests:                     16
Benchmark duration (s):                  96.43
Total input tokens:                      1048576
Total generated tokens:                  16384
Total generated tokens (retokenized):    16384
Request throughput (req/s):              0.17
Input token throughput (tok/s):          10873.56
Output token throughput (tok/s):         169.90
Total token throughput (tok/s):          11043.46
Concurrency:                             15.67
----------------End-to-End Latency----------------
Mean E2E Latency (ms):                   94425.83
Median E2E Latency (ms):                 94824.34
---------------Time to First Token----------------
Mean TTFT (ms):                          42753.15
Median TTFT (ms):                        43100.63
P99 TTFT (ms):                           78462.45
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          50.51
Median TPOT (ms):                        50.56
P99 TPOT (ms):                           87.76
---------------Inter-token Latency----------------
Mean ITL (ms):                           201.26
Median ITL (ms):                         49.49
P99 ITL (ms):                            55.94
==================================================
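Comparing the two result blocks, the gain from speculative decoding is concentrated in decode; TTFT is essentially unchanged. A quick comparison of the reported numbers:

```python
# Compare the two benchmark runs above (non-speculative vs. speculative).
base = {"duration_s": 115.13, "mean_tpot_ms": 70.00, "out_tok_s": 142.31}
spec = {"duration_s": 96.43,  "mean_tpot_ms": 50.51, "out_tok_s": 169.90}

tpot_speedup = base["mean_tpot_ms"] / spec["mean_tpot_ms"]  # ~1.39x faster per output token
e2e_speedup = base["duration_s"] / spec["duration_s"]       # ~1.19x overall, since TTFT dominates
print(f"TPOT speedup: {tpot_speedup:.2f}x, end-to-end: {e2e_speedup:.2f}x")
```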

But it does not seem stable: it has never survived 8 hours, crashing with this error:


[WARNING] batch_size=11711 exceeds shared mem limit, falling back to low-smem kernel
(the warning above was repeated 16 times)
[2026-03-24 07:17:20 TP0] Prefill batch, #new-seq: 1, #new-token: 11712, #cached-token: 0, token usage: 0.04, #running-req: 1, #queue-req: 0, input throughput (token/s): 15732.26, cuda graph: False
[2026-03-24 07:17:21 TP0] Decode batch, #running-req: 1, #token: 47104, token usage: 0.02, accept len: 3.44, accept rate: 0.86, cuda graph: True, gen throughput (token/s): 53.02, #queue-req: 0
[2026-03-24 07:17:22 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3171, in run_scheduler_process
    scheduler.event_loop_normal()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1127, in event_loop_normal
    self.self_check_during_idle()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_runtime_checker_mixin.py", line 332, in self_check_during_idle
    self.check_memory()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler_runtime_checker_mixin.py", line 244, in check_memory
    raise_error_or_warn(
  File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 4082, in raise_error_or_warn
    raise ValueError(message)
ValueError: token_to_kv_pool_allocator memory leak detected! self.max_total_num_tokens=2954112, available_size=18048, evictable_size=2935808, protected_size=0


(The identical traceback and memory-leak ValueError were raised on TP1–TP7.)

[2026-03-24 07:17:22] SIGQUIT received. signum=None, frame=None. It usually means one child failed.


[2026-03-24 07:17:23] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-03-24 07:17:23] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-03-24 07:17:23] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-03-24 07:17:23] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-03-24 07:17:24] SIGQUIT received. signum=None, frame=None. It usually means one child failed.
[2026-03-24 07:17:24] ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "uvloop/loop.pyx", line 1512, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1505, in uvloop.loop.Loop.run_until_complete
  File "uvloop/loop.pyx", line 1379, in uvloop.loop.Loop.run_forever
  File "uvloop/loop.pyx", line 557, in uvloop.loop.Loop._run
  File "uvloop/loop.pyx", line 476, in uvloop.loop.Loop._on_idle
  File "uvloop/cbhandles.pyx", line 83, in uvloop.loop.Handle._run
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 2511, in running_phase_sigquit_handler
    kill_process_tree(os.getpid())
  File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 1153, in kill_process_tree
    sys.exit(0)
SystemExit: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/protocols/http/h11_impl.py", line 410, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/middleware/proxy_headers.py", line 60, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/fastapi/applications.py", line 1134, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/applications.py", line 107, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/cors.py", line 87, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/middleware/exceptions.py", line 63, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.12/dist-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 716, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 736, in app
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 290, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 119, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/_exception_handler.py", line 42, in wrapped_app
    await app(scope, receive, sender)
  File "/usr/local/lib/python3.12/dist-packages/fastapi/routing.py", line 106, in app
    await response(scope, receive, send)
  File "/usr/local/lib/python3.12/dist-packages/starlette/responses.py", line 280, in __call__
    await self.background()
  File "/usr/local/lib/python3.12/dist-packages/starlette/background.py", line 36, in __call__
    await task()
  File "/usr/local/lib/python3.12/dist-packages/starlette/background.py", line 21, in __call__
    await self.func(*self.args, **self.kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tokenizer_manager.py", line 1431, in abort_request
    await asyncio.sleep(2)
  File "/usr/lib/python3.12/asyncio/tasks.py", line 665, in sleep
    return await future
           ^^^^^^^^^^^^
asyncio.exceptions.CancelledError
[2026-03-24 07:17:24] ERROR:    (same SystemExit traceback as above)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/starlette/routing.py", line 701, in lifespan
    await receive()
  File "/usr/local/lib/python3.12/dist-packages/uvicorn/lifespan/on.py", line 137, in receive
    return await self.receive_queue.get()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/asyncio/queues.py", line 158, in get
    await getter
asyncio.exceptions.CancelledError

[W324 07:17:41.738975876 AllocatorConfig.cpp:28] Warning: PYTORCH_CUDA_ALLOC_CONF is deprecated, use PYTORCH_ALLOC_CONF instead (function operator())
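Judging from the error message, the scheduler's leak check is simple accounting: at idle, every KV-cache token should be available, evictable, or protected (this interpretation is an assumption based on the message, not on the SGLang source). Plugging in the numbers from the traceback:

```python
# KV-pool accounting from the ValueError above: tokens that are neither
# available, evictable, nor protected are treated as leaked.
max_total_num_tokens = 2_954_112
available_size = 18_048
evictable_size = 2_935_808
protected_size = 0

leaked = max_total_num_tokens - (available_size + evictable_size + protected_size)
print(leaked)  # 256 tokens unaccounted for, which trips the check
```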
