error: Offset increment outside graph capture :(

#1
by gabbo1995 - opened

When I try it, an error appears in red and says:
Generation failed: Offset increment outside graph capture encountered unexpectedly.

It could be due to the random seed in the generator, but I'm not sure.

Now it says:

Generation failed: CUDA error: device-side assert triggered. Search for `cudaErrorAssert` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

:(

HuggingFaceM4 org

live debugging 😅

Sorry for the insistence, I just got carried away by curiosity. <3 @andito

HuggingFaceM4 org

No worries. It took me some time, but I figured out the issue: lots of people were trying to use the Space at the same time. I had set up a queue for inference, but because there were five possible models and I couldn't load them all into the GPU together, I had to hot-swap them, and that hot-swapping caused issues. I decided to solve it by disabling four of the models in this Space and keeping only the voice cloning demo with the large model, which is, to me, the coolest. I'm also providing 3 default voices, which are amazing.
If you want to clone the Space, you can unset the env variable for model selection to get all five models back and experiment. But with the traffic I'm seeing here, it's hard to solve it differently. Spaces weren't really designed to host many models and hot-swap them. And with ZeroGPU I would need to build the graph before each inference, so no one would get the low-latency inference, which is the whole point here.
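The hot-swapping described above can be sketched roughly like this. The single-slot design and the loader callables are assumptions for illustration, not the Space's actual code:

```python
import threading

class ModelSlot:
    """Keep at most one model resident at a time and swap it under a lock.

    A rough sketch of the hot-swapping idea: with five models that don't fit
    on the GPU together, only the most recently requested one stays loaded.
    """

    def __init__(self, loaders):
        self._loaders = loaders        # model name -> zero-arg loader callable
        self._lock = threading.Lock()  # serialize swaps across requests
        self._name = None
        self._model = None

    def get(self, name):
        with self._lock:
            if name != self._name:
                self._model = None     # drop the old model first to free memory
                self._model = self._loaders[name]()
                self._name = name
            return self._model
```

The lock matters because two concurrent requests for different models would otherwise race on the single GPU slot, which is one way a swap can leave things in a broken state.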

HuggingFaceM4 org

Still, I'm leaving the thread open in case there are other issues. I also added a queue for requests; maybe I'll manage to get an OOM if enough people queue up voices to clone xD

Generation failed: CUDA error: device-side assert triggered. Search for `cudaErrorAssert` in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information. CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

Still happening for me!

HuggingFaceM4 org

Yes, this time it was because someone submitted a super long text. The pain is that once it breaks, I need to restart the Space to recover. I added a guard against long texts.
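A guard like that can be as simple as a length check before generation reaches the model. The 500-character limit below is an arbitrary placeholder, not the Space's actual cutoff:

```python
MAX_CHARS = 500  # placeholder limit; the real cutoff depends on what the model tolerates

def guard_text(text: str) -> str:
    """Reject inputs long enough to crash generation instead of letting them through."""
    text = text.strip()
    if not text:
        raise ValueError("Empty input.")
    if len(text) > MAX_CHARS:
        raise ValueError(f"Input is {len(text)} characters; the limit is {MAX_CHARS}.")
    return text
```

Raising before inference is what keeps a single bad request from triggering the device-side assert that forces a Space restart.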

andito changed discussion status to closed

@andito hey, what causes this offset issue, if I may ask? I've been trying to implement parallel generation with another TTS that uses CUDA graphs, and it shows the same error, so how do we solve it? (Even opening multiple instances doesn't work. GPU: GH200)
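For context on the question above: in PyTorch, that assert tends to fire when a CUDA RNG generator that was registered to a captured graph has its offset advanced while no capture or replay is in flight, which is easy to hit when concurrent requests share one captured graph. This reading is an inference from the error text, not something confirmed in this thread. One common mitigation is serializing replays behind a lock; here is a CPU-only sketch of that idea, with a dummy `replay_fn` standing in for `CUDAGraph.replay()`:

```python
import threading

class SerializedGraphRunner:
    """Serialize access to a captured-graph pipeline so shared RNG state
    is never touched by two requests at once.

    Sketch only: replay_fn stands in for CUDAGraph.replay(); real code would
    also copy each request's inputs into the graph's static tensors first.
    """

    def __init__(self, replay_fn):
        self._replay_fn = replay_fn
        self._lock = threading.Lock()

    def generate(self, request):
        with self._lock:  # one request inside the graph at a time
            return self._replay_fn(request)
```

For true parallelism you would instead capture one graph (with its own generator and static buffers) per worker, so no RNG state is shared across concurrent requests.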
