Model initialization takes a long time after files are downloaded — is this expected?

#1
by shadowT - opened

Hi,
thank you for sharing this model.

I’m using the model with PyTorch 2.8 and loading it via AutoModel.from_pretrained.
The model files finish downloading successfully, but the initialization step (before from_pretrained returns) takes a long time and appears to hang, with no logs or progress output.

From what I can see, this happens after the download phase, likely during model initialization.

Could you please confirm:

Is this long delay during the first load expected?

Just want to make sure this behavior is normal and not a misconfiguration on my side.
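For anyone hitting the same quiet phase: one way to see what is happening during it is to raise the logging verbosity before loading. This is a minimal sketch, assuming the model is loaded through the transformers library; it degrades to a message if transformers is absent:

```python
import logging

# Raise Python logging so library initialisation steps become visible.
logging.basicConfig(level=logging.INFO)

try:
    # transformers ships its own logging helpers; INFO level prints
    # progress during from_pretrained instead of staying silent.
    from transformers import logging as hf_logging
    hf_logging.set_verbosity_info()
    print("transformers verbosity set to INFO")
except ImportError:
    print("transformers is not installed in this environment")
```

With verbosity at INFO, the load step usually reports checkpoint shard loading and weight initialisation, which helps distinguish a slow-but-working load from a genuine hang.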

Thanks in advance

Tomoro AI Ltd org

Hi @shadowT , we didn’t experience this issue, and it’s unexpected. Could you make sure torch and torchvision are properly installed with CUDA support? What GPU are you using?
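A quick way to confirm the CUDA build is visible to PyTorch is the check below (a minimal sketch; it assumes torch is installed and just prints a message otherwise):

```python
# Report whether this PyTorch build was compiled with CUDA and can see a device.
try:
    import torch
    print("torch", torch.__version__,
          "| built with CUDA:", torch.version.cuda,
          "| device available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If `torch.version.cuda` prints `None`, a CPU-only wheel is installed and model initialisation will fall back to the CPU, which can be dramatically slower.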

Thank you for your prompt reply @hxssgaa .

Half an hour after the file download and installation completed, I received the following warnings:

NOTE: Redirects are currently not supported in Windows or MacOs.
WARNING: AutoScheme is currently supported only on Linux.
WARNING: Better backend found, please install all the following requirements to enable it:
pip install -v "gptqmodel>=2.0" --no-build-isolation
pip install 'numpy<2.0'

I have another model that requires NumPy >= 2.0, so I cannot downgrade.
When I try to install gptqmodel, it says it is not supported on Windows.
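To confirm which side of the `numpy<2.0` constraint an environment is on, the installed version can be read with the standard library alone (a small sketch; `importlib.metadata` is stdlib, so nothing extra is needed):

```python
from importlib import metadata

try:
    version = metadata.version("numpy")
    # The warning above asks for numpy<2.0; a 2.x major here confirms the clash.
    major = int(version.split(".")[0])
    verdict = "conflicts with" if major >= 2 else "satisfies"
    print(f"numpy {version} {verdict} the numpy<2.0 constraint")
except metadata.PackageNotFoundError:
    print("numpy is not installed")
```

One common workaround for mutually exclusive version constraints is to keep each model in its own virtual environment rather than forcing both into one.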

My setup:

GPU: NVIDIA GeForce RTX 3060 12GB

CUDA version: 13.0

Is there an alternative to gptqmodel that works on Windows with these constraints?

I will still try to run the model and see if it works.

Tomoro AI Ltd org

Hi @shadowT , I would highly recommend using WSL2 if possible on a Windows machine, as that is the setup we have tested. We haven’t tested running directly on Windows, and we aren’t sure about the dependencies at the moment.

Hi @hxssgaa , I'll try WSL2 later.

For now, it seems to be working fine on Windows:

# Reuse a single model instance instead of re-initialising it per call
model = EmbeddingM()
q = model.encode_queries(queries)
d = model.encode_docs(doc_urls)

s = model.get_score(q, d)
print(s)

encode_queries executed in 1.438 seconds
encode_docs executed in 1.407 seconds
tensor([[8.5625, 8.5625], [7.9688, 7.9688]])

encode_queries executed in 0.771 seconds
encode_docs executed in 0.823 seconds
tensor([[8.5625, 8.5625], [7.9688, 7.9688]])

So far, the encoding seems stable, though Windows may still have limitations for some dependencies.
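The "executed in N seconds" lines above look like simple wall-clock timing; a self-contained way to produce them is a small decorator (a sketch with a stand-in function, since the `EmbeddingM` wrapper isn't shown here):

```python
import functools
import time

def timed(fn):
    """Print the wall-clock duration of each call, like the logs above."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__} executed in {elapsed:.3f} seconds")
        return result
    return wrapper

@timed
def encode_queries(queries):
    # Stand-in for the real encoder; just echoes its input.
    return list(queries)

encode_queries(["query one", "query two"])
```

The faster second run in the logs is consistent with warm-up effects (CUDA kernel compilation and caches) rather than anything Windows-specific.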

Thanks for your help.

shadowT changed discussion status to closed
