Model initialization takes a long time after files are downloaded — is this expected?

#1
by shadowT - opened

Hi,
thank you for sharing this model.

I’m using the model with PyTorch 2.8 and loading it via AutoModel.from_pretrained.
The model files finish downloading successfully, but the initialization step (before from_pretrained returns) takes a long time and appears to hang, with no logs or progress output.

From what I can see, this happens after the download phase, likely during model initialization.

Could you please confirm:

Is this long delay during the first load expected?

Just want to make sure this behavior is normal and not a misconfiguration on my side.
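For anyone hitting the same quiet phase: one way to see what is happening during it is to raise the logging verbosity before loading. This is a minimal sketch, assuming the model is loaded through the transformers library; it degrades to a message if transformers is absent:

```python
import logging

# Raise Python logging so library initialisation steps become visible.
logging.basicConfig(level=logging.INFO)

try:
    # transformers ships its own logging helpers; INFO level prints
    # progress during from_pretrained instead of staying silent.
    from transformers import logging as hf_logging
    hf_logging.set_verbosity_info()
    print("transformers verbosity set to INFO")
except ImportError:
    print("transformers is not installed in this environment")
```

With verbosity at INFO, the load step usually reports checkpoint shard loading and weight initialisation, which helps distinguish a slow-but-working load from a genuine hang.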

Thanks in advance

Tomoro AI Ltd org

Hi @shadowT , we didn’t experience this issue, and it’s unexpected. Could you make sure torch and torchvision are properly installed with CUDA support? What GPU are you using?
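A quick way to confirm the CUDA build is visible to PyTorch is the check below (a minimal sketch; it assumes torch is installed and just prints a message otherwise):

```python
# Report whether this PyTorch build was compiled with CUDA and can see a device.
try:
    import torch
    print("torch", torch.__version__,
          "| built with CUDA:", torch.version.cuda,
          "| device available:", torch.cuda.is_available())
except ImportError:
    print("torch is not installed in this environment")
```

If `torch.version.cuda` prints `None`, a CPU-only wheel is installed and model initialisation will fall back to the CPU, which can be dramatically slower.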

Thank you for your prompt reply @hxssgaa .

Half an hour after the file download and installation completed, I received the following warnings:

NOTE: Redirects are currently not supported in Windows or MacOs.
WARNING: AutoScheme is currently supported only on Linux.
WARNING: Better backend found, please install all the following requirements to enable it:
pip install -v "gptqmodel>=2.0" --no-build-isolation
pip install 'numpy<2.0'

I have another model that requires NumPy >= 2.0, so I cannot downgrade.
When I try to install gptqmodel, it says it is not supported on Windows.
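To confirm which side of the `numpy<2.0` constraint an environment is on, the installed version can be read with the standard library alone (a small sketch; `importlib.metadata` is stdlib, so nothing extra is needed):

```python
from importlib import metadata

try:
    version = metadata.version("numpy")
    # The warning above asks for numpy<2.0; a 2.x major here confirms the clash.
    major = int(version.split(".")[0])
    verdict = "conflicts with" if major >= 2 else "satisfies"
    print(f"numpy {version} {verdict} the numpy<2.0 constraint")
except metadata.PackageNotFoundError:
    print("numpy is not installed")
```

One common workaround for mutually exclusive version constraints is to keep each model in its own virtual environment rather than forcing both into one.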

My setup:

GPU: NVIDIA GeForce RTX 3060 12GB

CUDA version: 13.0

Is there an alternative to gptqmodel that works on Windows with these constraints?

I will still try to run the model and see if it works.

Tomoro AI Ltd org

Hi @shadowT , I would highly recommend using WSL2 if possible on a Windows machine, as that is the setup we have tested. We haven’t tested running directly on Windows, and we aren’t sure about the dependencies at the moment.

Hi @hxssgaa , I'll try WSL2 later.

For now, it seems to be working fine on Windows:

# Reuse a single model instance instead of re-initialising it per call
model = EmbeddingM()
q = model.encode_queries(queries)
d = model.encode_docs(doc_urls)

s = model.get_score(q, d)
print(s)

encode_queries executed in 1.438 seconds
encode_docs executed in 1.407 seconds
tensor([[8.5625, 8.5625], [7.9688, 7.9688]])

encode_queries executed in 0.771 seconds
encode_docs executed in 0.823 seconds
tensor([[8.5625, 8.5625], [7.9688, 7.9688]])

So far, the encoding seems stable, though Windows may still have limitations for some dependencies.
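The "executed in N seconds" lines above look like simple wall-clock timing; a self-contained way to produce them is a small decorator (a sketch with a stand-in function, since the `EmbeddingM` wrapper isn't shown here):

```python
import functools
import time

def timed(fn):
    """Print the wall-clock duration of each call, like the logs above."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__} executed in {elapsed:.3f} seconds")
        return result
    return wrapper

@timed
def encode_queries(queries):
    # Stand-in for the real encoder; just echoes its input.
    return list(queries)

encode_queries(["query one", "query two"])
```

The faster second run in the logs is consistent with warm-up effects (CUDA kernel compilation and caches) rather than anything Windows-specific.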

Thanks for your help.

shadowT changed discussion status to closed
