Model initialization takes a long time after files are downloaded — is this expected?
Hi,
Thank you for sharing this model.
I’m using the model with PyTorch 2.8 and loading it via AutoModel.from_pretrained.
The model files finish downloading successfully, but the initialization step (before control returns from from_pretrained) takes a long time and appears to be “stuck” with no logs or progress.
From what I can see, this happens after the download phase, likely during model initialization.
Could you please confirm:
Is this long delay during the first load expected?
Just want to make sure this behavior is normal and not a misconfiguration on my side.
Thanks in advance
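(In case it helps others debugging the same thing: one way to see where the time goes is to enable transformers' info-level logging with `transformers.logging.set_verbosity_info()` and wrap the load in a timer. A minimal stdlib sketch of the timer; the model id in the comment is a placeholder, not a real repo:)

```python
import time

def timed_call(label, fn, *args, **kwargs):
    # Wrap any call with a wall-clock timer so a long initialization
    # at least reports how long it took instead of looking "stuck".
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label} took {time.perf_counter() - start:.1f} s")
    return result

# usage sketch (placeholder model id):
# from transformers import AutoModel
# model = timed_call("from_pretrained", AutoModel.from_pretrained, "model-id")
print(timed_call("demo: sum", sum, [1, 2, 3]))
```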
Thank you for your prompt reply @hxssgaa .
Half an hour after the file download and installation completed, I received the following warnings:
```
NOTE: Redirects are currently not supported in Windows or MacOs.
WARNING: AutoScheme is currently supported only on Linux.
WARNING: Better backend found, please install all the following requirements to enable it:
pip install -v "gptqmodel>=2.0" --no-build-isolation
pip install 'numpy<2.0'
```
I have another model that requires NumPy >= 2.0, so I cannot downgrade.
When I try to install gptqmodel, it says it is not supported on Windows.
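(Before deciding anything about the `numpy<2.0` pin, you can check from Python which versions are already installed; a small stdlib sketch:)

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string for pkg, or None if it is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

# e.g. check whether the 'numpy<2.0' pin would clash with what you have:
print(installed_version("numpy"))
```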
My setup:
- GPU: NVIDIA GeForce RTX 3060 12GB
- CUDA version: 13.0
Is there an alternative to gptqmodel that works on Windows with these constraints?
I will still try to run the model and see if it works.
Hi @hxssgaa , I'll try WSL2 later.
For now, it seems to be working fine on Windows:
```python
q = EmbeddingM().encode_queries(queries)
d = EmbeddingM().encode_docs(doc_urls)
s = EmbeddingM().get_score(q, d)
print(s)
```
```
encode_queries executed in 1.438 seconds
encode_docs executed in 1.407 seconds
tensor([[8.5625, 8.5625], [7.9688, 7.9688]])

encode_queries executed in 0.771 seconds
encode_docs executed in 0.823 seconds
tensor([[8.5625, 8.5625], [7.9688, 7.9688]])
```
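(One small note on the snippet above: each call constructs a fresh `EmbeddingM()`, so any per-instance setup runs three times; constructing once and reusing the instance avoids that. A hedged sketch with a stand-in class that only mimics the method names from the snippet, not the real model:)

```python
class EmbeddingM:
    # Stand-in with the same method names as in the snippet above;
    # the real class presumably does its heavy setup in __init__.
    def encode_queries(self, queries):
        return [q.lower() for q in queries]

    def encode_docs(self, docs):
        return [d.lower() for d in docs]

    def get_score(self, q, d):
        # toy similarity for illustration only
        return [[len(x) + len(y) for y in d] for x in q]

model = EmbeddingM()          # construct once, reuse for every call
q = model.encode_queries(["Hello"])
d = model.encode_docs(["World", "Docs"])
print(model.get_score(q, d))
```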
So far, the encoding seems stable, though Windows may still have limitations for some dependencies.
Thanks for your help.