Works well on RTX 3080 Ti (12 GB VRAM)
I have just tested the model on my RTX 3080 Ti with 12 GB VRAM and it works well. It is really impressive compared to the usual voice assistants, which are not full duplex and don't give the same natural-language feeling.
The only thing that did not work was loading the pre-quantized weights (model_bnb_4bit.pt).
I even tried downloading the quantization file via "huggingface-cli download", without success.
Appreciate the feedback, I'll look into it and update the repo with a fix soon.
I tried to use the .pt file but got this error:
RuntimeError: Error(s) in loading state_dict for LMModel:
size mismatch for transformer.layers.0.self_attn.in_proj.weight: copying a param with shape torch.Size([25165824, 1]) from checkpoint, the shape in current model is torch.Size([12288, 4096]).
size mismatch for transformer.layers.0.self_attn.out_proj.weight: copying a param with shape torch.Size([8388608, 1]) from checkpoint, the shape in current model is torch.Size([4096, 4096]).
size mismatch for transformer.layers.0.gating.linear_in.weight: copying a param with shape torch.Size([46137344, 1]) from checkpoint, the shape in current model is torch.Size([22528, 4096]).
size mismatch for transformer.layers.0.gating.linear_out.weight: copying a param with shape torch.Size([23068672, 1]) from checkpoint, the shape in current model is torch.Size([4096, 11264]).
[... the same four size mismatches repeat for transformer.layers.1 through transformer.layers.31 ...]
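For context on why the shapes in the log look this way: bitsandbytes packs two 4-bit values per byte and stores each quantized weight as a flat uint8 column vector, so every packed tensor in the checkpoint has exactly half as many entries as the bf16 weight has elements. A quick sanity check of the shapes from the error log (pure arithmetic, independent of the model code):

```python
# Each bf16 weight of shape (out, in) is packed by bitsandbytes 4-bit
# quantization into a uint8 tensor of shape (out * in // 2, 1):
# two 4-bit values per byte.
shapes = {
    "self_attn.in_proj":  ((12288, 4096), 25165824),
    "self_attn.out_proj": ((4096, 4096),  8388608),
    "gating.linear_in":   ((22528, 4096), 46137344),
    "gating.linear_out":  ((4096, 11264), 23068672),
}

for name, ((rows, cols), packed) in shapes.items():
    # packed byte count is exactly half the bf16 element count
    assert rows * cols // 2 == packed, name
    print(f"{name}: {rows}x{cols} = {rows * cols} elements -> {packed} packed bytes")
```

Every mismatch in the log fits this pattern, which is how you can tell the checkpoint is a packed 4-bit export rather than a corrupted download.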
Hey @gozzima and @moeinsadeghi, this should be fixed now.
The issue was that the loader didn't have a code path for pre-quantized bitsandbytes checkpoints. When you passed the .pt file, it tried to load the packed 4-bit weights (shape [25165824, 1]) directly into a model expecting bf16 shapes ([12288, 4096]), hence the size-mismatch errors.
The moshi/ source in this repo has been updated with a fix. The loader now auto-detects pre-quantized checkpoints and reconstructs the 4-bit weights properly without re-quantizing.
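This kind of auto-detection can be done by inspecting the tensor layout in the state dict before calling load_state_dict. A minimal sketch of the idea; the probe key and the exact heuristic here are illustrative assumptions, not the actual repo code:

```python
import torch

def is_prequantized(state_dict):
    """Heuristic check for a bitsandbytes 4-bit checkpoint.

    Assumption (illustrative, not the repo's real logic): packed 4-bit
    weights are stored as uint8 column vectors of shape [n_elements // 2, 1],
    while an unquantized checkpoint stores full 2-D float tensors.
    """
    # Probe one known weight; any layer would do.
    w = state_dict.get("transformer.layers.0.self_attn.in_proj.weight")
    if w is None:
        return False
    return w.dtype == torch.uint8 and w.dim() == 2 and w.shape[1] == 1
```

A loader using a check like this can branch between "reconstruct the 4-bit layers from the packed data" and "load bf16 then quantize", which is why no extra format flag is needed in the checkpoint itself.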
To use pre-quantized weights:
Re-clone or pull the latest moshi/ source from this repo:
git clone https://huggingface.co/brianmatzelle/personaplex-7b-v1-bnb-4bit
cd personaplex-7b-v1-bnb-4bit
pip install moshi/.
Run with the pre-quantized .pt file:
python -m moshi.offline \
--moshi-weight model_bnb_4bit.pt \
--quantize-4bit \
--voice-prompt "NATF2.pt" \
--input-wav "assets/test/input_assistant.wav" \
--output-wav "output.wav" \
--output-text "output.json"
Or for the live server:
SSL_DIR=$(mktemp -d)
python -m moshi.server --ssl "$SSL_DIR" --quantize-4bit --moshi-weight model_bnb_4bit.pt
The key is passing both --moshi-weight model_bnb_4bit.pt and --quantize-4bit together. The loader detects the bitsandbytes metadata in the checkpoint and skips re-quantization automatically.
Let me know if you run into any other issues!