This repository provides Qwen3 4B compiled and optimized for Mobilint NPU hardware, packaged for use with llama.cpp-mblt.
```bash
# Interactive chat
llama-cli-mblt -hf mobilint/Qwen3-4B-GGUF

# Single prompt
llama-simple-mblt -hf mobilint/Qwen3-4B-GGUF "Hello world"
```
| File | Size | Description |
|---|---|---|
| `qwen3-4b-vocab.gguf` | 5.7 MB | Tokenizer (vocab-only GGUF) |
| `target_emb.bin` | 1.5 GB | Body embedding weights (float32) |
| `Qwen3-4B-W4V8.mxq` | 2.1 GB | Body model for NPU (W4V8 quantized) |
| `config.json` | - | Model configuration |
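The 1.5 GB size of `target_emb.bin` is consistent with a dense float32 embedding table for Qwen3-4B. The sketch below checks that arithmetic and shows a row lookup. It is a hypothetical illustration only: it assumes the upstream Qwen3-4B dimensions (vocabulary size 151,936, hidden size 2,560) and a headerless, row-major float32 layout. The actual on-disk format read by llama.cpp-mblt is not documented here, and `expected_embedding_bytes`, `load_embeddings` and `embed` are illustrative helpers, not part of any Mobilint API.

```python
import numpy as np

# ASSUMPTION: dimensions taken from the upstream Qwen3-4B config, not from this repo.
VOCAB_SIZE = 151_936
HIDDEN_SIZE = 2_560
BYTES_PER_FLOAT32 = 4


def expected_embedding_bytes(vocab_size: int = VOCAB_SIZE,
                             hidden_size: int = HIDDEN_SIZE) -> int:
    """Byte size of a dense float32 [vocab_size, hidden_size] table."""
    return vocab_size * hidden_size * BYTES_PER_FLOAT32


def load_embeddings(path: str,
                    vocab_size: int = VOCAB_SIZE,
                    hidden_size: int = HIDDEN_SIZE) -> np.memmap:
    """Memory-map the table without reading it all into RAM.

    ASSUMPTION (hypothetical layout): no header, row-major, one row per token id.
    """
    return np.memmap(path, dtype=np.float32, mode="r",
                     shape=(vocab_size, hidden_size))


def embed(table: np.ndarray, token_ids: list[int]) -> np.ndarray:
    """Return one hidden_size-wide row per token id, in input order."""
    return np.asarray(table[np.asarray(token_ids, dtype=np.int64)])


if __name__ == "__main__":
    n = expected_embedding_bytes()
    print(n, f"{n / 1e9:.2f} GB")
```

Run as a script, it prints `1555824640 1.56 GB`, which matches the listed file size under the stated assumptions. Memory-mapping keeps only the rows actually looked up resident, which matters for a 1.5 GB table on an edge device.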
This model is compiled and optimized for Mobilint NPU hardware. It is intended to be used with llama-cli-mblt from llama.cpp-mblt.