Qwen3 4B - GGUF + MXQ for llama-cli-mblt

This repository provides Qwen3 4B compiled and optimized for Mobilint NPU hardware, packaged for use with llama.cpp-mblt.

Quick Start

# Interactive chat
llama-cli-mblt -hf mobilint/Qwen3-4B-GGUF

# Single prompt
llama-simple-mblt -hf mobilint/Qwen3-4B-GGUF "Hello world"
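
The -hf flag fetches the model files from the Hugging Face Hub. If you prefer to download everything ahead of time, a standard Hub download also works; this is a minimal sketch assuming the huggingface_hub CLI is installed, with an arbitrary local directory name (see the llama.cpp-mblt documentation for pointing the tools at a local copy):

# Optional: pre-download all repository files (local directory name is an assumption)
huggingface-cli download mobilint/Qwen3-4B-GGUF --local-dir ./Qwen3-4B-GGUF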

Files

File                  Size    Description
qwen3-4b-vocab.gguf   5.7 MB  Tokenizer (vocab-only GGUF)
target_emb.bin        1.5 GB  Body embedding weights (float32)
Qwen3-4B-W4V8.mxq     2.1 GB  Body model for NPU (W4V8 quantized)
config.json           -       Model configuration
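
After a manual download, a quick listing confirms that all four files arrived with roughly the sizes above (the directory name follows the sketch in Quick Start and is an assumption):

# Sanity-check the downloaded files against the sizes listed above
ls -lh ./Qwen3-4B-GGUF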

About

This model is a W4V8-quantized build of Qwen/Qwen3-4B, compiled and optimized for Mobilint NPU hardware. It is intended to be used with llama-cli-mblt from llama.cpp-mblt.
