Qwen3 4B - GGUF + MXQ for llama-cli-mblt

This repository provides Qwen3 4B compiled and optimized for Mobilint NPU hardware, packaged for use with llama.cpp-mblt.

Quick Start

# Interactive chat
llama-cli-mblt -hf mobilint/Qwen3-4B-GGUF

# Single prompt
llama-simple-mblt -hf mobilint/Qwen3-4B-GGUF "Hello world"
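
The -hf flag fetches the model files from the Hugging Face Hub. If you prefer to download everything ahead of time, a standard Hub download also works; this is a minimal sketch assuming the huggingface_hub CLI is installed, with an arbitrary local directory name (see the llama.cpp-mblt documentation for pointing the tools at a local copy):

# Optional: pre-download all repository files (local directory name is an assumption)
huggingface-cli download mobilint/Qwen3-4B-GGUF --local-dir ./Qwen3-4B-GGUF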

Files

File                  Size    Description
qwen3-4b-vocab.gguf   5.7 MB  Tokenizer (vocab-only GGUF)
target_emb.bin        1.5 GB  Body embedding weights (float32)
Qwen3-4B-W4V8.mxq     2.1 GB  Body model for NPU (W4V8 quantized)
config.json           -       Model configuration
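
After a manual download, a quick listing confirms that all four files arrived with roughly the sizes above (the directory name follows the sketch in Quick Start and is an assumption):

# Sanity-check the downloaded files against the sizes listed above
ls -lh ./Qwen3-4B-GGUF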

About

This model is a W4V8-quantized build of Qwen/Qwen3-4B, compiled and optimized for Mobilint NPU hardware. It is intended to be used with llama-cli-mblt from llama.cpp-mblt.
