EXAONE 3.5 2.4B Instruct – GGUF + MXQ for llama-cli-mblt

This repository provides EXAONE 3.5 2.4B Instruct compiled and optimized for Mobilint NPU hardware, packaged for use with llama.cpp-mblt.

Branches

| Branch | Contents | Description |
|---|---|---|
| main | Body model only | Standard autoregressive decoding |
| eagle3 | Body + FC + Draft models | EAGLE3 speculative decoding (~2-4x faster) |
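Speculative decoding works by having a cheap draft model propose several tokens ahead, which the target model then verifies in a single pass, keeping the longest agreeing prefix. The toy greedy sketch below illustrates only that accept/verify loop; the `draft_next` and `target_next` functions are deterministic stand-ins, not the actual NPU models:

```python
# Toy illustration of draft-then-verify speculative decoding (greedy case).
# Real EAGLE3 drafts from hidden states; here both "models" are simple
# arithmetic stand-ins so the control flow is easy to follow.

def draft_next(tokens):
    # Cheap draft model: fast, but sometimes disagrees with the target.
    return (tokens[-1] * 3 + 1) % 7

def target_next(tokens):
    # Authoritative target model.
    if tokens[-1] % 2 == 0:
        return (tokens[-1] * 3 + 1) % 7
    return (tokens[-1] + 2) % 7

def speculative_step(tokens, k=4):
    # 1) Draft proposes k tokens autoregressively.
    proposal = list(tokens)
    for _ in range(k):
        proposal.append(draft_next(proposal))
    # 2) Target verifies each proposed position; keep the agreeing prefix,
    #    and on the first mismatch substitute the target's own token.
    accepted = list(tokens)
    for i in range(len(tokens), len(proposal)):
        expected = target_next(accepted)
        if proposal[i] == expected:
            accepted.append(proposal[i])
        else:
            accepted.append(expected)  # target's correction ends the step
            break
    return accepted

print(speculative_step([2]))  # [2, 0, 1, 3]
```

When the draft agrees often, each target pass commits several tokens instead of one, which is where the quoted ~2-4x speedup comes from.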

Quick Start

# Simple decoding
llama-cli-mblt -hf mobilint/EXAONE-3.5-2.4B-Instruct-GGUF -p "Hello!" -n 128

# EAGLE3 speculative decoding
llama-cli-mblt -hf mobilint/EXAONE-3.5-2.4B-Instruct-GGUF --eagle3 -p "Hello!" -n 128

# Interactive chat
llama-cli-mblt -hf mobilint/EXAONE-3.5-2.4B-Instruct-GGUF --eagle3

Files

main branch

| File | Size | Description |
|---|---|---|
| exaone-3.5-2.4b-instruct-vocab.gguf | 4.0 MB | Tokenizer (vocab-only GGUF) |
| target_emb.bin | 1.0 GB | Body embedding weights (float32) |
| EXAONE-3.5-2.4B-Instruct.mxq | 1.4 GB | Body model for NPU |
| config.json | – | Model configuration |
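The 1.0 GB figure for target_emb.bin is consistent with a float32 embedding matrix, assuming the vocabulary size (102,400) and hidden size (2,560) published in the upstream EXAONE 3.5 2.4B config (both values are assumptions taken from that config, not stated in this repository):

```python
# Back-of-envelope check on the embedding file size.
# vocab_size and hidden_size are assumed from the upstream
# EXAONE 3.5 2.4B config; float32 weights are 4 bytes each.
vocab_size = 102_400
hidden_size = 2_560
bytes_total = vocab_size * hidden_size * 4
print(f"{bytes_total / 1e9:.2f} GB")  # 1.05 GB, i.e. the listed ~1.0 GB
```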

eagle3 branch (adds)

| File | Size | Description |
|---|---|---|
| single_Fc_EXAONE-3.5-2.4B-Instruct.mxq | 19 MB | FC dimension converter model |
| Draft_EXAONE-3.5-2.4B-Instruct.mxq | 87 MB | EAGLE3 draft model |
| draft_emb.bin | 1.0 GB | Draft embedding weights |
| d2t.bin | 250 KB | Draft-to-target vocabulary mapping |
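d2t.bin translates token IDs from the draft model's (smaller) vocabulary into target-vocabulary IDs. Its exact on-disk layout is not documented here; the sketch below assumes a flat little-endian int32 array indexed by draft ID, which is one common layout for such tables (an assumption, not a confirmed spec):

```python
import struct

# Hypothetical layout for d2t.bin: a flat little-endian int32 array
# where entry i is the target-vocabulary ID for draft token i.
def load_d2t(raw: bytes):
    n = len(raw) // 4
    return struct.unpack(f"<{n}i", raw)

def draft_to_target(d2t, draft_ids):
    # Remap draft token IDs so the target model can verify them.
    return [d2t[i] for i in draft_ids]

# Demo with an in-memory stand-in for the file contents:
fake = struct.pack("<4i", 10, 20, 30, 40)
table = load_d2t(fake)
print(draft_to_target(table, [2, 0, 3]))  # [30, 10, 40]
```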

About

This model is compiled and optimized for Mobilint NPU hardware and is intended for use with the llama-cli-mblt binary from llama.cpp-mblt.
