# EXAONE 3.5 2.4B Instruct
This repository provides EXAONE 3.5 2.4B Instruct compiled and optimized for Mobilint NPU hardware, packaged for use with llama.cpp-mblt.
| Branch | Contents | Description |
|---|---|---|
| main | Body model only | Standard autoregressive decoding |
| eagle3 | Body + FC + Draft models | EAGLE3 speculative decoding (~2-4x faster) |
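Speculative decoding works by letting a small draft model propose several tokens ahead, which the large body (target) model then verifies; matching tokens are accepted cheaply, and generation resumes from the first mismatch. A toy Python sketch of that accept/reject loop (the `draft_propose` and `target_verify` callables are stand-ins for illustration, not the real EAGLE3 models, and verification here is sequential rather than batched):

```python
def speculative_decode(prompt, draft_propose, target_verify, n_tokens, k=4):
    """Conceptual speculative-decoding loop.

    draft_propose(seq, k) -> up to k candidate next tokens from the draft model.
    target_verify(seq)    -> the single next token the target model would emit.
    """
    seq = list(prompt)
    while len(seq) - len(prompt) < n_tokens:
        candidates = draft_propose(seq, k)
        for tok in candidates:
            expected = target_verify(seq)
            if tok == expected:
                seq.append(tok)          # draft guess accepted
            else:
                seq.append(expected)     # mismatch: keep the target's token
                break                    # and re-draft from this point
        else:
            seq.append(target_verify(seq))  # all k accepted: target adds one more
    return seq[len(prompt):][:n_tokens]
```

The speedup comes from the target model validating a whole draft burst per step instead of emitting one token at a time; output is identical to plain decoding, only faster when the draft model guesses well.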
```shell
# Simple decoding
llama-cli-mblt -hf mobilint/EXAONE-3.5-2.4B-Instruct-GGUF -p "Hello!" -n 128

# EAGLE3 speculative decoding
llama-cli-mblt -hf mobilint/EXAONE-3.5-2.4B-Instruct-GGUF --eagle3 -p "Hello!" -n 128

# Interactive chat
llama-cli-mblt -hf mobilint/EXAONE-3.5-2.4B-Instruct-GGUF --eagle3
```
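The same invocations can be scripted. A minimal Python wrapper, assuming `llama-cli-mblt` is on `PATH` (the flag set is exactly the one shown above; everything else is illustrative):

```python
import shutil
import subprocess

def build_cmd(prompt: str, n_predict: int = 128, eagle3: bool = False) -> list[str]:
    """Assemble the llama-cli-mblt command line shown in this card."""
    cmd = ["llama-cli-mblt",
           "-hf", "mobilint/EXAONE-3.5-2.4B-Instruct-GGUF",
           "-p", prompt,
           "-n", str(n_predict)]
    if eagle3:
        cmd.append("--eagle3")  # enable EAGLE3 speculative decoding
    return cmd

def run_exaone(prompt: str, **kwargs) -> str:
    """Run the command and return stdout; fails loudly if the binary is missing."""
    cmd = build_cmd(prompt, **kwargs)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("llama-cli-mblt not found on PATH")
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```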
Files on the `main` branch:

| File | Size | Description |
|---|---|---|
| exaone-3.5-2.4b-instruct-vocab.gguf | 4.0 MB | Tokenizer (vocab-only GGUF) |
| target_emb.bin | 1.0 GB | Body embedding weights (float32) |
| EXAONE-3.5-2.4B-Instruct.mxq | 1.4 GB | Body model for NPU |
| config.json | – | Model configuration |
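The ~1.0 GB of `target_emb.bin` is consistent with a float32 embedding table of `vocab_size × hidden_size`. A quick sanity check, where the vocabulary and hidden sizes are assumed values for EXAONE 3.5 2.4B (confirm against the repository's `config.json`):

```python
# Assumed EXAONE 3.5 2.4B dimensions; verify against config.json.
VOCAB_SIZE = 102_400
HIDDEN_SIZE = 2_560
BYTES_PER_FLOAT32 = 4

# Total bytes for one vocab_size x hidden_size float32 embedding table.
size_bytes = VOCAB_SIZE * HIDDEN_SIZE * BYTES_PER_FLOAT32
print(f"{size_bytes / 1e9:.2f} GB")  # ~1.05 GB, matching the listed 1.0 GB
```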
Additional files on the `eagle3` branch:

| File | Size | Description |
|---|---|---|
| single_Fc_EXAONE-3.5-2.4B-Instruct.mxq | 19 MB | FC dimension converter model |
| Draft_EXAONE-3.5-2.4B-Instruct.mxq | 87 MB | EAGLE3 draft model |
| draft_emb.bin | 1.0 GB | Draft embedding weights |
| d2t.bin | 250 KB | Draft-to-target vocabulary mapping |
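The draft model predicts over a reduced vocabulary, and `d2t.bin` maps those draft token ids back into the target vocabulary. The exact file layout is not documented here; a plausible sketch, assuming the common EAGLE-style convention of a flat int32 array of per-token offsets (`target_id = draft_id + d2t[draft_id]`):

```python
import numpy as np

def load_d2t(path: str) -> np.ndarray:
    # Assumed layout: a flat int32 array indexed by draft token id.
    return np.fromfile(path, dtype=np.int32)

def draft_to_target(draft_ids: np.ndarray, d2t: np.ndarray) -> np.ndarray:
    # Assumed EAGLE-style offset convention: target_id = draft_id + d2t[draft_id].
    return draft_ids + d2t[draft_ids]
```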
This model is compiled and optimized for Mobilint NPU hardware. It is intended to be used with llama-cli-mblt from llama.cpp-mblt.
Base model: LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct