EXAONE 4.0: Unified Large Language Models Integrating Non-reasoning and Reasoning Modes
Paper: arXiv:2507.11407
EXAONE-4.0 is a large language model series developed by LG AI Research, consisting of a mid-size 32B model optimized for high performance and a small 1.2B model designed for on-device environments.
We have quantized the weights of this model to INT4 (excluding embeddings) to optimize it for on-device deployment.
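To illustrate the idea behind weight-only INT4 quantization, here is a minimal NumPy sketch of per-channel symmetric quantization to the 4-bit range [-8, 7]. This is an assumption-laden illustration, not the exact W4A16 scheme used for this model; the function names are ours.

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    # Per-output-channel symmetric quantization (illustrative, not the
    # exact scheme used for this model). Each row gets one FP scale so
    # that its largest-magnitude weight maps to +/-7.
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_int4(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Activations stay in 16-bit float at runtime (the "A16" part);
    # weights are expanded back from INT4 codes on the fly.
    return q.astype(np.float32) * scale
```

The per-channel scale keeps the rounding error of each row bounded by half a quantization step, which is why embeddings (a single very wide, outlier-heavy matrix) are often left unquantized, as done here.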
Model Conversion Contributor: Juneyoung Park (OptAI Inc.)
For more details, please refer to the official EXAONE 4.0 Blog, GitHub, and Documentation.

Model Stats:
| Model | Chipset | Target Runtime | Precision | Primary Compute Unit | Context Length | Response Rate (TPS) | Time to First Token (sec) |
|---|---|---|---|---|---|---|---|
| EXAONE-4.0-1.2B | Snapdragon 8 Elite Mobile | QNN(2.42)-GENIE | W4A16 | NPU | 4096 | 55.6 | 0.04 - 0.9 |
If you want faster inference or conversion support for a wider variety of models, feel free to reach out anytime.

When running inference with this model, we recommend using non-reasoning mode. Please use the chat template below:
```
[|system|]
{SYSTEM_PROMPT}[|endofturn|]
[|user|]
{USER_PROMPT}[|endofturn|]
[|assistant|]
<think>
</think>
```
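The template above can be filled in with plain string formatting. A minimal sketch (the function name `build_prompt` is ours; in practice the tokenizer's built-in chat template would do this for you):

```python
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    # Non-reasoning mode: pre-fill an empty <think> block so the model
    # answers directly instead of generating a reasoning trace.
    return (
        f"[|system|]\n{system_prompt}[|endofturn|]\n"
        f"[|user|]\n{user_prompt}[|endofturn|]\n"
        "[|assistant|]\n<think>\n</think>\n"
    )

prompt = build_prompt("You are a helpful assistant.", "Hello!")
print(prompt)
```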
```
EXAONE-4.0-1.2B-OptAI/
├── LICENSE
├── README.md
├── .gitattributes
└── EXAONE4.0-1.2B-genie-w4a16-qualcomm_snapdragon_8_elite_OptAI.zip
```
```
EXAONE4.0-1.2B-genie-w4a16-qualcomm_snapdragon_8_elite_OptAI/
├── config.json
├── exaone4_part_1_of_5.bin
├── ...
├── exaone4_part_5_of_5.bin
├── genie_config.json
├── tokenizer.json
├── tokenizer_config.json
└── ...
```
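Before deployment, the downloaded archive must be unpacked so the runtime can find `genie_config.json` and the weight shards. A small helper sketch using Python's standard `zipfile` module (the function name `extract_bundle` and the destination convention are ours):

```python
import zipfile
from pathlib import Path

def extract_bundle(archive_path: str, dest: str = "") -> list[str]:
    """Unpack a model bundle zip and return the names of its files."""
    archive = Path(archive_path)
    dest = dest or archive.stem  # default: extract into a sibling directory
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()  # e.g. config.json, weight shards, tokenizer files
```

After extraction, the directory should match the layout shown above.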
Base model: LGAI-EXAONE/EXAONE-4.0-1.2B