LFM 2.5 1.2B Thinking (GGUF)

Description

This repository contains the GGUF quantized version of LiquidAI/LFM2.5-1.2B-Thinking, a 1.2 billion parameter "thinking" language model by Liquid AI.

The model uses the novel Lfm2ForCausalLM architecture, a hybrid design of 10 double-gated LIV convolution blocks and 6 grouped-query attention (GQA) blocks, departing from standard transformer-only designs. The architecture alternates between local convolution-based mixing and global attention, enabling efficient sequence processing with strong reasoning capabilities.
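To make the "double-gated convolution" idea concrete, here is a toy sketch of a gated causal short-convolution block in NumPy. This is illustrative only; the exact LFM2 block uses its own projections and gating details, and the weight names here are hypothetical.

```python
import numpy as np

def gated_short_conv(x, w_in, w_gate, w_out, kernel):
    """Toy double-gated causal short convolution over a sequence.

    Illustrative only: not the exact LFM2 formulation.
    x: (seq_len, d_model) activations; kernel: (k,) conv taps.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = (x @ w_in) * sigmoid(x @ w_gate)          # first (input) gate
    k = kernel.shape[0]
    padded = np.vstack([np.zeros((k - 1, h.shape[1])), h])  # causal padding
    conv = np.zeros_like(h)
    for i in range(k):                            # short depthwise-style conv
        conv += kernel[i] * padded[i : i + h.shape[0]]
    return (conv * sigmoid(x @ w_gate)) @ w_out   # second (output) gate
```

Because the convolution only looks a fixed number of steps back, each such block keeps a small constant-size state at inference time, unlike attention's growing KV cache.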

Model Details

| Property | Value |
|---|---|
| Architecture | Lfm2ForCausalLM |
| Parameter Count | 1.17B |
| Layers | 16 (10 conv blocks + 6 GQA blocks) |
| Hidden Size | 2048 |
| Intermediate (FFN) | 8192 |
| Attention Heads | 32 |
| KV Heads (GQA) | 8 (on attention layers) |
| Context Length | 32,768 tokens |
| Vocabulary Size | 65,536 |
| Languages | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| Quantization | Q8_0 (8-bit) |
| File Type | GGUF |
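A quick sanity check on the attention geometry from the table: 32 query heads over 8 KV heads means 4 query heads share each KV head, and only the 6 attention layers hold a KV cache. The footprint below assumes an F16 cache at full context, for illustration:

```python
hidden = 2048
n_heads = 32
n_kv_heads = 8
n_attn_layers = 6
ctx = 32_768

head_dim = hidden // n_heads      # per-head dimension
group = n_heads // n_kv_heads     # query heads sharing each KV head

# KV cache per attention layer: keys + values, F16 (2 bytes per value)
bytes_per_layer = 2 * n_kv_heads * head_dim * ctx * 2
total_mib = n_attn_layers * bytes_per_layer / 2**20
print(head_dim, group, total_mib)  # 64 4 384.0
```

The convolution layers keep only a short fixed-size state, so the full-context cache stays modest compared with a 16-layer all-attention model of the same width.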

Quantization Details

This model was quantized using llama.cpp with the Q8_0 scheme:

  • Source format: F16 (converted from HuggingFace safetensors)
  • Quantization: Q8_0, 8-bit quantization with block-wise scaling
  • Quality: Near-lossless; ideal for deployment where precision matters
  • Size reduction: ~50% smaller than F16 while retaining virtually all model quality
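The block-wise scheme above can be sketched in a few lines: values are grouped into blocks of 32, each stored as int8 codes plus one shared F16 scale. This is an illustrative NumPy sketch, not llama.cpp's actual kernel:

```python
import numpy as np

QK8_0 = 32  # Q8_0 block size in llama.cpp

def quantize_q8_0(x):
    """Quantize a float array (length a multiple of 32) to int8 codes + per-block scales."""
    blocks = x.reshape(-1, QK8_0)
    scales = np.abs(blocks).max(axis=1) / 127.0
    scales = np.where(scales == 0, 1.0, scales)   # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scales[:, None]), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_0(q, scales):
    """Reconstruct approximate floats from codes and scales."""
    return (q.astype(np.float32) * scales.astype(np.float32)[:, None]).reshape(-1)
```

With 8 bits per value plus one small scale per 32 values, storage is roughly half of F16, matching the size reduction noted above, while the per-block scale keeps the rounding error small.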

Usage with llama.cpp

```shell
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)
```

```shell
./build/bin/llama-cli \
  -hf Kelexine/LFM2.5-1.2B-Thinking-GGUF \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096 -cnv
```

Or with a local file:

```shell
./build/bin/llama-cli \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -p "<|im_start|>user\nYour prompt here<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096
```

Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q8_0.gguf",
    n_ctx=4096,
)

# Sampling parameters are passed per generation call, not to the constructor.
response = llm(
    "<|im_start|>user\nWhat is machine learning?<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=4096,
    temperature=0.05,
    top_k=50,
    repeat_penalty=1.05,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["text"])
```

Provided Files

| File | Description |
|---|---|
| LFM2.5-1.2B-Thinking-Q8_0.gguf | 8-bit quantized GGUF (recommended) |

Limitations

  • This is a 1.17B parameter model, suited for lightweight tasks, quick prototyping, and edge deployment.
  • The "Thinking" variant is designed for chain-of-thought reasoning but may produce verbose <think>...</think> blocks; strip these in downstream integrations.
  • Requires a recent version of llama.cpp with Lfm2ForCausalLM architecture support.
  • Not recommended for knowledge-intensive tasks or programming per Liquid AI's own guidance.
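For the second point above, a minimal way to strip the reasoning block before showing output to users, assuming the `<think>...</think>` wrapper described above:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from model output."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```

The non-greedy `.*?` with `re.DOTALL` keeps the match scoped to one block even when the reasoning spans multiple lines.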

License

This repository inherits the LFM 1.0 License from the base model LiquidAI/LFM2.5-1.2B-Thinking.
