LFM 2.5 1.2B Thinking (GGUF)

Description

This repository contains the GGUF quantized version of LiquidAI/LFM2.5-1.2B-Thinking, a 1.2 billion parameter "thinking" language model by Liquid AI.

The model uses the novel Lfm2ForCausalLM architecture, a hybrid design of 10 double-gated LIV convolution blocks and 6 grouped-query attention (GQA) blocks, departing from standard transformer-only designs. The architecture alternates between local convolution-based mixing and global attention, enabling efficient sequence processing with strong reasoning capabilities.
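To make the "double-gated convolution" idea concrete, here is a toy sketch of a gated causal short-convolution block in NumPy. This is illustrative only; the exact LFM2 block uses its own projections and gating details, and the weight names here are hypothetical.

```python
import numpy as np

def gated_short_conv(x, w_in, w_gate, w_out, kernel):
    """Toy double-gated causal short convolution over a sequence.

    Illustrative only: not the exact LFM2 formulation.
    x: (seq_len, d_model) activations; kernel: (k,) conv taps.
    """
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    h = (x @ w_in) * sigmoid(x @ w_gate)          # first (input) gate
    k = kernel.shape[0]
    padded = np.vstack([np.zeros((k - 1, h.shape[1])), h])  # causal padding
    conv = np.zeros_like(h)
    for i in range(k):                            # short depthwise-style conv
        conv += kernel[i] * padded[i : i + h.shape[0]]
    return (conv * sigmoid(x @ w_gate)) @ w_out   # second (output) gate
```

Because the convolution only looks a fixed number of steps back, each such block keeps a small constant-size state at inference time, unlike attention's growing KV cache.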

Model Details

| Property | Value |
|---|---|
| Architecture | Lfm2ForCausalLM |
| Parameter Count | 1.17B |
| Layers | 16 (10 conv blocks + 6 GQA blocks) |
| Hidden Size | 2048 |
| Intermediate (FFN) | 8192 |
| Attention Heads | 32 |
| KV Heads (GQA) | 8 (on attention layers) |
| Context Length | 32,768 tokens |
| Vocabulary Size | 65,536 |
| Languages | English, Arabic, Chinese, French, German, Japanese, Korean, Spanish |
| Quantization | Q8_0 (8-bit) |
| File Type | GGUF |
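A quick sanity check on the attention geometry from the table: 32 query heads over 8 KV heads means 4 query heads share each KV head, and only the 6 attention layers hold a KV cache. The footprint below assumes an F16 cache at full context, for illustration:

```python
hidden = 2048
n_heads = 32
n_kv_heads = 8
n_attn_layers = 6
ctx = 32_768

head_dim = hidden // n_heads      # per-head dimension
group = n_heads // n_kv_heads     # query heads sharing each KV head

# KV cache per attention layer: keys + values, F16 (2 bytes per value)
bytes_per_layer = 2 * n_kv_heads * head_dim * ctx * 2
total_mib = n_attn_layers * bytes_per_layer / 2**20
print(head_dim, group, total_mib)  # 64 4 384.0
```

The convolution layers keep only a short fixed-size state, so the full-context cache stays modest compared with a 16-layer all-attention model of the same width.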

Quantization Details

This model was quantized using llama.cpp with the Q8_0 scheme:

  • Source format: F16 (converted from HuggingFace safetensors)
  • Quantization: Q8_0, 8-bit quantization with block-wise scaling
  • Quality: Near-lossless; ideal for deployment where precision matters
  • Size reduction: ~50% smaller than F16 while retaining virtually all model quality
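The block-wise scheme above can be sketched in a few lines: values are grouped into blocks of 32, each stored as int8 codes plus one shared F16 scale. This is an illustrative NumPy sketch, not llama.cpp's actual kernel:

```python
import numpy as np

QK8_0 = 32  # Q8_0 block size in llama.cpp

def quantize_q8_0(x):
    """Quantize a float array (length a multiple of 32) to int8 codes + per-block scales."""
    blocks = x.reshape(-1, QK8_0)
    scales = np.abs(blocks).max(axis=1) / 127.0
    scales = np.where(scales == 0, 1.0, scales)   # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(blocks / scales[:, None]), -127, 127).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_q8_0(q, scales):
    """Reconstruct approximate floats from codes and scales."""
    return (q.astype(np.float32) * scales.astype(np.float32)[:, None]).reshape(-1)
```

With 8 bits per value plus one small scale per 32 values, storage is roughly half of F16, matching the size reduction noted above, while the per-block scale keeps the rounding error small.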

Usage with llama.cpp

```shell
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
cmake -B build && cmake --build build --config Release -j$(nproc)
```

```shell
./build/bin/llama-cli \
  -hf Kelexine/LFM2.5-1.2B-Thinking-GGUF \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096 -cnv
```

Or with a local file:

```shell
./build/bin/llama-cli \
  -m LFM2.5-1.2B-Thinking-Q8_0.gguf \
  -p "<|im_start|>user\nYour prompt here<|im_end|>\n<|im_start|>assistant\n" \
  --temp 0.05 --top-k 50 --repeat-penalty 1.05 -n 4096
```

Usage with Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama(
    model_path="LFM2.5-1.2B-Thinking-Q8_0.gguf",
    n_ctx=4096,
)

# Sampling parameters are passed per generation call, not to the constructor.
response = llm(
    "<|im_start|>user\nWhat is machine learning?<|im_end|>\n<|im_start|>assistant\n",
    max_tokens=4096,
    temperature=0.05,
    top_k=50,
    repeat_penalty=1.05,
    stop=["<|im_end|>"],
)
print(response["choices"][0]["text"])
```

Provided Files

| File | Description |
|---|---|
| LFM2.5-1.2B-Thinking-Q8_0.gguf | 8-bit quantized GGUF (recommended) |

Limitations

  • This is a 1.17B parameter model, suited for lightweight tasks, quick prototyping, and edge deployment.
  • The "Thinking" variant is designed for chain-of-thought reasoning but may produce verbose <think>...</think> blocks; strip these in downstream integrations.
  • Requires a recent version of llama.cpp with Lfm2ForCausalLM architecture support.
  • Not recommended for knowledge-intensive tasks or programming per Liquid AI's own guidance.
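For the second point above, a minimal way to strip the reasoning block before showing output to users, assuming the `<think>...</think>` wrapper described above:

```python
import re

def strip_think(text: str) -> str:
    """Remove <think>...</think> reasoning blocks from model output."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
```

The non-greedy `.*?` with `re.DOTALL` keeps the match scoped to one block even when the reasoning spans multiple lines.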

License

This repository inherits the LFM 1.0 License from the base model LiquidAI/LFM2.5-1.2B-Thinking.
