LFM2.5 50M Vietnamese: Continued Pre-Training

A 48M-parameter model based on LiquidAI/LFM2.5-1.2B-Instruct, scaled down and further pre-trained on a large Vietnamese text corpus. LFM2.5 is a hybrid conv-attention architecture from Liquid AI, designed for efficient inference; its convolutional blocks scale linearly with sequence length.

Model Details

| Property | Value |
| --- | --- |
| Base model | LiquidAI/LFM2.5-1.2B-Instruct |
| Architecture | LFM2.5 (conv + full attention hybrid) |
| Parameters | 48M (96MB safetensors) |
| Hidden size | 256 |
| Layers | 9 (6 conv + 3 full attention) |
| Attention heads | 4 (1 KV head, GQA) |
| Max position | 2,048 tokens |
| Tokenizer | Qwen2.5 (151K vocab, BPE) |
| Precision | bfloat16 |
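A quick back-of-envelope check on the parameter budget: with a 151K-token vocabulary and hidden size 256, the embedding matrix alone accounts for most of the 48M parameters. A sketch, assuming tied input/output embeddings (an assumption, not confirmed here):

```python
# Rough parameter-budget check for the 48M model.
# Assumes tied input/output embeddings (assumption, not confirmed).
vocab_size = 151_000   # "151K vocab" from the tokenizer row
hidden_size = 256

embedding_params = vocab_size * hidden_size
print(f"Embedding matrix: {embedding_params / 1e6:.1f}M params")  # Embedding matrix: 38.7M params

total_params = 48_000_000
frac = embedding_params / total_params
print(f"Share of the 48M total: {frac:.0%}")  # Share of the 48M total: 81%
```

This leaves only roughly 9M parameters for the conv and attention blocks themselves, consistent with the small hidden size and 9 layers.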

Training

Continued Pre-Training

The base architecture from LiquidAI/LFM2.5-1.2B-Instruct was scaled down to 48M parameters (9 layers, hidden_size=256), and the resulting randomly initialized model was then pre-trained on Vietnamese text.

  • Base model: LiquidAI/LFM2.5-1.2B-Instruct (architecture only; 48M parameters, randomly initialized)
  • Dataset: Tuan-NT/vietnamese-corpus-pretrain -- 11.3M samples of Vietnamese text (news, Wikipedia, web crawl)
  • Tokens seen: 2.46 billion
  • Steps: 178,000 (plateau reached around 120K steps)
  • Batch size: 8 x 4 gradient accumulation = effective 32
  • Sequence length: 512 tokens
  • Learning rate: 2e-4 (cosine schedule)
  • Optimizer: AdamW
  • Hardware: NVIDIA RTX 5080 16GB
  • Training time: ~21 hours
  • Framework: Unsloth + HuggingFace Transformers
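The batch and sequence settings above determine an upper bound on tokens per optimizer step. A quick sanity check (the bound assumes every position in every 512-token sequence is a real token; the reported 2.46B total suggests many sequences were shorter or padded):

```python
# Tokens processed per optimizer step, from the listed hyperparameters.
micro_batch = 8
grad_accum = 4
seq_len = 512
steps = 178_000

tokens_per_step = micro_batch * grad_accum * seq_len
print(tokens_per_step)  # 16384

# Upper bound if every position were a real token:
upper_bound = tokens_per_step * steps
print(f"{upper_bound / 1e9:.2f}B")  # 2.92B (reported tokens seen: 2.46B)
```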

Loss Curve

| Step | Loss |
| --- | --- |
| 10,000 | 6.337 |
| 20,000 | 5.567 |
| 30,000 | 5.098 |
| 50,000 | 4.629 |
| 80,000 | 4.245 |
| 100,000 | 4.194 |
| 120,000 | 4.103 |
| 140,000 | 4.103 |
| 160,000 | 4.109 |
| 178,000 | 4.055 |

Loss plateaued around step 120K at ~4.1. Training covered 2.46B tokens, roughly 0.4 epochs of the full ~5.8B-token dataset.

Intended Use

This model is intended as a backbone for downstream Vietnamese NLP tasks, not as a standalone text generator. At 48M parameters it is too small for general-purpose generation but well-suited as a feature extractor or for fine-tuning on specific tasks.

Recommended applications:

  • Named Entity Recognition (NER) -- add a token classification head and fine-tune on labeled data
  • Text classification -- add a classification head for sentiment, intent, or category prediction
  • Query understanding -- parse e-commerce search queries into structured components
  • Lightweight inference -- the hybrid conv-attention architecture enables fast CPU inference (~2ms per query at 20 tokens)
  • On-device NLP -- small enough for mobile/edge deployment
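Latency claims like the ~2ms-per-query figure above are easy to verify locally. A minimal timing harness (the `run_query` function here is a stand-in; substitute a real tokenize-plus-forward call):

```python
import time
import statistics

def run_query(text: str) -> None:
    # Stand-in for a tokenizer + model forward pass on CPU.
    _ = text.lower().split()

def median_latency_ms(fn, text: str, warmup: int = 10, runs: int = 100) -> float:
    for _ in range(warmup):           # warm caches/allocators first
        fn(text)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)  # median is robust to OS jitter

print(f"{median_latency_ms(run_query, 'dien thoai samsung galaxy'):.3f} ms")
```

The median over many runs, after warmup, gives a more stable figure than a single timed call.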

Not recommended for:

  • Open-ended text generation (too small for coherent generation)
  • Tasks requiring world knowledge (limited capacity)
  • Multilingual tasks (Vietnamese-focused pre-training)

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "GazTrab/lfm2.5-50m-vietnamese",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("GazTrab/lfm2.5-50m-vietnamese")

# Get hidden states for downstream tasks
inputs = tokenizer("dien thoai samsung galaxy", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
last_hidden = outputs.hidden_states[-1]  # [1, seq_len, 256]

# Or use as a token classifier backbone
# See: https://huggingface.co/docs/transformers/tasks/token_classification
```
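For the token-classification use case, the head itself is just a per-token linear projection from the 256-dim hidden states to the label set. A dependency-free sketch of the shape flow (label count and weights here are hypothetical; in practice, use transformers' `AutoModelForTokenClassification`):

```python
import random

random.seed(0)
hidden_size = 256
num_labels = 5   # hypothetical label set, e.g. BIO entity tags
seq_len = 4

# Dummy hidden states standing in for outputs.hidden_states[-1][0]
hidden = [[random.gauss(0, 1) for _ in range(hidden_size)] for _ in range(seq_len)]

# Linear head: weight [num_labels, hidden_size] + bias [num_labels]
W = [[random.gauss(0, 0.02) for _ in range(hidden_size)] for _ in range(num_labels)]
b = [0.0] * num_labels

def classify_tokens(hidden):
    # logits[t][l] = hidden[t] . W[l] + b[l]
    return [
        [sum(h * w for h, w in zip(tok, W[l])) + b[l] for l in range(num_labels)]
        for tok in hidden
    ]

logits = classify_tokens(hidden)
print(len(logits), len(logits[0]))  # 4 5  (seq_len x num_labels)
```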

Background

This model was developed as part of a search system project. The goal was to build a lightweight NER backbone that could:

  1. Parse Vietnamese search queries with typos, missing diacritics, slang, and abbreviations
  2. Run inference in under 2ms on CPU for real-time search
  3. Distinguish between multiple entity types in a query

The LFM2.5 architecture was chosen after benchmarking against Qwen2.5 and Qwen3.5 at similar parameter counts, as it achieved the best speed-accuracy tradeoff with its hybrid conv-attention design.

Citation

@misc{LFM2.5-50M-Vietnamese,
  title={LFM2.5 50M Vietnamese: Continued Pre-Training of LFM2.5 on Vietnamese Corpus},
  author={GazTrab},
  year={2026},
  url={https://huggingface.co/GazTrab/LFM2.5-50M-Vietnamese}
}
