LFM2.5 50M Vietnamese – Continued Pre-Training
A 48M-parameter model based on LiquidAI/LFM2.5-1.2B-Instruct, scaled down and further pre-trained on a large Vietnamese text corpus. LFM2.5 is a hybrid conv-attention architecture from Liquid AI: its convolution blocks scale linearly in sequence length, with a small number of full-attention layers providing global context, making it efficient at inference time.
Model Details
| Property | Value |
|---|---|
| Base model | LiquidAI/LFM2.5-1.2B-Instruct |
| Architecture | LFM2.5 (conv + full attention hybrid) |
| Parameters | 48M (96MB safetensors) |
| Hidden size | 256 |
| Layers | 9 (6 conv + 3 full attention) |
| Attention heads | 4 (1 KV head, GQA) |
| Max position | 2,048 tokens |
| Tokenizer | Qwen2.5 (151K vocab, BPE) |
| Precision | bfloat16 |
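At this scale the embedding table dominates the parameter budget. A back-of-the-envelope check, assuming the Qwen2.5 BPE vocabulary of 151,936 entries (an assumption; the exact vocab size is not stated in the table) and tied input/output embeddings:

```python
# Rough parameter-budget check for the table above.
# Assumption (not from the model config): Qwen2.5 vocab of 151,936 entries
# and tied input/output embeddings, so the table is counted once.
vocab_size = 151_936
hidden_size = 256
total_params = 48_000_000  # 48M, per the model card

embedding_params = vocab_size * hidden_size
share = embedding_params / total_params

print(f"embedding params: {embedding_params / 1e6:.1f}M")  # 38.9M
print(f"share of total:   {share:.0%}")                    # 81%
```

In other words, roughly four fifths of the weights are the vocabulary embedding; the 9 transformer/conv layers themselves are only ~9M parameters.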
Training
Continued Pre-Training
The LFM2.5 architecture from LiquidAI/LFM2.5-1.2B-Instruct was scaled down to ~50M parameters (9 layers, hidden_size=256) and then pre-trained on Vietnamese text.
- Base model: LiquidAI/LFM2.5-1.2B-Instruct (architecture only; the 50M-parameter weights were randomly initialized)
- Dataset: Tuan-NT/vietnamese-corpus-pretrain -- 11.3M samples of Vietnamese text (news, Wikipedia, web crawl)
- Tokens seen: 2.46 billion
- Steps: 178,000 (plateau reached around 120K steps)
- Batch size: 8 x 4 gradient accumulation = effective 32
- Sequence length: 512 tokens
- Learning rate: 2e-4 (cosine schedule)
- Optimizer: AdamW
- Hardware: NVIDIA RTX 5080 16GB
- Training time: ~21 hours
- Framework: Unsloth + HuggingFace Transformers
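The hyperparameters above imply an upper bound on the number of tokens processed. The reported 2.46B "tokens seen" is consistent with sequences not always filling the full 512-token context; this is a hedged sanity check, not a figure from the training logs:

```python
# Sanity-check the training numbers listed above.
steps = 178_000
effective_batch = 8 * 4   # per-device batch x gradient accumulation
seq_len = 512

max_tokens = steps * effective_batch * seq_len
reported_tokens = 2.46e9  # "Tokens seen" from the card

print(f"upper bound: {max_tokens / 1e9:.2f}B tokens")       # 2.92B
print(f"fill ratio:  {reported_tokens / max_tokens:.0%}")   # 84%
```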
Loss Curve
| Step | Loss |
|---|---|
| 10,000 | 6.337 |
| 20,000 | 5.567 |
| 30,000 | 5.098 |
| 50,000 | 4.629 |
| 80,000 | 4.245 |
| 100,000 | 4.194 |
| 120,000 | 4.103 |
| 140,000 | 4.103 |
| 160,000 | 4.109 |
| 178,000 | 4.055 |
Loss plateaued around step 120K at ~4.1. The model saw roughly 0.4 epochs of the full dataset (2.46B tokens out of ~5.8B total).
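The plateau is easiest to see in the per-interval improvement, computed directly from the table above: almost all of the gain happens before step 120K, with only marginal movement afterwards.

```python
# Per-interval loss improvement, computed from the loss table above.
curve = {
    10_000: 6.337, 20_000: 5.567, 30_000: 5.098, 50_000: 4.629,
    80_000: 4.245, 100_000: 4.194, 120_000: 4.103, 140_000: 4.103,
    160_000: 4.109, 178_000: 4.055,
}
steps = sorted(curve)
# Positive delta = loss decreased over the interval.
deltas = [round(curve[a] - curve[b], 3) for a, b in zip(steps, steps[1:])]
for (a, b), d in zip(zip(steps, steps[1:]), deltas):
    print(f"{a:>7,} -> {b:>7,}: {d:+.3f}")
```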
Intended Use
This model is intended as a backbone for downstream Vietnamese NLP tasks, not as a standalone text generator. At 48M parameters it is too small for general-purpose generation but well-suited as a feature extractor or for fine-tuning on specific tasks.
Recommended applications:
- Named Entity Recognition (NER) -- add a token classification head and fine-tune on labeled data
- Text classification -- add a classification head for sentiment, intent, or category prediction
- Query understanding -- parse e-commerce search queries into structured components
- Lightweight inference -- the hybrid conv-attention architecture enables fast CPU inference (~2ms per query at 20 tokens)
- On-device NLP -- small enough for mobile/edge deployment
Not recommended for:
- Open-ended text generation (too small for coherent generation)
- Tasks requiring world knowledge (limited capacity)
- Multilingual tasks (Vietnamese-focused pre-training)
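A minimal sketch of the token-classification setup recommended above, using only the backbone's last hidden state (shape [batch, seq_len, 256]). The label set and linear head here are illustrative stand-ins, not part of the released model; in practice you would fine-tune the head (and optionally the backbone) on labeled data.

```python
import torch
import torch.nn as nn

# Hypothetical NER label set for e-commerce queries (illustrative only).
labels = ["O", "B-PRODUCT", "I-PRODUCT", "B-BRAND", "I-BRAND"]

# A linear token-classification head over the 256-dim backbone states.
head = nn.Linear(256, len(labels))

# Stand-in for model(**inputs, output_hidden_states=True).hidden_states[-1]
last_hidden = torch.randn(1, 6, 256)

logits = head(last_hidden)        # [1, 6, num_labels]
pred_ids = logits.argmax(dim=-1)  # one label id per token
print(pred_ids.shape)             # torch.Size([1, 6])
```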
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "GazTrab/lfm2.5-50m-vietnamese",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("GazTrab/lfm2.5-50m-vietnamese")

# Get hidden states for downstream tasks
inputs = tokenizer("dien thoai samsung galaxy", return_tensors="pt")
outputs = model(**inputs, output_hidden_states=True)
last_hidden = outputs.hidden_states[-1]  # [1, seq_len, 256]

# Or use as a token-classifier backbone:
# https://huggingface.co/docs/transformers/tasks/token_classification
```
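For sentence-level tasks (classification, retrieval), the per-token hidden states can be mean-pooled with the attention mask into a single 256-dim vector. This is a common pooling pattern rather than anything specific to this model; the tensors below are stand-ins for the tokenizer/model outputs above.

```python
import torch

# Stand-ins for the outputs above:
# last_hidden: [batch, seq_len, hidden], attention_mask: [batch, seq_len]
last_hidden = torch.randn(2, 5, 256)
attention_mask = torch.tensor([[1, 1, 1, 0, 0],
                               [1, 1, 1, 1, 1]])

# Masked mean pooling: zero out padding, then divide by real token counts.
mask = attention_mask.unsqueeze(-1).float()  # [2, 5, 1]
summed = (last_hidden * mask).sum(dim=1)     # [2, 256]
counts = mask.sum(dim=1).clamp(min=1)        # [2, 1]
sentence_emb = summed / counts               # [2, 256]
print(sentence_emb.shape)                    # torch.Size([2, 256])
```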
Background
This model was developed as part of a search system project. The goal was to build a lightweight NER backbone that could:
- Parse Vietnamese search queries with typos, missing diacritics, slang, and abbreviations
- Run inference in under 2ms on CPU for real-time search
- Distinguish between multiple entity types within a query
The LFM2.5 architecture was chosen after benchmarking against Qwen2.5 and Qwen3.5 at similar parameter counts, as it achieved the best speed-accuracy tradeoff with its hybrid conv-attention design.
Citation
```bibtex
@misc{LFM2.5-50M-Vietnamese,
  title={LFM2.5 50M Vietnamese: Continued Pre-Training of LFM2.5 on Vietnamese Corpus},
  author={GazTrab},
  year={2026},
  url={https://huggingface.co/GazTrab/LFM2.5-50M-Vietnamese}
}
```
Acknowledgments
- Liquid AI for the LFM2.5 architecture (LiquidAI/LFM2.5-1.2B-Instruct)
- Tuan-NT for the Vietnamese pre-training corpus
- Unsloth for efficient training framework