LFM2.5 50M Vietnamese: Continued Pre-Training

A 48M-parameter model based on LiquidAI/LFM2.5-1.2B-Instruct, scaled down and further pre-trained on a large Vietnamese text corpus. LFM2.5 is a hybrid conv-attention architecture from Liquid AI, designed for efficient inference; its convolutional blocks scale linearly with sequence length.

Model Details

| Property | Value |
| --- | --- |
| Base model | LiquidAI/LFM2.5-1.2B-Instruct |
| Architecture | LFM2.5 (conv + full attention hybrid) |
| Parameters | 48M (96MB safetensors) |
| Hidden size | 256 |
| Layers | 9 (6 conv + 3 full attention) |
| Attention heads | 4 (1 KV head, GQA) |
| Max position | 2,048 tokens |
| Tokenizer | Qwen2.5 (151K vocab, BPE) |
| Precision | bfloat16 |
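A quick back-of-envelope check on the parameter budget: with a 151K-token vocabulary and hidden size 256, the embedding matrix alone accounts for most of the 48M parameters. A sketch, assuming tied input/output embeddings (an assumption, not confirmed here):

```python
# Rough parameter-budget check for the 48M model.
# Assumes tied input/output embeddings (assumption, not confirmed).
vocab_size = 151_000   # "151K vocab" from the tokenizer row
hidden_size = 256

embedding_params = vocab_size * hidden_size
print(f"Embedding matrix: {embedding_params / 1e6:.1f}M params")  # Embedding matrix: 38.7M params

total_params = 48_000_000
frac = embedding_params / total_params
print(f"Share of the 48M total: {frac:.0%}")  # Share of the 48M total: 81%
```

This leaves only roughly 9M parameters for the conv and attention blocks themselves, consistent with the small hidden size and 9 layers.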

Training

Continued Pre-Training

The base architecture from LiquidAI/LFM2.5-1.2B-Instruct was scaled down to 48M parameters (9 layers, hidden_size=256), and the resulting randomly initialized model was then pre-trained on Vietnamese text.

  • Base model: LiquidAI/LFM2.5-1.2B-Instruct (architecture only; 48M parameters, randomly initialized)
  • Dataset: Tuan-NT/vietnamese-corpus-pretrain -- 11.3M samples of Vietnamese text (news, Wikipedia, web crawl)
  • Tokens seen: 2.46 billion
  • Steps: 178,000 (plateau reached around 120K steps)
  • Batch size: 8 x 4 gradient accumulation = effective 32
  • Sequence length: 512 tokens
  • Learning rate: 2e-4 (cosine schedule)
  • Optimizer: AdamW
  • Hardware: NVIDIA RTX 5080 16GB
  • Training time: ~21 hours
  • Framework: Unsloth + HuggingFace Transformers
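The batch and sequence settings above determine an upper bound on tokens per optimizer step. A quick sanity check (the bound assumes every position in every 512-token sequence is a real token; the reported 2.46B total suggests many sequences were shorter or padded):

```python
# Tokens processed per optimizer step, from the listed hyperparameters.
micro_batch = 8
grad_accum = 4
seq_len = 512
steps = 178_000

tokens_per_step = micro_batch * grad_accum * seq_len
print(tokens_per_step)  # 16384

# Upper bound if every position were a real token:
upper_bound = tokens_per_step * steps
print(f"{upper_bound / 1e9:.2f}B")  # 2.92B (reported tokens seen: 2.46B)
```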

Loss Curve

| Step | Loss |
| --- | --- |
| 10,000 | 6.337 |
| 20,000 | 5.567 |
| 30,000 | 5.098 |
| 50,000 | 4.629 |
| 80,000 | 4.245 |
| 100,000 | 4.194 |
| 120,000 | 4.103 |
| 140,000 | 4.103 |
| 160,000 | 4.109 |
| 178,000 | 4.055 |

Loss plateaued around step 120K at ~4.1. Training covered 2.46B tokens, roughly 0.4 epochs of the full ~5.8B-token dataset.

Intended Use

This model is intended as a backbone for downstream Vietnamese NLP tasks, not as a standalone text generator. At 48M parameters it is too small for general-purpose generation but well-suited as a feature extractor or for fine-tuning on specific tasks.

Recommended applications:

  • Named Entity Recognition (NER) -- add a token classification head and fine-tune on labeled data
  • Text classification -- add a classification head for sentiment, intent, or category prediction
  • Query understanding -- parse e-commerce search queries into structured components
  • Lightweight inference -- the hybrid conv-attention architecture enables fast CPU inference (~2ms per query at 20 tokens)
  • On-device NLP -- small enough for mobile/edge deployment
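Latency claims like the ~2ms-per-query figure above are easy to verify locally. A minimal timing harness (the `run_query` function here is a stand-in; substitute a real tokenize-plus-forward call):

```python
import time
import statistics

def run_query(text: str) -> None:
    # Stand-in for a tokenizer + model forward pass on CPU.
    _ = text.lower().split()

def median_latency_ms(fn, text: str, warmup: int = 10, runs: int = 100) -> float:
    for _ in range(warmup):           # warm caches/allocators first
        fn(text)
    samples = []
    for _ in range(runs):
        t0 = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(samples)  # median is robust to OS jitter

print(f"{median_latency_ms(run_query, 'dien thoai samsung galaxy'):.3f} ms")
```

The median over many runs, after warmup, gives a more stable figure than a single timed call.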

Not recommended for:

  • Open-ended text generation (too small for coherent generation)
  • Tasks requiring world knowledge (limited capacity)
  • Multilingual tasks (Vietnamese-focused pre-training)

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "GazTrab/lfm2.5-50m-vietnamese",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("GazTrab/lfm2.5-50m-vietnamese")

# Get hidden states for downstream tasks
inputs = tokenizer("dien thoai samsung galaxy", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)
last_hidden = outputs.hidden_states[-1]  # [1, seq_len, 256]

# Or use as a token classifier backbone
# See: https://huggingface.co/docs/transformers/tasks/token_classification
```
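For the token-classification use case, the head itself is just a per-token linear projection from the 256-dim hidden states to the label set. A dependency-free sketch of the shape flow (label count and weights here are hypothetical; in practice, use transformers' `AutoModelForTokenClassification`):

```python
import random

random.seed(0)
hidden_size = 256
num_labels = 5   # hypothetical label set, e.g. BIO entity tags
seq_len = 4

# Dummy hidden states standing in for outputs.hidden_states[-1][0]
hidden = [[random.gauss(0, 1) for _ in range(hidden_size)] for _ in range(seq_len)]

# Linear head: weight [num_labels, hidden_size] + bias [num_labels]
W = [[random.gauss(0, 0.02) for _ in range(hidden_size)] for _ in range(num_labels)]
b = [0.0] * num_labels

def classify_tokens(hidden):
    # logits[t][l] = hidden[t] . W[l] + b[l]
    return [
        [sum(h * w for h, w in zip(tok, W[l])) + b[l] for l in range(num_labels)]
        for tok in hidden
    ]

logits = classify_tokens(hidden)
print(len(logits), len(logits[0]))  # 4 5  (seq_len x num_labels)
```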

Background

This model was developed as part of a search system project. The goal was to build a lightweight NER backbone that could:

  1. Parse Vietnamese search queries with typos, missing diacritics, slang, and abbreviations
  2. Run inference in under 2ms on CPU for real-time search
  3. Distinguish between multiple entity types in a query

The LFM2.5 architecture was chosen after benchmarking against Qwen2.5 and Qwen3.5 at similar parameter counts, as it achieved the best speed-accuracy tradeoff with its hybrid conv-attention design.

Citation

@misc{LFM2.5-50M-Vietnamese,
  title={LFM2.5 50M Vietnamese: Continued Pre-Training of LFM2.5 on Vietnamese Corpus},
  author={GazTrab},
  year={2026},
  url={https://huggingface.co/GazTrab/LFM2.5-50M-Vietnamese}
}
