🌊 LFM2-1.2B-KoEn-MT-v4-100k

LFM2-1.2B-KoEn-MT-v4-100k is a model fine-tuned from LiquidAI's LFM2-1.2B on a high-quality parallel dataset of 100,000 samples to improve Korean-English translation.

It was trained with a pipeline optimized for a T4 GPU x 2 (DDP) environment and delivers efficient, respectable translation quality despite its lightweight 1.2B parameters. In particular, it is competitive with NLLB-600M, opening up possibilities for deployment on mobile and edge devices.

📊 Benchmarks

Evaluation results on the Flores-200 dataset (1,012 sentences), sorted by chrF++. A minimal scoring sketch follows the table.

| Rank | Model | chrF++ | BLEU | Notes |
|---|---|---|---|---|
| 1 | Google Translate | 39.27 | 18.18 | Commercial service (target) |
| 2 | Yanolja-4B-GGUF | 38.61 | 16.03 | Open-source model (SOTA) |
| 3 | NLLB-200 (3.3B) | 35.09 | 11.68 | 3.3B dedicated translation model |
| 4 | Gemma-3-4B-it-GGUF | 32.83 | 11.36 | Google's latest 4B model |
| 5 | NLLB-200-Distilled-600M | 31.97 | 10.32 | 600M dedicated translation model |
| 6 | LFM2-1.2B-KoEn-MT-v4-100k | 31.53 | 11.13 | This model (1.2B) |
| 7 | lfm2-mt-v1 | 30.85 | 11.17 | Trained on 100 samples |
| 8 | LFM2-1.2B | 27.23 | 6.43 | Baseline model |
| 9 | Qwen3-4B-GGUF | 25.62 | 7.46 | 4B base model |
| 10 | Gemma-3-1B-it-GGUF | 24.07 | 6.94 | 1B model |
| 11 | Qwen3-1.7B-GGUF | 21.19 | - | 1.7B base model |
| 12 | Qwen3-0.6B-GGUF | 13.48 | 1.98 | 0.6B base model |
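
For reference, scores like the ones above can be computed with the sacrebleu library. The sketch below is an assumed evaluation setup, not the exact script used for the table: file names are illustrative, and the BLEU tokenizer choice is left at the sacrebleu default.

# Scoring sketch using the sacrebleu library; file names are illustrative assumptions.
import sacrebleu

# One sentence per line: model outputs and the matching Flores-200 references.
with open("hypotheses.txt", encoding="utf-8") as f:
    hyps = [line.strip() for line in f]
with open("references.txt", encoding="utf-8") as f:
    refs = [line.strip() for line in f]

# chrF++ is chrF with word n-grams enabled (word_order=2).
chrf = sacrebleu.corpus_chrf(hyps, [refs], word_order=2)
# BLEU with sacrebleu's default "13a" tokenizer; the tokenizer used for the table
# above is not documented here, so treat this value as an approximation.
bleu = sacrebleu.corpus_bleu(hyps, [refs])

print(f"chrF++: {chrf.score:.2f}")
print(f"BLEU:   {bleu.score:.2f}")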

📈 Training Logs

Loss and learning-rate trends over approximately 6,188 training steps. The loss started around 3.5 and converged stably to a final value of 1.43. A sketch of the learning-rate schedule follows the hyperparameter list below.

| Step | Epoch | Training Loss (avg) | Learning Rate | Notes |
|---|---|---|---|---|
| 0 | 0.00 | 3.57 | 0 | Start |
| 500 | 0.08 | 1.59 | 8.06e-06 | Warmup in progress |
| 1000 | 0.16 | 1.57 | 9.88e-06 | Warmup complete, early stabilization |
| 2000 | 0.32 | 1.48 | 8.45e-06 | Loss drops below 1.5 |
| 3000 | 0.49 | 1.46 | 5.99e-06 | Mid-run convergence |
| 4000 | 0.65 | 1.45 | 3.21e-06 | Fine-tuning phase |
| 5000 | 0.81 | 1.44 | 1.08e-06 | Approaching best performance |
| 6000 | 0.98 | 1.43 | 6.30e-09 | Final convergence |
  • Optimizer: paged_adamw_8bit
  • LR Scheduler: Cosine Decay with Warmup (0.1 ratio)
  • Max LR: 1e-5
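
The logged learning rates are consistent with a standard linear-warmup + cosine-decay schedule. The sketch below assumes roughly 6,188 total steps and the 0.1 warmup ratio above; it will differ slightly from the exact transformers scheduler implementation.

import math

# Minimal sketch of the assumed linear-warmup + cosine-decay schedule.
max_lr = 1e-5
total_steps = 6188                      # approximate total steps reported above
warmup_steps = int(0.1 * total_steps)   # warmup_ratio = 0.1

def lr_at(step: int) -> float:
    if step < warmup_steps:
        # Linear warmup from 0 up to max_lr
        return max_lr * step / max(1, warmup_steps)
    # Cosine decay from max_lr down toward 0 over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

for s in (500, 1000, 2000, 4000, 6000):
    print(f"step {s}: lr = {lr_at(s):.2e}")
# Prints values close to the logged ones, e.g. ~8.1e-06 at step 500 and ~9.9e-06 at step 1000.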

🚀 Usage

이 λͺ¨λΈμ€ transformers 라이브러리λ₯Ό μ‚¬μš©ν•˜μ—¬ μ‰½κ²Œ λ‘œλ“œν•˜κ³  λ²ˆμ—­μ„ μˆ˜ν–‰ν•  수 μžˆμŠ΅λ‹ˆλ‹€.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# λͺ¨λΈ λ‘œλ“œ
model_id = "gyung/lfm2-1.2b-koen-mt-v4-100k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16
)

# Sentence to translate
text = "The model is working correctly now."

# Apply the chat template (ChatML format recommended)
messages = [
    {"role": "system", "content": "Translate to Korean."},
    {"role": "user", "content": text}
]

# Tokenize the input
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True
).to(model.device)

# Generate the translation
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id
)

# Decode the result
decoded = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(f"Input: {text}")
print(f"Output: {decoded}")
# Output: λͺ¨λΈμ΄ μ •μƒμ μœΌλ‘œ μž‘λ™ν•˜κ³  μžˆμŠ΅λ‹ˆλ‹€.
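
The same call pattern can be wrapped in a small helper when many sentences need to be translated, for example when scoring the model on Flores-200. This is a sketch that reuses the model, tokenizer, and prompt defined above and translates one sentence per call (no real batching).

# Sketch: translate a list of English sentences one at a time, reusing the setup above.
def translate_all(sentences, max_new_tokens=256):
    results = []
    for sentence in sentences:
        messages = [
            {"role": "system", "content": "Translate to Korean."},
            {"role": "user", "content": sentence},
        ]
        ids = tokenizer.apply_chat_template(
            messages, return_tensors="pt", add_generation_prompt=True
        ).to(model.device)
        out = model.generate(
            ids, max_new_tokens=max_new_tokens, pad_token_id=tokenizer.eos_token_id
        )
        results.append(tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True))
    return results

print(translate_all(["Good morning.", "How are you today?"]))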

⚙️ Training Details

이 λͺ¨λΈμ€ Kaggle T4 x 2 ν™˜κ²½μ—μ„œ μ΅œμ ν™”λœ μ„€μ •μœΌλ‘œ ν•™μŠ΅λ˜μ—ˆμŠ΅λ‹ˆλ‹€.

Configuration

  • Base Model: LiquidAI/LFM2-1.2B
  • Dataset: dataset_100000.jsonl (English-Korean Parallel, 100k samples)
  • Hardware: NVIDIA T4 GPU x 2 (Data Parallelism, DDP)
  • Epochs: 1
  • Batch Size: 1 per device × gradient accumulation 16 × 2 GPUs → effective batch size 32
  • Optimizer: paged_adamw_8bit
  • Learning Rate: 1e-5 (Cosine Scheduler, Warmup 0.1)
  • Precision: Mixed precision (FP16), optimized for T4

Training Code Snippet

# SFTTrainer configuration used for v4
from trl import SFTConfig

sft_config = SFTConfig(
    output_dir="/kaggle/working/lfm2-mt-v4",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=50,
    save_steps=500,
    eval_strategy="no",  # Optimized for speed
    dataset_text_field="messages",
    packing=False,
    ddp_find_unused_parameters=False,
)
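
For context, the sketch below shows one way this config could be wired into an SFTTrainer run. The dataset file name comes from the configuration above, but the loading code, the messages layout, and the trainer arguments are assumptions, not the exact training script.

# Sketch only: assumed wiring of the SFTConfig above into an SFTTrainer run.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTTrainer

base_id = "LiquidAI/LFM2-1.2B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Assumed JSONL layout: one {"messages": [{"role": ..., "content": ...}, ...]} per line.
dataset = load_dataset("json", data_files="dataset_100000.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    args=sft_config,             # the SFTConfig defined above
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl versions take tokenizer= instead
)
trainer.train()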

⚠️ Limitations

  • 이 λͺ¨λΈμ€ 1.2B νŒŒλΌλ―Έν„°μ˜ μ†Œν˜• λͺ¨λΈλ‘œ, 맀우 λ³΅μž‘ν•˜κ±°λ‚˜ 전문적인 λ¬Έλ§₯μ—μ„œλŠ” λŒ€ν˜• λͺ¨λΈ(4B+)보닀 μ„±λŠ₯이 λ–¨μ–΄μ§ˆ 수 μžˆμŠ΅λ‹ˆλ‹€.
  • ν•™μŠ΅ 데이터에 ν¬ν•¨λ˜μ§€ μ•Šμ€ 희귀 λ‹¨μ–΄λ‚˜ 맀우 κΈ΄ λ¬Έμž₯에 λŒ€ν•΄μ„œλŠ” ν™˜κ°(Hallucination)이 λ°œμƒν•  수 μžˆμŠ΅λ‹ˆλ‹€.

📜 License

이 λͺ¨λΈμ€ Liquid AI LFM Open License v1.0을 λ”°λ¦…λ‹ˆλ‹€.

  • Permitted: Academic research and personal use without restriction.
  • Commercial use: Companies and individuals with annual revenue under USD 10 million (approx. KRW 14 billion) may use the model commercially free of charge.
  • Restriction: Companies with annual revenue above USD 10 million require a separate license agreement with Liquid AI. See the LICENSE file for details.

Citation

Model

@misc{lfm2-1.2b-koen-mt-v4-100k,
  author = {Gyung},
  title = {LFM2-1.2B Korean-English Machine Translation Model v4},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v4-100k}}
}

Base Model (LiquidAI LFM2-1.2B)

@article{liquidai2025lfm2,
  title={LFM2 Technical Report},
  author={Liquid AI},
  journal={arXiv preprint arXiv:2511.23404},
  year={2025}
}

Evaluation Dataset (Flores-200)

@article{nllb2022,
  author = {NLLB Team and Costa-juss{\`a}, Marta R. and Cross, James and {\c{C}}elebi, Onur and et al.},
  title = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  year = {2022},
  journal = {arXiv preprint arXiv:2207.04672}
}

Metrics

@inproceedings{popovic-2015-chrf,
    title = "chrF: character n-gram F-score for automatic MT evaluation",
    author = "Popovi{\'c}, Maja",
    booktitle = "Proceedings of the Tenth Workshop on Statistical Machine Translation",
    month = sep,
    year = "2015",
    address = "Lisbon, Portugal",
    publisher = "Association for Computational Linguistics",
    pages = "392--395",
}

@inproceedings{post-2018-call,
    title = "A Call for Clarity in Reporting BLEU Scores",
    author = "Post, Matt",
    booktitle = "Proceedings of the Third Conference on Machine Translation: Research Papers",
    month = oct,
    year = "2018",
    address = "Belgium, Brussels",
    publisher = "Association for Computational Linguistics",
    pages = "186--191",
}