# LFM2-1.2B-KoEn-MT-v4-100k
LFM2-1.2B-KoEn-MT-v4-100k is a fine-tuned version of LiquidAI's LFM2-1.2B, trained on 100,000 high-quality parallel sentence pairs to improve Korean-English translation.

Training ran in a T4 GPU x 2 (DDP) environment with an optimized training pipeline. At only 1.2B parameters, the model delivers efficient and practical translation performance; in particular, it is competitive with NLLB-600M, which makes it a candidate for mobile and edge deployment.
## Benchmarks
Evaluation results on the Flores-200 dataset (1,012 sentences), sorted by chrF++.
| Rank | Model | chrF++ | BLEU | Notes |
|---|---|---|---|---|
| 1 | Google Translate | 39.27 | 18.18 | Commercial service (target) |
| 2 | Yanolja-4B-GGUF | 38.61 | 16.03 | Open-source model (SOTA) |
| 3 | NLLB-200 (3.3B) | 35.09 | 11.68 | 3.3B dedicated translation model |
| 4 | Gemma-3-4B-it-GGUF | 32.83 | 11.36 | Google's latest 4B model |
| 5 | NLLB-200-Distilled-600M | 31.97 | 10.32 | 600M dedicated translation model |
| 6 | LFM2-1.2B-KoEn-MT-v4-100k | 31.53 | 11.13 | This model (1.2B) |
| 7 | lfm2-mt-v1 | 30.85 | 11.17 | Trained on 100 samples |
| 8 | LFM2-1.2B | 27.23 | 6.43 | Baseline model |
| 9 | Qwen3-4B-GGUF | 25.62 | 7.46 | 4B base model |
| 10 | Gemma-3-1B-it-GGUF | 24.07 | 6.94 | 1B base model |
| 11 | Qwen3-1.7B-GGUF | 21.19 | - | 1.7B base model |
| 12 | Qwen3-0.6B-GGUF | 13.48 | 1.98 | 0.6B base model |
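chrF++ extends chrF by also counting word n-grams; the core of both metrics is a character n-gram F-score. The following is a minimal sketch of plain chrF (character n-grams only, β = 2), so it will not match a full chrF++ implementation such as sacrebleu's exactly:

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    # Character n-grams; whitespace is removed, as in the chrF definition
    s = text.replace(" ", "")
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Average character n-gram F-beta score (plain chrF, beta=2), scaled to 0-100."""
    scores = []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # string shorter than n: skip this order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        scores.append((1 + beta**2) * precision * recall
                      / (beta**2 * precision + recall))
    return 100 * sum(scores) / len(scores) if scores else 0.0


# Identical strings score 100; disjoint strings score 0
print(chrf("모델이 정상적으로 작동하고 있습니다.", "모델이 정상적으로 작동하고 있습니다."))  # 100.0
print(chrf("abc", "xyz"))  # 0.0
```

For the scores in the table above, a standardized implementation (e.g. sacrebleu with its default chrF++ settings) should be used rather than this sketch.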
## Training Logs
Loss and learning-rate trajectory over the roughly 6,188 training steps. The loss started around 3.5 and converged stably to a final value of 1.43.
| Step | Epoch | Training Loss (Avg) | Learning Rate | Notes |
|---|---|---|---|---|
| 0 | 0.00 | 3.57 | 0 | Start |
| 500 | 0.08 | 1.59 | 8.06e-06 | Sharp drop after warmup |
| 1000 | 0.16 | 1.57 | 9.88e-06 | Early stabilization |
| 2000 | 0.32 | 1.48 | 8.45e-06 | Loss drops below 1.5 |
| 3000 | 0.49 | 1.46 | 5.99e-06 | Mid-training convergence |
| 4000 | 0.65 | 1.45 | 3.21e-06 | Fine-tuning phase |
| 5000 | 0.81 | 1.44 | 1.08e-06 | Performance near maximum |
| 6000 | 0.98 | 1.43 | 6.30e-09 | Final convergence |
- Optimizer: `paged_adamw_8bit`
- LR Scheduler: Cosine Decay with Warmup (0.1 ratio)
- Max LR: 1e-5
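The logged learning rates are consistent with linear warmup followed by cosine decay over the full run. A small sketch of that schedule (the function name is illustrative; exact step counts are approximate):

```python
import math


def lr_at(step: int, total_steps: int = 6188,
          max_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to max_lr, then cosine decay toward zero."""
    warmup_steps = int(total_steps * warmup_ratio)  # ~618 steps
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return max_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Close to the logged values: ~9.9e-06 near step 1000, ~1.1e-06 near step 5000
print(lr_at(1000), lr_at(5000))
```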
## Usage
The model can be loaded and used for translation with the `transformers` library.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer
model_id = "gyung/lfm2-1.2b-koen-mt-v4-100k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Sentence to translate
text = "The model is working correctly now."

# Apply the chat template (ChatML format recommended)
messages = [
    {"role": "system", "content": "Translate to Korean."},
    {"role": "user", "content": text},
]

# Tokenize the input
input_ids = tokenizer.apply_chat_template(
    messages,
    return_tensors="pt",
    add_generation_prompt=True,
).to(model.device)

# Generate the translation
outputs = model.generate(
    input_ids,
    max_new_tokens=256,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens
decoded = tokenizer.decode(outputs[0][input_ids.shape[1]:], skip_special_tokens=True)
print(f"Input: {text}")
print(f"Output: {decoded}")
# Output: 모델이 정상적으로 작동하고 있습니다.
```
## ⚙️ Training Details
The model was trained with optimized settings in a Kaggle T4 x 2 environment.
### Configuration
- Base Model: `LiquidAI/LFM2-1.2B`
- Dataset: `dataset_100000.jsonl` (English-Korean parallel corpus, 100k samples)
- Hardware: NVIDIA T4 GPU x 2 (Data Parallelism, DDP)
- Epochs: 1
- Batch Size: 1 per device (Gradient Accumulation 16) -> Effective Batch Size 32
- Optimizer: `paged_adamw_8bit`
- Learning Rate: 1e-5 (Cosine Scheduler, Warmup 0.1)
- Precision: FP16 mixed precision (optimized for T4)
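The effective batch size in the list above follows directly from the DDP setup:

```python
# Effective batch size = per-device batch x gradient-accumulation steps x number of GPUs
per_device_batch = 1
grad_accum_steps = 16
num_gpus = 2
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # → 32
```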
### Training Code Snippet
```python
from trl import SFTConfig

# SFTTrainer configuration used for v4
sft_config = SFTConfig(
    output_dir="/kaggle/working/lfm2-mt-v4",
    num_train_epochs=1,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    gradient_checkpointing=True,
    optim="paged_adamw_8bit",
    learning_rate=1e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    logging_steps=50,
    save_steps=500,
    eval_strategy="no",  # evaluation disabled for speed
    dataset_text_field="messages",
    packing=False,
    ddp_find_unused_parameters=False,
)
```
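The config above reads training text from the `messages` field (`dataset_text_field="messages"`). A plausible record layout for `dataset_100000.jsonl` is sketched below; the field contents are illustrative, mirroring the ChatML prompt from the Usage section, and the actual prompts in the dataset may differ:

```python
import json

# One hypothetical training record (JSONL = one JSON object per line)
record = {
    "messages": [
        {"role": "system", "content": "Translate to Korean."},
        {"role": "user", "content": "The model is working correctly now."},
        {"role": "assistant", "content": "모델이 정상적으로 작동하고 있습니다."},
    ]
}
line = json.dumps(record, ensure_ascii=False)
print(line)
```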
## ⚠️ Limitations
- As a small 1.2B-parameter model, it may underperform larger models (4B+) on highly complex or specialized text.
- Hallucinations may occur on rare words absent from the training data or on very long sentences.
## License
This model is released under the Liquid AI LFM Open License v1.0.

- Permitted: academic research and personal use without restriction.
- Commercial use: free for companies and individuals with annual revenue under USD 10 million (roughly 14 billion KRW).
- Restricted: companies exceeding USD 10 million in annual revenue require a separate license agreement with Liquid AI. See the LICENSE file for details.
## Citation
### Model
```bibtex
@misc{lfm2-1.2b-koen-mt-v4-100k,
  author = {Gyung},
  title = {LFM2-1.2B Korean-English Machine Translation Model v4},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/gyung/lfm2-1.2b-koen-mt-v4-100k}}
}
```
### Base Model (LiquidAI/LFM2-1.2B)
```bibtex
@article{liquidai2025lfm2,
  title = {LFM2 Technical Report},
  author = {Liquid AI},
  journal = {arXiv preprint arXiv:2511.23404},
  year = {2025}
}
```
### Evaluation Dataset (Flores-200)
```bibtex
@article{nllb2022,
  author = {NLLB Team and Costa-juss{\`a}, Marta R. and Cross, James and {\c{C}}elebi, Onur and et al.},
  title = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  year = {2022},
  journal = {arXiv preprint arXiv:2207.04672}
}
```
### Metrics
```bibtex
@inproceedings{popovic-2015-chrf,
  title = "{chrF}: character n-gram {F}-score for automatic {MT} evaluation",
  author = "Popovi{\'c}, Maja",
  booktitle = "Proceedings of the Tenth Workshop on Statistical Machine Translation",
  month = sep,
  year = "2015",
  address = "Lisbon, Portugal",
  publisher = "Association for Computational Linguistics",
  pages = "392--395",
}
```
```bibtex
@inproceedings{post-2018-call,
  title = "A Call for Clarity in Reporting {BLEU} Scores",
  author = "Post, Matt",
  booktitle = "Proceedings of the Third Conference on Machine Translation: Research Papers",
  month = oct,
  year = "2018",
  address = "Brussels, Belgium",
  publisher = "Association for Computational Linguistics",
  pages = "186--191",
}
```