πŸ‡°πŸ‡·β†”οΈπŸ‡ΊπŸ‡Έ LFM2-v8-rl-10k-merged-GGUF

GGUF quantized versions of LFM2-v8-rl-10k-merged.

Fast inference with llama.cpp on CPU or GPU!

πŸ“Š Performance by quantization level (1,012 examples manually analyzed)

Conclusion: the 4-, 5-, and 8-bit versions are all practically identical to fp32!

| Quantization | chrF++ | BLEU | Size | Ξ” vs fp32 |
|---|---|---|---|---|
| fp32 (original) | 34.32 | 13.10 | 4.68G | - |
| Q8_0 πŸ† | 34.39 | 12.93 | 1.25G | +0.07 |
| Q5_K_M | 34.08 | 12.78 | 843M | -0.24 |
| Q4_K_M | 33.97 | 12.56 | 731M | -0.35 |
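The chrF++ numbers above come from a standard evaluation toolkit; as a rough illustration of what the metric measures, here is a simplified, stdlib-only character n-gram F-score (the real chrF++ as implemented in sacrebleu also mixes in word n-grams and handles tokenization differently, so this sketch will not reproduce the table's scores):

```python
from collections import Counter

def char_ngrams(text, n):
    # chrF-style character n-grams: whitespace is ignored
    text = text.replace(" ", "")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def chrf(hyp, ref, max_n=6, beta=2.0):
    """Simplified character n-gram F-beta score (0..100)."""
    precs, recs = [], []
    for n in range(1, max_n + 1):
        h, r = char_ngrams(hyp, n), char_ngrams(ref, n)
        if sum(h.values()) == 0 or sum(r.values()) == 0:
            continue  # no n-grams of this order in one of the strings
        overlap = sum((h & r).values())
        precs.append(overlap / sum(h.values()))
        recs.append(overlap / sum(r.values()))
    if not precs:
        return 0.0
    p, r = sum(precs) / len(precs), sum(recs) / len(recs)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta**2) * p * r / (beta**2 * p + r)
```

An identical hypothesis and reference score 100, disjoint strings score 0, and partial overlaps fall in between.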

πŸ” Results of manually reviewing 1,012 examples

Key findings:

  1. Over 90%: semantically identical translations across all quantized versions
  2. Differences: only word-choice variation
    • e.g. "μ œμ•ˆν–ˆλ‹€" (proposed) vs "λ§ν–ˆλ‹€" (said) vs "μ–ΈκΈ‰ν–ˆλ‹€" (mentioned)
  3. Hallucination patterns: occur identically regardless of quantization level
    • "George W. Bush" β†’ "μ‘°μ§€ μ›Œμ‹±ν„΄" (George Washington) (all versions)
    • "cheetahs" β†’ "κΈ°λ¦°" (giraffe) or "ν˜Έλž‘μ΄" (tiger) (all versions)

μ–‘μžν™”κ°€ λ²ˆμ—­ ν’ˆμ§ˆμ— λ―ΈμΉ˜λŠ” 영ν–₯: 거의 μ—†μŒ!

| Comparison | Q4 vs Q8 | Q8 vs fp32 |
|---|---|---|
| Semantic differences | ❌ None | ❌ None |
| Word choice | Slightly different | Nearly identical |
| Hallucination frequency | Same | Same |
| Repetition bug | ❌ None | fp32 only |

⚠️ In fact, it is the fp32 merged model that exhibits a repeated-output bug, at a rate under 0.1%! The GGUF quantized versions are more stable.

πŸ“¦ Available files

| File | Size | Recommended use |
|---|---|---|
| *-Q8_0.gguf | 1.25G | Quality and stability first πŸ† |
| *-Q5_K_M.gguf | 843M | Balanced choice |
| *-Q4_K_M.gguf | 731M | Lightweight / mobile |

πŸš€ Usage

llama-cpp-python (Python)

```python
from llama_cpp import Llama
from huggingface_hub import hf_hub_download

# Download the model (Q8_0 recommended)
model_path = hf_hub_download(
    "gyung/lfm2-1.2b-koen-mt-v8-rl-10k-merged-GGUF",
    "lfm2-1.2b-koen-mt-v8-rl-10k-merged-Q8_0.gguf"
)

# Load the model
llm = Llama(
    model_path=model_path,
    n_ctx=4096,
    n_gpu_layers=-1,  # GPU offload (-1: all layers)
    verbose=False
)

def translate(text, direction="en2ko"):
    if direction == "en2ko":
        system = "Translate the following text to Korean."
    else:
        system = "Translate the following text to English."

    prompt = f"""<|im_start|>system
{system}<|im_end|>
<|im_start|>user
{text}<|im_end|>
<|im_start|>assistant
"""
    output = llm(prompt, max_tokens=256, stop=["<|im_end|>"], temperature=0.3)
    return output['choices'][0]['text'].strip()

# Usage examples
print(translate("The weather is beautiful today."))
# β†’ 였늘 날씨가 정말 μ•„λ¦„λ‹΅μŠ΅λ‹ˆλ‹€.

print(translate("ν•œκ΅­ μŒμ‹μ΄ 정말 λ§›μžˆμ–΄μš”.", "ko2en"))
# β†’ Korean food is really delicious.
```
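The `translate` helper above takes an explicit `direction` argument. As a small convenience, direction can be auto-detected by checking the input for Hangul syllables; `detect_direction` below is a hypothetical helper, not part of the model or its chat template:

```python
def detect_direction(text):
    """Hypothetical helper: route Korean input to ko2en and everything
    else to en2ko, by scanning for Hangul syllables (U+AC00..U+D7A3)."""
    if any("\uac00" <= ch <= "\ud7a3" for ch in text):
        return "ko2en"
    return "en2ko"
```

With it, calls become `translate(text, detect_direction(text))` regardless of the input language.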

Colabμ—μ„œ GPU μ‚¬μš©

# 1. CUDA 지원 llama-cpp-python μ„€μΉ˜ (μ€‘μš”!)
!pip uninstall llama-cpp-python -y
!pip install llama-cpp-python==0.3.16 \
    --extra-index-url https://github.com/abetlen/llama-cpp-python/releases/download/v0.3.16-cu124

# 2. μœ„ μ½”λ“œ μ‹€ν–‰

llama.cpp CLI

```bash
llama-cli -hf gyung/lfm2-1.2b-koen-mt-v8-rl-10k-merged-GGUF \
    -p "Translate to Korean: Hello world"
```

πŸ’‘ Why is GGUF the better choice?

| Aspect | fp32/fp16 | GGUF Q8_0 |
|---|---|---|
| Size | 4.68GB | 1.25GB (3.7Γ— smaller) |
| Quality | chrF++ 34.32 | chrF++ 34.39 (on par or better) |
| Stability | Repetition bug present | βœ… Stable |
| Inference speed | Needs a GPU | Fast even on CPU |
| Use case | Further training | Production serving |

πŸ”— Related links

πŸ“œ License
