kanana-1.5-8b-rlhf

A Korean language model trained with PPO (Proximal Policy Optimization).

Training Details

  • Base Model: Kanana 1.5 8B
  • Training Method: PPO (RLHF)
  • Batch Size: 80
  • Learning Rate: 1e-5

Usage

from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("jinn33/kanana-1.5-8b-rlhf")
model = AutoModelForCausalLM.from_pretrained("jinn33/kanana-1.5-8b-rlhf")

# Single-turn prompt in the model's question/answer format
prompt = "### 질문: 안녕하세요\n\n### 답변:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
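The usage example hard-codes the prompt string. If you send multiple queries, a small helper keeps the "### 질문: / ### 답변:" template consistent. This is a minimal sketch assuming the single-turn format shown above; the template is inferred from the example, not an official specification:

def build_prompt(question: str) -> str:
    # Wrap a user question in the single-turn prompt format used in the
    # usage example above (assumed format, inferred from this model card).
    return f"### 질문: {question}\n\n### 답변:"

# Example: build_prompt("안녕하세요") yields the same prompt as the snippet above.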