kanana-1.5-8b-rlhf
A Korean-language model fine-tuned with PPO (Proximal Policy Optimization).
Training Information
- Base Model: Kanana 1.5 8B
- Training Method: PPO (RLHF)
- Batch Size: 80
- Learning Rate: 1e-5
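For reference, the hyperparameters above could be expressed as a PPO configuration. This is a hypothetical sketch using TRL's PPOConfig; the card does not state which training framework was actually used, and the base-model identifier below is a placeholder, not from the card.

```python
# Hypothetical config fragment assuming TRL's PPO implementation.
# Only learning_rate and batch_size come from the card; everything
# else (framework, base checkpoint id) is an assumption.
from trl import PPOConfig

config = PPOConfig(
    model_name="<base-model-id>",  # placeholder for the Kanana 1.5 8B checkpoint
    learning_rate=1e-5,            # from the card
    batch_size=80,                 # from the card
)
```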
Usage
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jinn33/kanana-1.5-8b-rlhf")
model = AutoModelForCausalLM.from_pretrained("jinn33/kanana-1.5-8b-rlhf")

# The model expects the "### 질문: ... ### 답변:" prompt format.
prompt = "### 질문: 안녕하세요\n\n### 답변:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))