Model Card for Saxo/Linkbricks-Horizon-AI-Korean-llama3-sft-dpo-8b-base

A Korean-language model trained by Dr. Yunsung Ji (Saxo), Director and data scientist at Linkbricks, a company specializing in AI and big data analytics. Using meta-llama/Meta-Llama-3-8B as the base model, it received roughly 4 hours of SFT-DPO instruction training (8,000-token context) on 8x H100-80G GPUs on GCP, using the Accelerate and DeepSpeed ZeRO-3 libraries. The tokenizer is identical to Llama 3's; this version does not extend the Korean vocabulary. For the Korean-specific tokenizer model containing more than 200,000 Korean tokens, please contact us separately.

www.linkbricks.com, www.linkbricks.vc
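
The model is published as BF16 safetensors and uses the stock Llama 3 tokenizer, so it loads directly with transformers. A minimal inference sketch (the prompt and generation settings below are illustrative, not the author's recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Saxo/Linkbricks-Horizon-AI-Korean-llama3-sft-dpo-8b-base"

tokenizer = AutoTokenizer.from_pretrained(model_id)  # identical to Llama 3's tokenizer
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are published in BF16
    device_map="auto",
)

prompt = "한국의 수도는 어디인가요?"  # "What is the capital of Korea?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```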

Training configuration, including BitsAndBytes quantization


```python
import torch
from transformers import BitsAndBytesConfig

# torch_dtype was defined outside the original snippet; BF16 where supported
# mirrors the fp16/bf16 switch in the TrainingArguments below.
torch_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                   # load base weights in 4-bit (QLoRA-style)
    bnb_4bit_use_double_quant=False,     # single quantization pass
    bnb_4bit_quant_type="nf4",           # NormalFloat4 data type
    bnb_4bit_compute_dtype=torch_dtype,  # matmuls computed in BF16/FP16
)
```
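
For context, a config like this is typically passed to `from_pretrained` so the base model is loaded in 4-bit for QLoRA-style fine-tuning. A minimal sketch, reusing the `bnb_config` above:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,  # apply the 4-bit NF4 settings at load time
    device_map="auto",
)
```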

```python
import torch
from transformers import TrainingArguments

# project_name and run_name_str are defined elsewhere in the training script.
args = TrainingArguments(
    output_dir=project_name,
    run_name=run_name_str,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # 1
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",  # optim="adamw_8bit"
    logging_steps=10,
    save_steps=100,           # ignored when save_strategy="epoch"
    save_strategy="epoch",
    learning_rate=2e-4,
    weight_decay=0.01,
    max_grad_norm=1,  # 0.3
    max_steps=-1,
    warmup_ratio=0.1,
    group_by_length=False,
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),  # fp16=True
    lr_scheduler_type="cosine",  # "constant"
    disable_tqdm=False,
    report_to="wandb",
    push_to_hub=False,
)
```
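
The card does not publish the trainer wiring or the training data. The sketch below shows how arguments like these are commonly passed to trl's SFTTrainer for the SFT stage; `train_dataset` and the LoRA hyperparameters are hypothetical placeholders, not the values used for this model (recent trl versions expect an `SFTConfig` rather than plain `TrainingArguments`). The DPO stage would follow the same pattern with trl's `DPOTrainer` and a preference dataset.

```python
from peft import LoraConfig
from trl import SFTTrainer

# Illustrative LoRA adapter settings (not published by the author).
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model=model,                  # the 4-bit base model loaded above
    args=args,                    # the TrainingArguments defined above
    train_dataset=train_dataset,  # hypothetical instruction dataset
    peft_config=peft_config,
)
trainer.train()
```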
