Model Card for Model ID
Dr. Yunsung Ji (Saxo), a data scientist at Linkbricks, a company specializing in AI and big data analytics, trained this Korean-language model from the meta-llama/Meta-Llama-3-8B base model with SFT-DPO training (8000 Tokens) on 8 H100-80G GPUs on GCP, with 4 hours of instruction training. The Accelerate and DeepSpeed ZeRO-3 libraries were used. The tokenizer is identical to Llama 3's; this version does not extend the Korean vocabulary. For a Korean-specific tokenizer model that includes more than 200,000 Korean vocabulary entries, please contact us separately.
www.linkbricks.com, www.linkbricks.vc
Configuration including bitsandbytes
import torch
from transformers import BitsAndBytesConfig, TrainingArguments

# torch_dtype, project_name, and run_name_str are defined elsewhere in the training script.

# 4-bit NF4 quantization, double quantization disabled
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch_dtype,
)

args = TrainingArguments(
    output_dir=project_name,
    run_name=run_name_str,
    overwrite_output_dir=True,
    num_train_epochs=20,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,  # 1
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",  # optim="adamw_8bit",
    logging_steps=10,
    save_steps=100,  # only applies when save_strategy="steps"
    save_strategy="epoch",
    learning_rate=2e-4,  # 2e-4
    weight_decay=0.01,
    max_grad_norm=1,  # 0.3
    max_steps=-1,
    warmup_ratio=0.1,
    group_by_length=False,
    # Use bf16 where the GPU supports it, otherwise fall back to fp16
    fp16=not torch.cuda.is_bf16_supported(),
    bf16=torch.cuda.is_bf16_supported(),  # fp16 = True,
    lr_scheduler_type="cosine",  # "constant",
    disable_tqdm=False,
    report_to="wandb",
    push_to_hub=False,
)
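To make the training arguments above more concrete, the following is a minimal sketch of what they imply for the effective batch size and the learning-rate curve. It assumes all 8 GPUs act as data-parallel ranks, and it uses a hypothetical total of 1000 optimizer steps, since the dataset size is not stated in this card. The helper names (effective_batch_size, lr_at_step) are illustrative and are not part of any library. The schedule follows the usual linear-warmup-then-cosine-decay shape and may differ in small details from what the trainer actually applied.

```python
import math

# Values copied from the TrainingArguments above.
PER_DEVICE_BATCH = 1
GRAD_ACCUM_STEPS = 4
PEAK_LR = 2e-4
WARMUP_RATIO = 0.1

# Assumption: all 8 H100s are data-parallel ranks.
NUM_GPUS = 8
# Assumption: hypothetical number of optimizer steps; the real value
# depends on the dataset size, which the card does not give.
TOTAL_STEPS = 1000


def effective_batch_size(per_device: int, grad_accum: int, num_gpus: int) -> int:
    """Number of sequences contributing to one optimizer update."""
    return per_device * grad_accum * num_gpus


def lr_at_step(step: int, total_steps: int,
               peak_lr: float = PEAK_LR,
               warmup_ratio: float = WARMUP_RATIO) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_GPUS))  # 32
    # Sample the curve at the start, mid-warmup, peak, midpoint of decay, and end.
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at_step(step, TOTAL_STEPS))
```

Under these assumptions, each optimizer update sees 32 sequences, and the learning rate reaches its 2e-4 peak after the first 10% of steps before decaying toward zero.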