# llama-3.1-8b-fft-simpleqa-ar-dclm-1to1-lr3e-4

Per the model name, this is a full fine-tune (FFT) of Llama 3.1 8B on a 1:1 mix of SimpleQA-AR and DCLM data at a 3e-4 learning rate.
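A minimal loading sketch with `transformers`. The repo id below is taken from the card's title and is a placeholder; adjust it to the actual namespace where the checkpoint is published.

```python
# Load the fine-tuned model in bf16 (the dtype it was trained in) and run
# a short generation. Repo id is a placeholder, not confirmed by the card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "llama-3.1-8b-fft-simpleqa-ar-dclm-1to1-lr3e-4"  # placeholder namespace
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```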

## Training Hyperparameters

| Parameter | Value |
|---|---|
| learning_rate | 0.0003 |
| num_train_epochs | 1.0 |
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 2 |
| weight_decay | 0.0 |
| warmup_ratio | 0.0 |
| warmup_steps | 37 |
| lr_scheduler_type | constant |
| optim | adamw_bnb_8bit |
| bf16 | True |
| fp16 | False |
| max_grad_norm | 1.0 |
| max_steps | 6000 |
| save_steps | 1000 |
| gradient_checkpointing | True |
| deepspeed | ZeRO stage 3 (full config below) |

DeepSpeed configuration (ZeRO stage 3, no optimizer or parameter offload):

```json
{
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "none" },
    "offload_param": { "device": "none" },
    "overlap_comm": true,
    "contiguous_gradients": true,
    "reduce_bucket_size": 500000000,
    "stage3_prefetch_bucket_size": 400000000,
    "stage3_param_persistence_threshold": 1000000,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
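For reference, a minimal sketch of how these values map onto `transformers.TrainingArguments`. The output directory and the DeepSpeed config path are placeholders, not from the card; the JSON above would be saved to that path.

```python
# Sketch reproducing the hyperparameter table with TrainingArguments.
# output_dir and the deepspeed path are assumed, not stated on the card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama-3.1-8b-fft-simpleqa-ar-dclm-1to1-lr3e-4",  # placeholder
    learning_rate=3e-4,
    num_train_epochs=1.0,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=2,
    weight_decay=0.0,
    warmup_ratio=0.0,
    warmup_steps=37,
    lr_scheduler_type="constant",
    optim="adamw_bnb_8bit",          # 8-bit AdamW via bitsandbytes
    bf16=True,
    fp16=False,
    max_grad_norm=1.0,
    max_steps=6000,
    save_steps=1000,
    gradient_checkpointing=True,
    deepspeed="ds_zero3.json",       # the ZeRO-3 config shown above, saved to a file
)
```

Note that with a constant scheduler, `warmup_steps=37` ramps the learning rate up to 3e-4 and then holds it flat for the remaining steps.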

## Training Results

- Total steps: 6000
- Best metric: None
- Best checkpoint: None