# Model Card for llama3-8B LoRA SFT

Llama 3 8B, supervised fine-tuning with LoRA.
## Model Details
0:"q_proj" 1:"v_proj" lora_alpha:32 lora_dropout:0.05 r:8 gamma:0.85 batch_size_training:4 gradient_accumulation_steps:4 lr:0.0001 num_epochs:3 optimizer:"AdamW" peft_method:"llama_adapter" trainable params: 3,407,872 || all params: 8,033,669,120 || trainable%: 0.0424
## Model Description
Training results (3 epochs):

- Average epoch time: 781 s
- Train loss: 0.2921
- Eval loss: 1.1255
- Eval perplexity: 3.0819
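The reported perplexity is consistent with the eval loss, since perplexity is the exponential of the cross-entropy loss; a quick check:

```python
import math

eval_loss = 1.125542402267456
print(math.exp(eval_loss))  # 3.08188..., matching the reported eval perplexity
```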
Peak memory during training:

- Max CUDA memory allocated: 48 GB
- Max CUDA memory reserved: 55 GB
- Peak active CUDA memory: 48 GB
- CUDA malloc retries: 0
- Peak CPU memory: 3 GB
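## Usage

A minimal inference sketch, assuming the LoRA adapter weights are loaded on top of the base checkpoint with `peft`; both repo ids below are placeholders, substitute the actual ones:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "meta-llama/Meta-Llama-3-8B"     # assumed base checkpoint
ADAPTER = "your-username/llama3-8b-lora-sft"  # hypothetical adapter repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter on top of the frozen base model
model = PeftModel.from_pretrained(base, ADAPTER)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```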