
qwen3vl_8b_full_sft_data_v19_h100_fsdp_ep1

Model description: More information needed

Intended uses & limitations: More information needed

Training and evaluation data: More information needed

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 128
gradient_accumulation_steps: 2
total_train_batch_size: 512
total_eval_batch_size: 1024
optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1.0
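The batch-size entries above follow from simple arithmetic: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps (2 × 128 × 2 = 512), and total_eval_batch_size = eval_batch_size × num_devices (8 × 128 = 1024). The scheduler entries describe a linear warmup over the first 10% of optimizer steps followed by a cosine decay. The Python sketch below illustrates both. It is not the training code used for this model. It assumes the warmup length is ceil(total_steps × warmup_ratio) and that the cosine decays to zero, which is our reading of the default cosine schedule in Hugging Face Transformers. The helper names are hypothetical, and the dataset size in the usage section is a placeholder because this card does not state it.

```python
import math

# Values copied from the hyperparameter list above.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 2   # per device
EVAL_BATCH_SIZE = 8    # per device
NUM_DEVICES = 128
GRAD_ACCUM_STEPS = 2
WARMUP_RATIO = 0.1


def total_train_batch_size(per_device: int, devices: int, accum: int) -> int:
    """Samples consumed per optimizer step across all devices (hypothetical helper)."""
    return per_device * devices * accum


def cosine_lr_with_warmup(step: int, total_steps: int, base_lr: float,
                          warmup_ratio: float) -> float:
    """Linear warmup to base_lr, then half-cosine decay to 0 (hypothetical helper).

    Assumption: warmup steps = ceil(total_steps * warmup_ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(total_train_batch_size(TRAIN_BATCH_SIZE, NUM_DEVICES, GRAD_ACCUM_STEPS))  # 512
    print(EVAL_BATCH_SIZE * NUM_DEVICES)  # 1024 (no gradient accumulation at eval)

    # Hypothetical dataset size: the card does not report one.
    num_samples = 100_000
    total_steps = math.ceil(num_samples / 512)  # num_epochs = 1.0
    for step in (0, total_steps // 10, total_steps // 2, total_steps):
        print(step, cosine_lr_with_warmup(step, total_steps, LEARNING_RATE, WARMUP_RATIO))
```

With a single epoch, the learning rate peaks at 1e-05 once warmup ends and reaches zero at the final optimizer step.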

Safetensors
Model size: 68.5M params
Tensor type: BF16
