qwen-1_5b-sft-eng-hin-deu

This model is a fine-tuned version of Qwen/Qwen2.5-1.5B-Instruct on the aya_eng_hin_deu_train dataset. It achieves the following results on the evaluation set:

Loss: 1.1755

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 2
eval_batch_size: 1
seed: 42
distributed_type: multi-GPU
num_devices: 2
total_train_batch_size: 4
total_eval_batch_size: 2
optimizer: Use OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1.0

Training results

Training Loss	Epoch	Step	Validation Loss
1.1385	0.0500	500	1.3436
1.6044	0.1000	1000	1.3689
1.3966	0.1501	1500	1.3523
1.1879	0.2001	2000	1.3347
1.4383	0.2501	2500	1.3180
1.1371	0.3001	3000	1.3040
1.7056	0.3501	3500	1.2872
1.1809	0.4002	4000	1.2741
1.3698	0.4502	4500	1.2622
1.6436	0.5002	5000	1.2495
1.1414	0.5502	5500	1.2348
1.0521	0.6002	6000	1.2228
1.3184	0.6503	6500	1.2088
1.0562	0.7003	7000	1.1995
1.277	0.7503	7500	1.1915
1.0233	0.8003	8000	1.1840
1.2328	0.8503	8500	1.1795
1.331	0.9004	9000	1.1768
1.3374	0.9504	9500	1.1758

Framework versions

Transformers 4.57.1
Pytorch 2.9.1+cu128
Datasets 4.0.0
Tokenizers 0.22.1

Downloads last month: 2

Safetensors

Model size

2B params

Tensor type

BF16

Model tree for Mayank6255/qwen-1_5b-sft-eng-hin-deu

Base model

Qwen/Qwen2.5-1.5B

Finetuned

Qwen/Qwen2.5-1.5B-Instruct

Finetuned

(1503)

this model