bert-turkish-uncased-270m

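This model is a fine-tuned version of a base model that this card does not name, trained on an unknown dataset. The card lists no final evaluation metrics; the last validation loss logged during training is 0.5092, at step 11000.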

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 32
eval_batch_size: 32
seed: 42
gradient_accumulation_steps: 16
total_train_batch_size: 512
optimizer: AdamW, fused PyTorch implementation (OptimizerNames.ADAMW_TORCH_FUSED), with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
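lr_scheduler_warmup_steps: 0.06 (a fractional value, so most likely a warmup ratio of 6% of the total training steps rather than a step count)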
num_epochs: 4
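
The hyperparameters above fit together arithmetically, and a short Python sketch can make that explicit. It rests on several assumptions that the card does not confirm: that the total batch size is the per-device batch size times the accumulation steps times the device count, that the logged 0.06 is a warmup ratio, and that the cosine schedule uses a linear warmup and decays to zero. The steps-per-epoch figure is inferred from the results table (step 1000 is logged at epoch 0.3360), and the names `num_devices`, `warmup_fraction`, and `lr_at` are illustrative only.

```python
import math

# Values copied from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 32              # per-device batch size
gradient_accumulation_steps = 16
total_train_batch_size = 512
num_epochs = 4
warmup_fraction = 0.06             # ASSUMPTION: the logged 0.06 is a ratio, not a step count

# ASSUMPTION: total batch = per-device batch * accumulation steps * devices.
num_devices = total_train_batch_size // (train_batch_size * gradient_accumulation_steps)

# The results table logs step 1000 at epoch 0.3360, which implies the epoch length.
steps_per_epoch = round(1000 / 0.3360)
total_steps = steps_per_epoch * num_epochs
warmup_steps = math.ceil(warmup_fraction * total_steps)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step.

    ASSUMPTION: linear warmup from 0, then cosine decay to 0. This is a
    common form of the schedule, not necessarily the exact one used here.
    """
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("devices:", num_devices)
    print("steps per epoch:", steps_per_epoch)
    print("total steps:", total_steps)
    print("warmup steps:", warmup_steps)
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(f"lr at step {step}: {lr_at(step):.2e}")
```

Under these assumptions the numbers imply a single device, roughly 2,976 optimizer steps per epoch, 11,904 steps in total, and about 715 warmup steps.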

Training Loss	Epoch	Step	Validation Loss
31.0270	0.3360	1000	1.2592
14.0337	0.6721	2000	0.8343
11.8579	1.0081	3000	0.7195
10.7548	1.3441	4000	0.6571
10.1131	1.6801	5000	0.6135
9.5222	2.0161	6000	0.5826
9.0712	2.3522	7000	0.5541
8.8246	2.6882	8000	0.5358
8.6404	3.0242	9000	0.5236
8.4261	3.3602	10000	0.5178
8.3842	3.6963	11000	0.5092
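
The training-loss column is roughly 16 times the validation-loss column, and 16 is also the value of gradient_accumulation_steps. One possible explanation, offered here only as an assumption and not confirmed by the card, is that the logged training loss was summed over the accumulation micro-batches instead of averaged. The Python sketch below rescales the logged values under that assumption so the two columns can be compared on a similar scale.

```python
# (training_loss, validation_loss) pairs copied from the table above, keyed by step.
results = {
    1000: (31.0270, 1.2592),
    2000: (14.0337, 0.8343),
    3000: (11.8579, 0.7195),
    4000: (10.7548, 0.6571),
    5000: (10.1131, 0.6135),
    6000: (9.5222, 0.5826),
    7000: (9.0712, 0.5541),
    8000: (8.8246, 0.5358),
    9000: (8.6404, 0.5236),
    10000: (8.4261, 0.5178),
    11000: (8.3842, 0.5092),
}

gradient_accumulation_steps = 16

# ASSUMPTION: the logged training loss is a sum over accumulation
# micro-batches, so dividing by the accumulation steps gives a per-batch mean.
rescaled = {
    step: train / gradient_accumulation_steps
    for step, (train, _val) in results.items()
}

if __name__ == "__main__":
    for step, (_train, val) in results.items():
        print(f"step {step:>5}  train/16 = {rescaled[step]:.4f}  val = {val:.4f}")
```

At step 11000 the rescaled training loss is about 0.524 against a validation loss of 0.5092, which is consistent with the assumption but does not prove it.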

Safetensors

Model size: 0.3B params
Tensor type: F32
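
The size and tensor type give a rough idea of how much space the weights take. The Python sketch below assumes about 270 million parameters, a figure read from the "270m" suffix of the model name and not stated exactly on the card, and it counts raw weight bytes only. The helper `weights_size_gb` is hypothetical and not part of any library.

```python
# Bytes per parameter for a few common tensor types.
BYTES_PER_PARAM = {"F32": 4, "F16": 2, "BF16": 2}


def weights_size_gb(num_params: int, tensor_type: str = "F32") -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[tensor_type] / 1e9


if __name__ == "__main__":
    # ASSUMPTION: 270M parameters, taken from the model name.
    approx_params = 270_000_000
    print(f"F32:  {weights_size_gb(approx_params, 'F32'):.2f} GB")
    print(f"BF16: {weights_size_gb(approx_params, 'BF16'):.2f} GB")
```

With these assumptions the F32 weights come to about 1.08 GB, and a half-precision copy would be about half of that.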