Qwen3-32B-3d-500K-50K-0.1-reverse-padzero-plus-mul-sub-99-256D-1L-2H-1024I

This model is a fine-tuned version of Qwen/Qwen3-32B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.4366

Model description

More information needed
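
The repository name does hint at a very small Qwen3-style architecture (256 hidden dimensions, 1 layer, 2 attention heads, 1024 intermediate size; "256D-1L-2H-1024I") trained on 3-digit arithmetic tasks (plus/mul/sub) with reversed, zero-padded operands, but none of this is confirmed by the card. A hypothetical config sketch under that reading of the name; with tied embeddings and a tiny arithmetic vocabulary, these dimensions would land near the reported 1.06M parameters:

```python
from transformers import Qwen3Config

# Hypothetical config inferred entirely from the repo name
# (256D-1L-2H-1024I); not confirmed anywhere in the card.
config = Qwen3Config(
    hidden_size=256,
    num_hidden_layers=1,
    num_attention_heads=2,
    num_key_value_heads=2,  # assumed; the name does not specify GQA
    intermediate_size=1024,
    # vocab_size is undocumented; the 1.06M parameter total implies
    # a very small vocabulary.
)
```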

Intended uses & limitations

More information needed
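
In the meantime, a minimal loading sketch, assuming the checkpoint is published on the Hub under the repo id in the title (the dtype choice follows the BF16 tensor type reported below):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = (
    "arithmetic-circuit-overloading/"
    "Qwen3-32B-3d-500K-50K-0.1-reverse-padzero-plus-mul-sub-99-256D-1L-2H-1024I"
)

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
```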

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 128
  • eval_batch_size: 128
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 5
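
A minimal sketch of the corresponding TrainingArguments, assuming library defaults for everything not listed above; output_dir is a placeholder, and the 500-step evaluation cadence is inferred from the results table below:

```python
from transformers import TrainingArguments

# Hyperparameters copied from the list above; everything else is assumed.
args = TrainingArguments(
    output_dir="qwen3-arith-finetune",  # hypothetical path
    learning_rate=1e-3,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=5,
    eval_strategy="steps",  # inferred from the 500-step cadence in the table
    eval_steps=500,
    bf16=True,              # matches the BF16 tensor type reported below
)
```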

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0      | 0     | 3.0621          |
| 1.7193        | 0.1280 | 500   | 1.6627          |
| 1.5122        | 0.2560 | 1000  | 1.5073          |
| 1.4812        | 0.3839 | 1500  | 1.4818          |
| 1.4713        | 0.5119 | 2000  | 1.4676          |
| 1.4652        | 0.6399 | 2500  | 1.4653          |
| 1.4599        | 0.7679 | 3000  | 1.4595          |
| 1.4559        | 0.8958 | 3500  | 1.4556          |
| 1.4557        | 1.0238 | 4000  | 1.4551          |
| 1.4524        | 1.1518 | 4500  | 1.4535          |
| 1.4533        | 1.2798 | 5000  | 1.4534          |
| 1.4529        | 1.4077 | 5500  | 1.4504          |
| 1.452         | 1.5357 | 6000  | 1.4509          |
| 1.4519        | 1.6637 | 6500  | 1.4495          |
| 1.4495        | 1.7917 | 7000  | 1.4478          |
| 1.4449        | 1.9196 | 7500  | 1.4466          |
| 1.4439        | 2.0476 | 8000  | 1.4450          |
| 1.442         | 2.1756 | 8500  | 1.4447          |
| 1.4447        | 2.3036 | 9000  | 1.4432          |
| 1.4432        | 2.4315 | 9500  | 1.4430          |
| 1.4407        | 2.5595 | 10000 | 1.4416          |
| 1.4404        | 2.6875 | 10500 | 1.4415          |
| 1.4429        | 2.8155 | 11000 | 1.4401          |
| 1.4418        | 2.9434 | 11500 | 1.4401          |
| 1.4386        | 3.0714 | 12000 | 1.4395          |
| 1.437         | 3.1994 | 12500 | 1.4387          |
| 1.4356        | 3.3274 | 13000 | 1.4385          |
| 1.4364        | 3.4553 | 13500 | 1.4379          |
| 1.4387        | 3.5833 | 14000 | 1.4375          |
| 1.4362        | 3.7113 | 14500 | 1.4373          |
| 1.4351        | 3.8393 | 15000 | 1.4370          |
| 1.4382        | 3.9672 | 15500 | 1.4369          |
| 1.4373        | 4.0952 | 16000 | 1.4368          |
| 1.4392        | 4.2232 | 16500 | 1.4367          |
| 1.4387        | 4.3512 | 17000 | 1.4367          |
| 1.438         | 4.4791 | 17500 | 1.4366          |
| 1.4345        | 4.6071 | 18000 | 1.4366          |
| 1.435         | 4.7351 | 18500 | 1.4366          |
| 1.4369        | 4.8631 | 19000 | 1.4366          |
| 1.4376        | 4.9910 | 19500 | 1.4366          |
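
Assuming the reported values are mean token cross-entropy in nats, as is standard for Transformers causal-LM training, the final validation loss of 1.4366 corresponds to a token-level perplexity of roughly exp(1.4366) ≈ 4.21:

```python
import math

# Perplexity implied by the final validation loss (cross-entropy in nats).
print(math.exp(1.4366))  # ≈ 4.21
```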

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 4.5.0
  • Tokenizers 0.22.1

Model size

  • 1.06M params (Safetensors, BF16)

Model tree for arithmetic-circuit-overloading/Qwen3-32B-3d-500K-50K-0.1-reverse-padzero-plus-mul-sub-99-256D-1L-2H-1024I

Base model

  • Qwen/Qwen3-32B