arithmetic-circuit-overloading/Qwen3-32B-3d-500K-50K-0.1-reverse-padzero-plus-mul-sub-99-64D-2L-4H-256I

This model is a fine-tuned version of Qwen/Qwen3-32B on an unknown dataset. On the evaluation set it reaches a final validation loss of 1.1826 (step 19500, epoch 4.9910).

Model description: More information needed

Intended uses & limitations: More information needed

Training and evaluation data: More information needed

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 128
eval_batch_size: 128
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.05
num_epochs: 5
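
To make the schedule implied by these hyperparameters concrete, here is a minimal sketch in plain Python of a cosine learning-rate schedule with linear warmup. The training-set size of 500,000 examples is an assumption (suggested by "500K" in the model name, and consistent with the logged step 500 falling at epoch 0.1280), as are single-device training and no gradient accumulation. The decay formula is the common half-cosine form; it is not confirmed to be the exact code path used for this run.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 1e-3
TRAIN_BATCH_SIZE = 128
NUM_EPOCHS = 5
WARMUP_RATIO = 0.05

# ASSUMPTION: 500,000 training examples, one device, no gradient accumulation.
NUM_TRAIN_EXAMPLES = 500_000

steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)
total_steps = steps_per_epoch * NUM_EPOCHS
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    print("total steps:", total_steps)
    print("warmup steps:", warmup_steps)
    for step in (0, warmup_steps, total_steps // 2, total_steps):
        print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the learning rate peaks at 0.001 once warmup ends and decays toward zero by the final step, which would be in line with the validation loss flattening out late in training.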

Training Loss	Epoch	Step	Validation Loss
No log	0	0	3.0606
1.758	0.1280	500	1.7425
1.5211	0.2560	1000	1.4843
1.2946	0.3839	1500	1.2896
1.2631	0.5119	2000	1.2621
1.2473	0.6399	2500	1.2442
1.2403	0.7679	3000	1.2364
1.2336	0.8958	3500	1.2347
1.228	1.0238	4000	1.2335
1.2237	1.1518	4500	1.2223
1.2198	1.2798	5000	1.2183
1.2173	1.4077	5500	1.2164
1.2138	1.5357	6000	1.2116
1.2124	1.6637	6500	1.2101
1.2073	1.7917	7000	1.2068
1.2033	1.9196	7500	1.2044
1.2017	2.0476	8000	1.2026
1.1995	2.1756	8500	1.2001
1.1991	2.3036	9000	1.1991
1.197	2.4315	9500	1.1953
1.1936	2.5595	10000	1.1949
1.192	2.6875	10500	1.1925
1.1921	2.8155	11000	1.1913
1.1894	2.9434	11500	1.1890
1.1883	3.0714	12000	1.1876
1.1851	3.1994	12500	1.1862
1.1841	3.3274	13000	1.1854
1.1839	3.4553	13500	1.1845
1.1845	3.5833	14000	1.1839
1.183	3.7113	14500	1.1835
1.1817	3.8393	15000	1.1832
1.1842	3.9672	15500	1.1829
1.1825	4.0952	16000	1.1828
1.1843	4.2232	16500	1.1827
1.1842	4.3512	17000	1.1826
1.1835	4.4791	17500	1.1826
1.1815	4.6071	18000	1.1826
1.1818	4.7351	18500	1.1826
1.1833	4.8631	19000	1.1826
1.1831	4.9910	19500	1.1826
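
As a rough aid for reading the table, the sketch below converts the first and last logged validation losses into perplexities. It assumes the reported loss is a mean cross-entropy measured in nats per token, which the card itself does not state; the function name is illustrative.

```python
import math

# Validation losses copied from the first and last rows of the table above.
INITIAL_VAL_LOSS = 3.0606  # step 0
FINAL_VAL_LOSS = 1.1826    # step 19500


def perplexity(loss: float) -> float:
    """Perplexity for a mean cross-entropy loss (ASSUMED to be in nats per token)."""
    return math.exp(loss)


if __name__ == "__main__":
    print(f"initial perplexity: {perplexity(INITIAL_VAL_LOSS):.2f}")
    print(f"final perplexity:   {perplexity(FINAL_VAL_LOSS):.2f}")
    print(f"loss reduction:     {INITIAL_VAL_LOSS - FINAL_VAL_LOSS:.4f}")
```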

Model size: 364k params
Tensor type: BF16
Format: Safetensors
