kennethge123/superglue-rte-gpt2-kd

Plainly Optimized Network

Dataset: SUPERGLUE (RTE task)

Trainer Hyperparameters:

lr = 5e-05
per_device_batch_size = 8
gradient_accumulation_steps = 2
weight_decay = 1e-09
seed = 42
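
As a rough illustration of how these settings combine, the sketch below stores them in a plain dictionary and computes the effective batch size per optimizer step. The key names mirror the list above. `num_devices` and the helper `effective_batch_size` are assumptions made for illustration, since the card does not state how many devices were used.

```python
# Hyperparameters exactly as listed on the card.
hyperparams = {
    "lr": 5e-05,
    "per_device_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "weight_decay": 1e-09,
    "seed": 42,
}


def effective_batch_size(params, num_devices=1):
    """Number of examples contributing to one optimizer step.

    num_devices is a hypothetical parameter; the card does not report it.
    """
    return (
        params["per_device_batch_size"]
        * params["gradient_accumulation_steps"]
        * num_devices
    )


print(effective_batch_size(hyperparams))
```

With a single device this works out to 8 × 2 = 16 examples per optimizer step.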

eval_loss	eval_accuracy	epoch
19.092	0.667	1.0
18.211	0.667	2.0
17.359	0.739	3.0
17.168	0.732	4.0
18.647	0.681	5.0
18.081	0.681	6.0
18.325	0.688	7.0
18.660	0.688	8.0
18.464	0.688	9.0
18.622	0.696	10.0
17.838	0.710	11.0
17.792	0.703	12.0
18.009	0.696	13.0
19.033	0.674	14.0
17.430	0.717	15.0
18.218	0.696	16.0
17.915	0.710	17.0
17.956	0.717	18.0
18.078	0.725	19.0
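
To make the table easier to read at a glance, the following sketch copies the rows above into a list and picks out the epoch with the highest accuracy and the epoch with the lowest loss. The variable names are illustrative, and the card does not say which checkpoint was actually published.

```python
# (eval_loss, eval_accuracy, epoch) rows copied from the table above.
rows = [
    (19.092, 0.667, 1), (18.211, 0.667, 2), (17.359, 0.739, 3),
    (17.168, 0.732, 4), (18.647, 0.681, 5), (18.081, 0.681, 6),
    (18.325, 0.688, 7), (18.660, 0.688, 8), (18.464, 0.688, 9),
    (18.622, 0.696, 10), (17.838, 0.710, 11), (17.792, 0.703, 12),
    (18.009, 0.696, 13), (19.033, 0.674, 14), (17.430, 0.717, 15),
    (18.218, 0.696, 16), (17.915, 0.710, 17), (17.956, 0.717, 18),
    (18.078, 0.725, 19),
]

# Row with the highest evaluation accuracy.
best_by_accuracy = max(rows, key=lambda r: r[1])
# Row with the lowest evaluation loss.
best_by_loss = min(rows, key=lambda r: r[0])
# Last logged row.
final = rows[-1]

print("best accuracy:", best_by_accuracy)
print("lowest loss:  ", best_by_loss)
print("final epoch:  ", final)
```

On these numbers, accuracy peaks at epoch 3 (0.739) and loss is lowest at epoch 4 (17.168), while the last logged epoch (19) reaches 0.725.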

Safetensors
Model size: 0.1B params
Tensor type: F32