dense_swe_100m_mult_reseg_ep20_goldfish

This model is a fine-tuned version of an unspecified base model, trained on the arrow dataset. It achieves the following result on the evaluation set:

  • Loss: 4.7766
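
To make the reported loss easier to interpret, it can be converted to perplexity and bits per token. This is a minimal sketch that assumes the loss is a mean per-token cross-entropy measured in nats, which the card does not state explicitly; the variable names are illustrative.

```python
import math

# Final evaluation loss as reported in this card.
eval_loss = 4.7766

# Assumption: eval_loss is mean per-token cross-entropy in nats.
perplexity = math.exp(eval_loss)          # exp(loss) gives perplexity
bits_per_token = eval_loss / math.log(2)  # convert nats to bits

print(f"perplexity: {perplexity:.1f}")
print(f"bits per token: {bits_per_token:.2f}")
```

If the loss was computed over a masked subset of tokens or with a different reduction, these derived figures would not be comparable with those of other models.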

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 32
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 32
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1331
  • training_steps: 13311
  • mixed_precision_training: Native AMP
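
The hyperparameters above can be sketched as keyword arguments for a training configuration, together with the effective batch size and the learning-rate schedule they imply. The dictionary keys follow `transformers.TrainingArguments` parameter names, but the mapping is an assumption: the card does not say how many devices were used (a single device is assumed here, consistent with 8 × 4 = 32), and `fp16=True` is a guess at what "Native AMP" corresponds to. The `linear_lr` helper is a hypothetical pure-Python illustration of linear warmup followed by linear decay, not the library's scheduler.

```python
# Values copied from the hyperparameter list; key names are assumed mappings.
hparams = {
    "learning_rate": 1e-4,
    "per_device_train_batch_size": 8,   # assumption: one device
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-6,
    "lr_scheduler_type": "linear",
    "warmup_steps": 1331,
    "max_steps": 13311,
    "fp16": True,                       # assumption for "Native AMP"
}

# Effective batch size per optimizer step (single device assumed).
effective_batch = (
    hparams["per_device_train_batch_size"]
    * hparams["gradient_accumulation_steps"]
)


def linear_lr(step: int) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    base_lr = hparams["learning_rate"]
    warmup = hparams["warmup_steps"]
    total = hparams["max_steps"]
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:", effective_batch)
for s in (0, 1331, 7321, 13311):
    print(f"step {s:>5}: lr = {linear_lr(s):.2e}")
```

Under these assumptions the learning rate peaks at 0.0001 when warmup ends at step 1331 and decays to zero at step 13311.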

Training results

| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 7.2408 | 0.7510 | 500 | 6.3540 |
| 5.555 | 1.5017 | 1000 | 5.4139 |
| 5.1183 | 2.2523 | 1500 | 4.9643 |
| 4.7224 | 3.0030 | 2000 | 4.6844 |
| 4.5149 | 3.7540 | 2500 | 4.5089 |
| 4.2696 | 4.5047 | 3000 | 4.3923 |
| 4.1498 | 5.2554 | 3500 | 4.3082 |
| 4.0338 | 6.0060 | 4000 | 4.2512 |
| 3.8773 | 6.7570 | 4500 | 4.2106 |
| 3.7121 | 7.5077 | 5000 | 4.2043 |
| 3.6599 | 8.2584 | 5500 | 4.2138 |
| 3.5884 | 9.0090 | 6000 | 4.2146 |
| 3.4329 | 9.7600 | 6500 | 4.2273 |
| 3.2886 | 10.5107 | 7000 | 4.2738 |
| 3.2625 | 11.2614 | 7500 | 4.3216 |
| 3.1992 | 12.0120 | 8000 | 4.3414 |
| 3.0634 | 12.7630 | 8500 | 4.3876 |
| 2.9406 | 13.5137 | 9000 | 4.4510 |
| 2.9123 | 14.2644 | 9500 | 4.5139 |
| 2.8744 | 15.0150 | 10000 | 4.5477 |
| 2.756 | 15.7661 | 10500 | 4.5891 |
| 2.6645 | 16.5167 | 11000 | 4.6535 |
| 2.6478 | 17.2674 | 11500 | 4.6994 |
| 2.6125 | 18.0180 | 12000 | 4.7239 |
| 2.5343 | 18.7691 | 12500 | 4.7530 |
| 2.49 | 19.5197 | 13000 | 4.7759 |
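
In the results above, validation loss reaches its minimum partway through training and then rises while training loss keeps falling, a pattern commonly read as overfitting. The sketch below locates the best logged checkpoint from the validation-loss column; the step and loss values are copied from the table, and the variable names are illustrative.

```python
# (step -> validation loss), copied from the training results table.
val_loss_by_step = {
    500: 6.3540, 1000: 5.4139, 1500: 4.9643, 2000: 4.6844,
    2500: 4.5089, 3000: 4.3923, 3500: 4.3082, 4000: 4.2512,
    4500: 4.2106, 5000: 4.2043, 5500: 4.2138, 6000: 4.2146,
    6500: 4.2273, 7000: 4.2738, 7500: 4.3216, 8000: 4.3414,
    8500: 4.3876, 9000: 4.4510, 9500: 4.5139, 10000: 4.5477,
    10500: 4.5891, 11000: 4.6535, 11500: 4.6994, 12000: 4.7239,
    12500: 4.7530, 13000: 4.7759,
}

# Logged step with the lowest validation loss.
best_step = min(val_loss_by_step, key=val_loss_by_step.get)
best_loss = val_loss_by_step[best_step]

# How far the last logged evaluation has drifted above the best one.
final_step = max(val_loss_by_step)
gap = val_loss_by_step[final_step] - best_loss

print(f"best validation loss {best_loss:.4f} at step {best_step}")
print(f"last logged step {final_step} is {gap:.4f} above the best")
```

Evaluations were logged only every 500 steps, so the true minimum may lie between logged steps. Whether an earlier checkpoint was saved is not stated in this card.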

Framework versions

  • Transformers 4.57.1
  • Pytorch 2.9.1+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1

Model size

  • Parameters: 0.2B (Safetensors)
  • Tensor type: F32