Harnessing Optimization Dynamics for Curvature-Informed Model Merging
Paper: 2509.11167
This model is a fine-tuned version of meta-llama/Llama-3.1-8B on the tulu3_mixture_coding dataset. At the last evaluation in the training results below (step 1100, about one epoch), it reaches a validation loss of 0.7736.
Training results, with training and validation loss logged every 100 steps over roughly one epoch:
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 0.8738 | 0.0909 | 100 | 0.8080 |
| 0.8557 | 0.1818 | 200 | 0.8015 |
| 0.8619 | 0.2726 | 300 | 0.7972 |
| 0.8201 | 0.3635 | 400 | 0.7945 |
| 0.8609 | 0.4544 | 500 | 0.7920 |
| 0.8175 | 0.5453 | 600 | 0.7903 |
| 0.8462 | 0.6361 | 700 | 0.7885 |
| 0.8307 | 0.7270 | 800 | 0.7850 |
| 0.8595 | 0.8179 | 900 | 0.7791 |
| 0.8116 | 0.9088 | 1000 | 0.7748 |
| 0.8221 | 0.9996 | 1100 | 0.7736 |
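The table can be checked with a few lines of arithmetic. The Python sketch below copies the logged rows and derives three things: an estimate of optimizer steps per epoch, the relative drop in validation loss from the first to the last evaluation, and whether validation loss fell at every evaluation. The helper names are hypothetical. The steps-per-epoch figure is only an estimate because the logged epoch values are rounded to four decimals.

```python
# Rows copied from the training-results table:
# (step, epoch, training loss, validation loss)
LOG = [
    (100, 0.0909, 0.8738, 0.8080),
    (200, 0.1818, 0.8557, 0.8015),
    (300, 0.2726, 0.8619, 0.7972),
    (400, 0.3635, 0.8201, 0.7945),
    (500, 0.4544, 0.8609, 0.7920),
    (600, 0.5453, 0.8175, 0.7903),
    (700, 0.6361, 0.8462, 0.7885),
    (800, 0.7270, 0.8307, 0.7850),
    (900, 0.8179, 0.8595, 0.7791),
    (1000, 0.9088, 0.8116, 0.7748),
    (1100, 0.9996, 0.8221, 0.7736),
]


def steps_per_epoch(log):
    """Estimate optimizer steps per epoch from the last logged (step, epoch) pair."""
    step, epoch, _, _ = log[-1]
    return step / epoch


def relative_val_improvement(log):
    """Fractional drop in validation loss from the first to the last evaluation."""
    first, last = log[0][3], log[-1][3]
    return (first - last) / first


def strictly_decreasing(values):
    """True if every value is lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    print(f"steps per epoch ~ {steps_per_epoch(LOG):.1f}")               # 1100.4
    print(f"validation loss drop: {relative_val_improvement(LOG):.2%}")  # 4.26%
    print("validation loss strictly decreasing:",
          strictly_decreasing([row[3] for row in LOG]))                  # True
    print("training loss strictly decreasing:",
          strictly_decreasing([row[2] for row in LOG]))                  # False
```

Validation loss drops at every evaluation (about 4.3% overall). The per-step training loss fluctuates between 0.81 and 0.87, so the validation column is the more reliable signal of progress in this run.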
Base model: meta-llama/Llama-3.1-8B