Training procedure

The following bitsandbytes quantization config was used during training:

  • load_in_8bit: True
  • load_in_4bit: False
  • llm_int8_threshold: 6.0
  • llm_int8_skip_modules: None
  • llm_int8_enable_fp32_cpu_offload: False
  • llm_int8_has_fp16_weight: False
  • bnb_4bit_quant_type: fp4
  • bnb_4bit_use_double_quant: False
  • bnb_4bit_compute_dtype: float32

Model Description

For more information on how it was created, check out the following link: https://github.com/DunnBC22/NLP_Projects/blob/main/OPT%20Models/Grade%20School%20Math%20Instructions%20Fine-Tune%20OPT.ipynb.

Intended uses & limitations

This is intended to show the possibilities. It is mainly limited by the input data.

Training & Evaluation Dataset

Dataset Source: https://huggingface.co/datasets/qwedsacf/grade-school-math-instructions

Hyperparameters Used

Hyperperameter Value
Model Checkpoint facebook/opt-2.7b
per_device_train_batch_size 4
gradient_accumulation_steps 4
fp16 True
warmup_steps 225
learning_rate 2e-4
Training Steps 450

Framework versions

Library Version
Python 3.10.1
Torch 2.0.1+cu118
Datasets 2.14.4
Transformer 4.31.0
PEFT 0.4.0

Metric

Perplexity = 6.35

License

This model is a fine-tuned version of Meta's OPT-2.7B model.

The original model is released under a custom license (Meta OPT license). The training dataset (grade-school-math-instructions) is derived from GSM8K and may have its own licensing terms.

As a result, this model is released under a non-standard license. Users must review and comply with all upstream licenses.

License Notice

This model is a fine-tuned derivative of a pretrained model. Users must comply with the original model license.

Dataset Notice

This model was fine-tuned on third-party datasets which may have separate licenses or usage restrictions.

Downloads last month
20
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Dataset used to train DunnBC22/opt-2.7b-Fine-Tuned-Grade_School_Math_Instructions