Model Card for ambrosfitz/neural-history-chat-v1.5

An updated version of Neural History Chat, fine-tuned from the previous version (v1.0) on the mighty-history-merge dataset.

Model Details

Run history:

train/epoch	▁▁▂▂▃▃▃▄▄▅▅▅▆▆▇▇▇██
train/global_step	▁▁▂▂▃▃▃▄▄▅▅▅▆▆▇▇▇██
train/learning_rate	▂▃▅▆▇█▇▇▆▆▅▄▄▃▃▂▂▁
train/loss	█▆▄▃▃▃▃▃▂▃▂▂▁▁▁▁▂▁
train/total_flos	▁
train/train_loss	▁
train/train_runtime	▁
train/train_samples_per_second	▁
train/train_steps_per_second	▁

Run summary:

train/epoch	1.98
train/global_step	92
train/learning_rate	0.0
train/loss	0.7792
train/total_flos	1.756453697101824e+16
train/train_loss	1.30356
train/train_runtime	1176.2194
train/train_samples_per_second	10.068
train/train_steps_per_second	0.078
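
For orientation, here is a minimal sketch of the kind of Hugging Face `Trainer` configuration that would produce a run like the one logged above. The hyperparameters are assumptions inferred from the summary (about 2 epochs, 92 steps, a warmup-then-decay learning-rate schedule, an effective batch of roughly samples/s ÷ steps/s ≈ 128), not the actual training recipe.

```python
# Hypothetical sketch of a Trainer setup that would log a run like the one
# above (~2 epochs, 92 steps, LR warmup then decay to 0, W&B reporting).
# Every value is an assumption unless the comment says otherwise.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="neural-history-chat-v1.5",
    num_train_epochs=2,                # matches the ~2-epoch run summary
    per_device_train_batch_size=16,    # assumed split; the summary's
    gradient_accumulation_steps=8,     # samples/s / steps/s implies an
                                       # effective batch of ~128
    learning_rate=2e-4,                # assumed peak LR
    lr_scheduler_type="cosine",        # consistent with the LR rising,
    warmup_ratio=0.3,                  # then decaying to 0.0 in the sparkline
    logging_steps=5,                   # ~18 log points over 92 steps
    fp16=True,                         # checkpoint is stored as F16
    report_to="wandb",                 # source of the run history/summary
)
```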

Training Explained

We went with a shorter training run of roughly 2 epochs for testing and evaluation. More steps/epochs may come in the future, but Colab pricing is steep. Merging the PEFT adapter back into the base model currently requires roughly 40 GB of GPU RAM, so renting a Google Colab A100 is necessary, and that runs through credits quickly.
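
For reference, a minimal sketch of that merge step using the `peft` library; the base-model and adapter paths below are placeholders, not the actual repos:

```python
# Minimal sketch of merging a PEFT/LoRA adapter back into its base model.
# Paths are placeholders, not the actual checkpoints used for this run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "base-model-id",            # placeholder: the v1.0 base checkpoint
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "adapter-dir")  # placeholder adapter path

# merge_and_unload() folds the LoRA weights into the base weights and
# returns a plain transformers model; holding the full model in memory
# for this step is what drives the ~40 GB figure mentioned above.
merged = model.merge_and_unload()

merged.save_pretrained("neural-history-chat-v1.5", safe_serialization=True)
AutoTokenizer.from_pretrained("base-model-id").save_pretrained("neural-history-chat-v1.5")
```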

Model size: 7B parameters (Safetensors, F16)
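This model isn't deployed by any hosted inference provider, so it has to be run locally. A minimal sketch with `transformers`, assuming the checkpoint behaves as a standard causal LM (the prompt format is also an assumption, since the card doesn't specify one):

```python
# Minimal local-inference sketch; the prompt style is an assumption,
# as the card does not document a chat/prompt template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ambrosfitz/neural-history-chat-v1.5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # matches the F16 tensor type above
    device_map="auto",
)

prompt = "What caused the fall of the Western Roman Empire?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```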

Datasets used to train ambrosfitz/neural-history-chat-v1.5

mighty-history-merge