Long-Context Marin 8B - OpenThoughts3

This model is the final checkpoint from the exp2199a2_redo2 experiment, which fine-tunes the long-context-extended Marin-8B model on the OpenThoughts3 dataset.

Model Details

Training Hyperparameters

Parameter            Value
---------            -----
Epochs               5
Batch Size           512
Learning Rate        8e-5
Max Sequence Length  16384
LR Schedule          Cosine
Warmup               10%
Decay                0.9
Weight Decay         0.0
Beta1                0.9
Beta2                0.999
Hardware             TPU v4-512
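
The schedule rows above (cosine decay with 10% warmup, peak LR 8e-5) can be sketched as a small function. This is an illustration of a standard warmup-plus-cosine schedule, not the experiment's actual training code; `min_lr=0.0` is an assumption.

```python
import math

def lr_at(step, total_steps, peak_lr=8e-5, warmup_frac=0.10, min_lr=0.0):
    """Cosine LR schedule with linear warmup (sketch; min_lr is assumed)."""
    warmup_steps = int(total_steps * warmup_frac)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr over the first 10% of steps.
        return peak_lr * step / max(warmup_steps, 1)
    # Cosine decay from peak_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(total_steps - warmup_steps, 1)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))
```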

Training Notes

  • Era shuffling enabled (dataset shuffled every epoch)
  • Trained with Llama3-style rotary embeddings configured for 64k context
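
Era shuffling, mentioned above, draws a fresh permutation of the dataset each epoch so example order differs across passes. A minimal sketch (function and seed handling are illustrative assumptions, not the training code):

```python
import random

def era_shuffle(dataset, num_epochs, base_seed=0):
    """Yield one freshly shuffled copy of the dataset per epoch.

    Sketch of 'era shuffling': each epoch uses a different seed, so the
    per-epoch example order is distinct but reproducible.
    """
    for epoch in range(num_epochs):
        order = list(range(len(dataset)))
        # A different seed per epoch gives a different permutation per era.
        random.Random(base_seed + epoch).shuffle(order)
        yield [dataset[i] for i in order]
```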