# Long-Context Marin 8B - OpenThoughts3
This model is the final checkpoint from the `exp2199a2_redo2` experiment, which fine-tunes the long-context-extended Marin 8B model on the OpenThoughts3 dataset.
## Model Details
- Base Model: `tootsie-8b-giraffe-phase3-64k` (Marin 8B with 64k context extension)
- Training Dataset: OpenThoughts3-1.2M (1.2M examples)
- Final Checkpoint: `step-11718`
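The final step count is consistent with the dataset size, epoch count, and batch size reported in the hyperparameter table below, as a quick sanity check shows:

```python
# Sanity check: the final checkpoint step follows from the training
# configuration reported on this card (1.2M examples, 5 epochs,
# global batch size 512).
examples = 1_200_000
epochs = 5
batch_size = 512

steps = examples * epochs // batch_size
print(steps)  # 11718, matching the step-11718 checkpoint
```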
## Training Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 5 |
| Batch Size | 512 |
| Learning Rate | 8e-5 |
| Max Sequence Length | 16384 |
| LR Schedule | Cosine |
| Warmup | 10% |
| Decay | 0.9 |
| Weight Decay | 0.0 |
| Beta1 | 0.9 |
| Beta2 | 0.999 |
| Hardware | TPU v4-512 |
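The learning-rate schedule in the table can be sketched as a cosine decay with 10% linear warmup. This is an illustrative reconstruction, not the training code: the exact implementation (and the meaning of the `Decay | 0.9` row, which this sketch does not model) is not specified on the card.

```python
import math

# Sketch of a cosine LR schedule with 10% linear warmup, using the
# peak LR and total step count from this card. Assumes decay to zero
# at the final step; the actual training implementation may differ.
PEAK_LR = 8e-5
TOTAL_STEPS = 11718
WARMUP_STEPS = int(0.1 * TOTAL_STEPS)  # 10% warmup

def lr_at(step: int) -> float:
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * PEAK_LR * (1.0 + math.cos(math.pi * progress))

print(lr_at(WARMUP_STEPS))  # peak: 8e-05
print(lr_at(TOTAL_STEPS))   # end of schedule: 0.0
```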
## Training Notes
- Era shuffling enabled (dataset shuffled every epoch)
- Trained with Llama3-style rotary embeddings configured for 64k context
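A Llama3-style rotary-embedding setup for a 64k context is typically expressed as a `rope_scaling` block in a transformers-style model config. The card only states that such embeddings were used; the specific factor values below are assumptions for illustration, not the model's actual configuration.

```python
# Hypothetical rope_scaling block in the transformers "llama3" format.
# All numeric values are assumptions chosen to illustrate a 64k
# extension from an assumed 8k original context (factor = 64k / 8k).
rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,                               # assumed scaling factor
    "original_max_position_embeddings": 8192,    # assumed original context
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
}
max_position_embeddings = 65536  # 64k extended context window
```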