Helium1-2B-Grimoire-ORPO

Testing Grimoire's ORPO implementation, using a LoRA adapter to post-train ChatML support into the base model.
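ChatML wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. As a minimal sketch (assuming this repo's tokenizer ships the ChatML chat template it was trained with), the format can be produced with `apply_chat_template`:

```python
from transformers import AutoTokenizer

# Assumption: the fine-tuned repo's tokenizer carries a ChatML chat template.
tokenizer = AutoTokenizer.from_pretrained("nbeerbower/Helium1-2B-Grimoire-ORPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does ORPO optimize?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Expected ChatML-style output:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What does ORPO optimize?<|im_end|>
# <|im_start|>assistant
```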

Training Configuration

| Parameter | Value |
|---|---|
| Training Mode | ORPO |
| Base Model | kyutai/helium-1-2b |
| Learning Rate | 9e-05 |
| Epochs | 1 |
| Batch Size | 2 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 32 |
| Max Sequence Length | 4096 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| ORPO Beta | 0.1 |
| Max Prompt Length | 2048 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | k_proj, o_proj, q_proj, v_proj, down_proj, gate_proj, up_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
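Merlina's training script isn't reproduced here, but the configuration above maps directly onto TRL's `ORPOTrainer` with a PEFT LoRA adapter over a 4-bit NF4-quantized base (QLoRA-style). The sketch below shows an equivalent setup; the preference dataset name is a placeholder since the card doesn't name it, and the effective batch size of 32 falls out of per-device batch size 2 × 16 gradient-accumulation steps.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

# 4-bit NF4 quantization for the frozen base weights (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "kyutai/helium-1-2b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("kyutai/helium-1-2b")

# LoRA adapter over all attention and MLP projections, per the table above.
peft_config = LoraConfig(
    r=128,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj",
                    "down_proj", "gate_proj", "up_proj"],
    task_type="CAUSAL_LM",
)

# ORPO hyperparameters from the table; ORPOConfig extends TrainingArguments.
args = ORPOConfig(
    output_dir="helium1-2b-grimoire-orpo",
    beta=0.1,                        # weight of the odds-ratio preference term
    max_length=4096,
    max_prompt_length=2048,
    learning_rate=9e-5,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # 2 x 16 = effective batch size 32
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.25,
    seed=42,
    bf16=True,
)

# Placeholder: any preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("your-org/your-preference-dataset", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```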

Trained with Merlina (see Merlina on GitHub).
