Helium1-2B-Grimoire-ORPO

Testing Grimoire's ORPO implementation, using a LoRA adapter to post-train ChatML support into the base model.
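ChatML wraps each conversation turn in `<|im_start|>`/`<|im_end|>` markers. As a minimal sketch (assuming this repo's tokenizer ships the ChatML chat template it was trained with), the format can be produced with `apply_chat_template`:

```python
from transformers import AutoTokenizer

# Assumption: the fine-tuned repo's tokenizer carries a ChatML chat template.
tokenizer = AutoTokenizer.from_pretrained("nbeerbower/Helium1-2B-Grimoire-ORPO")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What does ORPO optimize?"},
]

prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
# Expected ChatML-style output:
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# What does ORPO optimize?<|im_end|>
# <|im_start|>assistant
```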

Training Configuration

| Parameter | Value |
|---|---|
| Training Mode | ORPO |
| Base Model | kyutai/helium-1-2b |
| Learning Rate | 9e-05 |
| Epochs | 1 |
| Batch Size | 2 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 32 |
| Max Sequence Length | 4096 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| ORPO Beta | 0.1 |
| Max Prompt Length | 2048 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | k_proj, o_proj, q_proj, v_proj, down_proj, gate_proj, up_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
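Merlina's training script isn't reproduced here, but the configuration above maps directly onto TRL's `ORPOTrainer` with a PEFT LoRA adapter over a 4-bit NF4-quantized base (QLoRA-style). The sketch below shows an equivalent setup; the preference dataset name is a placeholder since the card doesn't name it, and the effective batch size of 32 falls out of per-device batch size 2 × 16 gradient-accumulation steps.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from trl import ORPOConfig, ORPOTrainer

# 4-bit NF4 quantization for the frozen base weights (QLoRA-style).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "kyutai/helium-1-2b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("kyutai/helium-1-2b")

# LoRA adapter over all attention and MLP projections, per the table above.
peft_config = LoraConfig(
    r=128,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["k_proj", "o_proj", "q_proj", "v_proj",
                    "down_proj", "gate_proj", "up_proj"],
    task_type="CAUSAL_LM",
)

# ORPO hyperparameters from the table; ORPOConfig extends TrainingArguments.
args = ORPOConfig(
    output_dir="helium1-2b-grimoire-orpo",
    beta=0.1,                        # weight of the odds-ratio preference term
    max_length=4096,
    max_prompt_length=2048,
    learning_rate=9e-5,
    num_train_epochs=1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # 2 x 16 = effective batch size 32
    optim="paged_adamw_8bit",
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=0.25,
    seed=42,
    bf16=True,
)

# Placeholder: any preference dataset with prompt/chosen/rejected columns.
dataset = load_dataset("your-org/your-preference-dataset", split="train")

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
    peft_config=peft_config,
)
trainer.train()
```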

Trained with Merlina (see Merlina on GitHub).
