Huihui-Qwen3.5-9B-abliterated-Grimoire-SimPO

Testing Grimoire's SimPO implementation.

The learning rate was too high on this run, and the resulting model is unusable.

Training Configuration

| Parameter | Value |
|---|---|
| Training Mode | SimPO |
| Base Model | huihui-ai/Huihui-Qwen3.5-9B-abliterated |
| Learning Rate | 9e-05 |
| Epochs | 1 |
| Batch Size | 1 |
| Gradient Accumulation | 32 |
| Effective Batch Size | 32 |
| Max Sequence Length | 2048 |
| Optimizer | paged_adamw_8bit |
| LR Scheduler | cosine |
| Warmup Ratio | 0.05 |
| Weight Decay | 0.01 |
| Max Grad Norm | 0.25 |
| Seed | 42 |
| Beta | 0.1 |
| Max Prompt Length | 1024 |
| SimPO Gamma | 0.5 |
| LoRA Rank (r) | 128 |
| LoRA Alpha | 64 |
| LoRA Dropout | 0.05 |
| Target Modules | up_proj, down_proj, gate_proj, k_proj, q_proj, v_proj, o_proj |
| Quantization | 4-bit (NF4) |
| GPU | NVIDIA RTX A6000 |
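For reference, SimPO's objective length-normalizes the policy log-probabilities of the chosen and rejected responses and applies a target reward margin. A minimal sketch of the per-pair loss using the Beta (0.1) and SimPO Gamma (0.5) values from the table above; the function name and inputs are illustrative, not Grimoire's actual implementation:

```python
import math

def simpo_loss(chosen_logp: float, rejected_logp: float,
               chosen_len: int, rejected_len: int,
               beta: float = 0.1, gamma: float = 0.5) -> float:
    """SimPO loss for a single preference pair (illustrative sketch).

    chosen_logp / rejected_logp: summed token log-probabilities of the
    chosen and rejected responses under the policy model.
    SimPO divides each by its response length, scales by beta, and
    subtracts the target margin gamma before the sigmoid.
    """
    margin = (beta * chosen_logp / chosen_len
              - beta * rejected_logp / rejected_len
              - gamma)
    # -log(sigmoid(margin)), written in a numerically direct form
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

Because the reward is the length-normalized log-probability itself, SimPO needs no frozen reference model, which is part of its appeal over DPO.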

Trained with Merlina

Merlina on GitHub

