Mid-Training - Phase 003: HuggingFaceTB/smoltalk

#1
by mrs83 - opened
ethicalabs.ai org
edited Feb 15

[Screenshot of the training run, 2026-02-15 15:01]

After the previous unsuccessful attempt, which started SFT and DPO prematurely on an underfitted model, I decided to take a different approach: I am now continuing mid-training on a single AMD Ryzen AI Max+ 395 with TRL. Due to our budget/compute constraints, we are using QLoRA (4-bit). Unfortunately, the session crashed, but we are now resuming. 🧑‍🚒
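For anyone curious what this setup looks like in code, here is a minimal sketch of 4-bit QLoRA mid-training with TRL. The LoRA hyperparameters, batch/accumulation sizes, dataset config name, and checkpoint path are illustrative assumptions, not the exact values from this run:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the base model (QLoRA): quantized weights stay
# frozen; only the LoRA adapters receive gradients.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter hyperparameters -- illustrative values only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)

args = SFTConfig(
    output_dir="checkpoints/mid-training-phase-003",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    model_init_kwargs={"quantization_config": bnb, "torch_dtype": torch.bfloat16},
)

trainer = SFTTrainer(
    model="models/Echo-DSRN-Small-Kurtis-EON1-v0.1",
    args=args,
    peft_config=peft_config,
    train_dataset=load_dataset("HuggingFaceTB/smoltalk", "all", split="train"),
)

# Resuming after the crash: picks up from the latest checkpoint in output_dir.
trainer.train(resume_from_checkpoint=True)
```

Because only the low-rank adapters are trained on top of the frozen 4-bit base, this fits on a single consumer-class machine.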

ethicalabs.ai org
edited Feb 15

The linear decay schedule is likely one reason why this mid-training run is more stable than anything we tried before. We can't wait to see how it performs now on HellaSwag, PIQA, and SciQ.
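For reference, the "linear" schedule in transformers/TRL warms the learning rate up linearly and then decays it linearly to zero by the final step. A pure-Python sketch (the function name and signature are our own, not a library API):

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float,
                    warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero, mirroring the
    'linear' LR schedule used by transformers/TRL."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# With a base LR of 2e-4 over 1000 steps, the LR halves at the midpoint
# and reaches exactly zero at the end.
print(linear_decay_lr(500, 1000, 2e-4))   # 0.0001
print(linear_decay_lr(1000, 1000, 2e-4))  # 0.0
```

The steady, monotonic decrease (no restarts, no plateaus) is what tends to make late-stage loss curves calmer than, say, cosine-with-restarts.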


mrs83 changed discussion title from Mid-Training - Phase 002: HuggingFaceTB/smoltalk to Mid-Training - Phase 003: HuggingFaceTB/smoltalk
ethicalabs.ai org
| Tasks      | Version | Filter | n-shot | Metric   |  Value | Stderr   |
|------------|--------:|--------|-------:|----------|-------:|----------|
| hellaswag  |       1 | none   |      0 | acc      | 0.2913 | ± 0.0045 |
|            |         | none   |      0 | acc_norm | 0.3182 | ± 0.0046 |
| piqa       |       1 | none   |      0 | acc      | 0.6251 | ± 0.0113 |
|            |         | none   |      0 | acc_norm | 0.6224 | ± 0.0113 |
| sciq       |       1 | none   |      0 | acc      | 0.7280 | ± 0.0141 |
|            |         | none   |      0 | acc_norm | 0.6430 | ± 0.0152 |
| winogrande |       1 | none   |      0 | acc      | 0.5130 | ± 0.0140 |
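As a quick sanity check on these numbers: the Stderr lm-eval reports for an accuracy is approximately the standard error of the sample mean, which for 0/1 outcomes reduces to the binomial sqrt(p·(1−p)/n). SciQ's test split has 1,000 questions, and plugging in the 0.7280 accuracy reproduces the ± 0.0141 above. This is our own back-of-the-envelope verification, not part of the run output:

```python
import math

def acc_stderr(p: float, n: int) -> float:
    # Approximate standard error of an accuracy estimate over n 0/1 samples.
    return math.sqrt(p * (1 - p) / n)

# sciq: acc 0.7280 over the 1,000-question test split
print(round(acc_stderr(0.7280, 1000), 4))  # 0.0141
```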
ethicalabs.ai org
| Tasks    | Version | Filter | n-shot | Metric   | Value | Stderr   |
|----------|--------:|--------|-------:|----------|------:|----------|
| arc_easy |       1 | none   |      0 | acc      | 0.471 | ± 0.0102 |
|          |         | none   |      0 | acc_norm | 0.428 | ± 0.0102 |
ethicalabs.ai org

```shell
uv run lm_eval --model hf \
  --model_args pretrained=models/Echo-DSRN-Small-Kurtis-EON1-v0.1,trust_remote_code=True,device_map="auto" \
  --tasks openbookqa \
  --output_path ./results_sft_smoltalk_phase2 \
  --batch_size 1 \
  --apply_chat_template \
  --num_fewshot 3
```

| Tasks      | Version | Filter | n-shot | Metric   | Value | Stderr   |
|------------|--------:|--------|-------:|----------|------:|----------|
| openbookqa |       1 | none   |      3 | acc      | 0.152 | ± 0.0161 |
|            |         | none   |      3 | acc_norm | 0.324 | ± 0.0210 |
