Mid-Training - Phase 003: HuggingFaceTB/smoltalk

#1
by mrs83 - opened
ethicalabs.ai org
edited Feb 15

[Screenshot of the training run, 2026-02-15 15:01]

After the previous unsuccessful attempt, which started SFT and DPO prematurely on an underfitted model, I decided to take a different approach: I am now continuing mid-training on a single AMD Ryzen AI Max+ 395 with TRL. Due to our budget/compute constraints, we are using QLoRA (4-bit). Unfortunately, the session crashed, but we are now resuming. 🧑‍🚒
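For anyone curious what this setup looks like in code, here is a minimal sketch of 4-bit QLoRA mid-training with TRL. The LoRA hyperparameters, batch/accumulation sizes, dataset config name, and checkpoint path are illustrative assumptions, not the exact values from this run:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# 4-bit NF4 quantization of the base model (QLoRA): quantized weights stay
# frozen; only the LoRA adapters receive gradients.
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter hyperparameters -- illustrative values only.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules="all-linear",
)

args = SFTConfig(
    output_dir="checkpoints/mid-training-phase-003",  # hypothetical path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    model_init_kwargs={"quantization_config": bnb, "torch_dtype": torch.bfloat16},
)

trainer = SFTTrainer(
    model="models/Echo-DSRN-Small-Kurtis-EON1-v0.1",
    args=args,
    peft_config=peft_config,
    train_dataset=load_dataset("HuggingFaceTB/smoltalk", "all", split="train"),
)

# Resuming after the crash: picks up from the latest checkpoint in output_dir.
trainer.train(resume_from_checkpoint=True)
```

Because only the low-rank adapters are trained on top of the frozen 4-bit base, this fits on a single consumer-class machine.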

ethicalabs.ai org
edited Feb 15

The linear decay schedule is likely one reason why this mid-training run is more stable than anything we tried before. We can't wait to see how it performs now on HellaSwag, PIQA, and SciQ.
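For reference, the "linear" schedule in transformers/TRL warms the learning rate up linearly and then decays it linearly to zero by the final step. A pure-Python sketch (the function name and signature are our own, not a library API):

```python
def linear_decay_lr(step: int, total_steps: int, base_lr: float,
                    warmup_steps: int = 0) -> float:
    """Linear warmup followed by linear decay to zero, mirroring the
    'linear' LR schedule used by transformers/TRL."""
    if step < warmup_steps:
        # Ramp up from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr at the end of warmup to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# With a base LR of 2e-4 over 1000 steps, the LR halves at the midpoint
# and reaches exactly zero at the end.
print(linear_decay_lr(500, 1000, 2e-4))   # 0.0001
print(linear_decay_lr(1000, 1000, 2e-4))  # 0.0
```

The steady, monotonic decrease (no restarts, no plateaus) is what tends to make late-stage loss curves calmer than, say, cosine-with-restarts.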


mrs83 changed discussion title from Mid-Training - Phase 002: HuggingFaceTB/smoltalk to Mid-Training - Phase 003: HuggingFaceTB/smoltalk
ethicalabs.ai org
| Tasks      | Version | Filter | n-shot | Metric   |  Value | Stderr   |
|------------|--------:|--------|-------:|----------|-------:|----------|
| hellaswag  |       1 | none   |      0 | acc      | 0.2913 | ± 0.0045 |
|            |         | none   |      0 | acc_norm | 0.3182 | ± 0.0046 |
| piqa       |       1 | none   |      0 | acc      | 0.6251 | ± 0.0113 |
|            |         | none   |      0 | acc_norm | 0.6224 | ± 0.0113 |
| sciq       |       1 | none   |      0 | acc      | 0.7280 | ± 0.0141 |
|            |         | none   |      0 | acc_norm | 0.6430 | ± 0.0152 |
| winogrande |       1 | none   |      0 | acc      | 0.5130 | ± 0.0140 |
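As a quick sanity check on these numbers: the Stderr lm-eval reports for an accuracy is approximately the standard error of the sample mean, which for 0/1 outcomes reduces to the binomial sqrt(p·(1−p)/n). SciQ's test split has 1,000 questions, and plugging in the 0.7280 accuracy reproduces the ± 0.0141 above. This is our own back-of-the-envelope verification, not part of the run output:

```python
import math

def acc_stderr(p: float, n: int) -> float:
    # Approximate standard error of an accuracy estimate over n 0/1 samples.
    return math.sqrt(p * (1 - p) / n)

# sciq: acc 0.7280 over the 1,000-question test split
print(round(acc_stderr(0.7280, 1000), 4))  # 0.0141
```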
ethicalabs.ai org
| Tasks    | Version | Filter | n-shot | Metric   | Value | Stderr   |
|----------|--------:|--------|-------:|----------|------:|----------|
| arc_easy |       1 | none   |      0 | acc      | 0.471 | ± 0.0102 |
|          |         | none   |      0 | acc_norm | 0.428 | ± 0.0102 |
ethicalabs.ai org

```shell
uv run lm_eval --model hf \
  --model_args pretrained=models/Echo-DSRN-Small-Kurtis-EON1-v0.1,trust_remote_code=True,device_map="auto" \
  --tasks openbookqa \
  --output_path ./results_sft_smoltalk_phase2 \
  --batch_size 1 \
  --apply_chat_template \
  --num_fewshot 3
```

| Tasks      | Version | Filter | n-shot | Metric   | Value | Stderr   |
|------------|--------:|--------|-------:|----------|------:|----------|
| openbookqa |       1 | none   |      3 | acc      | 0.152 | ± 0.0161 |
|            |         | none   |      3 | acc_norm | 0.324 | ± 0.0210 |
