Mid-Training - Phase 006: Smoltalk2 (No Thinking)
#5
by mrs83 - opened
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4520 | ± | 0.0102 |
| | | none | 0 | acc_norm | ↑ | 0.3889 | ± | 0.0100 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2891 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3071 | ± | 0.0046 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6219 | ± | 0.0113 |
| | | none | 0 | acc_norm | ↑ | 0.6034 | ± | 0.0114 |
| sciq | 1 | none | 0 | acc | ↑ | 0.7180 | ± | 0.0142 |
| | | none | 0 | acc_norm | ↑ | 0.6190 | ± | 0.0154 |
| truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.2729 | ± | 0.0156 |
| truthfulqa_mc2 | 3 | none | 0 | acc | ↑ | 0.4246 | ± | 0.0154 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.5091 | ± | 0.0141 |
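For comparing these runs programmatically, a small helper (hypothetical, not part of lm-eval) can pull the metric values out of the markdown tables above into a dict; it assumes the 9-column layout lm-eval prints, where continuation rows leave the task cell empty:

```python
def parse_lmeval_table(md: str) -> dict:
    """Parse an lm-eval markdown results table into {task: {metric: (value, stderr)}}."""
    results, task = {}, None
    for line in md.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip headers, separator rows, and anything that isn't a 9-column data row.
        if len(cells) < 9 or cells[0].startswith("-") or cells[0] == "Tasks":
            continue
        if cells[0]:  # continuation rows (acc_norm) leave the task cell empty
            task = cells[0]
        metric, value, stderr = cells[4], cells[6], cells[8]
        results.setdefault(task, {})[metric] = (float(value), float(stderr))
    return results
```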
Phase 6.1 - Last checkpoint (post-DPO test)
ethicalabs@pop-os:~/Workspace/Echo-DSRN$ uv run lm_eval --model hf --model_args pretrained=models/Echo-DSRN-Small-Kurtis-EON1-v0.4-DPO,trust_remote_code=True,device_map="auto" --tasks truthfulqa_mc1,truthfulqa_mc2,hellaswag,arc_easy,winogrande,piqa,sciq --output_path ./results_sft_smoltalk_phase6.1 --batch_size 4
2026-02-24:02:14:11 INFO [__main__:465] Selected Tasks: ['truthfulqa_mc1', 'truthfulqa_mc2', 'hellaswag', 'arc_easy', 'winogrande', 'piqa', 'sciq']
2026-02-24:02:14:11 INFO [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2026-02-24:02:14:11 INFO [evaluator:240] Initializing hf model, with arguments: {'pretrained': 'models/Echo-DSRN-Small-Kurtis-EON1-v0.4-DPO', 'trust_remote_code': True,
'device_map': 'auto'}
2026-02-24:02:14:12 INFO [models.huggingface:158] Using device 'cuda'
2026-02-24:02:14:12 INFO [models.huggingface:545] Model type cannot be determined. Using default model type 'causal'
2026-02-24:02:14:12 INFO [models.huggingface:426] Model parallel was set to False.
2026-02-24:02:14:28 INFO [tasks:695] Selected tasks:
2026-02-24:02:14:28 INFO [tasks:686] Task: sciq (sciq/sciq.yaml)
2026-02-24:02:14:28 INFO [tasks:686] Task: piqa (piqa/piqa.yaml)
2026-02-24:02:14:28 INFO [tasks:686] Task: winogrande (winogrande/default.yaml)
2026-02-24:02:14:28 INFO [tasks:686] Task: arc_easy (arc/arc_easy.yaml)
2026-02-24:02:14:28 INFO [tasks:686] Task: hellaswag (hellaswag/hellaswag.yaml)
2026-02-24:02:14:28 INFO [tasks:686] Task: truthfulqa_mc2 (truthfulqa/truthfulqa_mc2.yaml)
2026-02-24:02:14:28 INFO [tasks:686] Task: truthfulqa_mc1 (truthfulqa/truthfulqa_mc1.yaml)
2026-02-24:02:14:28 INFO [api.task:434] Building contexts for sciq on rank 0...
100%|██████████| 1000/1000 [00:00<00:00, 2406.38it/s]
2026-02-24:02:14:29 INFO [api.task:434] Building contexts for piqa on rank 0...
100%|██████████| 1838/1838 [00:00<00:00, 4130.78it/s]
2026-02-24:02:14:29 INFO [api.task:434] Building contexts for winogrande on rank 0...
100%|██████████| 1267/1267 [00:00<00:00, 304398.17it/s]
2026-02-24:02:14:29 INFO [api.task:434] Building contexts for arc_easy on rank 0...
100%|██████████| 2376/2376 [00:00<00:00, 4372.78it/s]
2026-02-24:02:14:30 INFO [api.task:434] Building contexts for hellaswag on rank 0...
100%|██████████| 10042/10042 [00:01<00:00, 7852.27it/s]
2026-02-24:02:14:31 INFO [api.task:434] Building contexts for truthfulqa_mc2 on rank 0...
100%|██████████| 817/817 [00:00<00:00, 2516.28it/s]
2026-02-24:02:14:32 INFO [api.task:434] Building contexts for truthfulqa_mc1 on rank 0...
100%|██████████| 817/817 [00:00<00:00, 2577.81it/s]
2026-02-24:02:14:32 INFO [evaluator:574] Running loglikelihood requests
Running loglikelihood requests: 100%|██████████| 69875/69875 [42:31<00:00, 27.38it/s]
2026-02-24:02:57:17 INFO [loggers.evaluation_tracker:209] Saving results aggregated
hf (pretrained=models/Echo-DSRN-Small-Kurtis-EON1-v0.4-DPO,trust_remote_code=True,device_map=auto), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 4
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4689 | ± | 0.0102 |
| | | none | 0 | acc_norm | ↑ | 0.4158 | ± | 0.0101 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2915 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3190 | ± | 0.0047 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6306 | ± | 0.0113 |
| | | none | 0 | acc_norm | ↑ | 0.6143 | ± | 0.0114 |
| sciq | 1 | none | 0 | acc | ↑ | 0.7520 | ± | 0.0137 |
| | | none | 0 | acc_norm | ↑ | 0.6780 | ± | 0.0148 |
| truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.2411 | ± | 0.0150 |
| truthfulqa_mc2 | 3 | none | 0 | acc | ↑ | 0.4251 | ± | 0.0151 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.5122 | ± | 0.0140 |
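Before reading too much into the phase 6 → 6.1 deltas, it is worth checking them against the reported standard errors. A quick sketch (treating the two runs as independent and dividing each delta by the combined stderr, a z-style check):

```python
import math

# acc (value, stderr) copied from the two tables above: phase 6 vs. phase 6.1 (post-DPO).
phase6 = {"arc_easy": (0.4520, 0.0102), "hellaswag": (0.2891, 0.0045),
          "piqa": (0.6219, 0.0113), "sciq": (0.7180, 0.0142),
          "truthfulqa_mc1": (0.2729, 0.0156), "truthfulqa_mc2": (0.4246, 0.0154),
          "winogrande": (0.5091, 0.0141)}
phase61 = {"arc_easy": (0.4689, 0.0102), "hellaswag": (0.2915, 0.0045),
           "piqa": (0.6306, 0.0113), "sciq": (0.7520, 0.0137),
           "truthfulqa_mc1": (0.2411, 0.0150), "truthfulqa_mc2": (0.4251, 0.0151),
           "winogrande": (0.5122, 0.0140)}

def z_scores(a, b):
    """Per-task delta divided by the combined stderr sqrt(se_a^2 + se_b^2)."""
    return {t: (b[t][0] - a[t][0]) / math.hypot(a[t][1], b[t][1]) for t in a}

z = z_scores(phase6, phase61)
significant = [t for t, v in z.items() if abs(v) > 1.96]  # ~95% threshold
```

Under this check none of the per-task changes clears the 95% threshold (sciq comes closest at z ≈ 1.72), so the DPO deltas are within noise task by task.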
Phase 6.2 - Smoltalk2 (No Thinking, Mid-Training)
ethicalabs@pop-os:~/Workspace/Echo-DSRN$ uv run python -m echo_hf.talk --model_path outputs/phase7_sft/checkpoint-2500/ --chat --temperature 0.2
Using device: cuda
Loading model from outputs/phase7_sft/checkpoint-2500/...
Detected LoRA adapter at outputs/phase7_sft/checkpoint-2500/
Loading base model from models/Echo-DSRN-Small-Kurtis-EON1-v0.4-DPO/...
The module name (originally ) is not a valid Python identifier. Please rename the original module to avoid import issues.
Loading adapter...
Loading tokenizer from models/Echo-DSRN-Small-Kurtis-EON1-v0.4-DPO/...
Starting interactive chat with outputs/phase7_sft/checkpoint-2500/...
Type 'exit' or 'quit' to stop.
Type 'reset' to clear conversation history.
You: Who are you?
Echo: /home/ethicalabs/Workspace/Echo-DSRN/echo_hf/modeling_echo.py:338: UserWarning: Flash Efficient attention on Current AMD GPU is still experimental. Enable it with TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1. (Triggered internally at /pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:323.)
y = attn_fn(q, k, v, is_causal=is_causal)
/home/ethicalabs/Workspace/Echo-DSRN/echo_hf/modeling_echo.py:338: UserWarning: Mem Efficient attention on Current AMD GPU is still experimental. Enable it with TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1. (Triggered internally at /pytorch/aten/src/ATen/native/transformers/hip/sdp_utils.cpp:383.)
y = attn_fn(q, k, v, is_causal=is_causal)
I am Kurtis-EON1, an AI developed by ethicalabs.ai. I'm a creative assistant designed to help people with artistic and technical needs. We craft high-quality art that reflects the unique qualities of each person we serve. Our goal is to make everyday objects feel more authentic and engaging than ever before.
We're here to support artists in their journey towards self-expression and transformation. What do you think about creating art that feels like it belongs on your shelves or at home?
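The UserWarnings interleaved with the reply suggest enabling PyTorch's experimental ROCm SDPA kernels. A minimal sketch, assuming the variable must be set before `torch` is imported (the kernels are experimental per the warning itself, so behaviour on a given AMD GPU is not guaranteed):

```python
import os

# Enable the experimental aotriton flash/mem-efficient attention paths on ROCm,
# as suggested by the warning; set this before importing torch.
os.environ["TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL"] = "1"
```

Alternatively, export it in the shell before launching `echo_hf.talk`.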
Phase 6.3 - Smoltalk2 (No Thinking, Mid-Training)
| Tasks | Version | Filter | n-shot | Metric | | Value | | Stderr |
|---|---|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc | ↑ | 0.4764 | ± | 0.0102 |
| | | none | 0 | acc_norm | ↑ | 0.4306 | ± | 0.0102 |
| hellaswag | 1 | none | 0 | acc | ↑ | 0.2914 | ± | 0.0045 |
| | | none | 0 | acc_norm | ↑ | 0.3164 | ± | 0.0046 |
| piqa | 1 | none | 0 | acc | ↑ | 0.6289 | ± | 0.0113 |
| | | none | 0 | acc_norm | ↑ | 0.6202 | ± | 0.0113 |
| sciq | 1 | none | 0 | acc | ↑ | 0.7620 | ± | 0.0135 |
| | | none | 0 | acc_norm | ↑ | 0.6680 | ± | 0.0149 |
| truthfulqa_mc1 | 2 | none | 0 | acc | ↑ | 0.2387 | ± | 0.0149 |
| truthfulqa_mc2 | 3 | none | 0 | acc | ↑ | 0.4282 | ± | 0.0152 |
| winogrande | 1 | none | 0 | acc | ↑ | 0.5067 | ± | 0.0141 |
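To summarize the trend across the three checkpoints, a macro-average of the zero-shot acc values from the tables above (equal task weighting; a blunt summary, and the per-phase deltas remain within per-task noise):

```python
# Zero-shot acc values copied from the three tables above (phases 6, 6.1, 6.3),
# in task order: arc_easy, hellaswag, piqa, sciq, truthfulqa_mc1, truthfulqa_mc2, winogrande.
acc = {
    "phase6":   [0.4520, 0.2891, 0.6219, 0.7180, 0.2729, 0.4246, 0.5091],
    "phase6.1": [0.4689, 0.2915, 0.6306, 0.7520, 0.2411, 0.4251, 0.5122],
    "phase6.3": [0.4764, 0.2914, 0.6289, 0.7620, 0.2387, 0.4282, 0.5067],
}
macro = {phase: sum(v) / len(v) for phase, v in acc.items()}
```

The macro-average creeps up monotonically (≈0.470 → 0.474 → 0.476), driven mostly by arc_easy and sciq, while truthfulqa_mc1 drifts down.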