diff --git "a/evaluation.log" "b/evaluation.log"
deleted file mode 100644
--- "a/evaluation.log"
+++ /dev/null
@@ -1,87 +0,0 @@
-INFO 03-30 20:48:58 [__init__.py:239] Automatically detected platform cuda.
-2026-03-30:20:49:09 INFO [__main__:429] Passed `--trust_remote_code`, setting environment variable `HF_DATASETS_TRUST_REMOTE_CODE=true`
-2026-03-30:20:49:09 INFO [__main__:446] Selected Tasks: ['arc_challenge', 'arc_easy', 'boolq', 'hellaswag', 'lambada', 'lambada_standard', 'piqa', 'social_iqa', 'wikitext', 'winogrande']
-2026-03-30:20:49:09 INFO [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
-2026-03-30:20:49:09 INFO [evaluator:240] Initializing hf model, with arguments: {'pretrained': 'results/hf_ckpts/blockffn_02b_mul1002_withmean_d64_s128_lr93e4_b128/', 'dtype':
- 'bfloat16', 'trust_remote_code': True}
-2026-03-30:20:49:09 WARNING [accelerate.utils.other:512] Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
-2026-03-30:20:49:09 INFO [models.huggingface:147] Using device 'cuda:0'
-2026-03-30:20:49:09 INFO [models.huggingface:535] Model type cannot be determined. Using default model type 'causal'
-2026-03-30:20:49:10 INFO [models.huggingface:414] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'}
-2026-03-30:20:49:28 WARNING [api.task:846] [Task: boolq] metric acc is defined, but aggregation is not. using default aggregation=mean
-2026-03-30:20:49:28 WARNING [api.task:858] [Task: boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True
-/home/test1267/test-6/miniconda3/envs/lmeval/lib/python3.10/site-packages/datasets/load.py:1298: FutureWarning: The repository for social_i_qa contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/social_i_qa
-You can avoid this message in future by passing the argument `trust_remote_code=True`.
-Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
- warnings.warn(
-2026-03-30:20:50:24 WARNING [api.task:846] [Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
-2026-03-30:20:50:24 WARNING [api.task:858] [Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
-2026-03-30:20:50:24 WARNING [api.task:846] [Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
-2026-03-30:20:50:24 WARNING [api.task:858] [Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
-2026-03-30:20:50:24 WARNING [api.task:846] [Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
-2026-03-30:20:50:24 WARNING [api.task:858] [Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
-2026-03-30:20:50:41 INFO [api.task:434] Building contexts for winogrande on rank 0...
- 0%| | 0/1267 [00:00