diff --git "a/evaluation.log" "b/evaluation.log"
deleted file mode 100644
--- "a/evaluation.log"
+++ /dev/null
@@ -1,1542 +0,0 @@
-INFO 01-23 12:42:21 [__init__.py:239] Automatically detected platform cuda.
-2026-01-23:12:42:31 INFO [__main__:429] Passed `--trust_remote_code`, setting environment variable `HF_DATASETS_TRUST_REMOTE_CODE=true`
-2026-01-23:12:42:31 INFO [__main__:446] Selected Tasks: ['arc_challenge', 'arc_easy', 'boolq', 'hellaswag', 'lambada', 'lambada_standard', 'piqa', 'social_iqa', 'wikitext', 'winogrande']
-2026-01-23:12:42:31 INFO [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
-2026-01-23:12:42:31 INFO [evaluator:240] Initializing hf model, with arguments: {'pretrained': 'results/hf_ckpts/blockffn_01b_mul1002_withmean_d64_s128_lr1175e3_b64/', 'dtype':
- 'bfloat16', 'trust_remote_code': True}
-2026-01-23:12:42:31 INFO [models.huggingface:147] Using device 'cuda:0'
-2026-01-23:12:42:31 INFO [models.huggingface:535] Model type cannot be determined. Using default model type 'causal'
-2026-01-23:12:42:32 INFO [models.huggingface:414] Model parallel was set to False, max memory was not set, and device map was set to {'': 'cuda:0'}
-2026-01-23:12:42:42 WARNING [api.task:846] [Task: boolq] metric acc is defined, but aggregation is not. using default aggregation=mean
-2026-01-23:12:42:42 WARNING [api.task:858] [Task: boolq] metric acc is defined, but higher_is_better is not. using default higher_is_better=True
-/home/test1267/test-6/miniconda3/envs/lmeval/lib/python3.10/site-packages/datasets/load.py:1298: FutureWarning: The repository for social_i_qa contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/social_i_qa
-You can avoid this message in future by passing the argument `trust_remote_code=True`.
-Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
- warnings.warn(
-2026-01-23:12:43:26 WARNING [api.task:846] [Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
-2026-01-23:12:43:26 WARNING [api.task:858] [Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
-2026-01-23:12:43:26 WARNING [api.task:846] [Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
-2026-01-23:12:43:26 WARNING [api.task:858] [Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
-2026-01-23:12:43:26 WARNING [api.task:846] [Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
-2026-01-23:12:43:26 WARNING [api.task:858] [Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
-2026-01-23:12:43:39 INFO [api.task:434] Building contexts for winogrande on rank 0...
-
- 0%| | 0/1267 [00:00