Text Generation
Transformers
Safetensors
English
llama
supra
chimera
50m
small
open
open-source
cpu
tiny
slm
text-generation-inference
Instructions to use SupraLabs/Supra-50M-Base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use SupraLabs/Supra-50M-Base with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="SupraLabs/Supra-50M-Base")

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("SupraLabs/Supra-50M-Base")
    model = AutoModelForCausalLM.from_pretrained("SupraLabs/Supra-50M-Base")
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use SupraLabs/Supra-50M-Base with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "SupraLabs/Supra-50M-Base"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "SupraLabs/Supra-50M-Base",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker
docker model run hf.co/SupraLabs/Supra-50M-Base
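The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a vLLM server is already running locally on port 8000 and that it returns an OpenAI-style completion response. The helper names `build_completion_request` and `complete` are hypothetical and are not part of vLLM or of this model's repository.

```python
import json
import urllib.request

MODEL_ID = "SupraLabs/Supra-50M-Base"


def build_completion_request(prompt, max_tokens=512, temperature=0.5):
    """Return the same JSON payload that the curl example sends."""
    return {
        "model": MODEL_ID,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def complete(prompt, base_url="http://localhost:8000", timeout=60.0, **kwargs):
    """POST the payload to /v1/completions and return the first generated text.

    Assumes the server answers with an OpenAI-style body whose generations
    are listed under "choices".
    """
    payload = json.dumps(build_completion_request(prompt, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["text"]
```

With the server running, `complete("Once upon a time,")` should return the generated continuation as a string. Passing a different `base_url` points the same helper at any other server that exposes the same endpoint.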
- SGLang
How to use SupraLabs/Supra-50M-Base with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "SupraLabs/Supra-50M-Base" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "SupraLabs/Supra-50M-Base",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "SupraLabs/Supra-50M-Base" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "SupraLabs/Supra-50M-Base",
            "prompt": "Once upon a time,",
            "max_tokens": 512,
            "temperature": 0.5
        }'
- Docker Model Runner
How to use SupraLabs/Supra-50M-Base with Docker Model Runner:
docker model run hf.co/SupraLabs/Supra-50M-Base
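The training log that follows reports 20,000,000,000 tokens, 19,531,250 chunks of 1024 tokens, and 152588 optimizer steps. The sketch below checks that arithmetic and parses the metric rows. Several values are inferred from the logged numbers and are not confirmed by the source: an effective batch of 128 sequences per step, logging every 100 steps, 3052 warmup steps (about 2% of the run), and a linear-warmup plus cosine-decay schedule whose logged learning rate matches the schedule evaluated at `step - 1`. All function names are hypothetical.

```python
import ast
import math

TOTAL_TOKENS = 20_000_000_000  # from "Preparing 20,000,000,000 tokens"
SEQ_LEN = 1024                 # from "chunks of 1024 tokens"
TOTAL_STEPS = 152_588          # from the progress bar "0/152588"
PEAK_LR = 6e-4                 # highest learning rate that appears in the log
EFFECTIVE_BATCH = 128          # assumption: sequences per optimizer step
WARMUP_STEPS = 3_052           # assumption: ceil(0.02 * TOTAL_STEPS)
LOG_EVERY = 100                # assumption: first logged epoch 0.0006554 is step 100


def parse_log_line(line):
    """Parse one "{'loss': ...}" row, ignoring the "| ... | |" wrapper.

    Returns a dict of floats, or None for rows that carry no metrics.
    """
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end == -1:
        return None
    record = ast.literal_eval(line[start:end + 1])
    return {key: float(value) for key, value in record.items()}


def step_from_epoch(epoch):
    """Recover the step from the rounded epoch, snapping to the logging interval."""
    return round(epoch * TOTAL_STEPS / LOG_EVERY) * LOG_EVERY


def scheduled_lr(step):
    """Linear warmup to PEAK_LR, then cosine decay to zero (assumed schedule)."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


chunks = TOTAL_TOKENS // SEQ_LEN
steps_for_one_pass = math.ceil(chunks / EFFECTIVE_BATCH)
tokens_per_step = EFFECTIVE_BATCH * SEQ_LEN
```

Under these assumptions, `chunks` reproduces the 19,531,250 figure and `steps_for_one_pass` reproduces the 152588 total, which suggests a single pass over the data at roughly 131,072 tokens per step. Comparing `scheduled_lr(step_from_epoch(epoch) - 1)` with each logged `learning_rate` is a quick way to see whether a row fits the assumed schedule.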
| (venv) leo@leo-mint:~/smallm/Supra-50M$ python3 train.py | |
| [*] Loading libraries... | |
| [*] Loading tokenizer... | |
| [*] Preparing 20,000,000,000 tokens (streaming, memmap-backed)... | |
| [=] Reusing existing token file: tokens.bin | |
| [+] Dataset ready: 19,531,250 chunks of 1024 tokens | |
| [*] Setting up model... | |
| [*] Model parameters: 51,786,240 | |
| [*] Defining training arguments... | |
| [transformers] warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead. | |
| [*] Starting training... | |
| 0%| | 0/152588 [00:00<?, ?it/s] | |
| W0517 14:59:00.167000 27625 torch/_inductor/utils.py:1731] [0/0] Not enough SMs to use max_autotune_gemm mode | |
| {'loss': '9.822', 'grad_norm': '1.781', 'learning_rate': '1.946e-05', 'epoch': '0.0006554'} | |
| {'loss': '8.539', 'grad_norm': '1.164', 'learning_rate': '3.912e-05', 'epoch': '0.001311'} | |
| {'loss': '7.393', 'grad_norm': '1.282', 'learning_rate': '5.878e-05', 'epoch': '0.001966'} | |
| {'loss': '6.806', 'grad_norm': '2.183', 'learning_rate': '7.844e-05', 'epoch': '0.002621'} | |
| {'loss': '6.413', 'grad_norm': '1.753', 'learning_rate': '9.81e-05', 'epoch': '0.003277'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.54it/s] | |
| {'loss': '6.131', 'grad_norm': '1.552', 'learning_rate': '0.0001178', 'epoch': '0.003932'} | |
| {'loss': '5.908', 'grad_norm': '1.505', 'learning_rate': '0.0001374', 'epoch': '0.004588'} | |
| {'loss': '5.71', 'grad_norm': '1.384', 'learning_rate': '0.0001571', 'epoch': '0.005243'} | |
| {'loss': '5.53', 'grad_norm': '1.439', 'learning_rate': '0.0001767', 'epoch': '0.005898'} | |
| {'loss': '5.372', 'grad_norm': '1.095', 'learning_rate': '0.0001964', 'epoch': '0.006554'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.17it/s] | |
| {'loss': '5.233', 'grad_norm': '1.361', 'learning_rate': '0.0002161', 'epoch': '0.007209'} | |
| {'loss': '5.104', 'grad_norm': '1.061', 'learning_rate': '0.0002357', 'epoch': '0.007864'} | |
| {'loss': '4.973', 'grad_norm': '1.352', 'learning_rate': '0.0002554', 'epoch': '0.00852'} | |
| {'loss': '4.843', 'grad_norm': '1.013', 'learning_rate': '0.000275', 'epoch': '0.009175'} | |
| {'loss': '4.711', 'grad_norm': '0.9927', 'learning_rate': '0.0002947', 'epoch': '0.00983'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.91it/s] | |
| {'loss': '4.585', 'grad_norm': '0.896', 'learning_rate': '0.0003144', 'epoch': '0.01049'} | |
| {'loss': '4.479', 'grad_norm': '0.7985', 'learning_rate': '0.000334', 'epoch': '0.01114'} | |
| {'loss': '4.386', 'grad_norm': '0.7477', 'learning_rate': '0.0003537', 'epoch': '0.0118'} | |
| {'loss': '4.318', 'grad_norm': '0.7296', 'learning_rate': '0.0003733', 'epoch': '0.01245'} | |
| {'loss': '4.255', 'grad_norm': '0.6791', 'learning_rate': '0.000393', 'epoch': '0.01311'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.33it/s] | |
| {'loss': '4.194', 'grad_norm': '0.7175', 'learning_rate': '0.0004126', 'epoch': '0.01376'} | |
| {'loss': '4.151', 'grad_norm': '0.6317', 'learning_rate': '0.0004323', 'epoch': '0.01442'} | |
| {'loss': '4.106', 'grad_norm': '0.5953', 'learning_rate': '0.000452', 'epoch': '0.01507'} | |
| {'loss': '4.069', 'grad_norm': '0.4885', 'learning_rate': '0.0004716', 'epoch': '0.01573'} | |
| {'loss': '4.041', 'grad_norm': '0.5002', 'learning_rate': '0.0004913', 'epoch': '0.01638'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.09it/s] | |
| {'loss': '4.009', 'grad_norm': '0.5133', 'learning_rate': '0.0005109', 'epoch': '0.01704'} | |
| {'loss': '3.978', 'grad_norm': '0.5448', 'learning_rate': '0.0005306', 'epoch': '0.01769'} | |
| {'loss': '3.957', 'grad_norm': '0.5136', 'learning_rate': '0.0005503', 'epoch': '0.01835'} | |
| {'loss': '3.928', 'grad_norm': '0.4771', 'learning_rate': '0.0005699', 'epoch': '0.01901'} | |
| {'loss': '3.911', 'grad_norm': '0.4366', 'learning_rate': '0.0005896', 'epoch': '0.01966'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.08it/s] | |
| {'loss': '3.899', 'grad_norm': '0.4166', 'learning_rate': '0.0006', 'epoch': '0.02032'} | |
| {'loss': '3.876', 'grad_norm': '0.3686', 'learning_rate': '0.0006', 'epoch': '0.02097'} | |
| {'loss': '3.849', 'grad_norm': '0.4205', 'learning_rate': '0.0006', 'epoch': '0.02163'} | |
| {'loss': '3.831', 'grad_norm': '0.4025', 'learning_rate': '0.0006', 'epoch': '0.02228'} | |
| {'loss': '3.815', 'grad_norm': '0.3824', 'learning_rate': '0.0006', 'epoch': '0.02294'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.19it/s] | |
| {'loss': '3.802', 'grad_norm': '0.3756', 'learning_rate': '0.0006', 'epoch': '0.02359'} | |
| {'loss': '3.785', 'grad_norm': '0.3782', 'learning_rate': '0.0006', 'epoch': '0.02425'} | |
| {'loss': '3.773', 'grad_norm': '0.3885', 'learning_rate': '0.0006', 'epoch': '0.0249'} | |
| {'loss': '3.758', 'grad_norm': '0.3821', 'learning_rate': '0.0006', 'epoch': '0.02556'} | |
| {'loss': '3.748', 'grad_norm': '0.3729', 'learning_rate': '0.0005999', 'epoch': '0.02621'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.22it/s] | |
| {'loss': '3.731', 'grad_norm': '0.3965', 'learning_rate': '0.0005999', 'epoch': '0.02687'} | |
| {'loss': '3.722', 'grad_norm': '0.389', 'learning_rate': '0.0005999', 'epoch': '0.02753'} | |
| {'loss': '3.714', 'grad_norm': '0.3952', 'learning_rate': '0.0005999', 'epoch': '0.02818'} | |
| {'loss': '3.703', 'grad_norm': '0.3691', 'learning_rate': '0.0005999', 'epoch': '0.02884'} | |
| {'loss': '3.69', 'grad_norm': '0.3722', 'learning_rate': '0.0005999', 'epoch': '0.02949'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.92it/s] | |
| {'loss': '3.682', 'grad_norm': '0.3461', 'learning_rate': '0.0005998', 'epoch': '0.03015'} | |
| {'loss': '3.671', 'grad_norm': '0.38', 'learning_rate': '0.0005998', 'epoch': '0.0308'} | |
| {'loss': '3.662', 'grad_norm': '0.3693', 'learning_rate': '0.0005998', 'epoch': '0.03146'} | |
| {'loss': '3.655', 'grad_norm': '0.3818', 'learning_rate': '0.0005998', 'epoch': '0.03211'} | |
| {'loss': '3.65', 'grad_norm': '0.3394', 'learning_rate': '0.0005997', 'epoch': '0.03277'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.90it/s] | |
| {'loss': '3.645', 'grad_norm': '0.3594', 'learning_rate': '0.0005997', 'epoch': '0.03342'} | |
| {'loss': '3.632', 'grad_norm': '0.3436', 'learning_rate': '0.0005997', 'epoch': '0.03408'} | |
| {'loss': '3.629', 'grad_norm': '0.3674', 'learning_rate': '0.0005997', 'epoch': '0.03473'} | |
| {'loss': '3.616', 'grad_norm': '0.3732', 'learning_rate': '0.0005996', 'epoch': '0.03539'} | |
| {'loss': '3.624', 'grad_norm': '0.4021', 'learning_rate': '0.0005996', 'epoch': '0.03604'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.32it/s] | |
| {'loss': '3.606', 'grad_norm': '0.3589', 'learning_rate': '0.0005996', 'epoch': '0.0367'} | |
| {'loss': '3.607', 'grad_norm': '0.3607', 'learning_rate': '0.0005995', 'epoch': '0.03736'} | |
| {'loss': '3.593', 'grad_norm': '0.3369', 'learning_rate': '0.0005995', 'epoch': '0.03801'} | |
| {'loss': '3.593', 'grad_norm': '0.3583', 'learning_rate': '0.0005995', 'epoch': '0.03867'} | |
| {'loss': '3.587', 'grad_norm': '0.347', 'learning_rate': '0.0005994', 'epoch': '0.03932'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.00it/s] | |
| {'loss': '3.579', 'grad_norm': '0.3477', 'learning_rate': '0.0005994', 'epoch': '0.03998'} | |
| {'loss': '3.574', 'grad_norm': '0.3275', 'learning_rate': '0.0005993', 'epoch': '0.04063'} | |
| {'loss': '3.578', 'grad_norm': '0.3631', 'learning_rate': '0.0005993', 'epoch': '0.04129'} | |
| {'loss': '3.567', 'grad_norm': '0.3617', 'learning_rate': '0.0005993', 'epoch': '0.04194'} | |
| {'loss': '3.566', 'grad_norm': '0.3838', 'learning_rate': '0.0005992', 'epoch': '0.0426'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.95it/s] | |
| {'loss': '3.555', 'grad_norm': '0.3746', 'learning_rate': '0.0005992', 'epoch': '0.04325'} | |
| {'loss': '3.555', 'grad_norm': '0.3321', 'learning_rate': '0.0005991', 'epoch': '0.04391'} | |
| {'loss': '3.547', 'grad_norm': '0.3564', 'learning_rate': '0.0005991', 'epoch': '0.04456'} | |
| {'loss': '3.554', 'grad_norm': '0.3793', 'learning_rate': '0.000599', 'epoch': '0.04522'} | |
| {'loss': '3.547', 'grad_norm': '0.3557', 'learning_rate': '0.000599', 'epoch': '0.04588'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.46it/s] | |
| {'loss': '3.539', 'grad_norm': '0.3665', 'learning_rate': '0.0005989', 'epoch': '0.04653'} | |
| {'loss': '3.54', 'grad_norm': '0.3462', 'learning_rate': '0.0005989', 'epoch': '0.04719'} | |
| {'loss': '3.535', 'grad_norm': '0.3403', 'learning_rate': '0.0005988', 'epoch': '0.04784'} | |
| {'loss': '3.531', 'grad_norm': '0.3762', 'learning_rate': '0.0005987', 'epoch': '0.0485'} | |
| {'loss': '3.528', 'grad_norm': '0.3384', 'learning_rate': '0.0005987', 'epoch': '0.04915'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.31it/s] | |
| {'loss': '3.523', 'grad_norm': '0.3551', 'learning_rate': '0.0005986', 'epoch': '0.04981'} | |
| {'loss': '3.523', 'grad_norm': '0.3496', 'learning_rate': '0.0005986', 'epoch': '0.05046'} | |
| {'loss': '3.52', 'grad_norm': '0.3509', 'learning_rate': '0.0005985', 'epoch': '0.05112'} | |
| {'loss': '3.508', 'grad_norm': '0.3552', 'learning_rate': '0.0005984', 'epoch': '0.05177'} | |
| {'loss': '3.506', 'grad_norm': '0.4069', 'learning_rate': '0.0005984', 'epoch': '0.05243'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.85it/s] | |
| {'loss': '3.509', 'grad_norm': '0.3515', 'learning_rate': '0.0005983', 'epoch': '0.05308'} | |
| {'loss': '3.505', 'grad_norm': '0.3485', 'learning_rate': '0.0005982', 'epoch': '0.05374'} | |
| {'loss': '3.501', 'grad_norm': '0.37', 'learning_rate': '0.0005982', 'epoch': '0.05439'} | |
| {'loss': '3.499', 'grad_norm': '0.3799', 'learning_rate': '0.0005981', 'epoch': '0.05505'} | |
| {'loss': '3.503', 'grad_norm': '0.3496', 'learning_rate': '0.000598', 'epoch': '0.05571'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.50it/s] | |
| {'loss': '3.494', 'grad_norm': '0.3906', 'learning_rate': '0.000598', 'epoch': '0.05636'} | |
| {'loss': '3.49', 'grad_norm': '0.3612', 'learning_rate': '0.0005979', 'epoch': '0.05702'} | |
| {'loss': '3.487', 'grad_norm': '0.3826', 'learning_rate': '0.0005978', 'epoch': '0.05767'} | |
| {'loss': '3.49', 'grad_norm': '0.3617', 'learning_rate': '0.0005977', 'epoch': '0.05833'} | |
| {'loss': '3.484', 'grad_norm': '0.3589', 'learning_rate': '0.0005977', 'epoch': '0.05898'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.72it/s] | |
| {'loss': '3.481', 'grad_norm': '0.3567', 'learning_rate': '0.0005976', 'epoch': '0.05964'} | |
| {'loss': '3.479', 'grad_norm': '0.358', 'learning_rate': '0.0005975', 'epoch': '0.06029'} | |
| {'loss': '3.475', 'grad_norm': '0.3492', 'learning_rate': '0.0005974', 'epoch': '0.06095'} | |
| {'loss': '3.476', 'grad_norm': '0.369', 'learning_rate': '0.0005973', 'epoch': '0.0616'} | |
| {'loss': '3.471', 'grad_norm': '0.42', 'learning_rate': '0.0005973', 'epoch': '0.06226'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.84it/s] | |
| {'loss': '3.467', 'grad_norm': '0.3995', 'learning_rate': '0.0005972', 'epoch': '0.06291'} | |
| {'loss': '3.469', 'grad_norm': '0.3499', 'learning_rate': '0.0005971', 'epoch': '0.06357'} | |
| {'loss': '3.466', 'grad_norm': '0.3647', 'learning_rate': '0.000597', 'epoch': '0.06423'} | |
| {'loss': '3.46', 'grad_norm': '0.3487', 'learning_rate': '0.0005969', 'epoch': '0.06488'} | |
| {'loss': '3.461', 'grad_norm': '0.3431', 'learning_rate': '0.0005968', 'epoch': '0.06554'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.54it/s] | |
| {'loss': '3.461', 'grad_norm': '0.3862', 'learning_rate': '0.0005967', 'epoch': '0.06619'} | |
| {'loss': '3.461', 'grad_norm': '0.373', 'learning_rate': '0.0005966', 'epoch': '0.06685'} | |
| {'loss': '3.462', 'grad_norm': '0.367', 'learning_rate': '0.0005965', 'epoch': '0.0675'} | |
| {'loss': '3.457', 'grad_norm': '0.3643', 'learning_rate': '0.0005964', 'epoch': '0.06816'} | |
| {'loss': '3.455', 'grad_norm': '0.3512', 'learning_rate': '0.0005963', 'epoch': '0.06881'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.32it/s] | |
| {'loss': '3.455', 'grad_norm': '0.3911', 'learning_rate': '0.0005962', 'epoch': '0.06947'} | |
| {'loss': '3.446', 'grad_norm': '0.349', 'learning_rate': '0.0005961', 'epoch': '0.07012'} | |
| {'loss': '3.45', 'grad_norm': '0.3599', 'learning_rate': '0.000596', 'epoch': '0.07078'} | |
| {'loss': '3.439', 'grad_norm': '0.3614', 'learning_rate': '0.0005959', 'epoch': '0.07143'} | |
| {'loss': '3.443', 'grad_norm': '0.3775', 'learning_rate': '0.0005958', 'epoch': '0.07209'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.54it/s] | |
| {'loss': '3.448', 'grad_norm': '0.4077', 'learning_rate': '0.0005957', 'epoch': '0.07274'} | |
| {'loss': '3.439', 'grad_norm': '0.384', 'learning_rate': '0.0005956', 'epoch': '0.0734'} | |
| {'loss': '3.442', 'grad_norm': '0.3768', 'learning_rate': '0.0005955', 'epoch': '0.07406'} | |
| {'loss': '3.435', 'grad_norm': '0.3531', 'learning_rate': '0.0005954', 'epoch': '0.07471'} | |
| {'loss': '3.438', 'grad_norm': '0.365', 'learning_rate': '0.0005953', 'epoch': '0.07537'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.74it/s] | |
| {'loss': '3.441', 'grad_norm': '0.3533', 'learning_rate': '0.0005952', 'epoch': '0.07602'} | |
| {'loss': '3.441', 'grad_norm': '0.3584', 'learning_rate': '0.0005951', 'epoch': '0.07668'} | |
| {'loss': '3.434', 'grad_norm': '0.4161', 'learning_rate': '0.0005949', 'epoch': '0.07733'} | |
| {'loss': '3.434', 'grad_norm': '0.3601', 'learning_rate': '0.0005948', 'epoch': '0.07799'} | |
| {'loss': '3.432', 'grad_norm': '0.3707', 'learning_rate': '0.0005947', 'epoch': '0.07864'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.68it/s] | |
| {'loss': '3.422', 'grad_norm': '0.352', 'learning_rate': '0.0005946', 'epoch': '0.0793'} | |
| {'loss': '3.428', 'grad_norm': '0.3649', 'learning_rate': '0.0005945', 'epoch': '0.07995'} | |
| {'loss': '3.425', 'grad_norm': '0.3697', 'learning_rate': '0.0005944', 'epoch': '0.08061'} | |
| {'loss': '3.425', 'grad_norm': '0.3875', 'learning_rate': '0.0005942', 'epoch': '0.08126'} | |
| {'loss': '3.425', 'grad_norm': '0.3726', 'learning_rate': '0.0005941', 'epoch': '0.08192'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.15it/s] | |
| {'loss': '3.423', 'grad_norm': '0.352', 'learning_rate': '0.000594', 'epoch': '0.08258'} | |
| {'loss': '3.421', 'grad_norm': '0.3771', 'learning_rate': '0.0005939', 'epoch': '0.08323'} | |
| {'loss': '3.423', 'grad_norm': '0.4241', 'learning_rate': '0.0005937', 'epoch': '0.08389'} | |
| {'loss': '3.421', 'grad_norm': '0.361', 'learning_rate': '0.0005936', 'epoch': '0.08454'} | |
| {'loss': '3.414', 'grad_norm': '0.3651', 'learning_rate': '0.0005935', 'epoch': '0.0852'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.09it/s] | |
| {'loss': '3.419', 'grad_norm': '0.3694', 'learning_rate': '0.0005933', 'epoch': '0.08585'} | |
| {'loss': '3.417', 'grad_norm': '0.37', 'learning_rate': '0.0005932', 'epoch': '0.08651'} | |
| {'loss': '3.414', 'grad_norm': '0.4199', 'learning_rate': '0.0005931', 'epoch': '0.08716'} | |
| {'loss': '3.415', 'grad_norm': '0.3884', 'learning_rate': '0.0005929', 'epoch': '0.08782'} | |
| {'loss': '3.417', 'grad_norm': '0.3871', 'learning_rate': '0.0005928', 'epoch': '0.08847'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.25it/s] | |
| {'loss': '3.414', 'grad_norm': '0.3685', 'learning_rate': '0.0005927', 'epoch': '0.08913'} | |
| {'loss': '3.411', 'grad_norm': '0.3827', 'learning_rate': '0.0005925', 'epoch': '0.08978'} | |
| {'loss': '3.407', 'grad_norm': '0.3573', 'learning_rate': '0.0005924', 'epoch': '0.09044'} | |
| {'loss': '3.405', 'grad_norm': '0.3688', 'learning_rate': '0.0005922', 'epoch': '0.09109'} | |
| {'loss': '3.405', 'grad_norm': '0.3638', 'learning_rate': '0.0005921', 'epoch': '0.09175'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.93it/s] | |
| {'loss': '3.402', 'grad_norm': '0.3493', 'learning_rate': '0.000592', 'epoch': '0.09241'} | |
| {'loss': '3.397', 'grad_norm': '0.3694', 'learning_rate': '0.0005918', 'epoch': '0.09306'} | |
| {'loss': '3.4', 'grad_norm': '0.3925', 'learning_rate': '0.0005917', 'epoch': '0.09372'} | |
| {'loss': '3.404', 'grad_norm': '0.3872', 'learning_rate': '0.0005915', 'epoch': '0.09437'} | |
| {'loss': '3.397', 'grad_norm': '0.3646', 'learning_rate': '0.0005914', 'epoch': '0.09503'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.40it/s] | |
| {'loss': '3.399', 'grad_norm': '0.3847', 'learning_rate': '0.0005912', 'epoch': '0.09568'} | |
| {'loss': '3.401', 'grad_norm': '0.3837', 'learning_rate': '0.0005911', 'epoch': '0.09634'} | |
| {'loss': '3.397', 'grad_norm': '0.3586', 'learning_rate': '0.0005909', 'epoch': '0.09699'} | |
| {'loss': '3.393', 'grad_norm': '0.4064', 'learning_rate': '0.0005908', 'epoch': '0.09765'} | |
| {'loss': '3.395', 'grad_norm': '0.376', 'learning_rate': '0.0005906', 'epoch': '0.0983'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.80it/s] | |
| {'loss': '3.393', 'grad_norm': '0.4256', 'learning_rate': '0.0005904', 'epoch': '0.09896'} | |
| {'loss': '3.401', 'grad_norm': '0.4047', 'learning_rate': '0.0005903', 'epoch': '0.09961'} | |
| {'loss': '3.394', 'grad_norm': '0.3741', 'learning_rate': '0.0005901', 'epoch': '0.1003'} | |
| {'loss': '3.392', 'grad_norm': '0.3817', 'learning_rate': '0.00059', 'epoch': '0.1009'} | |
| {'loss': '3.387', 'grad_norm': '0.4296', 'learning_rate': '0.0005898', 'epoch': '0.1016'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.82it/s] | |
| {'loss': '3.389', 'grad_norm': '0.3564', 'learning_rate': '0.0005896', 'epoch': '0.1022'} | |
| {'loss': '3.39', 'grad_norm': '0.4093', 'learning_rate': '0.0005895', 'epoch': '0.1029'} | |
| {'loss': '3.389', 'grad_norm': '0.3907', 'learning_rate': '0.0005893', 'epoch': '0.1035'} | |
| {'loss': '3.386', 'grad_norm': '0.416', 'learning_rate': '0.0005891', 'epoch': '0.1042'} | |
| {'loss': '3.385', 'grad_norm': '0.3946', 'learning_rate': '0.000589', 'epoch': '0.1049'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.75it/s] | |
| {'loss': '3.387', 'grad_norm': '0.3888', 'learning_rate': '0.0005888', 'epoch': '0.1055'} | |
| {'loss': '3.384', 'grad_norm': '0.3953', 'learning_rate': '0.0005886', 'epoch': '0.1062'} | |
| {'loss': '3.385', 'grad_norm': '0.4033', 'learning_rate': '0.0005885', 'epoch': '0.1068'} | |
| {'loss': '3.388', 'grad_norm': '0.4087', 'learning_rate': '0.0005883', 'epoch': '0.1075'} | |
| {'loss': '3.385', 'grad_norm': '0.3681', 'learning_rate': '0.0005881', 'epoch': '0.1081'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.76it/s] | |
| {'loss': '3.381', 'grad_norm': '0.4152', 'learning_rate': '0.0005879', 'epoch': '0.1088'} | |
| {'loss': '3.38', 'grad_norm': '0.3973', 'learning_rate': '0.0005878', 'epoch': '0.1094'} | |
| {'loss': '3.38', 'grad_norm': '0.3795', 'learning_rate': '0.0005876', 'epoch': '0.1101'} | |
| {'loss': '3.384', 'grad_norm': '0.4048', 'learning_rate': '0.0005874', 'epoch': '0.1108'} | |
| {'loss': '3.384', 'grad_norm': '0.3852', 'learning_rate': '0.0005872', 'epoch': '0.1114'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.67it/s] | |
| {'loss': '3.38', 'grad_norm': '0.3918', 'learning_rate': '0.000587', 'epoch': '0.1121'} | |
| {'loss': '3.378', 'grad_norm': '0.4043', 'learning_rate': '0.0005868', 'epoch': '0.1127'} | |
| {'loss': '3.374', 'grad_norm': '0.368', 'learning_rate': '0.0005867', 'epoch': '0.1134'} | |
| {'loss': '3.378', 'grad_norm': '0.3844', 'learning_rate': '0.0005865', 'epoch': '0.114'} | |
| {'loss': '3.371', 'grad_norm': '0.3853', 'learning_rate': '0.0005863', 'epoch': '0.1147'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.40it/s] | |
| {'loss': '3.371', 'grad_norm': '0.3587', 'learning_rate': '0.0005861', 'epoch': '0.1153'} | |
| {'loss': '3.374', 'grad_norm': '0.3656', 'learning_rate': '0.0005859', 'epoch': '0.116'} | |
| {'loss': '3.372', 'grad_norm': '0.3428', 'learning_rate': '0.0005857', 'epoch': '0.1167'} | |
| {'loss': '3.372', 'grad_norm': '0.3704', 'learning_rate': '0.0005855', 'epoch': '0.1173'} | |
| {'loss': '3.37', 'grad_norm': '0.3918', 'learning_rate': '0.0005853', 'epoch': '0.118'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.72it/s] | |
| {'loss': '3.377', 'grad_norm': '0.3739', 'learning_rate': '0.0005851', 'epoch': '0.1186'} | |
| {'loss': '3.367', 'grad_norm': '0.3773', 'learning_rate': '0.0005849', 'epoch': '0.1193'} | |
| {'loss': '3.369', 'grad_norm': '0.3885', 'learning_rate': '0.0005847', 'epoch': '0.1199'} | |
| {'loss': '3.363', 'grad_norm': '0.3729', 'learning_rate': '0.0005845', 'epoch': '0.1206'} | |
| {'loss': '3.37', 'grad_norm': '0.375', 'learning_rate': '0.0005843', 'epoch': '0.1212'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.35it/s] | |
| {'loss': '3.368', 'grad_norm': '0.3634', 'learning_rate': '0.0005841', 'epoch': '0.1219'} | |
| {'loss': '3.368', 'grad_norm': '0.4189', 'learning_rate': '0.0005839', 'epoch': '0.1226'} | |
| {'loss': '3.368', 'grad_norm': '0.3579', 'learning_rate': '0.0005837', 'epoch': '0.1232'} | |
| {'loss': '3.366', 'grad_norm': '0.3531', 'learning_rate': '0.0005835', 'epoch': '0.1239'} | |
| {'loss': '3.366', 'grad_norm': '0.3624', 'learning_rate': '0.0005833', 'epoch': '0.1245'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.83it/s] | |
| {'loss': '3.358', 'grad_norm': '0.3907', 'learning_rate': '0.0005831', 'epoch': '0.1252'} | |
| {'loss': '3.367', 'grad_norm': '0.3936', 'learning_rate': '0.0005829', 'epoch': '0.1258'} | |
| {'loss': '3.364', 'grad_norm': '0.3841', 'learning_rate': '0.0005827', 'epoch': '0.1265'} | |
| {'loss': '3.361', 'grad_norm': '0.3735', 'learning_rate': '0.0005825', 'epoch': '0.1271'} | |
| {'loss': '3.368', 'grad_norm': '0.3932', 'learning_rate': '0.0005823', 'epoch': '0.1278'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.10it/s] | |
| {'loss': '3.351', 'grad_norm': '0.3737', 'learning_rate': '0.0005821', 'epoch': '0.1285'} | |
| {'loss': '3.358', 'grad_norm': '0.3517', 'learning_rate': '0.0005818', 'epoch': '0.1291'} | |
| {'loss': '3.36', 'grad_norm': '0.3841', 'learning_rate': '0.0005816', 'epoch': '0.1298'} | |
| {'loss': '3.363', 'grad_norm': '0.3739', 'learning_rate': '0.0005814', 'epoch': '0.1304'} | |
| {'loss': '3.354', 'grad_norm': '0.4001', 'learning_rate': '0.0005812', 'epoch': '0.1311'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.63it/s] | |
| {'loss': '3.36', 'grad_norm': '0.364', 'learning_rate': '0.000581', 'epoch': '0.1317'} | |
| {'loss': '3.355', 'grad_norm': '0.3792', 'learning_rate': '0.0005807', 'epoch': '0.1324'} | |
| {'loss': '3.363', 'grad_norm': '0.3697', 'learning_rate': '0.0005805', 'epoch': '0.133'} | |
| {'loss': '3.365', 'grad_norm': '0.408', 'learning_rate': '0.0005803', 'epoch': '0.1337'} | |
| {'loss': '3.359', 'grad_norm': '0.367', 'learning_rate': '0.0005801', 'epoch': '0.1343'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.20it/s] | |
| {'loss': '3.36', 'grad_norm': '0.4236', 'learning_rate': '0.0005798', 'epoch': '0.135'} | |
| {'loss': '3.355', 'grad_norm': '0.4047', 'learning_rate': '0.0005796', 'epoch': '0.1357'} | |
| {'loss': '3.354', 'grad_norm': '0.375', 'learning_rate': '0.0005794', 'epoch': '0.1363'} | |
| {'loss': '3.356', 'grad_norm': '0.4076', 'learning_rate': '0.0005792', 'epoch': '0.137'} | |
| {'loss': '3.356', 'grad_norm': '0.4216', 'learning_rate': '0.0005789', 'epoch': '0.1376'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.31it/s] | |
| {'loss': '3.353', 'grad_norm': '0.4283', 'learning_rate': '0.0005787', 'epoch': '0.1383'} | |
| {'loss': '3.352', 'grad_norm': '0.3582', 'learning_rate': '0.0005785', 'epoch': '0.1389'} | |
| {'loss': '3.346', 'grad_norm': '0.4229', 'learning_rate': '0.0005782', 'epoch': '0.1396'} | |
| {'loss': '3.356', 'grad_norm': '0.3956', 'learning_rate': '0.000578', 'epoch': '0.1402'} | |
| {'loss': '3.346', 'grad_norm': '0.3688', 'learning_rate': '0.0005778', 'epoch': '0.1409'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.34it/s] | |
| {'loss': '3.351', 'grad_norm': '0.4167', 'learning_rate': '0.0005775', 'epoch': '0.1416'} | |
| {'loss': '3.355', 'grad_norm': '0.3978', 'learning_rate': '0.0005773', 'epoch': '0.1422'} | |
| {'loss': '3.351', 'grad_norm': '0.4078', 'learning_rate': '0.000577', 'epoch': '0.1429'} | |
| {'loss': '3.348', 'grad_norm': '0.362', 'learning_rate': '0.0005768', 'epoch': '0.1435'} | |
| {'loss': '3.346', 'grad_norm': '0.4139', 'learning_rate': '0.0005765', 'epoch': '0.1442'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.62it/s] | |
| {'loss': '3.345', 'grad_norm': '0.3952', 'learning_rate': '0.0005763', 'epoch': '0.1448'} | |
| {'loss': '3.348', 'grad_norm': '0.3764', 'learning_rate': '0.0005761', 'epoch': '0.1455'} | |
| {'loss': '3.351', 'grad_norm': '0.4298', 'learning_rate': '0.0005758', 'epoch': '0.1461'} | |
| {'loss': '3.347', 'grad_norm': '0.4231', 'learning_rate': '0.0005756', 'epoch': '0.1468'} | |
| {'loss': '3.347', 'grad_norm': '0.4313', 'learning_rate': '0.0005753', 'epoch': '0.1475'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.84it/s] | |
| {'loss': '3.342', 'grad_norm': '0.4146', 'learning_rate': '0.0005751', 'epoch': '0.1481'} | |
| {'loss': '3.346', 'grad_norm': '0.394', 'learning_rate': '0.0005748', 'epoch': '0.1488'} | |
| {'loss': '3.345', 'grad_norm': '0.4244', 'learning_rate': '0.0005746', 'epoch': '0.1494'} | |
| {'loss': '3.349', 'grad_norm': '0.4101', 'learning_rate': '0.0005743', 'epoch': '0.1501'} | |
| {'loss': '3.337', 'grad_norm': '0.3851', 'learning_rate': '0.000574', 'epoch': '0.1507'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.32it/s] | |
| {'loss': '3.344', 'grad_norm': '0.3972', 'learning_rate': '0.0005738', 'epoch': '0.1514'} | |
| {'loss': '3.343', 'grad_norm': '0.4008', 'learning_rate': '0.0005735', 'epoch': '0.152'} | |
| {'loss': '3.345', 'grad_norm': '0.4243', 'learning_rate': '0.0005733', 'epoch': '0.1527'} | |
| {'loss': '3.335', 'grad_norm': '0.4064', 'learning_rate': '0.000573', 'epoch': '0.1534'} | |
| {'loss': '3.337', 'grad_norm': '0.3827', 'learning_rate': '0.0005727', 'epoch': '0.154'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.94it/s] | |
| {'loss': '3.338', 'grad_norm': '0.3717', 'learning_rate': '0.0005725', 'epoch': '0.1547'} | |
| {'loss': '3.339', 'grad_norm': '0.3652', 'learning_rate': '0.0005722', 'epoch': '0.1553'} | |
| {'loss': '3.333', 'grad_norm': '0.38', 'learning_rate': '0.000572', 'epoch': '0.156'} | |
| {'loss': '3.34', 'grad_norm': '0.3919', 'learning_rate': '0.0005717', 'epoch': '0.1566'} | |
| {'loss': '3.339', 'grad_norm': '0.424', 'learning_rate': '0.0005714', 'epoch': '0.1573'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.18it/s] | |
| {'loss': '3.338', 'grad_norm': '0.3888', 'learning_rate': '0.0005711', 'epoch': '0.1579'} | |
| {'loss': '3.342', 'grad_norm': '0.3861', 'learning_rate': '0.0005709', 'epoch': '0.1586'} | |
| {'loss': '3.337', 'grad_norm': '0.4329', 'learning_rate': '0.0005706', 'epoch': '0.1593'} | |
| {'loss': '3.332', 'grad_norm': '0.3949', 'learning_rate': '0.0005703', 'epoch': '0.1599'} | |
| {'loss': '3.341', 'grad_norm': '0.3942', 'learning_rate': '0.0005701', 'epoch': '0.1606'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.51it/s] | |
| {'loss': '3.337', 'grad_norm': '0.3761', 'learning_rate': '0.0005698', 'epoch': '0.1612'} | |
| {'loss': '3.332', 'grad_norm': '0.4193', 'learning_rate': '0.0005695', 'epoch': '0.1619'} | |
| {'loss': '3.336', 'grad_norm': '0.3785', 'learning_rate': '0.0005692', 'epoch': '0.1625'} | |
| {'loss': '3.342', 'grad_norm': '0.407', 'learning_rate': '0.000569', 'epoch': '0.1632'} | |
| {'loss': '3.332', 'grad_norm': '0.3972', 'learning_rate': '0.0005687', 'epoch': '0.1638'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.63it/s] | |
| {'loss': '3.335', 'grad_norm': '0.4236', 'learning_rate': '0.0005684', 'epoch': '0.1645'} | |
| {'loss': '3.339', 'grad_norm': '0.4263', 'learning_rate': '0.0005681', 'epoch': '0.1652'} | |
| {'loss': '3.332', 'grad_norm': '0.4147', 'learning_rate': '0.0005678', 'epoch': '0.1658'} | |
| {'loss': '3.332', 'grad_norm': '0.4093', 'learning_rate': '0.0005675', 'epoch': '0.1665'} | |
| {'loss': '3.331', 'grad_norm': '0.4688', 'learning_rate': '0.0005673', 'epoch': '0.1671'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.28it/s] | |
| {'loss': '3.33', 'grad_norm': '0.4011', 'learning_rate': '0.000567', 'epoch': '0.1678'} | |
| {'loss': '3.328', 'grad_norm': '0.3979', 'learning_rate': '0.0005667', 'epoch': '0.1684'} | |
| {'loss': '3.331', 'grad_norm': '0.3958', 'learning_rate': '0.0005664', 'epoch': '0.1691'} | |
| {'loss': '3.33', 'grad_norm': '0.4354', 'learning_rate': '0.0005661', 'epoch': '0.1697'} | |
| {'loss': '3.331', 'grad_norm': '0.4233', 'learning_rate': '0.0005658', 'epoch': '0.1704'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.34it/s] | |
| {'loss': '3.327', 'grad_norm': '0.3645', 'learning_rate': '0.0005655', 'epoch': '0.171'} | |
| {'loss': '3.33', 'grad_norm': '0.4305', 'learning_rate': '0.0005652', 'epoch': '0.1717'} | |
| {'loss': '3.33', 'grad_norm': '0.3957', 'learning_rate': '0.0005649', 'epoch': '0.1724'} | |
| {'loss': '3.333', 'grad_norm': '0.3876', 'learning_rate': '0.0005646', 'epoch': '0.173'} | |
| {'loss': '3.327', 'grad_norm': '0.4283', 'learning_rate': '0.0005643', 'epoch': '0.1737'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.27it/s] | |
| {'loss': '3.323', 'grad_norm': '0.4096', 'learning_rate': '0.000564', 'epoch': '0.1743'} | |
| {'loss': '3.328', 'grad_norm': '0.4385', 'learning_rate': '0.0005637', 'epoch': '0.175'} | |
| {'loss': '3.326', 'grad_norm': '0.4151', 'learning_rate': '0.0005634', 'epoch': '0.1756'} | |
| {'loss': '3.328', 'grad_norm': '0.4207', 'learning_rate': '0.0005631', 'epoch': '0.1763'} | |
| {'loss': '3.327', 'grad_norm': '0.4172', 'learning_rate': '0.0005628', 'epoch': '0.1769'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.16it/s] | |
| {'loss': '3.328', 'grad_norm': '0.4056', 'learning_rate': '0.0005625', 'epoch': '0.1776'} | |
| {'loss': '3.329', 'grad_norm': '0.4142', 'learning_rate': '0.0005622', 'epoch': '0.1783'} | |
| {'loss': '3.326', 'grad_norm': '0.4323', 'learning_rate': '0.0005619', 'epoch': '0.1789'} | |
| {'loss': '3.323', 'grad_norm': '0.3874', 'learning_rate': '0.0005616', 'epoch': '0.1796'} | |
| {'loss': '3.325', 'grad_norm': '0.4138', 'learning_rate': '0.0005613', 'epoch': '0.1802'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.98it/s] | |
| {'loss': '3.326', 'grad_norm': '0.4033', 'learning_rate': '0.000561', 'epoch': '0.1809'} | |
| {'loss': '3.324', 'grad_norm': '0.3812', 'learning_rate': '0.0005607', 'epoch': '0.1815'} | |
| {'loss': '3.333', 'grad_norm': '0.4189', 'learning_rate': '0.0005604', 'epoch': '0.1822'} | |
| {'loss': '3.322', 'grad_norm': '0.4055', 'learning_rate': '0.00056', 'epoch': '0.1828'} | |
| {'loss': '3.326', 'grad_norm': '0.4067', 'learning_rate': '0.0005597', 'epoch': '0.1835'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.86it/s] | |
| {'loss': '3.319', 'grad_norm': '0.3985', 'learning_rate': '0.0005594', 'epoch': '0.1842'} | |
| {'loss': '3.318', 'grad_norm': '0.3674', 'learning_rate': '0.0005591', 'epoch': '0.1848'} | |
| {'loss': '3.318', 'grad_norm': '0.4153', 'learning_rate': '0.0005588', 'epoch': '0.1855'} | |
| {'loss': '3.323', 'grad_norm': '0.4415', 'learning_rate': '0.0005585', 'epoch': '0.1861'} | |
| {'loss': '3.317', 'grad_norm': '0.4189', 'learning_rate': '0.0005581', 'epoch': '0.1868'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.64it/s] | |
| {'loss': '3.319', 'grad_norm': '0.4219', 'learning_rate': '0.0005578', 'epoch': '0.1874'} | |
| {'loss': '3.324', 'grad_norm': '0.4226', 'learning_rate': '0.0005575', 'epoch': '0.1881'} | |
| {'loss': '3.318', 'grad_norm': '0.3861', 'learning_rate': '0.0005572', 'epoch': '0.1887'} | |
| {'loss': '3.321', 'grad_norm': '0.4099', 'learning_rate': '0.0005568', 'epoch': '0.1894'} | |
| {'loss': '3.317', 'grad_norm': '0.4526', 'learning_rate': '0.0005565', 'epoch': '0.1901'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.84it/s] | |
| {'loss': '3.323', 'grad_norm': '0.4115', 'learning_rate': '0.0005562', 'epoch': '0.1907'} | |
| {'loss': '3.32', 'grad_norm': '0.3804', 'learning_rate': '0.0005559', 'epoch': '0.1914'} | |
| {'loss': '3.323', 'grad_norm': '0.4126', 'learning_rate': '0.0005555', 'epoch': '0.192'} | |
| {'loss': '3.32', 'grad_norm': '0.397', 'learning_rate': '0.0005552', 'epoch': '0.1927'} | |
| {'loss': '3.315', 'grad_norm': '0.4178', 'learning_rate': '0.0005549', 'epoch': '0.1933'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.60it/s] | |
| {'loss': '3.323', 'grad_norm': '0.4269', 'learning_rate': '0.0005545', 'epoch': '0.194'} | |
| {'loss': '3.309', 'grad_norm': '0.3857', 'learning_rate': '0.0005542', 'epoch': '0.1946'} | |
| {'loss': '3.31', 'grad_norm': '0.4207', 'learning_rate': '0.0005539', 'epoch': '0.1953'} | |
| {'loss': '3.315', 'grad_norm': '0.4172', 'learning_rate': '0.0005535', 'epoch': '0.196'} | |
| {'loss': '3.318', 'grad_norm': '0.4196', 'learning_rate': '0.0005532', 'epoch': '0.1966'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.51it/s] | |
| {'loss': '3.317', 'grad_norm': '0.4169', 'learning_rate': '0.0005529', 'epoch': '0.1973'} | |
| {'loss': '3.317', 'grad_norm': '0.4135', 'learning_rate': '0.0005525', 'epoch': '0.1979'} | |
| {'loss': '3.31', 'grad_norm': '0.4213', 'learning_rate': '0.0005522', 'epoch': '0.1986'} | |
| {'loss': '3.312', 'grad_norm': '0.3907', 'learning_rate': '0.0005518', 'epoch': '0.1992'} | |
| {'loss': '3.308', 'grad_norm': '0.4401', 'learning_rate': '0.0005515', 'epoch': '0.1999'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.49it/s] | |
| {'loss': '3.309', 'grad_norm': '0.4019', 'learning_rate': '0.0005511', 'epoch': '0.2005'} | |
| {'loss': '3.314', 'grad_norm': '0.4282', 'learning_rate': '0.0005508', 'epoch': '0.2012'} | |
| {'loss': '3.312', 'grad_norm': '0.3946', 'learning_rate': '0.0005505', 'epoch': '0.2019'} | |
| {'loss': '3.309', 'grad_norm': '0.3791', 'learning_rate': '0.0005501', 'epoch': '0.2025'} | |
| {'loss': '3.314', 'grad_norm': '0.4115', 'learning_rate': '0.0005498', 'epoch': '0.2032'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.30it/s] | |
| {'loss': '3.315', 'grad_norm': '0.4361', 'learning_rate': '0.0005494', 'epoch': '0.2038'} | |
| {'loss': '3.311', 'grad_norm': '0.3919', 'learning_rate': '0.0005491', 'epoch': '0.2045'} | |
| {'loss': '3.309', 'grad_norm': '0.417', 'learning_rate': '0.0005487', 'epoch': '0.2051'} | |
| {'loss': '3.308', 'grad_norm': '0.434', 'learning_rate': '0.0005484', 'epoch': '0.2058'} | |
| {'loss': '3.311', 'grad_norm': '0.3879', 'learning_rate': '0.000548', 'epoch': '0.2064'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.15it/s] | |
| {'loss': '3.309', 'grad_norm': '0.4034', 'learning_rate': '0.0005476', 'epoch': '0.2071'} | |
| {'loss': '3.312', 'grad_norm': '0.429', 'learning_rate': '0.0005473', 'epoch': '0.2077'} | |
| {'loss': '3.315', 'grad_norm': '0.405', 'learning_rate': '0.0005469', 'epoch': '0.2084'} | |
| {'loss': '3.305', 'grad_norm': '0.4224', 'learning_rate': '0.0005466', 'epoch': '0.2091'} | |
| {'loss': '3.307', 'grad_norm': '0.4002', 'learning_rate': '0.0005462', 'epoch': '0.2097'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.63it/s] | |
| {'loss': '3.315', 'grad_norm': '0.4299', 'learning_rate': '0.0005459', 'epoch': '0.2104'} | |
| {'loss': '3.303', 'grad_norm': '0.3953', 'learning_rate': '0.0005455', 'epoch': '0.211'} | |
| {'loss': '3.31', 'grad_norm': '0.4181', 'learning_rate': '0.0005451', 'epoch': '0.2117'} | |
| {'loss': '3.307', 'grad_norm': '0.4256', 'learning_rate': '0.0005448', 'epoch': '0.2123'} | |
| {'loss': '3.307', 'grad_norm': '0.4429', 'learning_rate': '0.0005444', 'epoch': '0.213'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.07it/s] | |
| {'loss': '3.303', 'grad_norm': '0.4143', 'learning_rate': '0.000544', 'epoch': '0.2136'} | |
| {'loss': '3.311', 'grad_norm': '0.4011', 'learning_rate': '0.0005437', 'epoch': '0.2143'} | |
| {'loss': '3.305', 'grad_norm': '0.4365', 'learning_rate': '0.0005433', 'epoch': '0.215'} | |
| {'loss': '3.309', 'grad_norm': '0.4251', 'learning_rate': '0.0005429', 'epoch': '0.2156'} | |
| {'loss': '3.31', 'grad_norm': '0.4824', 'learning_rate': '0.0005426', 'epoch': '0.2163'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.88it/s] | |
| {'loss': '3.306', 'grad_norm': '0.3786', 'learning_rate': '0.0005422', 'epoch': '0.2169'} | |
| {'loss': '3.308', 'grad_norm': '0.4208', 'learning_rate': '0.0005418', 'epoch': '0.2176'} | |
| {'loss': '3.298', 'grad_norm': '0.4428', 'learning_rate': '0.0005414', 'epoch': '0.2182'} | |
| {'loss': '3.307', 'grad_norm': '0.4398', 'learning_rate': '0.0005411', 'epoch': '0.2189'} | |
| {'loss': '3.303', 'grad_norm': '0.4392', 'learning_rate': '0.0005407', 'epoch': '0.2195'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.61it/s] | |
| {'loss': '3.308', 'grad_norm': '0.3917', 'learning_rate': '0.0005403', 'epoch': '0.2202'} | |
| {'loss': '3.31', 'grad_norm': '0.3676', 'learning_rate': '0.0005399', 'epoch': '0.2209'} | |
| {'loss': '3.304', 'grad_norm': '0.4456', 'learning_rate': '0.0005396', 'epoch': '0.2215'} | |
| {'loss': '3.307', 'grad_norm': '0.4102', 'learning_rate': '0.0005392', 'epoch': '0.2222'} | |
| {'loss': '3.303', 'grad_norm': '0.4049', 'learning_rate': '0.0005388', 'epoch': '0.2228'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.06it/s] | |
| {'loss': '3.313', 'grad_norm': '0.4135', 'learning_rate': '0.0005384', 'epoch': '0.2235'} | |
| {'loss': '3.303', 'grad_norm': '0.4272', 'learning_rate': '0.000538', 'epoch': '0.2241'} | |
| {'loss': '3.301', 'grad_norm': '0.3994', 'learning_rate': '0.0005376', 'epoch': '0.2248'} | |
| {'loss': '3.299', 'grad_norm': '0.3901', 'learning_rate': '0.0005373', 'epoch': '0.2254'} | |
| {'loss': '3.308', 'grad_norm': '0.4429', 'learning_rate': '0.0005369', 'epoch': '0.2261'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.41it/s] | |
| {'loss': '3.306', 'grad_norm': '0.4178', 'learning_rate': '0.0005365', 'epoch': '0.2268'} | |
| {'loss': '3.302', 'grad_norm': '0.4027', 'learning_rate': '0.0005361', 'epoch': '0.2274'} | |
| {'loss': '3.3', 'grad_norm': '0.4094', 'learning_rate': '0.0005357', 'epoch': '0.2281'} | |
| {'loss': '3.303', 'grad_norm': '0.4042', 'learning_rate': '0.0005353', 'epoch': '0.2287'} | |
| {'loss': '3.302', 'grad_norm': '0.4301', 'learning_rate': '0.0005349', 'epoch': '0.2294'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.63it/s] | |
| {'loss': '3.295', 'grad_norm': '0.4202', 'learning_rate': '0.0005345', 'epoch': '0.23'} | |
| {'loss': '3.298', 'grad_norm': '0.4166', 'learning_rate': '0.0005341', 'epoch': '0.2307'} | |
| {'loss': '3.298', 'grad_norm': '0.4179', 'learning_rate': '0.0005337', 'epoch': '0.2313'} | |
| {'loss': '3.297', 'grad_norm': '0.4093', 'learning_rate': '0.0005334', 'epoch': '0.232'} | |
| {'loss': '3.297', 'grad_norm': '0.4061', 'learning_rate': '0.000533', 'epoch': '0.2327'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.39it/s] | |
| {'loss': '3.298', 'grad_norm': '0.4248', 'learning_rate': '0.0005326', 'epoch': '0.2333'} | |
| {'loss': '3.302', 'grad_norm': '0.4083', 'learning_rate': '0.0005322', 'epoch': '0.234'} | |
| {'loss': '3.307', 'grad_norm': '0.436', 'learning_rate': '0.0005318', 'epoch': '0.2346'} | |
| {'loss': '3.295', 'grad_norm': '0.4644', 'learning_rate': '0.0005314', 'epoch': '0.2353'} | |
| {'loss': '3.299', 'grad_norm': '0.4748', 'learning_rate': '0.000531', 'epoch': '0.2359'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.80it/s] | |
| {'loss': '3.298', 'grad_norm': '0.4075', 'learning_rate': '0.0005306', 'epoch': '0.2366'} | |
| {'loss': '3.296', 'grad_norm': '0.4763', 'learning_rate': '0.0005302', 'epoch': '0.2372'} | |
| {'loss': '3.301', 'grad_norm': '0.4135', 'learning_rate': '0.0005297', 'epoch': '0.2379'} | |
| {'loss': '3.29', 'grad_norm': '0.4013', 'learning_rate': '0.0005293', 'epoch': '0.2386'} | |
| {'loss': '3.295', 'grad_norm': '0.4345', 'learning_rate': '0.0005289', 'epoch': '0.2392'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.57it/s] | |
| {'loss': '3.291', 'grad_norm': '0.4374', 'learning_rate': '0.0005285', 'epoch': '0.2399'} | |
| {'loss': '3.294', 'grad_norm': '0.3919', 'learning_rate': '0.0005281', 'epoch': '0.2405'} | |
| {'loss': '3.292', 'grad_norm': '0.3981', 'learning_rate': '0.0005277', 'epoch': '0.2412'} | |
| {'loss': '3.291', 'grad_norm': '0.4253', 'learning_rate': '0.0005273', 'epoch': '0.2418'} | |
| {'loss': '3.292', 'grad_norm': '0.4287', 'learning_rate': '0.0005269', 'epoch': '0.2425'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.31it/s] | |
| {'loss': '3.294', 'grad_norm': '0.4415', 'learning_rate': '0.0005265', 'epoch': '0.2431'} | |
| {'loss': '3.292', 'grad_norm': '0.4087', 'learning_rate': '0.0005261', 'epoch': '0.2438'} | |
| {'loss': '3.297', 'grad_norm': '0.4502', 'learning_rate': '0.0005256', 'epoch': '0.2444'} | |
| {'loss': '3.291', 'grad_norm': '0.3993', 'learning_rate': '0.0005252', 'epoch': '0.2451'} | |
| {'loss': '3.291', 'grad_norm': '0.4159', 'learning_rate': '0.0005248', 'epoch': '0.2458'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.84it/s] | |
| {'loss': '3.295', 'grad_norm': '0.4288', 'learning_rate': '0.0005244', 'epoch': '0.2464'} | |
| {'loss': '3.29', 'grad_norm': '0.4135', 'learning_rate': '0.000524', 'epoch': '0.2471'} | |
| {'loss': '3.289', 'grad_norm': '0.3979', 'learning_rate': '0.0005236', 'epoch': '0.2477'} | |
| {'loss': '3.288', 'grad_norm': '0.46', 'learning_rate': '0.0005231', 'epoch': '0.2484'} | |
| {'loss': '3.292', 'grad_norm': '0.4309', 'learning_rate': '0.0005227', 'epoch': '0.249'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.29it/s] | |
| {'loss': '3.287', 'grad_norm': '0.4198', 'learning_rate': '0.0005223', 'epoch': '0.2497'} | |
| {'loss': '3.292', 'grad_norm': '0.4363', 'learning_rate': '0.0005219', 'epoch': '0.2503'} | |
| {'loss': '3.294', 'grad_norm': '0.4288', 'learning_rate': '0.0005214', 'epoch': '0.251'} | |
| {'loss': '3.285', 'grad_norm': '0.467', 'learning_rate': '0.000521', 'epoch': '0.2517'} | |
| {'loss': '3.297', 'grad_norm': '0.423', 'learning_rate': '0.0005206', 'epoch': '0.2523'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.91it/s] | |
| {'loss': '3.292', 'grad_norm': '0.4233', 'learning_rate': '0.0005202', 'epoch': '0.253'} | |
| {'loss': '3.293', 'grad_norm': '0.39', 'learning_rate': '0.0005197', 'epoch': '0.2536'} | |
| {'loss': '3.287', 'grad_norm': '0.4068', 'learning_rate': '0.0005193', 'epoch': '0.2543'} | |
| {'loss': '3.294', 'grad_norm': '0.4774', 'learning_rate': '0.0005189', 'epoch': '0.2549'} | |
| {'loss': '3.289', 'grad_norm': '0.4119', 'learning_rate': '0.0005184', 'epoch': '0.2556'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.22it/s] | |
| {'loss': '3.285', 'grad_norm': '0.4129', 'learning_rate': '0.000518', 'epoch': '0.2562'} | |
| {'loss': '3.289', 'grad_norm': '0.4281', 'learning_rate': '0.0005176', 'epoch': '0.2569'} | |
| {'loss': '3.296', 'grad_norm': '0.4259', 'learning_rate': '0.0005171', 'epoch': '0.2576'} | |
| {'loss': '3.285', 'grad_norm': '0.4517', 'learning_rate': '0.0005167', 'epoch': '0.2582'} | |
| {'loss': '3.291', 'grad_norm': '0.4411', 'learning_rate': '0.0005163', 'epoch': '0.2589'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.82it/s] | |
| {'loss': '3.293', 'grad_norm': '0.4366', 'learning_rate': '0.0005158', 'epoch': '0.2595'} | |
| {'loss': '3.286', 'grad_norm': '0.4526', 'learning_rate': '0.0005154', 'epoch': '0.2602'} | |
| {'loss': '3.287', 'grad_norm': '0.4371', 'learning_rate': '0.000515', 'epoch': '0.2608'} | |
| {'loss': '3.288', 'grad_norm': '0.4736', 'learning_rate': '0.0005145', 'epoch': '0.2615'} | |
| {'loss': '3.284', 'grad_norm': '0.4448', 'learning_rate': '0.0005141', 'epoch': '0.2621'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.47it/s] | |
| {'loss': '3.289', 'grad_norm': '0.4361', 'learning_rate': '0.0005136', 'epoch': '0.2628'} | |
| {'loss': '3.292', 'grad_norm': '0.4069', 'learning_rate': '0.0005132', 'epoch': '0.2635'} | |
| {'loss': '3.285', 'grad_norm': '0.429', 'learning_rate': '0.0005127', 'epoch': '0.2641'} | |
| {'loss': '3.29', 'grad_norm': '0.45', 'learning_rate': '0.0005123', 'epoch': '0.2648'} | |
| {'loss': '3.29', 'grad_norm': '0.4603', 'learning_rate': '0.0005119', 'epoch': '0.2654'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.59it/s] | |
| {'loss': '3.286', 'grad_norm': '0.4661', 'learning_rate': '0.0005114', 'epoch': '0.2661'} | |
| {'loss': '3.283', 'grad_norm': '0.4116', 'learning_rate': '0.000511', 'epoch': '0.2667'} | |
| {'loss': '3.289', 'grad_norm': '0.4033', 'learning_rate': '0.0005105', 'epoch': '0.2674'} | |
| {'loss': '3.287', 'grad_norm': '0.4275', 'learning_rate': '0.0005101', 'epoch': '0.268'} | |
| {'loss': '3.285', 'grad_norm': '0.4323', 'learning_rate': '0.0005096', 'epoch': '0.2687'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.46it/s] | |
| {'loss': '3.287', 'grad_norm': '0.4436', 'learning_rate': '0.0005092', 'epoch': '0.2694'} | |
| {'loss': '3.283', 'grad_norm': '0.4717', 'learning_rate': '0.0005087', 'epoch': '0.27'} | |
| {'loss': '3.282', 'grad_norm': '0.4171', 'learning_rate': '0.0005083', 'epoch': '0.2707'} | |
| {'loss': '3.285', 'grad_norm': '0.466', 'learning_rate': '0.0005078', 'epoch': '0.2713'} | |
| {'loss': '3.288', 'grad_norm': '0.4331', 'learning_rate': '0.0005073', 'epoch': '0.272'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.88it/s] | |
| {'loss': '3.282', 'grad_norm': '0.4762', 'learning_rate': '0.0005069', 'epoch': '0.2726'} | |
| {'loss': '3.279', 'grad_norm': '0.4448', 'learning_rate': '0.0005064', 'epoch': '0.2733'} | |
| {'loss': '3.283', 'grad_norm': '0.417', 'learning_rate': '0.000506', 'epoch': '0.2739'} | |
| {'loss': '3.283', 'grad_norm': '0.4176', 'learning_rate': '0.0005055', 'epoch': '0.2746'} | |
| {'loss': '3.281', 'grad_norm': '0.4735', 'learning_rate': '0.0005051', 'epoch': '0.2753'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.03it/s] | |
| {'loss': '3.291', 'grad_norm': '0.4538', 'learning_rate': '0.0005046', 'epoch': '0.2759'} | |
| {'loss': '3.283', 'grad_norm': '0.4483', 'learning_rate': '0.0005041', 'epoch': '0.2766'} | |
| {'loss': '3.287', 'grad_norm': '0.3931', 'learning_rate': '0.0005037', 'epoch': '0.2772'} | |
| {'loss': '3.281', 'grad_norm': '0.4505', 'learning_rate': '0.0005032', 'epoch': '0.2779'} | |
| {'loss': '3.277', 'grad_norm': '0.4187', 'learning_rate': '0.0005027', 'epoch': '0.2785'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.26it/s] | |
| {'loss': '3.282', 'grad_norm': '0.4115', 'learning_rate': '0.0005023', 'epoch': '0.2792'} | |
| {'loss': '3.282', 'grad_norm': '0.4723', 'learning_rate': '0.0005018', 'epoch': '0.2798'} | |
| {'loss': '3.288', 'grad_norm': '0.4022', 'learning_rate': '0.0005013', 'epoch': '0.2805'} | |
| {'loss': '3.279', 'grad_norm': '0.4272', 'learning_rate': '0.0005009', 'epoch': '0.2811'} | |
| {'loss': '3.287', 'grad_norm': '0.4183', 'learning_rate': '0.0005004', 'epoch': '0.2818'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.06it/s] | |
| {'loss': '3.28', 'grad_norm': '0.4252', 'learning_rate': '0.0004999', 'epoch': '0.2825'} | |
| {'loss': '3.283', 'grad_norm': '0.4789', 'learning_rate': '0.0004995', 'epoch': '0.2831'} | |
| {'loss': '3.282', 'grad_norm': '0.41', 'learning_rate': '0.000499', 'epoch': '0.2838'} | |
| {'loss': '3.279', 'grad_norm': '0.505', 'learning_rate': '0.0004985', 'epoch': '0.2844'} | |
| {'loss': '3.281', 'grad_norm': '0.4099', 'learning_rate': '0.0004981', 'epoch': '0.2851'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.59it/s] | |
| {'loss': '3.277', 'grad_norm': '0.4219', 'learning_rate': '0.0004976', 'epoch': '0.2857'} | |
| {'loss': '3.279', 'grad_norm': '0.4118', 'learning_rate': '0.0004971', 'epoch': '0.2864'} | |
| {'loss': '3.278', 'grad_norm': '0.4192', 'learning_rate': '0.0004966', 'epoch': '0.287'} | |
| {'loss': '3.277', 'grad_norm': '0.4265', 'learning_rate': '0.0004962', 'epoch': '0.2877'} | |
| {'loss': '3.279', 'grad_norm': '0.421', 'learning_rate': '0.0004957', 'epoch': '0.2884'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.02it/s] | |
| {'loss': '3.28', 'grad_norm': '0.4099', 'learning_rate': '0.0004952', 'epoch': '0.289'} | |
| {'loss': '3.28', 'grad_norm': '0.4933', 'learning_rate': '0.0004947', 'epoch': '0.2897'} | |
| {'loss': '3.28', 'grad_norm': '0.4045', 'learning_rate': '0.0004942', 'epoch': '0.2903'} | |
| {'loss': '3.276', 'grad_norm': '0.4753', 'learning_rate': '0.0004938', 'epoch': '0.291'} | |
| {'loss': '3.273', 'grad_norm': '0.4605', 'learning_rate': '0.0004933', 'epoch': '0.2916'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.78it/s] | |
| {'loss': '3.275', 'grad_norm': '0.4532', 'learning_rate': '0.0004928', 'epoch': '0.2923'} | |
| {'loss': '3.273', 'grad_norm': '0.4628', 'learning_rate': '0.0004923', 'epoch': '0.2929'} | |
| {'loss': '3.28', 'grad_norm': '0.4034', 'learning_rate': '0.0004918', 'epoch': '0.2936'} | |
| {'loss': '3.272', 'grad_norm': '0.4407', 'learning_rate': '0.0004913', 'epoch': '0.2943'} | |
| {'loss': '3.279', 'grad_norm': '0.4513', 'learning_rate': '0.0004909', 'epoch': '0.2949'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.65it/s] | |
| {'loss': '3.282', 'grad_norm': '0.4502', 'learning_rate': '0.0004904', 'epoch': '0.2956'} | |
| {'loss': '3.276', 'grad_norm': '0.417', 'learning_rate': '0.0004899', 'epoch': '0.2962'} | |
| {'loss': '3.272', 'grad_norm': '0.4675', 'learning_rate': '0.0004894', 'epoch': '0.2969'} | |
| {'loss': '3.272', 'grad_norm': '0.4454', 'learning_rate': '0.0004889', 'epoch': '0.2975'} | |
| {'loss': '3.27', 'grad_norm': '0.4224', 'learning_rate': '0.0004884', 'epoch': '0.2982'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.38it/s] | |
| {'loss': '3.277', 'grad_norm': '0.4185', 'learning_rate': '0.0004879', 'epoch': '0.2988'} | |
| {'loss': '3.272', 'grad_norm': '0.4154', 'learning_rate': '0.0004874', 'epoch': '0.2995'} | |
| {'loss': '3.274', 'grad_norm': '0.4373', 'learning_rate': '0.0004869', 'epoch': '0.3002'} | |
| {'loss': '3.273', 'grad_norm': '0.4612', 'learning_rate': '0.0004864', 'epoch': '0.3008'} | |
| {'loss': '3.269', 'grad_norm': '0.4317', 'learning_rate': '0.0004859', 'epoch': '0.3015'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.40it/s] | |
| {'loss': '3.272', 'grad_norm': '0.4245', 'learning_rate': '0.0004855', 'epoch': '0.3021'} | |
| {'loss': '3.277', 'grad_norm': '0.459', 'learning_rate': '0.000485', 'epoch': '0.3028'} | |
| {'loss': '3.279', 'grad_norm': '0.4376', 'learning_rate': '0.0004845', 'epoch': '0.3034'} | |
| {'loss': '3.274', 'grad_norm': '0.4456', 'learning_rate': '0.000484', 'epoch': '0.3041'} | |
| {'loss': '3.271', 'grad_norm': '0.4329', 'learning_rate': '0.0004835', 'epoch': '0.3047'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.95it/s] | |
| {'loss': '3.275', 'grad_norm': '0.475', 'learning_rate': '0.000483', 'epoch': '0.3054'} | |
| {'loss': '3.274', 'grad_norm': '0.4173', 'learning_rate': '0.0004825', 'epoch': '0.3061'} | |
| {'loss': '3.269', 'grad_norm': '0.4307', 'learning_rate': '0.000482', 'epoch': '0.3067'} | |
| {'loss': '3.272', 'grad_norm': '0.4991', 'learning_rate': '0.0004815', 'epoch': '0.3074'} | |
| {'loss': '3.271', 'grad_norm': '0.4149', 'learning_rate': '0.000481', 'epoch': '0.308'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.07it/s] | |
| {'loss': '3.272', 'grad_norm': '0.4634', 'learning_rate': '0.0004805', 'epoch': '0.3087'} | |
| {'loss': '3.276', 'grad_norm': '0.4536', 'learning_rate': '0.00048', 'epoch': '0.3093'} | |
| {'loss': '3.269', 'grad_norm': '0.4109', 'learning_rate': '0.0004795', 'epoch': '0.31'} | |
| {'loss': '3.265', 'grad_norm': '0.4222', 'learning_rate': '0.0004789', 'epoch': '0.3106'} | |
| {'loss': '3.274', 'grad_norm': '0.4168', 'learning_rate': '0.0004784', 'epoch': '0.3113'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.28it/s] | |
| {'loss': '3.272', 'grad_norm': '0.4087', 'learning_rate': '0.0004779', 'epoch': '0.312'} | |
| {'loss': '3.273', 'grad_norm': '0.4815', 'learning_rate': '0.0004774', 'epoch': '0.3126'} | |
| {'loss': '3.27', 'grad_norm': '0.4261', 'learning_rate': '0.0004769', 'epoch': '0.3133'} | |
| {'loss': '3.275', 'grad_norm': '0.4388', 'learning_rate': '0.0004764', 'epoch': '0.3139'} | |
| {'loss': '3.267', 'grad_norm': '0.4312', 'learning_rate': '0.0004759', 'epoch': '0.3146'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.70it/s] | |
| {'loss': '3.269', 'grad_norm': '0.4429', 'learning_rate': '0.0004754', 'epoch': '0.3152'} | |
| {'loss': '3.27', 'grad_norm': '0.4504', 'learning_rate': '0.0004749', 'epoch': '0.3159'} | |
| {'loss': '3.275', 'grad_norm': '0.4439', 'learning_rate': '0.0004744', 'epoch': '0.3165'} | |
| {'loss': '3.268', 'grad_norm': '0.445', 'learning_rate': '0.0004738', 'epoch': '0.3172'} | |
| {'loss': '3.269', 'grad_norm': '0.4463', 'learning_rate': '0.0004733', 'epoch': '0.3178'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.78it/s] | |
| {'loss': '3.271', 'grad_norm': '0.4737', 'learning_rate': '0.0004728', 'epoch': '0.3185'} | |
| {'loss': '3.27', 'grad_norm': '0.4419', 'learning_rate': '0.0004723', 'epoch': '0.3192'} | |
| {'loss': '3.266', 'grad_norm': '0.4606', 'learning_rate': '0.0004718', 'epoch': '0.3198'} | |
| {'loss': '3.264', 'grad_norm': '0.4492', 'learning_rate': '0.0004713', 'epoch': '0.3205'} | |
| {'loss': '3.266', 'grad_norm': '0.4502', 'learning_rate': '0.0004708', 'epoch': '0.3211'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.32it/s] | |
| {'loss': '3.266', 'grad_norm': '0.4692', 'learning_rate': '0.0004702', 'epoch': '0.3218'} | |
| {'loss': '3.263', 'grad_norm': '0.4328', 'learning_rate': '0.0004697', 'epoch': '0.3224'} | |
| {'loss': '3.268', 'grad_norm': '0.4387', 'learning_rate': '0.0004692', 'epoch': '0.3231'} | |
| {'loss': '3.265', 'grad_norm': '0.4508', 'learning_rate': '0.0004687', 'epoch': '0.3237'} | |
| {'loss': '3.263', 'grad_norm': '0.4707', 'learning_rate': '0.0004682', 'epoch': '0.3244'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.48it/s] | |
| {'loss': '3.27', 'grad_norm': '0.456', 'learning_rate': '0.0004676', 'epoch': '0.3251'} | |
| {'loss': '3.263', 'grad_norm': '0.4588', 'learning_rate': '0.0004671', 'epoch': '0.3257'} | |
| {'loss': '3.263', 'grad_norm': '0.4057', 'learning_rate': '0.0004666', 'epoch': '0.3264'} | |
| {'loss': '3.265', 'grad_norm': '0.45', 'learning_rate': '0.0004661', 'epoch': '0.327'} | |
| {'loss': '3.264', 'grad_norm': '0.4766', 'learning_rate': '0.0004655', 'epoch': '0.3277'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.50it/s] | |
| {'loss': '3.263', 'grad_norm': '0.4782', 'learning_rate': '0.000465', 'epoch': '0.3283'} | |
| {'loss': '3.27', 'grad_norm': '0.4372', 'learning_rate': '0.0004645', 'epoch': '0.329'} | |
| {'loss': '3.268', 'grad_norm': '0.4519', 'learning_rate': '0.000464', 'epoch': '0.3296'} | |
| {'loss': '3.255', 'grad_norm': '0.4323', 'learning_rate': '0.0004634', 'epoch': '0.3303'} | |
| {'loss': '3.264', 'grad_norm': '0.4575', 'learning_rate': '0.0004629', 'epoch': '0.331'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.83it/s] | |
| {'loss': '3.27', 'grad_norm': '0.4476', 'learning_rate': '0.0004624', 'epoch': '0.3316'} | |
| {'loss': '3.262', 'grad_norm': '0.4528', 'learning_rate': '0.0004618', 'epoch': '0.3323'} | |
| {'loss': '3.262', 'grad_norm': '0.4844', 'learning_rate': '0.0004613', 'epoch': '0.3329'} | |
| {'loss': '3.264', 'grad_norm': '0.4795', 'learning_rate': '0.0004608', 'epoch': '0.3336'} | |
| {'loss': '3.267', 'grad_norm': '0.4845', 'learning_rate': '0.0004602', 'epoch': '0.3342'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.45it/s] | |
| {'loss': '3.26', 'grad_norm': '0.4406', 'learning_rate': '0.0004597', 'epoch': '0.3349'} | |
| {'loss': '3.263', 'grad_norm': '0.4481', 'learning_rate': '0.0004592', 'epoch': '0.3355'} | |
| {'loss': '3.26', 'grad_norm': '0.452', 'learning_rate': '0.0004586', 'epoch': '0.3362'} | |
| {'loss': '3.26', 'grad_norm': '0.469', 'learning_rate': '0.0004581', 'epoch': '0.3369'} | |
| {'loss': '3.253', 'grad_norm': '0.4089', 'learning_rate': '0.0004576', 'epoch': '0.3375'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.17it/s] | |
| {'loss': '3.262', 'grad_norm': '0.4312', 'learning_rate': '0.000457', 'epoch': '0.3382'} | |
| {'loss': '3.259', 'grad_norm': '0.4376', 'learning_rate': '0.0004565', 'epoch': '0.3388'} | |
| {'loss': '3.257', 'grad_norm': '0.4248', 'learning_rate': '0.000456', 'epoch': '0.3395'} | |
| {'loss': '3.26', 'grad_norm': '0.4705', 'learning_rate': '0.0004554', 'epoch': '0.3401'} | |
| {'loss': '3.26', 'grad_norm': '0.4164', 'learning_rate': '0.0004549', 'epoch': '0.3408'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.79it/s] | |
| {'loss': '3.266', 'grad_norm': '0.4457', 'learning_rate': '0.0004543', 'epoch': '0.3414'} | |
| {'loss': '3.259', 'grad_norm': '0.4463', 'learning_rate': '0.0004538', 'epoch': '0.3421'} | |
| {'loss': '3.259', 'grad_norm': '0.4603', 'learning_rate': '0.0004533', 'epoch': '0.3428'} | |
| {'loss': '3.259', 'grad_norm': '0.4435', 'learning_rate': '0.0004527', 'epoch': '0.3434'} | |
| {'loss': '3.262', 'grad_norm': '0.4214', 'learning_rate': '0.0004522', 'epoch': '0.3441'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.42it/s] | |
| {'loss': '3.26', 'grad_norm': '0.4644', 'learning_rate': '0.0004516', 'epoch': '0.3447'} | |
| {'loss': '3.256', 'grad_norm': '0.4562', 'learning_rate': '0.0004511', 'epoch': '0.3454'} | |
| {'loss': '3.262', 'grad_norm': '0.4455', 'learning_rate': '0.0004505', 'epoch': '0.346'} | |
| {'loss': '3.259', 'grad_norm': '0.4552', 'learning_rate': '0.00045', 'epoch': '0.3467'} | |
| {'loss': '3.257', 'grad_norm': '0.4198', 'learning_rate': '0.0004494', 'epoch': '0.3473'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.60it/s] | |
| {'loss': '3.261', 'grad_norm': '0.4728', 'learning_rate': '0.0004489', 'epoch': '0.348'} | |
| {'loss': '3.259', 'grad_norm': '0.4621', 'learning_rate': '0.0004484', 'epoch': '0.3487'} | |
| {'loss': '3.256', 'grad_norm': '0.468', 'learning_rate': '0.0004478', 'epoch': '0.3493'} | |
| {'loss': '3.26', 'grad_norm': '0.452', 'learning_rate': '0.0004473', 'epoch': '0.35'} | |
| {'loss': '3.257', 'grad_norm': '0.4327', 'learning_rate': '0.0004467', 'epoch': '0.3506'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.22it/s] | |
| {'loss': '3.26', 'grad_norm': '0.4696', 'learning_rate': '0.0004462', 'epoch': '0.3513'} | |
| {'loss': '3.259', 'grad_norm': '0.4474', 'learning_rate': '0.0004456', 'epoch': '0.3519'} | |
| {'loss': '3.254', 'grad_norm': '0.4496', 'learning_rate': '0.0004451', 'epoch': '0.3526'} | |
| {'loss': '3.26', 'grad_norm': '0.4078', 'learning_rate': '0.0004445', 'epoch': '0.3532'} | |
| {'loss': '3.252', 'grad_norm': '0.4482', 'learning_rate': '0.0004439', 'epoch': '0.3539'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.79it/s] | |
| {'loss': '3.257', 'grad_norm': '0.4473', 'learning_rate': '0.0004434', 'epoch': '0.3545'} | |
| {'loss': '3.254', 'grad_norm': '0.4646', 'learning_rate': '0.0004428', 'epoch': '0.3552'} | |
| {'loss': '3.255', 'grad_norm': '0.4673', 'learning_rate': '0.0004423', 'epoch': '0.3559'} | |
| {'loss': '3.249', 'grad_norm': '0.4286', 'learning_rate': '0.0004417', 'epoch': '0.3565'} | |
| {'loss': '3.248', 'grad_norm': '0.463', 'learning_rate': '0.0004412', 'epoch': '0.3572'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.55it/s] | |
| {'loss': '3.256', 'grad_norm': '0.4485', 'learning_rate': '0.0004406', 'epoch': '0.3578'} | |
| {'loss': '3.257', 'grad_norm': '0.4817', 'learning_rate': '0.0004401', 'epoch': '0.3585'} | |
| {'loss': '3.256', 'grad_norm': '0.4897', 'learning_rate': '0.0004395', 'epoch': '0.3591'} | |
| {'loss': '3.255', 'grad_norm': '0.4551', 'learning_rate': '0.0004389', 'epoch': '0.3598'} | |
| {'loss': '3.255', 'grad_norm': '0.4687', 'learning_rate': '0.0004384', 'epoch': '0.3604'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.29it/s] | |
| {'loss': '3.264', 'grad_norm': '0.4215', 'learning_rate': '0.0004378', 'epoch': '0.3611'} | |
| {'loss': '3.255', 'grad_norm': '0.4536', 'learning_rate': '0.0004373', 'epoch': '0.3618'} | |
| {'loss': '3.252', 'grad_norm': '0.5059', 'learning_rate': '0.0004367', 'epoch': '0.3624'} | |
| {'loss': '3.253', 'grad_norm': '0.4578', 'learning_rate': '0.0004361', 'epoch': '0.3631'} | |
| {'loss': '3.255', 'grad_norm': '0.4677', 'learning_rate': '0.0004356', 'epoch': '0.3637'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.16it/s] | |
| {'loss': '3.258', 'grad_norm': '0.4477', 'learning_rate': '0.000435', 'epoch': '0.3644'} | |
| {'loss': '3.247', 'grad_norm': '0.4775', 'learning_rate': '0.0004345', 'epoch': '0.365'} | |
| {'loss': '3.257', 'grad_norm': '0.4636', 'learning_rate': '0.0004339', 'epoch': '0.3657'} | |
| {'loss': '3.256', 'grad_norm': '0.491', 'learning_rate': '0.0004333', 'epoch': '0.3663'} | |
| {'loss': '3.251', 'grad_norm': '0.4513', 'learning_rate': '0.0004328', 'epoch': '0.367'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.75it/s] | |
| {'loss': '3.251', 'grad_norm': '0.4497', 'learning_rate': '0.0004322', 'epoch': '0.3677'} | |
| {'loss': '3.258', 'grad_norm': '0.4543', 'learning_rate': '0.0004316', 'epoch': '0.3683'} | |
| {'loss': '3.255', 'grad_norm': '0.4645', 'learning_rate': '0.0004311', 'epoch': '0.369'} | |
| {'loss': '3.252', 'grad_norm': '0.4636', 'learning_rate': '0.0004305', 'epoch': '0.3696'} | |
| {'loss': '3.247', 'grad_norm': '0.4725', 'learning_rate': '0.0004299', 'epoch': '0.3703'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.44it/s] | |
| {'loss': '3.25', 'grad_norm': '0.4311', 'learning_rate': '0.0004294', 'epoch': '0.3709'} | |
| {'loss': '3.253', 'grad_norm': '0.4578', 'learning_rate': '0.0004288', 'epoch': '0.3716'} | |
| {'loss': '3.25', 'grad_norm': '0.4468', 'learning_rate': '0.0004282', 'epoch': '0.3722'} | |
| {'loss': '3.248', 'grad_norm': '0.4504', 'learning_rate': '0.0004277', 'epoch': '0.3729'} | |
| {'loss': '3.244', 'grad_norm': '0.4893', 'learning_rate': '0.0004271', 'epoch': '0.3736'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.96it/s] | |
| {'loss': '3.247', 'grad_norm': '0.4625', 'learning_rate': '0.0004265', 'epoch': '0.3742'} | |
| {'loss': '3.254', 'grad_norm': '0.4205', 'learning_rate': '0.0004259', 'epoch': '0.3749'} | |
| {'loss': '3.252', 'grad_norm': '0.4396', 'learning_rate': '0.0004254', 'epoch': '0.3755'} | |
| {'loss': '3.254', 'grad_norm': '0.4752', 'learning_rate': '0.0004248', 'epoch': '0.3762'} | |
| {'loss': '3.25', 'grad_norm': '0.4604', 'learning_rate': '0.0004242', 'epoch': '0.3768'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.26it/s] | |
| {'loss': '3.242', 'grad_norm': '0.4707', 'learning_rate': '0.0004236', 'epoch': '0.3775'} | |
| {'loss': '3.244', 'grad_norm': '0.4613', 'learning_rate': '0.0004231', 'epoch': '0.3781'} | |
| {'loss': '3.249', 'grad_norm': '0.4585', 'learning_rate': '0.0004225', 'epoch': '0.3788'} | |
| {'loss': '3.249', 'grad_norm': '0.4497', 'learning_rate': '0.0004219', 'epoch': '0.3795'} | |
| {'loss': '3.252', 'grad_norm': '0.4544', 'learning_rate': '0.0004213', 'epoch': '0.3801'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.35it/s] | |
| {'loss': '3.249', 'grad_norm': '0.4723', 'learning_rate': '0.0004208', 'epoch': '0.3808'} | |
| {'loss': '3.249', 'grad_norm': '0.477', 'learning_rate': '0.0004202', 'epoch': '0.3814'} | |
| {'loss': '3.253', 'grad_norm': '0.4469', 'learning_rate': '0.0004196', 'epoch': '0.3821'} | |
| {'loss': '3.246', 'grad_norm': '0.4561', 'learning_rate': '0.000419', 'epoch': '0.3827'} | |
| {'loss': '3.247', 'grad_norm': '0.4483', 'learning_rate': '0.0004185', 'epoch': '0.3834'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.60it/s] | |
| {'loss': '3.251', 'grad_norm': '0.4371', 'learning_rate': '0.0004179', 'epoch': '0.384'} | |
| {'loss': '3.245', 'grad_norm': '0.472', 'learning_rate': '0.0004173', 'epoch': '0.3847'} | |
| {'loss': '3.251', 'grad_norm': '0.4505', 'learning_rate': '0.0004167', 'epoch': '0.3854'} | |
| {'loss': '3.246', 'grad_norm': '0.4938', 'learning_rate': '0.0004161', 'epoch': '0.386'} | |
| {'loss': '3.243', 'grad_norm': '0.4633', 'learning_rate': '0.0004156', 'epoch': '0.3867'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.06it/s] | |
| {'loss': '3.247', 'grad_norm': '0.4648', 'learning_rate': '0.000415', 'epoch': '0.3873'} | |
| {'loss': '3.247', 'grad_norm': '0.4496', 'learning_rate': '0.0004144', 'epoch': '0.388'} | |
| {'loss': '3.245', 'grad_norm': '0.4622', 'learning_rate': '0.0004138', 'epoch': '0.3886'} | |
| {'loss': '3.244', 'grad_norm': '0.4612', 'learning_rate': '0.0004132', 'epoch': '0.3893'} | |
| {'loss': '3.241', 'grad_norm': '0.4886', 'learning_rate': '0.0004126', 'epoch': '0.3899'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.27it/s] | |
| {'loss': '3.242', 'grad_norm': '0.4501', 'learning_rate': '0.0004121', 'epoch': '0.3906'} | |
| {'loss': '3.242', 'grad_norm': '0.4612', 'learning_rate': '0.0004115', 'epoch': '0.3912'} | |
| {'loss': '3.246', 'grad_norm': '0.453', 'learning_rate': '0.0004109', 'epoch': '0.3919'} | |
| {'loss': '3.241', 'grad_norm': '0.4666', 'learning_rate': '0.0004103', 'epoch': '0.3926'} | |
| {'loss': '3.24', 'grad_norm': '0.4556', 'learning_rate': '0.0004097', 'epoch': '0.3932'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.24it/s] | |
| {'loss': '3.246', 'grad_norm': '0.4626', 'learning_rate': '0.0004091', 'epoch': '0.3939'} | |
| {'loss': '3.24', 'grad_norm': '0.4649', 'learning_rate': '0.0004085', 'epoch': '0.3945'} | |
| {'loss': '3.242', 'grad_norm': '0.4926', 'learning_rate': '0.000408', 'epoch': '0.3952'} | |
| {'loss': '3.241', 'grad_norm': '0.4654', 'learning_rate': '0.0004074', 'epoch': '0.3958'} | |
| {'loss': '3.242', 'grad_norm': '0.4665', 'learning_rate': '0.0004068', 'epoch': '0.3965'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.11it/s] | |
| {'loss': '3.239', 'grad_norm': '0.4301', 'learning_rate': '0.0004062', 'epoch': '0.3971'} | |
| {'loss': '3.239', 'grad_norm': '0.4961', 'learning_rate': '0.0004056', 'epoch': '0.3978'} | |
| {'loss': '3.243', 'grad_norm': '0.4698', 'learning_rate': '0.000405', 'epoch': '0.3985'} | |
| {'loss': '3.242', 'grad_norm': '0.4147', 'learning_rate': '0.0004044', 'epoch': '0.3991'} | |
| {'loss': '3.244', 'grad_norm': '0.4621', 'learning_rate': '0.0004038', 'epoch': '0.3998'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.11it/s] | |
| {'loss': '3.241', 'grad_norm': '0.5477', 'learning_rate': '0.0004032', 'epoch': '0.4004'} | |
| {'loss': '3.241', 'grad_norm': '0.4406', 'learning_rate': '0.0004026', 'epoch': '0.4011'} | |
| {'loss': '3.238', 'grad_norm': '0.4559', 'learning_rate': '0.000402', 'epoch': '0.4017'} | |
| {'loss': '3.239', 'grad_norm': '0.4655', 'learning_rate': '0.0004015', 'epoch': '0.4024'} | |
| {'loss': '3.236', 'grad_norm': '0.4437', 'learning_rate': '0.0004009', 'epoch': '0.403'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.54it/s] | |
| {'loss': '3.238', 'grad_norm': '0.4322', 'learning_rate': '0.0004003', 'epoch': '0.4037'} | |
| {'loss': '3.246', 'grad_norm': '0.4784', 'learning_rate': '0.0003997', 'epoch': '0.4044'} | |
| {'loss': '3.237', 'grad_norm': '0.4521', 'learning_rate': '0.0003991', 'epoch': '0.405'} | |
| {'loss': '3.239', 'grad_norm': '0.456', 'learning_rate': '0.0003985', 'epoch': '0.4057'} | |
| {'loss': '3.239', 'grad_norm': '0.4769', 'learning_rate': '0.0003979', 'epoch': '0.4063'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.69it/s] | |
| {'loss': '3.237', 'grad_norm': '0.439', 'learning_rate': '0.0003973', 'epoch': '0.407'} | |
| {'loss': '3.239', 'grad_norm': '0.4867', 'learning_rate': '0.0003967', 'epoch': '0.4076'} | |
| {'loss': '3.24', 'grad_norm': '0.4515', 'learning_rate': '0.0003961', 'epoch': '0.4083'} | |
| {'loss': '3.242', 'grad_norm': '0.5506', 'learning_rate': '0.0003955', 'epoch': '0.4089'} | |
| {'loss': '3.234', 'grad_norm': '0.4816', 'learning_rate': '0.0003949', 'epoch': '0.4096'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.12it/s] | |
| {'loss': '3.234', 'grad_norm': '0.4501', 'learning_rate': '0.0003943', 'epoch': '0.4103'} | |
| {'loss': '3.238', 'grad_norm': '0.4657', 'learning_rate': '0.0003937', 'epoch': '0.4109'} | |
| {'loss': '3.241', 'grad_norm': '0.4681', 'learning_rate': '0.0003931', 'epoch': '0.4116'} | |
| {'loss': '3.239', 'grad_norm': '0.4949', 'learning_rate': '0.0003925', 'epoch': '0.4122'} | |
| {'loss': '3.24', 'grad_norm': '0.4786', 'learning_rate': '0.0003919', 'epoch': '0.4129'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.70it/s] | |
| {'loss': '3.229', 'grad_norm': '0.4773', 'learning_rate': '0.0003913', 'epoch': '0.4135'} | |
| {'loss': '3.235', 'grad_norm': '0.4632', 'learning_rate': '0.0003907', 'epoch': '0.4142'} | |
| {'loss': '3.234', 'grad_norm': '0.5151', 'learning_rate': '0.0003901', 'epoch': '0.4148'} | |
| {'loss': '3.236', 'grad_norm': '0.4856', 'learning_rate': '0.0003895', 'epoch': '0.4155'} | |
| {'loss': '3.24', 'grad_norm': '0.4583', 'learning_rate': '0.0003889', 'epoch': '0.4162'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.90it/s] | |
| {'loss': '3.232', 'grad_norm': '0.4249', 'learning_rate': '0.0003883', 'epoch': '0.4168'} | |
| {'loss': '3.236', 'grad_norm': '0.4633', 'learning_rate': '0.0003877', 'epoch': '0.4175'} | |
| {'loss': '3.234', 'grad_norm': '0.4659', 'learning_rate': '0.0003871', 'epoch': '0.4181'} | |
| {'loss': '3.24', 'grad_norm': '0.4802', 'learning_rate': '0.0003865', 'epoch': '0.4188'} | |
| {'loss': '3.233', 'grad_norm': '0.4773', 'learning_rate': '0.0003859', 'epoch': '0.4194'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.32it/s] | |
| {'loss': '3.234', 'grad_norm': '0.4949', 'learning_rate': '0.0003853', 'epoch': '0.4201'} | |
| {'loss': '3.231', 'grad_norm': '0.4951', 'learning_rate': '0.0003847', 'epoch': '0.4207'} | |
| {'loss': '3.236', 'grad_norm': '0.4363', 'learning_rate': '0.0003841', 'epoch': '0.4214'} | |
| {'loss': '3.235', 'grad_norm': '0.4838', 'learning_rate': '0.0003835', 'epoch': '0.4221'} | |
| {'loss': '3.235', 'grad_norm': '0.4485', 'learning_rate': '0.0003829', 'epoch': '0.4227'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.52it/s] | |
| {'loss': '3.236', 'grad_norm': '0.5036', 'learning_rate': '0.0003823', 'epoch': '0.4234'} | |
| {'loss': '3.237', 'grad_norm': '0.4519', 'learning_rate': '0.0003817', 'epoch': '0.424'} | |
| {'loss': '3.233', 'grad_norm': '0.4726', 'learning_rate': '0.000381', 'epoch': '0.4247'} | |
| {'loss': '3.234', 'grad_norm': '0.4456', 'learning_rate': '0.0003804', 'epoch': '0.4253'} | |
| {'loss': '3.232', 'grad_norm': '0.4678', 'learning_rate': '0.0003798', 'epoch': '0.426'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.34it/s] | |
| {'loss': '3.237', 'grad_norm': '0.4973', 'learning_rate': '0.0003792', 'epoch': '0.4266'} | |
| {'loss': '3.224', 'grad_norm': '0.4893', 'learning_rate': '0.0003786', 'epoch': '0.4273'} | |
| {'loss': '3.232', 'grad_norm': '0.5094', 'learning_rate': '0.000378', 'epoch': '0.4279'} | |
| {'loss': '3.234', 'grad_norm': '0.4931', 'learning_rate': '0.0003774', 'epoch': '0.4286'} | |
| {'loss': '3.233', 'grad_norm': '0.4649', 'learning_rate': '0.0003768', 'epoch': '0.4293'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.69it/s] | |
| {'loss': '3.23', 'grad_norm': '0.4462', 'learning_rate': '0.0003762', 'epoch': '0.4299'} | |
| {'loss': '3.233', 'grad_norm': '0.4557', 'learning_rate': '0.0003756', 'epoch': '0.4306'} | |
| {'loss': '3.235', 'grad_norm': '0.4757', 'learning_rate': '0.000375', 'epoch': '0.4312'} | |
| {'loss': '3.237', 'grad_norm': '0.5045', 'learning_rate': '0.0003744', 'epoch': '0.4319'} | |
| {'loss': '3.231', 'grad_norm': '0.4782', 'learning_rate': '0.0003737', 'epoch': '0.4325'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.15it/s] | |
| {'loss': '3.234', 'grad_norm': '0.4713', 'learning_rate': '0.0003731', 'epoch': '0.4332'} | |
| {'loss': '3.229', 'grad_norm': '0.4605', 'learning_rate': '0.0003725', 'epoch': '0.4338'} | |
| {'loss': '3.233', 'grad_norm': '0.482', 'learning_rate': '0.0003719', 'epoch': '0.4345'} | |
| {'loss': '3.232', 'grad_norm': '0.4597', 'learning_rate': '0.0003713', 'epoch': '0.4352'} | |
| {'loss': '3.229', 'grad_norm': '0.4956', 'learning_rate': '0.0003707', 'epoch': '0.4358'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.45it/s] | |
| {'loss': '3.235', 'grad_norm': '0.4755', 'learning_rate': '0.0003701', 'epoch': '0.4365'} | |
| {'loss': '3.235', 'grad_norm': '0.4881', 'learning_rate': '0.0003695', 'epoch': '0.4371'} | |
| {'loss': '3.229', 'grad_norm': '0.4577', 'learning_rate': '0.0003688', 'epoch': '0.4378'} | |
| {'loss': '3.227', 'grad_norm': '0.4374', 'learning_rate': '0.0003682', 'epoch': '0.4384'} | |
| {'loss': '3.227', 'grad_norm': '0.4808', 'learning_rate': '0.0003676', 'epoch': '0.4391'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.71it/s] | |
| {'loss': '3.231', 'grad_norm': '0.4652', 'learning_rate': '0.000367', 'epoch': '0.4397'} | |
| {'loss': '3.23', 'grad_norm': '0.4981', 'learning_rate': '0.0003664', 'epoch': '0.4404'} | |
| {'loss': '3.228', 'grad_norm': '0.4604', 'learning_rate': '0.0003658', 'epoch': '0.4411'} | |
| {'loss': '3.227', 'grad_norm': '0.4543', 'learning_rate': '0.0003652', 'epoch': '0.4417'} | |
| {'loss': '3.225', 'grad_norm': '0.4936', 'learning_rate': '0.0003645', 'epoch': '0.4424'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.59it/s] | |
| {'loss': '3.22', 'grad_norm': '0.4366', 'learning_rate': '0.0003639', 'epoch': '0.443'} | |
| {'loss': '3.221', 'grad_norm': '0.4697', 'learning_rate': '0.0003633', 'epoch': '0.4437'} | |
| {'loss': '3.225', 'grad_norm': '0.5409', 'learning_rate': '0.0003627', 'epoch': '0.4443'} | |
| {'loss': '3.218', 'grad_norm': '0.4559', 'learning_rate': '0.0003621', 'epoch': '0.445'} | |
| {'loss': '3.229', 'grad_norm': '0.4595', 'learning_rate': '0.0003615', 'epoch': '0.4456'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.75it/s] | |
| {'loss': '3.227', 'grad_norm': '0.4986', 'learning_rate': '0.0003608', 'epoch': '0.4463'} | |
| {'loss': '3.222', 'grad_norm': '0.4858', 'learning_rate': '0.0003602', 'epoch': '0.447'} | |
| {'loss': '3.223', 'grad_norm': '0.5112', 'learning_rate': '0.0003596', 'epoch': '0.4476'} | |
| {'loss': '3.23', 'grad_norm': '0.4357', 'learning_rate': '0.000359', 'epoch': '0.4483'} | |
| {'loss': '3.227', 'grad_norm': '0.4464', 'learning_rate': '0.0003584', 'epoch': '0.4489'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.75it/s] | |
| {'loss': '3.228', 'grad_norm': '0.5098', 'learning_rate': '0.0003578', 'epoch': '0.4496'} | |
| {'loss': '3.232', 'grad_norm': '0.4553', 'learning_rate': '0.0003571', 'epoch': '0.4502'} | |
| {'loss': '3.226', 'grad_norm': '0.4563', 'learning_rate': '0.0003565', 'epoch': '0.4509'} | |
| {'loss': '3.227', 'grad_norm': '0.4658', 'learning_rate': '0.0003559', 'epoch': '0.4515'} | |
| {'loss': '3.231', 'grad_norm': '0.5049', 'learning_rate': '0.0003553', 'epoch': '0.4522'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.83it/s] | |
| {'loss': '3.223', 'grad_norm': '0.4598', 'learning_rate': '0.0003547', 'epoch': '0.4529'} | |
| {'loss': '3.223', 'grad_norm': '0.4721', 'learning_rate': '0.000354', 'epoch': '0.4535'} | |
| {'loss': '3.225', 'grad_norm': '0.4694', 'learning_rate': '0.0003534', 'epoch': '0.4542'} | |
| {'loss': '3.229', 'grad_norm': '0.4915', 'learning_rate': '0.0003528', 'epoch': '0.4548'} | |
| {'loss': '3.224', 'grad_norm': '0.4837', 'learning_rate': '0.0003522', 'epoch': '0.4555'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.94it/s] | |
| {'loss': '3.224', 'grad_norm': '0.4628', 'learning_rate': '0.0003516', 'epoch': '0.4561'} | |
| {'loss': '3.221', 'grad_norm': '0.4819', 'learning_rate': '0.0003509', 'epoch': '0.4568'} | |
| {'loss': '3.228', 'grad_norm': '0.4556', 'learning_rate': '0.0003503', 'epoch': '0.4574'} | |
| {'loss': '3.221', 'grad_norm': '0.5048', 'learning_rate': '0.0003497', 'epoch': '0.4581'} | |
| {'loss': '3.222', 'grad_norm': '0.4743', 'learning_rate': '0.0003491', 'epoch': '0.4588'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.10it/s] | |
| {'loss': '3.22', 'grad_norm': '0.4812', 'learning_rate': '0.0003484', 'epoch': '0.4594'} | |
| {'loss': '3.22', 'grad_norm': '0.4922', 'learning_rate': '0.0003478', 'epoch': '0.4601'} | |
| {'loss': '3.22', 'grad_norm': '0.4928', 'learning_rate': '0.0003472', 'epoch': '0.4607'} | |
| {'loss': '3.226', 'grad_norm': '0.5083', 'learning_rate': '0.0003466', 'epoch': '0.4614'} | |
| {'loss': '3.223', 'grad_norm': '0.4728', 'learning_rate': '0.000346', 'epoch': '0.462'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.18it/s] | |
| {'loss': '3.224', 'grad_norm': '0.4942', 'learning_rate': '0.0003453', 'epoch': '0.4627'} | |
| {'loss': '3.222', 'grad_norm': '0.4995', 'learning_rate': '0.0003447', 'epoch': '0.4633'} | |
| {'loss': '3.222', 'grad_norm': '0.4649', 'learning_rate': '0.0003441', 'epoch': '0.464'} | |
| {'loss': '3.223', 'grad_norm': '0.4704', 'learning_rate': '0.0003435', 'epoch': '0.4646'} | |
| {'loss': '3.227', 'grad_norm': '0.47', 'learning_rate': '0.0003428', 'epoch': '0.4653'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.24it/s] | |
| {'loss': '3.223', 'grad_norm': '0.4915', 'learning_rate': '0.0003422', 'epoch': '0.466'} | |
| {'loss': '3.22', 'grad_norm': '0.4776', 'learning_rate': '0.0003416', 'epoch': '0.4666'} | |
| {'loss': '3.221', 'grad_norm': '0.4778', 'learning_rate': '0.000341', 'epoch': '0.4673'} | |
| {'loss': '3.22', 'grad_norm': '0.5041', 'learning_rate': '0.0003403', 'epoch': '0.4679'} | |
| {'loss': '3.219', 'grad_norm': '0.4567', 'learning_rate': '0.0003397', 'epoch': '0.4686'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.32it/s] | |
| {'loss': '3.213', 'grad_norm': '0.4878', 'learning_rate': '0.0003391', 'epoch': '0.4692'} | |
| {'loss': '3.22', 'grad_norm': '0.4518', 'learning_rate': '0.0003385', 'epoch': '0.4699'} | |
| {'loss': '3.219', 'grad_norm': '0.4839', 'learning_rate': '0.0003378', 'epoch': '0.4705'} | |
| {'loss': '3.218', 'grad_norm': '0.4706', 'learning_rate': '0.0003372', 'epoch': '0.4712'} | |
| {'loss': '3.217', 'grad_norm': '0.4595', 'learning_rate': '0.0003366', 'epoch': '0.4719'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.86it/s] | |
| {'loss': '3.217', 'grad_norm': '0.5183', 'learning_rate': '0.000336', 'epoch': '0.4725'} | |
| {'loss': '3.221', 'grad_norm': '0.4675', 'learning_rate': '0.0003353', 'epoch': '0.4732'} | |
| {'loss': '3.218', 'grad_norm': '0.467', 'learning_rate': '0.0003347', 'epoch': '0.4738'} | |
| {'loss': '3.214', 'grad_norm': '0.478', 'learning_rate': '0.0003341', 'epoch': '0.4745'} | |
| {'loss': '3.218', 'grad_norm': '0.4651', 'learning_rate': '0.0003335', 'epoch': '0.4751'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.69it/s] | |
| {'loss': '3.216', 'grad_norm': '0.4744', 'learning_rate': '0.0003328', 'epoch': '0.4758'} | |
| {'loss': '3.215', 'grad_norm': '0.4911', 'learning_rate': '0.0003322', 'epoch': '0.4764'} | |
| {'loss': '3.217', 'grad_norm': '0.4939', 'learning_rate': '0.0003316', 'epoch': '0.4771'} | |
| {'loss': '3.216', 'grad_norm': '0.4626', 'learning_rate': '0.000331', 'epoch': '0.4778'} | |
| {'loss': '3.222', 'grad_norm': '0.4866', 'learning_rate': '0.0003303', 'epoch': '0.4784'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.48it/s] | |
| {'loss': '3.219', 'grad_norm': '0.4775', 'learning_rate': '0.0003297', 'epoch': '0.4791'} | |
| {'loss': '3.218', 'grad_norm': '0.489', 'learning_rate': '0.0003291', 'epoch': '0.4797'} | |
| {'loss': '3.215', 'grad_norm': '0.4488', 'learning_rate': '0.0003285', 'epoch': '0.4804'} | |
| {'loss': '3.217', 'grad_norm': '0.4772', 'learning_rate': '0.0003278', 'epoch': '0.481'} | |
| {'loss': '3.218', 'grad_norm': '0.49', 'learning_rate': '0.0003272', 'epoch': '0.4817'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.48it/s] | |
| {'loss': '3.218', 'grad_norm': '0.4655', 'learning_rate': '0.0003266', 'epoch': '0.4823'} | |
| {'loss': '3.22', 'grad_norm': '0.469', 'learning_rate': '0.0003259', 'epoch': '0.483'} | |
| {'loss': '3.217', 'grad_norm': '0.4918', 'learning_rate': '0.0003253', 'epoch': '0.4837'} | |
| {'loss': '3.217', 'grad_norm': '0.4977', 'learning_rate': '0.0003247', 'epoch': '0.4843'} | |
| {'loss': '3.213', 'grad_norm': '0.4964', 'learning_rate': '0.0003241', 'epoch': '0.485'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.21it/s] | |
| {'loss': '3.218', 'grad_norm': '0.4903', 'learning_rate': '0.0003234', 'epoch': '0.4856'} | |
| {'loss': '3.215', 'grad_norm': '0.48', 'learning_rate': '0.0003228', 'epoch': '0.4863'} | |
| {'loss': '3.211', 'grad_norm': '0.4788', 'learning_rate': '0.0003222', 'epoch': '0.4869'} | |
| {'loss': '3.214', 'grad_norm': '0.4856', 'learning_rate': '0.0003215', 'epoch': '0.4876'} | |
| {'loss': '3.221', 'grad_norm': '0.4608', 'learning_rate': '0.0003209', 'epoch': '0.4882'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.48it/s] | |
| {'loss': '3.215', 'grad_norm': '0.4789', 'learning_rate': '0.0003203', 'epoch': '0.4889'} | |
| {'loss': '3.214', 'grad_norm': '0.4828', 'learning_rate': '0.0003197', 'epoch': '0.4896'} | |
| {'loss': '3.21', 'grad_norm': '0.4582', 'learning_rate': '0.000319', 'epoch': '0.4902'} | |
| {'loss': '3.21', 'grad_norm': '0.4786', 'learning_rate': '0.0003184', 'epoch': '0.4909'} | |
| {'loss': '3.214', 'grad_norm': '0.5587', 'learning_rate': '0.0003178', 'epoch': '0.4915'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.26it/s] | |
| {'loss': '3.214', 'grad_norm': '0.4929', 'learning_rate': '0.0003171', 'epoch': '0.4922'} | |
| {'loss': '3.21', 'grad_norm': '0.4704', 'learning_rate': '0.0003165', 'epoch': '0.4928'} | |
| {'loss': '3.21', 'grad_norm': '0.4733', 'learning_rate': '0.0003159', 'epoch': '0.4935'} | |
| {'loss': '3.208', 'grad_norm': '0.4682', 'learning_rate': '0.0003153', 'epoch': '0.4941'} | |
| {'loss': '3.207', 'grad_norm': '0.5116', 'learning_rate': '0.0003146', 'epoch': '0.4948'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.42it/s] | |
| {'loss': '3.211', 'grad_norm': '0.5282', 'learning_rate': '0.000314', 'epoch': '0.4955'} | |
| {'loss': '3.206', 'grad_norm': '0.4986', 'learning_rate': '0.0003134', 'epoch': '0.4961'} | |
| {'loss': '3.211', 'grad_norm': '0.4897', 'learning_rate': '0.0003127', 'epoch': '0.4968'} | |
| {'loss': '3.208', 'grad_norm': '0.4994', 'learning_rate': '0.0003121', 'epoch': '0.4974'} | |
| {'loss': '3.213', 'grad_norm': '0.501', 'learning_rate': '0.0003115', 'epoch': '0.4981'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.42it/s] | |
| {'loss': '3.211', 'grad_norm': '0.5263', 'learning_rate': '0.0003108', 'epoch': '0.4987'} | |
| {'loss': '3.207', 'grad_norm': '0.4611', 'learning_rate': '0.0003102', 'epoch': '0.4994'} | |
| {'loss': '3.211', 'grad_norm': '0.5102', 'learning_rate': '0.0003096', 'epoch': '0.5'} | |
| {'loss': '3.212', 'grad_norm': '0.4581', 'learning_rate': '0.000309', 'epoch': '0.5007'} | |
| {'loss': '3.208', 'grad_norm': '0.4724', 'learning_rate': '0.0003083', 'epoch': '0.5014'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.69it/s] | |
| {'loss': '3.21', 'grad_norm': '0.4635', 'learning_rate': '0.0003077', 'epoch': '0.502'} | |
| {'loss': '3.21', 'grad_norm': '0.4868', 'learning_rate': '0.0003071', 'epoch': '0.5027'} | |
| {'loss': '3.207', 'grad_norm': '0.4789', 'learning_rate': '0.0003064', 'epoch': '0.5033'} | |
| {'loss': '3.209', 'grad_norm': '0.4851', 'learning_rate': '0.0003058', 'epoch': '0.504'} | |
| {'loss': '3.211', 'grad_norm': '0.4733', 'learning_rate': '0.0003052', 'epoch': '0.5046'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.46it/s] | |
| {'loss': '3.2', 'grad_norm': '0.5023', 'learning_rate': '0.0003045', 'epoch': '0.5053'} | |
| {'loss': '3.206', 'grad_norm': '0.4644', 'learning_rate': '0.0003039', 'epoch': '0.5059'} | |
| {'loss': '3.205', 'grad_norm': '0.4844', 'learning_rate': '0.0003033', 'epoch': '0.5066'} | |
| {'loss': '3.205', 'grad_norm': '0.4935', 'learning_rate': '0.0003027', 'epoch': '0.5072'} | |
| {'loss': '3.206', 'grad_norm': '0.5377', 'learning_rate': '0.000302', 'epoch': '0.5079'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.63it/s] | |
| {'loss': '3.213', 'grad_norm': '0.4874', 'learning_rate': '0.0003014', 'epoch': '0.5086'} | |
| {'loss': '3.207', 'grad_norm': '0.4904', 'learning_rate': '0.0003008', 'epoch': '0.5092'} | |
| {'loss': '3.202', 'grad_norm': '0.4684', 'learning_rate': '0.0003001', 'epoch': '0.5099'} | |
| {'loss': '3.212', 'grad_norm': '0.5016', 'learning_rate': '0.0002995', 'epoch': '0.5105'} | |
| {'loss': '3.203', 'grad_norm': '0.5012', 'learning_rate': '0.0002989', 'epoch': '0.5112'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.16it/s] | |
| {'loss': '3.206', 'grad_norm': '0.5526', 'learning_rate': '0.0002982', 'epoch': '0.5118'} | |
| {'loss': '3.207', 'grad_norm': '0.4914', 'learning_rate': '0.0002976', 'epoch': '0.5125'} | |
| {'loss': '3.207', 'grad_norm': '0.4688', 'learning_rate': '0.000297', 'epoch': '0.5131'} | |
| {'loss': '3.2', 'grad_norm': '0.4405', 'learning_rate': '0.0002964', 'epoch': '0.5138'} | |
| {'loss': '3.203', 'grad_norm': '0.4888', 'learning_rate': '0.0002957', 'epoch': '0.5145'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.73it/s] | |
| {'loss': '3.206', 'grad_norm': '0.5016', 'learning_rate': '0.0002951', 'epoch': '0.5151'} | |
| {'loss': '3.205', 'grad_norm': '0.4752', 'learning_rate': '0.0002945', 'epoch': '0.5158'} | |
| {'loss': '3.208', 'grad_norm': '0.4911', 'learning_rate': '0.0002938', 'epoch': '0.5164'} | |
| {'loss': '3.198', 'grad_norm': '0.5026', 'learning_rate': '0.0002932', 'epoch': '0.5171'} | |
| {'loss': '3.199', 'grad_norm': '0.5133', 'learning_rate': '0.0002926', 'epoch': '0.5177'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.49it/s] | |
| {'loss': '3.196', 'grad_norm': '0.506', 'learning_rate': '0.0002919', 'epoch': '0.5184'} | |
| {'loss': '3.198', 'grad_norm': '0.4965', 'learning_rate': '0.0002913', 'epoch': '0.519'} | |
| {'loss': '3.203', 'grad_norm': '0.5265', 'learning_rate': '0.0002907', 'epoch': '0.5197'} | |
| {'loss': '3.204', 'grad_norm': '0.4985', 'learning_rate': '0.00029', 'epoch': '0.5204'} | |
| {'loss': '3.201', 'grad_norm': '0.5248', 'learning_rate': '0.0002894', 'epoch': '0.521'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.00it/s] | |
| {'loss': '3.197', 'grad_norm': '0.5066', 'learning_rate': '0.0002888', 'epoch': '0.5217'} | |
| {'loss': '3.192', 'grad_norm': '0.4923', 'learning_rate': '0.0002882', 'epoch': '0.5223'} | |
| {'loss': '3.206', 'grad_norm': '0.5089', 'learning_rate': '0.0002875', 'epoch': '0.523'} | |
| {'loss': '3.202', 'grad_norm': '0.5294', 'learning_rate': '0.0002869', 'epoch': '0.5236'} | |
| {'loss': '3.198', 'grad_norm': '0.5118', 'learning_rate': '0.0002863', 'epoch': '0.5243'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.23it/s] | |
| {'loss': '3.203', 'grad_norm': '0.4945', 'learning_rate': '0.0002856', 'epoch': '0.5249'} | |
| {'loss': '3.205', 'grad_norm': '0.4752', 'learning_rate': '0.000285', 'epoch': '0.5256'} | |
| {'loss': '3.2', 'grad_norm': '0.4892', 'learning_rate': '0.0002844', 'epoch': '0.5263'} | |
| {'loss': '3.201', 'grad_norm': '0.4925', 'learning_rate': '0.0002838', 'epoch': '0.5269'} | |
| {'loss': '3.196', 'grad_norm': '0.5137', 'learning_rate': '0.0002831', 'epoch': '0.5276'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.96it/s] | |
| {'loss': '3.197', 'grad_norm': '0.48', 'learning_rate': '0.0002825', 'epoch': '0.5282'} | |
| {'loss': '3.194', 'grad_norm': '0.5007', 'learning_rate': '0.0002819', 'epoch': '0.5289'} | |
| {'loss': '3.201', 'grad_norm': '0.4906', 'learning_rate': '0.0002812', 'epoch': '0.5295'} | |
| {'loss': '3.202', 'grad_norm': '0.4941', 'learning_rate': '0.0002806', 'epoch': '0.5302'} | |
| {'loss': '3.202', 'grad_norm': '0.5366', 'learning_rate': '0.00028', 'epoch': '0.5308'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.70it/s] | |
| {'loss': '3.202', 'grad_norm': '0.4959', 'learning_rate': '0.0002793', 'epoch': '0.5315'} | |
| {'loss': '3.196', 'grad_norm': '0.4736', 'learning_rate': '0.0002787', 'epoch': '0.5322'} | |
| {'loss': '3.202', 'grad_norm': '0.4696', 'learning_rate': '0.0002781', 'epoch': '0.5328'} | |
| {'loss': '3.197', 'grad_norm': '0.517', 'learning_rate': '0.0002775', 'epoch': '0.5335'} | |
| {'loss': '3.197', 'grad_norm': '0.5044', 'learning_rate': '0.0002768', 'epoch': '0.5341'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.56it/s] | |
| {'loss': '3.194', 'grad_norm': '0.4654', 'learning_rate': '0.0002762', 'epoch': '0.5348'} | |
| {'loss': '3.196', 'grad_norm': '0.5031', 'learning_rate': '0.0002756', 'epoch': '0.5354'} | |
| {'loss': '3.197', 'grad_norm': '0.4724', 'learning_rate': '0.000275', 'epoch': '0.5361'} | |
| {'loss': '3.194', 'grad_norm': '0.5004', 'learning_rate': '0.0002743', 'epoch': '0.5367'} | |
| {'loss': '3.193', 'grad_norm': '0.5276', 'learning_rate': '0.0002737', 'epoch': '0.5374'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.06it/s] | |
| {'loss': '3.2', 'grad_norm': '0.4736', 'learning_rate': '0.0002731', 'epoch': '0.5381'} | |
| {'loss': '3.195', 'grad_norm': '0.4756', 'learning_rate': '0.0002724', 'epoch': '0.5387'} | |
| {'loss': '3.196', 'grad_norm': '0.4828', 'learning_rate': '0.0002718', 'epoch': '0.5394'} | |
| {'loss': '3.196', 'grad_norm': '0.5102', 'learning_rate': '0.0002712', 'epoch': '0.54'} | |
| {'loss': '3.195', 'grad_norm': '0.5154', 'learning_rate': '0.0002706', 'epoch': '0.5407'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.25it/s] | |
| {'loss': '3.194', 'grad_norm': '0.5212', 'learning_rate': '0.0002699', 'epoch': '0.5413'} | |
| {'loss': '3.194', 'grad_norm': '0.5162', 'learning_rate': '0.0002693', 'epoch': '0.542'} | |
| {'loss': '3.189', 'grad_norm': '0.4899', 'learning_rate': '0.0002687', 'epoch': '0.5426'} | |
| {'loss': '3.199', 'grad_norm': '0.5448', 'learning_rate': '0.000268', 'epoch': '0.5433'} | |
| {'loss': '3.194', 'grad_norm': '0.4566', 'learning_rate': '0.0002674', 'epoch': '0.5439'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.27it/s] | |
| {'loss': '3.191', 'grad_norm': '0.4878', 'learning_rate': '0.0002668', 'epoch': '0.5446'} | |
| {'loss': '3.191', 'grad_norm': '0.4991', 'learning_rate': '0.0002662', 'epoch': '0.5453'} | |
| {'loss': '3.196', 'grad_norm': '0.4875', 'learning_rate': '0.0002655', 'epoch': '0.5459'} | |
| {'loss': '3.188', 'grad_norm': '0.4875', 'learning_rate': '0.0002649', 'epoch': '0.5466'} | |
| {'loss': '3.194', 'grad_norm': '0.4746', 'learning_rate': '0.0002643', 'epoch': '0.5472'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.18it/s] | |
| {'loss': '3.19', 'grad_norm': '0.5066', 'learning_rate': '0.0002637', 'epoch': '0.5479'} | |
| {'loss': '3.193', 'grad_norm': '0.5393', 'learning_rate': '0.000263', 'epoch': '0.5485'} | |
| {'loss': '3.193', 'grad_norm': '0.4919', 'learning_rate': '0.0002624', 'epoch': '0.5492'} | |
| {'loss': '3.192', 'grad_norm': '0.493', 'learning_rate': '0.0002618', 'epoch': '0.5498'} | |
| {'loss': '3.194', 'grad_norm': '0.5546', 'learning_rate': '0.0002612', 'epoch': '0.5505'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.90it/s] | |
| {'loss': '3.191', 'grad_norm': '0.4898', 'learning_rate': '0.0002605', 'epoch': '0.5512'} | |
| {'loss': '3.193', 'grad_norm': '0.4823', 'learning_rate': '0.0002599', 'epoch': '0.5518'} | |
| {'loss': '3.2', 'grad_norm': '0.5015', 'learning_rate': '0.0002593', 'epoch': '0.5525'} | |
| {'loss': '3.184', 'grad_norm': '0.49', 'learning_rate': '0.0002587', 'epoch': '0.5531'} | |
| {'loss': '3.187', 'grad_norm': '0.4773', 'learning_rate': '0.000258', 'epoch': '0.5538'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.87it/s] | |
| {'loss': '3.194', 'grad_norm': '0.4765', 'learning_rate': '0.0002574', 'epoch': '0.5544'} | |
| {'loss': '3.191', 'grad_norm': '0.5152', 'learning_rate': '0.0002568', 'epoch': '0.5551'} | |
| {'loss': '3.192', 'grad_norm': '0.476', 'learning_rate': '0.0002562', 'epoch': '0.5557'} | |
| {'loss': '3.189', 'grad_norm': '0.5036', 'learning_rate': '0.0002555', 'epoch': '0.5564'} | |
| {'loss': '3.187', 'grad_norm': '0.5126', 'learning_rate': '0.0002549', 'epoch': '0.5571'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.69it/s] | |
| {'loss': '3.188', 'grad_norm': '0.476', 'learning_rate': '0.0002543', 'epoch': '0.5577'} | |
| {'loss': '3.194', 'grad_norm': '0.5298', 'learning_rate': '0.0002537', 'epoch': '0.5584'} | |
| {'loss': '3.192', 'grad_norm': '0.5278', 'learning_rate': '0.0002531', 'epoch': '0.559'} | |
| {'loss': '3.195', 'grad_norm': '0.502', 'learning_rate': '0.0002524', 'epoch': '0.5597'} | |
| {'loss': '3.191', 'grad_norm': '0.5166', 'learning_rate': '0.0002518', 'epoch': '0.5603'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.31it/s] | |
| {'loss': '3.191', 'grad_norm': '0.5105', 'learning_rate': '0.0002512', 'epoch': '0.561'} | |
| {'loss': '3.19', 'grad_norm': '0.4926', 'learning_rate': '0.0002506', 'epoch': '0.5616'} | |
| {'loss': '3.19', 'grad_norm': '0.5134', 'learning_rate': '0.0002499', 'epoch': '0.5623'} | |
| {'loss': '3.186', 'grad_norm': '0.4979', 'learning_rate': '0.0002493', 'epoch': '0.563'} | |
| {'loss': '3.191', 'grad_norm': '0.4789', 'learning_rate': '0.0002487', 'epoch': '0.5636'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.44it/s] | |
| {'loss': '3.189', 'grad_norm': '0.4892', 'learning_rate': '0.0002481', 'epoch': '0.5643'} | |
| {'loss': '3.184', 'grad_norm': '0.5408', 'learning_rate': '0.0002475', 'epoch': '0.5649'} | |
| {'loss': '3.191', 'grad_norm': '0.5111', 'learning_rate': '0.0002468', 'epoch': '0.5656'} | |
| {'loss': '3.185', 'grad_norm': '0.5074', 'learning_rate': '0.0002462', 'epoch': '0.5662'} | |
| {'loss': '3.186', 'grad_norm': '0.508', 'learning_rate': '0.0002456', 'epoch': '0.5669'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.04it/s] | |
| {'loss': '3.182', 'grad_norm': '0.4808', 'learning_rate': '0.000245', 'epoch': '0.5675'} | |
| {'loss': '3.186', 'grad_norm': '0.5043', 'learning_rate': '0.0002444', 'epoch': '0.5682'} | |
| {'loss': '3.186', 'grad_norm': '0.5155', 'learning_rate': '0.0002437', 'epoch': '0.5689'} | |
| {'loss': '3.18', 'grad_norm': '0.5322', 'learning_rate': '0.0002431', 'epoch': '0.5695'} | |
| {'loss': '3.181', 'grad_norm': '0.5248', 'learning_rate': '0.0002425', 'epoch': '0.5702'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.53it/s] | |
| {'loss': '3.183', 'grad_norm': '0.4909', 'learning_rate': '0.0002419', 'epoch': '0.5708'} | |
| {'loss': '3.182', 'grad_norm': '0.49', 'learning_rate': '0.0002413', 'epoch': '0.5715'} | |
| {'loss': '3.182', 'grad_norm': '0.5144', 'learning_rate': '0.0002407', 'epoch': '0.5721'} | |
| {'loss': '3.19', 'grad_norm': '0.4774', 'learning_rate': '0.00024', 'epoch': '0.5728'} | |
| {'loss': '3.182', 'grad_norm': '0.5148', 'learning_rate': '0.0002394', 'epoch': '0.5734'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.58it/s] | |
| {'loss': '3.186', 'grad_norm': '0.5123', 'learning_rate': '0.0002388', 'epoch': '0.5741'} | |
| {'loss': '3.183', 'grad_norm': '0.4903', 'learning_rate': '0.0002382', 'epoch': '0.5748'} | |
| {'loss': '3.184', 'grad_norm': '0.5072', 'learning_rate': '0.0002376', 'epoch': '0.5754'} | |
| {'loss': '3.187', 'grad_norm': '0.475', 'learning_rate': '0.0002369', 'epoch': '0.5761'} | |
| {'loss': '3.183', 'grad_norm': '0.5192', 'learning_rate': '0.0002363', 'epoch': '0.5767'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.63it/s] | |
| {'loss': '3.183', 'grad_norm': '0.5417', 'learning_rate': '0.0002357', 'epoch': '0.5774'} | |
| {'loss': '3.187', 'grad_norm': '0.5046', 'learning_rate': '0.0002351', 'epoch': '0.578'} | |
| {'loss': '3.182', 'grad_norm': '0.512', 'learning_rate': '0.0002345', 'epoch': '0.5787'} | |
| {'loss': '3.182', 'grad_norm': '0.5204', 'learning_rate': '0.0002339', 'epoch': '0.5793'} | |
| {'loss': '3.18', 'grad_norm': '0.5054', 'learning_rate': '0.0002333', 'epoch': '0.58'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.64it/s] | |
| {'loss': '3.185', 'grad_norm': '0.4974', 'learning_rate': '0.0002326', 'epoch': '0.5806'} | |
| {'loss': '3.182', 'grad_norm': '0.5186', 'learning_rate': '0.000232', 'epoch': '0.5813'} | |
| {'loss': '3.176', 'grad_norm': '0.4747', 'learning_rate': '0.0002314', 'epoch': '0.582'} | |
| {'loss': '3.179', 'grad_norm': '0.517', 'learning_rate': '0.0002308', 'epoch': '0.5826'} | |
| {'loss': '3.178', 'grad_norm': '0.5182', 'learning_rate': '0.0002302', 'epoch': '0.5833'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.11it/s] | |
| {'loss': '3.173', 'grad_norm': '0.5144', 'learning_rate': '0.0002296', 'epoch': '0.5839'} | |
| {'loss': '3.178', 'grad_norm': '0.5407', 'learning_rate': '0.000229', 'epoch': '0.5846'} | |
| {'loss': '3.179', 'grad_norm': '0.504', 'learning_rate': '0.0002284', 'epoch': '0.5852'} | |
| {'loss': '3.181', 'grad_norm': '0.5402', 'learning_rate': '0.0002277', 'epoch': '0.5859'} | |
| {'loss': '3.182', 'grad_norm': '0.5216', 'learning_rate': '0.0002271', 'epoch': '0.5865'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.21it/s] | |
| {'loss': '3.174', 'grad_norm': '0.4905', 'learning_rate': '0.0002265', 'epoch': '0.5872'} | |
| {'loss': '3.178', 'grad_norm': '0.5149', 'learning_rate': '0.0002259', 'epoch': '0.5879'} | |
| {'loss': '3.179', 'grad_norm': '0.5395', 'learning_rate': '0.0002253', 'epoch': '0.5885'} | |
| {'loss': '3.179', 'grad_norm': '0.5102', 'learning_rate': '0.0002247', 'epoch': '0.5892'} | |
| {'loss': '3.179', 'grad_norm': '0.5097', 'learning_rate': '0.0002241', 'epoch': '0.5898'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.08it/s] | |
| {'loss': '3.18', 'grad_norm': '0.5185', 'learning_rate': '0.0002235', 'epoch': '0.5905'} | |
| {'loss': '3.173', 'grad_norm': '0.5301', 'learning_rate': '0.0002229', 'epoch': '0.5911'} | |
| {'loss': '3.18', 'grad_norm': '0.5129', 'learning_rate': '0.0002222', 'epoch': '0.5918'} | |
| {'loss': '3.173', 'grad_norm': '0.4803', 'learning_rate': '0.0002216', 'epoch': '0.5924'} | |
| {'loss': '3.173', 'grad_norm': '0.5046', 'learning_rate': '0.000221', 'epoch': '0.5931'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.36it/s] | |
| {'loss': '3.174', 'grad_norm': '0.525', 'learning_rate': '0.0002204', 'epoch': '0.5938'} | |
| {'loss': '3.178', 'grad_norm': '0.539', 'learning_rate': '0.0002198', 'epoch': '0.5944'} | |
| {'loss': '3.167', 'grad_norm': '0.5354', 'learning_rate': '0.0002192', 'epoch': '0.5951'} | |
| {'loss': '3.181', 'grad_norm': '0.5387', 'learning_rate': '0.0002186', 'epoch': '0.5957'} | |
| {'loss': '3.176', 'grad_norm': '0.5296', 'learning_rate': '0.000218', 'epoch': '0.5964'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.48it/s] | |
| {'loss': '3.178', 'grad_norm': '0.5171', 'learning_rate': '0.0002174', 'epoch': '0.597'} | |
| {'loss': '3.176', 'grad_norm': '0.5112', 'learning_rate': '0.0002168', 'epoch': '0.5977'} | |
| {'loss': '3.179', 'grad_norm': '0.5747', 'learning_rate': '0.0002162', 'epoch': '0.5983'} | |
| {'loss': '3.17', 'grad_norm': '0.5282', 'learning_rate': '0.0002156', 'epoch': '0.599'} | |
| {'loss': '3.176', 'grad_norm': '0.5876', 'learning_rate': '0.000215', 'epoch': '0.5997'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.66it/s] | |
| {'loss': '3.171', 'grad_norm': '0.5339', 'learning_rate': '0.0002144', 'epoch': '0.6003'} | |
| {'loss': '3.174', 'grad_norm': '0.5226', 'learning_rate': '0.0002138', 'epoch': '0.601'} | |
| {'loss': '3.174', 'grad_norm': '0.5338', 'learning_rate': '0.0002132', 'epoch': '0.6016'} | |
| {'loss': '3.173', 'grad_norm': '0.5094', 'learning_rate': '0.0002126', 'epoch': '0.6023'} | |
| {'loss': '3.176', 'grad_norm': '0.5306', 'learning_rate': '0.000212', 'epoch': '0.6029'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.95it/s] | |
| {'loss': '3.175', 'grad_norm': '0.5294', 'learning_rate': '0.0002113', 'epoch': '0.6036'} | |
| {'loss': '3.173', 'grad_norm': '0.5798', 'learning_rate': '0.0002107', 'epoch': '0.6042'} | |
| {'loss': '3.175', 'grad_norm': '0.5204', 'learning_rate': '0.0002101', 'epoch': '0.6049'} | |
| {'loss': '3.172', 'grad_norm': '1.071', 'learning_rate': '0.0002095', 'epoch': '0.6056'} | |
| {'loss': '3.173', 'grad_norm': '0.5092', 'learning_rate': '0.0002089', 'epoch': '0.6062'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 6.48it/s] | |
| {'loss': '3.168', 'grad_norm': '0.5041', 'learning_rate': '0.0002083', 'epoch': '0.6069'} | |
| {'loss': '3.169', 'grad_norm': '0.504', 'learning_rate': '0.0002077', 'epoch': '0.6075'} | |
| {'loss': '3.17', 'grad_norm': '0.4974', 'learning_rate': '0.0002071', 'epoch': '0.6082'} | |
| {'loss': '3.177', 'grad_norm': '0.5456', 'learning_rate': '0.0002065', 'epoch': '0.6088'} | |
| {'loss': '3.17', 'grad_norm': '0.5426', 'learning_rate': '0.0002059', 'epoch': '0.6095'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.67it/s] | |
| {'loss': '3.164', 'grad_norm': '0.5177', 'learning_rate': '0.0002053', 'epoch': '0.6101'} | |
| {'loss': '3.173', 'grad_norm': '0.5066', 'learning_rate': '0.0002047', 'epoch': '0.6108'} | |
| {'loss': '3.163', 'grad_norm': '0.5179', 'learning_rate': '0.0002042', 'epoch': '0.6115'} | |
| {'loss': '3.173', 'grad_norm': '0.5473', 'learning_rate': '0.0002036', 'epoch': '0.6121'} | |
| {'loss': '3.168', 'grad_norm': '0.5406', 'learning_rate': '0.000203', 'epoch': '0.6128'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.45it/s] | |
| {'loss': '3.174', 'grad_norm': '0.5623', 'learning_rate': '0.0002024', 'epoch': '0.6134'} | |
| {'loss': '3.176', 'grad_norm': '0.5514', 'learning_rate': '0.0002018', 'epoch': '0.6141'} | |
| {'loss': '3.17', 'grad_norm': '0.508', 'learning_rate': '0.0002012', 'epoch': '0.6147'} | |
| {'loss': '3.169', 'grad_norm': '0.5315', 'learning_rate': '0.0002006', 'epoch': '0.6154'} | |
| {'loss': '3.17', 'grad_norm': '0.5205', 'learning_rate': '0.0002', 'epoch': '0.616'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.73it/s] | |
| {'loss': '3.167', 'grad_norm': '0.5443', 'learning_rate': '0.0001994', 'epoch': '0.6167'} | |
| {'loss': '3.17', 'grad_norm': '0.5174', 'learning_rate': '0.0001988', 'epoch': '0.6173'} | |
| {'loss': '3.17', 'grad_norm': '0.5217', 'learning_rate': '0.0001982', 'epoch': '0.618'} | |
| {'loss': '3.168', 'grad_norm': '0.5222', 'learning_rate': '0.0001976', 'epoch': '0.6187'} | |
| {'loss': '3.17', 'grad_norm': '0.5357', 'learning_rate': '0.000197', 'epoch': '0.6193'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.18it/s] | |
| {'loss': '3.159', 'grad_norm': '0.5494', 'learning_rate': '0.0001964', 'epoch': '0.62'} | |
| {'loss': '3.166', 'grad_norm': '0.5646', 'learning_rate': '0.0001958', 'epoch': '0.6206'} | |
| {'loss': '3.173', 'grad_norm': '0.5311', 'learning_rate': '0.0001952', 'epoch': '0.6213'} | |
| {'loss': '3.163', 'grad_norm': '0.5358', 'learning_rate': '0.0001947', 'epoch': '0.6219'} | |
| {'loss': '3.168', 'grad_norm': '0.5401', 'learning_rate': '0.0001941', 'epoch': '0.6226'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.14it/s] | |
| {'loss': '3.169', 'grad_norm': '0.5432', 'learning_rate': '0.0001935', 'epoch': '0.6232'} | |
| {'loss': '3.17', 'grad_norm': '0.5396', 'learning_rate': '0.0001929', 'epoch': '0.6239'} | |
| {'loss': '3.162', 'grad_norm': '0.5662', 'learning_rate': '0.0001923', 'epoch': '0.6246'} | |
| {'loss': '3.163', 'grad_norm': '0.5535', 'learning_rate': '0.0001917', 'epoch': '0.6252'} | |
| {'loss': '3.166', 'grad_norm': '0.5391', 'learning_rate': '0.0001911', 'epoch': '0.6259'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.23it/s] | |
| {'loss': '3.166', 'grad_norm': '0.5118', 'learning_rate': '0.0001905', 'epoch': '0.6265'} | |
| {'loss': '3.163', 'grad_norm': '0.5524', 'learning_rate': '0.0001899', 'epoch': '0.6272'} | |
| {'loss': '3.163', 'grad_norm': '0.5239', 'learning_rate': '0.0001894', 'epoch': '0.6278'} | |
| {'loss': '3.166', 'grad_norm': '0.5643', 'learning_rate': '0.0001888', 'epoch': '0.6285'} | |
| {'loss': '3.168', 'grad_norm': '0.5382', 'learning_rate': '0.0001882', 'epoch': '0.6291'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 5.24it/s] | |
| {'loss': '3.162', 'grad_norm': '0.5235', 'learning_rate': '0.0001876', 'epoch': '0.6298'} | |
| {'loss': '3.163', 'grad_norm': '0.5406', 'learning_rate': '0.000187', 'epoch': '0.6305'} | |
| {'loss': '3.163', 'grad_norm': '0.5208', 'learning_rate': '0.0001864', 'epoch': '0.6311'} | |
| {'loss': '3.163', 'grad_norm': '0.5627', 'learning_rate': '0.0001859', 'epoch': '0.6318'} | |
| {'loss': '3.167', 'grad_norm': '0.5496', 'learning_rate': '0.0001853', 'epoch': '0.6324'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.25it/s] | |
| {'loss': '3.159', 'grad_norm': '0.5695', 'learning_rate': '0.0001847', 'epoch': '0.6331'} | |
| {'loss': '3.165', 'grad_norm': '0.543', 'learning_rate': '0.0001841', 'epoch': '0.6337'} | |
| {'loss': '3.168', 'grad_norm': '0.5322', 'learning_rate': '0.0001835', 'epoch': '0.6344'} | |
| {'loss': '3.164', 'grad_norm': '0.5374', 'learning_rate': '0.0001829', 'epoch': '0.635'} | |
| {'loss': '3.162', 'grad_norm': '0.5852', 'learning_rate': '0.0001824', 'epoch': '0.6357'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 4.02it/s] | |
| {'loss': '3.166', 'grad_norm': '0.5389', 'learning_rate': '0.0001818', 'epoch': '0.6364'} | |
| {'loss': '3.163', 'grad_norm': '0.5538', 'learning_rate': '0.0001812', 'epoch': '0.637'} | |
| {'loss': '3.16', 'grad_norm': '0.5125', 'learning_rate': '0.0001806', 'epoch': '0.6377'} | |
| {'loss': '3.164', 'grad_norm': '0.5276', 'learning_rate': '0.0001801', 'epoch': '0.6383'} | |
| {'loss': '3.161', 'grad_norm': '0.5108', 'learning_rate': '0.0001795', 'epoch': '0.639'} | |
| Writing model shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 3.64it/s] | |
| {'loss': '3.159', 'grad_norm': '0.56', 'learning_rate': '0.0001789', 'epoch': '0.6396'} | |
| {'loss': '3.16', 'grad_norm': '0.521', 'learning_rate': '0.0001783', 'epoch': '0.6403'} | |
| {'loss': '3.165', 'grad_norm': '0.5607', 'learning_rate': '0.0001777', 'epoch': '0.6409'} | |
| {'loss': '3.16', 'grad_norm': '0.5172', 'learning_rate': '0.0001772', 'epoch': '0.6416'} | |
| {'loss': '3.159', 'grad_norm': '0.549', 'learning_rate': '0.0001766', 'epoch': '0.6423'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.81it/s] | |
| {'loss': '3.166', 'grad_norm': '0.5457', 'learning_rate': '0.000176', 'epoch': '0.6429'} | |
| {'loss': '3.165', 'grad_norm': '0.5586', 'learning_rate': '0.0001754', 'epoch': '0.6436'} | |
| {'loss': '3.155', 'grad_norm': '0.5298', 'learning_rate': '0.0001749', 'epoch': '0.6442'} | |
| {'loss': '3.157', 'grad_norm': '0.5386', 'learning_rate': '0.0001743', 'epoch': '0.6449'} | |
| {'loss': '3.16', 'grad_norm': '0.5544', 'learning_rate': '0.0001737', 'epoch': '0.6455'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.89it/s] | |
| {'loss': '3.156', 'grad_norm': '0.5515', 'learning_rate': '0.0001732', 'epoch': '0.6462'} | |
| {'loss': '3.161', 'grad_norm': '0.5372', 'learning_rate': '0.0001726', 'epoch': '0.6468'} | |
| {'loss': '3.152', 'grad_norm': '0.558', 'learning_rate': '0.000172', 'epoch': '0.6475'} | |
| {'loss': '3.151', 'grad_norm': '0.5763', 'learning_rate': '0.0001714', 'epoch': '0.6482'} | |
| {'loss': '3.155', 'grad_norm': '0.5592', 'learning_rate': '0.0001709', 'epoch': '0.6488'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.64it/s] | |
| {'loss': '3.157', 'grad_norm': '0.5608', 'learning_rate': '0.0001703', 'epoch': '0.6495'} | |
| {'loss': '3.153', 'grad_norm': '0.5551', 'learning_rate': '0.0001697', 'epoch': '0.6501'} | |
| {'loss': '3.156', 'grad_norm': '0.5657', 'learning_rate': '0.0001692', 'epoch': '0.6508'} | |
| {'loss': '3.156', 'grad_norm': '0.5577', 'learning_rate': '0.0001686', 'epoch': '0.6514'} | |
| {'loss': '3.15', 'grad_norm': '0.602', 'learning_rate': '0.000168', 'epoch': '0.6521'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.68it/s] | |
| {'loss': '3.157', 'grad_norm': '0.5451', 'learning_rate': '0.0001675', 'epoch': '0.6527'} | |
| {'loss': '3.153', 'grad_norm': '0.6076', 'learning_rate': '0.0001669', 'epoch': '0.6534'} | |
| {'loss': '3.156', 'grad_norm': '0.5373', 'learning_rate': '0.0001663', 'epoch': '0.654'} | |
| {'loss': '3.153', 'grad_norm': '0.5865', 'learning_rate': '0.0001658', 'epoch': '0.6547'} | |
| {'loss': '3.151', 'grad_norm': '0.5465', 'learning_rate': '0.0001652', 'epoch': '0.6554'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.42it/s] | |
| {'loss': '3.153', 'grad_norm': '0.5427', 'learning_rate': '0.0001647', 'epoch': '0.656'} | |
| {'loss': '3.156', 'grad_norm': '0.6229', 'learning_rate': '0.0001641', 'epoch': '0.6567'} | |
| {'loss': '3.155', 'grad_norm': '0.6797', 'learning_rate': '0.0001635', 'epoch': '0.6573'} | |
| {'loss': '3.151', 'grad_norm': '0.5807', 'learning_rate': '0.000163', 'epoch': '0.658'} | |
| {'loss': '3.151', 'grad_norm': '0.5284', 'learning_rate': '0.0001624', 'epoch': '0.6586'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.82it/s] | |
| {'loss': '3.153', 'grad_norm': '0.5532', 'learning_rate': '0.0001618', 'epoch': '0.6593'} | |
| {'loss': '3.15', 'grad_norm': '0.6013', 'learning_rate': '0.0001613', 'epoch': '0.6599'} | |
| {'loss': '3.154', 'grad_norm': '0.5649', 'learning_rate': '0.0001607', 'epoch': '0.6606'} | |
| {'loss': '3.147', 'grad_norm': '0.5472', 'learning_rate': '0.0001602', 'epoch': '0.6613'} | |
| {'loss': '3.154', 'grad_norm': '0.5299', 'learning_rate': '0.0001596', 'epoch': '0.6619'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.78it/s] | |
| {'loss': '3.155', 'grad_norm': '0.5921', 'learning_rate': '0.0001591', 'epoch': '0.6626'} | |
| {'loss': '3.15', 'grad_norm': '0.6007', 'learning_rate': '0.0001585', 'epoch': '0.6632'} | |
| {'loss': '3.157', 'grad_norm': '0.5583', 'learning_rate': '0.0001579', 'epoch': '0.6639'} | |
| {'loss': '3.151', 'grad_norm': '0.5939', 'learning_rate': '0.0001574', 'epoch': '0.6645'} | |
| {'loss': '3.149', 'grad_norm': '1.452', 'learning_rate': '0.0001568', 'epoch': '0.6652'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.61it/s] | |
| {'loss': '3.155', 'grad_norm': '0.5665', 'learning_rate': '0.0001563', 'epoch': '0.6658'} | |
| {'loss': '3.154', 'grad_norm': '0.5714', 'learning_rate': '0.0001557', 'epoch': '0.6665'} | |
| {'loss': '3.147', 'grad_norm': '0.5902', 'learning_rate': '0.0001552', 'epoch': '0.6672'} | |
| {'loss': '3.149', 'grad_norm': '0.6296', 'learning_rate': '0.0001546', 'epoch': '0.6678'} | |
| {'loss': '3.15', 'grad_norm': '0.5713', 'learning_rate': '0.0001541', 'epoch': '0.6685'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.70it/s] | |
| {'loss': '3.15', 'grad_norm': '0.5553', 'learning_rate': '0.0001535', 'epoch': '0.6691'} | |
| {'loss': '3.157', 'grad_norm': '0.5912', 'learning_rate': '0.000153', 'epoch': '0.6698'} | |
| {'loss': '3.149', 'grad_norm': '0.5765', 'learning_rate': '0.0001524', 'epoch': '0.6704'} | |
| {'loss': '3.152', 'grad_norm': '0.5983', 'learning_rate': '0.0001519', 'epoch': '0.6711'} | |
| {'loss': '3.148', 'grad_norm': '0.5928', 'learning_rate': '0.0001513', 'epoch': '0.6717'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.53it/s] | |
| {'loss': '3.144', 'grad_norm': '0.5669', 'learning_rate': '0.0001508', 'epoch': '0.6724'} | |
| {'loss': '3.148', 'grad_norm': '0.5714', 'learning_rate': '0.0001502', 'epoch': '0.6731'} | |
| {'loss': '3.152', 'grad_norm': '0.5747', 'learning_rate': '0.0001497', 'epoch': '0.6737'} | |
| {'loss': '3.148', 'grad_norm': '0.5512', 'learning_rate': '0.0001491', 'epoch': '0.6744'} | |
| {'loss': '3.149', 'grad_norm': '0.5569', 'learning_rate': '0.0001486', 'epoch': '0.675'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.23it/s] | |
| {'loss': '3.143', 'grad_norm': '0.5966', 'learning_rate': '0.0001481', 'epoch': '0.6757'} | |
| {'loss': '3.144', 'grad_norm': '0.5712', 'learning_rate': '0.0001475', 'epoch': '0.6763'} | |
| {'loss': '3.146', 'grad_norm': '0.5782', 'learning_rate': '0.000147', 'epoch': '0.677'} | |
| {'loss': '3.146', 'grad_norm': '0.5699', 'learning_rate': '0.0001464', 'epoch': '0.6776'} | |
| {'loss': '3.147', 'grad_norm': '0.5496', 'learning_rate': '0.0001459', 'epoch': '0.6783'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.75it/s] | |
| {'loss': '3.145', 'grad_norm': '0.5719', 'learning_rate': '0.0001454', 'epoch': '0.679'} | |
| {'loss': '3.141', 'grad_norm': '0.607', 'learning_rate': '0.0001448', 'epoch': '0.6796'} | |
| {'loss': '3.146', 'grad_norm': '0.5804', 'learning_rate': '0.0001443', 'epoch': '0.6803'} | |
| {'loss': '3.143', 'grad_norm': '0.5717', 'learning_rate': '0.0001437', 'epoch': '0.6809'} | |
| {'loss': '3.15', 'grad_norm': '0.5833', 'learning_rate': '0.0001432', 'epoch': '0.6816'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.20it/s] | |
| {'loss': '3.147', 'grad_norm': '0.5545', 'learning_rate': '0.0001427', 'epoch': '0.6822'} | |
| {'loss': '3.147', 'grad_norm': '0.5901', 'learning_rate': '0.0001421', 'epoch': '0.6829'} | |
| {'loss': '3.141', 'grad_norm': '0.5919', 'learning_rate': '0.0001416', 'epoch': '0.6835'} | |
| {'loss': '3.144', 'grad_norm': '0.5657', 'learning_rate': '0.0001411', 'epoch': '0.6842'} | |
| {'loss': '3.141', 'grad_norm': '0.5798', 'learning_rate': '0.0001405', 'epoch': '0.6849'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.79it/s] | |
| {'loss': '3.143', 'grad_norm': '0.571', 'learning_rate': '0.00014', 'epoch': '0.6855'} | |
| {'loss': '3.141', 'grad_norm': '0.5803', 'learning_rate': '0.0001395', 'epoch': '0.6862'} | |
| {'loss': '3.147', 'grad_norm': '0.5813', 'learning_rate': '0.0001389', 'epoch': '0.6868'} | |
| {'loss': '3.142', 'grad_norm': '0.5874', 'learning_rate': '0.0001384', 'epoch': '0.6875'} | |
| {'loss': '3.142', 'grad_norm': '0.6025', 'learning_rate': '0.0001379', 'epoch': '0.6881'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.43it/s] | |
| {'loss': '3.143', 'grad_norm': '0.6342', 'learning_rate': '0.0001373', 'epoch': '0.6888'} | |
| {'loss': '3.14', 'grad_norm': '0.5826', 'learning_rate': '0.0001368', 'epoch': '0.6894'} | |
| {'loss': '3.144', 'grad_norm': '0.6076', 'learning_rate': '0.0001363', 'epoch': '0.6901'} | |
| {'loss': '3.146', 'grad_norm': '0.55', 'learning_rate': '0.0001357', 'epoch': '0.6907'} | |
| {'loss': '3.14', 'grad_norm': '0.6183', 'learning_rate': '0.0001352', 'epoch': '0.6914'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.67it/s] | |
| {'loss': '3.141', 'grad_norm': '0.568', 'learning_rate': '0.0001347', 'epoch': '0.6921'} | |
| {'loss': '3.141', 'grad_norm': '0.5741', 'learning_rate': '0.0001342', 'epoch': '0.6927'} | |
| {'loss': '3.143', 'grad_norm': '0.5948', 'learning_rate': '0.0001336', 'epoch': '0.6934'} | |
| {'loss': '3.149', 'grad_norm': '0.5456', 'learning_rate': '0.0001331', 'epoch': '0.694'} | |
| {'loss': '3.138', 'grad_norm': '0.5717', 'learning_rate': '0.0001326', 'epoch': '0.6947'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.28it/s] | |
| {'loss': '3.138', 'grad_norm': '0.5628', 'learning_rate': '0.0001321', 'epoch': '0.6953'} | |
| {'loss': '3.14', 'grad_norm': '0.5818', 'learning_rate': '0.0001315', 'epoch': '0.696'} | |
| {'loss': '3.139', 'grad_norm': '0.583', 'learning_rate': '0.000131', 'epoch': '0.6966'} | |
| {'loss': '3.139', 'grad_norm': '0.5855', 'learning_rate': '0.0001305', 'epoch': '0.6973'} | |
| {'loss': '3.139', 'grad_norm': '0.5954', 'learning_rate': '0.00013', 'epoch': '0.698'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.36it/s] | |
| {'loss': '3.138', 'grad_norm': '0.5906', 'learning_rate': '0.0001295', 'epoch': '0.6986'} | |
| {'loss': '3.138', 'grad_norm': '0.5608', 'learning_rate': '0.0001289', 'epoch': '0.6993'} | |
| {'loss': '3.139', 'grad_norm': '0.5546', 'learning_rate': '0.0001284', 'epoch': '0.6999'} | |
| {'loss': '3.136', 'grad_norm': '0.601', 'learning_rate': '0.0001279', 'epoch': '0.7006'} | |
| {'loss': '3.131', 'grad_norm': '0.5743', 'learning_rate': '0.0001274', 'epoch': '0.7012'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.17it/s] | |
| {'loss': '3.139', 'grad_norm': '0.5747', 'learning_rate': '0.0001269', 'epoch': '0.7019'} | |
| {'loss': '3.135', 'grad_norm': '0.5832', 'learning_rate': '0.0001264', 'epoch': '0.7025'} | |
| {'loss': '3.138', 'grad_norm': '0.574', 'learning_rate': '0.0001259', 'epoch': '0.7032'} | |
| {'loss': '3.132', 'grad_norm': '0.5809', 'learning_rate': '0.0001253', 'epoch': '0.7039'} | |
| {'loss': '3.14', 'grad_norm': '0.5787', 'learning_rate': '0.0001248', 'epoch': '0.7045'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.54it/s] | |
| {'loss': '3.138', 'grad_norm': '0.622', 'learning_rate': '0.0001243', 'epoch': '0.7052'} | |
| {'loss': '3.136', 'grad_norm': '0.5596', 'learning_rate': '0.0001238', 'epoch': '0.7058'} | |
| {'loss': '3.135', 'grad_norm': '0.5907', 'learning_rate': '0.0001233', 'epoch': '0.7065'} | |
| {'loss': '3.136', 'grad_norm': '0.5579', 'learning_rate': '0.0001228', 'epoch': '0.7071'} | |
| {'loss': '3.136', 'grad_norm': '0.5956', 'learning_rate': '0.0001223', 'epoch': '0.7078'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.17it/s] | |
| {'loss': '3.135', 'grad_norm': '0.5904', 'learning_rate': '0.0001218', 'epoch': '0.7084'} | |
| {'loss': '3.14', 'grad_norm': '0.5702', 'learning_rate': '0.0001213', 'epoch': '0.7091'} | |
| {'loss': '3.132', 'grad_norm': '0.5849', 'learning_rate': '0.0001208', 'epoch': '0.7098'} | |
| {'loss': '3.132', 'grad_norm': '0.5682', 'learning_rate': '0.0001203', 'epoch': '0.7104'} | |
| {'loss': '3.134', 'grad_norm': '0.582', 'learning_rate': '0.0001198', 'epoch': '0.7111'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.39it/s] | |
| {'loss': '3.135', 'grad_norm': '0.5797', 'learning_rate': '0.0001192', 'epoch': '0.7117'} | |
| {'loss': '3.139', 'grad_norm': '0.6054', 'learning_rate': '0.0001187', 'epoch': '0.7124'} | |
| {'loss': '3.133', 'grad_norm': '0.5864', 'learning_rate': '0.0001182', 'epoch': '0.713'} | |
| {'loss': '3.131', 'grad_norm': '0.5758', 'learning_rate': '0.0001177', 'epoch': '0.7137'} | |
| {'loss': '3.136', 'grad_norm': '0.6006', 'learning_rate': '0.0001172', 'epoch': '0.7143'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.64it/s] | |
| {'loss': '3.134', 'grad_norm': '0.5658', 'learning_rate': '0.0001167', 'epoch': '0.715'} | |
| {'loss': '3.132', 'grad_norm': '0.5958', 'learning_rate': '0.0001162', 'epoch': '0.7157'} | |
| {'loss': '3.13', 'grad_norm': '0.5866', 'learning_rate': '0.0001157', 'epoch': '0.7163'} | |
| {'loss': '3.13', 'grad_norm': '0.5767', 'learning_rate': '0.0001153', 'epoch': '0.717'} | |
| {'loss': '3.122', 'grad_norm': '0.6066', 'learning_rate': '0.0001148', 'epoch': '0.7176'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.29it/s] | |
| {'loss': '3.131', 'grad_norm': '0.5783', 'learning_rate': '0.0001143', 'epoch': '0.7183'} | |
| {'loss': '3.135', 'grad_norm': '0.6035', 'learning_rate': '0.0001138', 'epoch': '0.7189'} | |
| {'loss': '3.126', 'grad_norm': '0.6012', 'learning_rate': '0.0001133', 'epoch': '0.7196'} | |
| {'loss': '3.129', 'grad_norm': '0.6369', 'learning_rate': '0.0001128', 'epoch': '0.7202'} | |
| {'loss': '3.125', 'grad_norm': '0.6026', 'learning_rate': '0.0001123', 'epoch': '0.7209'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.27it/s] | |
| {'loss': '3.132', 'grad_norm': '0.5961', 'learning_rate': '0.0001118', 'epoch': '0.7216'} | |
| {'loss': '3.127', 'grad_norm': '0.601', 'learning_rate': '0.0001113', 'epoch': '0.7222'} | |
| {'loss': '3.122', 'grad_norm': '0.5667', 'learning_rate': '0.0001108', 'epoch': '0.7229'} | |
| {'loss': '3.124', 'grad_norm': '0.6108', 'learning_rate': '0.0001103', 'epoch': '0.7235'} | |
| {'loss': '3.124', 'grad_norm': '0.5747', 'learning_rate': '0.0001098', 'epoch': '0.7242'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.27it/s] | |
| {'loss': '3.131', 'grad_norm': '0.5828', 'learning_rate': '0.0001094', 'epoch': '0.7248'} | |
| {'loss': '3.123', 'grad_norm': '0.664', 'learning_rate': '0.0001089', 'epoch': '0.7255'} | |
| {'loss': '3.123', 'grad_norm': '0.5727', 'learning_rate': '0.0001084', 'epoch': '0.7261'} | |
| {'loss': '3.12', 'grad_norm': '0.5916', 'learning_rate': '0.0001079', 'epoch': '0.7268'} | |
| {'loss': '3.134', 'grad_norm': '0.5927', 'learning_rate': '0.0001074', 'epoch': '0.7274'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.32it/s] | |
| {'loss': '3.129', 'grad_norm': '0.5993', 'learning_rate': '0.0001069', 'epoch': '0.7281'} | |
| {'loss': '3.127', 'grad_norm': '0.6461', 'learning_rate': '0.0001064', 'epoch': '0.7288'} | |
| {'loss': '3.126', 'grad_norm': '0.628', 'learning_rate': '0.000106', 'epoch': '0.7294'} | |
| {'loss': '3.131', 'grad_norm': '0.5937', 'learning_rate': '0.0001055', 'epoch': '0.7301'} | |
| {'loss': '3.125', 'grad_norm': '0.601', 'learning_rate': '0.000105', 'epoch': '0.7307'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.63it/s] | |
| {'loss': '3.128', 'grad_norm': '0.6029', 'learning_rate': '0.0001045', 'epoch': '0.7314'} | |
| {'loss': '3.123', 'grad_norm': '0.5728', 'learning_rate': '0.000104', 'epoch': '0.732'} | |
| {'loss': '3.12', 'grad_norm': '0.6318', 'learning_rate': '0.0001036', 'epoch': '0.7327'} | |
| {'loss': '3.125', 'grad_norm': '0.6638', 'learning_rate': '0.0001031', 'epoch': '0.7333'} | |
| {'loss': '3.127', 'grad_norm': '0.5983', 'learning_rate': '0.0001026', 'epoch': '0.734'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.68it/s] | |
| {'loss': '3.129', 'grad_norm': '0.5987', 'learning_rate': '0.0001021', 'epoch': '0.7347'} | |
| {'loss': '3.125', 'grad_norm': '0.6129', 'learning_rate': '0.0001017', 'epoch': '0.7353'} | |
| {'loss': '3.129', 'grad_norm': '0.6193', 'learning_rate': '0.0001012', 'epoch': '0.736'} | |
| {'loss': '3.123', 'grad_norm': '0.6388', 'learning_rate': '0.0001007', 'epoch': '0.7366'} | |
| {'loss': '3.124', 'grad_norm': '0.6258', 'learning_rate': '0.0001003', 'epoch': '0.7373'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.78it/s] | |
| {'loss': '3.123', 'grad_norm': '0.6015', 'learning_rate': '9.979e-05', 'epoch': '0.7379'} | |
| {'loss': '3.127', 'grad_norm': '0.6257', 'learning_rate': '9.932e-05', 'epoch': '0.7386'} | |
| {'loss': '3.125', 'grad_norm': '0.5986', 'learning_rate': '9.885e-05', 'epoch': '0.7392'} | |
| {'loss': '3.126', 'grad_norm': '0.5819', 'learning_rate': '9.839e-05', 'epoch': '0.7399'} | |
| {'loss': '3.12', 'grad_norm': '0.6157', 'learning_rate': '9.792e-05', 'epoch': '0.7406'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.04it/s] | |
| {'loss': '3.122', 'grad_norm': '0.5748', 'learning_rate': '9.745e-05', 'epoch': '0.7412'} | |
| {'loss': '3.125', 'grad_norm': '0.6364', 'learning_rate': '9.699e-05', 'epoch': '0.7419'} | |
| {'loss': '3.125', 'grad_norm': '0.6087', 'learning_rate': '9.653e-05', 'epoch': '0.7425'} | |
| {'loss': '3.121', 'grad_norm': '0.5976', 'learning_rate': '9.606e-05', 'epoch': '0.7432'} | |
| {'loss': '3.125', 'grad_norm': '0.5827', 'learning_rate': '9.56e-05', 'epoch': '0.7438'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.37it/s] | |
| {'loss': '3.126', 'grad_norm': '0.6309', 'learning_rate': '9.514e-05', 'epoch': '0.7445'} | |
| {'loss': '3.121', 'grad_norm': '0.6744', 'learning_rate': '9.468e-05', 'epoch': '0.7451'} | |
| {'loss': '3.124', 'grad_norm': '0.6307', 'learning_rate': '9.422e-05', 'epoch': '0.7458'} | |
| {'loss': '3.123', 'grad_norm': '0.6349', 'learning_rate': '9.376e-05', 'epoch': '0.7465'} | |
| {'loss': '3.12', 'grad_norm': '0.5944', 'learning_rate': '9.331e-05', 'epoch': '0.7471'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.95it/s] | |
| {'loss': '3.124', 'grad_norm': '0.5762', 'learning_rate': '9.285e-05', 'epoch': '0.7478'} | |
| {'loss': '3.122', 'grad_norm': '0.6068', 'learning_rate': '9.239e-05', 'epoch': '0.7484'} | |
| {'loss': '3.115', 'grad_norm': '0.6304', 'learning_rate': '9.194e-05', 'epoch': '0.7491'} | |
| {'loss': '3.117', 'grad_norm': '0.6145', 'learning_rate': '9.149e-05', 'epoch': '0.7497'} | |
| {'loss': '3.117', 'grad_norm': '0.6105', 'learning_rate': '9.103e-05', 'epoch': '0.7504'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.31it/s] | |
| {'loss': '3.117', 'grad_norm': '0.6165', 'learning_rate': '9.058e-05', 'epoch': '0.751'} | |
| {'loss': '3.118', 'grad_norm': '0.6108', 'learning_rate': '9.013e-05', 'epoch': '0.7517'} | |
| {'loss': '3.113', 'grad_norm': '0.6264', 'learning_rate': '8.968e-05', 'epoch': '0.7524'} | |
| {'loss': '3.118', 'grad_norm': '0.6408', 'learning_rate': '8.923e-05', 'epoch': '0.753'} | |
| {'loss': '3.121', 'grad_norm': '0.5837', 'learning_rate': '8.878e-05', 'epoch': '0.7537'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.39it/s] | |
| {'loss': '3.118', 'grad_norm': '0.6373', 'learning_rate': '8.834e-05', 'epoch': '0.7543'} | |
| {'loss': '3.117', 'grad_norm': '0.5999', 'learning_rate': '8.789e-05', 'epoch': '0.755'} | |
| {'loss': '3.121', 'grad_norm': '0.6219', 'learning_rate': '8.745e-05', 'epoch': '0.7556'} | |
| {'loss': '3.117', 'grad_norm': '0.6014', 'learning_rate': '8.7e-05', 'epoch': '0.7563'} | |
| {'loss': '3.113', 'grad_norm': '0.6236', 'learning_rate': '8.656e-05', 'epoch': '0.7569'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.94it/s] | |
| {'loss': '3.118', 'grad_norm': '0.6067', 'learning_rate': '8.611e-05', 'epoch': '0.7576'} | |
| {'loss': '3.115', 'grad_norm': '0.6211', 'learning_rate': '8.567e-05', 'epoch': '0.7583'} | |
| {'loss': '3.119', 'grad_norm': '0.642', 'learning_rate': '8.523e-05', 'epoch': '0.7589'} | |
| {'loss': '3.116', 'grad_norm': '0.619', 'learning_rate': '8.479e-05', 'epoch': '0.7596'} | |
| {'loss': '3.112', 'grad_norm': '0.62', 'learning_rate': '8.435e-05', 'epoch': '0.7602'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.99it/s] | |
| {'loss': '3.113', 'grad_norm': '0.6389', 'learning_rate': '8.392e-05', 'epoch': '0.7609'} | |
| {'loss': '3.113', 'grad_norm': '0.6062', 'learning_rate': '8.348e-05', 'epoch': '0.7615'} | |
| {'loss': '3.119', 'grad_norm': '0.613', 'learning_rate': '8.304e-05', 'epoch': '0.7622'} | |
| {'loss': '3.118', 'grad_norm': '0.6003', 'learning_rate': '8.261e-05', 'epoch': '0.7628'} | |
| {'loss': '3.116', 'grad_norm': '0.6023', 'learning_rate': '8.218e-05', 'epoch': '0.7635'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.55it/s] | |
| {'loss': '3.113', 'grad_norm': '0.6265', 'learning_rate': '8.174e-05', 'epoch': '0.7641'} | |
| {'loss': '3.116', 'grad_norm': '0.5955', 'learning_rate': '8.131e-05', 'epoch': '0.7648'} | |
| {'loss': '3.117', 'grad_norm': '0.6189', 'learning_rate': '8.088e-05', 'epoch': '0.7655'} | |
| {'loss': '3.115', 'grad_norm': '0.6251', 'learning_rate': '8.045e-05', 'epoch': '0.7661'} | |
| {'loss': '3.113', 'grad_norm': '0.652', 'learning_rate': '8.002e-05', 'epoch': '0.7668'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.07it/s] | |
| {'loss': '3.12', 'grad_norm': '0.6518', 'learning_rate': '7.959e-05', 'epoch': '0.7674'} | |
| {'loss': '3.112', 'grad_norm': '0.6251', 'learning_rate': '7.917e-05', 'epoch': '0.7681'} | |
| {'loss': '3.106', 'grad_norm': '0.6289', 'learning_rate': '7.874e-05', 'epoch': '0.7687'} | |
| {'loss': '3.118', 'grad_norm': '0.6065', 'learning_rate': '7.831e-05', 'epoch': '0.7694'} | |
| {'loss': '3.112', 'grad_norm': '0.6435', 'learning_rate': '7.789e-05', 'epoch': '0.77'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.12it/s] | |
| {'loss': '3.115', 'grad_norm': '0.626', 'learning_rate': '7.747e-05', 'epoch': '0.7707'} | |
| {'loss': '3.112', 'grad_norm': '0.614', 'learning_rate': '7.705e-05', 'epoch': '0.7714'} | |
| {'loss': '3.112', 'grad_norm': '0.618', 'learning_rate': '7.662e-05', 'epoch': '0.772'} | |
| {'loss': '3.112', 'grad_norm': '0.6314', 'learning_rate': '7.62e-05', 'epoch': '0.7727'} | |
| {'loss': '3.11', 'grad_norm': '0.6665', 'learning_rate': '7.578e-05', 'epoch': '0.7733'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.75it/s] | |
| {'loss': '3.108', 'grad_norm': '0.6265', 'learning_rate': '7.537e-05', 'epoch': '0.774'} | |
| {'loss': '3.112', 'grad_norm': '0.6313', 'learning_rate': '7.495e-05', 'epoch': '0.7746'} | |
| {'loss': '3.11', 'grad_norm': '0.6252', 'learning_rate': '7.453e-05', 'epoch': '0.7753'} | |
| {'loss': '3.11', 'grad_norm': '0.6468', 'learning_rate': '7.412e-05', 'epoch': '0.7759'} | |
| {'loss': '3.107', 'grad_norm': '0.6373', 'learning_rate': '7.37e-05', 'epoch': '0.7766'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.37it/s] | |
| {'loss': '3.11', 'grad_norm': '0.6131', 'learning_rate': '7.329e-05', 'epoch': '0.7773'} | |
| {'loss': '3.108', 'grad_norm': '0.6232', 'learning_rate': '7.288e-05', 'epoch': '0.7779'} | |
| {'loss': '3.111', 'grad_norm': '0.5948', 'learning_rate': '7.247e-05', 'epoch': '0.7786'} | |
| {'loss': '3.105', 'grad_norm': '0.613', 'learning_rate': '7.206e-05', 'epoch': '0.7792'} | |
| {'loss': '3.113', 'grad_norm': '0.6153', 'learning_rate': '7.165e-05', 'epoch': '0.7799'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.74it/s] | |
| {'loss': '3.107', 'grad_norm': '0.5967', 'learning_rate': '7.124e-05', 'epoch': '0.7805'} | |
| {'loss': '3.108', 'grad_norm': '0.6167', 'learning_rate': '7.083e-05', 'epoch': '0.7812'} | |
| {'loss': '3.108', 'grad_norm': '0.6449', 'learning_rate': '7.042e-05', 'epoch': '0.7818'} | |
| {'loss': '3.111', 'grad_norm': '0.6085', 'learning_rate': '7.002e-05', 'epoch': '0.7825'} | |
| {'loss': '3.108', 'grad_norm': '0.6634', 'learning_rate': '6.962e-05', 'epoch': '0.7832'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.46it/s] | |
| {'loss': '3.103', 'grad_norm': '0.6057', 'learning_rate': '6.921e-05', 'epoch': '0.7838'} | |
| {'loss': '3.108', 'grad_norm': '0.6245', 'learning_rate': '6.881e-05', 'epoch': '0.7845'} | |
| {'loss': '3.111', 'grad_norm': '0.6097', 'learning_rate': '6.841e-05', 'epoch': '0.7851'} | |
| {'loss': '3.106', 'grad_norm': '0.6323', 'learning_rate': '6.801e-05', 'epoch': '0.7858'} | |
| {'loss': '3.12', 'grad_norm': '0.6374', 'learning_rate': '6.761e-05', 'epoch': '0.7864'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.70it/s] | |
| {'loss': '3.104', 'grad_norm': '0.6175', 'learning_rate': '6.721e-05', 'epoch': '0.7871'} | |
| {'loss': '3.111', 'grad_norm': '0.6686', 'learning_rate': '6.681e-05', 'epoch': '0.7877'} | |
| {'loss': '3.109', 'grad_norm': '0.6653', 'learning_rate': '6.642e-05', 'epoch': '0.7884'} | |
| {'loss': '3.098', 'grad_norm': '0.6189', 'learning_rate': '6.602e-05', 'epoch': '0.7891'} | |
| {'loss': '3.105', 'grad_norm': '0.6326', 'learning_rate': '6.563e-05', 'epoch': '0.7897'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.19it/s] | |
| {'loss': '3.103', 'grad_norm': '0.6348', 'learning_rate': '6.524e-05', 'epoch': '0.7904'} | |
| {'loss': '3.113', 'grad_norm': '0.6418', 'learning_rate': '6.484e-05', 'epoch': '0.791'} | |
| {'loss': '3.102', 'grad_norm': '0.6452', 'learning_rate': '6.445e-05', 'epoch': '0.7917'} | |
| {'loss': '3.11', 'grad_norm': '0.5985', 'learning_rate': '6.406e-05', 'epoch': '0.7923'} | |
| {'loss': '3.109', 'grad_norm': '0.6277', 'learning_rate': '6.368e-05', 'epoch': '0.793'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.36it/s] | |
| {'loss': '3.107', 'grad_norm': '0.6696', 'learning_rate': '6.329e-05', 'epoch': '0.7936'} | |
| {'loss': '3.11', 'grad_norm': '0.6501', 'learning_rate': '6.29e-05', 'epoch': '0.7943'} | |
| {'loss': '3.108', 'grad_norm': '0.6391', 'learning_rate': '6.252e-05', 'epoch': '0.795'} | |
| {'loss': '3.101', 'grad_norm': '0.6252', 'learning_rate': '6.213e-05', 'epoch': '0.7956'} | |
| {'loss': '3.107', 'grad_norm': '0.6313', 'learning_rate': '6.175e-05', 'epoch': '0.7963'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.39it/s] | |
| {'loss': '3.1', 'grad_norm': '0.6392', 'learning_rate': '6.136e-05', 'epoch': '0.7969'} | |
| {'loss': '3.1', 'grad_norm': '0.6078', 'learning_rate': '6.098e-05', 'epoch': '0.7976'} | |
| {'loss': '3.101', 'grad_norm': '0.6698', 'learning_rate': '6.06e-05', 'epoch': '0.7982'} | |
| {'loss': '3.101', 'grad_norm': '0.6372', 'learning_rate': '6.022e-05', 'epoch': '0.7989'} | |
| {'loss': '3.108', 'grad_norm': '0.6285', 'learning_rate': '5.985e-05', 'epoch': '0.7995'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.45it/s] | |
| {'loss': '3.101', 'grad_norm': '0.6547', 'learning_rate': '5.947e-05', 'epoch': '0.8002'} | |
| {'loss': '3.102', 'grad_norm': '0.6253', 'learning_rate': '5.909e-05', 'epoch': '0.8008'} | |
| {'loss': '3.105', 'grad_norm': '0.6125', 'learning_rate': '5.872e-05', 'epoch': '0.8015'} | |
| {'loss': '3.099', 'grad_norm': '0.6552', 'learning_rate': '5.834e-05', 'epoch': '0.8022'} | |
| {'loss': '3.103', 'grad_norm': '0.6274', 'learning_rate': '5.797e-05', 'epoch': '0.8028'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.94it/s] | |
| {'loss': '3.103', 'grad_norm': '0.6341', 'learning_rate': '5.76e-05', 'epoch': '0.8035'} | |
| {'loss': '3.103', 'grad_norm': '0.6262', 'learning_rate': '5.723e-05', 'epoch': '0.8041'} | |
| {'loss': '3.108', 'grad_norm': '0.6374', 'learning_rate': '5.686e-05', 'epoch': '0.8048'} | |
| {'loss': '3.099', 'grad_norm': '0.65', 'learning_rate': '5.649e-05', 'epoch': '0.8054'} | |
| {'loss': '3.104', 'grad_norm': '0.6207', 'learning_rate': '5.612e-05', 'epoch': '0.8061'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.77it/s] | |
| {'loss': '3.098', 'grad_norm': '0.6371', 'learning_rate': '5.576e-05', 'epoch': '0.8067'} | |
| {'loss': '3.09', 'grad_norm': '0.6499', 'learning_rate': '5.539e-05', 'epoch': '0.8074'} | |
| {'loss': '3.096', 'grad_norm': '0.6082', 'learning_rate': '5.503e-05', 'epoch': '0.8081'} | |
| {'loss': '3.097', 'grad_norm': '0.6333', 'learning_rate': '5.466e-05', 'epoch': '0.8087'} | |
| {'loss': '3.101', 'grad_norm': '0.6255', 'learning_rate': '5.43e-05', 'epoch': '0.8094'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.59it/s] | |
| {'loss': '3.1', 'grad_norm': '0.6589', 'learning_rate': '5.394e-05', 'epoch': '0.81'} | |
| {'loss': '3.097', 'grad_norm': '0.6141', 'learning_rate': '5.358e-05', 'epoch': '0.8107'} | |
| {'loss': '3.094', 'grad_norm': '0.6351', 'learning_rate': '5.322e-05', 'epoch': '0.8113'} | |
| {'loss': '3.099', 'grad_norm': '0.633', 'learning_rate': '5.286e-05', 'epoch': '0.812'} | |
| {'loss': '3.098', 'grad_norm': '0.6227', 'learning_rate': '5.251e-05', 'epoch': '0.8126'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.37it/s] | |
| {'loss': '3.108', 'grad_norm': '0.6204', 'learning_rate': '5.215e-05', 'epoch': '0.8133'} | |
| {'loss': '3.093', 'grad_norm': '0.6097', 'learning_rate': '5.179e-05', 'epoch': '0.814'} | |
| {'loss': '3.095', 'grad_norm': '0.672', 'learning_rate': '5.144e-05', 'epoch': '0.8146'} | |
| {'loss': '3.097', 'grad_norm': '0.6189', 'learning_rate': '5.109e-05', 'epoch': '0.8153'} | |
| {'loss': '3.1', 'grad_norm': '0.6341', 'learning_rate': '5.074e-05', 'epoch': '0.8159'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.74it/s] | |
| {'loss': '3.1', 'grad_norm': '0.6408', 'learning_rate': '5.039e-05', 'epoch': '0.8166'} | |
| {'loss': '3.091', 'grad_norm': '0.6513', 'learning_rate': '5.004e-05', 'epoch': '0.8172'} | |
| {'loss': '3.098', 'grad_norm': '0.6188', 'learning_rate': '4.969e-05', 'epoch': '0.8179'} | |
| {'loss': '3.092', 'grad_norm': '0.6241', 'learning_rate': '4.934e-05', 'epoch': '0.8185'} | |
| {'loss': '3.092', 'grad_norm': '0.6283', 'learning_rate': '4.9e-05', 'epoch': '0.8192'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.67it/s] | |
| {'loss': '3.097', 'grad_norm': '0.6626', 'learning_rate': '4.865e-05', 'epoch': '0.8199'} | |
| {'loss': '3.098', 'grad_norm': '0.6264', 'learning_rate': '4.831e-05', 'epoch': '0.8205'} | |
| {'loss': '3.094', 'grad_norm': '0.6382', 'learning_rate': '4.797e-05', 'epoch': '0.8212'} | |
| {'loss': '3.088', 'grad_norm': '0.5975', 'learning_rate': '4.763e-05', 'epoch': '0.8218'} | |
| {'loss': '3.096', 'grad_norm': '0.616', 'learning_rate': '4.729e-05', 'epoch': '0.8225'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.99it/s] | |
| {'loss': '3.101', 'grad_norm': '0.6172', 'learning_rate': '4.695e-05', 'epoch': '0.8231'} | |
| {'loss': '3.09', 'grad_norm': '0.6163', 'learning_rate': '4.661e-05', 'epoch': '0.8238'} | |
| {'loss': '3.094', 'grad_norm': '0.6191', 'learning_rate': '4.627e-05', 'epoch': '0.8244'} | |
| {'loss': '3.093', 'grad_norm': '0.6242', 'learning_rate': '4.594e-05', 'epoch': '0.8251'} | |
| {'loss': '3.089', 'grad_norm': '0.6498', 'learning_rate': '4.56e-05', 'epoch': '0.8258'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.79it/s] | |
| {'loss': '3.093', 'grad_norm': '0.6488', 'learning_rate': '4.527e-05', 'epoch': '0.8264'} | |
| {'loss': '3.095', 'grad_norm': '0.626', 'learning_rate': '4.494e-05', 'epoch': '0.8271'} | |
| {'loss': '3.095', 'grad_norm': '0.6518', 'learning_rate': '4.46e-05', 'epoch': '0.8277'} | |
| {'loss': '3.087', 'grad_norm': '0.6512', 'learning_rate': '4.427e-05', 'epoch': '0.8284'} | |
| {'loss': '3.092', 'grad_norm': '0.6292', 'learning_rate': '4.395e-05', 'epoch': '0.829'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.27it/s] | |
| {'loss': '3.09', 'grad_norm': '0.6151', 'learning_rate': '4.362e-05', 'epoch': '0.8297'} | |
| {'loss': '3.096', 'grad_norm': '0.6268', 'learning_rate': '4.329e-05', 'epoch': '0.8303'} | |
| {'loss': '3.097', 'grad_norm': '0.6287', 'learning_rate': '4.297e-05', 'epoch': '0.831'} | |
| {'loss': '3.092', 'grad_norm': '0.6075', 'learning_rate': '4.264e-05', 'epoch': '0.8317'} | |
| {'loss': '3.087', 'grad_norm': '0.6037', 'learning_rate': '4.232e-05', 'epoch': '0.8323'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.76it/s] | |
| {'loss': '3.087', 'grad_norm': '0.6239', 'learning_rate': '4.2e-05', 'epoch': '0.833'} | |
| {'loss': '3.091', 'grad_norm': '0.6161', 'learning_rate': '4.167e-05', 'epoch': '0.8336'} | |
| {'loss': '3.091', 'grad_norm': '0.6112', 'learning_rate': '4.135e-05', 'epoch': '0.8343'} | |
| {'loss': '3.09', 'grad_norm': '0.6804', 'learning_rate': '4.104e-05', 'epoch': '0.8349'} | |
| {'loss': '3.091', 'grad_norm': '0.6317', 'learning_rate': '4.072e-05', 'epoch': '0.8356'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.26it/s] | |
| {'loss': '3.101', 'grad_norm': '0.6333', 'learning_rate': '4.04e-05', 'epoch': '0.8362'} | |
| {'loss': '3.097', 'grad_norm': '0.5939', 'learning_rate': '4.009e-05', 'epoch': '0.8369'} | |
| {'loss': '3.088', 'grad_norm': '0.6334', 'learning_rate': '3.977e-05', 'epoch': '0.8375'} | |
| {'loss': '3.099', 'grad_norm': '0.6321', 'learning_rate': '3.946e-05', 'epoch': '0.8382'} | |
| {'loss': '3.09', 'grad_norm': '0.6519', 'learning_rate': '3.915e-05', 'epoch': '0.8389'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.11it/s] | |
| {'loss': '3.094', 'grad_norm': '0.6287', 'learning_rate': '3.884e-05', 'epoch': '0.8395'} | |
| {'loss': '3.085', 'grad_norm': '0.6018', 'learning_rate': '3.853e-05', 'epoch': '0.8402'} | |
| {'loss': '3.089', 'grad_norm': '0.6308', 'learning_rate': '3.822e-05', 'epoch': '0.8408'} | |
| {'loss': '3.09', 'grad_norm': '0.6488', 'learning_rate': '3.791e-05', 'epoch': '0.8415'} | |
| {'loss': '3.094', 'grad_norm': '0.6538', 'learning_rate': '3.761e-05', 'epoch': '0.8421'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.60it/s] | |
| {'loss': '3.088', 'grad_norm': '0.6571', 'learning_rate': '3.73e-05', 'epoch': '0.8428'} | |
| {'loss': '3.092', 'grad_norm': '0.6458', 'learning_rate': '3.7e-05', 'epoch': '0.8434'} | |
| {'loss': '3.087', 'grad_norm': '0.6269', 'learning_rate': '3.669e-05', 'epoch': '0.8441'} | |
| {'loss': '3.092', 'grad_norm': '0.6375', 'learning_rate': '3.639e-05', 'epoch': '0.8448'} | |
| {'loss': '3.089', 'grad_norm': '0.6501', 'learning_rate': '3.609e-05', 'epoch': '0.8454'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.04it/s] | |
| {'loss': '3.088', 'grad_norm': '0.63', 'learning_rate': '3.579e-05', 'epoch': '0.8461'} | |
| {'loss': '3.087', 'grad_norm': '0.6549', 'learning_rate': '3.549e-05', 'epoch': '0.8467'} | |
| {'loss': '3.085', 'grad_norm': '0.6008', 'learning_rate': '3.52e-05', 'epoch': '0.8474'} | |
| {'loss': '3.087', 'grad_norm': '0.6403', 'learning_rate': '3.49e-05', 'epoch': '0.848'} | |
| {'loss': '3.094', 'grad_norm': '0.6625', 'learning_rate': '3.461e-05', 'epoch': '0.8487'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.84it/s] | |
| {'loss': '3.084', 'grad_norm': '0.6461', 'learning_rate': '3.431e-05', 'epoch': '0.8493'} | |
| {'loss': '3.083', 'grad_norm': '0.6318', 'learning_rate': '3.402e-05', 'epoch': '0.85'} | |
| {'loss': '3.091', 'grad_norm': '0.6709', 'learning_rate': '3.373e-05', 'epoch': '0.8507'} | |
| {'loss': '3.083', 'grad_norm': '0.6835', 'learning_rate': '3.344e-05', 'epoch': '0.8513'} | |
| {'loss': '3.091', 'grad_norm': '0.6315', 'learning_rate': '3.315e-05', 'epoch': '0.852'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.20it/s] | |
| {'loss': '3.087', 'grad_norm': '0.6687', 'learning_rate': '3.287e-05', 'epoch': '0.8526'} | |
| {'loss': '3.086', 'grad_norm': '0.6351', 'learning_rate': '3.258e-05', 'epoch': '0.8533'} | |
| {'loss': '3.086', 'grad_norm': '0.6432', 'learning_rate': '3.229e-05', 'epoch': '0.8539'} | |
| {'loss': '3.087', 'grad_norm': '0.6356', 'learning_rate': '3.201e-05', 'epoch': '0.8546'} | |
| {'loss': '3.087', 'grad_norm': '0.6576', 'learning_rate': '3.173e-05', 'epoch': '0.8552'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.56it/s] | |
| {'loss': '3.09', 'grad_norm': '0.6351', 'learning_rate': '3.145e-05', 'epoch': '0.8559'} | |
| {'loss': '3.085', 'grad_norm': '0.6415', 'learning_rate': '3.117e-05', 'epoch': '0.8566'} | |
| {'loss': '3.08', 'grad_norm': '0.6342', 'learning_rate': '3.089e-05', 'epoch': '0.8572'} | |
| {'loss': '3.081', 'grad_norm': '0.6257', 'learning_rate': '3.061e-05', 'epoch': '0.8579'} | |
| {'loss': '3.085', 'grad_norm': '0.6447', 'learning_rate': '3.033e-05', 'epoch': '0.8585'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.76it/s] | |
| {'loss': '3.083', 'grad_norm': '0.6242', 'learning_rate': '3.006e-05', 'epoch': '0.8592'} | |
| {'loss': '3.08', 'grad_norm': '0.6469', 'learning_rate': '2.978e-05', 'epoch': '0.8598'} | |
| {'loss': '3.084', 'grad_norm': '0.6203', 'learning_rate': '2.951e-05', 'epoch': '0.8605'} | |
| {'loss': '3.083', 'grad_norm': '0.6534', 'learning_rate': '2.924e-05', 'epoch': '0.8611'} | |
| {'loss': '3.084', 'grad_norm': '0.6662', 'learning_rate': '2.897e-05', 'epoch': '0.8618'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.52it/s] | |
| {'loss': '3.082', 'grad_norm': '0.6394', 'learning_rate': '2.87e-05', 'epoch': '0.8625'} | |
| {'loss': '3.085', 'grad_norm': '0.6713', 'learning_rate': '2.843e-05', 'epoch': '0.8631'} | |
| {'loss': '3.084', 'grad_norm': '0.643', 'learning_rate': '2.816e-05', 'epoch': '0.8638'} | |
| {'loss': '3.087', 'grad_norm': '0.6698', 'learning_rate': '2.79e-05', 'epoch': '0.8644'} | |
| {'loss': '3.08', 'grad_norm': '0.6403', 'learning_rate': '2.763e-05', 'epoch': '0.8651'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.24it/s] | |
| {'loss': '3.084', 'grad_norm': '0.6504', 'learning_rate': '2.737e-05', 'epoch': '0.8657'} | |
| {'loss': '3.085', 'grad_norm': '0.623', 'learning_rate': '2.71e-05', 'epoch': '0.8664'} | |
| {'loss': '3.08', 'grad_norm': '0.6216', 'learning_rate': '2.684e-05', 'epoch': '0.867'} | |
| {'loss': '3.082', 'grad_norm': '0.6374', 'learning_rate': '2.658e-05', 'epoch': '0.8677'} | |
| {'loss': '3.078', 'grad_norm': '0.6831', 'learning_rate': '2.632e-05', 'epoch': '0.8684'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.96it/s] | |
| {'loss': '3.085', 'grad_norm': '0.6528', 'learning_rate': '2.607e-05', 'epoch': '0.869'} | |
| {'loss': '3.08', 'grad_norm': '0.6289', 'learning_rate': '2.581e-05', 'epoch': '0.8697'} | |
| {'loss': '3.077', 'grad_norm': '0.6318', 'learning_rate': '2.556e-05', 'epoch': '0.8703'} | |
| {'loss': '3.078', 'grad_norm': '0.6657', 'learning_rate': '2.53e-05', 'epoch': '0.871'} | |
| {'loss': '3.08', 'grad_norm': '0.6297', 'learning_rate': '2.505e-05', 'epoch': '0.8716'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.77it/s] | |
| {'loss': '3.078', 'grad_norm': '0.6604', 'learning_rate': '2.48e-05', 'epoch': '0.8723'} | |
| {'loss': '3.082', 'grad_norm': '0.6406', 'learning_rate': '2.455e-05', 'epoch': '0.8729'} | |
| {'loss': '3.085', 'grad_norm': '0.6484', 'learning_rate': '2.43e-05', 'epoch': '0.8736'} | |
| {'loss': '3.077', 'grad_norm': '0.632', 'learning_rate': '2.405e-05', 'epoch': '0.8742'} | |
| {'loss': '3.078', 'grad_norm': '0.639', 'learning_rate': '2.38e-05', 'epoch': '0.8749'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.51it/s] | |
| {'loss': '3.081', 'grad_norm': '0.6376', 'learning_rate': '2.356e-05', 'epoch': '0.8756'} | |
| {'loss': '3.075', 'grad_norm': '0.6321', 'learning_rate': '2.331e-05', 'epoch': '0.8762'} | |
| {'loss': '3.077', 'grad_norm': '0.6607', 'learning_rate': '2.307e-05', 'epoch': '0.8769'} | |
| {'loss': '3.082', 'grad_norm': '0.7398', 'learning_rate': '2.283e-05', 'epoch': '0.8775'} | |
| {'loss': '3.082', 'grad_norm': '0.6342', 'learning_rate': '2.259e-05', 'epoch': '0.8782'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.07it/s] | |
| {'loss': '3.079', 'grad_norm': '0.6187', 'learning_rate': '2.235e-05', 'epoch': '0.8788'} | |
| {'loss': '3.076', 'grad_norm': '0.6446', 'learning_rate': '2.211e-05', 'epoch': '0.8795'} | |
| {'loss': '3.075', 'grad_norm': '0.6186', 'learning_rate': '2.187e-05', 'epoch': '0.8801'} | |
| {'loss': '3.075', 'grad_norm': '0.6268', 'learning_rate': '2.164e-05', 'epoch': '0.8808'} | |
| {'loss': '3.076', 'grad_norm': '0.6597', 'learning_rate': '2.14e-05', 'epoch': '0.8815'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.37it/s] | |
| {'loss': '3.081', 'grad_norm': '0.6454', 'learning_rate': '2.117e-05', 'epoch': '0.8821'} | |
| {'loss': '3.084', 'grad_norm': '0.6401', 'learning_rate': '2.094e-05', 'epoch': '0.8828'} | |
| {'loss': '3.08', 'grad_norm': '0.6363', 'learning_rate': '2.071e-05', 'epoch': '0.8834'} | |
| {'loss': '3.082', 'grad_norm': '0.6486', 'learning_rate': '2.048e-05', 'epoch': '0.8841'} | |
| {'loss': '3.077', 'grad_norm': '0.6501', 'learning_rate': '2.025e-05', 'epoch': '0.8847'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.04it/s] | |
| {'loss': '3.082', 'grad_norm': '0.6356', 'learning_rate': '2.002e-05', 'epoch': '0.8854'} | |
| {'loss': '3.078', 'grad_norm': '0.6306', 'learning_rate': '1.98e-05', 'epoch': '0.886'} | |
| {'loss': '3.085', 'grad_norm': '0.644', 'learning_rate': '1.957e-05', 'epoch': '0.8867'} | |
| {'loss': '3.08', 'grad_norm': '0.6425', 'learning_rate': '1.935e-05', 'epoch': '0.8874'} | |
| {'loss': '3.079', 'grad_norm': '0.6338', 'learning_rate': '1.913e-05', 'epoch': '0.888'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.25it/s] | |
| {'loss': '3.078', 'grad_norm': '0.6506', 'learning_rate': '1.891e-05', 'epoch': '0.8887'} | |
| {'loss': '3.078', 'grad_norm': '0.6191', 'learning_rate': '1.869e-05', 'epoch': '0.8893'} | |
| {'loss': '3.075', 'grad_norm': '0.6515', 'learning_rate': '1.847e-05', 'epoch': '0.89'} | |
| {'loss': '3.073', 'grad_norm': '0.6753', 'learning_rate': '1.825e-05', 'epoch': '0.8906'} | |
| {'loss': '3.072', 'grad_norm': '0.6671', 'learning_rate': '1.804e-05', 'epoch': '0.8913'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.86it/s] | |
| {'loss': '3.078', 'grad_norm': '0.6234', 'learning_rate': '1.782e-05', 'epoch': '0.8919'} | |
| {'loss': '3.072', 'grad_norm': '0.6311', 'learning_rate': '1.761e-05', 'epoch': '0.8926'} | |
| {'loss': '3.075', 'grad_norm': '0.6438', 'learning_rate': '1.74e-05', 'epoch': '0.8933'} | |
| {'loss': '3.071', 'grad_norm': '0.6311', 'learning_rate': '1.718e-05', 'epoch': '0.8939'} | |
| {'loss': '3.076', 'grad_norm': '0.6256', 'learning_rate': '1.698e-05', 'epoch': '0.8946'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.93it/s] | |
| {'loss': '3.079', 'grad_norm': '0.6462', 'learning_rate': '1.677e-05', 'epoch': '0.8952'} | |
| {'loss': '3.079', 'grad_norm': '0.636', 'learning_rate': '1.656e-05', 'epoch': '0.8959'} | |
| {'loss': '3.076', 'grad_norm': '0.6386', 'learning_rate': '1.635e-05', 'epoch': '0.8965'} | |
| {'loss': '3.081', 'grad_norm': '0.6233', 'learning_rate': '1.615e-05', 'epoch': '0.8972'} | |
| {'loss': '3.069', 'grad_norm': '0.6368', 'learning_rate': '1.595e-05', 'epoch': '0.8978'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.44it/s] | |
| {'loss': '3.074', 'grad_norm': '0.6314', 'learning_rate': '1.574e-05', 'epoch': '0.8985'} | |
| {'loss': '3.075', 'grad_norm': '0.6271', 'learning_rate': '1.554e-05', 'epoch': '0.8992'} | |
| {'loss': '3.073', 'grad_norm': '0.634', 'learning_rate': '1.534e-05', 'epoch': '0.8998'} | |
| {'loss': '3.078', 'grad_norm': '0.645', 'learning_rate': '1.515e-05', 'epoch': '0.9005'} | |
| {'loss': '3.078', 'grad_norm': '0.6711', 'learning_rate': '1.495e-05', 'epoch': '0.9011'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.93it/s] | |
| {'loss': '3.074', 'grad_norm': '0.6349', 'learning_rate': '1.475e-05', 'epoch': '0.9018'} | |
| {'loss': '3.074', 'grad_norm': '0.6345', 'learning_rate': '1.456e-05', 'epoch': '0.9024'} | |
| {'loss': '3.075', 'grad_norm': '0.6464', 'learning_rate': '1.436e-05', 'epoch': '0.9031'} | |
| {'loss': '3.079', 'grad_norm': '0.6274', 'learning_rate': '1.417e-05', 'epoch': '0.9037'} | |
| {'loss': '3.074', 'grad_norm': '0.6355', 'learning_rate': '1.398e-05', 'epoch': '0.9044'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.02it/s] | |
| {'loss': '3.075', 'grad_norm': '0.6217', 'learning_rate': '1.379e-05', 'epoch': '0.9051'} | |
| {'loss': '3.074', 'grad_norm': '0.6245', 'learning_rate': '1.36e-05', 'epoch': '0.9057'} | |
| {'loss': '3.072', 'grad_norm': '0.6193', 'learning_rate': '1.342e-05', 'epoch': '0.9064'} | |
| {'loss': '3.077', 'grad_norm': '0.6435', 'learning_rate': '1.323e-05', 'epoch': '0.907'} | |
| {'loss': '3.078', 'grad_norm': '0.6243', 'learning_rate': '1.305e-05', 'epoch': '0.9077'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.76it/s] | |
| {'loss': '3.076', 'grad_norm': '0.6273', 'learning_rate': '1.286e-05', 'epoch': '0.9083'} | |
| {'loss': '3.074', 'grad_norm': '0.6271', 'learning_rate': '1.268e-05', 'epoch': '0.909'} | |
| {'loss': '3.072', 'grad_norm': '0.63', 'learning_rate': '1.25e-05', 'epoch': '0.9096'} | |
| {'loss': '3.076', 'grad_norm': '0.6378', 'learning_rate': '1.232e-05', 'epoch': '0.9103'} | |
| {'loss': '3.077', 'grad_norm': '0.6329', 'learning_rate': '1.214e-05', 'epoch': '0.9109'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.80it/s] | |
| {'loss': '3.078', 'grad_norm': '0.6295', 'learning_rate': '1.197e-05', 'epoch': '0.9116'} | |
| {'loss': '3.071', 'grad_norm': '0.649', 'learning_rate': '1.179e-05', 'epoch': '0.9123'} | |
| {'loss': '3.073', 'grad_norm': '0.6377', 'learning_rate': '1.162e-05', 'epoch': '0.9129'} | |
| {'loss': '3.08', 'grad_norm': '0.6557', 'learning_rate': '1.144e-05', 'epoch': '0.9136'} | |
| {'loss': '3.074', 'grad_norm': '0.6369', 'learning_rate': '1.127e-05', 'epoch': '0.9142'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.94it/s] | |
| {'loss': '3.073', 'grad_norm': '0.6011', 'learning_rate': '1.11e-05', 'epoch': '0.9149'} | |
| {'loss': '3.071', 'grad_norm': '0.6239', 'learning_rate': '1.093e-05', 'epoch': '0.9155'} | |
| {'loss': '3.07', 'grad_norm': '0.638', 'learning_rate': '1.076e-05', 'epoch': '0.9162'} | |
| {'loss': '3.073', 'grad_norm': '0.6384', 'learning_rate': '1.06e-05', 'epoch': '0.9168'} | |
| {'loss': '3.071', 'grad_norm': '0.6404', 'learning_rate': '1.043e-05', 'epoch': '0.9175'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.03it/s] | |
| {'loss': '3.075', 'grad_norm': '0.6294', 'learning_rate': '1.027e-05', 'epoch': '0.9182'} | |
| {'loss': '3.073', 'grad_norm': '0.6271', 'learning_rate': '1.01e-05', 'epoch': '0.9188'} | |
| {'loss': '3.072', 'grad_norm': '0.6266', 'learning_rate': '9.943e-06', 'epoch': '0.9195'} | |
| {'loss': '3.07', 'grad_norm': '0.6384', 'learning_rate': '9.783e-06', 'epoch': '0.9201'} | |
| {'loss': '3.066', 'grad_norm': '0.6363', 'learning_rate': '9.624e-06', 'epoch': '0.9208'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.71it/s] | |
| {'loss': '3.074', 'grad_norm': '0.6663', 'learning_rate': '9.466e-06', 'epoch': '0.9214'} | |
| {'loss': '3.072', 'grad_norm': '0.6378', 'learning_rate': '9.31e-06', 'epoch': '0.9221'} | |
| {'loss': '3.075', 'grad_norm': '0.6404', 'learning_rate': '9.154e-06', 'epoch': '0.9227'} | |
| {'loss': '3.069', 'grad_norm': '0.6214', 'learning_rate': '9.001e-06', 'epoch': '0.9234'} | |
| {'loss': '3.067', 'grad_norm': '0.6255', 'learning_rate': '8.848e-06', 'epoch': '0.9241'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.68it/s] | |
| {'loss': '3.07', 'grad_norm': '0.658', 'learning_rate': '8.697e-06', 'epoch': '0.9247'} | |
| {'loss': '3.074', 'grad_norm': '0.6355', 'learning_rate': '8.547e-06', 'epoch': '0.9254'} | |
| {'loss': '3.064', 'grad_norm': '0.639', 'learning_rate': '8.398e-06', 'epoch': '0.926'} | |
| {'loss': '3.076', 'grad_norm': '0.6207', 'learning_rate': '8.251e-06', 'epoch': '0.9267'} | |
| {'loss': '3.066', 'grad_norm': '0.6397', 'learning_rate': '8.104e-06', 'epoch': '0.9273'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.74it/s] | |
| {'loss': '3.075', 'grad_norm': '0.6619', 'learning_rate': '7.96e-06', 'epoch': '0.928'} | |
| {'loss': '3.072', 'grad_norm': '0.6326', 'learning_rate': '7.816e-06', 'epoch': '0.9286'} | |
| {'loss': '3.073', 'grad_norm': '0.6189', 'learning_rate': '7.674e-06', 'epoch': '0.9293'} | |
| {'loss': '3.071', 'grad_norm': '0.6381', 'learning_rate': '7.533e-06', 'epoch': '0.93'} | |
| {'loss': '3.074', 'grad_norm': '0.6502', 'learning_rate': '7.393e-06', 'epoch': '0.9306'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.69it/s] | |
| {'loss': '3.072', 'grad_norm': '0.6347', 'learning_rate': '7.255e-06', 'epoch': '0.9313'} | |
| {'loss': '3.074', 'grad_norm': '0.6605', 'learning_rate': '7.117e-06', 'epoch': '0.9319'} | |
| {'loss': '3.067', 'grad_norm': '0.6154', 'learning_rate': '6.982e-06', 'epoch': '0.9326'} | |
| {'loss': '3.07', 'grad_norm': '0.6206', 'learning_rate': '6.847e-06', 'epoch': '0.9332'} | |
| {'loss': '3.079', 'grad_norm': '0.6341', 'learning_rate': '6.714e-06', 'epoch': '0.9339'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.44it/s] | |
| {'loss': '3.07', 'grad_norm': '0.6226', 'learning_rate': '6.582e-06', 'epoch': '0.9345'} | |
| {'loss': '3.075', 'grad_norm': '0.6262', 'learning_rate': '6.451e-06', 'epoch': '0.9352'} | |
| {'loss': '3.069', 'grad_norm': '0.6522', 'learning_rate': '6.322e-06', 'epoch': '0.9359'} | |
| {'loss': '3.072', 'grad_norm': '0.6267', 'learning_rate': '6.194e-06', 'epoch': '0.9365'} | |
| {'loss': '3.067', 'grad_norm': '0.6446', 'learning_rate': '6.067e-06', 'epoch': '0.9372'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.72it/s] | |
| {'loss': '3.065', 'grad_norm': '0.6418', 'learning_rate': '5.942e-06', 'epoch': '0.9378'} | |
| {'loss': '3.063', 'grad_norm': '0.642', 'learning_rate': '5.817e-06', 'epoch': '0.9385'} | |
| {'loss': '3.073', 'grad_norm': '0.6267', 'learning_rate': '5.695e-06', 'epoch': '0.9391'} | |
| {'loss': '3.072', 'grad_norm': '0.6296', 'learning_rate': '5.573e-06', 'epoch': '0.9398'} | |
| {'loss': '3.071', 'grad_norm': '0.642', 'learning_rate': '5.453e-06', 'epoch': '0.9404'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.25it/s] | |
| {'loss': '3.073', 'grad_norm': '0.6269', 'learning_rate': '5.334e-06', 'epoch': '0.9411'} | |
| {'loss': '3.066', 'grad_norm': '0.6304', 'learning_rate': '5.216e-06', 'epoch': '0.9418'} | |
| {'loss': '3.073', 'grad_norm': '0.6287', 'learning_rate': '5.1e-06', 'epoch': '0.9424'} | |
| {'loss': '3.068', 'grad_norm': '0.6149', 'learning_rate': '4.985e-06', 'epoch': '0.9431'} | |
| {'loss': '3.067', 'grad_norm': '0.6215', 'learning_rate': '4.871e-06', 'epoch': '0.9437'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.65it/s] | |
| {'loss': '3.071', 'grad_norm': '0.6201', 'learning_rate': '4.758e-06', 'epoch': '0.9444'} | |
| {'loss': '3.067', 'grad_norm': '0.649', 'learning_rate': '4.647e-06', 'epoch': '0.945'} | |
| {'loss': '3.067', 'grad_norm': '0.6585', 'learning_rate': '4.537e-06', 'epoch': '0.9457'} | |
| {'loss': '3.072', 'grad_norm': '0.6165', 'learning_rate': '4.429e-06', 'epoch': '0.9463'} | |
| {'loss': '3.071', 'grad_norm': '0.6314', 'learning_rate': '4.322e-06', 'epoch': '0.947'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.75it/s] | |
| {'loss': '3.069', 'grad_norm': '0.6282', 'learning_rate': '4.216e-06', 'epoch': '0.9476'} | |
| {'loss': '3.066', 'grad_norm': '0.6435', 'learning_rate': '4.111e-06', 'epoch': '0.9483'} | |
| {'loss': '3.067', 'grad_norm': '0.6286', 'learning_rate': '4.008e-06', 'epoch': '0.949'} | |
| {'loss': '3.074', 'grad_norm': '0.6373', 'learning_rate': '3.906e-06', 'epoch': '0.9496'} | |
| {'loss': '3.068', 'grad_norm': '0.6283', 'learning_rate': '3.805e-06', 'epoch': '0.9503'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.93it/s] | |
| {'loss': '3.07', 'grad_norm': '0.6298', 'learning_rate': '3.706e-06', 'epoch': '0.9509'} | |
| {'loss': '3.069', 'grad_norm': '0.6145', 'learning_rate': '3.607e-06', 'epoch': '0.9516'} | |
| {'loss': '3.064', 'grad_norm': '0.6296', 'learning_rate': '3.511e-06', 'epoch': '0.9522'} | |
| {'loss': '3.064', 'grad_norm': '0.6225', 'learning_rate': '3.415e-06', 'epoch': '0.9529'} | |
| {'loss': '3.068', 'grad_norm': '0.6201', 'learning_rate': '3.321e-06', 'epoch': '0.9535'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.93it/s] | |
| {'loss': '3.067', 'grad_norm': '0.6384', 'learning_rate': '3.228e-06', 'epoch': '0.9542'} | |
| {'loss': '3.068', 'grad_norm': '0.623', 'learning_rate': '3.137e-06', 'epoch': '0.9549'} | |
| {'loss': '3.071', 'grad_norm': '0.639', 'learning_rate': '3.046e-06', 'epoch': '0.9555'} | |
| {'loss': '3.06', 'grad_norm': '0.639', 'learning_rate': '2.957e-06', 'epoch': '0.9562'} | |
| {'loss': '3.072', 'grad_norm': '0.6321', 'learning_rate': '2.87e-06', 'epoch': '0.9568'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.49it/s] | |
| {'loss': '3.074', 'grad_norm': '0.6452', 'learning_rate': '2.783e-06', 'epoch': '0.9575'} | |
| {'loss': '3.067', 'grad_norm': '0.6178', 'learning_rate': '2.698e-06', 'epoch': '0.9581'} | |
| {'loss': '3.068', 'grad_norm': '0.6036', 'learning_rate': '2.615e-06', 'epoch': '0.9588'} | |
| {'loss': '3.071', 'grad_norm': '0.6093', 'learning_rate': '2.532e-06', 'epoch': '0.9594'} | |
| {'loss': '3.067', 'grad_norm': '0.6378', 'learning_rate': '2.451e-06', 'epoch': '0.9601'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.66it/s] | |
| {'loss': '3.067', 'grad_norm': '0.6776', 'learning_rate': '2.372e-06', 'epoch': '0.9608'} | |
| {'loss': '3.066', 'grad_norm': '0.6275', 'learning_rate': '2.293e-06', 'epoch': '0.9614'} | |
| {'loss': '3.072', 'grad_norm': '0.6393', 'learning_rate': '2.216e-06', 'epoch': '0.9621'} | |
| {'loss': '3.07', 'grad_norm': '0.6164', 'learning_rate': '2.14e-06', 'epoch': '0.9627'} | |
| {'loss': '3.067', 'grad_norm': '0.6251', 'learning_rate': '2.066e-06', 'epoch': '0.9634'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.52it/s] | |
| {'loss': '3.064', 'grad_norm': '0.6521', 'learning_rate': '1.993e-06', 'epoch': '0.964'} | |
| {'loss': '3.071', 'grad_norm': '0.6272', 'learning_rate': '1.921e-06', 'epoch': '0.9647'} | |
| {'loss': '3.07', 'grad_norm': '0.602', 'learning_rate': '1.85e-06', 'epoch': '0.9653'} | |
| {'loss': '3.074', 'grad_norm': '0.6393', 'learning_rate': '1.781e-06', 'epoch': '0.966'} | |
| {'loss': '3.075', 'grad_norm': '0.6397', 'learning_rate': '1.713e-06', 'epoch': '0.9667'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.51it/s] | |
| {'loss': '3.071', 'grad_norm': '0.618', 'learning_rate': '1.646e-06', 'epoch': '0.9673'} | |
| {'loss': '3.066', 'grad_norm': '0.6098', 'learning_rate': '1.581e-06', 'epoch': '0.968'} | |
| {'loss': '3.069', 'grad_norm': '0.6137', 'learning_rate': '1.517e-06', 'epoch': '0.9686'} | |
| {'loss': '3.065', 'grad_norm': '0.6376', 'learning_rate': '1.454e-06', 'epoch': '0.9693'} | |
| {'loss': '3.069', 'grad_norm': '0.6404', 'learning_rate': '1.393e-06', 'epoch': '0.9699'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.27it/s] | |
| {'loss': '3.064', 'grad_norm': '0.6353', 'learning_rate': '1.333e-06', 'epoch': '0.9706'} | |
| {'loss': '3.066', 'grad_norm': '0.6159', 'learning_rate': '1.274e-06', 'epoch': '0.9712'} | |
| {'loss': '3.068', 'grad_norm': '0.619', 'learning_rate': '1.217e-06', 'epoch': '0.9719'} | |
| {'loss': '3.071', 'grad_norm': '0.6164', 'learning_rate': '1.161e-06', 'epoch': '0.9726'} | |
| {'loss': '3.072', 'grad_norm': '0.6158', 'learning_rate': '1.106e-06', 'epoch': '0.9732'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.81it/s] | |
| {'loss': '3.06', 'grad_norm': '0.6324', 'learning_rate': '1.053e-06', 'epoch': '0.9739'} | |
| {'loss': '3.067', 'grad_norm': '0.6276', 'learning_rate': '1.001e-06', 'epoch': '0.9745'} | |
| {'loss': '3.067', 'grad_norm': '0.6294', 'learning_rate': '9.5e-07', 'epoch': '0.9752'} | |
| {'loss': '3.071', 'grad_norm': '0.6368', 'learning_rate': '9.005e-07', 'epoch': '0.9758'} | |
| {'loss': '3.061', 'grad_norm': '0.6432', 'learning_rate': '8.524e-07', 'epoch': '0.9765'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.73it/s] | |
| {'loss': '3.066', 'grad_norm': '0.6291', 'learning_rate': '8.056e-07', 'epoch': '0.9771'} | |
| {'loss': '3.062', 'grad_norm': '0.6361', 'learning_rate': '7.601e-07', 'epoch': '0.9778'} | |
| {'loss': '3.061', 'grad_norm': '0.6068', 'learning_rate': '7.159e-07', 'epoch': '0.9785'} | |
| {'loss': '3.063', 'grad_norm': '0.6391', 'learning_rate': '6.73e-07', 'epoch': '0.9791'} | |
| {'loss': '3.077', 'grad_norm': '0.6224', 'learning_rate': '6.315e-07', 'epoch': '0.9798'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 2.92it/s] | |
| {'loss': '3.067', 'grad_norm': '0.6116', 'learning_rate': '5.913e-07', 'epoch': '0.9804'} | |
| {'loss': '3.065', 'grad_norm': '0.6334', 'learning_rate': '5.524e-07', 'epoch': '0.9811'} | |
| {'loss': '3.072', 'grad_norm': '0.6061', 'learning_rate': '5.148e-07', 'epoch': '0.9817'} | |
| {'loss': '3.07', 'grad_norm': '0.643', 'learning_rate': '4.786e-07', 'epoch': '0.9824'} | |
| {'loss': '3.065', 'grad_norm': '0.6425', 'learning_rate': '4.437e-07', 'epoch': '0.983'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.65it/s] | |
| {'loss': '3.068', 'grad_norm': '0.6226', 'learning_rate': '4.101e-07', 'epoch': '0.9837'} | |
| {'loss': '3.065', 'grad_norm': '0.6083', 'learning_rate': '3.778e-07', 'epoch': '0.9844'} | |
| {'loss': '3.067', 'grad_norm': '0.6412', 'learning_rate': '3.468e-07', 'epoch': '0.985'} | |
| {'loss': '3.069', 'grad_norm': '0.6097', 'learning_rate': '3.172e-07', 'epoch': '0.9857'} | |
| {'loss': '3.061', 'grad_norm': '0.6264', 'learning_rate': '2.889e-07', 'epoch': '0.9863'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.04it/s] | |
| {'loss': '3.073', 'grad_norm': '0.6309', 'learning_rate': '2.619e-07', 'epoch': '0.987'} | |
| {'loss': '3.068', 'grad_norm': '0.6195', 'learning_rate': '2.362e-07', 'epoch': '0.9876'} | |
| {'loss': '3.066', 'grad_norm': '0.6339', 'learning_rate': '2.119e-07', 'epoch': '0.9883'} | |
| {'loss': '3.068', 'grad_norm': '0.6069', 'learning_rate': '1.888e-07', 'epoch': '0.9889'} | |
| {'loss': '3.069', 'grad_norm': '0.6255', 'learning_rate': '1.672e-07', 'epoch': '0.9896'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 3.28it/s] | |
| {'loss': '3.074', 'grad_norm': '0.6384', 'learning_rate': '1.468e-07', 'epoch': '0.9902'} | |
| {'loss': '3.075', 'grad_norm': '0.6445', 'learning_rate': '1.277e-07', 'epoch': '0.9909'} | |
| {'loss': '3.067', 'grad_norm': '0.6542', 'learning_rate': '1.1e-07', 'epoch': '0.9916'} | |
| {'loss': '3.068', 'grad_norm': '0.6511', 'learning_rate': '9.359e-08', 'epoch': '0.9922'} | |
| {'loss': '3.068', 'grad_norm': '0.626', 'learning_rate': '7.851e-08', 'epoch': '0.9929'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 4.03it/s] | |
| {'loss': '3.064', 'grad_norm': '0.637', 'learning_rate': '6.476e-08', 'epoch': '0.9935'} | |
| {'loss': '3.062', 'grad_norm': '0.6444', 'learning_rate': '5.232e-08', 'epoch': '0.9942'} | |
| {'loss': '3.064', 'grad_norm': '0.626', 'learning_rate': '4.121e-08', 'epoch': '0.9948'} | |
| {'loss': '3.072', 'grad_norm': '0.6168', 'learning_rate': '3.143e-08', 'epoch': '0.9955'} | |
| {'loss': '3.066', 'grad_norm': '0.6374', 'learning_rate': '2.297e-08', 'epoch': '0.9961'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.86it/s] | |
| {'loss': '3.066', 'grad_norm': '0.6623', 'learning_rate': '1.583e-08', 'epoch': '0.9968'} | |
| {'loss': '3.067', 'grad_norm': '0.6331', 'learning_rate': '1.002e-08', 'epoch': '0.9975'} | |
| {'loss': '3.07', 'grad_norm': '0.6147', 'learning_rate': '5.53e-09', 'epoch': '0.9981'} | |
| {'loss': '3.069', 'grad_norm': '0.6511', 'learning_rate': '2.365e-09', 'epoch': '0.9988'} | |
| {'loss': '3.072', 'grad_norm': '0.6274', 'learning_rate': '5.244e-10', 'epoch': '0.9994'} | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.92it/s] | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 6.04it/s] | |
| {'train_runtime': '2.965e+05', 'train_samples_per_second': '65.86', 'train_steps_per_second': '0.515', 'train_loss': '3.259', 'epoch': '1'} | |
| 100%|██████████| 152588/152588 [82:22:20<00:00, 1.94s/it] | |
| Writing model shards: 100%|██████████| 1/1 [00:00<00:00, 5.65it/s] | |
| [*] Training finished. | |
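
The final summary line can be cross-checked against the progress bar with a little arithmetic. The sketch below uses only numbers copied from the log above; the variable names are illustrative, and the interpretation of samples per step as an effective batch size is an inference, not something the log states.

```python
# Values copied from the final summary line and progress bar of the log.
train_runtime_s = 2.965e5   # 'train_runtime'
samples_per_s = 65.86       # 'train_samples_per_second'
steps_per_s = 0.515         # 'train_steps_per_second'
total_steps = 152588        # from the progress bar

# Wall-clock time in hours (the progress bar reports 82:22:20).
hours = train_runtime_s / 3600

# Average seconds per optimizer step (the progress bar reports 1.94s/it).
sec_per_step = train_runtime_s / total_steps

# Samples consumed per step; this is plausibly the effective batch size,
# but that reading is an assumption.
samples_per_step = samples_per_s / steps_per_s

# Approximate number of training samples seen in the single epoch.
approx_total_samples = samples_per_s * train_runtime_s

print(f"runtime: {hours:.2f} h")
print(f"seconds per step: {sec_per_step:.2f}")
print(f"samples per step: {samples_per_step:.1f}")
print(f"approx. samples seen: {approx_total_samples:,.0f}")
```

The figures agree with each other to within rounding: about 82.4 hours, about 1.94 seconds per step, and roughly 128 samples per step.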