Text Generation
Transformers
Safetensors
llama
Generated from Trainer
trl
sft
conversational
text-generation-inference
Instructions to use cfei621/OlympicCoder-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use cfei621/OlympicCoder-32B with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="cfei621/OlympicCoder-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("cfei621/OlympicCoder-32B")
model = AutoModelForCausalLM.from_pretrained("cfei621/OlympicCoder-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use cfei621/OlympicCoder-32B with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cfei621/OlympicCoder-32B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cfei621/OlympicCoder-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/cfei621/OlympicCoder-32B
- SGLang
How to use cfei621/OlympicCoder-32B with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "cfei621/OlympicCoder-32B" \
  --host 0.0.0.0 \
  --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cfei621/OlympicCoder-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "cfei621/OlympicCoder-32B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "cfei621/OlympicCoder-32B",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use cfei621/OlympicCoder-32B with Docker Model Runner:
docker model run hf.co/cfei621/OlympicCoder-32B
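The trainer state that follows records a `learning_rate` for every step, and those numbers appear to follow a recognizable pattern: a linear ramp that reaches 4e-05 at step 51, then a slow cosine-shaped decay. The sketch below is one reading of that pattern, not something the log states. It assumes a 51-step warmup, a total of 1700 steps (taken from `global_step`), and a floor at 10% of the peak rate; the constant names and the `expected_lr` helper are hypothetical, and all four constants are inferred from the logged values rather than read from a training config.

```python
import math

# All four constants are inferred from the logged values, not read from a config.
PEAK_LR = 4e-5        # highest logged learning_rate (reached at step 51)
WARMUP_STEPS = 51     # assumed length of the linear warmup
TOTAL_STEPS = 1700    # "global_step" in the trainer state
MIN_LR_RATIO = 0.1    # assumed floor, as a fraction of the peak


def expected_lr(step: int) -> float:
    """Learning rate the assumed schedule would log at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to MIN_LR_RATIO * PEAK_LR.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


# Values copied from "log_history" in the trainer state.
logged = {
    1: 7.843137254901962e-07,
    51: 4e-05,
    52: 3.999996733363487e-05,
    53: 3.9999869334658026e-05,
    137: 3.975893947741989e-05,
}

for step, lr in logged.items():
    matches = math.isclose(lr, expected_lr(step), rel_tol=1e-9)
    print(f"step {step:>4}: logged={lr:.12e} sketch={expected_lr(step):.12e} match={matches}")
```

A plain cosine decay to zero would drop slightly faster after step 51 than the logged values do, which is why the sketch includes a nonzero floor. If the real run used different settings, the comparison in the loop is where the mismatch would show up.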
| { | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 9.944281524926687, | |
| "eval_steps": 500, | |
| "global_step": 1700, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.005865102639296188, | |
| "grad_norm": 53.88819372213907, | |
| "learning_rate": 7.843137254901962e-07, | |
| "loss": 2.5558, | |
| "mean_token_accuracy": 0.4949319437146187, | |
| "step": 1 | |
| }, | |
| { | |
| "epoch": 0.011730205278592375, | |
| "grad_norm": 45.53813500273709, | |
| "learning_rate": 1.5686274509803923e-06, | |
| "loss": 2.6383, | |
| "mean_token_accuracy": 0.4859154038131237, | |
| "step": 2 | |
| }, | |
| { | |
| "epoch": 0.017595307917888565, | |
| "grad_norm": 46.07293762408888, | |
| "learning_rate": 2.3529411764705885e-06, | |
| "loss": 2.5158, | |
| "mean_token_accuracy": 0.48826197162270546, | |
| "step": 3 | |
| }, | |
| { | |
| "epoch": 0.02346041055718475, | |
| "grad_norm": 21.731930798046225, | |
| "learning_rate": 3.1372549019607846e-06, | |
| "loss": 2.4153, | |
| "mean_token_accuracy": 0.507574986666441, | |
| "step": 4 | |
| }, | |
| { | |
| "epoch": 0.02932551319648094, | |
| "grad_norm": 13.719250081157423, | |
| "learning_rate": 3.92156862745098e-06, | |
| "loss": 2.1828, | |
| "mean_token_accuracy": 0.5369513519108295, | |
| "step": 5 | |
| }, | |
| { | |
| "epoch": 0.03519061583577713, | |
| "grad_norm": 11.296846771241944, | |
| "learning_rate": 4.705882352941177e-06, | |
| "loss": 2.0254, | |
| "mean_token_accuracy": 0.5546183437108994, | |
| "step": 6 | |
| }, | |
| { | |
| "epoch": 0.04105571847507331, | |
| "grad_norm": 11.181546197747945, | |
| "learning_rate": 5.4901960784313735e-06, | |
| "loss": 2.0634, | |
| "mean_token_accuracy": 0.5418906435370445, | |
| "step": 7 | |
| }, | |
| { | |
| "epoch": 0.0469208211143695, | |
| "grad_norm": 9.95073085746692, | |
| "learning_rate": 6.274509803921569e-06, | |
| "loss": 1.909, | |
| "mean_token_accuracy": 0.5698892250657082, | |
| "step": 8 | |
| }, | |
| { | |
| "epoch": 0.05278592375366569, | |
| "grad_norm": 8.750145661654372, | |
| "learning_rate": 7.058823529411766e-06, | |
| "loss": 1.7282, | |
| "mean_token_accuracy": 0.5925135612487793, | |
| "step": 9 | |
| }, | |
| { | |
| "epoch": 0.05865102639296188, | |
| "grad_norm": 6.881468868221019, | |
| "learning_rate": 7.84313725490196e-06, | |
| "loss": 1.6569, | |
| "mean_token_accuracy": 0.6076302081346512, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.06451612903225806, | |
| "grad_norm": 8.253095142047671, | |
| "learning_rate": 8.627450980392157e-06, | |
| "loss": 1.5307, | |
| "mean_token_accuracy": 0.6325564980506897, | |
| "step": 11 | |
| }, | |
| { | |
| "epoch": 0.07038123167155426, | |
| "grad_norm": 7.4261225290773485, | |
| "learning_rate": 9.411764705882354e-06, | |
| "loss": 1.4483, | |
| "mean_token_accuracy": 0.6526486948132515, | |
| "step": 12 | |
| }, | |
| { | |
| "epoch": 0.07624633431085044, | |
| "grad_norm": 6.779863395778969, | |
| "learning_rate": 1.0196078431372549e-05, | |
| "loss": 1.4433, | |
| "mean_token_accuracy": 0.6399537846446037, | |
| "step": 13 | |
| }, | |
| { | |
| "epoch": 0.08211143695014662, | |
| "grad_norm": 6.268466773477867, | |
| "learning_rate": 1.0980392156862747e-05, | |
| "loss": 1.5392, | |
| "mean_token_accuracy": 0.6327727138996124, | |
| "step": 14 | |
| }, | |
| { | |
| "epoch": 0.08797653958944282, | |
| "grad_norm": 6.160706527674582, | |
| "learning_rate": 1.1764705882352942e-05, | |
| "loss": 1.4847, | |
| "mean_token_accuracy": 0.6257975250482559, | |
| "step": 15 | |
| }, | |
| { | |
| "epoch": 0.093841642228739, | |
| "grad_norm": 5.222490978742135, | |
| "learning_rate": 1.2549019607843138e-05, | |
| "loss": 1.3997, | |
| "mean_token_accuracy": 0.6548606380820274, | |
| "step": 16 | |
| }, | |
| { | |
| "epoch": 0.09970674486803519, | |
| "grad_norm": 4.975143902167062, | |
| "learning_rate": 1.3333333333333333e-05, | |
| "loss": 1.2949, | |
| "mean_token_accuracy": 0.6746953576803207, | |
| "step": 17 | |
| }, | |
| { | |
| "epoch": 0.10557184750733138, | |
| "grad_norm": 5.673987761988572, | |
| "learning_rate": 1.4117647058823532e-05, | |
| "loss": 1.2628, | |
| "mean_token_accuracy": 0.6803952604532242, | |
| "step": 18 | |
| }, | |
| { | |
| "epoch": 0.11143695014662756, | |
| "grad_norm": 4.956998983619939, | |
| "learning_rate": 1.4901960784313726e-05, | |
| "loss": 1.4657, | |
| "mean_token_accuracy": 0.6383700221776962, | |
| "step": 19 | |
| }, | |
| { | |
| "epoch": 0.11730205278592376, | |
| "grad_norm": 4.6674469767268665, | |
| "learning_rate": 1.568627450980392e-05, | |
| "loss": 1.3336, | |
| "mean_token_accuracy": 0.6622527092695236, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.12316715542521994, | |
| "grad_norm": 4.56146568749882, | |
| "learning_rate": 1.647058823529412e-05, | |
| "loss": 1.2586, | |
| "mean_token_accuracy": 0.6755763664841652, | |
| "step": 21 | |
| }, | |
| { | |
| "epoch": 0.12903225806451613, | |
| "grad_norm": 4.178539403683921, | |
| "learning_rate": 1.7254901960784314e-05, | |
| "loss": 1.2955, | |
| "mean_token_accuracy": 0.6781341508030891, | |
| "step": 22 | |
| }, | |
| { | |
| "epoch": 0.1348973607038123, | |
| "grad_norm": 4.691795666447551, | |
| "learning_rate": 1.8039215686274513e-05, | |
| "loss": 1.2063, | |
| "mean_token_accuracy": 0.7020066231489182, | |
| "step": 23 | |
| }, | |
| { | |
| "epoch": 0.14076246334310852, | |
| "grad_norm": 4.393626747850234, | |
| "learning_rate": 1.8823529411764708e-05, | |
| "loss": 1.2492, | |
| "mean_token_accuracy": 0.6896483823657036, | |
| "step": 24 | |
| }, | |
| { | |
| "epoch": 0.1466275659824047, | |
| "grad_norm": 4.600031146460728, | |
| "learning_rate": 1.9607843137254903e-05, | |
| "loss": 1.2452, | |
| "mean_token_accuracy": 0.6773535162210464, | |
| "step": 25 | |
| }, | |
| { | |
| "epoch": 0.15249266862170088, | |
| "grad_norm": 4.750027023339882, | |
| "learning_rate": 2.0392156862745097e-05, | |
| "loss": 1.3096, | |
| "mean_token_accuracy": 0.6744465455412865, | |
| "step": 26 | |
| }, | |
| { | |
| "epoch": 0.15835777126099707, | |
| "grad_norm": 5.016234982836503, | |
| "learning_rate": 2.1176470588235296e-05, | |
| "loss": 1.2253, | |
| "mean_token_accuracy": 0.6968231871724129, | |
| "step": 27 | |
| }, | |
| { | |
| "epoch": 0.16422287390029325, | |
| "grad_norm": 4.1501723534181165, | |
| "learning_rate": 2.1960784313725494e-05, | |
| "loss": 1.0885, | |
| "mean_token_accuracy": 0.7159736528992653, | |
| "step": 28 | |
| }, | |
| { | |
| "epoch": 0.17008797653958943, | |
| "grad_norm": 5.027518185476307, | |
| "learning_rate": 2.274509803921569e-05, | |
| "loss": 1.1875, | |
| "mean_token_accuracy": 0.7006409764289856, | |
| "step": 29 | |
| }, | |
| { | |
| "epoch": 0.17595307917888564, | |
| "grad_norm": 4.695128131871855, | |
| "learning_rate": 2.3529411764705884e-05, | |
| "loss": 1.2233, | |
| "mean_token_accuracy": 0.6893411129713058, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.18181818181818182, | |
| "grad_norm": 4.444186099052533, | |
| "learning_rate": 2.431372549019608e-05, | |
| "loss": 1.3716, | |
| "mean_token_accuracy": 0.6714634746313095, | |
| "step": 31 | |
| }, | |
| { | |
| "epoch": 0.187683284457478, | |
| "grad_norm": 4.594781832701057, | |
| "learning_rate": 2.5098039215686277e-05, | |
| "loss": 1.0986, | |
| "mean_token_accuracy": 0.7190676406025887, | |
| "step": 32 | |
| }, | |
| { | |
| "epoch": 0.1935483870967742, | |
| "grad_norm": 5.074679164350963, | |
| "learning_rate": 2.5882352941176475e-05, | |
| "loss": 1.2351, | |
| "mean_token_accuracy": 0.6866041272878647, | |
| "step": 33 | |
| }, | |
| { | |
| "epoch": 0.19941348973607037, | |
| "grad_norm": 3.9217364207838075, | |
| "learning_rate": 2.6666666666666667e-05, | |
| "loss": 1.0962, | |
| "mean_token_accuracy": 0.7126260623335838, | |
| "step": 34 | |
| }, | |
| { | |
| "epoch": 0.20527859237536658, | |
| "grad_norm": 4.24349529061146, | |
| "learning_rate": 2.7450980392156865e-05, | |
| "loss": 1.1477, | |
| "mean_token_accuracy": 0.7069110572338104, | |
| "step": 35 | |
| }, | |
| { | |
| "epoch": 0.21114369501466276, | |
| "grad_norm": 4.236355982392962, | |
| "learning_rate": 2.8235294117647063e-05, | |
| "loss": 1.1657, | |
| "mean_token_accuracy": 0.7083533778786659, | |
| "step": 36 | |
| }, | |
| { | |
| "epoch": 0.21700879765395895, | |
| "grad_norm": 4.056198751455204, | |
| "learning_rate": 2.9019607843137258e-05, | |
| "loss": 1.104, | |
| "mean_token_accuracy": 0.7193247005343437, | |
| "step": 37 | |
| }, | |
| { | |
| "epoch": 0.22287390029325513, | |
| "grad_norm": 3.952005738151635, | |
| "learning_rate": 2.9803921568627453e-05, | |
| "loss": 1.0282, | |
| "mean_token_accuracy": 0.7227280139923096, | |
| "step": 38 | |
| }, | |
| { | |
| "epoch": 0.2287390029325513, | |
| "grad_norm": 4.268826077737683, | |
| "learning_rate": 3.0588235294117644e-05, | |
| "loss": 1.325, | |
| "mean_token_accuracy": 0.6643695458769798, | |
| "step": 39 | |
| }, | |
| { | |
| "epoch": 0.23460410557184752, | |
| "grad_norm": 3.8742320180768868, | |
| "learning_rate": 3.137254901960784e-05, | |
| "loss": 1.0486, | |
| "mean_token_accuracy": 0.7284644469618797, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.2404692082111437, | |
| "grad_norm": 3.5533315273234276, | |
| "learning_rate": 3.215686274509804e-05, | |
| "loss": 1.0222, | |
| "mean_token_accuracy": 0.7333771288394928, | |
| "step": 41 | |
| }, | |
| { | |
| "epoch": 0.24633431085043989, | |
| "grad_norm": 3.9010486506007456, | |
| "learning_rate": 3.294117647058824e-05, | |
| "loss": 1.0616, | |
| "mean_token_accuracy": 0.723753847181797, | |
| "step": 42 | |
| }, | |
| { | |
| "epoch": 0.25219941348973607, | |
| "grad_norm": 3.234591569823163, | |
| "learning_rate": 3.372549019607844e-05, | |
| "loss": 0.8581, | |
| "mean_token_accuracy": 0.7685456275939941, | |
| "step": 43 | |
| }, | |
| { | |
| "epoch": 0.25806451612903225, | |
| "grad_norm": 3.743981662725678, | |
| "learning_rate": 3.450980392156863e-05, | |
| "loss": 1.037, | |
| "mean_token_accuracy": 0.7245888039469719, | |
| "step": 44 | |
| }, | |
| { | |
| "epoch": 0.26392961876832843, | |
| "grad_norm": 3.7804639874968826, | |
| "learning_rate": 3.529411764705883e-05, | |
| "loss": 0.9217, | |
| "mean_token_accuracy": 0.7485020831227303, | |
| "step": 45 | |
| }, | |
| { | |
| "epoch": 0.2697947214076246, | |
| "grad_norm": 3.716939922880826, | |
| "learning_rate": 3.6078431372549025e-05, | |
| "loss": 1.0878, | |
| "mean_token_accuracy": 0.716446116566658, | |
| "step": 46 | |
| }, | |
| { | |
| "epoch": 0.2756598240469208, | |
| "grad_norm": 3.902425102382139, | |
| "learning_rate": 3.686274509803922e-05, | |
| "loss": 1.0478, | |
| "mean_token_accuracy": 0.7267983332276344, | |
| "step": 47 | |
| }, | |
| { | |
| "epoch": 0.28152492668621704, | |
| "grad_norm": 3.4333368458738134, | |
| "learning_rate": 3.7647058823529415e-05, | |
| "loss": 0.9818, | |
| "mean_token_accuracy": 0.7405076771974564, | |
| "step": 48 | |
| }, | |
| { | |
| "epoch": 0.2873900293255132, | |
| "grad_norm": 3.88647483745419, | |
| "learning_rate": 3.8431372549019614e-05, | |
| "loss": 0.9826, | |
| "mean_token_accuracy": 0.766911968588829, | |
| "step": 49 | |
| }, | |
| { | |
| "epoch": 0.2932551319648094, | |
| "grad_norm": 3.98665632736657, | |
| "learning_rate": 3.9215686274509805e-05, | |
| "loss": 1.2055, | |
| "mean_token_accuracy": 0.7004791498184204, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.2991202346041056, | |
| "grad_norm": 3.2689866643155785, | |
| "learning_rate": 4e-05, | |
| "loss": 0.9299, | |
| "mean_token_accuracy": 0.7512985840439796, | |
| "step": 51 | |
| }, | |
| { | |
| "epoch": 0.30498533724340177, | |
| "grad_norm": 3.6566795475047242, | |
| "learning_rate": 3.999996733363487e-05, | |
| "loss": 1.055, | |
| "mean_token_accuracy": 0.7290665507316589, | |
| "step": 52 | |
| }, | |
| { | |
| "epoch": 0.31085043988269795, | |
| "grad_norm": 3.5319474505893376, | |
| "learning_rate": 3.9999869334658026e-05, | |
| "loss": 0.9811, | |
| "mean_token_accuracy": 0.7426001131534576, | |
| "step": 53 | |
| }, | |
| { | |
| "epoch": 0.31671554252199413, | |
| "grad_norm": 3.729947058428488, | |
| "learning_rate": 3.9999706003425177e-05, | |
| "loss": 1.0335, | |
| "mean_token_accuracy": 0.731833204627037, | |
| "step": 54 | |
| }, | |
| { | |
| "epoch": 0.3225806451612903, | |
| "grad_norm": 3.747352951199307, | |
| "learning_rate": 3.999947734052915e-05, | |
| "loss": 1.1685, | |
| "mean_token_accuracy": 0.7056599855422974, | |
| "step": 55 | |
| }, | |
| { | |
| "epoch": 0.3284457478005865, | |
| "grad_norm": 3.1603059667205113, | |
| "learning_rate": 3.999918334679989e-05, | |
| "loss": 1.0283, | |
| "mean_token_accuracy": 0.7293788716197014, | |
| "step": 56 | |
| }, | |
| { | |
| "epoch": 0.3343108504398827, | |
| "grad_norm": 3.5593596872927393, | |
| "learning_rate": 3.999882402330448e-05, | |
| "loss": 0.9824, | |
| "mean_token_accuracy": 0.7320685908198357, | |
| "step": 57 | |
| }, | |
| { | |
| "epoch": 0.34017595307917886, | |
| "grad_norm": 3.3758708442048717, | |
| "learning_rate": 3.999839937134712e-05, | |
| "loss": 0.9158, | |
| "mean_token_accuracy": 0.7505558356642723, | |
| "step": 58 | |
| }, | |
| { | |
| "epoch": 0.3460410557184751, | |
| "grad_norm": 3.7326151053945926, | |
| "learning_rate": 3.999790939246912e-05, | |
| "loss": 1.1747, | |
| "mean_token_accuracy": 0.7094109430909157, | |
| "step": 59 | |
| }, | |
| { | |
| "epoch": 0.3519061583577713, | |
| "grad_norm": 3.528279867739533, | |
| "learning_rate": 3.999735408844892e-05, | |
| "loss": 0.9405, | |
| "mean_token_accuracy": 0.7575809136033058, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.35777126099706746, | |
| "grad_norm": 3.3839518740419026, | |
| "learning_rate": 3.999673346130203e-05, | |
| "loss": 1.0557, | |
| "mean_token_accuracy": 0.7328348979353905, | |
| "step": 61 | |
| }, | |
| { | |
| "epoch": 0.36363636363636365, | |
| "grad_norm": 3.2633650772189298, | |
| "learning_rate": 3.999604751328109e-05, | |
| "loss": 0.8524, | |
| "mean_token_accuracy": 0.7810637131333351, | |
| "step": 62 | |
| }, | |
| { | |
| "epoch": 0.36950146627565983, | |
| "grad_norm": 3.1490026716573074, | |
| "learning_rate": 3.999529624687581e-05, | |
| "loss": 0.8142, | |
| "mean_token_accuracy": 0.7814765721559525, | |
| "step": 63 | |
| }, | |
| { | |
| "epoch": 0.375366568914956, | |
| "grad_norm": 3.2754413857067997, | |
| "learning_rate": 3.999447966481298e-05, | |
| "loss": 0.9898, | |
| "mean_token_accuracy": 0.7611383348703384, | |
| "step": 64 | |
| }, | |
| { | |
| "epoch": 0.3812316715542522, | |
| "grad_norm": 3.660506367731786, | |
| "learning_rate": 3.999359777005647e-05, | |
| "loss": 1.0594, | |
| "mean_token_accuracy": 0.7227808833122253, | |
| "step": 65 | |
| }, | |
| { | |
| "epoch": 0.3870967741935484, | |
| "grad_norm": 3.1676884856312792, | |
| "learning_rate": 3.999265056580719e-05, | |
| "loss": 0.7991, | |
| "mean_token_accuracy": 0.7763359025120735, | |
| "step": 66 | |
| }, | |
| { | |
| "epoch": 0.39296187683284456, | |
| "grad_norm": 4.029643282419232, | |
| "learning_rate": 3.999163805550313e-05, | |
| "loss": 1.1659, | |
| "mean_token_accuracy": 0.7266501784324646, | |
| "step": 67 | |
| }, | |
| { | |
| "epoch": 0.39882697947214074, | |
| "grad_norm": 3.7749079691303575, | |
| "learning_rate": 3.9990560242819274e-05, | |
| "loss": 1.0427, | |
| "mean_token_accuracy": 0.742008738219738, | |
| "step": 68 | |
| }, | |
| { | |
| "epoch": 0.4046920821114369, | |
| "grad_norm": 2.847418175540077, | |
| "learning_rate": 3.9989417131667647e-05, | |
| "loss": 0.8382, | |
| "mean_token_accuracy": 0.7917518392205238, | |
| "step": 69 | |
| }, | |
| { | |
| "epoch": 0.41055718475073316, | |
| "grad_norm": 3.000734305407189, | |
| "learning_rate": 3.9988208726197293e-05, | |
| "loss": 0.859, | |
| "mean_token_accuracy": 0.7706286013126373, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.41642228739002934, | |
| "grad_norm": 3.2494689736047166, | |
| "learning_rate": 3.998693503079423e-05, | |
| "loss": 1.0425, | |
| "mean_token_accuracy": 0.748763732612133, | |
| "step": 71 | |
| }, | |
| { | |
| "epoch": 0.4222873900293255, | |
| "grad_norm": 3.3185644568863504, | |
| "learning_rate": 3.998559605008146e-05, | |
| "loss": 0.911, | |
| "mean_token_accuracy": 0.7488791346549988, | |
| "step": 72 | |
| }, | |
| { | |
| "epoch": 0.4281524926686217, | |
| "grad_norm": 3.308636181185568, | |
| "learning_rate": 3.9984191788918936e-05, | |
| "loss": 0.9626, | |
| "mean_token_accuracy": 0.7521362155675888, | |
| "step": 73 | |
| }, | |
| { | |
| "epoch": 0.4340175953079179, | |
| "grad_norm": 3.0388816030710157, | |
| "learning_rate": 3.998272225240356e-05, | |
| "loss": 1.0393, | |
| "mean_token_accuracy": 0.7457354441285133, | |
| "step": 74 | |
| }, | |
| { | |
| "epoch": 0.4398826979472141, | |
| "grad_norm": 2.9938798095118306, | |
| "learning_rate": 3.9981187445869165e-05, | |
| "loss": 0.8992, | |
| "mean_token_accuracy": 0.7812586054205894, | |
| "step": 75 | |
| }, | |
| { | |
| "epoch": 0.44574780058651026, | |
| "grad_norm": 2.8252620826574075, | |
| "learning_rate": 3.9979587374886466e-05, | |
| "loss": 0.9967, | |
| "mean_token_accuracy": 0.7389231100678444, | |
| "step": 76 | |
| }, | |
| { | |
| "epoch": 0.45161290322580644, | |
| "grad_norm": 3.186265590142511, | |
| "learning_rate": 3.997792204526309e-05, | |
| "loss": 0.9009, | |
| "mean_token_accuracy": 0.7611340954899788, | |
| "step": 77 | |
| }, | |
| { | |
| "epoch": 0.4574780058651026, | |
| "grad_norm": 2.7414294876337193, | |
| "learning_rate": 3.99761914630435e-05, | |
| "loss": 0.8311, | |
| "mean_token_accuracy": 0.7738662958145142, | |
| "step": 78 | |
| }, | |
| { | |
| "epoch": 0.4633431085043988, | |
| "grad_norm": 2.7941315506109317, | |
| "learning_rate": 3.997439563450901e-05, | |
| "loss": 0.795, | |
| "mean_token_accuracy": 0.7797789797186852, | |
| "step": 79 | |
| }, | |
| { | |
| "epoch": 0.46920821114369504, | |
| "grad_norm": 3.5423646785271306, | |
| "learning_rate": 3.997253456617775e-05, | |
| "loss": 0.8952, | |
| "mean_token_accuracy": 0.7717294245958328, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.4750733137829912, | |
| "grad_norm": 2.8793698731844946, | |
| "learning_rate": 3.997060826480465e-05, | |
| "loss": 0.7861, | |
| "mean_token_accuracy": 0.7887553870677948, | |
| "step": 81 | |
| }, | |
| { | |
| "epoch": 0.4809384164222874, | |
| "grad_norm": 2.8342850937430124, | |
| "learning_rate": 3.9968616737381414e-05, | |
| "loss": 0.9051, | |
| "mean_token_accuracy": 0.7746857702732086, | |
| "step": 82 | |
| }, | |
| { | |
| "epoch": 0.4868035190615836, | |
| "grad_norm": 2.765248067167797, | |
| "learning_rate": 3.996655999113647e-05, | |
| "loss": 0.7809, | |
| "mean_token_accuracy": 0.7994513288140297, | |
| "step": 83 | |
| }, | |
| { | |
| "epoch": 0.49266862170087977, | |
| "grad_norm": 2.63444146182952, | |
| "learning_rate": 3.9964438033534994e-05, | |
| "loss": 0.6653, | |
| "mean_token_accuracy": 0.8137254267930984, | |
| "step": 84 | |
| }, | |
| { | |
| "epoch": 0.49853372434017595, | |
| "grad_norm": 2.5758158024473894, | |
| "learning_rate": 3.996225087227881e-05, | |
| "loss": 0.8351, | |
| "mean_token_accuracy": 0.7905571460723877, | |
| "step": 85 | |
| }, | |
| { | |
| "epoch": 0.5043988269794721, | |
| "grad_norm": 2.679363954385569, | |
| "learning_rate": 3.995999851530645e-05, | |
| "loss": 0.7733, | |
| "mean_token_accuracy": 0.8147490695118904, | |
| "step": 86 | |
| }, | |
| { | |
| "epoch": 0.5102639296187683, | |
| "grad_norm": 3.101877229990958, | |
| "learning_rate": 3.995768097079305e-05, | |
| "loss": 0.8257, | |
| "mean_token_accuracy": 0.7912640422582626, | |
| "step": 87 | |
| }, | |
| { | |
| "epoch": 0.5161290322580645, | |
| "grad_norm": 3.2623148963738577, | |
| "learning_rate": 3.9955298247150365e-05, | |
| "loss": 0.9696, | |
| "mean_token_accuracy": 0.7420180812478065, | |
| "step": 88 | |
| }, | |
| { | |
| "epoch": 0.5219941348973607, | |
| "grad_norm": 2.771096820053822, | |
| "learning_rate": 3.9952850353026715e-05, | |
| "loss": 0.7977, | |
| "mean_token_accuracy": 0.7726879939436913, | |
| "step": 89 | |
| }, | |
| { | |
| "epoch": 0.5278592375366569, | |
| "grad_norm": 2.871934260005109, | |
| "learning_rate": 3.9950337297306976e-05, | |
| "loss": 0.8731, | |
| "mean_token_accuracy": 0.7779897972941399, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.533724340175953, | |
| "grad_norm": 3.2685752749904244, | |
| "learning_rate": 3.994775908911251e-05, | |
| "loss": 0.9586, | |
| "mean_token_accuracy": 0.7562056183815002, | |
| "step": 91 | |
| }, | |
| { | |
| "epoch": 0.5395894428152492, | |
| "grad_norm": 2.969500327515322, | |
| "learning_rate": 3.9945115737801183e-05, | |
| "loss": 0.81, | |
| "mean_token_accuracy": 0.7800324559211731, | |
| "step": 92 | |
| }, | |
| { | |
| "epoch": 0.5454545454545454, | |
| "grad_norm": 2.924171514995882, | |
| "learning_rate": 3.99424072529673e-05, | |
| "loss": 0.9543, | |
| "mean_token_accuracy": 0.7631434127688408, | |
| "step": 93 | |
| }, | |
| { | |
| "epoch": 0.5513196480938416, | |
| "grad_norm": 2.793490417192969, | |
| "learning_rate": 3.993963364444155e-05, | |
| "loss": 0.8121, | |
| "mean_token_accuracy": 0.7810578048229218, | |
| "step": 94 | |
| }, | |
| { | |
| "epoch": 0.5571847507331378, | |
| "grad_norm": 2.9166073057905835, | |
| "learning_rate": 3.9936794922291015e-05, | |
| "loss": 0.9339, | |
| "mean_token_accuracy": 0.7549960389733315, | |
| "step": 95 | |
| }, | |
| { | |
| "epoch": 0.5630498533724341, | |
| "grad_norm": 3.198161361235805, | |
| "learning_rate": 3.993389109681912e-05, | |
| "loss": 0.8918, | |
| "mean_token_accuracy": 0.7610742226243019, | |
| "step": 96 | |
| }, | |
| { | |
| "epoch": 0.5689149560117303, | |
| "grad_norm": 2.9220063831682896, | |
| "learning_rate": 3.993092217856557e-05, | |
| "loss": 0.8076, | |
| "mean_token_accuracy": 0.7827741503715515, | |
| "step": 97 | |
| }, | |
| { | |
| "epoch": 0.5747800586510264, | |
| "grad_norm": 3.27007124996685, | |
| "learning_rate": 3.9927888178306346e-05, | |
| "loss": 0.9276, | |
| "mean_token_accuracy": 0.7676652446389198, | |
| "step": 98 | |
| }, | |
| { | |
| "epoch": 0.5806451612903226, | |
| "grad_norm": 3.1316564998983196, | |
| "learning_rate": 3.992478910705364e-05, | |
| "loss": 0.9, | |
| "mean_token_accuracy": 0.7664341479539871, | |
| "step": 99 | |
| }, | |
| { | |
| "epoch": 0.5865102639296188, | |
| "grad_norm": 2.4604893204923655, | |
| "learning_rate": 3.992162497605583e-05, | |
| "loss": 0.7547, | |
| "mean_token_accuracy": 0.81553765386343, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.592375366568915, | |
| "grad_norm": 2.531774075426503, | |
| "learning_rate": 3.991839579679742e-05, | |
| "loss": 0.8143, | |
| "mean_token_accuracy": 0.7856726944446564, | |
| "step": 101 | |
| }, | |
| { | |
| "epoch": 0.5982404692082112, | |
| "grad_norm": 2.8565788729879573, | |
| "learning_rate": 3.991510158099905e-05, | |
| "loss": 0.645, | |
| "mean_token_accuracy": 0.8300929889082909, | |
| "step": 102 | |
| }, | |
| { | |
| "epoch": 0.6041055718475073, | |
| "grad_norm": 2.5799784187137043, | |
| "learning_rate": 3.991174234061738e-05, | |
| "loss": 0.6879, | |
| "mean_token_accuracy": 0.8256210163235664, | |
| "step": 103 | |
| }, | |
| { | |
| "epoch": 0.6099706744868035, | |
| "grad_norm": 2.8988288323460547, | |
| "learning_rate": 3.9908318087845104e-05, | |
| "loss": 0.8477, | |
| "mean_token_accuracy": 0.7800894677639008, | |
| "step": 104 | |
| }, | |
| { | |
| "epoch": 0.6158357771260997, | |
| "grad_norm": 2.652941902337806, | |
| "learning_rate": 3.990482883511086e-05, | |
| "loss": 0.628, | |
| "mean_token_accuracy": 0.828456349670887, | |
| "step": 105 | |
| }, | |
| { | |
| "epoch": 0.6217008797653959, | |
| "grad_norm": 2.459546673804448, | |
| "learning_rate": 3.990127459507924e-05, | |
| "loss": 0.6674, | |
| "mean_token_accuracy": 0.8069566413760185, | |
| "step": 106 | |
| }, | |
| { | |
| "epoch": 0.6275659824046921, | |
| "grad_norm": 5.737679812729463, | |
| "learning_rate": 3.98976553806507e-05, | |
| "loss": 0.6696, | |
| "mean_token_accuracy": 0.8196565955877304, | |
| "step": 107 | |
| }, | |
| { | |
| "epoch": 0.6334310850439883, | |
| "grad_norm": 3.013498732232973, | |
| "learning_rate": 3.989397120496152e-05, | |
| "loss": 0.6028, | |
| "mean_token_accuracy": 0.8420541435480118, | |
| "step": 108 | |
| }, | |
| { | |
| "epoch": 0.6392961876832844, | |
| "grad_norm": 2.546654879348618, | |
| "learning_rate": 3.989022208138377e-05, | |
| "loss": 0.6393, | |
| "mean_token_accuracy": 0.8270744681358337, | |
| "step": 109 | |
| }, | |
| { | |
| "epoch": 0.6451612903225806, | |
| "grad_norm": 3.20470392326585, | |
| "learning_rate": 3.9886408023525256e-05, | |
| "loss": 0.8824, | |
| "mean_token_accuracy": 0.7817021533846855, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.6510263929618768, | |
| "grad_norm": 4.058205693055433, | |
| "learning_rate": 3.9882529045229475e-05, | |
| "loss": 0.9437, | |
| "mean_token_accuracy": 0.7480623200535774, | |
| "step": 111 | |
| }, | |
| { | |
| "epoch": 0.656891495601173, | |
| "grad_norm": 2.943950340930463, | |
| "learning_rate": 3.987858516057554e-05, | |
| "loss": 0.6783, | |
| "mean_token_accuracy": 0.8154667392373085, | |
| "step": 112 | |
| }, | |
| { | |
| "epoch": 0.6627565982404692, | |
| "grad_norm": 2.7735485267748383, | |
| "learning_rate": 3.9874576383878165e-05, | |
| "loss": 0.7185, | |
| "mean_token_accuracy": 0.8148206546902657, | |
| "step": 113 | |
| }, | |
| { | |
| "epoch": 0.6686217008797654, | |
| "grad_norm": 2.846138915008106, | |
| "learning_rate": 3.9870502729687594e-05, | |
| "loss": 0.7421, | |
| "mean_token_accuracy": 0.8019789308309555, | |
| "step": 114 | |
| }, | |
| { | |
| "epoch": 0.6744868035190615, | |
| "grad_norm": 2.644169473625222, | |
| "learning_rate": 3.986636421278954e-05, | |
| "loss": 0.7782, | |
| "mean_token_accuracy": 0.8042545765638351, | |
| "step": 115 | |
| }, | |
| { | |
| "epoch": 0.6803519061583577, | |
| "grad_norm": 2.3392486448858927, | |
| "learning_rate": 3.986216084820515e-05, | |
| "loss": 0.5596, | |
| "mean_token_accuracy": 0.8448091298341751, | |
| "step": 116 | |
| }, | |
| { | |
| "epoch": 0.6862170087976539, | |
| "grad_norm": 2.583813549613289, | |
| "learning_rate": 3.985789265119095e-05, | |
| "loss": 0.6485, | |
| "mean_token_accuracy": 0.8111533001065254, | |
| "step": 117 | |
| }, | |
| { | |
| "epoch": 0.6920821114369502, | |
| "grad_norm": 2.4887332265478617, | |
| "learning_rate": 3.985355963723875e-05, | |
| "loss": 0.5541, | |
| "mean_token_accuracy": 0.851133368909359, | |
| "step": 118 | |
| }, | |
| { | |
| "epoch": 0.6979472140762464, | |
| "grad_norm": 2.4799587573079513, | |
| "learning_rate": 3.9849161822075655e-05, | |
| "loss": 0.6044, | |
| "mean_token_accuracy": 0.8356705754995346, | |
| "step": 119 | |
| }, | |
| { | |
| "epoch": 0.7038123167155426, | |
| "grad_norm": 2.530137167351601, | |
| "learning_rate": 3.984469922166396e-05, | |
| "loss": 0.7096, | |
| "mean_token_accuracy": 0.8197364211082458, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.7096774193548387, | |
| "grad_norm": 2.824821340244295, | |
| "learning_rate": 3.984017185220109e-05, | |
| "loss": 0.8867, | |
| "mean_token_accuracy": 0.7859450578689575, | |
| "step": 121 | |
| }, | |
| { | |
| "epoch": 0.7155425219941349, | |
| "grad_norm": 2.5599681132585617, | |
| "learning_rate": 3.9835579730119576e-05, | |
| "loss": 0.7616, | |
| "mean_token_accuracy": 0.7998316809535027, | |
| "step": 122 | |
| }, | |
| { | |
| "epoch": 0.7214076246334311, | |
| "grad_norm": 2.3785812997234257, | |
| "learning_rate": 3.9830922872086974e-05, | |
| "loss": 0.743, | |
| "mean_token_accuracy": 0.8207187280058861, | |
| "step": 123 | |
| }, | |
| { | |
| "epoch": 0.7272727272727273, | |
| "grad_norm": 2.585305660680504, | |
| "learning_rate": 3.9826201295005784e-05, | |
| "loss": 0.8467, | |
| "mean_token_accuracy": 0.7912140190601349, | |
| "step": 124 | |
| }, | |
| { | |
| "epoch": 0.7331378299120235, | |
| "grad_norm": 2.8062519533225103, | |
| "learning_rate": 3.982141501601343e-05, | |
| "loss": 0.8176, | |
| "mean_token_accuracy": 0.7884003445506096, | |
| "step": 125 | |
| }, | |
| { | |
| "epoch": 0.7390029325513197, | |
| "grad_norm": 2.693775516289464, | |
| "learning_rate": 3.9816564052482164e-05, | |
| "loss": 0.7498, | |
| "mean_token_accuracy": 0.799252338707447, | |
| "step": 126 | |
| }, | |
| { | |
| "epoch": 0.7448680351906158, | |
| "grad_norm": 2.9011965003727425, | |
| "learning_rate": 3.981164842201904e-05, | |
| "loss": 0.7827, | |
| "mean_token_accuracy": 0.8069761171936989, | |
| "step": 127 | |
| }, | |
| { | |
| "epoch": 0.750733137829912, | |
| "grad_norm": 2.838289459808484, | |
| "learning_rate": 3.9806668142465804e-05, | |
| "loss": 0.8198, | |
| "mean_token_accuracy": 0.7964724600315094, | |
| "step": 128 | |
| }, | |
| { | |
| "epoch": 0.7565982404692082, | |
| "grad_norm": 2.210570471941626, | |
| "learning_rate": 3.9801623231898856e-05, | |
| "loss": 0.607, | |
| "mean_token_accuracy": 0.833724357187748, | |
| "step": 129 | |
| }, | |
| { | |
| "epoch": 0.7624633431085044, | |
| "grad_norm": 2.4332563355629326, | |
| "learning_rate": 3.9796513708629186e-05, | |
| "loss": 0.6395, | |
| "mean_token_accuracy": 0.8242924064397812, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.7683284457478006, | |
| "grad_norm": 2.745783922339993, | |
| "learning_rate": 3.979133959120229e-05, | |
| "loss": 0.6428, | |
| "mean_token_accuracy": 0.8359903246164322, | |
| "step": 131 | |
| }, | |
| { | |
| "epoch": 0.7741935483870968, | |
| "grad_norm": 2.446231449332525, | |
| "learning_rate": 3.9786100898398145e-05, | |
| "loss": 0.6281, | |
| "mean_token_accuracy": 0.8326933979988098, | |
| "step": 132 | |
| }, | |
| { | |
| "epoch": 0.7800586510263929, | |
| "grad_norm": 2.2972578840395315, | |
| "learning_rate": 3.9780797649231085e-05, | |
| "loss": 0.6695, | |
| "mean_token_accuracy": 0.8279871791601181, | |
| "step": 133 | |
| }, | |
| { | |
| "epoch": 0.7859237536656891, | |
| "grad_norm": 2.7410960234964894, | |
| "learning_rate": 3.9775429862949745e-05, | |
| "loss": 0.7781, | |
| "mean_token_accuracy": 0.8053743913769722, | |
| "step": 134 | |
| }, | |
| { | |
| "epoch": 0.7917888563049853, | |
| "grad_norm": 2.5415583652058436, | |
| "learning_rate": 3.976999755903704e-05, | |
| "loss": 0.763, | |
| "mean_token_accuracy": 0.7993438169360161, | |
| "step": 135 | |
| }, | |
| { | |
| "epoch": 0.7976539589442815, | |
| "grad_norm": 2.270662512492905, | |
| "learning_rate": 3.976450075721003e-05, | |
| "loss": 0.6284, | |
| "mean_token_accuracy": 0.8407260999083519, | |
| "step": 136 | |
| }, | |
| { | |
| "epoch": 0.8035190615835777, | |
| "grad_norm": 2.5929616858518676, | |
| "learning_rate": 3.975893947741989e-05, | |
| "loss": 0.5968, | |
| "mean_token_accuracy": 0.845864936709404, | |
| "step": 137 | |
| }, | |
| { | |
| "epoch": 0.8093841642228738, | |
| "grad_norm": 2.3854479485716733, | |
| "learning_rate": 3.9753313739851824e-05, | |
| "loss": 0.7616, | |
| "mean_token_accuracy": 0.7994790449738503, | |
| "step": 138 | |
| }, | |
| { | |
| "epoch": 0.8152492668621701, | |
| "grad_norm": 2.8723348402691906, | |
| "learning_rate": 3.974762356492498e-05, | |
| "loss": 0.8845, | |
| "mean_token_accuracy": 0.7835755422711372, | |
| "step": 139 | |
| }, | |
| { | |
| "epoch": 0.8211143695014663, | |
| "grad_norm": 2.479196600373357, | |
| "learning_rate": 3.974186897329239e-05, | |
| "loss": 0.6029, | |
| "mean_token_accuracy": 0.851497046649456, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.8269794721407625, | |
| "grad_norm": 2.504810711733226, | |
| "learning_rate": 3.97360499858409e-05, | |
| "loss": 0.6289, | |
| "mean_token_accuracy": 0.8382159024477005, | |
| "step": 141 | |
| }, | |
| { | |
| "epoch": 0.8328445747800587, | |
| "grad_norm": 2.935237581676673, | |
| "learning_rate": 3.9730166623691096e-05, | |
| "loss": 0.8208, | |
| "mean_token_accuracy": 0.7849750071763992, | |
| "step": 142 | |
| }, | |
| { | |
| "epoch": 0.8387096774193549, | |
| "grad_norm": 2.6436501176507603, | |
| "learning_rate": 3.9724218908197194e-05, | |
| "loss": 0.5852, | |
| "mean_token_accuracy": 0.8367486670613289, | |
| "step": 143 | |
| }, | |
| { | |
| "epoch": 0.844574780058651, | |
| "grad_norm": 2.924448686602039, | |
| "learning_rate": 3.971820686094701e-05, | |
| "loss": 0.8792, | |
| "mean_token_accuracy": 0.7791745141148567, | |
| "step": 144 | |
| }, | |
| { | |
| "epoch": 0.8504398826979472, | |
| "grad_norm": 2.7037525549378154, | |
| "learning_rate": 3.971213050376183e-05, | |
| "loss": 0.8009, | |
| "mean_token_accuracy": 0.7840462699532509, | |
| "step": 145 | |
| }, | |
| { | |
| "epoch": 0.8563049853372434, | |
| "grad_norm": 2.023965935982458, | |
| "learning_rate": 3.9705989858696387e-05, | |
| "loss": 0.5849, | |
| "mean_token_accuracy": 0.8464046567678452, | |
| "step": 146 | |
| }, | |
| { | |
| "epoch": 0.8621700879765396, | |
| "grad_norm": 2.2389680633489677, | |
| "learning_rate": 3.969978494803876e-05, | |
| "loss": 0.5711, | |
| "mean_token_accuracy": 0.8433157354593277, | |
| "step": 147 | |
| }, | |
| { | |
| "epoch": 0.8680351906158358, | |
| "grad_norm": 2.278050804313283, | |
| "learning_rate": 3.969351579431024e-05, | |
| "loss": 0.5924, | |
| "mean_token_accuracy": 0.8373497202992439, | |
| "step": 148 | |
| }, | |
| { | |
| "epoch": 0.873900293255132, | |
| "grad_norm": 2.402715693045193, | |
| "learning_rate": 3.968718242026533e-05, | |
| "loss": 0.5959, | |
| "mean_token_accuracy": 0.840206079185009, | |
| "step": 149 | |
| }, | |
| { | |
| "epoch": 0.8797653958944281, | |
| "grad_norm": 2.2011155417677726, | |
| "learning_rate": 3.968078484889163e-05, | |
| "loss": 0.4494, | |
| "mean_token_accuracy": 0.8684269115328789, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.8856304985337243, | |
| "grad_norm": 2.486921790559847, | |
| "learning_rate": 3.9674323103409736e-05, | |
| "loss": 0.6253, | |
| "mean_token_accuracy": 0.8286841958761215, | |
| "step": 151 | |
| }, | |
| { | |
| "epoch": 0.8914956011730205, | |
| "grad_norm": 2.789517335456884, | |
| "learning_rate": 3.966779720727317e-05, | |
| "loss": 0.7794, | |
| "mean_token_accuracy": 0.8061128631234169, | |
| "step": 152 | |
| }, | |
| { | |
| "epoch": 0.8973607038123167, | |
| "grad_norm": 2.510141045705032, | |
| "learning_rate": 3.9661207184168305e-05, | |
| "loss": 0.622, | |
| "mean_token_accuracy": 0.8308300077915192, | |
| "step": 153 | |
| }, | |
| { | |
| "epoch": 0.9032258064516129, | |
| "grad_norm": 2.5109259078139936, | |
| "learning_rate": 3.9654553058014265e-05, | |
| "loss": 0.7255, | |
| "mean_token_accuracy": 0.8097553253173828, | |
| "step": 154 | |
| }, | |
| { | |
| "epoch": 0.9090909090909091, | |
| "grad_norm": 2.186994234639019, | |
| "learning_rate": 3.9647834852962825e-05, | |
| "loss": 0.6056, | |
| "mean_token_accuracy": 0.8426107689738274, | |
| "step": 155 | |
| }, | |
| { | |
| "epoch": 0.9149560117302052, | |
| "grad_norm": 2.690260479130677, | |
| "learning_rate": 3.964105259339838e-05, | |
| "loss": 0.8541, | |
| "mean_token_accuracy": 0.7787467613816261, | |
| "step": 156 | |
| }, | |
| { | |
| "epoch": 0.9208211143695014, | |
| "grad_norm": 2.1337725885680925, | |
| "learning_rate": 3.9634206303937773e-05, | |
| "loss": 0.5244, | |
| "mean_token_accuracy": 0.8537596613168716, | |
| "step": 157 | |
| }, | |
| { | |
| "epoch": 0.9266862170087976, | |
| "grad_norm": 1.9598919005300581, | |
| "learning_rate": 3.962729600943028e-05, | |
| "loss": 0.4997, | |
| "mean_token_accuracy": 0.8664168417453766, | |
| "step": 158 | |
| }, | |
| { | |
| "epoch": 0.9325513196480938, | |
| "grad_norm": 2.325256374795688, | |
| "learning_rate": 3.962032173495748e-05, | |
| "loss": 0.5449, | |
| "mean_token_accuracy": 0.8529787808656693, | |
| "step": 159 | |
| }, | |
| { | |
| "epoch": 0.9384164222873901, | |
| "grad_norm": 2.091552878169736, | |
| "learning_rate": 3.961328350583316e-05, | |
| "loss": 0.5242, | |
| "mean_token_accuracy": 0.8521319106221199, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.9442815249266863, | |
| "grad_norm": 2.4200043074502355, | |
| "learning_rate": 3.960618134760327e-05, | |
| "loss": 0.6728, | |
| "mean_token_accuracy": 0.832053430378437, | |
| "step": 161 | |
| }, | |
| { | |
| "epoch": 0.9501466275659824, | |
| "grad_norm": 1.9686791505112915, | |
| "learning_rate": 3.959901528604575e-05, | |
| "loss": 0.4286, | |
| "mean_token_accuracy": 0.8681811094284058, | |
| "step": 162 | |
| }, | |
| { | |
| "epoch": 0.9560117302052786, | |
| "grad_norm": 2.599071858919526, | |
| "learning_rate": 3.959178534717053e-05, | |
| "loss": 0.7159, | |
| "mean_token_accuracy": 0.8168910816311836, | |
| "step": 163 | |
| }, | |
| { | |
| "epoch": 0.9618768328445748, | |
| "grad_norm": 2.083295545704035, | |
| "learning_rate": 3.9584491557219366e-05, | |
| "loss": 0.6646, | |
| "mean_token_accuracy": 0.8394544422626495, | |
| "step": 164 | |
| }, | |
| { | |
| "epoch": 0.967741935483871, | |
| "grad_norm": 2.240001768853048, | |
| "learning_rate": 3.957713394266576e-05, | |
| "loss": 0.5787, | |
| "mean_token_accuracy": 0.836028702557087, | |
| "step": 165 | |
| }, | |
| { | |
| "epoch": 0.9736070381231672, | |
| "grad_norm": 2.575766108085151, | |
| "learning_rate": 3.956971253021489e-05, | |
| "loss": 0.5134, | |
| "mean_token_accuracy": 0.8556343615055084, | |
| "step": 166 | |
| }, | |
| { | |
| "epoch": 0.9794721407624634, | |
| "grad_norm": 2.4405755915866165, | |
| "learning_rate": 3.956222734680348e-05, | |
| "loss": 0.6147, | |
| "mean_token_accuracy": 0.8331816866993904, | |
| "step": 167 | |
| }, | |
| { | |
| "epoch": 0.9853372434017595, | |
| "grad_norm": 2.325820647224952, | |
| "learning_rate": 3.955467841959972e-05, | |
| "loss": 0.6321, | |
| "mean_token_accuracy": 0.8455986753106117, | |
| "step": 168 | |
| }, | |
| { | |
| "epoch": 0.9912023460410557, | |
| "grad_norm": 2.178404407582914, | |
| "learning_rate": 3.954706577600318e-05, | |
| "loss": 0.5734, | |
| "mean_token_accuracy": 0.8322867527604103, | |
| "step": 169 | |
| }, | |
| { | |
| "epoch": 0.9970674486803519, | |
| "grad_norm": 2.064610679796685, | |
| "learning_rate": 3.953938944364467e-05, | |
| "loss": 0.7105, | |
| "mean_token_accuracy": 0.8211642429232597, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 1.0, | |
| "grad_norm": 2.064610679796685, | |
| "learning_rate": 3.953164945038618e-05, | |
| "loss": 0.6539, | |
| "mean_token_accuracy": 0.8487359285354614, | |
| "step": 171 | |
| }, | |
| { | |
| "epoch": 1.0058651026392962, | |
| "grad_norm": 3.388458997700093, | |
| "learning_rate": 3.952384582432076e-05, | |
| "loss": 0.4128, | |
| "mean_token_accuracy": 0.8756323307752609, | |
| "step": 172 | |
| }, | |
| { | |
| "epoch": 1.0117302052785924, | |
| "grad_norm": 1.946866153227053, | |
| "learning_rate": 3.9515978593772426e-05, | |
| "loss": 0.3487, | |
| "mean_token_accuracy": 0.9017143920063972, | |
| "step": 173 | |
| }, | |
| { | |
| "epoch": 1.0175953079178885, | |
| "grad_norm": 1.8895885148496738, | |
| "learning_rate": 3.9508047787296034e-05, | |
| "loss": 0.2699, | |
| "mean_token_accuracy": 0.9154033064842224, | |
| "step": 174 | |
| }, | |
| { | |
| "epoch": 1.0234604105571847, | |
| "grad_norm": 1.61059572554682, | |
| "learning_rate": 3.9500053433677226e-05, | |
| "loss": 0.2817, | |
| "mean_token_accuracy": 0.9106803834438324, | |
| "step": 175 | |
| }, | |
| { | |
| "epoch": 1.029325513196481, | |
| "grad_norm": 1.8435205509000987, | |
| "learning_rate": 3.949199556193226e-05, | |
| "loss": 0.3626, | |
| "mean_token_accuracy": 0.8898500800132751, | |
| "step": 176 | |
| }, | |
| { | |
| "epoch": 1.035190615835777, | |
| "grad_norm": 1.7339474636607546, | |
| "learning_rate": 3.948387420130796e-05, | |
| "loss": 0.2602, | |
| "mean_token_accuracy": 0.9224452674388885, | |
| "step": 177 | |
| }, | |
| { | |
| "epoch": 1.0410557184750733, | |
| "grad_norm": 1.7997907466178427, | |
| "learning_rate": 3.94756893812816e-05, | |
| "loss": 0.358, | |
| "mean_token_accuracy": 0.8965035080909729, | |
| "step": 178 | |
| }, | |
| { | |
| "epoch": 1.0469208211143695, | |
| "grad_norm": 2.258217493063847, | |
| "learning_rate": 3.946744113156075e-05, | |
| "loss": 0.3243, | |
| "mean_token_accuracy": 0.8938365653157234, | |
| "step": 179 | |
| }, | |
| { | |
| "epoch": 1.0527859237536656, | |
| "grad_norm": 2.593084420555114, | |
| "learning_rate": 3.945912948208324e-05, | |
| "loss": 0.4572, | |
| "mean_token_accuracy": 0.8712584003806114, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 1.0586510263929618, | |
| "grad_norm": 2.436465953164275, | |
| "learning_rate": 3.9450754463016994e-05, | |
| "loss": 0.4072, | |
| "mean_token_accuracy": 0.8850173056125641, | |
| "step": 181 | |
| }, | |
| { | |
| "epoch": 1.064516129032258, | |
| "grad_norm": 2.31061284700661, | |
| "learning_rate": 3.9442316104759955e-05, | |
| "loss": 0.3876, | |
| "mean_token_accuracy": 0.8916988000273705, | |
| "step": 182 | |
| }, | |
| { | |
| "epoch": 1.0703812316715542, | |
| "grad_norm": 1.8531473859414216, | |
| "learning_rate": 3.943381443793994e-05, | |
| "loss": 0.4046, | |
| "mean_token_accuracy": 0.8912115767598152, | |
| "step": 183 | |
| }, | |
| { | |
| "epoch": 1.0762463343108504, | |
| "grad_norm": 2.2566644664782145, | |
| "learning_rate": 3.9425249493414585e-05, | |
| "loss": 0.479, | |
| "mean_token_accuracy": 0.8574677929282188, | |
| "step": 184 | |
| }, | |
| { | |
| "epoch": 1.0821114369501466, | |
| "grad_norm": 2.0447795712362664, | |
| "learning_rate": 3.941662130227118e-05, | |
| "loss": 0.5012, | |
| "mean_token_accuracy": 0.8605052158236504, | |
| "step": 185 | |
| }, | |
| { | |
| "epoch": 1.0879765395894427, | |
| "grad_norm": 2.0252956397987494, | |
| "learning_rate": 3.940792989582654e-05, | |
| "loss": 0.4011, | |
| "mean_token_accuracy": 0.8870269060134888, | |
| "step": 186 | |
| }, | |
| { | |
| "epoch": 1.093841642228739, | |
| "grad_norm": 2.1370096750662317, | |
| "learning_rate": 3.939917530562701e-05, | |
| "loss": 0.3308, | |
| "mean_token_accuracy": 0.9050007611513138, | |
| "step": 187 | |
| }, | |
| { | |
| "epoch": 1.099706744868035, | |
| "grad_norm": 2.0265811605099846, | |
| "learning_rate": 3.939035756344818e-05, | |
| "loss": 0.3665, | |
| "mean_token_accuracy": 0.898661196231842, | |
| "step": 188 | |
| }, | |
| { | |
| "epoch": 1.1055718475073313, | |
| "grad_norm": 1.952280542383211, | |
| "learning_rate": 3.93814767012949e-05, | |
| "loss": 0.3688, | |
| "mean_token_accuracy": 0.8879161328077316, | |
| "step": 189 | |
| }, | |
| { | |
| "epoch": 1.1114369501466275, | |
| "grad_norm": 1.7005130946868734, | |
| "learning_rate": 3.937253275140113e-05, | |
| "loss": 0.2879, | |
| "mean_token_accuracy": 0.915998600423336, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 1.1173020527859236, | |
| "grad_norm": 1.8692102303872629, | |
| "learning_rate": 3.936352574622978e-05, | |
| "loss": 0.2632, | |
| "mean_token_accuracy": 0.9168166071176529, | |
| "step": 191 | |
| }, | |
| { | |
| "epoch": 1.1231671554252198, | |
| "grad_norm": 1.4636789459840258, | |
| "learning_rate": 3.9354455718472646e-05, | |
| "loss": 0.3907, | |
| "mean_token_accuracy": 0.890878938138485, | |
| "step": 192 | |
| }, | |
| { | |
| "epoch": 1.129032258064516, | |
| "grad_norm": 2.221927967255573, | |
| "learning_rate": 3.934532270105026e-05, | |
| "loss": 0.4421, | |
| "mean_token_accuracy": 0.8873759284615517, | |
| "step": 193 | |
| }, | |
| { | |
| "epoch": 1.1348973607038122, | |
| "grad_norm": 2.370072341257091, | |
| "learning_rate": 3.933612672711179e-05, | |
| "loss": 0.3847, | |
| "mean_token_accuracy": 0.8896044939756393, | |
| "step": 194 | |
| }, | |
| { | |
| "epoch": 1.1407624633431086, | |
| "grad_norm": 1.857139188636865, | |
| "learning_rate": 3.9326867830034915e-05, | |
| "loss": 0.4143, | |
| "mean_token_accuracy": 0.8805570974946022, | |
| "step": 195 | |
| }, | |
| { | |
| "epoch": 1.1466275659824048, | |
| "grad_norm": 1.8437744521798194, | |
| "learning_rate": 3.931754604342568e-05, | |
| "loss": 0.3271, | |
| "mean_token_accuracy": 0.8968425765633583, | |
| "step": 196 | |
| }, | |
| { | |
| "epoch": 1.152492668621701, | |
| "grad_norm": 1.8607108031680868, | |
| "learning_rate": 3.930816140111842e-05, | |
| "loss": 0.2857, | |
| "mean_token_accuracy": 0.9045360758900642, | |
| "step": 197 | |
| }, | |
| { | |
| "epoch": 1.1583577712609971, | |
| "grad_norm": 1.9508835791170454, | |
| "learning_rate": 3.929871393717558e-05, | |
| "loss": 0.3669, | |
| "mean_token_accuracy": 0.9067378118634224, | |
| "step": 198 | |
| }, | |
| { | |
| "epoch": 1.1642228739002933, | |
| "grad_norm": 2.0949010077945265, | |
| "learning_rate": 3.9289203685887644e-05, | |
| "loss": 0.4208, | |
| "mean_token_accuracy": 0.8803550601005554, | |
| "step": 199 | |
| }, | |
| { | |
| "epoch": 1.1700879765395895, | |
| "grad_norm": 2.085118395713445, | |
| "learning_rate": 3.927963068177299e-05, | |
| "loss": 0.4436, | |
| "mean_token_accuracy": 0.8708969727158546, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 1.1759530791788857, | |
| "grad_norm": 2.1164241199376157, | |
| "learning_rate": 3.926999495957775e-05, | |
| "loss": 0.5082, | |
| "mean_token_accuracy": 0.8535888493061066, | |
| "step": 201 | |
| }, | |
| { | |
| "epoch": 1.1818181818181819, | |
| "grad_norm": 2.13206613518577, | |
| "learning_rate": 3.9260296554275704e-05, | |
| "loss": 0.5362, | |
| "mean_token_accuracy": 0.8581036180257797, | |
| "step": 202 | |
| }, | |
| { | |
| "epoch": 1.187683284457478, | |
| "grad_norm": 2.1096851845586047, | |
| "learning_rate": 3.925053550106815e-05, | |
| "loss": 0.3643, | |
| "mean_token_accuracy": 0.894468292593956, | |
| "step": 203 | |
| }, | |
| { | |
| "epoch": 1.1935483870967742, | |
| "grad_norm": 1.9205106894225823, | |
| "learning_rate": 3.9240711835383766e-05, | |
| "loss": 0.3395, | |
| "mean_token_accuracy": 0.8946530520915985, | |
| "step": 204 | |
| }, | |
| { | |
| "epoch": 1.1994134897360704, | |
| "grad_norm": 1.9233142739706681, | |
| "learning_rate": 3.9230825592878494e-05, | |
| "loss": 0.3717, | |
| "mean_token_accuracy": 0.895426020026207, | |
| "step": 205 | |
| }, | |
| { | |
| "epoch": 1.2052785923753666, | |
| "grad_norm": 2.0226450173120747, | |
| "learning_rate": 3.92208768094354e-05, | |
| "loss": 0.3037, | |
| "mean_token_accuracy": 0.9123189970850945, | |
| "step": 206 | |
| }, | |
| { | |
| "epoch": 1.2111436950146628, | |
| "grad_norm": 1.8525205465171763, | |
| "learning_rate": 3.921086552116455e-05, | |
| "loss": 0.3298, | |
| "mean_token_accuracy": 0.9004262015223503, | |
| "step": 207 | |
| }, | |
| { | |
| "epoch": 1.217008797653959, | |
| "grad_norm": 2.01008712513797, | |
| "learning_rate": 3.920079176440288e-05, | |
| "loss": 0.3022, | |
| "mean_token_accuracy": 0.9123654365539551, | |
| "step": 208 | |
| }, | |
| { | |
| "epoch": 1.2228739002932552, | |
| "grad_norm": 2.085903024978829, | |
| "learning_rate": 3.9190655575714045e-05, | |
| "loss": 0.4884, | |
| "mean_token_accuracy": 0.8751334249973297, | |
| "step": 209 | |
| }, | |
| { | |
| "epoch": 1.2287390029325513, | |
| "grad_norm": 2.429563569008283, | |
| "learning_rate": 3.918045699188833e-05, | |
| "loss": 0.3692, | |
| "mean_token_accuracy": 0.8923944160342216, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 1.2346041055718475, | |
| "grad_norm": 2.0242830825921394, | |
| "learning_rate": 3.9170196049942474e-05, | |
| "loss": 0.3363, | |
| "mean_token_accuracy": 0.9068214148283005, | |
| "step": 211 | |
| }, | |
| { | |
| "epoch": 1.2404692082111437, | |
| "grad_norm": 1.7665849526519697, | |
| "learning_rate": 3.915987278711954e-05, | |
| "loss": 0.3032, | |
| "mean_token_accuracy": 0.9094932749867439, | |
| "step": 212 | |
| }, | |
| { | |
| "epoch": 1.2463343108504399, | |
| "grad_norm": 1.6859984756333792, | |
| "learning_rate": 3.914948724088883e-05, | |
| "loss": 0.4464, | |
| "mean_token_accuracy": 0.8826745226979256, | |
| "step": 213 | |
| }, | |
| { | |
| "epoch": 1.252199413489736, | |
| "grad_norm": 2.107291121286409, | |
| "learning_rate": 3.913903944894565e-05, | |
| "loss": 0.385, | |
| "mean_token_accuracy": 0.8863528296351433, | |
| "step": 214 | |
| }, | |
| { | |
| "epoch": 1.2580645161290323, | |
| "grad_norm": 1.731785903224652, | |
| "learning_rate": 3.912852944921129e-05, | |
| "loss": 0.3462, | |
| "mean_token_accuracy": 0.9012412056326866, | |
| "step": 215 | |
| }, | |
| { | |
| "epoch": 1.2639296187683284, | |
| "grad_norm": 2.137318311123959, | |
| "learning_rate": 3.911795727983279e-05, | |
| "loss": 0.3789, | |
| "mean_token_accuracy": 0.8955278024077415, | |
| "step": 216 | |
| }, | |
| { | |
| "epoch": 1.2697947214076246, | |
| "grad_norm": 2.056102679528605, | |
| "learning_rate": 3.910732297918285e-05, | |
| "loss": 0.4184, | |
| "mean_token_accuracy": 0.8876356407999992, | |
| "step": 217 | |
| }, | |
| { | |
| "epoch": 1.2756598240469208, | |
| "grad_norm": 2.1522822755480475, | |
| "learning_rate": 3.90966265858597e-05, | |
| "loss": 0.4578, | |
| "mean_token_accuracy": 0.8790669366717339, | |
| "step": 218 | |
| }, | |
| { | |
| "epoch": 1.281524926686217, | |
| "grad_norm": 2.2044287057502054, | |
| "learning_rate": 3.908586813868693e-05, | |
| "loss": 0.4495, | |
| "mean_token_accuracy": 0.880668930709362, | |
| "step": 219 | |
| }, | |
| { | |
| "epoch": 1.2873900293255132, | |
| "grad_norm": 2.1039486823285136, | |
| "learning_rate": 3.9075047676713354e-05, | |
| "loss": 0.4446, | |
| "mean_token_accuracy": 0.8780560865998268, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 1.2932551319648093, | |
| "grad_norm": 2.154180937578927, | |
| "learning_rate": 3.9064165239212874e-05, | |
| "loss": 0.3996, | |
| "mean_token_accuracy": 0.8869764432311058, | |
| "step": 221 | |
| }, | |
| { | |
| "epoch": 1.2991202346041055, | |
| "grad_norm": 1.924290765487603, | |
| "learning_rate": 3.905322086568434e-05, | |
| "loss": 0.4499, | |
| "mean_token_accuracy": 0.883093573153019, | |
| "step": 222 | |
| }, | |
| { | |
| "epoch": 1.3049853372434017, | |
| "grad_norm": 2.2702727232826745, | |
| "learning_rate": 3.904221459585142e-05, | |
| "loss": 0.3775, | |
| "mean_token_accuracy": 0.889284610748291, | |
| "step": 223 | |
| }, | |
| { | |
| "epoch": 1.310850439882698, | |
| "grad_norm": 2.4413459555324457, | |
| "learning_rate": 3.903114646966242e-05, | |
| "loss": 0.4279, | |
| "mean_token_accuracy": 0.8820584490895271, | |
| "step": 224 | |
| }, | |
| { | |
| "epoch": 1.316715542521994, | |
| "grad_norm": 1.6910433578113089, | |
| "learning_rate": 3.9020016527290166e-05, | |
| "loss": 0.3893, | |
| "mean_token_accuracy": 0.8818121626973152, | |
| "step": 225 | |
| }, | |
| { | |
| "epoch": 1.3225806451612903, | |
| "grad_norm": 1.7243062281355852, | |
| "learning_rate": 3.900882480913185e-05, | |
| "loss": 0.2944, | |
| "mean_token_accuracy": 0.9121311828494072, | |
| "step": 226 | |
| }, | |
| { | |
| "epoch": 1.3284457478005864, | |
| "grad_norm": 2.090346610417419, | |
| "learning_rate": 3.899757135580891e-05, | |
| "loss": 0.4818, | |
| "mean_token_accuracy": 0.8721955120563507, | |
| "step": 227 | |
| }, | |
| { | |
| "epoch": 1.3343108504398826, | |
| "grad_norm": 1.988732267461751, | |
| "learning_rate": 3.898625620816681e-05, | |
| "loss": 0.3523, | |
| "mean_token_accuracy": 0.8897998854517937, | |
| "step": 228 | |
| }, | |
| { | |
| "epoch": 1.3401759530791788, | |
| "grad_norm": 2.061302818576063, | |
| "learning_rate": 3.8974879407275e-05, | |
| "loss": 0.5068, | |
| "mean_token_accuracy": 0.8575932830572128, | |
| "step": 229 | |
| }, | |
| { | |
| "epoch": 1.3460410557184752, | |
| "grad_norm": 2.242260225182611, | |
| "learning_rate": 3.896344099442663e-05, | |
| "loss": 0.3596, | |
| "mean_token_accuracy": 0.8998804241418839, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 1.3519061583577714, | |
| "grad_norm": 1.847107530517423, | |
| "learning_rate": 3.895194101113855e-05, | |
| "loss": 0.3204, | |
| "mean_token_accuracy": 0.885312870144844, | |
| "step": 231 | |
| }, | |
| { | |
| "epoch": 1.3577712609970676, | |
| "grad_norm": 1.844656842651701, | |
| "learning_rate": 3.894037949915104e-05, | |
| "loss": 0.3592, | |
| "mean_token_accuracy": 0.9094949513673782, | |
| "step": 232 | |
| }, | |
| { | |
| "epoch": 1.3636363636363638, | |
| "grad_norm": 1.653499557561736, | |
| "learning_rate": 3.8928756500427735e-05, | |
| "loss": 0.3537, | |
| "mean_token_accuracy": 0.8922702744603157, | |
| "step": 233 | |
| }, | |
| { | |
| "epoch": 1.36950146627566, | |
| "grad_norm": 1.8328249166634853, | |
| "learning_rate": 3.89170720571554e-05, | |
| "loss": 0.3317, | |
| "mean_token_accuracy": 0.9058214798569679, | |
| "step": 234 | |
| }, | |
| { | |
| "epoch": 1.3753665689149561, | |
| "grad_norm": 1.8665975142042839, | |
| "learning_rate": 3.890532621174387e-05, | |
| "loss": 0.3413, | |
| "mean_token_accuracy": 0.898667685687542, | |
| "step": 235 | |
| }, | |
| { | |
| "epoch": 1.3812316715542523, | |
| "grad_norm": 1.6996356173562472, | |
| "learning_rate": 3.8893519006825806e-05, | |
| "loss": 0.3258, | |
| "mean_token_accuracy": 0.9047993496060371, | |
| "step": 236 | |
| }, | |
| { | |
| "epoch": 1.3870967741935485, | |
| "grad_norm": 1.950666656245651, | |
| "learning_rate": 3.88816504852566e-05, | |
| "loss": 0.3437, | |
| "mean_token_accuracy": 0.9000433012843132, | |
| "step": 237 | |
| }, | |
| { | |
| "epoch": 1.3929618768328447, | |
| "grad_norm": 1.9752405137217386, | |
| "learning_rate": 3.886972069011419e-05, | |
| "loss": 0.5516, | |
| "mean_token_accuracy": 0.8599549755454063, | |
| "step": 238 | |
| }, | |
| { | |
| "epoch": 1.3988269794721409, | |
| "grad_norm": 2.4127549070414838, | |
| "learning_rate": 3.885772966469891e-05, | |
| "loss": 0.3559, | |
| "mean_token_accuracy": 0.893404632806778, | |
| "step": 239 | |
| }, | |
| { | |
| "epoch": 1.404692082111437, | |
| "grad_norm": 1.911059959976087, | |
| "learning_rate": 3.884567745253335e-05, | |
| "loss": 0.2826, | |
| "mean_token_accuracy": 0.9134182780981064, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 1.4105571847507332, | |
| "grad_norm": 1.7911171144834646, | |
| "learning_rate": 3.8833564097362157e-05, | |
| "loss": 0.4307, | |
| "mean_token_accuracy": 0.8844648599624634, | |
| "step": 241 | |
| }, | |
| { | |
| "epoch": 1.4164222873900294, | |
| "grad_norm": 1.6351845128869422, | |
| "learning_rate": 3.8821389643151924e-05, | |
| "loss": 0.2718, | |
| "mean_token_accuracy": 0.9180882722139359, | |
| "step": 242 | |
| }, | |
| { | |
| "epoch": 1.4222873900293256, | |
| "grad_norm": 1.8589530261198506, | |
| "learning_rate": 3.880915413409102e-05, | |
| "loss": 0.3403, | |
| "mean_token_accuracy": 0.9056767523288727, | |
| "step": 243 | |
| }, | |
| { | |
| "epoch": 1.4281524926686218, | |
| "grad_norm": 1.7521182715832206, | |
| "learning_rate": 3.879685761458938e-05, | |
| "loss": 0.4686, | |
| "mean_token_accuracy": 0.8556941673159599, | |
| "step": 244 | |
| }, | |
| { | |
| "epoch": 1.434017595307918, | |
| "grad_norm": 1.9719842940855452, | |
| "learning_rate": 3.8784500129278405e-05, | |
| "loss": 0.2848, | |
| "mean_token_accuracy": 0.9201192110776901, | |
| "step": 245 | |
| }, | |
| { | |
| "epoch": 1.4398826979472141, | |
| "grad_norm": 1.8790574357314227, | |
| "learning_rate": 3.877208172301079e-05, | |
| "loss": 0.4605, | |
| "mean_token_accuracy": 0.8656576499342918, | |
| "step": 246 | |
| }, | |
| { | |
| "epoch": 1.4457478005865103, | |
| "grad_norm": 1.8025911112786197, | |
| "learning_rate": 3.875960244086032e-05, | |
| "loss": 0.3391, | |
| "mean_token_accuracy": 0.9012917503714561, | |
| "step": 247 | |
| }, | |
| { | |
| "epoch": 1.4516129032258065, | |
| "grad_norm": 1.801300852253163, | |
| "learning_rate": 3.8747062328121756e-05, | |
| "loss": 0.378, | |
| "mean_token_accuracy": 0.9069185331463814, | |
| "step": 248 | |
| }, | |
| { | |
| "epoch": 1.4574780058651027, | |
| "grad_norm": 1.534353901888919, | |
| "learning_rate": 3.873446143031064e-05, | |
| "loss": 0.2786, | |
| "mean_token_accuracy": 0.9182317852973938, | |
| "step": 249 | |
| }, | |
| { | |
| "epoch": 1.4633431085043989, | |
| "grad_norm": 1.6153260323031022, | |
| "learning_rate": 3.872179979316314e-05, | |
| "loss": 0.2901, | |
| "mean_token_accuracy": 0.9126993268728256, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 1.469208211143695, | |
| "grad_norm": 1.4971841035217601, | |
| "learning_rate": 3.870907746263589e-05, | |
| "loss": 0.2543, | |
| "mean_token_accuracy": 0.9198986664414406, | |
| "step": 251 | |
| }, | |
| { | |
| "epoch": 1.4750733137829912, | |
| "grad_norm": 1.6526746946220452, | |
| "learning_rate": 3.869629448490582e-05, | |
| "loss": 0.3306, | |
| "mean_token_accuracy": 0.9064493998885155, | |
| "step": 252 | |
| }, | |
| { | |
| "epoch": 1.4809384164222874, | |
| "grad_norm": 1.622995160750932, | |
| "learning_rate": 3.868345090636995e-05, | |
| "loss": 0.3715, | |
| "mean_token_accuracy": 0.9039052128791809, | |
| "step": 253 | |
| }, | |
| { | |
| "epoch": 1.4868035190615836, | |
| "grad_norm": 1.9020619860495158, | |
| "learning_rate": 3.867054677364531e-05, | |
| "loss": 0.3454, | |
| "mean_token_accuracy": 0.8969385549426079, | |
| "step": 254 | |
| }, | |
| { | |
| "epoch": 1.4926686217008798, | |
| "grad_norm": 1.6271978444969855, | |
| "learning_rate": 3.865758213356868e-05, | |
| "loss": 0.3595, | |
| "mean_token_accuracy": 0.8911759555339813, | |
| "step": 255 | |
| }, | |
| { | |
| "epoch": 1.498533724340176, | |
| "grad_norm": 1.9095375703624875, | |
| "learning_rate": 3.8644557033196456e-05, | |
| "loss": 0.344, | |
| "mean_token_accuracy": 0.8955858647823334, | |
| "step": 256 | |
| }, | |
| { | |
| "epoch": 1.5043988269794721, | |
| "grad_norm": 1.7428359018038488, | |
| "learning_rate": 3.8631471519804514e-05, | |
| "loss": 0.3832, | |
| "mean_token_accuracy": 0.8968099281191826, | |
| "step": 257 | |
| }, | |
| { | |
| "epoch": 1.5102639296187683, | |
| "grad_norm": 1.862424323093678, | |
| "learning_rate": 3.861832564088797e-05, | |
| "loss": 0.414, | |
| "mean_token_accuracy": 0.8887148573994637, | |
| "step": 258 | |
| }, | |
| { | |
| "epoch": 1.5161290322580645, | |
| "grad_norm": 1.9872329397145905, | |
| "learning_rate": 3.860511944416105e-05, | |
| "loss": 0.2824, | |
| "mean_token_accuracy": 0.9190465062856674, | |
| "step": 259 | |
| }, | |
| { | |
| "epoch": 1.5219941348973607, | |
| "grad_norm": 1.7282818347301152, | |
| "learning_rate": 3.859185297755693e-05, | |
| "loss": 0.2932, | |
| "mean_token_accuracy": 0.9137580618262291, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 1.5278592375366569, | |
| "grad_norm": 1.5338018657108714, | |
| "learning_rate": 3.857852628922751e-05, | |
| "loss": 0.2535, | |
| "mean_token_accuracy": 0.9256049245595932, | |
| "step": 261 | |
| }, | |
| { | |
| "epoch": 1.533724340175953, | |
| "grad_norm": 1.9982996544252496, | |
| "learning_rate": 3.856513942754329e-05, | |
| "loss": 0.3044, | |
| "mean_token_accuracy": 0.9065937474370003, | |
| "step": 262 | |
| }, | |
| { | |
| "epoch": 1.5395894428152492, | |
| "grad_norm": 1.6882075733622859, | |
| "learning_rate": 3.8551692441093183e-05, | |
| "loss": 0.224, | |
| "mean_token_accuracy": 0.9296763613820076, | |
| "step": 263 | |
| }, | |
| { | |
| "epoch": 1.5454545454545454, | |
| "grad_norm": 1.6550207607165734, | |
| "learning_rate": 3.85381853786843e-05, | |
| "loss": 0.4096, | |
| "mean_token_accuracy": 0.8737977668642998, | |
| "step": 264 | |
| }, | |
| { | |
| "epoch": 1.5513196480938416, | |
| "grad_norm": 1.9251868350275612, | |
| "learning_rate": 3.852461828934184e-05, | |
| "loss": 0.3813, | |
| "mean_token_accuracy": 0.8981966599822044, | |
| "step": 265 | |
| }, | |
| { | |
| "epoch": 1.5571847507331378, | |
| "grad_norm": 1.8470903871443582, | |
| "learning_rate": 3.851099122230885e-05, | |
| "loss": 0.2921, | |
| "mean_token_accuracy": 0.9101375266909599, | |
| "step": 266 | |
| }, | |
| { | |
| "epoch": 1.563049853372434, | |
| "grad_norm": 1.399781309432104, | |
| "learning_rate": 3.849730422704608e-05, | |
| "loss": 0.4117, | |
| "mean_token_accuracy": 0.8910598680377007, | |
| "step": 267 | |
| }, | |
| { | |
| "epoch": 1.5689149560117301, | |
| "grad_norm": 1.9099984044071514, | |
| "learning_rate": 3.84835573532318e-05, | |
| "loss": 0.2574, | |
| "mean_token_accuracy": 0.9176947250962257, | |
| "step": 268 | |
| }, | |
| { | |
| "epoch": 1.5747800586510263, | |
| "grad_norm": 1.6517764815366145, | |
| "learning_rate": 3.84697506507616e-05, | |
| "loss": 0.3905, | |
| "mean_token_accuracy": 0.8897673636674881, | |
| "step": 269 | |
| }, | |
| { | |
| "epoch": 1.5806451612903225, | |
| "grad_norm": 2.119009041143675, | |
| "learning_rate": 3.845588416974824e-05, | |
| "loss": 0.4224, | |
| "mean_token_accuracy": 0.8940696120262146, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 1.5865102639296187, | |
| "grad_norm": 1.9339959130871265, | |
| "learning_rate": 3.844195796052144e-05, | |
| "loss": 0.3438, | |
| "mean_token_accuracy": 0.9041855335235596, | |
| "step": 271 | |
| }, | |
| { | |
| "epoch": 1.5923753665689149, | |
| "grad_norm": 1.640513540139655, | |
| "learning_rate": 3.8427972073627724e-05, | |
| "loss": 0.4909, | |
| "mean_token_accuracy": 0.867702804505825, | |
| "step": 272 | |
| }, | |
| { | |
| "epoch": 1.598240469208211, | |
| "grad_norm": 1.9285791207634029, | |
| "learning_rate": 3.841392655983021e-05, | |
| "loss": 0.2348, | |
| "mean_token_accuracy": 0.9315564408898354, | |
| "step": 273 | |
| }, | |
| { | |
| "epoch": 1.6041055718475072, | |
| "grad_norm": 1.3040809643177875, | |
| "learning_rate": 3.8399821470108444e-05, | |
| "loss": 0.1918, | |
| "mean_token_accuracy": 0.9407966360449791, | |
| "step": 274 | |
| }, | |
| { | |
| "epoch": 1.6099706744868034, | |
| "grad_norm": 1.8717232958282308, | |
| "learning_rate": 3.838565685565819e-05, | |
| "loss": 0.4585, | |
| "mean_token_accuracy": 0.878320649266243, | |
| "step": 275 | |
| }, | |
| { | |
| "epoch": 1.6158357771260996, | |
| "grad_norm": 1.9125488044957306, | |
| "learning_rate": 3.8371432767891295e-05, | |
| "loss": 0.3633, | |
| "mean_token_accuracy": 0.9053328037261963, | |
| "step": 276 | |
| }, | |
| { | |
| "epoch": 1.6217008797653958, | |
| "grad_norm": 1.7227423517526814, | |
| "learning_rate": 3.8357149258435444e-05, | |
| "loss": 0.2805, | |
| "mean_token_accuracy": 0.919714592397213, | |
| "step": 277 | |
| }, | |
| { | |
| "epoch": 1.627565982404692, | |
| "grad_norm": 1.6693426207067774, | |
| "learning_rate": 3.8342806379134005e-05, | |
| "loss": 0.4284, | |
| "mean_token_accuracy": 0.8809763565659523, | |
| "step": 278 | |
| }, | |
| { | |
| "epoch": 1.6334310850439882, | |
| "grad_norm": 1.8397487618596386, | |
| "learning_rate": 3.8328404182045854e-05, | |
| "loss": 0.3692, | |
| "mean_token_accuracy": 0.9018078520894051, | |
| "step": 279 | |
| }, | |
| { | |
| "epoch": 1.6392961876832843, | |
| "grad_norm": 2.0516798648476002, | |
| "learning_rate": 3.831394271944512e-05, | |
| "loss": 0.3575, | |
| "mean_token_accuracy": 0.9079688414931297, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 1.6451612903225805, | |
| "grad_norm": 1.7909788776442834, | |
| "learning_rate": 3.82994220438211e-05, | |
| "loss": 0.3413, | |
| "mean_token_accuracy": 0.8984247669577599, | |
| "step": 281 | |
| }, | |
| { | |
| "epoch": 1.6510263929618767, | |
| "grad_norm": 1.907207226886229, | |
| "learning_rate": 3.828484220787797e-05, | |
| "loss": 0.3581, | |
| "mean_token_accuracy": 0.899262860417366, | |
| "step": 282 | |
| }, | |
| { | |
| "epoch": 1.6568914956011729, | |
| "grad_norm": 1.9715579318688201, | |
| "learning_rate": 3.8270203264534644e-05, | |
| "loss": 0.4442, | |
| "mean_token_accuracy": 0.8822287917137146, | |
| "step": 283 | |
| }, | |
| { | |
| "epoch": 1.662756598240469, | |
| "grad_norm": 1.6457253264182927, | |
| "learning_rate": 3.8255505266924585e-05, | |
| "loss": 0.3476, | |
| "mean_token_accuracy": 0.8972783535718918, | |
| "step": 284 | |
| }, | |
| { | |
| "epoch": 1.6686217008797652, | |
| "grad_norm": 1.4827534514844065, | |
| "learning_rate": 3.824074826839557e-05, | |
| "loss": 0.276, | |
| "mean_token_accuracy": 0.9249418452382088, | |
| "step": 285 | |
| }, | |
| { | |
| "epoch": 1.6744868035190614, | |
| "grad_norm": 2.175526808689669, | |
| "learning_rate": 3.822593232250956e-05, | |
| "loss": 0.4903, | |
| "mean_token_accuracy": 0.8697306215763092, | |
| "step": 286 | |
| }, | |
| { | |
| "epoch": 1.6803519061583576, | |
| "grad_norm": 2.021289404237392, | |
| "learning_rate": 3.8211057483042446e-05, | |
| "loss": 0.504, | |
| "mean_token_accuracy": 0.8762414082884789, | |
| "step": 287 | |
| }, | |
| { | |
| "epoch": 1.6862170087976538, | |
| "grad_norm": 1.7314992545258114, | |
| "learning_rate": 3.8196123803983895e-05, | |
| "loss": 0.3522, | |
| "mean_token_accuracy": 0.8997413292527199, | |
| "step": 288 | |
| }, | |
| { | |
| "epoch": 1.6920821114369502, | |
| "grad_norm": 1.8603560940766846, | |
| "learning_rate": 3.818113133953712e-05, | |
| "loss": 0.3271, | |
| "mean_token_accuracy": 0.9100041314959526, | |
| "step": 289 | |
| }, | |
| { | |
| "epoch": 1.6979472140762464, | |
| "grad_norm": 1.4966872907556552, | |
| "learning_rate": 3.816608014411872e-05, | |
| "loss": 0.2537, | |
| "mean_token_accuracy": 0.9245190322399139, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 1.7038123167155426, | |
| "grad_norm": 1.603750263252283, | |
| "learning_rate": 3.815097027235845e-05, | |
| "loss": 0.3494, | |
| "mean_token_accuracy": 0.8979176208376884, | |
| "step": 291 | |
| }, | |
| { | |
| "epoch": 1.7096774193548387, | |
| "grad_norm": 1.8283044408892475, | |
| "learning_rate": 3.813580177909906e-05, | |
| "loss": 0.2734, | |
| "mean_token_accuracy": 0.9121185839176178, | |
| "step": 292 | |
| }, | |
| { | |
| "epoch": 1.715542521994135, | |
| "grad_norm": 1.3502432363899728, | |
| "learning_rate": 3.8120574719396023e-05, | |
| "loss": 0.2877, | |
| "mean_token_accuracy": 0.9239327982068062, | |
| "step": 293 | |
| }, | |
| { | |
| "epoch": 1.721407624633431, | |
| "grad_norm": 2.027539954165466, | |
| "learning_rate": 3.810528914851745e-05, | |
| "loss": 0.4155, | |
| "mean_token_accuracy": 0.8827510923147202, | |
| "step": 294 | |
| }, | |
| { | |
| "epoch": 1.7272727272727273, | |
| "grad_norm": 1.7186579918501383, | |
| "learning_rate": 3.808994512194376e-05, | |
| "loss": 0.3796, | |
| "mean_token_accuracy": 0.8875857517123222, | |
| "step": 295 | |
| }, | |
| { | |
| "epoch": 1.7331378299120235, | |
| "grad_norm": 1.813081462480938, | |
| "learning_rate": 3.807454269536758e-05, | |
| "loss": 0.3981, | |
| "mean_token_accuracy": 0.8904116526246071, | |
| "step": 296 | |
| }, | |
| { | |
| "epoch": 1.7390029325513197, | |
| "grad_norm": 1.703513451013957, | |
| "learning_rate": 3.805908192469351e-05, | |
| "loss": 0.2559, | |
| "mean_token_accuracy": 0.9196176305413246, | |
| "step": 297 | |
| }, | |
| { | |
| "epoch": 1.7448680351906158, | |
| "grad_norm": 1.5203584075310517, | |
| "learning_rate": 3.80435628660379e-05, | |
| "loss": 0.3658, | |
| "mean_token_accuracy": 0.8972667753696442, | |
| "step": 298 | |
| }, | |
| { | |
| "epoch": 1.750733137829912, | |
| "grad_norm": 1.4984555872298189, | |
| "learning_rate": 3.802798557572867e-05, | |
| "loss": 0.361, | |
| "mean_token_accuracy": 0.8996458873152733, | |
| "step": 299 | |
| }, | |
| { | |
| "epoch": 1.7565982404692082, | |
| "grad_norm": 1.7784418749764175, | |
| "learning_rate": 3.801235011030506e-05, | |
| "loss": 0.349, | |
| "mean_token_accuracy": 0.9058357775211334, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 1.7624633431085044, | |
| "grad_norm": 1.6134415362903494, | |
| "learning_rate": 3.799665652651754e-05, | |
| "loss": 0.2183, | |
| "mean_token_accuracy": 0.9335889741778374, | |
| "step": 301 | |
| }, | |
| { | |
| "epoch": 1.7683284457478006, | |
| "grad_norm": 1.41824274486332, | |
| "learning_rate": 3.7980904881327446e-05, | |
| "loss": 0.2875, | |
| "mean_token_accuracy": 0.9194483831524849, | |
| "step": 302 | |
| }, | |
| { | |
| "epoch": 1.7741935483870968, | |
| "grad_norm": 1.6306057175263466, | |
| "learning_rate": 3.796509523190691e-05, | |
| "loss": 0.3065, | |
| "mean_token_accuracy": 0.9074099063873291, | |
| "step": 303 | |
| }, | |
| { | |
| "epoch": 1.780058651026393, | |
| "grad_norm": 1.3490534339890907, | |
| "learning_rate": 3.794922763563857e-05, | |
| "loss": 0.249, | |
| "mean_token_accuracy": 0.9284609705209732, | |
| "step": 304 | |
| }, | |
| { | |
| "epoch": 1.7859237536656891, | |
| "grad_norm": 1.800371852919932, | |
| "learning_rate": 3.793330215011538e-05, | |
| "loss": 0.3353, | |
| "mean_token_accuracy": 0.9221402704715729, | |
| "step": 305 | |
| }, | |
| { | |
| "epoch": 1.7917888563049853, | |
| "grad_norm": 1.6498676734395081, | |
| "learning_rate": 3.791731883314043e-05, | |
| "loss": 0.3088, | |
| "mean_token_accuracy": 0.9110808372497559, | |
| "step": 306 | |
| }, | |
| { | |
| "epoch": 1.7976539589442815, | |
| "grad_norm": 1.7165297415750906, | |
| "learning_rate": 3.790127774272671e-05, | |
| "loss": 0.2658, | |
| "mean_token_accuracy": 0.9218085780739784, | |
| "step": 307 | |
| }, | |
| { | |
| "epoch": 1.8035190615835777, | |
| "grad_norm": 1.5710559635899641, | |
| "learning_rate": 3.7885178937096884e-05, | |
| "loss": 0.4084, | |
| "mean_token_accuracy": 0.8922935128211975, | |
| "step": 308 | |
| }, | |
| { | |
| "epoch": 1.8093841642228738, | |
| "grad_norm": 1.7141003007560607, | |
| "learning_rate": 3.7869022474683125e-05, | |
| "loss": 0.4542, | |
| "mean_token_accuracy": 0.8909810185432434, | |
| "step": 309 | |
| }, | |
| { | |
| "epoch": 1.8152492668621703, | |
| "grad_norm": 2.1651975563574406, | |
| "learning_rate": 3.7852808414126876e-05, | |
| "loss": 0.401, | |
| "mean_token_accuracy": 0.8917931020259857, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 1.8211143695014664, | |
| "grad_norm": 1.5322855424185167, | |
| "learning_rate": 3.783653681427861e-05, | |
| "loss": 0.2438, | |
| "mean_token_accuracy": 0.9244765937328339, | |
| "step": 311 | |
| }, | |
| { | |
| "epoch": 1.8269794721407626, | |
| "grad_norm": 1.5846288322829667, | |
| "learning_rate": 3.7820207734197676e-05, | |
| "loss": 0.3526, | |
| "mean_token_accuracy": 0.9049504473805428, | |
| "step": 312 | |
| }, | |
| { | |
| "epoch": 1.8328445747800588, | |
| "grad_norm": 1.689626483443562, | |
| "learning_rate": 3.780382123315203e-05, | |
| "loss": 0.2199, | |
| "mean_token_accuracy": 0.9369524195790291, | |
| "step": 313 | |
| }, | |
| { | |
| "epoch": 1.838709677419355, | |
| "grad_norm": 1.5158081896163202, | |
| "learning_rate": 3.778737737061807e-05, | |
| "loss": 0.3378, | |
| "mean_token_accuracy": 0.9043772518634796, | |
| "step": 314 | |
| }, | |
| { | |
| "epoch": 1.8445747800586512, | |
| "grad_norm": 1.550058792883986, | |
| "learning_rate": 3.777087620628035e-05, | |
| "loss": 0.2673, | |
| "mean_token_accuracy": 0.9279494360089302, | |
| "step": 315 | |
| }, | |
| { | |
| "epoch": 1.8504398826979473, | |
| "grad_norm": 1.6597960206898577, | |
| "learning_rate": 3.775431780003145e-05, | |
| "loss": 0.2462, | |
| "mean_token_accuracy": 0.9318157806992531, | |
| "step": 316 | |
| }, | |
| { | |
| "epoch": 1.8563049853372435, | |
| "grad_norm": 1.4622011187254726, | |
| "learning_rate": 3.7737702211971684e-05, | |
| "loss": 0.2772, | |
| "mean_token_accuracy": 0.929963655769825, | |
| "step": 317 | |
| }, | |
| { | |
| "epoch": 1.8621700879765397, | |
| "grad_norm": 1.5151732843103733, | |
| "learning_rate": 3.772102950240895e-05, | |
| "loss": 0.2679, | |
| "mean_token_accuracy": 0.931287556886673, | |
| "step": 318 | |
| }, | |
| { | |
| "epoch": 1.868035190615836, | |
| "grad_norm": 1.5891712041281862, | |
| "learning_rate": 3.770429973185842e-05, | |
| "loss": 0.3341, | |
| "mean_token_accuracy": 0.9078937619924545, | |
| "step": 319 | |
| }, | |
| { | |
| "epoch": 1.873900293255132, | |
| "grad_norm": 1.8744855927836221, | |
| "learning_rate": 3.768751296104243e-05, | |
| "loss": 0.2325, | |
| "mean_token_accuracy": 0.9263536930084229, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 1.8797653958944283, | |
| "grad_norm": 1.2684553166509498, | |
| "learning_rate": 3.767066925089017e-05, | |
| "loss": 0.3273, | |
| "mean_token_accuracy": 0.9072131887078285, | |
| "step": 321 | |
| }, | |
| { | |
| "epoch": 1.8856304985337244, | |
| "grad_norm": 1.5776419156712478, | |
| "learning_rate": 3.765376866253749e-05, | |
| "loss": 0.2212, | |
| "mean_token_accuracy": 0.9262972101569176, | |
| "step": 322 | |
| }, | |
| { | |
| "epoch": 1.8914956011730206, | |
| "grad_norm": 1.5061879986087867, | |
| "learning_rate": 3.763681125732672e-05, | |
| "loss": 0.2883, | |
| "mean_token_accuracy": 0.9100302383303642, | |
| "step": 323 | |
| }, | |
| { | |
| "epoch": 1.8973607038123168, | |
| "grad_norm": 1.6131493853591912, | |
| "learning_rate": 3.7619797096806386e-05, | |
| "loss": 0.322, | |
| "mean_token_accuracy": 0.9105613902211189, | |
| "step": 324 | |
| }, | |
| { | |
| "epoch": 1.903225806451613, | |
| "grad_norm": 1.563416189435815, | |
| "learning_rate": 3.7602726242731016e-05, | |
| "loss": 0.3532, | |
| "mean_token_accuracy": 0.9058608040213585, | |
| "step": 325 | |
| }, | |
| { | |
| "epoch": 1.9090909090909092, | |
| "grad_norm": 1.600675087277008, | |
| "learning_rate": 3.758559875706092e-05, | |
| "loss": 0.2515, | |
| "mean_token_accuracy": 0.9262616038322449, | |
| "step": 326 | |
| }, | |
| { | |
| "epoch": 1.9149560117302054, | |
| "grad_norm": 1.4397417416715703, | |
| "learning_rate": 3.756841470196195e-05, | |
| "loss": 0.3584, | |
| "mean_token_accuracy": 0.9060768559575081, | |
| "step": 327 | |
| }, | |
| { | |
| "epoch": 1.9208211143695015, | |
| "grad_norm": 1.6091207177650233, | |
| "learning_rate": 3.7551174139805284e-05, | |
| "loss": 0.3848, | |
| "mean_token_accuracy": 0.8916498497128487, | |
| "step": 328 | |
| }, | |
| { | |
| "epoch": 1.9266862170087977, | |
| "grad_norm": 1.8405019636747775, | |
| "learning_rate": 3.75338771331672e-05, | |
| "loss": 0.3563, | |
| "mean_token_accuracy": 0.8986505270004272, | |
| "step": 329 | |
| }, | |
| { | |
| "epoch": 1.932551319648094, | |
| "grad_norm": 1.6744591407188665, | |
| "learning_rate": 3.7516523744828856e-05, | |
| "loss": 0.4076, | |
| "mean_token_accuracy": 0.8903002217411995, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 1.93841642228739, | |
| "grad_norm": 1.7844176464926413, | |
| "learning_rate": 3.7499114037776036e-05, | |
| "loss": 0.3237, | |
| "mean_token_accuracy": 0.8996743708848953, | |
| "step": 331 | |
| }, | |
| { | |
| "epoch": 1.9442815249266863, | |
| "grad_norm": 1.892031074058152, | |
| "learning_rate": 3.748164807519894e-05, | |
| "loss": 0.4614, | |
| "mean_token_accuracy": 0.8785504251718521, | |
| "step": 332 | |
| }, | |
| { | |
| "epoch": 1.9501466275659824, | |
| "grad_norm": 1.9529591014745988, | |
| "learning_rate": 3.746412592049197e-05, | |
| "loss": 0.3317, | |
| "mean_token_accuracy": 0.9095030203461647, | |
| "step": 333 | |
| }, | |
| { | |
| "epoch": 1.9560117302052786, | |
| "grad_norm": 1.3361876397839005, | |
| "learning_rate": 3.7446547637253464e-05, | |
| "loss": 0.2158, | |
| "mean_token_accuracy": 0.9386293217539787, | |
| "step": 334 | |
| }, | |
| { | |
| "epoch": 1.9618768328445748, | |
| "grad_norm": 1.5896606700284657, | |
| "learning_rate": 3.742891328928549e-05, | |
| "loss": 0.3072, | |
| "mean_token_accuracy": 0.9178111851215363, | |
| "step": 335 | |
| }, | |
| { | |
| "epoch": 1.967741935483871, | |
| "grad_norm": 1.5606940372882383, | |
| "learning_rate": 3.74112229405936e-05, | |
| "loss": 0.316, | |
| "mean_token_accuracy": 0.9071382880210876, | |
| "step": 336 | |
| }, | |
| { | |
| "epoch": 1.9736070381231672, | |
| "grad_norm": 1.6245220422058784, | |
| "learning_rate": 3.739347665538664e-05, | |
| "loss": 0.3286, | |
| "mean_token_accuracy": 0.9116719961166382, | |
| "step": 337 | |
| }, | |
| { | |
| "epoch": 1.9794721407624634, | |
| "grad_norm": 1.8517780202665928, | |
| "learning_rate": 3.7375674498076445e-05, | |
| "loss": 0.3867, | |
| "mean_token_accuracy": 0.8904563412070274, | |
| "step": 338 | |
| }, | |
| { | |
| "epoch": 1.9853372434017595, | |
| "grad_norm": 1.7036447018124885, | |
| "learning_rate": 3.7357816533277646e-05, | |
| "loss": 0.3053, | |
| "mean_token_accuracy": 0.9213255569338799, | |
| "step": 339 | |
| }, | |
| { | |
| "epoch": 1.9912023460410557, | |
| "grad_norm": 1.404790254341481, | |
| "learning_rate": 3.733990282580745e-05, | |
| "loss": 0.3073, | |
| "mean_token_accuracy": 0.90599025785923, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 1.997067448680352, | |
| "grad_norm": 1.7871327162492117, | |
| "learning_rate": 3.732193344068539e-05, | |
| "loss": 0.3151, | |
| "mean_token_accuracy": 0.9118079468607903, | |
| "step": 341 | |
| }, | |
| { | |
| "epoch": 2.0, | |
| "grad_norm": 2.4692118190159413, | |
| "learning_rate": 3.7303908443133054e-05, | |
| "loss": 0.2212, | |
| "mean_token_accuracy": 0.9288553893566132, | |
| "step": 342 | |
| }, | |
| { | |
| "epoch": 2.005865102639296, | |
| "grad_norm": 1.476624660273671, | |
| "learning_rate": 3.728582789857393e-05, | |
| "loss": 0.2001, | |
| "mean_token_accuracy": 0.9483243823051453, | |
| "step": 343 | |
| }, | |
| { | |
| "epoch": 2.0117302052785924, | |
| "grad_norm": 1.503761647923481, | |
| "learning_rate": 3.726769187263308e-05, | |
| "loss": 0.2274, | |
| "mean_token_accuracy": 0.9308255985379219, | |
| "step": 344 | |
| }, | |
| { | |
| "epoch": 2.0175953079178885, | |
| "grad_norm": 1.1442273526640767, | |
| "learning_rate": 3.724950043113695e-05, | |
| "loss": 0.1528, | |
| "mean_token_accuracy": 0.9552865400910378, | |
| "step": 345 | |
| }, | |
| { | |
| "epoch": 2.0234604105571847, | |
| "grad_norm": 1.1676585793324792, | |
| "learning_rate": 3.723125364011313e-05, | |
| "loss": 0.1401, | |
| "mean_token_accuracy": 0.959092803299427, | |
| "step": 346 | |
| }, | |
| { | |
| "epoch": 2.029325513196481, | |
| "grad_norm": 1.4179806604734995, | |
| "learning_rate": 3.7212951565790094e-05, | |
| "loss": 0.1724, | |
| "mean_token_accuracy": 0.9462388530373573, | |
| "step": 347 | |
| }, | |
| { | |
| "epoch": 2.035190615835777, | |
| "grad_norm": 1.3565931025276359, | |
| "learning_rate": 3.7194594274597e-05, | |
| "loss": 0.1713, | |
| "mean_token_accuracy": 0.9462545812129974, | |
| "step": 348 | |
| }, | |
| { | |
| "epoch": 2.0410557184750733, | |
| "grad_norm": 2.0100162506170856, | |
| "learning_rate": 3.7176181833163385e-05, | |
| "loss": 0.214, | |
| "mean_token_accuracy": 0.9363678023219109, | |
| "step": 349 | |
| }, | |
| { | |
| "epoch": 2.0469208211143695, | |
| "grad_norm": 1.574439267468769, | |
| "learning_rate": 3.7157714308318966e-05, | |
| "loss": 0.1768, | |
| "mean_token_accuracy": 0.9480359777808189, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 2.0527859237536656, | |
| "grad_norm": 1.4723059029662238, | |
| "learning_rate": 3.713919176709343e-05, | |
| "loss": 0.2094, | |
| "mean_token_accuracy": 0.942324809730053, | |
| "step": 351 | |
| }, | |
| { | |
| "epoch": 2.058651026392962, | |
| "grad_norm": 1.5342137566871517, | |
| "learning_rate": 3.712061427671609e-05, | |
| "loss": 0.1612, | |
| "mean_token_accuracy": 0.9546240866184235, | |
| "step": 352 | |
| }, | |
| { | |
| "epoch": 2.064516129032258, | |
| "grad_norm": 1.468762510345496, | |
| "learning_rate": 3.710198190461575e-05, | |
| "loss": 0.1893, | |
| "mean_token_accuracy": 0.9487204700708389, | |
| "step": 353 | |
| }, | |
| { | |
| "epoch": 2.070381231671554, | |
| "grad_norm": 1.4106607901877921, | |
| "learning_rate": 3.7083294718420394e-05, | |
| "loss": 0.1829, | |
| "mean_token_accuracy": 0.9452722370624542, | |
| "step": 354 | |
| }, | |
| { | |
| "epoch": 2.0762463343108504, | |
| "grad_norm": 1.5971490830702397, | |
| "learning_rate": 3.706455278595696e-05, | |
| "loss": 0.1826, | |
| "mean_token_accuracy": 0.9438930004835129, | |
| "step": 355 | |
| }, | |
| { | |
| "epoch": 2.0821114369501466, | |
| "grad_norm": 1.4328050289867758, | |
| "learning_rate": 3.7045756175251086e-05, | |
| "loss": 0.1757, | |
| "mean_token_accuracy": 0.9502169340848923, | |
| "step": 356 | |
| }, | |
| { | |
| "epoch": 2.0879765395894427, | |
| "grad_norm": 1.3226133601725745, | |
| "learning_rate": 3.7026904954526884e-05, | |
| "loss": 0.1589, | |
| "mean_token_accuracy": 0.949699193239212, | |
| "step": 357 | |
| }, | |
| { | |
| "epoch": 2.093841642228739, | |
| "grad_norm": 1.302704971311971, | |
| "learning_rate": 3.7007999192206676e-05, | |
| "loss": 0.1498, | |
| "mean_token_accuracy": 0.955219678580761, | |
| "step": 358 | |
| }, | |
| { | |
| "epoch": 2.099706744868035, | |
| "grad_norm": 1.3221596689539326, | |
| "learning_rate": 3.698903895691073e-05, | |
| "loss": 0.1894, | |
| "mean_token_accuracy": 0.9378139674663544, | |
| "step": 359 | |
| }, | |
| { | |
| "epoch": 2.1055718475073313, | |
| "grad_norm": 1.5050554126622395, | |
| "learning_rate": 3.697002431745706e-05, | |
| "loss": 0.1808, | |
| "mean_token_accuracy": 0.942346952855587, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 2.1114369501466275, | |
| "grad_norm": 1.5953279006691845, | |
| "learning_rate": 3.695095534286111e-05, | |
| "loss": 0.2218, | |
| "mean_token_accuracy": 0.9413799494504929, | |
| "step": 361 | |
| }, | |
| { | |
| "epoch": 2.1173020527859236, | |
| "grad_norm": 1.4771134232796863, | |
| "learning_rate": 3.693183210233557e-05, | |
| "loss": 0.189, | |
| "mean_token_accuracy": 0.9463675022125244, | |
| "step": 362 | |
| }, | |
| { | |
| "epoch": 2.12316715542522, | |
| "grad_norm": 1.38647866538845, | |
| "learning_rate": 3.691265466529007e-05, | |
| "loss": 0.1821, | |
| "mean_token_accuracy": 0.9421042799949646, | |
| "step": 363 | |
| }, | |
| { | |
| "epoch": 2.129032258064516, | |
| "grad_norm": 1.362873233806283, | |
| "learning_rate": 3.689342310133097e-05, | |
| "loss": 0.1592, | |
| "mean_token_accuracy": 0.9517010152339935, | |
| "step": 364 | |
| }, | |
| { | |
| "epoch": 2.134897360703812, | |
| "grad_norm": 1.4294148599496252, | |
| "learning_rate": 3.687413748026108e-05, | |
| "loss": 0.1951, | |
| "mean_token_accuracy": 0.9434895291924477, | |
| "step": 365 | |
| }, | |
| { | |
| "epoch": 2.1407624633431084, | |
| "grad_norm": 1.3167525955804085, | |
| "learning_rate": 3.68547978720794e-05, | |
| "loss": 0.1779, | |
| "mean_token_accuracy": 0.9462132379412651, | |
| "step": 366 | |
| }, | |
| { | |
| "epoch": 2.1466275659824046, | |
| "grad_norm": 1.334685880079026, | |
| "learning_rate": 3.683540434698093e-05, | |
| "loss": 0.1653, | |
| "mean_token_accuracy": 0.9485151618719101, | |
| "step": 367 | |
| }, | |
| { | |
| "epoch": 2.1524926686217007, | |
| "grad_norm": 1.1362560129422759, | |
| "learning_rate": 3.681595697535629e-05, | |
| "loss": 0.1431, | |
| "mean_token_accuracy": 0.9614577367901802, | |
| "step": 368 | |
| }, | |
| { | |
| "epoch": 2.158357771260997, | |
| "grad_norm": 1.2719092502053326, | |
| "learning_rate": 3.6796455827791614e-05, | |
| "loss": 0.1641, | |
| "mean_token_accuracy": 0.9526363462209702, | |
| "step": 369 | |
| }, | |
| { | |
| "epoch": 2.164222873900293, | |
| "grad_norm": 1.4561156580126304, | |
| "learning_rate": 3.677690097506819e-05, | |
| "loss": 0.1771, | |
| "mean_token_accuracy": 0.9508548080921173, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 2.1700879765395893, | |
| "grad_norm": 1.355167178954715, | |
| "learning_rate": 3.6757292488162224e-05, | |
| "loss": 0.1812, | |
| "mean_token_accuracy": 0.9459526911377907, | |
| "step": 371 | |
| }, | |
| { | |
| "epoch": 2.1759530791788855, | |
| "grad_norm": 1.4755420760216644, | |
| "learning_rate": 3.673763043824461e-05, | |
| "loss": 0.2183, | |
| "mean_token_accuracy": 0.9343624264001846, | |
| "step": 372 | |
| }, | |
| { | |
| "epoch": 2.1818181818181817, | |
| "grad_norm": 1.4730527980876331, | |
| "learning_rate": 3.671791489668065e-05, | |
| "loss": 0.1958, | |
| "mean_token_accuracy": 0.9429437816143036, | |
| "step": 373 | |
| }, | |
| { | |
| "epoch": 2.187683284457478, | |
| "grad_norm": 1.1714314133219133, | |
| "learning_rate": 3.6698145935029794e-05, | |
| "loss": 0.1445, | |
| "mean_token_accuracy": 0.9535733610391617, | |
| "step": 374 | |
| }, | |
| { | |
| "epoch": 2.193548387096774, | |
| "grad_norm": 1.2427514759243454, | |
| "learning_rate": 3.66783236250454e-05, | |
| "loss": 0.1687, | |
| "mean_token_accuracy": 0.946289099752903, | |
| "step": 375 | |
| }, | |
| { | |
| "epoch": 2.19941348973607, | |
| "grad_norm": 1.475415652426242, | |
| "learning_rate": 3.665844803867443e-05, | |
| "loss": 0.2129, | |
| "mean_token_accuracy": 0.9403799325227737, | |
| "step": 376 | |
| }, | |
| { | |
| "epoch": 2.2052785923753664, | |
| "grad_norm": 1.428248135712194, | |
| "learning_rate": 3.663851924805725e-05, | |
| "loss": 0.1989, | |
| "mean_token_accuracy": 0.9408678263425827, | |
| "step": 377 | |
| }, | |
| { | |
| "epoch": 2.2111436950146626, | |
| "grad_norm": 1.4533859141194323, | |
| "learning_rate": 3.66185373255273e-05, | |
| "loss": 0.2005, | |
| "mean_token_accuracy": 0.9400480091571808, | |
| "step": 378 | |
| }, | |
| { | |
| "epoch": 2.2170087976539588, | |
| "grad_norm": 1.2568428196886394, | |
| "learning_rate": 3.6598502343610906e-05, | |
| "loss": 0.1588, | |
| "mean_token_accuracy": 0.9535493031144142, | |
| "step": 379 | |
| }, | |
| { | |
| "epoch": 2.222873900293255, | |
| "grad_norm": 1.7251468035919506, | |
| "learning_rate": 3.657841437502697e-05, | |
| "loss": 0.2436, | |
| "mean_token_accuracy": 0.9211764186620712, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 2.228739002932551, | |
| "grad_norm": 1.563747478750116, | |
| "learning_rate": 3.6558273492686686e-05, | |
| "loss": 0.1925, | |
| "mean_token_accuracy": 0.9419304430484772, | |
| "step": 381 | |
| }, | |
| { | |
| "epoch": 2.2346041055718473, | |
| "grad_norm": 1.3371515680969543, | |
| "learning_rate": 3.6538079769693334e-05, | |
| "loss": 0.1764, | |
| "mean_token_accuracy": 0.9459211006760597, | |
| "step": 382 | |
| }, | |
| { | |
| "epoch": 2.2404692082111435, | |
| "grad_norm": 1.1577027770973363, | |
| "learning_rate": 3.6517833279341954e-05, | |
| "loss": 0.144, | |
| "mean_token_accuracy": 0.9560522958636284, | |
| "step": 383 | |
| }, | |
| { | |
| "epoch": 2.2463343108504397, | |
| "grad_norm": 1.1804473461854166, | |
| "learning_rate": 3.649753409511916e-05, | |
| "loss": 0.1422, | |
| "mean_token_accuracy": 0.9586736932396889, | |
| "step": 384 | |
| }, | |
| { | |
| "epoch": 2.252199413489736, | |
| "grad_norm": 1.5316777495144922, | |
| "learning_rate": 3.6477182290702766e-05, | |
| "loss": 0.2218, | |
| "mean_token_accuracy": 0.933185450732708, | |
| "step": 385 | |
| }, | |
| { | |
| "epoch": 2.258064516129032, | |
| "grad_norm": 1.4337722711894096, | |
| "learning_rate": 3.645677793996161e-05, | |
| "loss": 0.1983, | |
| "mean_token_accuracy": 0.9400441944599152, | |
| "step": 386 | |
| }, | |
| { | |
| "epoch": 2.263929618768328, | |
| "grad_norm": 1.3355048384788264, | |
| "learning_rate": 3.643632111695525e-05, | |
| "loss": 0.1835, | |
| "mean_token_accuracy": 0.9445948526263237, | |
| "step": 387 | |
| }, | |
| { | |
| "epoch": 2.2697947214076244, | |
| "grad_norm": 1.4116132523940614, | |
| "learning_rate": 3.6415811895933685e-05, | |
| "loss": 0.1903, | |
| "mean_token_accuracy": 0.9421262443065643, | |
| "step": 388 | |
| }, | |
| { | |
| "epoch": 2.2756598240469206, | |
| "grad_norm": 1.1489874771022681, | |
| "learning_rate": 3.639525035133712e-05, | |
| "loss": 0.1434, | |
| "mean_token_accuracy": 0.9581286609172821, | |
| "step": 389 | |
| }, | |
| { | |
| "epoch": 2.281524926686217, | |
| "grad_norm": 1.5443270506707967, | |
| "learning_rate": 3.637463655779563e-05, | |
| "loss": 0.2251, | |
| "mean_token_accuracy": 0.937995620071888, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 2.2873900293255134, | |
| "grad_norm": 1.3016122311026777, | |
| "learning_rate": 3.6353970590128975e-05, | |
| "loss": 0.1619, | |
| "mean_token_accuracy": 0.9526982828974724, | |
| "step": 391 | |
| }, | |
| { | |
| "epoch": 2.2932551319648096, | |
| "grad_norm": 1.1825803045225631, | |
| "learning_rate": 3.633325252334628e-05, | |
| "loss": 0.1659, | |
| "mean_token_accuracy": 0.9432104825973511, | |
| "step": 392 | |
| }, | |
| { | |
| "epoch": 2.2991202346041058, | |
| "grad_norm": 1.4653068378575294, | |
| "learning_rate": 3.6312482432645746e-05, | |
| "loss": 0.2015, | |
| "mean_token_accuracy": 0.9372165873646736, | |
| "step": 393 | |
| }, | |
| { | |
| "epoch": 2.304985337243402, | |
| "grad_norm": 1.3865139735176901, | |
| "learning_rate": 3.6291660393414414e-05, | |
| "loss": 0.1633, | |
| "mean_token_accuracy": 0.9534904658794403, | |
| "step": 394 | |
| }, | |
| { | |
| "epoch": 2.310850439882698, | |
| "grad_norm": 1.332181176107831, | |
| "learning_rate": 3.6270786481227885e-05, | |
| "loss": 0.1759, | |
| "mean_token_accuracy": 0.9474803134799004, | |
| "step": 395 | |
| }, | |
| { | |
| "epoch": 2.3167155425219943, | |
| "grad_norm": 1.3868946979168462, | |
| "learning_rate": 3.624986077185003e-05, | |
| "loss": 0.1887, | |
| "mean_token_accuracy": 0.9452441483736038, | |
| "step": 396 | |
| }, | |
| { | |
| "epoch": 2.3225806451612905, | |
| "grad_norm": 1.1266308673977825, | |
| "learning_rate": 3.622888334123272e-05, | |
| "loss": 0.1576, | |
| "mean_token_accuracy": 0.958634540438652, | |
| "step": 397 | |
| }, | |
| { | |
| "epoch": 2.3284457478005867, | |
| "grad_norm": 1.1583104865991503, | |
| "learning_rate": 3.620785426551555e-05, | |
| "loss": 0.1652, | |
| "mean_token_accuracy": 0.9511990696191788, | |
| "step": 398 | |
| }, | |
| { | |
| "epoch": 2.334310850439883, | |
| "grad_norm": 1.290128290927979, | |
| "learning_rate": 3.618677362102558e-05, | |
| "loss": 0.1492, | |
| "mean_token_accuracy": 0.9547437205910683, | |
| "step": 399 | |
| }, | |
| { | |
| "epoch": 2.340175953079179, | |
| "grad_norm": 1.5975208805547034, | |
| "learning_rate": 3.616564148427703e-05, | |
| "loss": 0.1867, | |
| "mean_token_accuracy": 0.9401877969503403, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 2.346041055718475, | |
| "grad_norm": 1.4029390907999495, | |
| "learning_rate": 3.614445793197103e-05, | |
| "loss": 0.1675, | |
| "mean_token_accuracy": 0.9511738941073418, | |
| "step": 401 | |
| }, | |
| { | |
| "epoch": 2.3519061583577714, | |
| "grad_norm": 1.5974257452922849, | |
| "learning_rate": 3.61232230409953e-05, | |
| "loss": 0.176, | |
| "mean_token_accuracy": 0.9497552961111069, | |
| "step": 402 | |
| }, | |
| { | |
| "epoch": 2.3577712609970676, | |
| "grad_norm": 1.530210607178967, | |
| "learning_rate": 3.6101936888423936e-05, | |
| "loss": 0.1917, | |
| "mean_token_accuracy": 0.9507608339190483, | |
| "step": 403 | |
| }, | |
| { | |
| "epoch": 2.3636363636363638, | |
| "grad_norm": 2.509006214983105, | |
| "learning_rate": 3.6080599551517076e-05, | |
| "loss": 0.1982, | |
| "mean_token_accuracy": 0.9404582604765892, | |
| "step": 404 | |
| }, | |
| { | |
| "epoch": 2.36950146627566, | |
| "grad_norm": 1.5627934606392213, | |
| "learning_rate": 3.605921110772063e-05, | |
| "loss": 0.2118, | |
| "mean_token_accuracy": 0.9388338625431061, | |
| "step": 405 | |
| }, | |
| { | |
| "epoch": 2.375366568914956, | |
| "grad_norm": 1.3580342157157053, | |
| "learning_rate": 3.603777163466601e-05, | |
| "loss": 0.1774, | |
| "mean_token_accuracy": 0.9472236335277557, | |
| "step": 406 | |
| }, | |
| { | |
| "epoch": 2.3812316715542523, | |
| "grad_norm": 1.3490592728792703, | |
| "learning_rate": 3.6016281210169844e-05, | |
| "loss": 0.1825, | |
| "mean_token_accuracy": 0.94171442091465, | |
| "step": 407 | |
| }, | |
| { | |
| "epoch": 2.3870967741935485, | |
| "grad_norm": 1.3544855722262408, | |
| "learning_rate": 3.599473991223369e-05, | |
| "loss": 0.1729, | |
| "mean_token_accuracy": 0.9484469667077065, | |
| "step": 408 | |
| }, | |
| { | |
| "epoch": 2.3929618768328447, | |
| "grad_norm": 1.380080677011728, | |
| "learning_rate": 3.5973147819043765e-05, | |
| "loss": 0.1988, | |
| "mean_token_accuracy": 0.9358495697379112, | |
| "step": 409 | |
| }, | |
| { | |
| "epoch": 2.398826979472141, | |
| "grad_norm": 1.5440916647504026, | |
| "learning_rate": 3.595150500897065e-05, | |
| "loss": 0.215, | |
| "mean_token_accuracy": 0.9350427612662315, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 2.404692082111437, | |
| "grad_norm": 1.27541434637295, | |
| "learning_rate": 3.5929811560569e-05, | |
| "loss": 0.1812, | |
| "mean_token_accuracy": 0.9499657303094864, | |
| "step": 411 | |
| }, | |
| { | |
| "epoch": 2.410557184750733, | |
| "grad_norm": 1.198812240028181, | |
| "learning_rate": 3.590806755257726e-05, | |
| "loss": 0.1519, | |
| "mean_token_accuracy": 0.9496930539608002, | |
| "step": 412 | |
| }, | |
| { | |
| "epoch": 2.4164222873900294, | |
| "grad_norm": 1.1424990901176593, | |
| "learning_rate": 3.5886273063917426e-05, | |
| "loss": 0.161, | |
| "mean_token_accuracy": 0.9473123177886009, | |
| "step": 413 | |
| }, | |
| { | |
| "epoch": 2.4222873900293256, | |
| "grad_norm": 1.4649749934528322, | |
| "learning_rate": 3.586442817369467e-05, | |
| "loss": 0.2064, | |
| "mean_token_accuracy": 0.9354598671197891, | |
| "step": 414 | |
| }, | |
| { | |
| "epoch": 2.4281524926686218, | |
| "grad_norm": 1.1474037121371046, | |
| "learning_rate": 3.5842532961197114e-05, | |
| "loss": 0.1511, | |
| "mean_token_accuracy": 0.9541265666484833, | |
| "step": 415 | |
| }, | |
| { | |
| "epoch": 2.434017595307918, | |
| "grad_norm": 1.4617493541935012, | |
| "learning_rate": 3.582058750589555e-05, | |
| "loss": 0.1999, | |
| "mean_token_accuracy": 0.9416180402040482, | |
| "step": 416 | |
| }, | |
| { | |
| "epoch": 2.439882697947214, | |
| "grad_norm": 1.8454858948924562, | |
| "learning_rate": 3.579859188744311e-05, | |
| "loss": 0.2585, | |
| "mean_token_accuracy": 0.918716236948967, | |
| "step": 417 | |
| }, | |
| { | |
| "epoch": 2.4457478005865103, | |
| "grad_norm": 1.5332521836361521, | |
| "learning_rate": 3.5776546185675014e-05, | |
| "loss": 0.2124, | |
| "mean_token_accuracy": 0.9358135014772415, | |
| "step": 418 | |
| }, | |
| { | |
| "epoch": 2.4516129032258065, | |
| "grad_norm": 1.4555437762777426, | |
| "learning_rate": 3.5754450480608244e-05, | |
| "loss": 0.2104, | |
| "mean_token_accuracy": 0.9368014857172966, | |
| "step": 419 | |
| }, | |
| { | |
| "epoch": 2.4574780058651027, | |
| "grad_norm": 1.278448596075258, | |
| "learning_rate": 3.5732304852441294e-05, | |
| "loss": 0.2069, | |
| "mean_token_accuracy": 0.9394045323133469, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 2.463343108504399, | |
| "grad_norm": 1.781690690945349, | |
| "learning_rate": 3.571010938155386e-05, | |
| "loss": 0.2731, | |
| "mean_token_accuracy": 0.9193499833345413, | |
| "step": 421 | |
| }, | |
| { | |
| "epoch": 2.469208211143695, | |
| "grad_norm": 1.534397726266545, | |
| "learning_rate": 3.5687864148506515e-05, | |
| "loss": 0.2052, | |
| "mean_token_accuracy": 0.9325775653123856, | |
| "step": 422 | |
| }, | |
| { | |
| "epoch": 2.4750733137829912, | |
| "grad_norm": 1.2252911769053223, | |
| "learning_rate": 3.566556923404048e-05, | |
| "loss": 0.1613, | |
| "mean_token_accuracy": 0.9514370709657669, | |
| "step": 423 | |
| }, | |
| { | |
| "epoch": 2.4809384164222874, | |
| "grad_norm": 1.0729606974866042, | |
| "learning_rate": 3.5643224719077294e-05, | |
| "loss": 0.1496, | |
| "mean_token_accuracy": 0.9551745429635048, | |
| "step": 424 | |
| }, | |
| { | |
| "epoch": 2.4868035190615836, | |
| "grad_norm": 1.5221636427012317, | |
| "learning_rate": 3.5620830684718515e-05, | |
| "loss": 0.1774, | |
| "mean_token_accuracy": 0.9464867487549782, | |
| "step": 425 | |
| }, | |
| { | |
| "epoch": 2.4926686217008798, | |
| "grad_norm": 1.2041644713697683, | |
| "learning_rate": 3.5598387212245456e-05, | |
| "loss": 0.1726, | |
| "mean_token_accuracy": 0.9489129334688187, | |
| "step": 426 | |
| }, | |
| { | |
| "epoch": 2.498533724340176, | |
| "grad_norm": 1.3409374082416965, | |
| "learning_rate": 3.5575894383118846e-05, | |
| "loss": 0.1953, | |
| "mean_token_accuracy": 0.9453120976686478, | |
| "step": 427 | |
| }, | |
| { | |
| "epoch": 2.504398826979472, | |
| "grad_norm": 1.2648011317403038, | |
| "learning_rate": 3.5553352278978574e-05, | |
| "loss": 0.1652, | |
| "mean_token_accuracy": 0.9458431974053383, | |
| "step": 428 | |
| }, | |
| { | |
| "epoch": 2.5102639296187683, | |
| "grad_norm": 1.2941002110451578, | |
| "learning_rate": 3.553076098164337e-05, | |
| "loss": 0.1646, | |
| "mean_token_accuracy": 0.9523252993822098, | |
| "step": 429 | |
| }, | |
| { | |
| "epoch": 2.5161290322580645, | |
| "grad_norm": 1.2879718792056065, | |
| "learning_rate": 3.5508120573110516e-05, | |
| "loss": 0.1896, | |
| "mean_token_accuracy": 0.9426534026861191, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 2.5219941348973607, | |
| "grad_norm": 0.9901388180640722, | |
| "learning_rate": 3.548543113555557e-05, | |
| "loss": 0.1381, | |
| "mean_token_accuracy": 0.9605349376797676, | |
| "step": 431 | |
| }, | |
| { | |
| "epoch": 2.527859237536657, | |
| "grad_norm": 1.2535684304346122, | |
| "learning_rate": 3.5462692751332014e-05, | |
| "loss": 0.1853, | |
| "mean_token_accuracy": 0.9452183693647385, | |
| "step": 432 | |
| }, | |
| { | |
| "epoch": 2.533724340175953, | |
| "grad_norm": 1.019277872771832, | |
| "learning_rate": 3.5439905502970996e-05, | |
| "loss": 0.1145, | |
| "mean_token_accuracy": 0.960605651140213, | |
| "step": 433 | |
| }, | |
| { | |
| "epoch": 2.5395894428152492, | |
| "grad_norm": 1.183308994953806, | |
| "learning_rate": 3.541706947318103e-05, | |
| "loss": 0.1659, | |
| "mean_token_accuracy": 0.9463500380516052, | |
| "step": 434 | |
| }, | |
| { | |
| "epoch": 2.5454545454545454, | |
| "grad_norm": 1.7501106760732823, | |
| "learning_rate": 3.539418474484768e-05, | |
| "loss": 0.2173, | |
| "mean_token_accuracy": 0.933612234890461, | |
| "step": 435 | |
| }, | |
| { | |
| "epoch": 2.5513196480938416, | |
| "grad_norm": 1.3104294276861268, | |
| "learning_rate": 3.537125140103327e-05, | |
| "loss": 0.1884, | |
| "mean_token_accuracy": 0.9485413134098053, | |
| "step": 436 | |
| }, | |
| { | |
| "epoch": 2.557184750733138, | |
| "grad_norm": 1.2344802500119778, | |
| "learning_rate": 3.534826952497657e-05, | |
| "loss": 0.154, | |
| "mean_token_accuracy": 0.954584963619709, | |
| "step": 437 | |
| }, | |
| { | |
| "epoch": 2.563049853372434, | |
| "grad_norm": 1.2375074371168866, | |
| "learning_rate": 3.5325239200092505e-05, | |
| "loss": 0.1798, | |
| "mean_token_accuracy": 0.9420296102762222, | |
| "step": 438 | |
| }, | |
| { | |
| "epoch": 2.56891495601173, | |
| "grad_norm": 1.3498402815628565, | |
| "learning_rate": 3.5302160509971866e-05, | |
| "loss": 0.1931, | |
| "mean_token_accuracy": 0.9404741451144218, | |
| "step": 439 | |
| }, | |
| { | |
| "epoch": 2.5747800586510263, | |
| "grad_norm": 1.5153339396078678, | |
| "learning_rate": 3.5279033538380974e-05, | |
| "loss": 0.1971, | |
| "mean_token_accuracy": 0.9389586672186852, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 2.5806451612903225, | |
| "grad_norm": 1.0927821569226148, | |
| "learning_rate": 3.5255858369261385e-05, | |
| "loss": 0.1293, | |
| "mean_token_accuracy": 0.959166519343853, | |
| "step": 441 | |
| }, | |
| { | |
| "epoch": 2.5865102639296187, | |
| "grad_norm": 1.573165687770213, | |
| "learning_rate": 3.523263508672961e-05, | |
| "loss": 0.227, | |
| "mean_token_accuracy": 0.9359922930598259, | |
| "step": 442 | |
| }, | |
| { | |
| "epoch": 2.592375366568915, | |
| "grad_norm": 1.31329236234176, | |
| "learning_rate": 3.520936377507679e-05, | |
| "loss": 0.1726, | |
| "mean_token_accuracy": 0.9415776506066322, | |
| "step": 443 | |
| }, | |
| { | |
| "epoch": 2.598240469208211, | |
| "grad_norm": 1.4360330933961905, | |
| "learning_rate": 3.5186044518768376e-05, | |
| "loss": 0.2235, | |
| "mean_token_accuracy": 0.9257130026817322, | |
| "step": 444 | |
| }, | |
| { | |
| "epoch": 2.6041055718475072, | |
| "grad_norm": 1.3874250860276218, | |
| "learning_rate": 3.5162677402443864e-05, | |
| "loss": 0.1962, | |
| "mean_token_accuracy": 0.94052804261446, | |
| "step": 445 | |
| }, | |
| { | |
| "epoch": 2.6099706744868034, | |
| "grad_norm": 1.3663549034269482, | |
| "learning_rate": 3.513926251091644e-05, | |
| "loss": 0.194, | |
| "mean_token_accuracy": 0.9420884102582932, | |
| "step": 446 | |
| }, | |
| { | |
| "epoch": 2.6158357771260996, | |
| "grad_norm": 1.2355875753890644, | |
| "learning_rate": 3.51157999291727e-05, | |
| "loss": 0.1883, | |
| "mean_token_accuracy": 0.9405950680375099, | |
| "step": 447 | |
| }, | |
| { | |
| "epoch": 2.621700879765396, | |
| "grad_norm": 1.4472757686119782, | |
| "learning_rate": 3.509228974237235e-05, | |
| "loss": 0.2317, | |
| "mean_token_accuracy": 0.930273026227951, | |
| "step": 448 | |
| }, | |
| { | |
| "epoch": 2.627565982404692, | |
| "grad_norm": 1.3499911027831195, | |
| "learning_rate": 3.506873203584787e-05, | |
| "loss": 0.1987, | |
| "mean_token_accuracy": 0.9421002045273781, | |
| "step": 449 | |
| }, | |
| { | |
| "epoch": 2.633431085043988, | |
| "grad_norm": 0.9022856812434747, | |
| "learning_rate": 3.504512689510422e-05, | |
| "loss": 0.1278, | |
| "mean_token_accuracy": 0.9621746689081192, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 2.6392961876832843, | |
| "grad_norm": 1.270860877585178, | |
| "learning_rate": 3.5021474405818525e-05, | |
| "loss": 0.1615, | |
| "mean_token_accuracy": 0.9504728391766548, | |
| "step": 451 | |
| }, | |
| { | |
| "epoch": 2.6451612903225805, | |
| "grad_norm": 1.3903665628243658, | |
| "learning_rate": 3.499777465383977e-05, | |
| "loss": 0.2059, | |
| "mean_token_accuracy": 0.9414464309811592, | |
| "step": 452 | |
| }, | |
| { | |
| "epoch": 2.6510263929618767, | |
| "grad_norm": 1.5121941542783566, | |
| "learning_rate": 3.497402772518848e-05, | |
| "loss": 0.2191, | |
| "mean_token_accuracy": 0.930144228041172, | |
| "step": 453 | |
| }, | |
| { | |
| "epoch": 2.656891495601173, | |
| "grad_norm": 1.2291299776343065, | |
| "learning_rate": 3.4950233706056415e-05, | |
| "loss": 0.1609, | |
| "mean_token_accuracy": 0.9529666900634766, | |
| "step": 454 | |
| }, | |
| { | |
| "epoch": 2.662756598240469, | |
| "grad_norm": 1.5178328136382258, | |
| "learning_rate": 3.4926392682806265e-05, | |
| "loss": 0.2178, | |
| "mean_token_accuracy": 0.9358916729688644, | |
| "step": 455 | |
| }, | |
| { | |
| "epoch": 2.6686217008797652, | |
| "grad_norm": 1.503800251837255, | |
| "learning_rate": 3.490250474197131e-05, | |
| "loss": 0.2129, | |
| "mean_token_accuracy": 0.9409657120704651, | |
| "step": 456 | |
| }, | |
| { | |
| "epoch": 2.6744868035190614, | |
| "grad_norm": 1.1699037972947557, | |
| "learning_rate": 3.4878569970255116e-05, | |
| "loss": 0.1664, | |
| "mean_token_accuracy": 0.9472703188657761, | |
| "step": 457 | |
| }, | |
| { | |
| "epoch": 2.6803519061583576, | |
| "grad_norm": 1.3730176544696786, | |
| "learning_rate": 3.485458845453125e-05, | |
| "loss": 0.1938, | |
| "mean_token_accuracy": 0.943206362426281, | |
| "step": 458 | |
| }, | |
| { | |
| "epoch": 2.686217008797654, | |
| "grad_norm": 1.142902961416544, | |
| "learning_rate": 3.483056028184293e-05, | |
| "loss": 0.1483, | |
| "mean_token_accuracy": 0.9592099040746689, | |
| "step": 459 | |
| }, | |
| { | |
| "epoch": 2.6920821114369504, | |
| "grad_norm": 1.4858353520299903, | |
| "learning_rate": 3.4806485539402716e-05, | |
| "loss": 0.2035, | |
| "mean_token_accuracy": 0.9390188604593277, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 2.6979472140762466, | |
| "grad_norm": 1.094863918269323, | |
| "learning_rate": 3.4782364314592186e-05, | |
| "loss": 0.142, | |
| "mean_token_accuracy": 0.9560906514525414, | |
| "step": 461 | |
| }, | |
| { | |
| "epoch": 2.703812316715543, | |
| "grad_norm": 1.350848338520518, | |
| "learning_rate": 3.475819669496167e-05, | |
| "loss": 0.1668, | |
| "mean_token_accuracy": 0.942637600004673, | |
| "step": 462 | |
| }, | |
| { | |
| "epoch": 2.709677419354839, | |
| "grad_norm": 1.2655106986574725, | |
| "learning_rate": 3.473398276822985e-05, | |
| "loss": 0.1739, | |
| "mean_token_accuracy": 0.9467576667666435, | |
| "step": 463 | |
| }, | |
| { | |
| "epoch": 2.715542521994135, | |
| "grad_norm": 1.6239688129886054, | |
| "learning_rate": 3.47097226222835e-05, | |
| "loss": 0.2034, | |
| "mean_token_accuracy": 0.9414631873369217, | |
| "step": 464 | |
| }, | |
| { | |
| "epoch": 2.7214076246334313, | |
| "grad_norm": 1.3207640172284705, | |
| "learning_rate": 3.468541634517716e-05, | |
| "loss": 0.1682, | |
| "mean_token_accuracy": 0.9518086910247803, | |
| "step": 465 | |
| }, | |
| { | |
| "epoch": 2.7272727272727275, | |
| "grad_norm": 1.0812855439860434, | |
| "learning_rate": 3.4661064025132796e-05, | |
| "loss": 0.1393, | |
| "mean_token_accuracy": 0.9551347717642784, | |
| "step": 466 | |
| }, | |
| { | |
| "epoch": 2.7331378299120237, | |
| "grad_norm": 1.669554956385613, | |
| "learning_rate": 3.463666575053949e-05, | |
| "loss": 0.2302, | |
| "mean_token_accuracy": 0.9358491003513336, | |
| "step": 467 | |
| }, | |
| { | |
| "epoch": 2.73900293255132, | |
| "grad_norm": 1.2117870250039342, | |
| "learning_rate": 3.4612221609953126e-05, | |
| "loss": 0.152, | |
| "mean_token_accuracy": 0.9533187001943588, | |
| "step": 468 | |
| }, | |
| { | |
| "epoch": 2.744868035190616, | |
| "grad_norm": 1.4016919628458295, | |
| "learning_rate": 3.4587731692096065e-05, | |
| "loss": 0.1832, | |
| "mean_token_accuracy": 0.9502892270684242, | |
| "step": 469 | |
| }, | |
| { | |
| "epoch": 2.7507331378299122, | |
| "grad_norm": 1.4977960432623156, | |
| "learning_rate": 3.4563196085856815e-05, | |
| "loss": 0.1984, | |
| "mean_token_accuracy": 0.9398622885346413, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 2.7565982404692084, | |
| "grad_norm": 1.2989435104540572, | |
| "learning_rate": 3.4538614880289724e-05, | |
| "loss": 0.1833, | |
| "mean_token_accuracy": 0.9461094737052917, | |
| "step": 471 | |
| }, | |
| { | |
| "epoch": 2.7624633431085046, | |
| "grad_norm": 1.2689972508312564, | |
| "learning_rate": 3.4513988164614635e-05, | |
| "loss": 0.1604, | |
| "mean_token_accuracy": 0.955364815890789, | |
| "step": 472 | |
| }, | |
| { | |
| "epoch": 2.768328445747801, | |
| "grad_norm": 1.213681034144693, | |
| "learning_rate": 3.4489316028216584e-05, | |
| "loss": 0.1532, | |
| "mean_token_accuracy": 0.9536414071917534, | |
| "step": 473 | |
| }, | |
| { | |
| "epoch": 2.774193548387097, | |
| "grad_norm": 1.1162415342567564, | |
| "learning_rate": 3.446459856064545e-05, | |
| "loss": 0.1442, | |
| "mean_token_accuracy": 0.9580774754285812, | |
| "step": 474 | |
| }, | |
| { | |
| "epoch": 2.780058651026393, | |
| "grad_norm": 1.562832189465237, | |
| "learning_rate": 3.443983585161568e-05, | |
| "loss": 0.2024, | |
| "mean_token_accuracy": 0.9375625848770142, | |
| "step": 475 | |
| }, | |
| { | |
| "epoch": 2.7859237536656893, | |
| "grad_norm": 1.3440963011489055, | |
| "learning_rate": 3.441502799100588e-05, | |
| "loss": 0.168, | |
| "mean_token_accuracy": 0.9529064446687698, | |
| "step": 476 | |
| }, | |
| { | |
| "epoch": 2.7917888563049855, | |
| "grad_norm": 1.3875913315793824, | |
| "learning_rate": 3.439017506885858e-05, | |
| "loss": 0.1833, | |
| "mean_token_accuracy": 0.9444696083664894, | |
| "step": 477 | |
| }, | |
| { | |
| "epoch": 2.7976539589442817, | |
| "grad_norm": 1.3437212381186996, | |
| "learning_rate": 3.436527717537985e-05, | |
| "loss": 0.1814, | |
| "mean_token_accuracy": 0.9524684324860573, | |
| "step": 478 | |
| }, | |
| { | |
| "epoch": 2.803519061583578, | |
| "grad_norm": 1.13616471103987, | |
| "learning_rate": 3.434033440093899e-05, | |
| "loss": 0.1767, | |
| "mean_token_accuracy": 0.9441180974245071, | |
| "step": 479 | |
| }, | |
| { | |
| "epoch": 2.809384164222874, | |
| "grad_norm": 1.5699460979346578, | |
| "learning_rate": 3.431534683606818e-05, | |
| "loss": 0.2116, | |
| "mean_token_accuracy": 0.9400829449295998, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 2.8152492668621703, | |
| "grad_norm": 1.2890357354824495, | |
| "learning_rate": 3.4290314571462214e-05, | |
| "loss": 0.1749, | |
| "mean_token_accuracy": 0.9491599574685097, | |
| "step": 481 | |
| }, | |
| { | |
| "epoch": 2.8211143695014664, | |
| "grad_norm": 1.180927092541653, | |
| "learning_rate": 3.426523769797808e-05, | |
| "loss": 0.1524, | |
| "mean_token_accuracy": 0.951946809887886, | |
| "step": 482 | |
| }, | |
| { | |
| "epoch": 2.8269794721407626, | |
| "grad_norm": 2.0281247027871636, | |
| "learning_rate": 3.424011630663472e-05, | |
| "loss": 0.1849, | |
| "mean_token_accuracy": 0.9392823651432991, | |
| "step": 483 | |
| }, | |
| { | |
| "epoch": 2.832844574780059, | |
| "grad_norm": 1.445272870309301, | |
| "learning_rate": 3.421495048861262e-05, | |
| "loss": 0.2067, | |
| "mean_token_accuracy": 0.9413087740540504, | |
| "step": 484 | |
| }, | |
| { | |
| "epoch": 2.838709677419355, | |
| "grad_norm": 1.207685907104084, | |
| "learning_rate": 3.418974033525355e-05, | |
| "loss": 0.155, | |
| "mean_token_accuracy": 0.953525498509407, | |
| "step": 485 | |
| }, | |
| { | |
| "epoch": 2.844574780058651, | |
| "grad_norm": 1.423801110337816, | |
| "learning_rate": 3.416448593806019e-05, | |
| "loss": 0.2128, | |
| "mean_token_accuracy": 0.9368026629090309, | |
| "step": 486 | |
| }, | |
| { | |
| "epoch": 2.8504398826979473, | |
| "grad_norm": 1.3447638850919599, | |
| "learning_rate": 3.4139187388695774e-05, | |
| "loss": 0.1898, | |
| "mean_token_accuracy": 0.9403852596879005, | |
| "step": 487 | |
| }, | |
| { | |
| "epoch": 2.8563049853372435, | |
| "grad_norm": 1.7242878324194963, | |
| "learning_rate": 3.411384477898385e-05, | |
| "loss": 0.1922, | |
| "mean_token_accuracy": 0.9478732720017433, | |
| "step": 488 | |
| }, | |
| { | |
| "epoch": 2.8621700879765397, | |
| "grad_norm": 1.213083493046893, | |
| "learning_rate": 3.408845820090784e-05, | |
| "loss": 0.1655, | |
| "mean_token_accuracy": 0.9495300799608231, | |
| "step": 489 | |
| }, | |
| { | |
| "epoch": 2.868035190615836, | |
| "grad_norm": 1.3410093896289859, | |
| "learning_rate": 3.406302774661077e-05, | |
| "loss": 0.2188, | |
| "mean_token_accuracy": 0.9278420805931091, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 2.873900293255132, | |
| "grad_norm": 1.4983997591436693, | |
| "learning_rate": 3.403755350839492e-05, | |
| "loss": 0.2239, | |
| "mean_token_accuracy": 0.9401474595069885, | |
| "step": 491 | |
| }, | |
| { | |
| "epoch": 2.8797653958944283, | |
| "grad_norm": 1.0320683861366073, | |
| "learning_rate": 3.401203557872149e-05, | |
| "loss": 0.1208, | |
| "mean_token_accuracy": 0.9612024202942848, | |
| "step": 492 | |
| }, | |
| { | |
| "epoch": 2.8856304985337244, | |
| "grad_norm": 1.17865515123761, | |
| "learning_rate": 3.398647405021026e-05, | |
| "loss": 0.1716, | |
| "mean_token_accuracy": 0.9516402930021286, | |
| "step": 493 | |
| }, | |
| { | |
| "epoch": 2.8914956011730206, | |
| "grad_norm": 1.4957974553498135, | |
| "learning_rate": 3.396086901563925e-05, | |
| "loss": 0.2196, | |
| "mean_token_accuracy": 0.9332915842533112, | |
| "step": 494 | |
| }, | |
| { | |
| "epoch": 2.897360703812317, | |
| "grad_norm": 1.1314739606086441, | |
| "learning_rate": 3.3935220567944395e-05, | |
| "loss": 0.1607, | |
| "mean_token_accuracy": 0.9516676366329193, | |
| "step": 495 | |
| }, | |
| { | |
| "epoch": 2.903225806451613, | |
| "grad_norm": 1.3198388233629983, | |
| "learning_rate": 3.39095288002192e-05, | |
| "loss": 0.1988, | |
| "mean_token_accuracy": 0.9403504058718681, | |
| "step": 496 | |
| }, | |
| { | |
| "epoch": 2.909090909090909, | |
| "grad_norm": 1.1729074523420793, | |
| "learning_rate": 3.3883793805714406e-05, | |
| "loss": 0.1679, | |
| "mean_token_accuracy": 0.94942307472229, | |
| "step": 497 | |
| }, | |
| { | |
| "epoch": 2.9149560117302054, | |
| "grad_norm": 1.5546458022132672, | |
| "learning_rate": 3.3858015677837656e-05, | |
| "loss": 0.206, | |
| "mean_token_accuracy": 0.9396896436810493, | |
| "step": 498 | |
| }, | |
| { | |
| "epoch": 2.9208211143695015, | |
| "grad_norm": 1.3193613560349773, | |
| "learning_rate": 3.3832194510153126e-05, | |
| "loss": 0.2074, | |
| "mean_token_accuracy": 0.9424831047654152, | |
| "step": 499 | |
| }, | |
| { | |
| "epoch": 2.9266862170087977, | |
| "grad_norm": 1.5103842145967505, | |
| "learning_rate": 3.380633039638125e-05, | |
| "loss": 0.1967, | |
| "mean_token_accuracy": 0.9491210356354713, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 2.932551319648094, | |
| "grad_norm": 1.3899787104350685, | |
| "learning_rate": 3.37804234303983e-05, | |
| "loss": 0.1853, | |
| "mean_token_accuracy": 0.9460192769765854, | |
| "step": 501 | |
| }, | |
| { | |
| "epoch": 2.93841642228739, | |
| "grad_norm": 1.6042983579274848, | |
| "learning_rate": 3.37544737062361e-05, | |
| "loss": 0.2159, | |
| "mean_token_accuracy": 0.9384797960519791, | |
| "step": 502 | |
| }, | |
| { | |
| "epoch": 2.9442815249266863, | |
| "grad_norm": 1.1857829239019415, | |
| "learning_rate": 3.372848131808167e-05, | |
| "loss": 0.1663, | |
| "mean_token_accuracy": 0.9543160423636436, | |
| "step": 503 | |
| }, | |
| { | |
| "epoch": 2.9501466275659824, | |
| "grad_norm": 1.2895444159726623, | |
| "learning_rate": 3.370244636027688e-05, | |
| "loss": 0.1724, | |
| "mean_token_accuracy": 0.9467265158891678, | |
| "step": 504 | |
| }, | |
| { | |
| "epoch": 2.9560117302052786, | |
| "grad_norm": 1.1995300770374142, | |
| "learning_rate": 3.367636892731812e-05, | |
| "loss": 0.1627, | |
| "mean_token_accuracy": 0.9443233907222748, | |
| "step": 505 | |
| }, | |
| { | |
| "epoch": 2.961876832844575, | |
| "grad_norm": 1.3286742396306415, | |
| "learning_rate": 3.365024911385593e-05, | |
| "loss": 0.1605, | |
| "mean_token_accuracy": 0.9559065476059914, | |
| "step": 506 | |
| }, | |
| { | |
| "epoch": 2.967741935483871, | |
| "grad_norm": 1.2272292303747936, | |
| "learning_rate": 3.362408701469469e-05, | |
| "loss": 0.1735, | |
| "mean_token_accuracy": 0.9431959837675095, | |
| "step": 507 | |
| }, | |
| { | |
| "epoch": 2.973607038123167, | |
| "grad_norm": 1.3932587517490482, | |
| "learning_rate": 3.359788272479225e-05, | |
| "loss": 0.1973, | |
| "mean_token_accuracy": 0.9413277730345726, | |
| "step": 508 | |
| }, | |
| { | |
| "epoch": 2.9794721407624634, | |
| "grad_norm": 1.2786049269699822, | |
| "learning_rate": 3.35716363392596e-05, | |
| "loss": 0.1816, | |
| "mean_token_accuracy": 0.9395741447806358, | |
| "step": 509 | |
| }, | |
| { | |
| "epoch": 2.9853372434017595, | |
| "grad_norm": 1.6116723531914925, | |
| "learning_rate": 3.354534795336052e-05, | |
| "loss": 0.2391, | |
| "mean_token_accuracy": 0.9272446259856224, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 2.9912023460410557, | |
| "grad_norm": 1.2075082144961329, | |
| "learning_rate": 3.351901766251123e-05, | |
| "loss": 0.1909, | |
| "mean_token_accuracy": 0.9406610727310181, | |
| "step": 511 | |
| }, | |
| { | |
| "epoch": 2.997067448680352, | |
| "grad_norm": 1.7026589757148236, | |
| "learning_rate": 3.349264556228006e-05, | |
| "loss": 0.2441, | |
| "mean_token_accuracy": 0.9360670745372772, | |
| "step": 512 | |
| }, | |
| { | |
| "epoch": 3.0, | |
| "grad_norm": 1.7026589757148236, | |
| "learning_rate": 3.3466231748387077e-05, | |
| "loss": 0.2243, | |
| "mean_token_accuracy": 0.9212906956672668, | |
| "step": 513 | |
| }, | |
| { | |
| "epoch": 3.005865102639296, | |
| "grad_norm": 1.9172525119833599, | |
| "learning_rate": 3.343977631670376e-05, | |
| "loss": 0.1102, | |
| "mean_token_accuracy": 0.9692064374685287, | |
| "step": 514 | |
| }, | |
| { | |
| "epoch": 3.0117302052785924, | |
| "grad_norm": 1.1847851063603438, | |
| "learning_rate": 3.341327936325264e-05, | |
| "loss": 0.1318, | |
| "mean_token_accuracy": 0.9639127478003502, | |
| "step": 515 | |
| }, | |
| { | |
| "epoch": 3.0175953079178885, | |
| "grad_norm": 0.9061334284550635, | |
| "learning_rate": 3.338674098420695e-05, | |
| "loss": 0.1028, | |
| "mean_token_accuracy": 0.9676964953541756, | |
| "step": 516 | |
| }, | |
| { | |
| "epoch": 3.0234604105571847, | |
| "grad_norm": 0.9102322647938016, | |
| "learning_rate": 3.33601612758903e-05, | |
| "loss": 0.1144, | |
| "mean_token_accuracy": 0.9649685248732567, | |
| "step": 517 | |
| }, | |
| { | |
| "epoch": 3.029325513196481, | |
| "grad_norm": 0.9042900152145036, | |
| "learning_rate": 3.3333540334776286e-05, | |
| "loss": 0.132, | |
| "mean_token_accuracy": 0.9574764594435692, | |
| "step": 518 | |
| }, | |
| { | |
| "epoch": 3.035190615835777, | |
| "grad_norm": 1.1385844279042818, | |
| "learning_rate": 3.330687825748818e-05, | |
| "loss": 0.1194, | |
| "mean_token_accuracy": 0.9642714560031891, | |
| "step": 519 | |
| }, | |
| { | |
| "epoch": 3.0410557184750733, | |
| "grad_norm": 1.0653366992695055, | |
| "learning_rate": 3.328017514079855e-05, | |
| "loss": 0.1209, | |
| "mean_token_accuracy": 0.9662440121173859, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 3.0469208211143695, | |
| "grad_norm": 0.9944121868967081, | |
| "learning_rate": 3.325343108162893e-05, | |
| "loss": 0.1072, | |
| "mean_token_accuracy": 0.9639370143413544, | |
| "step": 521 | |
| }, | |
| { | |
| "epoch": 3.0527859237536656, | |
| "grad_norm": 1.1494314839298039, | |
| "learning_rate": 3.3226646177049446e-05, | |
| "loss": 0.131, | |
| "mean_token_accuracy": 0.9620062932372093, | |
| "step": 522 | |
| }, | |
| { | |
| "epoch": 3.058651026392962, | |
| "grad_norm": 1.1758302687993867, | |
| "learning_rate": 3.3199820524278485e-05, | |
| "loss": 0.1349, | |
| "mean_token_accuracy": 0.9607839807868004, | |
| "step": 523 | |
| }, | |
| { | |
| "epoch": 3.064516129032258, | |
| "grad_norm": 1.2242461859748268, | |
| "learning_rate": 3.317295422068234e-05, | |
| "loss": 0.1269, | |
| "mean_token_accuracy": 0.96031753718853, | |
| "step": 524 | |
| }, | |
| { | |
| "epoch": 3.070381231671554, | |
| "grad_norm": 1.0331010016728317, | |
| "learning_rate": 3.314604736377484e-05, | |
| "loss": 0.0969, | |
| "mean_token_accuracy": 0.9681897535920143, | |
| "step": 525 | |
| }, | |
| { | |
| "epoch": 3.0762463343108504, | |
| "grad_norm": 0.7781497219146758, | |
| "learning_rate": 3.3119100051217005e-05, | |
| "loss": 0.1023, | |
| "mean_token_accuracy": 0.9731072410941124, | |
| "step": 526 | |
| }, | |
| { | |
| "epoch": 3.0821114369501466, | |
| "grad_norm": 1.2166577904207496, | |
| "learning_rate": 3.3092112380816696e-05, | |
| "loss": 0.1324, | |
| "mean_token_accuracy": 0.962135910987854, | |
| "step": 527 | |
| }, | |
| { | |
| "epoch": 3.0879765395894427, | |
| "grad_norm": 1.0974981165151148, | |
| "learning_rate": 3.306508445052826e-05, | |
| "loss": 0.1425, | |
| "mean_token_accuracy": 0.9630181938409805, | |
| "step": 528 | |
| }, | |
| { | |
| "epoch": 3.093841642228739, | |
| "grad_norm": 1.4099756052774848, | |
| "learning_rate": 3.303801635845216e-05, | |
| "loss": 0.1214, | |
| "mean_token_accuracy": 0.9584023430943489, | |
| "step": 529 | |
| }, | |
| { | |
| "epoch": 3.099706744868035, | |
| "grad_norm": 1.201123082840797, | |
| "learning_rate": 3.301090820283465e-05, | |
| "loss": 0.136, | |
| "mean_token_accuracy": 0.9598643630743027, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 3.1055718475073313, | |
| "grad_norm": 1.176553540140124, | |
| "learning_rate": 3.298376008206739e-05, | |
| "loss": 0.1157, | |
| "mean_token_accuracy": 0.9649958238005638, | |
| "step": 531 | |
| }, | |
| { | |
| "epoch": 3.1114369501466275, | |
| "grad_norm": 0.9279228524428683, | |
| "learning_rate": 3.295657209468707e-05, | |
| "loss": 0.1091, | |
| "mean_token_accuracy": 0.9710751473903656, | |
| "step": 532 | |
| }, | |
| { | |
| "epoch": 3.1173020527859236, | |
| "grad_norm": 1.0571051553285986, | |
| "learning_rate": 3.2929344339375125e-05, | |
| "loss": 0.1245, | |
| "mean_token_accuracy": 0.9605824053287506, | |
| "step": 533 | |
| }, | |
| { | |
| "epoch": 3.12316715542522, | |
| "grad_norm": 1.1404392855763104, | |
| "learning_rate": 3.290207691495731e-05, | |
| "loss": 0.1486, | |
| "mean_token_accuracy": 0.9611983299255371, | |
| "step": 534 | |
| }, | |
| { | |
| "epoch": 3.129032258064516, | |
| "grad_norm": 1.1108140744094672, | |
| "learning_rate": 3.2874769920403355e-05, | |
| "loss": 0.1309, | |
| "mean_token_accuracy": 0.9562818929553032, | |
| "step": 535 | |
| }, | |
| { | |
| "epoch": 3.134897360703812, | |
| "grad_norm": 1.0394406562184513, | |
| "learning_rate": 3.2847423454826616e-05, | |
| "loss": 0.1072, | |
| "mean_token_accuracy": 0.9680725708603859, | |
| "step": 536 | |
| }, | |
| { | |
| "epoch": 3.1407624633431084, | |
| "grad_norm": 0.9161537395043307, | |
| "learning_rate": 3.2820037617483734e-05, | |
| "loss": 0.141, | |
| "mean_token_accuracy": 0.9604132622480392, | |
| "step": 537 | |
| }, | |
| { | |
| "epoch": 3.1466275659824046, | |
| "grad_norm": 1.3707774590187438, | |
| "learning_rate": 3.2792612507774224e-05, | |
| "loss": 0.1339, | |
| "mean_token_accuracy": 0.9593137428164482, | |
| "step": 538 | |
| }, | |
| { | |
| "epoch": 3.1524926686217007, | |
| "grad_norm": 1.012147704956255, | |
| "learning_rate": 3.2765148225240176e-05, | |
| "loss": 0.1246, | |
| "mean_token_accuracy": 0.9646853432059288, | |
| "step": 539 | |
| }, | |
| { | |
| "epoch": 3.158357771260997, | |
| "grad_norm": 1.0725542099127343, | |
| "learning_rate": 3.273764486956583e-05, | |
| "loss": 0.135, | |
| "mean_token_accuracy": 0.96214210242033, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 3.164222873900293, | |
| "grad_norm": 1.12481951852536, | |
| "learning_rate": 3.2710102540577256e-05, | |
| "loss": 0.1182, | |
| "mean_token_accuracy": 0.9636344686150551, | |
| "step": 541 | |
| }, | |
| { | |
| "epoch": 3.1700879765395893, | |
| "grad_norm": 1.2737900941858715, | |
| "learning_rate": 3.268252133824198e-05, | |
| "loss": 0.1625, | |
| "mean_token_accuracy": 0.9540104120969772, | |
| "step": 542 | |
| }, | |
| { | |
| "epoch": 3.1759530791788855, | |
| "grad_norm": 1.212108019559075, | |
| "learning_rate": 3.2654901362668656e-05, | |
| "loss": 0.1258, | |
| "mean_token_accuracy": 0.9660171419382095, | |
| "step": 543 | |
| }, | |
| { | |
| "epoch": 3.1818181818181817, | |
| "grad_norm": 1.127815767716653, | |
| "learning_rate": 3.262724271410661e-05, | |
| "loss": 0.1404, | |
| "mean_token_accuracy": 0.9578510448336601, | |
| "step": 544 | |
| }, | |
| { | |
| "epoch": 3.187683284457478, | |
| "grad_norm": 1.2610603469789805, | |
| "learning_rate": 3.2599545492945584e-05, | |
| "loss": 0.1342, | |
| "mean_token_accuracy": 0.9645530432462692, | |
| "step": 545 | |
| }, | |
| { | |
| "epoch": 3.193548387096774, | |
| "grad_norm": 1.2319746864500574, | |
| "learning_rate": 3.257180979971529e-05, | |
| "loss": 0.1333, | |
| "mean_token_accuracy": 0.9600680395960808, | |
| "step": 546 | |
| }, | |
| { | |
| "epoch": 3.19941348973607, | |
| "grad_norm": 0.9570365722404466, | |
| "learning_rate": 3.25440357350851e-05, | |
| "loss": 0.1371, | |
| "mean_token_accuracy": 0.9588101580739021, | |
| "step": 547 | |
| }, | |
| { | |
| "epoch": 3.2052785923753664, | |
| "grad_norm": 1.1555892585291052, | |
| "learning_rate": 3.251622339986366e-05, | |
| "loss": 0.133, | |
| "mean_token_accuracy": 0.9624381512403488, | |
| "step": 548 | |
| }, | |
| { | |
| "epoch": 3.2111436950146626, | |
| "grad_norm": 1.1220972442323853, | |
| "learning_rate": 3.24883728949985e-05, | |
| "loss": 0.1353, | |
| "mean_token_accuracy": 0.9607517868280411, | |
| "step": 549 | |
| }, | |
| { | |
| "epoch": 3.2170087976539588, | |
| "grad_norm": 1.1180353918248547, | |
| "learning_rate": 3.2460484321575714e-05, | |
| "loss": 0.1023, | |
| "mean_token_accuracy": 0.9660913869738579, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 3.222873900293255, | |
| "grad_norm": 1.1272669000126658, | |
| "learning_rate": 3.2432557780819556e-05, | |
| "loss": 0.1249, | |
| "mean_token_accuracy": 0.9630195274949074, | |
| "step": 551 | |
| }, | |
| { | |
| "epoch": 3.228739002932551, | |
| "grad_norm": 1.2021106524091134, | |
| "learning_rate": 3.240459337409209e-05, | |
| "loss": 0.1232, | |
| "mean_token_accuracy": 0.9603384956717491, | |
| "step": 552 | |
| }, | |
| { | |
| "epoch": 3.2346041055718473, | |
| "grad_norm": 1.0242118203263009, | |
| "learning_rate": 3.237659120289282e-05, | |
| "loss": 0.1291, | |
| "mean_token_accuracy": 0.9633880257606506, | |
| "step": 553 | |
| }, | |
| { | |
| "epoch": 3.2404692082111435, | |
| "grad_norm": 1.3147750355772578, | |
| "learning_rate": 3.2348551368858315e-05, | |
| "loss": 0.1281, | |
| "mean_token_accuracy": 0.9652110934257507, | |
| "step": 554 | |
| }, | |
| { | |
| "epoch": 3.2463343108504397, | |
| "grad_norm": 1.0420733903333175, | |
| "learning_rate": 3.2320473973761845e-05, | |
| "loss": 0.1275, | |
| "mean_token_accuracy": 0.9632606133818626, | |
| "step": 555 | |
| }, | |
| { | |
| "epoch": 3.252199413489736, | |
| "grad_norm": 1.1681134194309897, | |
| "learning_rate": 3.229235911951303e-05, | |
| "loss": 0.1327, | |
| "mean_token_accuracy": 0.9661369696259499, | |
| "step": 556 | |
| }, | |
| { | |
| "epoch": 3.258064516129032, | |
| "grad_norm": 1.1385374917446593, | |
| "learning_rate": 3.2264206908157425e-05, | |
| "loss": 0.1064, | |
| "mean_token_accuracy": 0.9682695493102074, | |
| "step": 557 | |
| }, | |
| { | |
| "epoch": 3.263929618768328, | |
| "grad_norm": 1.2397464547533545, | |
| "learning_rate": 3.2236017441876185e-05, | |
| "loss": 0.1317, | |
| "mean_token_accuracy": 0.9587446004152298, | |
| "step": 558 | |
| }, | |
| { | |
| "epoch": 3.2697947214076244, | |
| "grad_norm": 1.2747233600312535, | |
| "learning_rate": 3.220779082298569e-05, | |
| "loss": 0.1422, | |
| "mean_token_accuracy": 0.9570489674806595, | |
| "step": 559 | |
| }, | |
| { | |
| "epoch": 3.2756598240469206, | |
| "grad_norm": 1.263930404888225, | |
| "learning_rate": 3.2179527153937165e-05, | |
| "loss": 0.1325, | |
| "mean_token_accuracy": 0.9581855908036232, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 3.281524926686217, | |
| "grad_norm": 1.07405022031719, | |
| "learning_rate": 3.2151226537316315e-05, | |
| "loss": 0.1116, | |
| "mean_token_accuracy": 0.9662249088287354, | |
| "step": 561 | |
| }, | |
| { | |
| "epoch": 3.2873900293255134, | |
| "grad_norm": 1.012781657621151, | |
| "learning_rate": 3.212288907584296e-05, | |
| "loss": 0.1173, | |
| "mean_token_accuracy": 0.9624657481908798, | |
| "step": 562 | |
| }, | |
| { | |
| "epoch": 3.2932551319648096, | |
| "grad_norm": 1.1273423111917455, | |
| "learning_rate": 3.209451487237062e-05, | |
| "loss": 0.1558, | |
| "mean_token_accuracy": 0.9555483162403107, | |
| "step": 563 | |
| }, | |
| { | |
| "epoch": 3.2991202346041058, | |
| "grad_norm": 1.0772108863186236, | |
| "learning_rate": 3.206610402988621e-05, | |
| "loss": 0.1246, | |
| "mean_token_accuracy": 0.9665045291185379, | |
| "step": 564 | |
| }, | |
| { | |
| "epoch": 3.304985337243402, | |
| "grad_norm": 1.283465674599968, | |
| "learning_rate": 3.20376566515096e-05, | |
| "loss": 0.1484, | |
| "mean_token_accuracy": 0.9613608345389366, | |
| "step": 565 | |
| }, | |
| { | |
| "epoch": 3.310850439882698, | |
| "grad_norm": 3.083503730186153, | |
| "learning_rate": 3.20091728404933e-05, | |
| "loss": 0.1046, | |
| "mean_token_accuracy": 0.9690384045243263, | |
| "step": 566 | |
| }, | |
| { | |
| "epoch": 3.3167155425219943, | |
| "grad_norm": 0.9347036594770055, | |
| "learning_rate": 3.1980652700222024e-05, | |
| "loss": 0.128, | |
| "mean_token_accuracy": 0.9639571607112885, | |
| "step": 567 | |
| }, | |
| { | |
| "epoch": 3.3225806451612905, | |
| "grad_norm": 0.9910476896451094, | |
| "learning_rate": 3.195209633421237e-05, | |
| "loss": 0.1214, | |
| "mean_token_accuracy": 0.956595666706562, | |
| "step": 568 | |
| }, | |
| { | |
| "epoch": 3.3284457478005867, | |
| "grad_norm": 1.063945331096459, | |
| "learning_rate": 3.192350384611242e-05, | |
| "loss": 0.15, | |
| "mean_token_accuracy": 0.9561426416039467, | |
| "step": 569 | |
| }, | |
| { | |
| "epoch": 3.334310850439883, | |
| "grad_norm": 1.0615452381591053, | |
| "learning_rate": 3.1894875339701354e-05, | |
| "loss": 0.125, | |
| "mean_token_accuracy": 0.9663048535585403, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 3.340175953079179, | |
| "grad_norm": 1.2629558448763267, | |
| "learning_rate": 3.186621091888909e-05, | |
| "loss": 0.1461, | |
| "mean_token_accuracy": 0.957501508295536, | |
| "step": 571 | |
| }, | |
| { | |
| "epoch": 3.346041055718475, | |
| "grad_norm": 1.0679814269736243, | |
| "learning_rate": 3.183751068771588e-05, | |
| "loss": 0.144, | |
| "mean_token_accuracy": 0.9582978636026382, | |
| "step": 572 | |
| }, | |
| { | |
| "epoch": 3.3519061583577714, | |
| "grad_norm": 1.0690636455835296, | |
| "learning_rate": 3.180877475035199e-05, | |
| "loss": 0.1204, | |
| "mean_token_accuracy": 0.9598318859934807, | |
| "step": 573 | |
| }, | |
| { | |
| "epoch": 3.3577712609970676, | |
| "grad_norm": 0.9369182682129731, | |
| "learning_rate": 3.178000321109727e-05, | |
| "loss": 0.138, | |
| "mean_token_accuracy": 0.961778499186039, | |
| "step": 574 | |
| }, | |
| { | |
| "epoch": 3.3636363636363638, | |
| "grad_norm": 1.1682551194640143, | |
| "learning_rate": 3.175119617438078e-05, | |
| "loss": 0.145, | |
| "mean_token_accuracy": 0.9567500725388527, | |
| "step": 575 | |
| }, | |
| { | |
| "epoch": 3.36950146627566, | |
| "grad_norm": 1.216383932443075, | |
| "learning_rate": 3.172235374476043e-05, | |
| "loss": 0.1237, | |
| "mean_token_accuracy": 0.961734727025032, | |
| "step": 576 | |
| }, | |
| { | |
| "epoch": 3.375366568914956, | |
| "grad_norm": 1.018664041230987, | |
| "learning_rate": 3.169347602692259e-05, | |
| "loss": 0.1442, | |
| "mean_token_accuracy": 0.9599500149488449, | |
| "step": 577 | |
| }, | |
| { | |
| "epoch": 3.3812316715542523, | |
| "grad_norm": 1.4739777903748597, | |
| "learning_rate": 3.166456312568171e-05, | |
| "loss": 0.1295, | |
| "mean_token_accuracy": 0.9598701372742653, | |
| "step": 578 | |
| }, | |
| { | |
| "epoch": 3.3870967741935485, | |
| "grad_norm": 1.1249165312933065, | |
| "learning_rate": 3.1635615145979955e-05, | |
| "loss": 0.1456, | |
| "mean_token_accuracy": 0.9606313407421112, | |
| "step": 579 | |
| }, | |
| { | |
| "epoch": 3.3929618768328447, | |
| "grad_norm": 1.0071746295455668, | |
| "learning_rate": 3.160663219288679e-05, | |
| "loss": 0.1192, | |
| "mean_token_accuracy": 0.9617794305086136, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 3.398826979472141, | |
| "grad_norm": 1.0993797961538627, | |
| "learning_rate": 3.157761437159863e-05, | |
| "loss": 0.1422, | |
| "mean_token_accuracy": 0.9568778499960899, | |
| "step": 581 | |
| }, | |
| { | |
| "epoch": 3.404692082111437, | |
| "grad_norm": 1.1322199004325646, | |
| "learning_rate": 3.1548561787438445e-05, | |
| "loss": 0.1162, | |
| "mean_token_accuracy": 0.9666043370962143, | |
| "step": 582 | |
| }, | |
| { | |
| "epoch": 3.410557184750733, | |
| "grad_norm": 0.9257710808652309, | |
| "learning_rate": 3.15194745458554e-05, | |
| "loss": 0.1177, | |
| "mean_token_accuracy": 0.9651551991701126, | |
| "step": 583 | |
| }, | |
| { | |
| "epoch": 3.4164222873900294, | |
| "grad_norm": 0.8967042705762746, | |
| "learning_rate": 3.149035275242441e-05, | |
| "loss": 0.1056, | |
| "mean_token_accuracy": 0.9653290957212448, | |
| "step": 584 | |
| }, | |
| { | |
| "epoch": 3.4222873900293256, | |
| "grad_norm": 0.9946395497540884, | |
| "learning_rate": 3.1461196512845834e-05, | |
| "loss": 0.1422, | |
| "mean_token_accuracy": 0.9611693695187569, | |
| "step": 585 | |
| }, | |
| { | |
| "epoch": 3.4281524926686218, | |
| "grad_norm": 1.1659931848306362, | |
| "learning_rate": 3.143200593294504e-05, | |
| "loss": 0.1321, | |
| "mean_token_accuracy": 0.9641125351190567, | |
| "step": 586 | |
| }, | |
| { | |
| "epoch": 3.434017595307918, | |
| "grad_norm": 1.1333867615599893, | |
| "learning_rate": 3.1402781118672065e-05, | |
| "loss": 0.1339, | |
| "mean_token_accuracy": 0.9595314189791679, | |
| "step": 587 | |
| }, | |
| { | |
| "epoch": 3.439882697947214, | |
| "grad_norm": 0.9935811808065257, | |
| "learning_rate": 3.137352217610115e-05, | |
| "loss": 0.1355, | |
| "mean_token_accuracy": 0.959190770983696, | |
| "step": 588 | |
| }, | |
| { | |
| "epoch": 3.4457478005865103, | |
| "grad_norm": 1.3506457586579745, | |
| "learning_rate": 3.1344229211430465e-05, | |
| "loss": 0.1322, | |
| "mean_token_accuracy": 0.9644437432289124, | |
| "step": 589 | |
| }, | |
| { | |
| "epoch": 3.4516129032258065, | |
| "grad_norm": 0.8601614930196996, | |
| "learning_rate": 3.131490233098164e-05, | |
| "loss": 0.0998, | |
| "mean_token_accuracy": 0.9721217975020409, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 3.4574780058651027, | |
| "grad_norm": 1.1429108686960525, | |
| "learning_rate": 3.1285541641199383e-05, | |
| "loss": 0.136, | |
| "mean_token_accuracy": 0.9600744768977165, | |
| "step": 591 | |
| }, | |
| { | |
| "epoch": 3.463343108504399, | |
| "grad_norm": 1.1562641886722125, | |
| "learning_rate": 3.1256147248651166e-05, | |
| "loss": 0.1167, | |
| "mean_token_accuracy": 0.9670383408665657, | |
| "step": 592 | |
| }, | |
| { | |
| "epoch": 3.469208211143695, | |
| "grad_norm": 1.140404363160057, | |
| "learning_rate": 3.122671926002675e-05, | |
| "loss": 0.1379, | |
| "mean_token_accuracy": 0.9537649974226952, | |
| "step": 593 | |
| }, | |
| { | |
| "epoch": 3.4750733137829912, | |
| "grad_norm": 1.0494864326031599, | |
| "learning_rate": 3.119725778213785e-05, | |
| "loss": 0.1272, | |
| "mean_token_accuracy": 0.9602258205413818, | |
| "step": 594 | |
| }, | |
| { | |
| "epoch": 3.4809384164222874, | |
| "grad_norm": 1.3319663312850152, | |
| "learning_rate": 3.116776292191774e-05, | |
| "loss": 0.1475, | |
| "mean_token_accuracy": 0.9561516419053078, | |
| "step": 595 | |
| }, | |
| { | |
| "epoch": 3.4868035190615836, | |
| "grad_norm": 1.097406612158555, | |
| "learning_rate": 3.1138234786420834e-05, | |
| "loss": 0.1126, | |
| "mean_token_accuracy": 0.9662728533148766, | |
| "step": 596 | |
| }, | |
| { | |
| "epoch": 3.4926686217008798, | |
| "grad_norm": 0.9710678312472462, | |
| "learning_rate": 3.110867348282235e-05, | |
| "loss": 0.125, | |
| "mean_token_accuracy": 0.959438756108284, | |
| "step": 597 | |
| }, | |
| { | |
| "epoch": 3.498533724340176, | |
| "grad_norm": 1.144735358404297, | |
| "learning_rate": 3.107907911841787e-05, | |
| "loss": 0.1207, | |
| "mean_token_accuracy": 0.9607328251004219, | |
| "step": 598 | |
| }, | |
| { | |
| "epoch": 3.504398826979472, | |
| "grad_norm": 1.0324245303515245, | |
| "learning_rate": 3.104945180062301e-05, | |
| "loss": 0.1202, | |
| "mean_token_accuracy": 0.9630052521824837, | |
| "step": 599 | |
| }, | |
| { | |
| "epoch": 3.5102639296187683, | |
| "grad_norm": 0.9863915422704965, | |
| "learning_rate": 3.1019791636972936e-05, | |
| "loss": 0.1176, | |
| "mean_token_accuracy": 0.960706502199173, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 3.5161290322580645, | |
| "grad_norm": 1.1729388689241018, | |
| "learning_rate": 3.099009873512208e-05, | |
| "loss": 0.13, | |
| "mean_token_accuracy": 0.9642725735902786, | |
| "step": 601 | |
| }, | |
| { | |
| "epoch": 3.5219941348973607, | |
| "grad_norm": 1.047671209762438, | |
| "learning_rate": 3.0960373202843685e-05, | |
| "loss": 0.1171, | |
| "mean_token_accuracy": 0.9662189558148384, | |
| "step": 602 | |
| }, | |
| { | |
| "epoch": 3.527859237536657, | |
| "grad_norm": 1.2279707434064486, | |
| "learning_rate": 3.093061514802943e-05, | |
| "loss": 0.1361, | |
| "mean_token_accuracy": 0.9595428928732872, | |
| "step": 603 | |
| }, | |
| { | |
| "epoch": 3.533724340175953, | |
| "grad_norm": 1.0214846172440661, | |
| "learning_rate": 3.090082467868901e-05, | |
| "loss": 0.1096, | |
| "mean_token_accuracy": 0.9664286971092224, | |
| "step": 604 | |
| }, | |
| { | |
| "epoch": 3.5395894428152492, | |
| "grad_norm": 1.0733131073175353, | |
| "learning_rate": 3.087100190294983e-05, | |
| "loss": 0.135, | |
| "mean_token_accuracy": 0.961511418223381, | |
| "step": 605 | |
| }, | |
| { | |
| "epoch": 3.5454545454545454, | |
| "grad_norm": 1.1130636637804585, | |
| "learning_rate": 3.0841146929056505e-05, | |
| "loss": 0.1385, | |
| "mean_token_accuracy": 0.9636392369866371, | |
| "step": 606 | |
| }, | |
| { | |
| "epoch": 3.5513196480938416, | |
| "grad_norm": 1.2231653555608124, | |
| "learning_rate": 3.0811259865370535e-05, | |
| "loss": 0.1185, | |
| "mean_token_accuracy": 0.9636795818805695, | |
| "step": 607 | |
| }, | |
| { | |
| "epoch": 3.557184750733138, | |
| "grad_norm": 1.0296766827227837, | |
| "learning_rate": 3.07813408203699e-05, | |
| "loss": 0.1138, | |
| "mean_token_accuracy": 0.966869942843914, | |
| "step": 608 | |
| }, | |
| { | |
| "epoch": 3.563049853372434, | |
| "grad_norm": 0.8726518475985359, | |
| "learning_rate": 3.075138990264863e-05, | |
| "loss": 0.1345, | |
| "mean_token_accuracy": 0.9573319032788277, | |
| "step": 609 | |
| }, | |
| { | |
| "epoch": 3.56891495601173, | |
| "grad_norm": 0.982040690626221, | |
| "learning_rate": 3.072140722091648e-05, | |
| "loss": 0.1098, | |
| "mean_token_accuracy": 0.9681224301457405, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 3.5747800586510263, | |
| "grad_norm": 1.0451643330650482, | |
| "learning_rate": 3.0691392883998455e-05, | |
| "loss": 0.1462, | |
| "mean_token_accuracy": 0.9576861411333084, | |
| "step": 611 | |
| }, | |
| { | |
| "epoch": 3.5806451612903225, | |
| "grad_norm": 1.1110176689312552, | |
| "learning_rate": 3.0661347000834496e-05, | |
| "loss": 0.1205, | |
| "mean_token_accuracy": 0.9639183133840561, | |
| "step": 612 | |
| }, | |
| { | |
| "epoch": 3.5865102639296187, | |
| "grad_norm": 1.0329228692943557, | |
| "learning_rate": 3.063126968047901e-05, | |
| "loss": 0.1209, | |
| "mean_token_accuracy": 0.9623586907982826, | |
| "step": 613 | |
| }, | |
| { | |
| "epoch": 3.592375366568915, | |
| "grad_norm": 1.0468458895005488, | |
| "learning_rate": 3.060116103210053e-05, | |
| "loss": 0.1091, | |
| "mean_token_accuracy": 0.9651313573122025, | |
| "step": 614 | |
| }, | |
| { | |
| "epoch": 3.598240469208211, | |
| "grad_norm": 0.8132398233992888, | |
| "learning_rate": 3.057102116498129e-05, | |
| "loss": 0.1174, | |
| "mean_token_accuracy": 0.9631007760763168, | |
| "step": 615 | |
| }, | |
| { | |
| "epoch": 3.6041055718475072, | |
| "grad_norm": 1.242549002424926, | |
| "learning_rate": 3.0540850188516826e-05, | |
| "loss": 0.1425, | |
| "mean_token_accuracy": 0.959061473608017, | |
| "step": 616 | |
| }, | |
| { | |
| "epoch": 3.6099706744868034, | |
| "grad_norm": 1.0993514477355864, | |
| "learning_rate": 3.051064821221561e-05, | |
| "loss": 0.1031, | |
| "mean_token_accuracy": 0.9673552736639977, | |
| "step": 617 | |
| }, | |
| { | |
| "epoch": 3.6158357771260996, | |
| "grad_norm": 1.1657904332320168, | |
| "learning_rate": 3.0480415345698606e-05, | |
| "loss": 0.1552, | |
| "mean_token_accuracy": 0.9506959840655327, | |
| "step": 618 | |
| }, | |
| { | |
| "epoch": 3.621700879765396, | |
| "grad_norm": 1.150960665843493, | |
| "learning_rate": 3.045015169869892e-05, | |
| "loss": 0.1195, | |
| "mean_token_accuracy": 0.9686484485864639, | |
| "step": 619 | |
| }, | |
| { | |
| "epoch": 3.627565982404692, | |
| "grad_norm": 1.1076750183604511, | |
| "learning_rate": 3.0419857381061355e-05, | |
| "loss": 0.1421, | |
| "mean_token_accuracy": 0.9578369930386543, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 3.633431085043988, | |
| "grad_norm": 0.9829376148207803, | |
| "learning_rate": 3.0389532502742066e-05, | |
| "loss": 0.1176, | |
| "mean_token_accuracy": 0.9635881930589676, | |
| "step": 621 | |
| }, | |
| { | |
| "epoch": 3.6392961876832843, | |
| "grad_norm": 1.0113335224798026, | |
| "learning_rate": 3.0359177173808104e-05, | |
| "loss": 0.1323, | |
| "mean_token_accuracy": 0.9630266651511192, | |
| "step": 622 | |
| }, | |
| { | |
| "epoch": 3.6451612903225805, | |
| "grad_norm": 0.9697653312050357, | |
| "learning_rate": 3.032879150443705e-05, | |
| "loss": 0.122, | |
| "mean_token_accuracy": 0.9638355746865273, | |
| "step": 623 | |
| }, | |
| { | |
| "epoch": 3.6510263929618767, | |
| "grad_norm": 1.024466551155938, | |
| "learning_rate": 3.029837560491662e-05, | |
| "loss": 0.124, | |
| "mean_token_accuracy": 0.9646790847182274, | |
| "step": 624 | |
| }, | |
| { | |
| "epoch": 3.656891495601173, | |
| "grad_norm": 1.124859894371915, | |
| "learning_rate": 3.0267929585644236e-05, | |
| "loss": 0.1337, | |
| "mean_token_accuracy": 0.9610456451773643, | |
| "step": 625 | |
| }, | |
| { | |
| "epoch": 3.662756598240469, | |
| "grad_norm": 0.9189572296363571, | |
| "learning_rate": 3.0237453557126656e-05, | |
| "loss": 0.1206, | |
| "mean_token_accuracy": 0.9620600938796997, | |
| "step": 626 | |
| }, | |
| { | |
| "epoch": 3.6686217008797652, | |
| "grad_norm": 0.9491940147888425, | |
| "learning_rate": 3.020694762997956e-05, | |
| "loss": 0.1331, | |
| "mean_token_accuracy": 0.9604885205626488, | |
| "step": 627 | |
| }, | |
| { | |
| "epoch": 3.6744868035190614, | |
| "grad_norm": 0.9537881007771453, | |
| "learning_rate": 3.017641191492714e-05, | |
| "loss": 0.1008, | |
| "mean_token_accuracy": 0.9705606251955032, | |
| "step": 628 | |
| }, | |
| { | |
| "epoch": 3.6803519061583576, | |
| "grad_norm": 0.9318456905477659, | |
| "learning_rate": 3.0145846522801703e-05, | |
| "loss": 0.104, | |
| "mean_token_accuracy": 0.9685833752155304, | |
| "step": 629 | |
| }, | |
| { | |
| "epoch": 3.686217008797654, | |
| "grad_norm": 1.1342752028945045, | |
| "learning_rate": 3.0115251564543287e-05, | |
| "loss": 0.1529, | |
| "mean_token_accuracy": 0.9536353126168251, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 3.6920821114369504, | |
| "grad_norm": 1.26477365670727, | |
| "learning_rate": 3.008462715119922e-05, | |
| "loss": 0.1819, | |
| "mean_token_accuracy": 0.9487387984991074, | |
| "step": 631 | |
| }, | |
| { | |
| "epoch": 3.6979472140762466, | |
| "grad_norm": 1.1689930066757435, | |
| "learning_rate": 3.0053973393923768e-05, | |
| "loss": 0.1121, | |
| "mean_token_accuracy": 0.9648066237568855, | |
| "step": 632 | |
| }, | |
| { | |
| "epoch": 3.703812316715543, | |
| "grad_norm": 1.1899537022041637, | |
| "learning_rate": 3.0023290403977694e-05, | |
| "loss": 0.1543, | |
| "mean_token_accuracy": 0.9553825706243515, | |
| "step": 633 | |
| }, | |
| { | |
| "epoch": 3.709677419354839, | |
| "grad_norm": 1.1940929525894617, | |
| "learning_rate": 2.9992578292727842e-05, | |
| "loss": 0.133, | |
| "mean_token_accuracy": 0.9566139429807663, | |
| "step": 634 | |
| }, | |
| { | |
| "epoch": 3.715542521994135, | |
| "grad_norm": 1.021003772912016, | |
| "learning_rate": 2.9961837171646778e-05, | |
| "loss": 0.1224, | |
| "mean_token_accuracy": 0.9648139327764511, | |
| "step": 635 | |
| }, | |
| { | |
| "epoch": 3.7214076246334313, | |
| "grad_norm": 1.0968439934399599, | |
| "learning_rate": 2.993106715231237e-05, | |
| "loss": 0.1365, | |
| "mean_token_accuracy": 0.9622660800814629, | |
| "step": 636 | |
| }, | |
| { | |
| "epoch": 3.7272727272727275, | |
| "grad_norm": 1.284149147587276, | |
| "learning_rate": 2.9900268346407336e-05, | |
| "loss": 0.1491, | |
| "mean_token_accuracy": 0.9584473669528961, | |
| "step": 637 | |
| }, | |
| { | |
| "epoch": 3.7331378299120237, | |
| "grad_norm": 1.0442806693508573, | |
| "learning_rate": 2.986944086571893e-05, | |
| "loss": 0.1655, | |
| "mean_token_accuracy": 0.9546647220849991, | |
| "step": 638 | |
| }, | |
| { | |
| "epoch": 3.73900293255132, | |
| "grad_norm": 1.0873198584921593, | |
| "learning_rate": 2.983858482213843e-05, | |
| "loss": 0.1073, | |
| "mean_token_accuracy": 0.9703760594129562, | |
| "step": 639 | |
| }, | |
| { | |
| "epoch": 3.744868035190616, | |
| "grad_norm": 0.8172269296217749, | |
| "learning_rate": 2.9807700327660834e-05, | |
| "loss": 0.1367, | |
| "mean_token_accuracy": 0.9637920260429382, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 3.7507331378299122, | |
| "grad_norm": 1.0256265023187172, | |
| "learning_rate": 2.977678749438437e-05, | |
| "loss": 0.1342, | |
| "mean_token_accuracy": 0.9610049724578857, | |
| "step": 641 | |
| }, | |
| { | |
| "epoch": 3.7565982404692084, | |
| "grad_norm": 0.9516278233131595, | |
| "learning_rate": 2.9745846434510146e-05, | |
| "loss": 0.1213, | |
| "mean_token_accuracy": 0.9634719267487526, | |
| "step": 642 | |
| }, | |
| { | |
| "epoch": 3.7624633431085046, | |
| "grad_norm": 1.1595407462435696, | |
| "learning_rate": 2.9714877260341705e-05, | |
| "loss": 0.1302, | |
| "mean_token_accuracy": 0.9575833827257156, | |
| "step": 643 | |
| }, | |
| { | |
| "epoch": 3.768328445747801, | |
| "grad_norm": 0.8386228216414092, | |
| "learning_rate": 2.9683880084284648e-05, | |
| "loss": 0.0906, | |
| "mean_token_accuracy": 0.9728109389543533, | |
| "step": 644 | |
| }, | |
| { | |
| "epoch": 3.774193548387097, | |
| "grad_norm": 0.9416321297213441, | |
| "learning_rate": 2.96528550188462e-05, | |
| "loss": 0.1395, | |
| "mean_token_accuracy": 0.9599805325269699, | |
| "step": 645 | |
| }, | |
| { | |
| "epoch": 3.780058651026393, | |
| "grad_norm": 0.96183773603705, | |
| "learning_rate": 2.962180217663483e-05, | |
| "loss": 0.1439, | |
| "mean_token_accuracy": 0.9582567065954208, | |
| "step": 646 | |
| }, | |
| { | |
| "epoch": 3.7859237536656893, | |
| "grad_norm": 1.131360501733182, | |
| "learning_rate": 2.95907216703598e-05, | |
| "loss": 0.1289, | |
| "mean_token_accuracy": 0.9588178247213364, | |
| "step": 647 | |
| }, | |
| { | |
| "epoch": 3.7917888563049855, | |
| "grad_norm": 1.1889300836649888, | |
| "learning_rate": 2.9559613612830797e-05, | |
| "loss": 0.1405, | |
| "mean_token_accuracy": 0.9560826346278191, | |
| "step": 648 | |
| }, | |
| { | |
| "epoch": 3.7976539589442817, | |
| "grad_norm": 0.9285715986177883, | |
| "learning_rate": 2.952847811695751e-05, | |
| "loss": 0.1138, | |
| "mean_token_accuracy": 0.9683981463313103, | |
| "step": 649 | |
| }, | |
| { | |
| "epoch": 3.803519061583578, | |
| "grad_norm": 0.903376412534086, | |
| "learning_rate": 2.9497315295749218e-05, | |
| "loss": 0.1312, | |
| "mean_token_accuracy": 0.9626697823405266, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 3.809384164222874, | |
| "grad_norm": 1.4224710911985958, | |
| "learning_rate": 2.9466125262314368e-05, | |
| "loss": 0.1845, | |
| "mean_token_accuracy": 0.9472808763384819, | |
| "step": 651 | |
| }, | |
| { | |
| "epoch": 3.8152492668621703, | |
| "grad_norm": 1.1203349050086433, | |
| "learning_rate": 2.9434908129860193e-05, | |
| "loss": 0.1162, | |
| "mean_token_accuracy": 0.9646425694227219, | |
| "step": 652 | |
| }, | |
| { | |
| "epoch": 3.8211143695014664, | |
| "grad_norm": 1.039353270090754, | |
| "learning_rate": 2.9403664011692276e-05, | |
| "loss": 0.1513, | |
| "mean_token_accuracy": 0.954769104719162, | |
| "step": 653 | |
| }, | |
| { | |
| "epoch": 3.8269794721407626, | |
| "grad_norm": 1.0214390534355535, | |
| "learning_rate": 2.9372393021214134e-05, | |
| "loss": 0.1527, | |
| "mean_token_accuracy": 0.9548315033316612, | |
| "step": 654 | |
| }, | |
| { | |
| "epoch": 3.832844574780059, | |
| "grad_norm": 1.096985857955185, | |
| "learning_rate": 2.9341095271926842e-05, | |
| "loss": 0.1328, | |
| "mean_token_accuracy": 0.9606662914156914, | |
| "step": 655 | |
| }, | |
| { | |
| "epoch": 3.838709677419355, | |
| "grad_norm": 1.0291726390166944, | |
| "learning_rate": 2.930977087742859e-05, | |
| "loss": 0.1116, | |
| "mean_token_accuracy": 0.9647061824798584, | |
| "step": 656 | |
| }, | |
| { | |
| "epoch": 3.844574780058651, | |
| "grad_norm": 0.9808743642711901, | |
| "learning_rate": 2.9278419951414277e-05, | |
| "loss": 0.1268, | |
| "mean_token_accuracy": 0.9629314094781876, | |
| "step": 657 | |
| }, | |
| { | |
| "epoch": 3.8504398826979473, | |
| "grad_norm": 0.8692894059522167, | |
| "learning_rate": 2.9247042607675105e-05, | |
| "loss": 0.1229, | |
| "mean_token_accuracy": 0.9611641019582748, | |
| "step": 658 | |
| }, | |
| { | |
| "epoch": 3.8563049853372435, | |
| "grad_norm": 0.8870762370430467, | |
| "learning_rate": 2.9215638960098164e-05, | |
| "loss": 0.0913, | |
| "mean_token_accuracy": 0.9687012657523155, | |
| "step": 659 | |
| }, | |
| { | |
| "epoch": 3.8621700879765397, | |
| "grad_norm": 0.8587891520353423, | |
| "learning_rate": 2.9184209122665996e-05, | |
| "loss": 0.129, | |
| "mean_token_accuracy": 0.9611743912100792, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 3.868035190615836, | |
| "grad_norm": 1.0641124073179626, | |
| "learning_rate": 2.915275320945623e-05, | |
| "loss": 0.1436, | |
| "mean_token_accuracy": 0.9597739949822426, | |
| "step": 661 | |
| }, | |
| { | |
| "epoch": 3.873900293255132, | |
| "grad_norm": 1.238895601397053, | |
| "learning_rate": 2.9121271334641127e-05, | |
| "loss": 0.1393, | |
| "mean_token_accuracy": 0.9588789939880371, | |
| "step": 662 | |
| }, | |
| { | |
| "epoch": 3.8797653958944283, | |
| "grad_norm": 1.0953247983118455, | |
| "learning_rate": 2.908976361248717e-05, | |
| "loss": 0.1197, | |
| "mean_token_accuracy": 0.9674094244837761, | |
| "step": 663 | |
| }, | |
| { | |
| "epoch": 3.8856304985337244, | |
| "grad_norm": 0.9234924370356041, | |
| "learning_rate": 2.9058230157354674e-05, | |
| "loss": 0.1458, | |
| "mean_token_accuracy": 0.9566569328308105, | |
| "step": 664 | |
| }, | |
| { | |
| "epoch": 3.8914956011730206, | |
| "grad_norm": 1.243221731581411, | |
| "learning_rate": 2.902667108369734e-05, | |
| "loss": 0.1562, | |
| "mean_token_accuracy": 0.9574529975652695, | |
| "step": 665 | |
| }, | |
| { | |
| "epoch": 3.897360703812317, | |
| "grad_norm": 1.1983136333100237, | |
| "learning_rate": 2.8995086506061862e-05, | |
| "loss": 0.1253, | |
| "mean_token_accuracy": 0.9618943110108376, | |
| "step": 666 | |
| }, | |
| { | |
| "epoch": 3.903225806451613, | |
| "grad_norm": 1.0812316125608272, | |
| "learning_rate": 2.896347653908749e-05, | |
| "loss": 0.1164, | |
| "mean_token_accuracy": 0.9669682160019875, | |
| "step": 667 | |
| }, | |
| { | |
| "epoch": 3.909090909090909, | |
| "grad_norm": 0.8182003202052918, | |
| "learning_rate": 2.8931841297505657e-05, | |
| "loss": 0.1224, | |
| "mean_token_accuracy": 0.9632921889424324, | |
| "step": 668 | |
| }, | |
| { | |
| "epoch": 3.9149560117302054, | |
| "grad_norm": 0.8941991483105637, | |
| "learning_rate": 2.8900180896139503e-05, | |
| "loss": 0.1024, | |
| "mean_token_accuracy": 0.9696899205446243, | |
| "step": 669 | |
| }, | |
| { | |
| "epoch": 3.9208211143695015, | |
| "grad_norm": 0.9809121287934861, | |
| "learning_rate": 2.8868495449903498e-05, | |
| "loss": 0.0985, | |
| "mean_token_accuracy": 0.9719806686043739, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 3.9266862170087977, | |
| "grad_norm": 0.8417864765905216, | |
| "learning_rate": 2.8836785073803014e-05, | |
| "loss": 0.1086, | |
| "mean_token_accuracy": 0.9669199660420418, | |
| "step": 671 | |
| }, | |
| { | |
| "epoch": 3.932551319648094, | |
| "grad_norm": 0.8345877730462873, | |
| "learning_rate": 2.880504988293391e-05, | |
| "loss": 0.1313, | |
| "mean_token_accuracy": 0.9613889157772064, | |
| "step": 672 | |
| }, | |
| { | |
| "epoch": 3.93841642228739, | |
| "grad_norm": 0.9129539631264306, | |
| "learning_rate": 2.8773289992482115e-05, | |
| "loss": 0.1131, | |
| "mean_token_accuracy": 0.9657901525497437, | |
| "step": 673 | |
| }, | |
| { | |
| "epoch": 3.9442815249266863, | |
| "grad_norm": 0.9668219103314973, | |
| "learning_rate": 2.87415055177232e-05, | |
| "loss": 0.1194, | |
| "mean_token_accuracy": 0.9630595743656158, | |
| "step": 674 | |
| }, | |
| { | |
| "epoch": 3.9501466275659824, | |
| "grad_norm": 0.825214625411393, | |
| "learning_rate": 2.870969657402197e-05, | |
| "loss": 0.1303, | |
| "mean_token_accuracy": 0.9587484747171402, | |
| "step": 675 | |
| }, | |
| { | |
| "epoch": 3.9560117302052786, | |
| "grad_norm": 1.2778726879319384, | |
| "learning_rate": 2.867786327683205e-05, | |
| "loss": 0.1641, | |
| "mean_token_accuracy": 0.9540645256638527, | |
| "step": 676 | |
| }, | |
| { | |
| "epoch": 3.961876832844575, | |
| "grad_norm": 1.2095162631576735, | |
| "learning_rate": 2.864600574169545e-05, | |
| "loss": 0.1297, | |
| "mean_token_accuracy": 0.9639428108930588, | |
| "step": 677 | |
| }, | |
| { | |
| "epoch": 3.967741935483871, | |
| "grad_norm": 1.2245034866624815, | |
| "learning_rate": 2.861412408424216e-05, | |
| "loss": 0.1298, | |
| "mean_token_accuracy": 0.961310125887394, | |
| "step": 678 | |
| }, | |
| { | |
| "epoch": 3.973607038123167, | |
| "grad_norm": 1.0900181972818253, | |
| "learning_rate": 2.8582218420189706e-05, | |
| "loss": 0.1234, | |
| "mean_token_accuracy": 0.9650765135884285, | |
| "step": 679 | |
| }, | |
| { | |
| "epoch": 3.9794721407624634, | |
| "grad_norm": 0.995318870496006, | |
| "learning_rate": 2.855028886534278e-05, | |
| "loss": 0.149, | |
| "mean_token_accuracy": 0.9595285654067993, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 3.9853372434017595, | |
| "grad_norm": 1.2425322980130953, | |
| "learning_rate": 2.851833553559276e-05, | |
| "loss": 0.1267, | |
| "mean_token_accuracy": 0.963188923895359, | |
| "step": 681 | |
| }, | |
| { | |
| "epoch": 3.9912023460410557, | |
| "grad_norm": 0.9820863081556225, | |
| "learning_rate": 2.848635854691733e-05, | |
| "loss": 0.1348, | |
| "mean_token_accuracy": 0.9570389837026596, | |
| "step": 682 | |
| }, | |
| { | |
| "epoch": 3.997067448680352, | |
| "grad_norm": 0.9569301662769316, | |
| "learning_rate": 2.8454358015380046e-05, | |
| "loss": 0.107, | |
| "mean_token_accuracy": 0.9685926288366318, | |
| "step": 683 | |
| }, | |
| { | |
| "epoch": 4.0, | |
| "grad_norm": 1.4219372689116887, | |
| "learning_rate": 2.8422334057129913e-05, | |
| "loss": 0.1163, | |
| "mean_token_accuracy": 0.9649382084608078, | |
| "step": 684 | |
| }, | |
| { | |
| "epoch": 4.005865102639296, | |
| "grad_norm": 0.709241476686682, | |
| "learning_rate": 2.8390286788400967e-05, | |
| "loss": 0.0887, | |
| "mean_token_accuracy": 0.9674655348062515, | |
| "step": 685 | |
| }, | |
| { | |
| "epoch": 4.011730205278592, | |
| "grad_norm": 0.6173979664464195, | |
| "learning_rate": 2.8358216325511847e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9766078963875771, | |
| "step": 686 | |
| }, | |
| { | |
| "epoch": 4.0175953079178885, | |
| "grad_norm": 0.9128727805263251, | |
| "learning_rate": 2.832612278486538e-05, | |
| "loss": 0.1148, | |
| "mean_token_accuracy": 0.9675131067633629, | |
| "step": 687 | |
| }, | |
| { | |
| "epoch": 4.023460410557185, | |
| "grad_norm": 0.9010122632573339, | |
| "learning_rate": 2.8294006282948165e-05, | |
| "loss": 0.1047, | |
| "mean_token_accuracy": 0.9745191410183907, | |
| "step": 688 | |
| }, | |
| { | |
| "epoch": 4.029325513196481, | |
| "grad_norm": 0.7585836610895886, | |
| "learning_rate": 2.8261866936330123e-05, | |
| "loss": 0.0892, | |
| "mean_token_accuracy": 0.9691687375307083, | |
| "step": 689 | |
| }, | |
| { | |
| "epoch": 4.035190615835777, | |
| "grad_norm": 0.7878473822842609, | |
| "learning_rate": 2.8229704861664113e-05, | |
| "loss": 0.0876, | |
| "mean_token_accuracy": 0.974379763007164, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 4.041055718475073, | |
| "grad_norm": 0.7821495099972663, | |
| "learning_rate": 2.8197520175685462e-05, | |
| "loss": 0.0847, | |
| "mean_token_accuracy": 0.975913368165493, | |
| "step": 691 | |
| }, | |
| { | |
| "epoch": 4.0469208211143695, | |
| "grad_norm": 0.7293539456978901, | |
| "learning_rate": 2.8165312995211596e-05, | |
| "loss": 0.0831, | |
| "mean_token_accuracy": 0.9754629284143448, | |
| "step": 692 | |
| }, | |
| { | |
| "epoch": 4.052785923753666, | |
| "grad_norm": 0.8275396448134474, | |
| "learning_rate": 2.813308343714156e-05, | |
| "loss": 0.0917, | |
| "mean_token_accuracy": 0.9701932370662689, | |
| "step": 693 | |
| }, | |
| { | |
| "epoch": 4.058651026392962, | |
| "grad_norm": 1.0230174194614936, | |
| "learning_rate": 2.810083161845564e-05, | |
| "loss": 0.1045, | |
| "mean_token_accuracy": 0.9693364053964615, | |
| "step": 694 | |
| }, | |
| { | |
| "epoch": 4.064516129032258, | |
| "grad_norm": 0.9057093506395073, | |
| "learning_rate": 2.8068557656214913e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.96998730301857, | |
| "step": 695 | |
| }, | |
| { | |
| "epoch": 4.070381231671554, | |
| "grad_norm": 0.6599318416871625, | |
| "learning_rate": 2.8036261667560826e-05, | |
| "loss": 0.088, | |
| "mean_token_accuracy": 0.9751304760575294, | |
| "step": 696 | |
| }, | |
| { | |
| "epoch": 4.07624633431085, | |
| "grad_norm": 1.0436927596250734, | |
| "learning_rate": 2.8003943769714776e-05, | |
| "loss": 0.1212, | |
| "mean_token_accuracy": 0.9670998901128769, | |
| "step": 697 | |
| }, | |
| { | |
| "epoch": 4.0821114369501466, | |
| "grad_norm": 1.1198652536954283, | |
| "learning_rate": 2.7971604079977673e-05, | |
| "loss": 0.1195, | |
| "mean_token_accuracy": 0.9634276106953621, | |
| "step": 698 | |
| }, | |
| { | |
| "epoch": 4.087976539589443, | |
| "grad_norm": 1.2320403179620185, | |
| "learning_rate": 2.793924271572954e-05, | |
| "loss": 0.1039, | |
| "mean_token_accuracy": 0.9721868559718132, | |
| "step": 699 | |
| }, | |
| { | |
| "epoch": 4.093841642228739, | |
| "grad_norm": 1.3210237101709343, | |
| "learning_rate": 2.7906859794429047e-05, | |
| "loss": 0.1096, | |
| "mean_token_accuracy": 0.9650368392467499, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 4.099706744868035, | |
| "grad_norm": 0.8181651139051924, | |
| "learning_rate": 2.787445543361313e-05, | |
| "loss": 0.0839, | |
| "mean_token_accuracy": 0.9707164391875267, | |
| "step": 701 | |
| }, | |
| { | |
| "epoch": 4.105571847507331, | |
| "grad_norm": 0.8473790019730275, | |
| "learning_rate": 2.7842029750896525e-05, | |
| "loss": 0.1009, | |
| "mean_token_accuracy": 0.9700881019234657, | |
| "step": 702 | |
| }, | |
| { | |
| "epoch": 4.1114369501466275, | |
| "grad_norm": 0.9904668060663937, | |
| "learning_rate": 2.7809582863971373e-05, | |
| "loss": 0.1043, | |
| "mean_token_accuracy": 0.9687797203660011, | |
| "step": 703 | |
| }, | |
| { | |
| "epoch": 4.117302052785924, | |
| "grad_norm": 0.9380936164168817, | |
| "learning_rate": 2.777711489060676e-05, | |
| "loss": 0.1144, | |
| "mean_token_accuracy": 0.9675206765532494, | |
| "step": 704 | |
| }, | |
| { | |
| "epoch": 4.12316715542522, | |
| "grad_norm": 0.9031920501562392, | |
| "learning_rate": 2.7744625948648316e-05, | |
| "loss": 0.1035, | |
| "mean_token_accuracy": 0.9714310243725777, | |
| "step": 705 | |
| }, | |
| { | |
| "epoch": 4.129032258064516, | |
| "grad_norm": 0.8428938896716028, | |
| "learning_rate": 2.7712116156017783e-05, | |
| "loss": 0.097, | |
| "mean_token_accuracy": 0.9733357578516006, | |
| "step": 706 | |
| }, | |
| { | |
| "epoch": 4.134897360703812, | |
| "grad_norm": 0.9412808030019473, | |
| "learning_rate": 2.7679585630712585e-05, | |
| "loss": 0.1113, | |
| "mean_token_accuracy": 0.9657137170433998, | |
| "step": 707 | |
| }, | |
| { | |
| "epoch": 4.140762463343108, | |
| "grad_norm": 0.7779470899780662, | |
| "learning_rate": 2.764703449080538e-05, | |
| "loss": 0.0978, | |
| "mean_token_accuracy": 0.97261843085289, | |
| "step": 708 | |
| }, | |
| { | |
| "epoch": 4.146627565982405, | |
| "grad_norm": 0.8327366946019571, | |
| "learning_rate": 2.761446285444366e-05, | |
| "loss": 0.099, | |
| "mean_token_accuracy": 0.9688172861933708, | |
| "step": 709 | |
| }, | |
| { | |
| "epoch": 4.152492668621701, | |
| "grad_norm": 0.6166216833854606, | |
| "learning_rate": 2.758187083984931e-05, | |
| "loss": 0.0795, | |
| "mean_token_accuracy": 0.9770351052284241, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 4.158357771260997, | |
| "grad_norm": 0.981652746856198, | |
| "learning_rate": 2.754925856531819e-05, | |
| "loss": 0.1155, | |
| "mean_token_accuracy": 0.9650738835334778, | |
| "step": 711 | |
| }, | |
| { | |
| "epoch": 4.164222873900293, | |
| "grad_norm": 0.8621583383424518, | |
| "learning_rate": 2.7516626149219678e-05, | |
| "loss": 0.094, | |
| "mean_token_accuracy": 0.9712882563471794, | |
| "step": 712 | |
| }, | |
| { | |
| "epoch": 4.170087976539589, | |
| "grad_norm": 0.8715573279290548, | |
| "learning_rate": 2.7483973709996267e-05, | |
| "loss": 0.1067, | |
| "mean_token_accuracy": 0.9686557427048683, | |
| "step": 713 | |
| }, | |
| { | |
| "epoch": 4.1759530791788855, | |
| "grad_norm": 0.9830720772685587, | |
| "learning_rate": 2.7451301366163116e-05, | |
| "loss": 0.1157, | |
| "mean_token_accuracy": 0.9641370326280594, | |
| "step": 714 | |
| }, | |
| { | |
| "epoch": 4.181818181818182, | |
| "grad_norm": 0.614999654305874, | |
| "learning_rate": 2.741860923630765e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.97749213129282, | |
| "step": 715 | |
| }, | |
| { | |
| "epoch": 4.187683284457478, | |
| "grad_norm": 0.8528408299395791, | |
| "learning_rate": 2.7385897439089086e-05, | |
| "loss": 0.1054, | |
| "mean_token_accuracy": 0.9677549675107002, | |
| "step": 716 | |
| }, | |
| { | |
| "epoch": 4.193548387096774, | |
| "grad_norm": 0.9483311566184626, | |
| "learning_rate": 2.735316609323804e-05, | |
| "loss": 0.1161, | |
| "mean_token_accuracy": 0.9652642160654068, | |
| "step": 717 | |
| }, | |
| { | |
| "epoch": 4.19941348973607, | |
| "grad_norm": 0.9247175808359333, | |
| "learning_rate": 2.7320415317556085e-05, | |
| "loss": 0.1038, | |
| "mean_token_accuracy": 0.9713630378246307, | |
| "step": 718 | |
| }, | |
| { | |
| "epoch": 4.205278592375366, | |
| "grad_norm": 0.7419985712721145, | |
| "learning_rate": 2.72876452309153e-05, | |
| "loss": 0.0766, | |
| "mean_token_accuracy": 0.9739850908517838, | |
| "step": 719 | |
| }, | |
| { | |
| "epoch": 4.211143695014663, | |
| "grad_norm": 1.0296466250591831, | |
| "learning_rate": 2.7254855952257867e-05, | |
| "loss": 0.1103, | |
| "mean_token_accuracy": 0.9655221551656723, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 4.217008797653959, | |
| "grad_norm": 1.1384543982681885, | |
| "learning_rate": 2.7222047600595626e-05, | |
| "loss": 0.1285, | |
| "mean_token_accuracy": 0.9629199728369713, | |
| "step": 721 | |
| }, | |
| { | |
| "epoch": 4.222873900293255, | |
| "grad_norm": 0.8447805103003336, | |
| "learning_rate": 2.718922029500965e-05, | |
| "loss": 0.1048, | |
| "mean_token_accuracy": 0.9712353870272636, | |
| "step": 722 | |
| }, | |
| { | |
| "epoch": 4.228739002932551, | |
| "grad_norm": 0.7431719179679741, | |
| "learning_rate": 2.7156374154649787e-05, | |
| "loss": 0.0924, | |
| "mean_token_accuracy": 0.9682846069335938, | |
| "step": 723 | |
| }, | |
| { | |
| "epoch": 4.234604105571847, | |
| "grad_norm": 0.7009163129349052, | |
| "learning_rate": 2.7123509298734267e-05, | |
| "loss": 0.0918, | |
| "mean_token_accuracy": 0.9705768898129463, | |
| "step": 724 | |
| }, | |
| { | |
| "epoch": 4.2404692082111435, | |
| "grad_norm": 0.931316392486653, | |
| "learning_rate": 2.7090625846549247e-05, | |
| "loss": 0.103, | |
| "mean_token_accuracy": 0.9691152274608612, | |
| "step": 725 | |
| }, | |
| { | |
| "epoch": 4.24633431085044, | |
| "grad_norm": 0.8894668079937573, | |
| "learning_rate": 2.705772391744837e-05, | |
| "loss": 0.1023, | |
| "mean_token_accuracy": 0.9710854515433311, | |
| "step": 726 | |
| }, | |
| { | |
| "epoch": 4.252199413489736, | |
| "grad_norm": 0.886947309997863, | |
| "learning_rate": 2.7024803630852362e-05, | |
| "loss": 0.1117, | |
| "mean_token_accuracy": 0.9700823053717613, | |
| "step": 727 | |
| }, | |
| { | |
| "epoch": 4.258064516129032, | |
| "grad_norm": 0.9113850424773436, | |
| "learning_rate": 2.699186510624856e-05, | |
| "loss": 0.1251, | |
| "mean_token_accuracy": 0.9644127413630486, | |
| "step": 728 | |
| }, | |
| { | |
| "epoch": 4.263929618768328, | |
| "grad_norm": 0.849019273066941, | |
| "learning_rate": 2.6958908463190506e-05, | |
| "loss": 0.1051, | |
| "mean_token_accuracy": 0.9653807803988457, | |
| "step": 729 | |
| }, | |
| { | |
| "epoch": 4.269794721407624, | |
| "grad_norm": 0.8066154131479902, | |
| "learning_rate": 2.6925933821297497e-05, | |
| "loss": 0.1044, | |
| "mean_token_accuracy": 0.9673965945839882, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 4.275659824046921, | |
| "grad_norm": 0.9500006710768306, | |
| "learning_rate": 2.6892941300254176e-05, | |
| "loss": 0.1043, | |
| "mean_token_accuracy": 0.9693391621112823, | |
| "step": 731 | |
| }, | |
| { | |
| "epoch": 4.281524926686217, | |
| "grad_norm": 0.8393073675779207, | |
| "learning_rate": 2.685993101981007e-05, | |
| "loss": 0.0986, | |
| "mean_token_accuracy": 0.9713418409228325, | |
| "step": 732 | |
| }, | |
| { | |
| "epoch": 4.287390029325513, | |
| "grad_norm": 0.7433667011265701, | |
| "learning_rate": 2.6826903099779157e-05, | |
| "loss": 0.0919, | |
| "mean_token_accuracy": 0.9730396121740341, | |
| "step": 733 | |
| }, | |
| { | |
| "epoch": 4.293255131964809, | |
| "grad_norm": 1.040976292438877, | |
| "learning_rate": 2.679385766003945e-05, | |
| "loss": 0.1116, | |
| "mean_token_accuracy": 0.9655982628464699, | |
| "step": 734 | |
| }, | |
| { | |
| "epoch": 4.299120234604105, | |
| "grad_norm": 0.7644680704316061, | |
| "learning_rate": 2.676079482053255e-05, | |
| "loss": 0.1094, | |
| "mean_token_accuracy": 0.9659839272499084, | |
| "step": 735 | |
| }, | |
| { | |
| "epoch": 4.3049853372434015, | |
| "grad_norm": 0.864902634731277, | |
| "learning_rate": 2.6727714701263212e-05, | |
| "loss": 0.1011, | |
| "mean_token_accuracy": 0.9684811905026436, | |
| "step": 736 | |
| }, | |
| { | |
| "epoch": 4.310850439882698, | |
| "grad_norm": 0.9630405184292459, | |
| "learning_rate": 2.669461742229891e-05, | |
| "loss": 0.1145, | |
| "mean_token_accuracy": 0.9672865197062492, | |
| "step": 737 | |
| }, | |
| { | |
| "epoch": 4.316715542521994, | |
| "grad_norm": 0.870823765290408, | |
| "learning_rate": 2.6661503103769404e-05, | |
| "loss": 0.1023, | |
| "mean_token_accuracy": 0.9726337790489197, | |
| "step": 738 | |
| }, | |
| { | |
| "epoch": 4.32258064516129, | |
| "grad_norm": 0.9663765543260392, | |
| "learning_rate": 2.6628371865866286e-05, | |
| "loss": 0.1185, | |
| "mean_token_accuracy": 0.965896911919117, | |
| "step": 739 | |
| }, | |
| { | |
| "epoch": 4.328445747800586, | |
| "grad_norm": 0.8933323190339318, | |
| "learning_rate": 2.6595223828842578e-05, | |
| "loss": 0.1087, | |
| "mean_token_accuracy": 0.9675631076097488, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 4.334310850439882, | |
| "grad_norm": 0.8062191969539717, | |
| "learning_rate": 2.6562059113012253e-05, | |
| "loss": 0.1017, | |
| "mean_token_accuracy": 0.9707746133208275, | |
| "step": 741 | |
| }, | |
| { | |
| "epoch": 4.340175953079179, | |
| "grad_norm": 0.7386260737331057, | |
| "learning_rate": 2.6528877838749853e-05, | |
| "loss": 0.0901, | |
| "mean_token_accuracy": 0.9732140600681305, | |
| "step": 742 | |
| }, | |
| { | |
| "epoch": 4.346041055718475, | |
| "grad_norm": 0.9540370649859513, | |
| "learning_rate": 2.6495680126489984e-05, | |
| "loss": 0.0996, | |
| "mean_token_accuracy": 0.9677102938294411, | |
| "step": 743 | |
| }, | |
| { | |
| "epoch": 4.351906158357771, | |
| "grad_norm": 0.9249637570107009, | |
| "learning_rate": 2.6462466096726954e-05, | |
| "loss": 0.1219, | |
| "mean_token_accuracy": 0.9667763486504555, | |
| "step": 744 | |
| }, | |
| { | |
| "epoch": 4.357771260997067, | |
| "grad_norm": 0.7792653229818829, | |
| "learning_rate": 2.6429235870014256e-05, | |
| "loss": 0.0903, | |
| "mean_token_accuracy": 0.9705804735422134, | |
| "step": 745 | |
| }, | |
| { | |
| "epoch": 4.363636363636363, | |
| "grad_norm": 0.8140317445393424, | |
| "learning_rate": 2.639598956696421e-05, | |
| "loss": 0.1107, | |
| "mean_token_accuracy": 0.9697221592068672, | |
| "step": 746 | |
| }, | |
| { | |
| "epoch": 4.3695014662756595, | |
| "grad_norm": 0.6129404128094667, | |
| "learning_rate": 2.6362727308247458e-05, | |
| "loss": 0.0846, | |
| "mean_token_accuracy": 0.9690393060445786, | |
| "step": 747 | |
| }, | |
| { | |
| "epoch": 4.375366568914956, | |
| "grad_norm": 0.8363655869915996, | |
| "learning_rate": 2.6329449214592568e-05, | |
| "loss": 0.11, | |
| "mean_token_accuracy": 0.9687171280384064, | |
| "step": 748 | |
| }, | |
| { | |
| "epoch": 4.381231671554252, | |
| "grad_norm": 0.8829130626858406, | |
| "learning_rate": 2.6296155406785578e-05, | |
| "loss": 0.1064, | |
| "mean_token_accuracy": 0.9672866985201836, | |
| "step": 749 | |
| }, | |
| { | |
| "epoch": 4.387096774193548, | |
| "grad_norm": 0.8266185050112379, | |
| "learning_rate": 2.6262846005669572e-05, | |
| "loss": 0.0966, | |
| "mean_token_accuracy": 0.968304455280304, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 4.392961876832844, | |
| "grad_norm": 0.7916713310234549, | |
| "learning_rate": 2.6229521132144212e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9701255336403847, | |
| "step": 751 | |
| }, | |
| { | |
| "epoch": 4.39882697947214, | |
| "grad_norm": 0.7701614711517824, | |
| "learning_rate": 2.619618090716534e-05, | |
| "loss": 0.0943, | |
| "mean_token_accuracy": 0.9703449010848999, | |
| "step": 752 | |
| }, | |
| { | |
| "epoch": 4.404692082111437, | |
| "grad_norm": 0.6968343695166375, | |
| "learning_rate": 2.61628254517445e-05, | |
| "loss": 0.0843, | |
| "mean_token_accuracy": 0.971925251185894, | |
| "step": 753 | |
| }, | |
| { | |
| "epoch": 4.410557184750733, | |
| "grad_norm": 0.9629492868220234, | |
| "learning_rate": 2.612945488694853e-05, | |
| "loss": 0.1134, | |
| "mean_token_accuracy": 0.9676346555352211, | |
| "step": 754 | |
| }, | |
| { | |
| "epoch": 4.416422287390029, | |
| "grad_norm": 0.854167238685368, | |
| "learning_rate": 2.6096069333899094e-05, | |
| "loss": 0.0898, | |
| "mean_token_accuracy": 0.9729126617312431, | |
| "step": 755 | |
| }, | |
| { | |
| "epoch": 4.422287390029325, | |
| "grad_norm": 1.1094574088373235, | |
| "learning_rate": 2.6062668913772275e-05, | |
| "loss": 0.1299, | |
| "mean_token_accuracy": 0.9626433402299881, | |
| "step": 756 | |
| }, | |
| { | |
| "epoch": 4.428152492668621, | |
| "grad_norm": 0.9014170089707675, | |
| "learning_rate": 2.60292537477981e-05, | |
| "loss": 0.1019, | |
| "mean_token_accuracy": 0.9680602997541428, | |
| "step": 757 | |
| }, | |
| { | |
| "epoch": 4.4340175953079175, | |
| "grad_norm": 0.9810775727763513, | |
| "learning_rate": 2.5995823957260132e-05, | |
| "loss": 0.1279, | |
| "mean_token_accuracy": 0.9605071842670441, | |
| "step": 758 | |
| }, | |
| { | |
| "epoch": 4.439882697947214, | |
| "grad_norm": 0.7869832998345235, | |
| "learning_rate": 2.596237966349501e-05, | |
| "loss": 0.0985, | |
| "mean_token_accuracy": 0.9684450924396515, | |
| "step": 759 | |
| }, | |
| { | |
| "epoch": 4.44574780058651, | |
| "grad_norm": 0.7758876264350651, | |
| "learning_rate": 2.592892098789201e-05, | |
| "loss": 0.0865, | |
| "mean_token_accuracy": 0.9744958430528641, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 4.451612903225806, | |
| "grad_norm": 0.8860224269922468, | |
| "learning_rate": 2.589544805189261e-05, | |
| "loss": 0.1083, | |
| "mean_token_accuracy": 0.9676248952746391, | |
| "step": 761 | |
| }, | |
| { | |
| "epoch": 4.457478005865102, | |
| "grad_norm": 0.7409093088530903, | |
| "learning_rate": 2.5861960976990056e-05, | |
| "loss": 0.093, | |
| "mean_token_accuracy": 0.9732578694820404, | |
| "step": 762 | |
| }, | |
| { | |
| "epoch": 4.463343108504398, | |
| "grad_norm": 1.1676717217139776, | |
| "learning_rate": 2.5828459884728898e-05, | |
| "loss": 0.1169, | |
| "mean_token_accuracy": 0.9647529572248459, | |
| "step": 763 | |
| }, | |
| { | |
| "epoch": 4.469208211143695, | |
| "grad_norm": 0.6625765738343412, | |
| "learning_rate": 2.5794944896704572e-05, | |
| "loss": 0.0819, | |
| "mean_token_accuracy": 0.9757863134145737, | |
| "step": 764 | |
| }, | |
| { | |
| "epoch": 4.475073313782991, | |
| "grad_norm": 0.6874296342022658, | |
| "learning_rate": 2.5761416134562955e-05, | |
| "loss": 0.0911, | |
| "mean_token_accuracy": 0.9720703139901161, | |
| "step": 765 | |
| }, | |
| { | |
| "epoch": 4.480938416422287, | |
| "grad_norm": 0.7247716827978176, | |
| "learning_rate": 2.5727873719999904e-05, | |
| "loss": 0.093, | |
| "mean_token_accuracy": 0.9728222563862801, | |
| "step": 766 | |
| }, | |
| { | |
| "epoch": 4.486803519061583, | |
| "grad_norm": 0.8452909187598238, | |
| "learning_rate": 2.569431777476084e-05, | |
| "loss": 0.1102, | |
| "mean_token_accuracy": 0.9691123142838478, | |
| "step": 767 | |
| }, | |
| { | |
| "epoch": 4.492668621700879, | |
| "grad_norm": 0.6269007242451082, | |
| "learning_rate": 2.566074842064029e-05, | |
| "loss": 0.0825, | |
| "mean_token_accuracy": 0.9736087173223495, | |
| "step": 768 | |
| }, | |
| { | |
| "epoch": 4.4985337243401755, | |
| "grad_norm": 0.6833634732485709, | |
| "learning_rate": 2.562716577948145e-05, | |
| "loss": 0.0894, | |
| "mean_token_accuracy": 0.9721810445189476, | |
| "step": 769 | |
| }, | |
| { | |
| "epoch": 4.504398826979472, | |
| "grad_norm": 0.9016470000451788, | |
| "learning_rate": 2.5593569973175757e-05, | |
| "loss": 0.099, | |
| "mean_token_accuracy": 0.9684557020664215, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 4.510263929618768, | |
| "grad_norm": 0.8214283949891577, | |
| "learning_rate": 2.5559961123662405e-05, | |
| "loss": 0.09, | |
| "mean_token_accuracy": 0.9735552668571472, | |
| "step": 771 | |
| }, | |
| { | |
| "epoch": 4.516129032258064, | |
| "grad_norm": 0.8539772217032783, | |
| "learning_rate": 2.5526339352927956e-05, | |
| "loss": 0.0985, | |
| "mean_token_accuracy": 0.9696103483438492, | |
| "step": 772 | |
| }, | |
| { | |
| "epoch": 4.52199413489736, | |
| "grad_norm": 0.7730054145378401, | |
| "learning_rate": 2.5492704783005847e-05, | |
| "loss": 0.0979, | |
| "mean_token_accuracy": 0.9686698243021965, | |
| "step": 773 | |
| }, | |
| { | |
| "epoch": 4.527859237536656, | |
| "grad_norm": 1.2854864103824832, | |
| "learning_rate": 2.5459057535975985e-05, | |
| "loss": 0.127, | |
| "mean_token_accuracy": 0.9684014022350311, | |
| "step": 774 | |
| }, | |
| { | |
| "epoch": 4.533724340175953, | |
| "grad_norm": 1.0322132404220972, | |
| "learning_rate": 2.542539773396429e-05, | |
| "loss": 0.1147, | |
| "mean_token_accuracy": 0.9624549448490143, | |
| "step": 775 | |
| }, | |
| { | |
| "epoch": 4.539589442815249, | |
| "grad_norm": 1.0925929672195411, | |
| "learning_rate": 2.5391725499142253e-05, | |
| "loss": 0.1274, | |
| "mean_token_accuracy": 0.9652435928583145, | |
| "step": 776 | |
| }, | |
| { | |
| "epoch": 4.545454545454545, | |
| "grad_norm": 0.7645119510316551, | |
| "learning_rate": 2.535804095372648e-05, | |
| "loss": 0.0925, | |
| "mean_token_accuracy": 0.9722094312310219, | |
| "step": 777 | |
| }, | |
| { | |
| "epoch": 4.551319648093841, | |
| "grad_norm": 0.8311326236464569, | |
| "learning_rate": 2.5324344219978273e-05, | |
| "loss": 0.1036, | |
| "mean_token_accuracy": 0.9671364203095436, | |
| "step": 778 | |
| }, | |
| { | |
| "epoch": 4.557184750733137, | |
| "grad_norm": 0.8292493397463394, | |
| "learning_rate": 2.5290635420203162e-05, | |
| "loss": 0.1043, | |
| "mean_token_accuracy": 0.9715534523129463, | |
| "step": 779 | |
| }, | |
| { | |
| "epoch": 4.563049853372434, | |
| "grad_norm": 0.8608211600298957, | |
| "learning_rate": 2.525691467675048e-05, | |
| "loss": 0.1093, | |
| "mean_token_accuracy": 0.9706666171550751, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 4.568914956011731, | |
| "grad_norm": 0.6784230146610292, | |
| "learning_rate": 2.5223182112012897e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9699116349220276, | |
| "step": 781 | |
| }, | |
| { | |
| "epoch": 4.574780058651027, | |
| "grad_norm": 0.7343542604693692, | |
| "learning_rate": 2.5189437848426016e-05, | |
| "loss": 0.0839, | |
| "mean_token_accuracy": 0.9741816371679306, | |
| "step": 782 | |
| }, | |
| { | |
| "epoch": 4.580645161290323, | |
| "grad_norm": 0.8193787549551812, | |
| "learning_rate": 2.515568200846787e-05, | |
| "loss": 0.1126, | |
| "mean_token_accuracy": 0.965341791510582, | |
| "step": 783 | |
| }, | |
| { | |
| "epoch": 4.586510263929619, | |
| "grad_norm": 0.7686994711379227, | |
| "learning_rate": 2.5121914714658526e-05, | |
| "loss": 0.1026, | |
| "mean_token_accuracy": 0.9705801084637642, | |
| "step": 784 | |
| }, | |
| { | |
| "epoch": 4.592375366568915, | |
| "grad_norm": 0.6846639175891126, | |
| "learning_rate": 2.5088136089559636e-05, | |
| "loss": 0.0881, | |
| "mean_token_accuracy": 0.9729414209723473, | |
| "step": 785 | |
| }, | |
| { | |
| "epoch": 4.5982404692082115, | |
| "grad_norm": 0.6457900343034328, | |
| "learning_rate": 2.5054346255773952e-05, | |
| "loss": 0.0814, | |
| "mean_token_accuracy": 0.9756535366177559, | |
| "step": 786 | |
| }, | |
| { | |
| "epoch": 4.604105571847508, | |
| "grad_norm": 0.8155908776106904, | |
| "learning_rate": 2.502054533594493e-05, | |
| "loss": 0.0949, | |
| "mean_token_accuracy": 0.9718875661492348, | |
| "step": 787 | |
| }, | |
| { | |
| "epoch": 4.609970674486804, | |
| "grad_norm": 0.7295520284793742, | |
| "learning_rate": 2.4986733452756264e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9746653810143471, | |
| "step": 788 | |
| }, | |
| { | |
| "epoch": 4.6158357771261, | |
| "grad_norm": 0.7882592432293024, | |
| "learning_rate": 2.495291072893142e-05, | |
| "loss": 0.1069, | |
| "mean_token_accuracy": 0.9730554893612862, | |
| "step": 789 | |
| }, | |
| { | |
| "epoch": 4.621700879765396, | |
| "grad_norm": 0.7762536144917583, | |
| "learning_rate": 2.4919077287233237e-05, | |
| "loss": 0.1011, | |
| "mean_token_accuracy": 0.9697251468896866, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 4.627565982404692, | |
| "grad_norm": 0.7501153105236112, | |
| "learning_rate": 2.4885233250463445e-05, | |
| "loss": 0.0981, | |
| "mean_token_accuracy": 0.971202477812767, | |
| "step": 791 | |
| }, | |
| { | |
| "epoch": 4.633431085043989, | |
| "grad_norm": 0.7971158503889687, | |
| "learning_rate": 2.485137874146222e-05, | |
| "loss": 0.1001, | |
| "mean_token_accuracy": 0.9674383029341698, | |
| "step": 792 | |
| }, | |
| { | |
| "epoch": 4.639296187683285, | |
| "grad_norm": 0.7887660045681235, | |
| "learning_rate": 2.4817513883107762e-05, | |
| "loss": 0.1037, | |
| "mean_token_accuracy": 0.9634637236595154, | |
| "step": 793 | |
| }, | |
| { | |
| "epoch": 4.645161290322581, | |
| "grad_norm": 0.8063096426550572, | |
| "learning_rate": 2.4783638798315822e-05, | |
| "loss": 0.0992, | |
| "mean_token_accuracy": 0.9716535285115242, | |
| "step": 794 | |
| }, | |
| { | |
| "epoch": 4.651026392961877, | |
| "grad_norm": 0.8102967123391493, | |
| "learning_rate": 2.4749753610039288e-05, | |
| "loss": 0.0846, | |
| "mean_token_accuracy": 0.9733820334076881, | |
| "step": 795 | |
| }, | |
| { | |
| "epoch": 4.656891495601173, | |
| "grad_norm": 0.734265742655663, | |
| "learning_rate": 2.4715858441267706e-05, | |
| "loss": 0.0932, | |
| "mean_token_accuracy": 0.9707584381103516, | |
| "step": 796 | |
| }, | |
| { | |
| "epoch": 4.6627565982404695, | |
| "grad_norm": 1.0708816496376599, | |
| "learning_rate": 2.4681953415026845e-05, | |
| "loss": 0.1151, | |
| "mean_token_accuracy": 0.9674431756138802, | |
| "step": 797 | |
| }, | |
| { | |
| "epoch": 4.668621700879766, | |
| "grad_norm": 0.7632236114224711, | |
| "learning_rate": 2.464803865437826e-05, | |
| "loss": 0.0938, | |
| "mean_token_accuracy": 0.96937146037817, | |
| "step": 798 | |
| }, | |
| { | |
| "epoch": 4.674486803519062, | |
| "grad_norm": 1.0123704280545058, | |
| "learning_rate": 2.461411428241883e-05, | |
| "loss": 0.1211, | |
| "mean_token_accuracy": 0.9677753895521164, | |
| "step": 799 | |
| }, | |
| { | |
| "epoch": 4.680351906158358, | |
| "grad_norm": 0.7585877502897928, | |
| "learning_rate": 2.4580180422280325e-05, | |
| "loss": 0.0974, | |
| "mean_token_accuracy": 0.9710182920098305, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 4.686217008797654, | |
| "grad_norm": 0.7599802618930964, | |
| "learning_rate": 2.4546237197128955e-05, | |
| "loss": 0.0964, | |
| "mean_token_accuracy": 0.9718500971794128, | |
| "step": 801 | |
| }, | |
| { | |
| "epoch": 4.69208211143695, | |
| "grad_norm": 0.8785070865398006, | |
| "learning_rate": 2.451228473016492e-05, | |
| "loss": 0.0983, | |
| "mean_token_accuracy": 0.9714755862951279, | |
| "step": 802 | |
| }, | |
| { | |
| "epoch": 4.697947214076247, | |
| "grad_norm": 0.7923717791867461, | |
| "learning_rate": 2.447832314462196e-05, | |
| "loss": 0.1023, | |
| "mean_token_accuracy": 0.9698025360703468, | |
| "step": 803 | |
| }, | |
| { | |
| "epoch": 4.703812316715543, | |
| "grad_norm": 0.6198673256208855, | |
| "learning_rate": 2.444435256376692e-05, | |
| "loss": 0.0874, | |
| "mean_token_accuracy": 0.9722053855657578, | |
| "step": 804 | |
| }, | |
| { | |
| "epoch": 4.709677419354839, | |
| "grad_norm": 0.7258817559918743, | |
| "learning_rate": 2.4410373110899278e-05, | |
| "loss": 0.0772, | |
| "mean_token_accuracy": 0.9761781170964241, | |
| "step": 805 | |
| }, | |
| { | |
| "epoch": 4.715542521994135, | |
| "grad_norm": 0.9172924248024136, | |
| "learning_rate": 2.4376384909350735e-05, | |
| "loss": 0.1043, | |
| "mean_token_accuracy": 0.9676323756575584, | |
| "step": 806 | |
| }, | |
| { | |
| "epoch": 4.721407624633431, | |
| "grad_norm": 0.7471141708163742, | |
| "learning_rate": 2.434238808248472e-05, | |
| "loss": 0.0947, | |
| "mean_token_accuracy": 0.9704129174351692, | |
| "step": 807 | |
| }, | |
| { | |
| "epoch": 4.7272727272727275, | |
| "grad_norm": 0.9028932892020963, | |
| "learning_rate": 2.4308382753696e-05, | |
| "loss": 0.1092, | |
| "mean_token_accuracy": 0.9671567976474762, | |
| "step": 808 | |
| }, | |
| { | |
| "epoch": 4.733137829912024, | |
| "grad_norm": 0.9404310972887582, | |
| "learning_rate": 2.4274369046410183e-05, | |
| "loss": 0.1148, | |
| "mean_token_accuracy": 0.969983272254467, | |
| "step": 809 | |
| }, | |
| { | |
| "epoch": 4.73900293255132, | |
| "grad_norm": 0.7335061132763945, | |
| "learning_rate": 2.4240347084083284e-05, | |
| "loss": 0.0909, | |
| "mean_token_accuracy": 0.9720699489116669, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 4.744868035190616, | |
| "grad_norm": 0.8135650291115837, | |
| "learning_rate": 2.4206316990201288e-05, | |
| "loss": 0.1011, | |
| "mean_token_accuracy": 0.9665744081139565, | |
| "step": 811 | |
| }, | |
| { | |
| "epoch": 4.750733137829912, | |
| "grad_norm": 1.006450006617362, | |
| "learning_rate": 2.4172278888279686e-05, | |
| "loss": 0.1203, | |
| "mean_token_accuracy": 0.9652376249432564, | |
| "step": 812 | |
| }, | |
| { | |
| "epoch": 4.756598240469208, | |
| "grad_norm": 0.8383972213721953, | |
| "learning_rate": 2.4138232901863053e-05, | |
| "loss": 0.0988, | |
| "mean_token_accuracy": 0.9674033001065254, | |
| "step": 813 | |
| }, | |
| { | |
| "epoch": 4.762463343108505, | |
| "grad_norm": 0.7477269136479021, | |
| "learning_rate": 2.4104179154524557e-05, | |
| "loss": 0.0794, | |
| "mean_token_accuracy": 0.9763186648488045, | |
| "step": 814 | |
| }, | |
| { | |
| "epoch": 4.768328445747801, | |
| "grad_norm": 0.7443883106412051, | |
| "learning_rate": 2.4070117769865554e-05, | |
| "loss": 0.0948, | |
| "mean_token_accuracy": 0.9707931205630302, | |
| "step": 815 | |
| }, | |
| { | |
| "epoch": 4.774193548387097, | |
| "grad_norm": 0.764340980807826, | |
| "learning_rate": 2.403604887151512e-05, | |
| "loss": 0.0972, | |
| "mean_token_accuracy": 0.9693472981452942, | |
| "step": 816 | |
| }, | |
| { | |
| "epoch": 4.780058651026393, | |
| "grad_norm": 0.9296757955451029, | |
| "learning_rate": 2.400197258312959e-05, | |
| "loss": 0.085, | |
| "mean_token_accuracy": 0.9715523645281792, | |
| "step": 817 | |
| }, | |
| { | |
| "epoch": 4.785923753665689, | |
| "grad_norm": 0.757736033011415, | |
| "learning_rate": 2.3967889028392115e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9713739231228828, | |
| "step": 818 | |
| }, | |
| { | |
| "epoch": 4.7917888563049855, | |
| "grad_norm": 0.8872201408127831, | |
| "learning_rate": 2.3933798331012255e-05, | |
| "loss": 0.1071, | |
| "mean_token_accuracy": 0.9671370834112167, | |
| "step": 819 | |
| }, | |
| { | |
| "epoch": 4.797653958944282, | |
| "grad_norm": 0.7207454265279954, | |
| "learning_rate": 2.3899700614725458e-05, | |
| "loss": 0.097, | |
| "mean_token_accuracy": 0.9650105834007263, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 4.803519061583578, | |
| "grad_norm": 0.7908055653055925, | |
| "learning_rate": 2.3865596003292674e-05, | |
| "loss": 0.1028, | |
| "mean_token_accuracy": 0.9701355323195457, | |
| "step": 821 | |
| }, | |
| { | |
| "epoch": 4.809384164222874, | |
| "grad_norm": 0.6486808078141573, | |
| "learning_rate": 2.3831484620499867e-05, | |
| "loss": 0.0952, | |
| "mean_token_accuracy": 0.9729474484920502, | |
| "step": 822 | |
| }, | |
| { | |
| "epoch": 4.81524926686217, | |
| "grad_norm": 0.9229266535389385, | |
| "learning_rate": 2.3797366590157565e-05, | |
| "loss": 0.1095, | |
| "mean_token_accuracy": 0.9646266847848892, | |
| "step": 823 | |
| }, | |
| { | |
| "epoch": 4.821114369501466, | |
| "grad_norm": 0.9127539327987895, | |
| "learning_rate": 2.3763242036100457e-05, | |
| "loss": 0.0988, | |
| "mean_token_accuracy": 0.9674626067280769, | |
| "step": 824 | |
| }, | |
| { | |
| "epoch": 4.826979472140763, | |
| "grad_norm": 0.8258037643370993, | |
| "learning_rate": 2.372911108218688e-05, | |
| "loss": 0.0933, | |
| "mean_token_accuracy": 0.9713954329490662, | |
| "step": 825 | |
| }, | |
| { | |
| "epoch": 4.832844574780059, | |
| "grad_norm": 0.9162371634383302, | |
| "learning_rate": 2.3694973852298425e-05, | |
| "loss": 0.1043, | |
| "mean_token_accuracy": 0.9656035676598549, | |
| "step": 826 | |
| }, | |
| { | |
| "epoch": 4.838709677419355, | |
| "grad_norm": 0.684629551090319, | |
| "learning_rate": 2.3660830470339436e-05, | |
| "loss": 0.0896, | |
| "mean_token_accuracy": 0.9730357006192207, | |
| "step": 827 | |
| }, | |
| { | |
| "epoch": 4.844574780058651, | |
| "grad_norm": 0.8047131016250728, | |
| "learning_rate": 2.362668106023661e-05, | |
| "loss": 0.0991, | |
| "mean_token_accuracy": 0.9705889150500298, | |
| "step": 828 | |
| }, | |
| { | |
| "epoch": 4.850439882697947, | |
| "grad_norm": 0.8042236766203159, | |
| "learning_rate": 2.3592525745938515e-05, | |
| "loss": 0.0932, | |
| "mean_token_accuracy": 0.9739318490028381, | |
| "step": 829 | |
| }, | |
| { | |
| "epoch": 4.8563049853372435, | |
| "grad_norm": 0.6823089370978507, | |
| "learning_rate": 2.355836465141513e-05, | |
| "loss": 0.0809, | |
| "mean_token_accuracy": 0.975515104830265, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 4.86217008797654, | |
| "grad_norm": 1.056676551862205, | |
| "learning_rate": 2.3524197900657447e-05, | |
| "loss": 0.1181, | |
| "mean_token_accuracy": 0.9661965668201447, | |
| "step": 831 | |
| }, | |
| { | |
| "epoch": 4.868035190615836, | |
| "grad_norm": 0.6719408329574679, | |
| "learning_rate": 2.3490025617676966e-05, | |
| "loss": 0.0788, | |
| "mean_token_accuracy": 0.9769859835505486, | |
| "step": 832 | |
| }, | |
| { | |
| "epoch": 4.873900293255132, | |
| "grad_norm": 0.9361482521021528, | |
| "learning_rate": 2.3455847926505283e-05, | |
| "loss": 0.1215, | |
| "mean_token_accuracy": 0.9638217911124229, | |
| "step": 833 | |
| }, | |
| { | |
| "epoch": 4.879765395894428, | |
| "grad_norm": 0.9148451630035029, | |
| "learning_rate": 2.3421664951193596e-05, | |
| "loss": 0.1082, | |
| "mean_token_accuracy": 0.9696593731641769, | |
| "step": 834 | |
| }, | |
| { | |
| "epoch": 4.885630498533724, | |
| "grad_norm": 0.7778350466276233, | |
| "learning_rate": 2.3387476815812313e-05, | |
| "loss": 0.1035, | |
| "mean_token_accuracy": 0.9713735282421112, | |
| "step": 835 | |
| }, | |
| { | |
| "epoch": 4.891495601173021, | |
| "grad_norm": 0.861960176959738, | |
| "learning_rate": 2.3353283644450556e-05, | |
| "loss": 0.1102, | |
| "mean_token_accuracy": 0.9678222835063934, | |
| "step": 836 | |
| }, | |
| { | |
| "epoch": 4.897360703812317, | |
| "grad_norm": 0.8318096305591833, | |
| "learning_rate": 2.3319085561215724e-05, | |
| "loss": 0.1018, | |
| "mean_token_accuracy": 0.9691039249300957, | |
| "step": 837 | |
| }, | |
| { | |
| "epoch": 4.903225806451613, | |
| "grad_norm": 0.7242772575603745, | |
| "learning_rate": 2.328488269023305e-05, | |
| "loss": 0.0913, | |
| "mean_token_accuracy": 0.9706755951046944, | |
| "step": 838 | |
| }, | |
| { | |
| "epoch": 4.909090909090909, | |
| "grad_norm": 0.6533623048792406, | |
| "learning_rate": 2.3250675155645136e-05, | |
| "loss": 0.0822, | |
| "mean_token_accuracy": 0.9732807353138924, | |
| "step": 839 | |
| }, | |
| { | |
| "epoch": 4.914956011730205, | |
| "grad_norm": 0.6998592106148985, | |
| "learning_rate": 2.3216463081611525e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9738926067948341, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 4.9208211143695015, | |
| "grad_norm": 0.8997130023440117, | |
| "learning_rate": 2.3182246592308235e-05, | |
| "loss": 0.1088, | |
| "mean_token_accuracy": 0.9683014452457428, | |
| "step": 841 | |
| }, | |
| { | |
| "epoch": 4.926686217008798, | |
| "grad_norm": 0.8073979598994243, | |
| "learning_rate": 2.314802581192728e-05, | |
| "loss": 0.0938, | |
| "mean_token_accuracy": 0.96952273696661, | |
| "step": 842 | |
| }, | |
| { | |
| "epoch": 4.932551319648094, | |
| "grad_norm": 0.9604499998077138, | |
| "learning_rate": 2.311380086467629e-05, | |
| "loss": 0.1168, | |
| "mean_token_accuracy": 0.9669440165162086, | |
| "step": 843 | |
| }, | |
| { | |
| "epoch": 4.93841642228739, | |
| "grad_norm": 0.70913935151314, | |
| "learning_rate": 2.3079571874778e-05, | |
| "loss": 0.0996, | |
| "mean_token_accuracy": 0.9702354818582535, | |
| "step": 844 | |
| }, | |
| { | |
| "epoch": 4.944281524926686, | |
| "grad_norm": 0.8614283401684019, | |
| "learning_rate": 2.304533896646981e-05, | |
| "loss": 0.0965, | |
| "mean_token_accuracy": 0.9706253558397293, | |
| "step": 845 | |
| }, | |
| { | |
| "epoch": 4.9501466275659824, | |
| "grad_norm": 0.6715033958091532, | |
| "learning_rate": 2.3011102264003354e-05, | |
| "loss": 0.0902, | |
| "mean_token_accuracy": 0.9727234393358231, | |
| "step": 846 | |
| }, | |
| { | |
| "epoch": 4.956011730205279, | |
| "grad_norm": 0.8437834609592847, | |
| "learning_rate": 2.2976861891644045e-05, | |
| "loss": 0.0899, | |
| "mean_token_accuracy": 0.9689526185393333, | |
| "step": 847 | |
| }, | |
| { | |
| "epoch": 4.961876832844575, | |
| "grad_norm": 0.7677466012630041, | |
| "learning_rate": 2.2942617973670596e-05, | |
| "loss": 0.091, | |
| "mean_token_accuracy": 0.9746912643313408, | |
| "step": 848 | |
| }, | |
| { | |
| "epoch": 4.967741935483871, | |
| "grad_norm": 1.3044373985496345, | |
| "learning_rate": 2.2908370634374603e-05, | |
| "loss": 0.1369, | |
| "mean_token_accuracy": 0.9604402333498001, | |
| "step": 849 | |
| }, | |
| { | |
| "epoch": 4.973607038123167, | |
| "grad_norm": 0.6899750785431267, | |
| "learning_rate": 2.287411999806007e-05, | |
| "loss": 0.0886, | |
| "mean_token_accuracy": 0.9728878736495972, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 4.979472140762463, | |
| "grad_norm": 0.8347658349866184, | |
| "learning_rate": 2.2839866189042983e-05, | |
| "loss": 0.0935, | |
| "mean_token_accuracy": 0.9728017598390579, | |
| "step": 851 | |
| }, | |
| { | |
| "epoch": 4.9853372434017595, | |
| "grad_norm": 0.7517247253149162, | |
| "learning_rate": 2.2805609331650826e-05, | |
| "loss": 0.1027, | |
| "mean_token_accuracy": 0.9682365879416466, | |
| "step": 852 | |
| }, | |
| { | |
| "epoch": 4.991202346041056, | |
| "grad_norm": 0.7521709150117921, | |
| "learning_rate": 2.2771349550222158e-05, | |
| "loss": 0.0906, | |
| "mean_token_accuracy": 0.9731875956058502, | |
| "step": 853 | |
| }, | |
| { | |
| "epoch": 4.997067448680352, | |
| "grad_norm": 0.7112238021631059, | |
| "learning_rate": 2.273708696910616e-05, | |
| "loss": 0.0862, | |
| "mean_token_accuracy": 0.973906010389328, | |
| "step": 854 | |
| }, | |
| { | |
| "epoch": 5.0, | |
| "grad_norm": 0.7112238021631059, | |
| "learning_rate": 2.2702821712662147e-05, | |
| "loss": 0.0764, | |
| "mean_token_accuracy": 0.9778192937374115, | |
| "step": 855 | |
| }, | |
| { | |
| "epoch": 5.005865102639296, | |
| "grad_norm": 1.0128576576804946, | |
| "learning_rate": 2.2668553905259168e-05, | |
| "loss": 0.0795, | |
| "mean_token_accuracy": 0.974254384636879, | |
| "step": 856 | |
| }, | |
| { | |
| "epoch": 5.011730205278592, | |
| "grad_norm": 0.608267020682671, | |
| "learning_rate": 2.2634283671275523e-05, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9795678928494453, | |
| "step": 857 | |
| }, | |
| { | |
| "epoch": 5.0175953079178885, | |
| "grad_norm": 0.5622048670870978, | |
| "learning_rate": 2.2600011135098323e-05, | |
| "loss": 0.0744, | |
| "mean_token_accuracy": 0.9774943068623543, | |
| "step": 858 | |
| }, | |
| { | |
| "epoch": 5.023460410557185, | |
| "grad_norm": 0.5921735306162148, | |
| "learning_rate": 2.2565736421123035e-05, | |
| "loss": 0.0853, | |
| "mean_token_accuracy": 0.97476976364851, | |
| "step": 859 | |
| }, | |
| { | |
| "epoch": 5.029325513196481, | |
| "grad_norm": 0.7814112686967922, | |
| "learning_rate": 2.253145965375302e-05, | |
| "loss": 0.1095, | |
| "mean_token_accuracy": 0.9669553339481354, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 5.035190615835777, | |
| "grad_norm": 0.7427252443009537, | |
| "learning_rate": 2.2497180957399108e-05, | |
| "loss": 0.0905, | |
| "mean_token_accuracy": 0.9688437804579735, | |
| "step": 861 | |
| }, | |
| { | |
| "epoch": 5.041055718475073, | |
| "grad_norm": 0.7799570398100676, | |
| "learning_rate": 2.246290045647912e-05, | |
| "loss": 0.0756, | |
| "mean_token_accuracy": 0.9773503243923187, | |
| "step": 862 | |
| }, | |
| { | |
| "epoch": 5.0469208211143695, | |
| "grad_norm": 0.5605487780130354, | |
| "learning_rate": 2.242861827541742e-05, | |
| "loss": 0.069, | |
| "mean_token_accuracy": 0.9778739288449287, | |
| "step": 863 | |
| }, | |
| { | |
| "epoch": 5.052785923753666, | |
| "grad_norm": 0.5680136427773851, | |
| "learning_rate": 2.2394334538644494e-05, | |
| "loss": 0.0842, | |
| "mean_token_accuracy": 0.9757679551839828, | |
| "step": 864 | |
| }, | |
| { | |
| "epoch": 5.058651026392962, | |
| "grad_norm": 0.8673912297345526, | |
| "learning_rate": 2.2360049370596454e-05, | |
| "loss": 0.0867, | |
| "mean_token_accuracy": 0.9756646528840065, | |
| "step": 865 | |
| }, | |
| { | |
| "epoch": 5.064516129032258, | |
| "grad_norm": 0.7257313253397972, | |
| "learning_rate": 2.2325762895714616e-05, | |
| "loss": 0.0742, | |
| "mean_token_accuracy": 0.9739532098174095, | |
| "step": 866 | |
| }, | |
| { | |
| "epoch": 5.070381231671554, | |
| "grad_norm": 0.7275380843304412, | |
| "learning_rate": 2.2291475238445033e-05, | |
| "loss": 0.0906, | |
| "mean_token_accuracy": 0.9731993973255157, | |
| "step": 867 | |
| }, | |
| { | |
| "epoch": 5.07624633431085, | |
| "grad_norm": 0.7539545393297449, | |
| "learning_rate": 2.225718652323805e-05, | |
| "loss": 0.0816, | |
| "mean_token_accuracy": 0.9701853692531586, | |
| "step": 868 | |
| }, | |
| { | |
| "epoch": 5.0821114369501466, | |
| "grad_norm": 0.8405685160148454, | |
| "learning_rate": 2.2222896874547856e-05, | |
| "loss": 0.0983, | |
| "mean_token_accuracy": 0.9731288999319077, | |
| "step": 869 | |
| }, | |
| { | |
| "epoch": 5.087976539589443, | |
| "grad_norm": 0.7993482343128961, | |
| "learning_rate": 2.2188606416832035e-05, | |
| "loss": 0.0796, | |
| "mean_token_accuracy": 0.977057509124279, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 5.093841642228739, | |
| "grad_norm": 0.8172411599541392, | |
| "learning_rate": 2.2154315274551093e-05, | |
| "loss": 0.0889, | |
| "mean_token_accuracy": 0.9734968990087509, | |
| "step": 871 | |
| }, | |
| { | |
| "epoch": 5.099706744868035, | |
| "grad_norm": 0.6520924706945902, | |
| "learning_rate": 2.2120023572168026e-05, | |
| "loss": 0.0796, | |
| "mean_token_accuracy": 0.97675970941782, | |
| "step": 872 | |
| }, | |
| { | |
| "epoch": 5.105571847507331, | |
| "grad_norm": 0.6628656863287348, | |
| "learning_rate": 2.208573143414787e-05, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9797274470329285, | |
| "step": 873 | |
| }, | |
| { | |
| "epoch": 5.1114369501466275, | |
| "grad_norm": 0.5465670696067368, | |
| "learning_rate": 2.2051438984957234e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9757656827569008, | |
| "step": 874 | |
| }, | |
| { | |
| "epoch": 5.117302052785924, | |
| "grad_norm": 0.7747976458168354, | |
| "learning_rate": 2.2017146349063855e-05, | |
| "loss": 0.0903, | |
| "mean_token_accuracy": 0.9750406444072723, | |
| "step": 875 | |
| }, | |
| { | |
| "epoch": 5.12316715542522, | |
| "grad_norm": 0.6154236649307834, | |
| "learning_rate": 2.1982853650936154e-05, | |
| "loss": 0.0835, | |
| "mean_token_accuracy": 0.9757610484957695, | |
| "step": 876 | |
| }, | |
| { | |
| "epoch": 5.129032258064516, | |
| "grad_norm": 0.6386021245757174, | |
| "learning_rate": 2.1948561015042772e-05, | |
| "loss": 0.0849, | |
| "mean_token_accuracy": 0.9769806042313576, | |
| "step": 877 | |
| }, | |
| { | |
| "epoch": 5.134897360703812, | |
| "grad_norm": 0.8485634110906592, | |
| "learning_rate": 2.1914268565852134e-05, | |
| "loss": 0.1005, | |
| "mean_token_accuracy": 0.9702042117714882, | |
| "step": 878 | |
| }, | |
| { | |
| "epoch": 5.140762463343108, | |
| "grad_norm": 0.6417040051521462, | |
| "learning_rate": 2.1879976427831983e-05, | |
| "loss": 0.0811, | |
| "mean_token_accuracy": 0.9777021408081055, | |
| "step": 879 | |
| }, | |
| { | |
| "epoch": 5.146627565982405, | |
| "grad_norm": 0.9496873079801262, | |
| "learning_rate": 2.1845684725448916e-05, | |
| "loss": 0.0986, | |
| "mean_token_accuracy": 0.9698187112808228, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 5.152492668621701, | |
| "grad_norm": 0.7318435813184204, | |
| "learning_rate": 2.181139358316797e-05, | |
| "loss": 0.0819, | |
| "mean_token_accuracy": 0.9735628068447113, | |
| "step": 881 | |
| }, | |
| { | |
| "epoch": 5.158357771260997, | |
| "grad_norm": 0.7132143176614368, | |
| "learning_rate": 2.1777103125452146e-05, | |
| "loss": 0.0858, | |
| "mean_token_accuracy": 0.9724550470709801, | |
| "step": 882 | |
| }, | |
| { | |
| "epoch": 5.164222873900293, | |
| "grad_norm": 0.8291763986470454, | |
| "learning_rate": 2.1742813476761958e-05, | |
| "loss": 0.0995, | |
| "mean_token_accuracy": 0.9693228304386139, | |
| "step": 883 | |
| }, | |
| { | |
| "epoch": 5.170087976539589, | |
| "grad_norm": 0.7365191888977923, | |
| "learning_rate": 2.1708524761554973e-05, | |
| "loss": 0.0858, | |
| "mean_token_accuracy": 0.9737670198082924, | |
| "step": 884 | |
| }, | |
| { | |
| "epoch": 5.1759530791788855, | |
| "grad_norm": 0.6229037109477557, | |
| "learning_rate": 2.1674237104285393e-05, | |
| "loss": 0.0784, | |
| "mean_token_accuracy": 0.975468099117279, | |
| "step": 885 | |
| }, | |
| { | |
| "epoch": 5.181818181818182, | |
| "grad_norm": 0.5698465859203371, | |
| "learning_rate": 2.1639950629403552e-05, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.9791614338755608, | |
| "step": 886 | |
| }, | |
| { | |
| "epoch": 5.187683284457478, | |
| "grad_norm": 0.5736934209439896, | |
| "learning_rate": 2.1605665461355515e-05, | |
| "loss": 0.0799, | |
| "mean_token_accuracy": 0.975306861102581, | |
| "step": 887 | |
| }, | |
| { | |
| "epoch": 5.193548387096774, | |
| "grad_norm": 0.642707306614463, | |
| "learning_rate": 2.1571381724582588e-05, | |
| "loss": 0.0951, | |
| "mean_token_accuracy": 0.9715821146965027, | |
| "step": 888 | |
| }, | |
| { | |
| "epoch": 5.19941348973607, | |
| "grad_norm": 0.9214123192091717, | |
| "learning_rate": 2.153709954352089e-05, | |
| "loss": 0.0925, | |
| "mean_token_accuracy": 0.973389632999897, | |
| "step": 889 | |
| }, | |
| { | |
| "epoch": 5.205278592375366, | |
| "grad_norm": 0.754059027278609, | |
| "learning_rate": 2.15028190426009e-05, | |
| "loss": 0.0902, | |
| "mean_token_accuracy": 0.9721227064728737, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 5.211143695014663, | |
| "grad_norm": 0.6893716853875429, | |
| "learning_rate": 2.1468540346246986e-05, | |
| "loss": 0.0804, | |
| "mean_token_accuracy": 0.9698435142636299, | |
| "step": 891 | |
| }, | |
| { | |
| "epoch": 5.217008797653959, | |
| "grad_norm": 0.6795703629568569, | |
| "learning_rate": 2.143426357887697e-05, | |
| "loss": 0.0915, | |
| "mean_token_accuracy": 0.9739984050393105, | |
| "step": 892 | |
| }, | |
| { | |
| "epoch": 5.222873900293255, | |
| "grad_norm": 0.7702418917677822, | |
| "learning_rate": 2.139998886490169e-05, | |
| "loss": 0.0784, | |
| "mean_token_accuracy": 0.9761215299367905, | |
| "step": 893 | |
| }, | |
| { | |
| "epoch": 5.228739002932551, | |
| "grad_norm": 0.6046235507122676, | |
| "learning_rate": 2.136571632872449e-05, | |
| "loss": 0.0889, | |
| "mean_token_accuracy": 0.9733590260148048, | |
| "step": 894 | |
| }, | |
| { | |
| "epoch": 5.234604105571847, | |
| "grad_norm": 0.8257469175456973, | |
| "learning_rate": 2.1331446094740845e-05, | |
| "loss": 0.1044, | |
| "mean_token_accuracy": 0.9706998839974403, | |
| "step": 895 | |
| }, | |
| { | |
| "epoch": 5.2404692082111435, | |
| "grad_norm": 0.9294165820978288, | |
| "learning_rate": 2.1297178287337865e-05, | |
| "loss": 0.0878, | |
| "mean_token_accuracy": 0.972227543592453, | |
| "step": 896 | |
| }, | |
| { | |
| "epoch": 5.24633431085044, | |
| "grad_norm": 0.6893445425870071, | |
| "learning_rate": 2.1262913030893855e-05, | |
| "loss": 0.1045, | |
| "mean_token_accuracy": 0.9695899188518524, | |
| "step": 897 | |
| }, | |
| { | |
| "epoch": 5.252199413489736, | |
| "grad_norm": 1.104947276779602, | |
| "learning_rate": 2.1228650449777848e-05, | |
| "loss": 0.0982, | |
| "mean_token_accuracy": 0.9720256999135017, | |
| "step": 898 | |
| }, | |
| { | |
| "epoch": 5.258064516129032, | |
| "grad_norm": 0.7524610114511344, | |
| "learning_rate": 2.1194390668349186e-05, | |
| "loss": 0.0935, | |
| "mean_token_accuracy": 0.9729163721203804, | |
| "step": 899 | |
| }, | |
| { | |
| "epoch": 5.263929618768328, | |
| "grad_norm": 0.7362606630635441, | |
| "learning_rate": 2.116013381095703e-05, | |
| "loss": 0.0777, | |
| "mean_token_accuracy": 0.9786986038088799, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 5.269794721407624, | |
| "grad_norm": 0.5211179939843894, | |
| "learning_rate": 2.112588000193994e-05, | |
| "loss": 0.0791, | |
| "mean_token_accuracy": 0.9741416275501251, | |
| "step": 901 | |
| }, | |
| { | |
| "epoch": 5.275659824046921, | |
| "grad_norm": 0.7209693940538987, | |
| "learning_rate": 2.1091629365625403e-05, | |
| "loss": 0.0706, | |
| "mean_token_accuracy": 0.9762564152479172, | |
| "step": 902 | |
| }, | |
| { | |
| "epoch": 5.281524926686217, | |
| "grad_norm": 0.7741807909355801, | |
| "learning_rate": 2.105738202632941e-05, | |
| "loss": 0.0993, | |
| "mean_token_accuracy": 0.9709812626242638, | |
| "step": 903 | |
| }, | |
| { | |
| "epoch": 5.287390029325513, | |
| "grad_norm": 0.6747396182625481, | |
| "learning_rate": 2.1023138108355957e-05, | |
| "loss": 0.0713, | |
| "mean_token_accuracy": 0.9794772490859032, | |
| "step": 904 | |
| }, | |
| { | |
| "epoch": 5.293255131964809, | |
| "grad_norm": 0.5803581197055916, | |
| "learning_rate": 2.098889773599665e-05, | |
| "loss": 0.0934, | |
| "mean_token_accuracy": 0.9710723906755447, | |
| "step": 905 | |
| }, | |
| { | |
| "epoch": 5.299120234604105, | |
| "grad_norm": 0.7312865747989487, | |
| "learning_rate": 2.0954661033530193e-05, | |
| "loss": 0.0694, | |
| "mean_token_accuracy": 0.9799710288643837, | |
| "step": 906 | |
| }, | |
| { | |
| "epoch": 5.3049853372434015, | |
| "grad_norm": 0.6023379146658334, | |
| "learning_rate": 2.0920428125222004e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9746310263872147, | |
| "step": 907 | |
| }, | |
| { | |
| "epoch": 5.310850439882698, | |
| "grad_norm": 0.6922686013988262, | |
| "learning_rate": 2.0886199135323712e-05, | |
| "loss": 0.0901, | |
| "mean_token_accuracy": 0.9730349406599998, | |
| "step": 908 | |
| }, | |
| { | |
| "epoch": 5.316715542521994, | |
| "grad_norm": 0.8095953938881245, | |
| "learning_rate": 2.085197418807272e-05, | |
| "loss": 0.0839, | |
| "mean_token_accuracy": 0.9751959219574928, | |
| "step": 909 | |
| }, | |
| { | |
| "epoch": 5.32258064516129, | |
| "grad_norm": 0.7756873814279036, | |
| "learning_rate": 2.0817753407691774e-05, | |
| "loss": 0.0843, | |
| "mean_token_accuracy": 0.9740110486745834, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 5.328445747800586, | |
| "grad_norm": 0.6614025346901433, | |
| "learning_rate": 2.0783536918388477e-05, | |
| "loss": 0.1002, | |
| "mean_token_accuracy": 0.9684435129165649, | |
| "step": 911 | |
| }, | |
| { | |
| "epoch": 5.334310850439882, | |
| "grad_norm": 0.9972663221317221, | |
| "learning_rate": 2.0749324844354867e-05, | |
| "loss": 0.0856, | |
| "mean_token_accuracy": 0.9751722291111946, | |
| "step": 912 | |
| }, | |
| { | |
| "epoch": 5.340175953079179, | |
| "grad_norm": 0.6195183258111225, | |
| "learning_rate": 2.0715117309766953e-05, | |
| "loss": 0.0749, | |
| "mean_token_accuracy": 0.9773119017481804, | |
| "step": 913 | |
| }, | |
| { | |
| "epoch": 5.346041055718475, | |
| "grad_norm": 0.5891900151530527, | |
| "learning_rate": 2.068091443878428e-05, | |
| "loss": 0.0924, | |
| "mean_token_accuracy": 0.9727002307772636, | |
| "step": 914 | |
| }, | |
| { | |
| "epoch": 5.351906158357771, | |
| "grad_norm": 0.6577743321968366, | |
| "learning_rate": 2.064671635554945e-05, | |
| "loss": 0.1042, | |
| "mean_token_accuracy": 0.973984107375145, | |
| "step": 915 | |
| }, | |
| { | |
| "epoch": 5.357771260997067, | |
| "grad_norm": 0.6432081149097092, | |
| "learning_rate": 2.0612523184187693e-05, | |
| "loss": 0.0722, | |
| "mean_token_accuracy": 0.9766345173120499, | |
| "step": 916 | |
| }, | |
| { | |
| "epoch": 5.363636363636363, | |
| "grad_norm": 0.5940654858986845, | |
| "learning_rate": 2.057833504880641e-05, | |
| "loss": 0.0907, | |
| "mean_token_accuracy": 0.970424547791481, | |
| "step": 917 | |
| }, | |
| { | |
| "epoch": 5.3695014662756595, | |
| "grad_norm": 0.7978700792585309, | |
| "learning_rate": 2.054415207349473e-05, | |
| "loss": 0.0986, | |
| "mean_token_accuracy": 0.9700106978416443, | |
| "step": 918 | |
| }, | |
| { | |
| "epoch": 5.375366568914956, | |
| "grad_norm": 0.7734624846466107, | |
| "learning_rate": 2.0509974382323043e-05, | |
| "loss": 0.0911, | |
| "mean_token_accuracy": 0.9722676649689674, | |
| "step": 919 | |
| }, | |
| { | |
| "epoch": 5.381231671554252, | |
| "grad_norm": 0.6025196729714031, | |
| "learning_rate": 2.047580209934256e-05, | |
| "loss": 0.0819, | |
| "mean_token_accuracy": 0.9750821068882942, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 5.387096774193548, | |
| "grad_norm": 0.8058624103581123, | |
| "learning_rate": 2.0441635348584876e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9749607369303703, | |
| "step": 921 | |
| }, | |
| { | |
| "epoch": 5.392961876832844, | |
| "grad_norm": 0.5601275527868476, | |
| "learning_rate": 2.0407474254061498e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9728451818227768, | |
| "step": 922 | |
| }, | |
| { | |
| "epoch": 5.39882697947214, | |
| "grad_norm": 0.6533125937693671, | |
| "learning_rate": 2.0373318939763397e-05, | |
| "loss": 0.0819, | |
| "mean_token_accuracy": 0.9742006808519363, | |
| "step": 923 | |
| }, | |
| { | |
| "epoch": 5.404692082111437, | |
| "grad_norm": 0.5319127388615715, | |
| "learning_rate": 2.033916952966057e-05, | |
| "loss": 0.0821, | |
| "mean_token_accuracy": 0.9746419414877892, | |
| "step": 924 | |
| }, | |
| { | |
| "epoch": 5.410557184750733, | |
| "grad_norm": 0.8970156927927059, | |
| "learning_rate": 2.0305026147701584e-05, | |
| "loss": 0.0874, | |
| "mean_token_accuracy": 0.9721054211258888, | |
| "step": 925 | |
| }, | |
| { | |
| "epoch": 5.416422287390029, | |
| "grad_norm": 0.7073601134598491, | |
| "learning_rate": 2.0270888917813124e-05, | |
| "loss": 0.0826, | |
| "mean_token_accuracy": 0.9741240218281746, | |
| "step": 926 | |
| }, | |
| { | |
| "epoch": 5.422287390029325, | |
| "grad_norm": 0.6846811877738305, | |
| "learning_rate": 2.0236757963899548e-05, | |
| "loss": 0.0867, | |
| "mean_token_accuracy": 0.9731223210692406, | |
| "step": 927 | |
| }, | |
| { | |
| "epoch": 5.428152492668621, | |
| "grad_norm": 0.6728499001788334, | |
| "learning_rate": 2.020263340984244e-05, | |
| "loss": 0.111, | |
| "mean_token_accuracy": 0.9723155945539474, | |
| "step": 928 | |
| }, | |
| { | |
| "epoch": 5.4340175953079175, | |
| "grad_norm": 4.80499137254916, | |
| "learning_rate": 2.0168515379500145e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9738029539585114, | |
| "step": 929 | |
| }, | |
| { | |
| "epoch": 5.439882697947214, | |
| "grad_norm": 0.5659705242146589, | |
| "learning_rate": 2.0134403996707338e-05, | |
| "loss": 0.082, | |
| "mean_token_accuracy": 0.9739853367209435, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 5.44574780058651, | |
| "grad_norm": 0.5441226945035449, | |
| "learning_rate": 2.0100299385274547e-05, | |
| "loss": 0.083, | |
| "mean_token_accuracy": 0.9742333069443703, | |
| "step": 931 | |
| }, | |
| { | |
| "epoch": 5.451612903225806, | |
| "grad_norm": 0.7050420063651641, | |
| "learning_rate": 2.0066201668987757e-05, | |
| "loss": 0.0916, | |
| "mean_token_accuracy": 0.9706050902605057, | |
| "step": 932 | |
| }, | |
| { | |
| "epoch": 5.457478005865102, | |
| "grad_norm": 0.6357536693584852, | |
| "learning_rate": 2.0032110971607894e-05, | |
| "loss": 0.0831, | |
| "mean_token_accuracy": 0.9762316793203354, | |
| "step": 933 | |
| }, | |
| { | |
| "epoch": 5.463343108504398, | |
| "grad_norm": 0.704077597095192, | |
| "learning_rate": 1.999802741687042e-05, | |
| "loss": 0.0922, | |
| "mean_token_accuracy": 0.9715845510363579, | |
| "step": 934 | |
| }, | |
| { | |
| "epoch": 5.469208211143695, | |
| "grad_norm": 0.6093082482082577, | |
| "learning_rate": 1.9963951128484886e-05, | |
| "loss": 0.072, | |
| "mean_token_accuracy": 0.9773004725575447, | |
| "step": 935 | |
| }, | |
| { | |
| "epoch": 5.475073313782991, | |
| "grad_norm": 0.6051590416188609, | |
| "learning_rate": 1.9929882230134452e-05, | |
| "loss": 0.0874, | |
| "mean_token_accuracy": 0.971544623374939, | |
| "step": 936 | |
| }, | |
| { | |
| "epoch": 5.480938416422287, | |
| "grad_norm": 0.5593716193262593, | |
| "learning_rate": 1.9895820845475445e-05, | |
| "loss": 0.0861, | |
| "mean_token_accuracy": 0.974859744310379, | |
| "step": 937 | |
| }, | |
| { | |
| "epoch": 5.486803519061583, | |
| "grad_norm": 0.6403030492297582, | |
| "learning_rate": 1.9861767098136956e-05, | |
| "loss": 0.0738, | |
| "mean_token_accuracy": 0.9763150438666344, | |
| "step": 938 | |
| }, | |
| { | |
| "epoch": 5.492668621700879, | |
| "grad_norm": 0.5364067645042216, | |
| "learning_rate": 1.982772111172032e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9747820049524307, | |
| "step": 939 | |
| }, | |
| { | |
| "epoch": 5.4985337243401755, | |
| "grad_norm": 0.6269041173082894, | |
| "learning_rate": 1.9793683009798718e-05, | |
| "loss": 0.0789, | |
| "mean_token_accuracy": 0.9758682772517204, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 5.504398826979472, | |
| "grad_norm": 0.7056273697944462, | |
| "learning_rate": 1.975965291591672e-05, | |
| "loss": 0.0949, | |
| "mean_token_accuracy": 0.9714619740843773, | |
| "step": 941 | |
| }, | |
| { | |
| "epoch": 5.510263929618768, | |
| "grad_norm": 0.724661005419617, | |
| "learning_rate": 1.9725630953589823e-05, | |
| "loss": 0.0814, | |
| "mean_token_accuracy": 0.976491242647171, | |
| "step": 942 | |
| }, | |
| { | |
| "epoch": 5.516129032258064, | |
| "grad_norm": 0.6126314687561928, | |
| "learning_rate": 1.9691617246304007e-05, | |
| "loss": 0.0801, | |
| "mean_token_accuracy": 0.9713068008422852, | |
| "step": 943 | |
| }, | |
| { | |
| "epoch": 5.52199413489736, | |
| "grad_norm": 0.6306172258207623, | |
| "learning_rate": 1.9657611917515287e-05, | |
| "loss": 0.0917, | |
| "mean_token_accuracy": 0.9739942848682404, | |
| "step": 944 | |
| }, | |
| { | |
| "epoch": 5.527859237536656, | |
| "grad_norm": 0.683564829563686, | |
| "learning_rate": 1.962361509064928e-05, | |
| "loss": 0.0769, | |
| "mean_token_accuracy": 0.9768866300582886, | |
| "step": 945 | |
| }, | |
| { | |
| "epoch": 5.533724340175953, | |
| "grad_norm": 0.6547403928859, | |
| "learning_rate": 1.958962688910073e-05, | |
| "loss": 0.0701, | |
| "mean_token_accuracy": 0.9767028167843819, | |
| "step": 946 | |
| }, | |
| { | |
| "epoch": 5.539589442815249, | |
| "grad_norm": 0.5491533717582177, | |
| "learning_rate": 1.9555647436233093e-05, | |
| "loss": 0.0826, | |
| "mean_token_accuracy": 0.9770181030035019, | |
| "step": 947 | |
| }, | |
| { | |
| "epoch": 5.545454545454545, | |
| "grad_norm": 0.834495690001552, | |
| "learning_rate": 1.9521676855378045e-05, | |
| "loss": 0.0765, | |
| "mean_token_accuracy": 0.9766438603401184, | |
| "step": 948 | |
| }, | |
| { | |
| "epoch": 5.551319648093841, | |
| "grad_norm": 0.5028599783842282, | |
| "learning_rate": 1.9487715269835082e-05, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.9753778874874115, | |
| "step": 949 | |
| }, | |
| { | |
| "epoch": 5.557184750733137, | |
| "grad_norm": 0.584768110856237, | |
| "learning_rate": 1.945376280287105e-05, | |
| "loss": 0.0888, | |
| "mean_token_accuracy": 0.9721104651689529, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 5.563049853372434, | |
| "grad_norm": 0.6577704782595352, | |
| "learning_rate": 1.9419819577719684e-05, | |
| "loss": 0.0862, | |
| "mean_token_accuracy": 0.9739580526947975, | |
| "step": 951 | |
| }, | |
| { | |
| "epoch": 5.568914956011731, | |
| "grad_norm": 0.7577193714244259, | |
| "learning_rate": 1.9385885717581182e-05, | |
| "loss": 0.1004, | |
| "mean_token_accuracy": 0.9689623937010765, | |
| "step": 952 | |
| }, | |
| { | |
| "epoch": 5.574780058651027, | |
| "grad_norm": 0.7853657951677275, | |
| "learning_rate": 1.935196134562175e-05, | |
| "loss": 0.0791, | |
| "mean_token_accuracy": 0.9788500145077705, | |
| "step": 953 | |
| }, | |
| { | |
| "epoch": 5.580645161290323, | |
| "grad_norm": 0.49572118058637943, | |
| "learning_rate": 1.931804658497316e-05, | |
| "loss": 0.0809, | |
| "mean_token_accuracy": 0.9770422652363777, | |
| "step": 954 | |
| }, | |
| { | |
| "epoch": 5.586510263929619, | |
| "grad_norm": 0.7179542536188659, | |
| "learning_rate": 1.9284141558732296e-05, | |
| "loss": 0.0878, | |
| "mean_token_accuracy": 0.9743879362940788, | |
| "step": 955 | |
| }, | |
| { | |
| "epoch": 5.592375366568915, | |
| "grad_norm": 0.7608646421982846, | |
| "learning_rate": 1.925024638996071e-05, | |
| "loss": 0.0853, | |
| "mean_token_accuracy": 0.9749633818864822, | |
| "step": 956 | |
| }, | |
| { | |
| "epoch": 5.5982404692082115, | |
| "grad_norm": 0.5043029695894444, | |
| "learning_rate": 1.9216361201684174e-05, | |
| "loss": 0.0818, | |
| "mean_token_accuracy": 0.9793736785650253, | |
| "step": 957 | |
| }, | |
| { | |
| "epoch": 5.604105571847508, | |
| "grad_norm": 0.7737524412575688, | |
| "learning_rate": 1.918248611689224e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9757226184010506, | |
| "step": 958 | |
| }, | |
| { | |
| "epoch": 5.609970674486804, | |
| "grad_norm": 0.7067522445553732, | |
| "learning_rate": 1.9148621258537782e-05, | |
| "loss": 0.0921, | |
| "mean_token_accuracy": 0.9713330194354057, | |
| "step": 959 | |
| }, | |
| { | |
| "epoch": 5.6158357771261, | |
| "grad_norm": 1.1027290922626314, | |
| "learning_rate": 1.911476674953656e-05, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.9776544645428658, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 5.621700879765396, | |
| "grad_norm": 0.6408775647360502, | |
| "learning_rate": 1.9080922712766762e-05, | |
| "loss": 0.0859, | |
| "mean_token_accuracy": 0.9723746255040169, | |
| "step": 961 | |
| }, | |
| { | |
| "epoch": 5.627565982404692, | |
| "grad_norm": 0.6998113063143537, | |
| "learning_rate": 1.904708927106858e-05, | |
| "loss": 0.0938, | |
| "mean_token_accuracy": 0.970834031701088, | |
| "step": 962 | |
| }, | |
| { | |
| "epoch": 5.633431085043989, | |
| "grad_norm": 0.7806561332961518, | |
| "learning_rate": 1.9013266547243742e-05, | |
| "loss": 0.0757, | |
| "mean_token_accuracy": 0.9781305715441704, | |
| "step": 963 | |
| }, | |
| { | |
| "epoch": 5.639296187683285, | |
| "grad_norm": 0.4889372143222534, | |
| "learning_rate": 1.8979454664055068e-05, | |
| "loss": 0.0829, | |
| "mean_token_accuracy": 0.9758122488856316, | |
| "step": 964 | |
| }, | |
| { | |
| "epoch": 5.645161290322581, | |
| "grad_norm": 0.7863504294726643, | |
| "learning_rate": 1.894565374422605e-05, | |
| "loss": 0.0751, | |
| "mean_token_accuracy": 0.9774143546819687, | |
| "step": 965 | |
| }, | |
| { | |
| "epoch": 5.651026392961877, | |
| "grad_norm": 0.5448893258732672, | |
| "learning_rate": 1.891186391044037e-05, | |
| "loss": 0.091, | |
| "mean_token_accuracy": 0.971643827855587, | |
| "step": 966 | |
| }, | |
| { | |
| "epoch": 5.656891495601173, | |
| "grad_norm": 0.6727468767040634, | |
| "learning_rate": 1.887808528534148e-05, | |
| "loss": 0.083, | |
| "mean_token_accuracy": 0.9740595147013664, | |
| "step": 967 | |
| }, | |
| { | |
| "epoch": 5.6627565982404695, | |
| "grad_norm": 0.5251070795859373, | |
| "learning_rate": 1.884431799153214e-05, | |
| "loss": 0.078, | |
| "mean_token_accuracy": 0.9773188605904579, | |
| "step": 968 | |
| }, | |
| { | |
| "epoch": 5.668621700879766, | |
| "grad_norm": 0.69715074867768, | |
| "learning_rate": 1.8810562151573993e-05, | |
| "loss": 0.0953, | |
| "mean_token_accuracy": 0.9737943336367607, | |
| "step": 969 | |
| }, | |
| { | |
| "epoch": 5.674486803519062, | |
| "grad_norm": 0.8247784064007003, | |
| "learning_rate": 1.8776817887987105e-05, | |
| "loss": 0.0867, | |
| "mean_token_accuracy": 0.974157802760601, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 5.680351906158358, | |
| "grad_norm": 0.6397418169092066, | |
| "learning_rate": 1.8743085323249527e-05, | |
| "loss": 0.0872, | |
| "mean_token_accuracy": 0.9712603464722633, | |
| "step": 971 | |
| }, | |
| { | |
| "epoch": 5.686217008797654, | |
| "grad_norm": 0.6415808131766715, | |
| "learning_rate": 1.870936457979684e-05, | |
| "loss": 0.0937, | |
| "mean_token_accuracy": 0.976512186229229, | |
| "step": 972 | |
| }, | |
| { | |
| "epoch": 5.69208211143695, | |
| "grad_norm": 0.6482792559594592, | |
| "learning_rate": 1.8675655780021733e-05, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9780119732022285, | |
| "step": 973 | |
| }, | |
| { | |
| "epoch": 5.697947214076247, | |
| "grad_norm": 0.6985497602725447, | |
| "learning_rate": 1.8641959046273525e-05, | |
| "loss": 0.0945, | |
| "mean_token_accuracy": 0.9735280200839043, | |
| "step": 974 | |
| }, | |
| { | |
| "epoch": 5.703812316715543, | |
| "grad_norm": 0.649656727914582, | |
| "learning_rate": 1.8608274500857756e-05, | |
| "loss": 0.0863, | |
| "mean_token_accuracy": 0.9729173704981804, | |
| "step": 975 | |
| }, | |
| { | |
| "epoch": 5.709677419354839, | |
| "grad_norm": 0.6212445716603099, | |
| "learning_rate": 1.8574602266035714e-05, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.977060005068779, | |
| "step": 976 | |
| }, | |
| { | |
| "epoch": 5.715542521994135, | |
| "grad_norm": 0.6146079314603976, | |
| "learning_rate": 1.854094246402402e-05, | |
| "loss": 0.0924, | |
| "mean_token_accuracy": 0.9719849154353142, | |
| "step": 977 | |
| }, | |
| { | |
| "epoch": 5.721407624633431, | |
| "grad_norm": 0.6788062565147814, | |
| "learning_rate": 1.8507295216994162e-05, | |
| "loss": 0.0723, | |
| "mean_token_accuracy": 0.979626290500164, | |
| "step": 978 | |
| }, | |
| { | |
| "epoch": 5.7272727272727275, | |
| "grad_norm": 0.6423915968454387, | |
| "learning_rate": 1.8473660647072053e-05, | |
| "loss": 0.0871, | |
| "mean_token_accuracy": 0.9703501686453819, | |
| "step": 979 | |
| }, | |
| { | |
| "epoch": 5.733137829912024, | |
| "grad_norm": 0.558677799000313, | |
| "learning_rate": 1.8440038876337597e-05, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9777565002441406, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 5.73900293255132, | |
| "grad_norm": 1.2785599487996107, | |
| "learning_rate": 1.8406430026824252e-05, | |
| "loss": 0.0879, | |
| "mean_token_accuracy": 0.972544752061367, | |
| "step": 981 | |
| }, | |
| { | |
| "epoch": 5.744868035190616, | |
| "grad_norm": 0.7056011251581465, | |
| "learning_rate": 1.837283422051855e-05, | |
| "loss": 0.0852, | |
| "mean_token_accuracy": 0.9765041247010231, | |
| "step": 982 | |
| }, | |
| { | |
| "epoch": 5.750733137829912, | |
| "grad_norm": 0.6694236578308186, | |
| "learning_rate": 1.8339251579359713e-05, | |
| "loss": 0.0803, | |
| "mean_token_accuracy": 0.9750206470489502, | |
| "step": 983 | |
| }, | |
| { | |
| "epoch": 5.756598240469208, | |
| "grad_norm": 0.4853704060978793, | |
| "learning_rate": 1.8305682225239167e-05, | |
| "loss": 0.0722, | |
| "mean_token_accuracy": 0.9784371107816696, | |
| "step": 984 | |
| }, | |
| { | |
| "epoch": 5.762463343108505, | |
| "grad_norm": 0.7741700176568748, | |
| "learning_rate": 1.8272126280000102e-05, | |
| "loss": 0.0895, | |
| "mean_token_accuracy": 0.972519189119339, | |
| "step": 985 | |
| }, | |
| { | |
| "epoch": 5.768328445747801, | |
| "grad_norm": 0.7875155571594459, | |
| "learning_rate": 1.823858386543705e-05, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.9750584587454796, | |
| "step": 986 | |
| }, | |
| { | |
| "epoch": 5.774193548387097, | |
| "grad_norm": 1.1668685106669274, | |
| "learning_rate": 1.8205055103295434e-05, | |
| "loss": 0.088, | |
| "mean_token_accuracy": 0.971601165831089, | |
| "step": 987 | |
| }, | |
| { | |
| "epoch": 5.780058651026393, | |
| "grad_norm": 0.6957342162162067, | |
| "learning_rate": 1.8171540115271108e-05, | |
| "loss": 0.0884, | |
| "mean_token_accuracy": 0.9688088670372963, | |
| "step": 988 | |
| }, | |
| { | |
| "epoch": 5.785923753665689, | |
| "grad_norm": 0.5858122379925254, | |
| "learning_rate": 1.813803902300995e-05, | |
| "loss": 0.0828, | |
| "mean_token_accuracy": 0.9737322106957436, | |
| "step": 989 | |
| }, | |
| { | |
| "epoch": 5.7917888563049855, | |
| "grad_norm": 0.598739315562945, | |
| "learning_rate": 1.8104551948107395e-05, | |
| "loss": 0.0875, | |
| "mean_token_accuracy": 0.9778008908033371, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 5.797653958944282, | |
| "grad_norm": 0.7749756964271326, | |
| "learning_rate": 1.8071079012107997e-05, | |
| "loss": 0.0742, | |
| "mean_token_accuracy": 0.9746625646948814, | |
| "step": 991 | |
| }, | |
| { | |
| "epoch": 5.803519061583578, | |
| "grad_norm": 0.650221789959287, | |
| "learning_rate": 1.8037620336504993e-05, | |
| "loss": 0.0876, | |
| "mean_token_accuracy": 0.9753068014979362, | |
| "step": 992 | |
| }, | |
| { | |
| "epoch": 5.809384164222874, | |
| "grad_norm": 0.6479507750610883, | |
| "learning_rate": 1.8004176042739877e-05, | |
| "loss": 0.0845, | |
| "mean_token_accuracy": 0.9764060229063034, | |
| "step": 993 | |
| }, | |
| { | |
| "epoch": 5.81524926686217, | |
| "grad_norm": 0.7040519035215548, | |
| "learning_rate": 1.797074625220191e-05, | |
| "loss": 0.0802, | |
| "mean_token_accuracy": 0.9756477698683739, | |
| "step": 994 | |
| }, | |
| { | |
| "epoch": 5.821114369501466, | |
| "grad_norm": 0.5324611517794914, | |
| "learning_rate": 1.7937331086227737e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9716223925352097, | |
| "step": 995 | |
| }, | |
| { | |
| "epoch": 5.826979472140763, | |
| "grad_norm": 0.6583142672501221, | |
| "learning_rate": 1.790393066610091e-05, | |
| "loss": 0.0863, | |
| "mean_token_accuracy": 0.9739385768771172, | |
| "step": 996 | |
| }, | |
| { | |
| "epoch": 5.832844574780059, | |
| "grad_norm": 0.6840541212276599, | |
| "learning_rate": 1.787054511305148e-05, | |
| "loss": 0.0859, | |
| "mean_token_accuracy": 0.9745663329958916, | |
| "step": 997 | |
| }, | |
| { | |
| "epoch": 5.838709677419355, | |
| "grad_norm": 0.6422182464609407, | |
| "learning_rate": 1.7837174548255504e-05, | |
| "loss": 0.0855, | |
| "mean_token_accuracy": 0.9756312891840935, | |
| "step": 998 | |
| }, | |
| { | |
| "epoch": 5.844574780058651, | |
| "grad_norm": 0.551776661055142, | |
| "learning_rate": 1.7803819092834668e-05, | |
| "loss": 0.0737, | |
| "mean_token_accuracy": 0.9777807220816612, | |
| "step": 999 | |
| }, | |
| { | |
| "epoch": 5.850439882697947, | |
| "grad_norm": 0.734547203404358, | |
| "learning_rate": 1.7770478867855797e-05, | |
| "loss": 0.0866, | |
| "mean_token_accuracy": 0.9762368500232697, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 5.8563049853372435, | |
| "grad_norm": 0.7438149569779208, | |
| "learning_rate": 1.7737153994330437e-05, | |
| "loss": 0.1011, | |
| "mean_token_accuracy": 0.9705447629094124, | |
| "step": 1001 | |
| }, | |
| { | |
| "epoch": 5.86217008797654, | |
| "grad_norm": 0.5989262037995564, | |
| "learning_rate": 1.7703844593214427e-05, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9772868156433105, | |
| "step": 1002 | |
| }, | |
| { | |
| "epoch": 5.868035190615836, | |
| "grad_norm": 0.5387325185762245, | |
| "learning_rate": 1.7670550785407444e-05, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9779915139079094, | |
| "step": 1003 | |
| }, | |
| { | |
| "epoch": 5.873900293255132, | |
| "grad_norm": 0.5529203720914665, | |
| "learning_rate": 1.7637272691752548e-05, | |
| "loss": 0.0824, | |
| "mean_token_accuracy": 0.9724158346652985, | |
| "step": 1004 | |
| }, | |
| { | |
| "epoch": 5.879765395894428, | |
| "grad_norm": 0.5081691050646204, | |
| "learning_rate": 1.7604010433035793e-05, | |
| "loss": 0.0893, | |
| "mean_token_accuracy": 0.9720709770917892, | |
| "step": 1005 | |
| }, | |
| { | |
| "epoch": 5.885630498533724, | |
| "grad_norm": 0.756502417771231, | |
| "learning_rate": 1.7570764129985747e-05, | |
| "loss": 0.0772, | |
| "mean_token_accuracy": 0.9759951457381248, | |
| "step": 1006 | |
| }, | |
| { | |
| "epoch": 5.891495601173021, | |
| "grad_norm": 0.5358224744677316, | |
| "learning_rate": 1.7537533903273055e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.9745610728859901, | |
| "step": 1007 | |
| }, | |
| { | |
| "epoch": 5.897360703812317, | |
| "grad_norm": 0.7031782400736435, | |
| "learning_rate": 1.7504319873510014e-05, | |
| "loss": 0.0938, | |
| "mean_token_accuracy": 0.9742460176348686, | |
| "step": 1008 | |
| }, | |
| { | |
| "epoch": 5.903225806451613, | |
| "grad_norm": 0.7417007125975534, | |
| "learning_rate": 1.7471122161250153e-05, | |
| "loss": 0.0873, | |
| "mean_token_accuracy": 0.9722920358181, | |
| "step": 1009 | |
| }, | |
| { | |
| "epoch": 5.909090909090909, | |
| "grad_norm": 0.766852011186241, | |
| "learning_rate": 1.743794088698775e-05, | |
| "loss": 0.0853, | |
| "mean_token_accuracy": 0.9757792204618454, | |
| "step": 1010 | |
| }, | |
| { | |
| "epoch": 5.914956011730205, | |
| "grad_norm": 0.6482473650913794, | |
| "learning_rate": 1.7404776171157428e-05, | |
| "loss": 0.0989, | |
| "mean_token_accuracy": 0.9704331830143929, | |
| "step": 1011 | |
| }, | |
| { | |
| "epoch": 5.9208211143695015, | |
| "grad_norm": 0.7240261222776643, | |
| "learning_rate": 1.7371628134133716e-05, | |
| "loss": 0.0968, | |
| "mean_token_accuracy": 0.972036175429821, | |
| "step": 1012 | |
| }, | |
| { | |
| "epoch": 5.926686217008798, | |
| "grad_norm": 0.628892680365792, | |
| "learning_rate": 1.73384968962306e-05, | |
| "loss": 0.0855, | |
| "mean_token_accuracy": 0.9731899350881577, | |
| "step": 1013 | |
| }, | |
| { | |
| "epoch": 5.932551319648094, | |
| "grad_norm": 0.7106790161768256, | |
| "learning_rate": 1.7305382577701088e-05, | |
| "loss": 0.0956, | |
| "mean_token_accuracy": 0.9716820046305656, | |
| "step": 1014 | |
| }, | |
| { | |
| "epoch": 5.93841642228739, | |
| "grad_norm": 0.6838469394111186, | |
| "learning_rate": 1.7272285298736787e-05, | |
| "loss": 0.0778, | |
| "mean_token_accuracy": 0.9754338562488556, | |
| "step": 1015 | |
| }, | |
| { | |
| "epoch": 5.944281524926686, | |
| "grad_norm": 0.6872214896209663, | |
| "learning_rate": 1.7239205179467453e-05, | |
| "loss": 0.1001, | |
| "mean_token_accuracy": 0.972577653825283, | |
| "step": 1016 | |
| }, | |
| { | |
| "epoch": 5.9501466275659824, | |
| "grad_norm": 0.6557151291069184, | |
| "learning_rate": 1.720614233996056e-05, | |
| "loss": 0.1091, | |
| "mean_token_accuracy": 0.9667370170354843, | |
| "step": 1017 | |
| }, | |
| { | |
| "epoch": 5.956011730205279, | |
| "grad_norm": 0.7872367392433527, | |
| "learning_rate": 1.7173096900220852e-05, | |
| "loss": 0.0852, | |
| "mean_token_accuracy": 0.972346842288971, | |
| "step": 1018 | |
| }, | |
| { | |
| "epoch": 5.961876832844575, | |
| "grad_norm": 0.5800535877487003, | |
| "learning_rate": 1.7140068980189943e-05, | |
| "loss": 0.0922, | |
| "mean_token_accuracy": 0.9726891592144966, | |
| "step": 1019 | |
| }, | |
| { | |
| "epoch": 5.967741935483871, | |
| "grad_norm": 0.7407124629984413, | |
| "learning_rate": 1.710705869974583e-05, | |
| "loss": 0.082, | |
| "mean_token_accuracy": 0.9731708765029907, | |
| "step": 1020 | |
| }, | |
| { | |
| "epoch": 5.973607038123167, | |
| "grad_norm": 0.48961326813702305, | |
| "learning_rate": 1.7074066178702512e-05, | |
| "loss": 0.069, | |
| "mean_token_accuracy": 0.977909691631794, | |
| "step": 1021 | |
| }, | |
| { | |
| "epoch": 5.979472140762463, | |
| "grad_norm": 0.5617989892061567, | |
| "learning_rate": 1.7041091536809506e-05, | |
| "loss": 0.084, | |
| "mean_token_accuracy": 0.9753151684999466, | |
| "step": 1022 | |
| }, | |
| { | |
| "epoch": 5.9853372434017595, | |
| "grad_norm": 0.6833500434939148, | |
| "learning_rate": 1.7008134893751446e-05, | |
| "loss": 0.0767, | |
| "mean_token_accuracy": 0.9776782914996147, | |
| "step": 1023 | |
| }, | |
| { | |
| "epoch": 5.991202346041056, | |
| "grad_norm": 0.3999962713670006, | |
| "learning_rate": 1.697519636914765e-05, | |
| "loss": 0.0692, | |
| "mean_token_accuracy": 0.978028692305088, | |
| "step": 1024 | |
| }, | |
| { | |
| "epoch": 5.997067448680352, | |
| "grad_norm": 0.6396083535043195, | |
| "learning_rate": 1.6942276082551634e-05, | |
| "loss": 0.0895, | |
| "mean_token_accuracy": 0.9712998270988464, | |
| "step": 1025 | |
| }, | |
| { | |
| "epoch": 6.0, | |
| "grad_norm": 1.1147420985091527, | |
| "learning_rate": 1.6909374153450762e-05, | |
| "loss": 0.1048, | |
| "mean_token_accuracy": 0.9730786234140396, | |
| "step": 1026 | |
| }, | |
| { | |
| "epoch": 6.005865102639296, | |
| "grad_norm": 0.5202288917529572, | |
| "learning_rate": 1.6876490701265736e-05, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9801736921072006, | |
| "step": 1027 | |
| }, | |
| { | |
| "epoch": 6.011730205278592, | |
| "grad_norm": 0.5739041631655252, | |
| "learning_rate": 1.684362584535022e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.976397916674614, | |
| "step": 1028 | |
| }, | |
| { | |
| "epoch": 6.0175953079178885, | |
| "grad_norm": 0.5383910419538716, | |
| "learning_rate": 1.6810779704990358e-05, | |
| "loss": 0.0739, | |
| "mean_token_accuracy": 0.9765185192227364, | |
| "step": 1029 | |
| }, | |
| { | |
| "epoch": 6.023460410557185, | |
| "grad_norm": 0.5494319540629348, | |
| "learning_rate": 1.677795239940438e-05, | |
| "loss": 0.0618, | |
| "mean_token_accuracy": 0.9817954376339912, | |
| "step": 1030 | |
| }, | |
| { | |
| "epoch": 6.029325513196481, | |
| "grad_norm": 0.5433321296105504, | |
| "learning_rate": 1.674514404774214e-05, | |
| "loss": 0.0752, | |
| "mean_token_accuracy": 0.9775493144989014, | |
| "step": 1031 | |
| }, | |
| { | |
| "epoch": 6.035190615835777, | |
| "grad_norm": 0.5151829348351332, | |
| "learning_rate": 1.671235476908471e-05, | |
| "loss": 0.0728, | |
| "mean_token_accuracy": 0.9772488325834274, | |
| "step": 1032 | |
| }, | |
| { | |
| "epoch": 6.041055718475073, | |
| "grad_norm": 0.4972524129628317, | |
| "learning_rate": 1.6679584682443924e-05, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9799492433667183, | |
| "step": 1033 | |
| }, | |
| { | |
| "epoch": 6.0469208211143695, | |
| "grad_norm": 0.46952005304609734, | |
| "learning_rate": 1.6646833906761965e-05, | |
| "loss": 0.065, | |
| "mean_token_accuracy": 0.9775111898779869, | |
| "step": 1034 | |
| }, | |
| { | |
| "epoch": 6.052785923753666, | |
| "grad_norm": 0.44315338871603027, | |
| "learning_rate": 1.661410256091092e-05, | |
| "loss": 0.0681, | |
| "mean_token_accuracy": 0.9794765636324883, | |
| "step": 1035 | |
| }, | |
| { | |
| "epoch": 6.058651026392962, | |
| "grad_norm": 0.5676214069603677, | |
| "learning_rate": 1.658139076369236e-05, | |
| "loss": 0.0793, | |
| "mean_token_accuracy": 0.9757615253329277, | |
| "step": 1036 | |
| }, | |
| { | |
| "epoch": 6.064516129032258, | |
| "grad_norm": 0.6429377797747571, | |
| "learning_rate": 1.6548698633836893e-05, | |
| "loss": 0.0793, | |
| "mean_token_accuracy": 0.972226656973362, | |
| "step": 1037 | |
| }, | |
| { | |
| "epoch": 6.070381231671554, | |
| "grad_norm": 0.5357438467822505, | |
| "learning_rate": 1.6516026290003746e-05, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9822089746594429, | |
| "step": 1038 | |
| }, | |
| { | |
| "epoch": 6.07624633431085, | |
| "grad_norm": 0.5993690137250399, | |
| "learning_rate": 1.6483373850780328e-05, | |
| "loss": 0.0717, | |
| "mean_token_accuracy": 0.9774091765284538, | |
| "step": 1039 | |
| }, | |
| { | |
| "epoch": 6.0821114369501466, | |
| "grad_norm": 0.4190984495167291, | |
| "learning_rate": 1.645074143468181e-05, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9807809740304947, | |
| "step": 1040 | |
| }, | |
| { | |
| "epoch": 6.087976539589443, | |
| "grad_norm": 0.6597809731293542, | |
| "learning_rate": 1.6418129160150692e-05, | |
| "loss": 0.0731, | |
| "mean_token_accuracy": 0.9749625474214554, | |
| "step": 1041 | |
| }, | |
| { | |
| "epoch": 6.093841642228739, | |
| "grad_norm": 0.5336205207180447, | |
| "learning_rate": 1.6385537145556346e-05, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.9818411692976952, | |
| "step": 1042 | |
| }, | |
| { | |
| "epoch": 6.099706744868035, | |
| "grad_norm": 0.47928728223975, | |
| "learning_rate": 1.6352965509194634e-05, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9807577580213547, | |
| "step": 1043 | |
| }, | |
| { | |
| "epoch": 6.105571847507331, | |
| "grad_norm": 0.48355345719317167, | |
| "learning_rate": 1.6320414369287427e-05, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.9786029979586601, | |
| "step": 1044 | |
| }, | |
| { | |
| "epoch": 6.1114369501466275, | |
| "grad_norm": 0.6144852297951959, | |
| "learning_rate": 1.6287883843982223e-05, | |
| "loss": 0.0733, | |
| "mean_token_accuracy": 0.9777380973100662, | |
| "step": 1045 | |
| }, | |
| { | |
| "epoch": 6.117302052785924, | |
| "grad_norm": 0.6340714518585042, | |
| "learning_rate": 1.625537405135169e-05, | |
| "loss": 0.0872, | |
| "mean_token_accuracy": 0.9729229807853699, | |
| "step": 1046 | |
| }, | |
| { | |
| "epoch": 6.12316715542522, | |
| "grad_norm": 0.5568337235876012, | |
| "learning_rate": 1.622288510939325e-05, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.9760441929101944, | |
| "step": 1047 | |
| }, | |
| { | |
| "epoch": 6.129032258064516, | |
| "grad_norm": 0.5939116225097025, | |
| "learning_rate": 1.619041713602864e-05, | |
| "loss": 0.0811, | |
| "mean_token_accuracy": 0.9749231860041618, | |
| "step": 1048 | |
| }, | |
| { | |
| "epoch": 6.134897360703812, | |
| "grad_norm": 0.49431495980866813, | |
| "learning_rate": 1.6157970249103484e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9751162528991699, | |
| "step": 1049 | |
| }, | |
| { | |
| "epoch": 6.140762463343108, | |
| "grad_norm": 0.5876887177758083, | |
| "learning_rate": 1.612554456638688e-05, | |
| "loss": 0.078, | |
| "mean_token_accuracy": 0.9751496464014053, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 6.146627565982405, | |
| "grad_norm": 0.744948657520097, | |
| "learning_rate": 1.6093140205570962e-05, | |
| "loss": 0.0877, | |
| "mean_token_accuracy": 0.9704322144389153, | |
| "step": 1051 | |
| }, | |
| { | |
| "epoch": 6.152492668621701, | |
| "grad_norm": 0.6326737035901645, | |
| "learning_rate": 1.6060757284270474e-05, | |
| "loss": 0.0882, | |
| "mean_token_accuracy": 0.9718341752886772, | |
| "step": 1052 | |
| }, | |
| { | |
| "epoch": 6.158357771260997, | |
| "grad_norm": 0.41910883057148673, | |
| "learning_rate": 1.6028395920022336e-05, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.9783205166459084, | |
| "step": 1053 | |
| }, | |
| { | |
| "epoch": 6.164222873900293, | |
| "grad_norm": 0.48067835067712417, | |
| "learning_rate": 1.5996056230285237e-05, | |
| "loss": 0.0671, | |
| "mean_token_accuracy": 0.9760030433535576, | |
| "step": 1054 | |
| }, | |
| { | |
| "epoch": 6.170087976539589, | |
| "grad_norm": 0.562185298887884, | |
| "learning_rate": 1.596373833243918e-05, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.9750814661383629, | |
| "step": 1055 | |
| }, | |
| { | |
| "epoch": 6.1759530791788855, | |
| "grad_norm": 0.5522973486708462, | |
| "learning_rate": 1.593144234378509e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9756905287504196, | |
| "step": 1056 | |
| }, | |
| { | |
| "epoch": 6.181818181818182, | |
| "grad_norm": 0.4779934912831462, | |
| "learning_rate": 1.5899168381544362e-05, | |
| "loss": 0.0683, | |
| "mean_token_accuracy": 0.9793584048748016, | |
| "step": 1057 | |
| }, | |
| { | |
| "epoch": 6.187683284457478, | |
| "grad_norm": 0.518336118011279, | |
| "learning_rate": 1.5866916562858444e-05, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.9778865575790405, | |
| "step": 1058 | |
| }, | |
| { | |
| "epoch": 6.193548387096774, | |
| "grad_norm": 0.4666524808220914, | |
| "learning_rate": 1.5834687004788406e-05, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9767828062176704, | |
| "step": 1059 | |
| }, | |
| { | |
| "epoch": 6.19941348973607, | |
| "grad_norm": 0.6436163754016054, | |
| "learning_rate": 1.5802479824314537e-05, | |
| "loss": 0.0808, | |
| "mean_token_accuracy": 0.9713993892073631, | |
| "step": 1060 | |
| }, | |
| { | |
| "epoch": 6.205278592375366, | |
| "grad_norm": 0.49132240296176427, | |
| "learning_rate": 1.5770295138335896e-05, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9808058738708496, | |
| "step": 1061 | |
| }, | |
| { | |
| "epoch": 6.211143695014663, | |
| "grad_norm": 0.4753775907536774, | |
| "learning_rate": 1.573813306366988e-05, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.981878250837326, | |
| "step": 1062 | |
| }, | |
| { | |
| "epoch": 6.217008797653959, | |
| "grad_norm": 0.5673472220422602, | |
| "learning_rate": 1.5705993717051838e-05, | |
| "loss": 0.0833, | |
| "mean_token_accuracy": 0.9748826399445534, | |
| "step": 1063 | |
| }, | |
| { | |
| "epoch": 6.222873900293255, | |
| "grad_norm": 0.526645969703994, | |
| "learning_rate": 1.567387721513462e-05, | |
| "loss": 0.071, | |
| "mean_token_accuracy": 0.9779791384935379, | |
| "step": 1064 | |
| }, | |
| { | |
| "epoch": 6.228739002932551, | |
| "grad_norm": 0.5825082631105734, | |
| "learning_rate": 1.5641783674488155e-05, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9779115691781044, | |
| "step": 1065 | |
| }, | |
| { | |
| "epoch": 6.234604105571847, | |
| "grad_norm": 0.5677888358987312, | |
| "learning_rate": 1.5609713211599035e-05, | |
| "loss": 0.0832, | |
| "mean_token_accuracy": 0.9752690866589546, | |
| "step": 1066 | |
| }, | |
| { | |
| "epoch": 6.2404692082111435, | |
| "grad_norm": 0.5709450414456684, | |
| "learning_rate": 1.557766594287009e-05, | |
| "loss": 0.0882, | |
| "mean_token_accuracy": 0.9753549993038177, | |
| "step": 1067 | |
| }, | |
| { | |
| "epoch": 6.24633431085044, | |
| "grad_norm": 0.7430276028327776, | |
| "learning_rate": 1.554564198461996e-05, | |
| "loss": 0.0938, | |
| "mean_token_accuracy": 0.9685186669230461, | |
| "step": 1068 | |
| }, | |
| { | |
| "epoch": 6.252199413489736, | |
| "grad_norm": 0.6168855856289802, | |
| "learning_rate": 1.5513641453082672e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9780802503228188, | |
| "step": 1069 | |
| }, | |
| { | |
| "epoch": 6.258064516129032, | |
| "grad_norm": 0.6379867283389812, | |
| "learning_rate": 1.5481664464407246e-05, | |
| "loss": 0.0689, | |
| "mean_token_accuracy": 0.9805227667093277, | |
| "step": 1070 | |
| }, | |
| { | |
| "epoch": 6.263929618768328, | |
| "grad_norm": 0.5302609136875379, | |
| "learning_rate": 1.5449711134657224e-05, | |
| "loss": 0.0815, | |
| "mean_token_accuracy": 0.9738188460469246, | |
| "step": 1071 | |
| }, | |
| { | |
| "epoch": 6.269794721407624, | |
| "grad_norm": 0.5668666714707467, | |
| "learning_rate": 1.5417781579810296e-05, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.975812129676342, | |
| "step": 1072 | |
| }, | |
| { | |
| "epoch": 6.275659824046921, | |
| "grad_norm": 0.5530723747905679, | |
| "learning_rate": 1.5385875915757846e-05, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9774937778711319, | |
| "step": 1073 | |
| }, | |
| { | |
| "epoch": 6.281524926686217, | |
| "grad_norm": 0.6109099842720831, | |
| "learning_rate": 1.535399425830456e-05, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9768806248903275, | |
| "step": 1074 | |
| }, | |
| { | |
| "epoch": 6.287390029325513, | |
| "grad_norm": 0.5557413076689061, | |
| "learning_rate": 1.5322136723167957e-05, | |
| "loss": 0.0748, | |
| "mean_token_accuracy": 0.9749912396073341, | |
| "step": 1075 | |
| }, | |
| { | |
| "epoch": 6.293255131964809, | |
| "grad_norm": 0.5155497604370435, | |
| "learning_rate": 1.5290303425978036e-05, | |
| "loss": 0.0694, | |
| "mean_token_accuracy": 0.9790643975138664, | |
| "step": 1076 | |
| }, | |
| { | |
| "epoch": 6.299120234604105, | |
| "grad_norm": 0.5991722826094168, | |
| "learning_rate": 1.525849448227681e-05, | |
| "loss": 0.077, | |
| "mean_token_accuracy": 0.9764990583062172, | |
| "step": 1077 | |
| }, | |
| { | |
| "epoch": 6.3049853372434015, | |
| "grad_norm": 0.5400898196299592, | |
| "learning_rate": 1.5226710007517894e-05, | |
| "loss": 0.0846, | |
| "mean_token_accuracy": 0.971705824136734, | |
| "step": 1078 | |
| }, | |
| { | |
| "epoch": 6.310850439882698, | |
| "grad_norm": 0.46671688299599096, | |
| "learning_rate": 1.5194950117066097e-05, | |
| "loss": 0.0634, | |
| "mean_token_accuracy": 0.9766178503632545, | |
| "step": 1079 | |
| }, | |
| { | |
| "epoch": 6.316715542521994, | |
| "grad_norm": 0.5638659890737318, | |
| "learning_rate": 1.5163214926196995e-05, | |
| "loss": 0.0893, | |
| "mean_token_accuracy": 0.9723503813147545, | |
| "step": 1080 | |
| }, | |
| { | |
| "epoch": 6.32258064516129, | |
| "grad_norm": 0.6696695090363065, | |
| "learning_rate": 1.5131504550096515e-05, | |
| "loss": 0.0796, | |
| "mean_token_accuracy": 0.9719719514250755, | |
| "step": 1081 | |
| }, | |
| { | |
| "epoch": 6.328445747800586, | |
| "grad_norm": 0.5086536558966609, | |
| "learning_rate": 1.5099819103860504e-05, | |
| "loss": 0.072, | |
| "mean_token_accuracy": 0.9775032550096512, | |
| "step": 1082 | |
| }, | |
| { | |
| "epoch": 6.334310850439882, | |
| "grad_norm": 0.471688139385336, | |
| "learning_rate": 1.5068158702494348e-05, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9810898527503014, | |
| "step": 1083 | |
| }, | |
| { | |
| "epoch": 6.340175953079179, | |
| "grad_norm": 0.4175822937840527, | |
| "learning_rate": 1.5036523460912511e-05, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9814708605408669, | |
| "step": 1084 | |
| }, | |
| { | |
| "epoch": 6.346041055718475, | |
| "grad_norm": 0.4686690712794084, | |
| "learning_rate": 1.5004913493938147e-05, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.9775428622961044, | |
| "step": 1085 | |
| }, | |
| { | |
| "epoch": 6.351906158357771, | |
| "grad_norm": 0.5230553954992002, | |
| "learning_rate": 1.4973328916302667e-05, | |
| "loss": 0.0804, | |
| "mean_token_accuracy": 0.9741173461079597, | |
| "step": 1086 | |
| }, | |
| { | |
| "epoch": 6.357771260997067, | |
| "grad_norm": 0.6417142179248468, | |
| "learning_rate": 1.4941769842645335e-05, | |
| "loss": 0.0744, | |
| "mean_token_accuracy": 0.9732955172657967, | |
| "step": 1087 | |
| }, | |
| { | |
| "epoch": 6.363636363636363, | |
| "grad_norm": 0.5595604356693998, | |
| "learning_rate": 1.4910236387512837e-05, | |
| "loss": 0.0692, | |
| "mean_token_accuracy": 0.9762078300118446, | |
| "step": 1088 | |
| }, | |
| { | |
| "epoch": 6.3695014662756595, | |
| "grad_norm": 0.5383764197029854, | |
| "learning_rate": 1.487872866535888e-05, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.978789784014225, | |
| "step": 1089 | |
| }, | |
| { | |
| "epoch": 6.375366568914956, | |
| "grad_norm": 0.5947779638063944, | |
| "learning_rate": 1.4847246790543773e-05, | |
| "loss": 0.0701, | |
| "mean_token_accuracy": 0.9754747003316879, | |
| "step": 1090 | |
| }, | |
| { | |
| "epoch": 6.381231671554252, | |
| "grad_norm": 0.523837387093672, | |
| "learning_rate": 1.4815790877334007e-05, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9779769256711006, | |
| "step": 1091 | |
| }, | |
| { | |
| "epoch": 6.387096774193548, | |
| "grad_norm": 0.6428925856092318, | |
| "learning_rate": 1.4784361039901844e-05, | |
| "loss": 0.0816, | |
| "mean_token_accuracy": 0.9758491739630699, | |
| "step": 1092 | |
| }, | |
| { | |
| "epoch": 6.392961876832844, | |
| "grad_norm": 0.47087577274198766, | |
| "learning_rate": 1.47529573923249e-05, | |
| "loss": 0.0661, | |
| "mean_token_accuracy": 0.9800596609711647, | |
| "step": 1093 | |
| }, | |
| { | |
| "epoch": 6.39882697947214, | |
| "grad_norm": 0.4926712898849946, | |
| "learning_rate": 1.472158004858573e-05, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9772643595933914, | |
| "step": 1094 | |
| }, | |
| { | |
| "epoch": 6.404692082111437, | |
| "grad_norm": 0.6392683738272947, | |
| "learning_rate": 1.4690229122571419e-05, | |
| "loss": 0.0868, | |
| "mean_token_accuracy": 0.9720509201288223, | |
| "step": 1095 | |
| }, | |
| { | |
| "epoch": 6.410557184750733, | |
| "grad_norm": 0.4332296080546844, | |
| "learning_rate": 1.4658904728073169e-05, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9797382056713104, | |
| "step": 1096 | |
| }, | |
| { | |
| "epoch": 6.416422287390029, | |
| "grad_norm": 0.6285097290934989, | |
| "learning_rate": 1.4627606978785878e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9753040000796318, | |
| "step": 1097 | |
| }, | |
| { | |
| "epoch": 6.422287390029325, | |
| "grad_norm": 0.5639248567038654, | |
| "learning_rate": 1.4596335988307736e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9763555377721786, | |
| "step": 1098 | |
| }, | |
| { | |
| "epoch": 6.428152492668621, | |
| "grad_norm": 0.47561096888160737, | |
| "learning_rate": 1.4565091870139814e-05, | |
| "loss": 0.0639, | |
| "mean_token_accuracy": 0.9785003736615181, | |
| "step": 1099 | |
| }, | |
| { | |
| "epoch": 6.4340175953079175, | |
| "grad_norm": 0.6245194955076678, | |
| "learning_rate": 1.4533874737685638e-05, | |
| "loss": 0.0954, | |
| "mean_token_accuracy": 0.9722826331853867, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 6.439882697947214, | |
| "grad_norm": 0.5706226716255113, | |
| "learning_rate": 1.450268470425079e-05, | |
| "loss": 0.0756, | |
| "mean_token_accuracy": 0.9781828373670578, | |
| "step": 1101 | |
| }, | |
| { | |
| "epoch": 6.44574780058651, | |
| "grad_norm": 0.48168970485274204, | |
| "learning_rate": 1.4471521883042492e-05, | |
| "loss": 0.0716, | |
| "mean_token_accuracy": 0.9778439179062843, | |
| "step": 1102 | |
| }, | |
| { | |
| "epoch": 6.451612903225806, | |
| "grad_norm": 0.5020122286087533, | |
| "learning_rate": 1.4440386387169207e-05, | |
| "loss": 0.0756, | |
| "mean_token_accuracy": 0.9767275303602219, | |
| "step": 1103 | |
| }, | |
| { | |
| "epoch": 6.457478005865102, | |
| "grad_norm": 0.6034064679765079, | |
| "learning_rate": 1.4409278329640218e-05, | |
| "loss": 0.078, | |
| "mean_token_accuracy": 0.9737183228135109, | |
| "step": 1104 | |
| }, | |
| { | |
| "epoch": 6.463343108504398, | |
| "grad_norm": 0.5745820857166307, | |
| "learning_rate": 1.4378197823365186e-05, | |
| "loss": 0.0784, | |
| "mean_token_accuracy": 0.976705364882946, | |
| "step": 1105 | |
| }, | |
| { | |
| "epoch": 6.469208211143695, | |
| "grad_norm": 0.5625957570348558, | |
| "learning_rate": 1.4347144981153807e-05, | |
| "loss": 0.0868, | |
| "mean_token_accuracy": 0.9726732224225998, | |
| "step": 1106 | |
| }, | |
| { | |
| "epoch": 6.475073313782991, | |
| "grad_norm": 0.4930814898729793, | |
| "learning_rate": 1.4316119915715363e-05, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9789433181285858, | |
| "step": 1107 | |
| }, | |
| { | |
| "epoch": 6.480938416422287, | |
| "grad_norm": 0.7989269250156616, | |
| "learning_rate": 1.42851227396583e-05, | |
| "loss": 0.0876, | |
| "mean_token_accuracy": 0.973343163728714, | |
| "step": 1108 | |
| }, | |
| { | |
| "epoch": 6.486803519061583, | |
| "grad_norm": 0.6763402109132646, | |
| "learning_rate": 1.4254153565489861e-05, | |
| "loss": 0.0854, | |
| "mean_token_accuracy": 0.9718771129846573, | |
| "step": 1109 | |
| }, | |
| { | |
| "epoch": 6.492668621700879, | |
| "grad_norm": 0.539486021642433, | |
| "learning_rate": 1.4223212505615634e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9774969145655632, | |
| "step": 1110 | |
| }, | |
| { | |
| "epoch": 6.4985337243401755, | |
| "grad_norm": 0.5251586229610573, | |
| "learning_rate": 1.4192299672339167e-05, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9784964397549629, | |
| "step": 1111 | |
| }, | |
| { | |
| "epoch": 6.504398826979472, | |
| "grad_norm": 0.5309826113002859, | |
| "learning_rate": 1.4161415177861568e-05, | |
| "loss": 0.0743, | |
| "mean_token_accuracy": 0.9744265750050545, | |
| "step": 1112 | |
| }, | |
| { | |
| "epoch": 6.510263929618768, | |
| "grad_norm": 0.7047987370200421, | |
| "learning_rate": 1.4130559134281074e-05, | |
| "loss": 0.0723, | |
| "mean_token_accuracy": 0.9792541116476059, | |
| "step": 1113 | |
| }, | |
| { | |
| "epoch": 6.516129032258064, | |
| "grad_norm": 0.515674317504861, | |
| "learning_rate": 1.4099731653592668e-05, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9781686887145042, | |
| "step": 1114 | |
| }, | |
| { | |
| "epoch": 6.52199413489736, | |
| "grad_norm": 0.5947129051770558, | |
| "learning_rate": 1.406893284768764e-05, | |
| "loss": 0.0888, | |
| "mean_token_accuracy": 0.9724589213728905, | |
| "step": 1115 | |
| }, | |
| { | |
| "epoch": 6.527859237536656, | |
| "grad_norm": 0.500411834167771, | |
| "learning_rate": 1.4038162828353223e-05, | |
| "loss": 0.0808, | |
| "mean_token_accuracy": 0.9722086489200592, | |
| "step": 1116 | |
| }, | |
| { | |
| "epoch": 6.533724340175953, | |
| "grad_norm": 0.5342882535928477, | |
| "learning_rate": 1.4007421707272167e-05, | |
| "loss": 0.0792, | |
| "mean_token_accuracy": 0.9754097685217857, | |
| "step": 1117 | |
| }, | |
| { | |
| "epoch": 6.539589442815249, | |
| "grad_norm": 0.5937463096071416, | |
| "learning_rate": 1.3976709596022313e-05, | |
| "loss": 0.0787, | |
| "mean_token_accuracy": 0.9737225919961929, | |
| "step": 1118 | |
| }, | |
| { | |
| "epoch": 6.545454545454545, | |
| "grad_norm": 0.5066873416271005, | |
| "learning_rate": 1.3946026606076232e-05, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9775821045041084, | |
| "step": 1119 | |
| }, | |
| { | |
| "epoch": 6.551319648093841, | |
| "grad_norm": 0.45523037490484686, | |
| "learning_rate": 1.3915372848800784e-05, | |
| "loss": 0.069, | |
| "mean_token_accuracy": 0.9771781116724014, | |
| "step": 1120 | |
| }, | |
| { | |
| "epoch": 6.557184750733137, | |
| "grad_norm": 0.4600174852995777, | |
| "learning_rate": 1.388474843545672e-05, | |
| "loss": 0.0642, | |
| "mean_token_accuracy": 0.9782451391220093, | |
| "step": 1121 | |
| }, | |
| { | |
| "epoch": 6.563049853372434, | |
| "grad_norm": 0.6097336256321833, | |
| "learning_rate": 1.3854153477198305e-05, | |
| "loss": 0.0989, | |
| "mean_token_accuracy": 0.9664244577288628, | |
| "step": 1122 | |
| }, | |
| { | |
| "epoch": 6.568914956011731, | |
| "grad_norm": 0.552123315680262, | |
| "learning_rate": 1.3823588085072865e-05, | |
| "loss": 0.0681, | |
| "mean_token_accuracy": 0.9755944460630417, | |
| "step": 1123 | |
| }, | |
| { | |
| "epoch": 6.574780058651027, | |
| "grad_norm": 0.5457720092668241, | |
| "learning_rate": 1.3793052370020441e-05, | |
| "loss": 0.0843, | |
| "mean_token_accuracy": 0.9757449254393578, | |
| "step": 1124 | |
| }, | |
| { | |
| "epoch": 6.580645161290323, | |
| "grad_norm": 0.43562839981910295, | |
| "learning_rate": 1.3762546442873343e-05, | |
| "loss": 0.0717, | |
| "mean_token_accuracy": 0.9786180108785629, | |
| "step": 1125 | |
| }, | |
| { | |
| "epoch": 6.586510263929619, | |
| "grad_norm": 0.6472330415954715, | |
| "learning_rate": 1.3732070414355766e-05, | |
| "loss": 0.0796, | |
| "mean_token_accuracy": 0.9755496978759766, | |
| "step": 1126 | |
| }, | |
| { | |
| "epoch": 6.592375366568915, | |
| "grad_norm": 0.4756335587174598, | |
| "learning_rate": 1.370162439508339e-05, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9780619740486145, | |
| "step": 1127 | |
| }, | |
| { | |
| "epoch": 6.5982404692082115, | |
| "grad_norm": 0.5634011215648543, | |
| "learning_rate": 1.367120849556296e-05, | |
| "loss": 0.0752, | |
| "mean_token_accuracy": 0.9766998812556267, | |
| "step": 1128 | |
| }, | |
| { | |
| "epoch": 6.604105571847508, | |
| "grad_norm": 0.33348339382452186, | |
| "learning_rate": 1.3640822826191907e-05, | |
| "loss": 0.0566, | |
| "mean_token_accuracy": 0.9821174144744873, | |
| "step": 1129 | |
| }, | |
| { | |
| "epoch": 6.609970674486804, | |
| "grad_norm": 0.5058118088375012, | |
| "learning_rate": 1.361046749725794e-05, | |
| "loss": 0.0729, | |
| "mean_token_accuracy": 0.973750501871109, | |
| "step": 1130 | |
| }, | |
| { | |
| "epoch": 6.6158357771261, | |
| "grad_norm": 0.3513140028655011, | |
| "learning_rate": 1.3580142618938647e-05, | |
| "loss": 0.0557, | |
| "mean_token_accuracy": 0.9823877438902855, | |
| "step": 1131 | |
| }, | |
| { | |
| "epoch": 6.621700879765396, | |
| "grad_norm": 0.6010898695910089, | |
| "learning_rate": 1.354984830130109e-05, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.9730056077241898, | |
| "step": 1132 | |
| }, | |
| { | |
| "epoch": 6.627565982404692, | |
| "grad_norm": 0.42939858042799467, | |
| "learning_rate": 1.3519584654301401e-05, | |
| "loss": 0.0718, | |
| "mean_token_accuracy": 0.9771179631352425, | |
| "step": 1133 | |
| }, | |
| { | |
| "epoch": 6.633431085043989, | |
| "grad_norm": 0.5852411490645988, | |
| "learning_rate": 1.3489351787784398e-05, | |
| "loss": 0.0748, | |
| "mean_token_accuracy": 0.9754769504070282, | |
| "step": 1134 | |
| }, | |
| { | |
| "epoch": 6.639296187683285, | |
| "grad_norm": 0.5712867655130404, | |
| "learning_rate": 1.3459149811483178e-05, | |
| "loss": 0.0803, | |
| "mean_token_accuracy": 0.974666953086853, | |
| "step": 1135 | |
| }, | |
| { | |
| "epoch": 6.645161290322581, | |
| "grad_norm": 0.5264774383671567, | |
| "learning_rate": 1.342897883501872e-05, | |
| "loss": 0.0792, | |
| "mean_token_accuracy": 0.9758988320827484, | |
| "step": 1136 | |
| }, | |
| { | |
| "epoch": 6.651026392961877, | |
| "grad_norm": 0.5392883097179687, | |
| "learning_rate": 1.3398838967899477e-05, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9775514528155327, | |
| "step": 1137 | |
| }, | |
| { | |
| "epoch": 6.656891495601173, | |
| "grad_norm": 0.5844035196310134, | |
| "learning_rate": 1.3368730319520992e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.978425644338131, | |
| "step": 1138 | |
| }, | |
| { | |
| "epoch": 6.6627565982404695, | |
| "grad_norm": 0.6218435188737981, | |
| "learning_rate": 1.3338652999165511e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9742521271109581, | |
| "step": 1139 | |
| }, | |
| { | |
| "epoch": 6.668621700879766, | |
| "grad_norm": 0.5031460449388016, | |
| "learning_rate": 1.3308607116001549e-05, | |
| "loss": 0.068, | |
| "mean_token_accuracy": 0.9794870689511299, | |
| "step": 1140 | |
| }, | |
| { | |
| "epoch": 6.674486803519062, | |
| "grad_norm": 0.5079929252121882, | |
| "learning_rate": 1.3278592779083534e-05, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.983228825032711, | |
| "step": 1141 | |
| }, | |
| { | |
| "epoch": 6.680351906158358, | |
| "grad_norm": 0.5029542646309768, | |
| "learning_rate": 1.324861009735138e-05, | |
| "loss": 0.0713, | |
| "mean_token_accuracy": 0.978003516793251, | |
| "step": 1142 | |
| }, | |
| { | |
| "epoch": 6.686217008797654, | |
| "grad_norm": 0.4683913564742217, | |
| "learning_rate": 1.3218659179630112e-05, | |
| "loss": 0.0767, | |
| "mean_token_accuracy": 0.9770939275622368, | |
| "step": 1143 | |
| }, | |
| { | |
| "epoch": 6.69208211143695, | |
| "grad_norm": 0.4988823991555784, | |
| "learning_rate": 1.3188740134629469e-05, | |
| "loss": 0.0708, | |
| "mean_token_accuracy": 0.9781245067715645, | |
| "step": 1144 | |
| }, | |
| { | |
| "epoch": 6.697947214076247, | |
| "grad_norm": 0.45606696729566554, | |
| "learning_rate": 1.3158853070943499e-05, | |
| "loss": 0.0582, | |
| "mean_token_accuracy": 0.9791443273425102, | |
| "step": 1145 | |
| }, | |
| { | |
| "epoch": 6.703812316715543, | |
| "grad_norm": 0.5221547094850065, | |
| "learning_rate": 1.3128998097050174e-05, | |
| "loss": 0.0706, | |
| "mean_token_accuracy": 0.9777532517910004, | |
| "step": 1146 | |
| }, | |
| { | |
| "epoch": 6.709677419354839, | |
| "grad_norm": 0.5490116094072529, | |
| "learning_rate": 1.3099175321310993e-05, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9768932908773422, | |
| "step": 1147 | |
| }, | |
| { | |
| "epoch": 6.715542521994135, | |
| "grad_norm": 0.4176562136333769, | |
| "learning_rate": 1.3069384851970584e-05, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9779625982046127, | |
| "step": 1148 | |
| }, | |
| { | |
| "epoch": 6.721407624633431, | |
| "grad_norm": 0.4980689337176214, | |
| "learning_rate": 1.3039626797156321e-05, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.97952900826931, | |
| "step": 1149 | |
| }, | |
| { | |
| "epoch": 6.7272727272727275, | |
| "grad_norm": 0.36731730926429257, | |
| "learning_rate": 1.3009901264877924e-05, | |
| "loss": 0.0656, | |
| "mean_token_accuracy": 0.9820658639073372, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 6.733137829912024, | |
| "grad_norm": 0.3967447841170825, | |
| "learning_rate": 1.298020836302707e-05, | |
| "loss": 0.0706, | |
| "mean_token_accuracy": 0.9772286713123322, | |
| "step": 1151 | |
| }, | |
| { | |
| "epoch": 6.73900293255132, | |
| "grad_norm": 0.42726350650755013, | |
| "learning_rate": 1.2950548199376999e-05, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9814448952674866, | |
| "step": 1152 | |
| }, | |
| { | |
| "epoch": 6.744868035190616, | |
| "grad_norm": 0.546805668816822, | |
| "learning_rate": 1.292092088158213e-05, | |
| "loss": 0.08, | |
| "mean_token_accuracy": 0.9743905514478683, | |
| "step": 1153 | |
| }, | |
| { | |
| "epoch": 6.750733137829912, | |
| "grad_norm": 0.4879375760409544, | |
| "learning_rate": 1.2891326517177663e-05, | |
| "loss": 0.0587, | |
| "mean_token_accuracy": 0.9839038699865341, | |
| "step": 1154 | |
| }, | |
| { | |
| "epoch": 6.756598240469208, | |
| "grad_norm": 0.5831663350172862, | |
| "learning_rate": 1.2861765213579177e-05, | |
| "loss": 0.0824, | |
| "mean_token_accuracy": 0.9725939705967903, | |
| "step": 1155 | |
| }, | |
| { | |
| "epoch": 6.762463343108505, | |
| "grad_norm": 0.6866773053308828, | |
| "learning_rate": 1.2832237078082272e-05, | |
| "loss": 0.0748, | |
| "mean_token_accuracy": 0.9758478626608849, | |
| "step": 1156 | |
| }, | |
| { | |
| "epoch": 6.768328445747801, | |
| "grad_norm": 0.5665523571164475, | |
| "learning_rate": 1.2802742217862156e-05, | |
| "loss": 0.0782, | |
| "mean_token_accuracy": 0.974516935646534, | |
| "step": 1157 | |
| }, | |
| { | |
| "epoch": 6.774193548387097, | |
| "grad_norm": 0.6844894848581596, | |
| "learning_rate": 1.2773280739973255e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9749132618308067, | |
| "step": 1158 | |
| }, | |
| { | |
| "epoch": 6.780058651026393, | |
| "grad_norm": 0.4389537006132902, | |
| "learning_rate": 1.2743852751348833e-05, | |
| "loss": 0.0615, | |
| "mean_token_accuracy": 0.9811219125986099, | |
| "step": 1159 | |
| }, | |
| { | |
| "epoch": 6.785923753665689, | |
| "grad_norm": 0.5507569863518943, | |
| "learning_rate": 1.2714458358800612e-05, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9827919527888298, | |
| "step": 1160 | |
| }, | |
| { | |
| "epoch": 6.7917888563049855, | |
| "grad_norm": 0.7106774876218656, | |
| "learning_rate": 1.2685097669018362e-05, | |
| "loss": 0.09, | |
| "mean_token_accuracy": 0.9727823808789253, | |
| "step": 1161 | |
| }, | |
| { | |
| "epoch": 6.797653958944282, | |
| "grad_norm": 0.6314362358470937, | |
| "learning_rate": 1.265577078856953e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9712254106998444, | |
| "step": 1162 | |
| }, | |
| { | |
| "epoch": 6.803519061583578, | |
| "grad_norm": 0.5509327004441354, | |
| "learning_rate": 1.2626477823898843e-05, | |
| "loss": 0.0785, | |
| "mean_token_accuracy": 0.9762394204735756, | |
| "step": 1163 | |
| }, | |
| { | |
| "epoch": 6.809384164222874, | |
| "grad_norm": 0.543808641203145, | |
| "learning_rate": 1.2597218881327944e-05, | |
| "loss": 0.0763, | |
| "mean_token_accuracy": 0.9751420989632607, | |
| "step": 1164 | |
| }, | |
| { | |
| "epoch": 6.81524926686217, | |
| "grad_norm": 0.4711371688926112, | |
| "learning_rate": 1.2567994067054961e-05, | |
| "loss": 0.0721, | |
| "mean_token_accuracy": 0.9763290211558342, | |
| "step": 1165 | |
| }, | |
| { | |
| "epoch": 6.821114369501466, | |
| "grad_norm": 0.4137756121566806, | |
| "learning_rate": 1.2538803487154177e-05, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9790268614888191, | |
| "step": 1166 | |
| }, | |
| { | |
| "epoch": 6.826979472140763, | |
| "grad_norm": 0.4621022306065054, | |
| "learning_rate": 1.25096472475756e-05, | |
| "loss": 0.0754, | |
| "mean_token_accuracy": 0.9756415113806725, | |
| "step": 1167 | |
| }, | |
| { | |
| "epoch": 6.832844574780059, | |
| "grad_norm": 0.609760780869737, | |
| "learning_rate": 1.248052545414461e-05, | |
| "loss": 0.0828, | |
| "mean_token_accuracy": 0.9744556695222855, | |
| "step": 1168 | |
| }, | |
| { | |
| "epoch": 6.838709677419355, | |
| "grad_norm": 0.594238030928876, | |
| "learning_rate": 1.2451438212561556e-05, | |
| "loss": 0.0835, | |
| "mean_token_accuracy": 0.9688782170414925, | |
| "step": 1169 | |
| }, | |
| { | |
| "epoch": 6.844574780058651, | |
| "grad_norm": 0.5363107528216697, | |
| "learning_rate": 1.2422385628401377e-05, | |
| "loss": 0.0803, | |
| "mean_token_accuracy": 0.975056879222393, | |
| "step": 1170 | |
| }, | |
| { | |
| "epoch": 6.850439882697947, | |
| "grad_norm": 0.5043511960644672, | |
| "learning_rate": 1.2393367807113217e-05, | |
| "loss": 0.0727, | |
| "mean_token_accuracy": 0.9775923490524292, | |
| "step": 1171 | |
| }, | |
| { | |
| "epoch": 6.8563049853372435, | |
| "grad_norm": 0.39887527153514973, | |
| "learning_rate": 1.236438485402005e-05, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9786998331546783, | |
| "step": 1172 | |
| }, | |
| { | |
| "epoch": 6.86217008797654, | |
| "grad_norm": 0.3888848351878772, | |
| "learning_rate": 1.2335436874318293e-05, | |
| "loss": 0.0658, | |
| "mean_token_accuracy": 0.9791335985064507, | |
| "step": 1173 | |
| }, | |
| { | |
| "epoch": 6.868035190615836, | |
| "grad_norm": 0.6715132177529508, | |
| "learning_rate": 1.2306523973077416e-05, | |
| "loss": 0.0887, | |
| "mean_token_accuracy": 0.9724654704332352, | |
| "step": 1174 | |
| }, | |
| { | |
| "epoch": 6.873900293255132, | |
| "grad_norm": 0.5119648654749844, | |
| "learning_rate": 1.2277646255239572e-05, | |
| "loss": 0.083, | |
| "mean_token_accuracy": 0.9759775921702385, | |
| "step": 1175 | |
| }, | |
| { | |
| "epoch": 6.879765395894428, | |
| "grad_norm": 0.5514064582412531, | |
| "learning_rate": 1.2248803825619224e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.975832425057888, | |
| "step": 1176 | |
| }, | |
| { | |
| "epoch": 6.885630498533724, | |
| "grad_norm": 0.5353698021835701, | |
| "learning_rate": 1.2219996788902734e-05, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9779311493039131, | |
| "step": 1177 | |
| }, | |
| { | |
| "epoch": 6.891495601173021, | |
| "grad_norm": 0.5915181123188481, | |
| "learning_rate": 1.2191225249648016e-05, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9749589264392853, | |
| "step": 1178 | |
| }, | |
| { | |
| "epoch": 6.897360703812317, | |
| "grad_norm": 0.5164903900438643, | |
| "learning_rate": 1.216248931228413e-05, | |
| "loss": 0.0827, | |
| "mean_token_accuracy": 0.9742263928055763, | |
| "step": 1179 | |
| }, | |
| { | |
| "epoch": 6.903225806451613, | |
| "grad_norm": 0.44759904248078125, | |
| "learning_rate": 1.2133789081110927e-05, | |
| "loss": 0.0671, | |
| "mean_token_accuracy": 0.9794506952166557, | |
| "step": 1180 | |
| }, | |
| { | |
| "epoch": 6.909090909090909, | |
| "grad_norm": 0.569379246167165, | |
| "learning_rate": 1.2105124660298655e-05, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9769483134150505, | |
| "step": 1181 | |
| }, | |
| { | |
| "epoch": 6.914956011730205, | |
| "grad_norm": 0.4885422172148321, | |
| "learning_rate": 1.2076496153887587e-05, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9801295772194862, | |
| "step": 1182 | |
| }, | |
| { | |
| "epoch": 6.9208211143695015, | |
| "grad_norm": 0.4453475631938246, | |
| "learning_rate": 1.2047903665787633e-05, | |
| "loss": 0.0685, | |
| "mean_token_accuracy": 0.9790622964501381, | |
| "step": 1183 | |
| }, | |
| { | |
| "epoch": 6.926686217008798, | |
| "grad_norm": 0.48406132453413675, | |
| "learning_rate": 1.2019347299777981e-05, | |
| "loss": 0.0637, | |
| "mean_token_accuracy": 0.9804274588823318, | |
| "step": 1184 | |
| }, | |
| { | |
| "epoch": 6.932551319648094, | |
| "grad_norm": 0.5372921912873875, | |
| "learning_rate": 1.199082715950671e-05, | |
| "loss": 0.0832, | |
| "mean_token_accuracy": 0.9742497876286507, | |
| "step": 1185 | |
| }, | |
| { | |
| "epoch": 6.93841642228739, | |
| "grad_norm": 0.474153710918661, | |
| "learning_rate": 1.1962343348490407e-05, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.979320339858532, | |
| "step": 1186 | |
| }, | |
| { | |
| "epoch": 6.944281524926686, | |
| "grad_norm": 0.4893568094546928, | |
| "learning_rate": 1.1933895970113798e-05, | |
| "loss": 0.0775, | |
| "mean_token_accuracy": 0.9794166311621666, | |
| "step": 1187 | |
| }, | |
| { | |
| "epoch": 6.9501466275659824, | |
| "grad_norm": 0.49286831595383807, | |
| "learning_rate": 1.1905485127629387e-05, | |
| "loss": 0.0796, | |
| "mean_token_accuracy": 0.9763789474964142, | |
| "step": 1188 | |
| }, | |
| { | |
| "epoch": 6.956011730205279, | |
| "grad_norm": 0.5564449012573591, | |
| "learning_rate": 1.1877110924157046e-05, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9768785759806633, | |
| "step": 1189 | |
| }, | |
| { | |
| "epoch": 6.961876832844575, | |
| "grad_norm": 0.4806314265595714, | |
| "learning_rate": 1.1848773462683684e-05, | |
| "loss": 0.0783, | |
| "mean_token_accuracy": 0.9757219702005386, | |
| "step": 1190 | |
| }, | |
| { | |
| "epoch": 6.967741935483871, | |
| "grad_norm": 0.44614104084922607, | |
| "learning_rate": 1.1820472846062842e-05, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9781024381518364, | |
| "step": 1191 | |
| }, | |
| { | |
| "epoch": 6.973607038123167, | |
| "grad_norm": 0.4899717975189836, | |
| "learning_rate": 1.1792209177014317e-05, | |
| "loss": 0.071, | |
| "mean_token_accuracy": 0.9795149564743042, | |
| "step": 1192 | |
| }, | |
| { | |
| "epoch": 6.979472140762463, | |
| "grad_norm": 0.558509379737497, | |
| "learning_rate": 1.1763982558123823e-05, | |
| "loss": 0.0769, | |
| "mean_token_accuracy": 0.977259911596775, | |
| "step": 1193 | |
| }, | |
| { | |
| "epoch": 6.9853372434017595, | |
| "grad_norm": 0.5117843575515925, | |
| "learning_rate": 1.1735793091842583e-05, | |
| "loss": 0.0749, | |
| "mean_token_accuracy": 0.9768305420875549, | |
| "step": 1194 | |
| }, | |
| { | |
| "epoch": 6.991202346041056, | |
| "grad_norm": 0.7538879848011749, | |
| "learning_rate": 1.1707640880486975e-05, | |
| "loss": 0.0957, | |
| "mean_token_accuracy": 0.9704435393214226, | |
| "step": 1195 | |
| }, | |
| { | |
| "epoch": 6.997067448680352, | |
| "grad_norm": 0.3002576605567238, | |
| "learning_rate": 1.1679526026238155e-05, | |
| "loss": 0.0554, | |
| "mean_token_accuracy": 0.9824624955654144, | |
| "step": 1196 | |
| }, | |
| { | |
| "epoch": 7.0, | |
| "grad_norm": 0.3002576605567238, | |
| "learning_rate": 1.165144863114169e-05, | |
| "loss": 0.0623, | |
| "mean_token_accuracy": 0.9811327904462814, | |
| "step": 1197 | |
| }, | |
| { | |
| "epoch": 7.005865102639296, | |
| "grad_norm": 0.6433063715747249, | |
| "learning_rate": 1.1623408797107185e-05, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.9747751802206039, | |
| "step": 1198 | |
| }, | |
| { | |
| "epoch": 7.011730205278592, | |
| "grad_norm": 0.4650556964344494, | |
| "learning_rate": 1.1595406625907914e-05, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9819451943039894, | |
| "step": 1199 | |
| }, | |
| { | |
| "epoch": 7.0175953079178885, | |
| "grad_norm": 0.4519271364761177, | |
| "learning_rate": 1.1567442219180446e-05, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9809525832533836, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 7.023460410557185, | |
| "grad_norm": 0.4074954888509191, | |
| "learning_rate": 1.153951567842429e-05, | |
| "loss": 0.0529, | |
| "mean_token_accuracy": 0.9832786470651627, | |
| "step": 1201 | |
| }, | |
| { | |
| "epoch": 7.029325513196481, | |
| "grad_norm": 0.3356064558125727, | |
| "learning_rate": 1.1511627105001501e-05, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9793870970606804, | |
| "step": 1202 | |
| }, | |
| { | |
| "epoch": 7.035190615835777, | |
| "grad_norm": 0.543533892964494, | |
| "learning_rate": 1.1483776600136344e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9736167937517166, | |
| "step": 1203 | |
| }, | |
| { | |
| "epoch": 7.041055718475073, | |
| "grad_norm": 0.508836792144545, | |
| "learning_rate": 1.1455964264914906e-05, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9800908789038658, | |
| "step": 1204 | |
| }, | |
| { | |
| "epoch": 7.0469208211143695, | |
| "grad_norm": 0.6127313068418908, | |
| "learning_rate": 1.142819020028472e-05, | |
| "loss": 0.0813, | |
| "mean_token_accuracy": 0.9731275960803032, | |
| "step": 1205 | |
| }, | |
| { | |
| "epoch": 7.052785923753666, | |
| "grad_norm": 0.5480092721545003, | |
| "learning_rate": 1.140045450705443e-05, | |
| "loss": 0.0622, | |
| "mean_token_accuracy": 0.977430984377861, | |
| "step": 1206 | |
| }, | |
| { | |
| "epoch": 7.058651026392962, | |
| "grad_norm": 0.5197554266463569, | |
| "learning_rate": 1.13727572858934e-05, | |
| "loss": 0.0601, | |
| "mean_token_accuracy": 0.9804290607571602, | |
| "step": 1207 | |
| }, | |
| { | |
| "epoch": 7.064516129032258, | |
| "grad_norm": 0.41897297895717694, | |
| "learning_rate": 1.1345098637331356e-05, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9806567952036858, | |
| "step": 1208 | |
| }, | |
| { | |
| "epoch": 7.070381231671554, | |
| "grad_norm": 0.5729005539563279, | |
| "learning_rate": 1.1317478661758022e-05, | |
| "loss": 0.0805, | |
| "mean_token_accuracy": 0.973584771156311, | |
| "step": 1209 | |
| }, | |
| { | |
| "epoch": 7.07624633431085, | |
| "grad_norm": 0.45278187335218867, | |
| "learning_rate": 1.1289897459422756e-05, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9816871359944344, | |
| "step": 1210 | |
| }, | |
| { | |
| "epoch": 7.0821114369501466, | |
| "grad_norm": 0.5535750142635363, | |
| "learning_rate": 1.126235513043418e-05, | |
| "loss": 0.0766, | |
| "mean_token_accuracy": 0.9764493852853775, | |
| "step": 1211 | |
| }, | |
| { | |
| "epoch": 7.087976539589443, | |
| "grad_norm": 0.5290761436847783, | |
| "learning_rate": 1.1234851774759828e-05, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.9827250093221664, | |
| "step": 1212 | |
| }, | |
| { | |
| "epoch": 7.093841642228739, | |
| "grad_norm": 0.4338439220392749, | |
| "learning_rate": 1.1207387492225772e-05, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9795321822166443, | |
| "step": 1213 | |
| }, | |
| { | |
| "epoch": 7.099706744868035, | |
| "grad_norm": 0.4229830435807885, | |
| "learning_rate": 1.1179962382516268e-05, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9799394384026527, | |
| "step": 1214 | |
| }, | |
| { | |
| "epoch": 7.105571847507331, | |
| "grad_norm": 0.5596786809514015, | |
| "learning_rate": 1.1152576545173388e-05, | |
| "loss": 0.069, | |
| "mean_token_accuracy": 0.9794164597988129, | |
| "step": 1215 | |
| }, | |
| { | |
| "epoch": 7.1114369501466275, | |
| "grad_norm": 0.4190871184858409, | |
| "learning_rate": 1.1125230079596654e-05, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9811462908983231, | |
| "step": 1216 | |
| }, | |
| { | |
| "epoch": 7.117302052785924, | |
| "grad_norm": 0.4594859424747388, | |
| "learning_rate": 1.10979230850427e-05, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9795415550470352, | |
| "step": 1217 | |
| }, | |
| { | |
| "epoch": 7.12316715542522, | |
| "grad_norm": 0.47431112233441436, | |
| "learning_rate": 1.1070655660624876e-05, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9774219766259193, | |
| "step": 1218 | |
| }, | |
| { | |
| "epoch": 7.129032258064516, | |
| "grad_norm": 0.5154540944862103, | |
| "learning_rate": 1.1043427905312933e-05, | |
| "loss": 0.0756, | |
| "mean_token_accuracy": 0.9775685295462608, | |
| "step": 1219 | |
| }, | |
| { | |
| "epoch": 7.134897360703812, | |
| "grad_norm": 0.39434647534644846, | |
| "learning_rate": 1.1016239917932618e-05, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9800396636128426, | |
| "step": 1220 | |
| }, | |
| { | |
| "epoch": 7.140762463343108, | |
| "grad_norm": 0.5032190860559029, | |
| "learning_rate": 1.098909179716535e-05, | |
| "loss": 0.0716, | |
| "mean_token_accuracy": 0.9765996113419533, | |
| "step": 1221 | |
| }, | |
| { | |
| "epoch": 7.146627565982405, | |
| "grad_norm": 0.475595902088216, | |
| "learning_rate": 1.096198364154784e-05, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9804063886404037, | |
| "step": 1222 | |
| }, | |
| { | |
| "epoch": 7.152492668621701, | |
| "grad_norm": 0.41368564350345344, | |
| "learning_rate": 1.0934915549471747e-05, | |
| "loss": 0.0629, | |
| "mean_token_accuracy": 0.9803172200918198, | |
| "step": 1223 | |
| }, | |
| { | |
| "epoch": 7.158357771260997, | |
| "grad_norm": 0.5177453029351333, | |
| "learning_rate": 1.0907887619183308e-05, | |
| "loss": 0.0624, | |
| "mean_token_accuracy": 0.9804964661598206, | |
| "step": 1224 | |
| }, | |
| { | |
| "epoch": 7.164222873900293, | |
| "grad_norm": 0.5051177169181578, | |
| "learning_rate": 1.0880899948783002e-05, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9775585755705833, | |
| "step": 1225 | |
| }, | |
| { | |
| "epoch": 7.170087976539589, | |
| "grad_norm": 0.5960031715664886, | |
| "learning_rate": 1.0853952636225165e-05, | |
| "loss": 0.0618, | |
| "mean_token_accuracy": 0.9809572696685791, | |
| "step": 1226 | |
| }, | |
| { | |
| "epoch": 7.1759530791788855, | |
| "grad_norm": 0.5329760895034595, | |
| "learning_rate": 1.0827045779317662e-05, | |
| "loss": 0.0642, | |
| "mean_token_accuracy": 0.9801081269979477, | |
| "step": 1227 | |
| }, | |
| { | |
| "epoch": 7.181818181818182, | |
| "grad_norm": 0.45414595161061955, | |
| "learning_rate": 1.080017947572152e-05, | |
| "loss": 0.0563, | |
| "mean_token_accuracy": 0.9822395518422127, | |
| "step": 1228 | |
| }, | |
| { | |
| "epoch": 7.187683284457478, | |
| "grad_norm": 0.53901041860263, | |
| "learning_rate": 1.0773353822950563e-05, | |
| "loss": 0.0714, | |
| "mean_token_accuracy": 0.9785963222384453, | |
| "step": 1229 | |
| }, | |
| { | |
| "epoch": 7.193548387096774, | |
| "grad_norm": 0.4745632320224554, | |
| "learning_rate": 1.074656891837108e-05, | |
| "loss": 0.0555, | |
| "mean_token_accuracy": 0.9822941496968269, | |
| "step": 1230 | |
| }, | |
| { | |
| "epoch": 7.19941348973607, | |
| "grad_norm": 0.3665296243010212, | |
| "learning_rate": 1.0719824859201457e-05, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.980594739317894, | |
| "step": 1231 | |
| }, | |
| { | |
| "epoch": 7.205278592375366, | |
| "grad_norm": 0.6047864498585788, | |
| "learning_rate": 1.0693121742511828e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.9721439629793167, | |
| "step": 1232 | |
| }, | |
| { | |
| "epoch": 7.211143695014663, | |
| "grad_norm": 0.5041682071363984, | |
| "learning_rate": 1.0666459665223718e-05, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.9779375120997429, | |
| "step": 1233 | |
| }, | |
| { | |
| "epoch": 7.217008797653959, | |
| "grad_norm": 0.5019058566934924, | |
| "learning_rate": 1.0639838724109708e-05, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9792098253965378, | |
| "step": 1234 | |
| }, | |
| { | |
| "epoch": 7.222873900293255, | |
| "grad_norm": 0.43810664457682996, | |
| "learning_rate": 1.0613259015793056e-05, | |
| "loss": 0.0528, | |
| "mean_token_accuracy": 0.9823940470814705, | |
| "step": 1235 | |
| }, | |
| { | |
| "epoch": 7.228739002932551, | |
| "grad_norm": 0.4417784699824646, | |
| "learning_rate": 1.0586720636747368e-05, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.9772520065307617, | |
| "step": 1236 | |
| }, | |
| { | |
| "epoch": 7.234604105571847, | |
| "grad_norm": 0.4743340130011368, | |
| "learning_rate": 1.0560223683296244e-05, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.9788348078727722, | |
| "step": 1237 | |
| }, | |
| { | |
| "epoch": 7.2404692082111435, | |
| "grad_norm": 0.4073966976376095, | |
| "learning_rate": 1.0533768251612924e-05, | |
| "loss": 0.0672, | |
| "mean_token_accuracy": 0.9761592745780945, | |
| "step": 1238 | |
| }, | |
| { | |
| "epoch": 7.24633431085044, | |
| "grad_norm": 0.38289151835341184, | |
| "learning_rate": 1.0507354437719938e-05, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9810685515403748, | |
| "step": 1239 | |
| }, | |
| { | |
| "epoch": 7.252199413489736, | |
| "grad_norm": 0.5293503986290922, | |
| "learning_rate": 1.0480982337488768e-05, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9800685048103333, | |
| "step": 1240 | |
| }, | |
| { | |
| "epoch": 7.258064516129032, | |
| "grad_norm": 0.4680528752727949, | |
| "learning_rate": 1.0454652046639486e-05, | |
| "loss": 0.07, | |
| "mean_token_accuracy": 0.9787988215684891, | |
| "step": 1241 | |
| }, | |
| { | |
| "epoch": 7.263929618768328, | |
| "grad_norm": 0.5371939764519806, | |
| "learning_rate": 1.0428363660740407e-05, | |
| "loss": 0.0671, | |
| "mean_token_accuracy": 0.9764540642499924, | |
| "step": 1242 | |
| }, | |
| { | |
| "epoch": 7.269794721407624, | |
| "grad_norm": 0.4604907427704817, | |
| "learning_rate": 1.0402117275207757e-05, | |
| "loss": 0.0727, | |
| "mean_token_accuracy": 0.9751237854361534, | |
| "step": 1243 | |
| }, | |
| { | |
| "epoch": 7.275659824046921, | |
| "grad_norm": 0.543393907597377, | |
| "learning_rate": 1.0375912985305319e-05, | |
| "loss": 0.0683, | |
| "mean_token_accuracy": 0.9787429869174957, | |
| "step": 1244 | |
| }, | |
| { | |
| "epoch": 7.281524926686217, | |
| "grad_norm": 0.44383993642036296, | |
| "learning_rate": 1.0349750886144077e-05, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9764397814869881, | |
| "step": 1245 | |
| }, | |
| { | |
| "epoch": 7.287390029325513, | |
| "grad_norm": 0.41378217779118326, | |
| "learning_rate": 1.0323631072681888e-05, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9803901985287666, | |
| "step": 1246 | |
| }, | |
| { | |
| "epoch": 7.293255131964809, | |
| "grad_norm": 0.355989396188892, | |
| "learning_rate": 1.0297553639723123e-05, | |
| "loss": 0.0585, | |
| "mean_token_accuracy": 0.981739304959774, | |
| "step": 1247 | |
| }, | |
| { | |
| "epoch": 7.299120234604105, | |
| "grad_norm": 0.43357089167776125, | |
| "learning_rate": 1.027151868191834e-05, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9760547429323196, | |
| "step": 1248 | |
| }, | |
| { | |
| "epoch": 7.3049853372434015, | |
| "grad_norm": 0.48332066026786996, | |
| "learning_rate": 1.0245526293763908e-05, | |
| "loss": 0.0786, | |
| "mean_token_accuracy": 0.9736545458436012, | |
| "step": 1249 | |
| }, | |
| { | |
| "epoch": 7.310850439882698, | |
| "grad_norm": 0.47753461671593495, | |
| "learning_rate": 1.0219576569601707e-05, | |
| "loss": 0.0794, | |
| "mean_token_accuracy": 0.9766260907053947, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 7.316715542521994, | |
| "grad_norm": 0.45521081483229914, | |
| "learning_rate": 1.0193669603618757e-05, | |
| "loss": 0.0691, | |
| "mean_token_accuracy": 0.9787106961011887, | |
| "step": 1251 | |
| }, | |
| { | |
| "epoch": 7.32258064516129, | |
| "grad_norm": 0.48836814239477233, | |
| "learning_rate": 1.0167805489846873e-05, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9792743027210236, | |
| "step": 1252 | |
| }, | |
| { | |
| "epoch": 7.328445747800586, | |
| "grad_norm": 0.47006783479866115, | |
| "learning_rate": 1.0141984322162353e-05, | |
| "loss": 0.0611, | |
| "mean_token_accuracy": 0.9803495109081268, | |
| "step": 1253 | |
| }, | |
| { | |
| "epoch": 7.334310850439882, | |
| "grad_norm": 0.4156017026046577, | |
| "learning_rate": 1.0116206194285598e-05, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.9772606119513512, | |
| "step": 1254 | |
| }, | |
| { | |
| "epoch": 7.340175953079179, | |
| "grad_norm": 0.5125748013140727, | |
| "learning_rate": 1.0090471199780812e-05, | |
| "loss": 0.0729, | |
| "mean_token_accuracy": 0.9776462465524673, | |
| "step": 1255 | |
| }, | |
| { | |
| "epoch": 7.346041055718475, | |
| "grad_norm": 0.4569654216489075, | |
| "learning_rate": 1.0064779432055616e-05, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.976767286658287, | |
| "step": 1256 | |
| }, | |
| { | |
| "epoch": 7.351906158357771, | |
| "grad_norm": 0.45479785552447444, | |
| "learning_rate": 1.0039130984360761e-05, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.979149229824543, | |
| "step": 1257 | |
| }, | |
| { | |
| "epoch": 7.357771260997067, | |
| "grad_norm": 0.40924296015239126, | |
| "learning_rate": 1.0013525949789745e-05, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9773788601160049, | |
| "step": 1258 | |
| }, | |
| { | |
| "epoch": 7.363636363636363, | |
| "grad_norm": 0.41003497991772014, | |
| "learning_rate": 9.987964421278512e-06, | |
| "loss": 0.0618, | |
| "mean_token_accuracy": 0.9816633462905884, | |
| "step": 1259 | |
| }, | |
| { | |
| "epoch": 7.3695014662756595, | |
| "grad_norm": 0.4754148725808644, | |
| "learning_rate": 9.962446491605084e-06, | |
| "loss": 0.0713, | |
| "mean_token_accuracy": 0.9745640978217125, | |
| "step": 1260 | |
| }, | |
| { | |
| "epoch": 7.375366568914956, | |
| "grad_norm": 0.5371881554820942, | |
| "learning_rate": 9.936972253389235e-06, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9805537834763527, | |
| "step": 1261 | |
| }, | |
| { | |
| "epoch": 7.381231671554252, | |
| "grad_norm": 0.6119393072173988, | |
| "learning_rate": 9.911541799092162e-06, | |
| "loss": 0.076, | |
| "mean_token_accuracy": 0.9751638323068619, | |
| "step": 1262 | |
| }, | |
| { | |
| "epoch": 7.387096774193548, | |
| "grad_norm": 0.38304046490416693, | |
| "learning_rate": 9.88615522101615e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9794823378324509, | |
| "step": 1263 | |
| }, | |
| { | |
| "epoch": 7.392961876832844, | |
| "grad_norm": 0.407569526828338, | |
| "learning_rate": 9.860812611304225e-06, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.9806607142090797, | |
| "step": 1264 | |
| }, | |
| { | |
| "epoch": 7.39882697947214, | |
| "grad_norm": 0.43838874061417293, | |
| "learning_rate": 9.835514061939822e-06, | |
| "loss": 0.0564, | |
| "mean_token_accuracy": 0.9821709990501404, | |
| "step": 1265 | |
| }, | |
| { | |
| "epoch": 7.404692082111437, | |
| "grad_norm": 0.39028973009973217, | |
| "learning_rate": 9.810259664746454e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.979277141392231, | |
| "step": 1266 | |
| }, | |
| { | |
| "epoch": 7.410557184750733, | |
| "grad_norm": 0.47142083942451735, | |
| "learning_rate": 9.785049511387383e-06, | |
| "loss": 0.0622, | |
| "mean_token_accuracy": 0.9790802374482155, | |
| "step": 1267 | |
| }, | |
| { | |
| "epoch": 7.416422287390029, | |
| "grad_norm": 0.47820475224887893, | |
| "learning_rate": 9.759883693365287e-06, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.9772188812494278, | |
| "step": 1268 | |
| }, | |
| { | |
| "epoch": 7.422287390029325, | |
| "grad_norm": 0.4438546581105144, | |
| "learning_rate": 9.734762302021923e-06, | |
| "loss": 0.0533, | |
| "mean_token_accuracy": 0.9826917573809624, | |
| "step": 1269 | |
| }, | |
| { | |
| "epoch": 7.428152492668621, | |
| "grad_norm": 0.36892193219065467, | |
| "learning_rate": 9.709685428537794e-06, | |
| "loss": 0.0624, | |
| "mean_token_accuracy": 0.9822022542357445, | |
| "step": 1270 | |
| }, | |
| { | |
| "epoch": 7.4340175953079175, | |
| "grad_norm": 0.5438930387436425, | |
| "learning_rate": 9.684653163931823e-06, | |
| "loss": 0.0807, | |
| "mean_token_accuracy": 0.9757629334926605, | |
| "step": 1271 | |
| }, | |
| { | |
| "epoch": 7.439882697947214, | |
| "grad_norm": 0.6492834212054615, | |
| "learning_rate": 9.659665599061019e-06, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9740792885422707, | |
| "step": 1272 | |
| }, | |
| { | |
| "epoch": 7.44574780058651, | |
| "grad_norm": 0.49786671471578997, | |
| "learning_rate": 9.634722824620154e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9776691198348999, | |
| "step": 1273 | |
| }, | |
| { | |
| "epoch": 7.451612903225806, | |
| "grad_norm": 0.45398970829477336, | |
| "learning_rate": 9.609824931141423e-06, | |
| "loss": 0.0618, | |
| "mean_token_accuracy": 0.9789795354008675, | |
| "step": 1274 | |
| }, | |
| { | |
| "epoch": 7.457478005865102, | |
| "grad_norm": 0.3880313111487553, | |
| "learning_rate": 9.584972008994123e-06, | |
| "loss": 0.0629, | |
| "mean_token_accuracy": 0.980201467871666, | |
| "step": 1275 | |
| }, | |
| { | |
| "epoch": 7.463343108504398, | |
| "grad_norm": 0.4387804564000816, | |
| "learning_rate": 9.560164148384328e-06, | |
| "loss": 0.0802, | |
| "mean_token_accuracy": 0.9769189730286598, | |
| "step": 1276 | |
| }, | |
| { | |
| "epoch": 7.469208211143695, | |
| "grad_norm": 0.6219576750479341, | |
| "learning_rate": 9.53540143935455e-06, | |
| "loss": 0.0701, | |
| "mean_token_accuracy": 0.9761852920055389, | |
| "step": 1277 | |
| }, | |
| { | |
| "epoch": 7.475073313782991, | |
| "grad_norm": 0.5220170243856437, | |
| "learning_rate": 9.510683971783425e-06, | |
| "loss": 0.0892, | |
| "mean_token_accuracy": 0.9741244614124298, | |
| "step": 1278 | |
| }, | |
| { | |
| "epoch": 7.480938416422287, | |
| "grad_norm": 0.4293959414529352, | |
| "learning_rate": 9.486011835385372e-06, | |
| "loss": 0.0471, | |
| "mean_token_accuracy": 0.9853230938315392, | |
| "step": 1279 | |
| }, | |
| { | |
| "epoch": 7.486803519061583, | |
| "grad_norm": 0.4194976868757321, | |
| "learning_rate": 9.461385119710282e-06, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9749538227915764, | |
| "step": 1280 | |
| }, | |
| { | |
| "epoch": 7.492668621700879, | |
| "grad_norm": 0.41667970612082306, | |
| "learning_rate": 9.436803914143189e-06, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.9754650443792343, | |
| "step": 1281 | |
| }, | |
| { | |
| "epoch": 7.4985337243401755, | |
| "grad_norm": 0.41015333720683506, | |
| "learning_rate": 9.41226830790394e-06, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.9807113930583, | |
| "step": 1282 | |
| }, | |
| { | |
| "epoch": 7.504398826979472, | |
| "grad_norm": 0.4689591041115801, | |
| "learning_rate": 9.387778390046881e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9797599762678146, | |
| "step": 1283 | |
| }, | |
| { | |
| "epoch": 7.510263929618768, | |
| "grad_norm": 0.3586841195731395, | |
| "learning_rate": 9.363334249460519e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9795782417058945, | |
| "step": 1284 | |
| }, | |
| { | |
| "epoch": 7.516129032258064, | |
| "grad_norm": 0.46454735014205756, | |
| "learning_rate": 9.338935974867213e-06, | |
| "loss": 0.0702, | |
| "mean_token_accuracy": 0.977179504930973, | |
| "step": 1285 | |
| }, | |
| { | |
| "epoch": 7.52199413489736, | |
| "grad_norm": 0.501859043750425, | |
| "learning_rate": 9.314583654822844e-06, | |
| "loss": 0.0707, | |
| "mean_token_accuracy": 0.9769659638404846, | |
| "step": 1286 | |
| }, | |
| { | |
| "epoch": 7.527859237536656, | |
| "grad_norm": 0.45384333056341686, | |
| "learning_rate": 9.290277377716503e-06, | |
| "loss": 0.075, | |
| "mean_token_accuracy": 0.9742954969406128, | |
| "step": 1287 | |
| }, | |
| { | |
| "epoch": 7.533724340175953, | |
| "grad_norm": 0.453406139729027, | |
| "learning_rate": 9.266017231770155e-06, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.9797300845384598, | |
| "step": 1288 | |
| }, | |
| { | |
| "epoch": 7.539589442815249, | |
| "grad_norm": 0.3578564197025334, | |
| "learning_rate": 9.241803305038333e-06, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.9783648028969765, | |
| "step": 1289 | |
| }, | |
| { | |
| "epoch": 7.545454545454545, | |
| "grad_norm": 0.4350473692602661, | |
| "learning_rate": 9.217635685407813e-06, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.9798745140433311, | |
| "step": 1290 | |
| }, | |
| { | |
| "epoch": 7.551319648093841, | |
| "grad_norm": 0.3536173268233605, | |
| "learning_rate": 9.19351446059729e-06, | |
| "loss": 0.0559, | |
| "mean_token_accuracy": 0.983490340411663, | |
| "step": 1291 | |
| }, | |
| { | |
| "epoch": 7.557184750733137, | |
| "grad_norm": 0.4679083834436488, | |
| "learning_rate": 9.16943971815708e-06, | |
| "loss": 0.0633, | |
| "mean_token_accuracy": 0.9795430973172188, | |
| "step": 1292 | |
| }, | |
| { | |
| "epoch": 7.563049853372434, | |
| "grad_norm": 0.4916207041114535, | |
| "learning_rate": 9.145411545468756e-06, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.9794794097542763, | |
| "step": 1293 | |
| }, | |
| { | |
| "epoch": 7.568914956011731, | |
| "grad_norm": 0.4242119544741855, | |
| "learning_rate": 9.121430029744893e-06, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.9808191359043121, | |
| "step": 1294 | |
| }, | |
| { | |
| "epoch": 7.574780058651027, | |
| "grad_norm": 0.5109235759797703, | |
| "learning_rate": 9.097495258028703e-06, | |
| "loss": 0.0704, | |
| "mean_token_accuracy": 0.9759545400738716, | |
| "step": 1295 | |
| }, | |
| { | |
| "epoch": 7.580645161290323, | |
| "grad_norm": 0.4640004900858478, | |
| "learning_rate": 9.073607317193742e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9781745299696922, | |
| "step": 1296 | |
| }, | |
| { | |
| "epoch": 7.586510263929619, | |
| "grad_norm": 0.4044944851920302, | |
| "learning_rate": 9.049766293943589e-06, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9784148558974266, | |
| "step": 1297 | |
| }, | |
| { | |
| "epoch": 7.592375366568915, | |
| "grad_norm": 0.4823259605858935, | |
| "learning_rate": 9.025972274811527e-06, | |
| "loss": 0.0622, | |
| "mean_token_accuracy": 0.97871433198452, | |
| "step": 1298 | |
| }, | |
| { | |
| "epoch": 7.5982404692082115, | |
| "grad_norm": 0.4480002680288002, | |
| "learning_rate": 9.002225346160238e-06, | |
| "loss": 0.0637, | |
| "mean_token_accuracy": 0.9780783578753471, | |
| "step": 1299 | |
| }, | |
| { | |
| "epoch": 7.604105571847508, | |
| "grad_norm": 0.37868110690032336, | |
| "learning_rate": 8.97852559418148e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9789452776312828, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 7.609970674486804, | |
| "grad_norm": 0.4042692696917226, | |
| "learning_rate": 8.954873104895787e-06, | |
| "loss": 0.061, | |
| "mean_token_accuracy": 0.9810117408633232, | |
| "step": 1301 | |
| }, | |
| { | |
| "epoch": 7.6158357771261, | |
| "grad_norm": 0.44149411459840987, | |
| "learning_rate": 8.931267964152132e-06, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9778881147503853, | |
| "step": 1302 | |
| }, | |
| { | |
| "epoch": 7.621700879765396, | |
| "grad_norm": 0.4177264074559071, | |
| "learning_rate": 8.907710257627651e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9792383164167404, | |
| "step": 1303 | |
| }, | |
| { | |
| "epoch": 7.627565982404692, | |
| "grad_norm": 0.415930949832212, | |
| "learning_rate": 8.884200070827303e-06, | |
| "loss": 0.0615, | |
| "mean_token_accuracy": 0.9772908091545105, | |
| "step": 1304 | |
| }, | |
| { | |
| "epoch": 7.633431085043989, | |
| "grad_norm": 0.4117164642861044, | |
| "learning_rate": 8.86073748908357e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9795729070901871, | |
| "step": 1305 | |
| }, | |
| { | |
| "epoch": 7.639296187683285, | |
| "grad_norm": 0.3694341756589927, | |
| "learning_rate": 8.837322597556146e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9798577129840851, | |
| "step": 1306 | |
| }, | |
| { | |
| "epoch": 7.645161290322581, | |
| "grad_norm": 0.4945771560752598, | |
| "learning_rate": 8.813955481231633e-06, | |
| "loss": 0.0718, | |
| "mean_token_accuracy": 0.9765386283397675, | |
| "step": 1307 | |
| }, | |
| { | |
| "epoch": 7.651026392961877, | |
| "grad_norm": 0.3859833883632323, | |
| "learning_rate": 8.790636224923221e-06, | |
| "loss": 0.0767, | |
| "mean_token_accuracy": 0.9768943637609482, | |
| "step": 1308 | |
| }, | |
| { | |
| "epoch": 7.656891495601173, | |
| "grad_norm": 0.6038966303553363, | |
| "learning_rate": 8.767364913270399e-06, | |
| "loss": 0.0676, | |
| "mean_token_accuracy": 0.9774223938584328, | |
| "step": 1309 | |
| }, | |
| { | |
| "epoch": 7.6627565982404695, | |
| "grad_norm": 0.4214725291176917, | |
| "learning_rate": 8.744141630738624e-06, | |
| "loss": 0.0716, | |
| "mean_token_accuracy": 0.978145569562912, | |
| "step": 1310 | |
| }, | |
| { | |
| "epoch": 7.668621700879766, | |
| "grad_norm": 0.4347493152025069, | |
| "learning_rate": 8.720966461619038e-06, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9777267724275589, | |
| "step": 1311 | |
| }, | |
| { | |
| "epoch": 7.674486803519062, | |
| "grad_norm": 0.4389100664263654, | |
| "learning_rate": 8.69783949002814e-06, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9805267155170441, | |
| "step": 1312 | |
| }, | |
| { | |
| "epoch": 7.680351906158358, | |
| "grad_norm": 0.37417891099121087, | |
| "learning_rate": 8.6747607999075e-06, | |
| "loss": 0.0541, | |
| "mean_token_accuracy": 0.9824578687548637, | |
| "step": 1313 | |
| }, | |
| { | |
| "epoch": 7.686217008797654, | |
| "grad_norm": 0.3872157231435952, | |
| "learning_rate": 8.651730475023435e-06, | |
| "loss": 0.0703, | |
| "mean_token_accuracy": 0.9778390452265739, | |
| "step": 1314 | |
| }, | |
| { | |
| "epoch": 7.69208211143695, | |
| "grad_norm": 0.5363255423638494, | |
| "learning_rate": 8.628748598966739e-06, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9759575873613358, | |
| "step": 1315 | |
| }, | |
| { | |
| "epoch": 7.697947214076247, | |
| "grad_norm": 0.5743323555165211, | |
| "learning_rate": 8.605815255152323e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9736444652080536, | |
| "step": 1316 | |
| }, | |
| { | |
| "epoch": 7.703812316715543, | |
| "grad_norm": 0.5139795369015938, | |
| "learning_rate": 8.582930526818973e-06, | |
| "loss": 0.0721, | |
| "mean_token_accuracy": 0.9771385565400124, | |
| "step": 1317 | |
| }, | |
| { | |
| "epoch": 7.709677419354839, | |
| "grad_norm": 0.4314235747530134, | |
| "learning_rate": 8.560094497029008e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9789847880601883, | |
| "step": 1318 | |
| }, | |
| { | |
| "epoch": 7.715542521994135, | |
| "grad_norm": 0.4453550791454817, | |
| "learning_rate": 8.537307248667992e-06, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9789009168744087, | |
| "step": 1319 | |
| }, | |
| { | |
| "epoch": 7.721407624633431, | |
| "grad_norm": 0.39151849340341077, | |
| "learning_rate": 8.514568864444432e-06, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.976598434150219, | |
| "step": 1320 | |
| }, | |
| { | |
| "epoch": 7.7272727272727275, | |
| "grad_norm": 0.414862604020496, | |
| "learning_rate": 8.491879426889483e-06, | |
| "loss": 0.0601, | |
| "mean_token_accuracy": 0.9806237667798996, | |
| "step": 1321 | |
| }, | |
| { | |
| "epoch": 7.733137829912024, | |
| "grad_norm": 0.44999339741901423, | |
| "learning_rate": 8.469239018356636e-06, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9762846529483795, | |
| "step": 1322 | |
| }, | |
| { | |
| "epoch": 7.73900293255132, | |
| "grad_norm": 0.45252653701202344, | |
| "learning_rate": 8.446647721021435e-06, | |
| "loss": 0.0835, | |
| "mean_token_accuracy": 0.9739138633012772, | |
| "step": 1323 | |
| }, | |
| { | |
| "epoch": 7.744868035190616, | |
| "grad_norm": 0.5028896224891031, | |
| "learning_rate": 8.424105616881161e-06, | |
| "loss": 0.0664, | |
| "mean_token_accuracy": 0.9794121384620667, | |
| "step": 1324 | |
| }, | |
| { | |
| "epoch": 7.750733137829912, | |
| "grad_norm": 0.47860397890481565, | |
| "learning_rate": 8.40161278775455e-06, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9804290905594826, | |
| "step": 1325 | |
| }, | |
| { | |
| "epoch": 7.756598240469208, | |
| "grad_norm": 0.5527107786202187, | |
| "learning_rate": 8.379169315281485e-06, | |
| "loss": 0.0776, | |
| "mean_token_accuracy": 0.9751468449831009, | |
| "step": 1326 | |
| }, | |
| { | |
| "epoch": 7.762463343108505, | |
| "grad_norm": 0.5582032307994943, | |
| "learning_rate": 8.356775280922708e-06, | |
| "loss": 0.0744, | |
| "mean_token_accuracy": 0.9765750914812088, | |
| "step": 1327 | |
| }, | |
| { | |
| "epoch": 7.768328445747801, | |
| "grad_norm": 0.3908594516424265, | |
| "learning_rate": 8.334430765959522e-06, | |
| "loss": 0.0717, | |
| "mean_token_accuracy": 0.9744571521878242, | |
| "step": 1328 | |
| }, | |
| { | |
| "epoch": 7.774193548387097, | |
| "grad_norm": 0.4215612562157994, | |
| "learning_rate": 8.312135851493494e-06, | |
| "loss": 0.0724, | |
| "mean_token_accuracy": 0.9762986898422241, | |
| "step": 1329 | |
| }, | |
| { | |
| "epoch": 7.780058651026393, | |
| "grad_norm": 0.3985250756824816, | |
| "learning_rate": 8.28989061844615e-06, | |
| "loss": 0.057, | |
| "mean_token_accuracy": 0.9815426766872406, | |
| "step": 1330 | |
| }, | |
| { | |
| "epoch": 7.785923753665689, | |
| "grad_norm": 0.4737274734055571, | |
| "learning_rate": 8.267695147558705e-06, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9746501669287682, | |
| "step": 1331 | |
| }, | |
| { | |
| "epoch": 7.7917888563049855, | |
| "grad_norm": 0.5474729201560261, | |
| "learning_rate": 8.245549519391758e-06, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9773770347237587, | |
| "step": 1332 | |
| }, | |
| { | |
| "epoch": 7.797653958944282, | |
| "grad_norm": 0.4281595969799011, | |
| "learning_rate": 8.22345381432499e-06, | |
| "loss": 0.0684, | |
| "mean_token_accuracy": 0.9795331656932831, | |
| "step": 1333 | |
| }, | |
| { | |
| "epoch": 7.803519061583578, | |
| "grad_norm": 0.3839556927790539, | |
| "learning_rate": 8.201408112556893e-06, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9781799539923668, | |
| "step": 1334 | |
| }, | |
| { | |
| "epoch": 7.809384164222874, | |
| "grad_norm": 0.50477297766562, | |
| "learning_rate": 8.179412494104457e-06, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.9756702482700348, | |
| "step": 1335 | |
| }, | |
| { | |
| "epoch": 7.81524926686217, | |
| "grad_norm": 0.43758196800489896, | |
| "learning_rate": 8.15746703880289e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.978801429271698, | |
| "step": 1336 | |
| }, | |
| { | |
| "epoch": 7.821114369501466, | |
| "grad_norm": 0.3833872875436529, | |
| "learning_rate": 8.135571826305339e-06, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.9793982282280922, | |
| "step": 1337 | |
| }, | |
| { | |
| "epoch": 7.826979472140763, | |
| "grad_norm": 0.4203603565828973, | |
| "learning_rate": 8.113726936082576e-06, | |
| "loss": 0.0813, | |
| "mean_token_accuracy": 0.9744042530655861, | |
| "step": 1338 | |
| }, | |
| { | |
| "epoch": 7.832844574780059, | |
| "grad_norm": 0.6028233293566885, | |
| "learning_rate": 8.091932447422737e-06, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9741432145237923, | |
| "step": 1339 | |
| }, | |
| { | |
| "epoch": 7.838709677419355, | |
| "grad_norm": 0.41261972168356403, | |
| "learning_rate": 8.070188439431005e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.9784349501132965, | |
| "step": 1340 | |
| }, | |
| { | |
| "epoch": 7.844574780058651, | |
| "grad_norm": 0.3952270775470951, | |
| "learning_rate": 8.048494991029352e-06, | |
| "loss": 0.0587, | |
| "mean_token_accuracy": 0.9790202602744102, | |
| "step": 1341 | |
| }, | |
| { | |
| "epoch": 7.850439882697947, | |
| "grad_norm": 0.4811838432284508, | |
| "learning_rate": 8.02685218095624e-06, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9795545116066933, | |
| "step": 1342 | |
| }, | |
| { | |
| "epoch": 7.8563049853372435, | |
| "grad_norm": 0.47515237201817584, | |
| "learning_rate": 8.005260087766318e-06, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.9771887883543968, | |
| "step": 1343 | |
| }, | |
| { | |
| "epoch": 7.86217008797654, | |
| "grad_norm": 0.4539143892456011, | |
| "learning_rate": 7.983718789830167e-06, | |
| "loss": 0.0711, | |
| "mean_token_accuracy": 0.9775839596986771, | |
| "step": 1344 | |
| }, | |
| { | |
| "epoch": 7.868035190615836, | |
| "grad_norm": 0.45044656532473076, | |
| "learning_rate": 7.962228365333999e-06, | |
| "loss": 0.0718, | |
| "mean_token_accuracy": 0.9797210469841957, | |
| "step": 1345 | |
| }, | |
| { | |
| "epoch": 7.873900293255132, | |
| "grad_norm": 0.39707806692319203, | |
| "learning_rate": 7.940788892279375e-06, | |
| "loss": 0.0707, | |
| "mean_token_accuracy": 0.9795804470777512, | |
| "step": 1346 | |
| }, | |
| { | |
| "epoch": 7.879765395894428, | |
| "grad_norm": 0.4554192754757186, | |
| "learning_rate": 7.919400448482928e-06, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9769511744379997, | |
| "step": 1347 | |
| }, | |
| { | |
| "epoch": 7.885630498533724, | |
| "grad_norm": 0.5364259319675938, | |
| "learning_rate": 7.898063111576066e-06, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9751681387424469, | |
| "step": 1348 | |
| }, | |
| { | |
| "epoch": 7.891495601173021, | |
| "grad_norm": 0.4464569996214825, | |
| "learning_rate": 7.876776959004706e-06, | |
| "loss": 0.0837, | |
| "mean_token_accuracy": 0.9715655595064163, | |
| "step": 1349 | |
| }, | |
| { | |
| "epoch": 7.897360703812317, | |
| "grad_norm": 0.44780600869493603, | |
| "learning_rate": 7.855542068028981e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9795584827661514, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 7.903225806451613, | |
| "grad_norm": 0.3485066511432153, | |
| "learning_rate": 7.834358515722977e-06, | |
| "loss": 0.0634, | |
| "mean_token_accuracy": 0.9794716238975525, | |
| "step": 1351 | |
| }, | |
| { | |
| "epoch": 7.909090909090909, | |
| "grad_norm": 0.3447849555142869, | |
| "learning_rate": 7.813226378974427e-06, | |
| "loss": 0.0675, | |
| "mean_token_accuracy": 0.977924183011055, | |
| "step": 1352 | |
| }, | |
| { | |
| "epoch": 7.914956011730205, | |
| "grad_norm": 0.4688435626034042, | |
| "learning_rate": 7.792145734484455e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.975438766181469, | |
| "step": 1353 | |
| }, | |
| { | |
| "epoch": 7.9208211143695015, | |
| "grad_norm": 0.395352024579098, | |
| "learning_rate": 7.771116658767286e-06, | |
| "loss": 0.0721, | |
| "mean_token_accuracy": 0.9782473146915436, | |
| "step": 1354 | |
| }, | |
| { | |
| "epoch": 7.926686217008798, | |
| "grad_norm": 0.4536644748137589, | |
| "learning_rate": 7.750139228149978e-06, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9761862754821777, | |
| "step": 1355 | |
| }, | |
| { | |
| "epoch": 7.932551319648094, | |
| "grad_norm": 0.44439638707441587, | |
| "learning_rate": 7.729213518772121e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9807272255420685, | |
| "step": 1356 | |
| }, | |
| { | |
| "epoch": 7.93841642228739, | |
| "grad_norm": 0.36724032837204973, | |
| "learning_rate": 7.708339606585591e-06, | |
| "loss": 0.0693, | |
| "mean_token_accuracy": 0.9746986031532288, | |
| "step": 1357 | |
| }, | |
| { | |
| "epoch": 7.944281524926686, | |
| "grad_norm": 0.4764460524283809, | |
| "learning_rate": 7.687517567354266e-06, | |
| "loss": 0.0811, | |
| "mean_token_accuracy": 0.976442739367485, | |
| "step": 1358 | |
| }, | |
| { | |
| "epoch": 7.9501466275659824, | |
| "grad_norm": 0.45827023462979105, | |
| "learning_rate": 7.66674747665373e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9784819781780243, | |
| "step": 1359 | |
| }, | |
| { | |
| "epoch": 7.956011730205279, | |
| "grad_norm": 0.388246402879624, | |
| "learning_rate": 7.646029409871029e-06, | |
| "loss": 0.0728, | |
| "mean_token_accuracy": 0.975219152867794, | |
| "step": 1360 | |
| }, | |
| { | |
| "epoch": 7.961876832844575, | |
| "grad_norm": 0.35552101550272835, | |
| "learning_rate": 7.625363442204379e-06, | |
| "loss": 0.0542, | |
| "mean_token_accuracy": 0.9826669692993164, | |
| "step": 1361 | |
| }, | |
| { | |
| "epoch": 7.967741935483871, | |
| "grad_norm": 0.3797612701869844, | |
| "learning_rate": 7.604749648662892e-06, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9792850464582443, | |
| "step": 1362 | |
| }, | |
| { | |
| "epoch": 7.973607038123167, | |
| "grad_norm": 0.39566846200882205, | |
| "learning_rate": 7.584188104066317e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9788248687982559, | |
| "step": 1363 | |
| }, | |
| { | |
| "epoch": 7.979472140762463, | |
| "grad_norm": 0.5329907294697588, | |
| "learning_rate": 7.563678883044754e-06, | |
| "loss": 0.0872, | |
| "mean_token_accuracy": 0.9761364907026291, | |
| "step": 1364 | |
| }, | |
| { | |
| "epoch": 7.9853372434017595, | |
| "grad_norm": 0.5497543178648615, | |
| "learning_rate": 7.5432220600383935e-06, | |
| "loss": 0.084, | |
| "mean_token_accuracy": 0.973995603621006, | |
| "step": 1365 | |
| }, | |
| { | |
| "epoch": 7.991202346041056, | |
| "grad_norm": 0.375109161415316, | |
| "learning_rate": 7.522817709297241e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9805333688855171, | |
| "step": 1366 | |
| }, | |
| { | |
| "epoch": 7.997067448680352, | |
| "grad_norm": 0.3931034615674967, | |
| "learning_rate": 7.502465904880849e-06, | |
| "loss": 0.0689, | |
| "mean_token_accuracy": 0.9779509454965591, | |
| "step": 1367 | |
| }, | |
| { | |
| "epoch": 8.0, | |
| "grad_norm": 0.7243852896511626, | |
| "learning_rate": 7.482166720658046e-06, | |
| "loss": 0.068, | |
| "mean_token_accuracy": 0.9819488376379013, | |
| "step": 1368 | |
| }, | |
| { | |
| "epoch": 8.005865102639296, | |
| "grad_norm": 0.3923119289851871, | |
| "learning_rate": 7.461920230306674e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9765593409538269, | |
| "step": 1369 | |
| }, | |
| { | |
| "epoch": 8.011730205278592, | |
| "grad_norm": 0.34742033683272505, | |
| "learning_rate": 7.441726507313318e-06, | |
| "loss": 0.0557, | |
| "mean_token_accuracy": 0.9818401262164116, | |
| "step": 1370 | |
| }, | |
| { | |
| "epoch": 8.017595307917889, | |
| "grad_norm": 0.39423410919506524, | |
| "learning_rate": 7.421585624973033e-06, | |
| "loss": 0.0658, | |
| "mean_token_accuracy": 0.9781031236052513, | |
| "step": 1371 | |
| }, | |
| { | |
| "epoch": 8.023460410557185, | |
| "grad_norm": 0.3493603544083918, | |
| "learning_rate": 7.4014976563890915e-06, | |
| "loss": 0.0541, | |
| "mean_token_accuracy": 0.980792760848999, | |
| "step": 1372 | |
| }, | |
| { | |
| "epoch": 8.029325513196481, | |
| "grad_norm": 0.33145773920777494, | |
| "learning_rate": 7.381462674472702e-06, | |
| "loss": 0.0555, | |
| "mean_token_accuracy": 0.9846704751253128, | |
| "step": 1373 | |
| }, | |
| { | |
| "epoch": 8.035190615835777, | |
| "grad_norm": 0.3327332254267587, | |
| "learning_rate": 7.36148075194276e-06, | |
| "loss": 0.0563, | |
| "mean_token_accuracy": 0.9803121909499168, | |
| "step": 1374 | |
| }, | |
| { | |
| "epoch": 8.041055718475073, | |
| "grad_norm": 0.35755724158142155, | |
| "learning_rate": 7.341551961325574e-06, | |
| "loss": 0.0567, | |
| "mean_token_accuracy": 0.9821698665618896, | |
| "step": 1375 | |
| }, | |
| { | |
| "epoch": 8.04692082111437, | |
| "grad_norm": 0.3459921058263699, | |
| "learning_rate": 7.3216763749546025e-06, | |
| "loss": 0.0528, | |
| "mean_token_accuracy": 0.9846973270177841, | |
| "step": 1376 | |
| }, | |
| { | |
| "epoch": 8.052785923753666, | |
| "grad_norm": 0.420527501236528, | |
| "learning_rate": 7.301854064970202e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9815153554081917, | |
| "step": 1377 | |
| }, | |
| { | |
| "epoch": 8.058651026392962, | |
| "grad_norm": 0.36370898558772446, | |
| "learning_rate": 7.282085103319349e-06, | |
| "loss": 0.0592, | |
| "mean_token_accuracy": 0.9816609099507332, | |
| "step": 1378 | |
| }, | |
| { | |
| "epoch": 8.064516129032258, | |
| "grad_norm": 0.5158426013736117, | |
| "learning_rate": 7.2623695617553934e-06, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9795558750629425, | |
| "step": 1379 | |
| }, | |
| { | |
| "epoch": 8.070381231671554, | |
| "grad_norm": 0.3759003395576899, | |
| "learning_rate": 7.242707511837781e-06, | |
| "loss": 0.0529, | |
| "mean_token_accuracy": 0.9835638403892517, | |
| "step": 1380 | |
| }, | |
| { | |
| "epoch": 8.07624633431085, | |
| "grad_norm": 0.3069917423790072, | |
| "learning_rate": 7.223099024931817e-06, | |
| "loss": 0.0529, | |
| "mean_token_accuracy": 0.9842489287257195, | |
| "step": 1381 | |
| }, | |
| { | |
| "epoch": 8.082111436950147, | |
| "grad_norm": 0.388069657624564, | |
| "learning_rate": 7.203544172208387e-06, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9819294437766075, | |
| "step": 1382 | |
| }, | |
| { | |
| "epoch": 8.087976539589443, | |
| "grad_norm": 0.42963562388871895, | |
| "learning_rate": 7.184043024643712e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9813325479626656, | |
| "step": 1383 | |
| }, | |
| { | |
| "epoch": 8.093841642228739, | |
| "grad_norm": 0.5150673091985366, | |
| "learning_rate": 7.16459565301908e-06, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.9807056784629822, | |
| "step": 1384 | |
| }, | |
| { | |
| "epoch": 8.099706744868035, | |
| "grad_norm": 0.4915099085064982, | |
| "learning_rate": 7.145202127920598e-06, | |
| "loss": 0.071, | |
| "mean_token_accuracy": 0.9762818515300751, | |
| "step": 1385 | |
| }, | |
| { | |
| "epoch": 8.105571847507331, | |
| "grad_norm": 0.34525443548547485, | |
| "learning_rate": 7.125862519738924e-06, | |
| "loss": 0.058, | |
| "mean_token_accuracy": 0.978920042514801, | |
| "step": 1386 | |
| }, | |
| { | |
| "epoch": 8.111436950146627, | |
| "grad_norm": 0.40866276867871215, | |
| "learning_rate": 7.106576898669031e-06, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9809467121958733, | |
| "step": 1387 | |
| }, | |
| { | |
| "epoch": 8.117302052785924, | |
| "grad_norm": 0.4275703636185301, | |
| "learning_rate": 7.087345334709931e-06, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.9750278368592262, | |
| "step": 1388 | |
| }, | |
| { | |
| "epoch": 8.12316715542522, | |
| "grad_norm": 0.4453607889338328, | |
| "learning_rate": 7.068167897664433e-06, | |
| "loss": 0.0656, | |
| "mean_token_accuracy": 0.9771971851587296, | |
| "step": 1389 | |
| }, | |
| { | |
| "epoch": 8.129032258064516, | |
| "grad_norm": 0.40445124023147544, | |
| "learning_rate": 7.0490446571388925e-06, | |
| "loss": 0.0684, | |
| "mean_token_accuracy": 0.9804186373949051, | |
| "step": 1390 | |
| }, | |
| { | |
| "epoch": 8.134897360703812, | |
| "grad_norm": 0.3809170636214878, | |
| "learning_rate": 7.0299756825429465e-06, | |
| "loss": 0.0586, | |
| "mean_token_accuracy": 0.979775458574295, | |
| "step": 1391 | |
| }, | |
| { | |
| "epoch": 8.140762463343108, | |
| "grad_norm": 0.35758312690942345, | |
| "learning_rate": 7.010961043089277e-06, | |
| "loss": 0.0501, | |
| "mean_token_accuracy": 0.984392948448658, | |
| "step": 1392 | |
| }, | |
| { | |
| "epoch": 8.146627565982405, | |
| "grad_norm": 0.3705074517793997, | |
| "learning_rate": 6.992000807793333e-06, | |
| "loss": 0.0548, | |
| "mean_token_accuracy": 0.9821308106184006, | |
| "step": 1393 | |
| }, | |
| { | |
| "epoch": 8.1524926686217, | |
| "grad_norm": 0.40202702754880276, | |
| "learning_rate": 6.973095045473124e-06, | |
| "loss": 0.0642, | |
| "mean_token_accuracy": 0.9799409285187721, | |
| "step": 1394 | |
| }, | |
| { | |
| "epoch": 8.158357771260997, | |
| "grad_norm": 0.40834566224229935, | |
| "learning_rate": 6.954243824748922e-06, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9817368164658546, | |
| "step": 1395 | |
| }, | |
| { | |
| "epoch": 8.164222873900293, | |
| "grad_norm": 0.3340248620481329, | |
| "learning_rate": 6.93544721404305e-06, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9808118641376495, | |
| "step": 1396 | |
| }, | |
| { | |
| "epoch": 8.17008797653959, | |
| "grad_norm": 1.4026355726442485, | |
| "learning_rate": 6.916705281579612e-06, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.9816148653626442, | |
| "step": 1397 | |
| }, | |
| { | |
| "epoch": 8.175953079178885, | |
| "grad_norm": 0.4602551692432743, | |
| "learning_rate": 6.898018095384252e-06, | |
| "loss": 0.0751, | |
| "mean_token_accuracy": 0.9773136898875237, | |
| "step": 1398 | |
| }, | |
| { | |
| "epoch": 8.181818181818182, | |
| "grad_norm": 0.4282779839360214, | |
| "learning_rate": 6.879385723283913e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.980457104742527, | |
| "step": 1399 | |
| }, | |
| { | |
| "epoch": 8.187683284457478, | |
| "grad_norm": 0.40072108525872063, | |
| "learning_rate": 6.8608082329065775e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.98114163428545, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 8.193548387096774, | |
| "grad_norm": 0.4949708344107772, | |
| "learning_rate": 6.842285691681032e-06, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9777880758047104, | |
| "step": 1401 | |
| }, | |
| { | |
| "epoch": 8.19941348973607, | |
| "grad_norm": 0.41143780595437013, | |
| "learning_rate": 6.8238181668366244e-06, | |
| "loss": 0.0527, | |
| "mean_token_accuracy": 0.9804578647017479, | |
| "step": 1402 | |
| }, | |
| { | |
| "epoch": 8.205278592375366, | |
| "grad_norm": 0.446656112624407, | |
| "learning_rate": 6.805405725403006e-06, | |
| "loss": 0.0689, | |
| "mean_token_accuracy": 0.9786640405654907, | |
| "step": 1403 | |
| }, | |
| { | |
| "epoch": 8.211143695014663, | |
| "grad_norm": 0.4317831312437193, | |
| "learning_rate": 6.787048434209906e-06, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.979246236383915, | |
| "step": 1404 | |
| }, | |
| { | |
| "epoch": 8.217008797653959, | |
| "grad_norm": 0.44989525731094726, | |
| "learning_rate": 6.768746359886882e-06, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9802892208099365, | |
| "step": 1405 | |
| }, | |
| { | |
| "epoch": 8.222873900293255, | |
| "grad_norm": 0.38725230143331046, | |
| "learning_rate": 6.750499568863061e-06, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9803582206368446, | |
| "step": 1406 | |
| }, | |
| { | |
| "epoch": 8.228739002932551, | |
| "grad_norm": 0.45583599437841565, | |
| "learning_rate": 6.732308127366931e-06, | |
| "loss": 0.0721, | |
| "mean_token_accuracy": 0.9776536375284195, | |
| "step": 1407 | |
| }, | |
| { | |
| "epoch": 8.234604105571847, | |
| "grad_norm": 0.48140191290947787, | |
| "learning_rate": 6.714172101426077e-06, | |
| "loss": 0.0685, | |
| "mean_token_accuracy": 0.9765149429440498, | |
| "step": 1408 | |
| }, | |
| { | |
| "epoch": 8.240469208211143, | |
| "grad_norm": 0.34413464461383864, | |
| "learning_rate": 6.696091556866948e-06, | |
| "loss": 0.0494, | |
| "mean_token_accuracy": 0.9841427430510521, | |
| "step": 1409 | |
| }, | |
| { | |
| "epoch": 8.24633431085044, | |
| "grad_norm": 0.41459274863437684, | |
| "learning_rate": 6.678066559314622e-06, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.9769286438822746, | |
| "step": 1410 | |
| }, | |
| { | |
| "epoch": 8.252199413489736, | |
| "grad_norm": 0.463295539451374, | |
| "learning_rate": 6.660097174192556e-06, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9780315533280373, | |
| "step": 1411 | |
| }, | |
| { | |
| "epoch": 8.258064516129032, | |
| "grad_norm": 0.37945484006643254, | |
| "learning_rate": 6.642183466722363e-06, | |
| "loss": 0.0635, | |
| "mean_token_accuracy": 0.9778400585055351, | |
| "step": 1412 | |
| }, | |
| { | |
| "epoch": 8.263929618768328, | |
| "grad_norm": 0.4058692987195602, | |
| "learning_rate": 6.624325501923565e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.9796192273497581, | |
| "step": 1413 | |
| }, | |
| { | |
| "epoch": 8.269794721407624, | |
| "grad_norm": 0.5148308880148996, | |
| "learning_rate": 6.606523344613362e-06, | |
| "loss": 0.0712, | |
| "mean_token_accuracy": 0.9737554267048836, | |
| "step": 1414 | |
| }, | |
| { | |
| "epoch": 8.27565982404692, | |
| "grad_norm": 0.40329074589234587, | |
| "learning_rate": 6.588777059406397e-06, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9812517613172531, | |
| "step": 1415 | |
| }, | |
| { | |
| "epoch": 8.281524926686217, | |
| "grad_norm": 0.36054433665413455, | |
| "learning_rate": 6.571086710714516e-06, | |
| "loss": 0.0536, | |
| "mean_token_accuracy": 0.9814741238951683, | |
| "step": 1416 | |
| }, | |
| { | |
| "epoch": 8.287390029325513, | |
| "grad_norm": 0.425843553206683, | |
| "learning_rate": 6.553452362746543e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9788385927677155, | |
| "step": 1417 | |
| }, | |
| { | |
| "epoch": 8.29325513196481, | |
| "grad_norm": 0.4350712251738216, | |
| "learning_rate": 6.5358740795080335e-06, | |
| "loss": 0.0737, | |
| "mean_token_accuracy": 0.9740309417247772, | |
| "step": 1418 | |
| }, | |
| { | |
| "epoch": 8.299120234604105, | |
| "grad_norm": 0.4461854232913144, | |
| "learning_rate": 6.518351924801061e-06, | |
| "loss": 0.0687, | |
| "mean_token_accuracy": 0.9785968512296677, | |
| "step": 1419 | |
| }, | |
| { | |
| "epoch": 8.304985337243401, | |
| "grad_norm": 0.32524924858998455, | |
| "learning_rate": 6.500885962223969e-06, | |
| "loss": 0.0572, | |
| "mean_token_accuracy": 0.9841800928115845, | |
| "step": 1420 | |
| }, | |
| { | |
| "epoch": 8.310850439882698, | |
| "grad_norm": 0.4251399950470201, | |
| "learning_rate": 6.483476255171146e-06, | |
| "loss": 0.0712, | |
| "mean_token_accuracy": 0.977549247443676, | |
| "step": 1421 | |
| }, | |
| { | |
| "epoch": 8.316715542521994, | |
| "grad_norm": 0.36996569232809523, | |
| "learning_rate": 6.4661228668328015e-06, | |
| "loss": 0.057, | |
| "mean_token_accuracy": 0.9811054393649101, | |
| "step": 1422 | |
| }, | |
| { | |
| "epoch": 8.32258064516129, | |
| "grad_norm": 0.45412035479532575, | |
| "learning_rate": 6.448825860194722e-06, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9784989804029465, | |
| "step": 1423 | |
| }, | |
| { | |
| "epoch": 8.328445747800586, | |
| "grad_norm": 0.3460756895624333, | |
| "learning_rate": 6.431585298038057e-06, | |
| "loss": 0.0468, | |
| "mean_token_accuracy": 0.9839349910616875, | |
| "step": 1424 | |
| }, | |
| { | |
| "epoch": 8.334310850439882, | |
| "grad_norm": 0.4192387101641489, | |
| "learning_rate": 6.414401242939087e-06, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9777123183012009, | |
| "step": 1425 | |
| }, | |
| { | |
| "epoch": 8.340175953079179, | |
| "grad_norm": 0.4254508423152308, | |
| "learning_rate": 6.397273757268987e-06, | |
| "loss": 0.0592, | |
| "mean_token_accuracy": 0.9829535335302353, | |
| "step": 1426 | |
| }, | |
| { | |
| "epoch": 8.346041055718475, | |
| "grad_norm": 0.44331662945401107, | |
| "learning_rate": 6.380202903193616e-06, | |
| "loss": 0.0733, | |
| "mean_token_accuracy": 0.9791046679019928, | |
| "step": 1427 | |
| }, | |
| { | |
| "epoch": 8.351906158357771, | |
| "grad_norm": 0.42973375916020207, | |
| "learning_rate": 6.363188742673281e-06, | |
| "loss": 0.0636, | |
| "mean_token_accuracy": 0.9783178418874741, | |
| "step": 1428 | |
| }, | |
| { | |
| "epoch": 8.357771260997067, | |
| "grad_norm": 0.44400134517648665, | |
| "learning_rate": 6.346231337462513e-06, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9802630767226219, | |
| "step": 1429 | |
| }, | |
| { | |
| "epoch": 8.363636363636363, | |
| "grad_norm": 0.5332774215139838, | |
| "learning_rate": 6.329330749109839e-06, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9744322821497917, | |
| "step": 1430 | |
| }, | |
| { | |
| "epoch": 8.36950146627566, | |
| "grad_norm": 0.44670726539330724, | |
| "learning_rate": 6.312487038957573e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.9792210832238197, | |
| "step": 1431 | |
| }, | |
| { | |
| "epoch": 8.375366568914956, | |
| "grad_norm": 0.3564342406919051, | |
| "learning_rate": 6.295700268141579e-06, | |
| "loss": 0.0534, | |
| "mean_token_accuracy": 0.9812228232622147, | |
| "step": 1432 | |
| }, | |
| { | |
| "epoch": 8.381231671554252, | |
| "grad_norm": 0.34676297877859363, | |
| "learning_rate": 6.2789704975910574e-06, | |
| "loss": 0.0534, | |
| "mean_token_accuracy": 0.9819273874163628, | |
| "step": 1433 | |
| }, | |
| { | |
| "epoch": 8.387096774193548, | |
| "grad_norm": 0.4412483888206217, | |
| "learning_rate": 6.262297788028316e-06, | |
| "loss": 0.0557, | |
| "mean_token_accuracy": 0.9792822226881981, | |
| "step": 1434 | |
| }, | |
| { | |
| "epoch": 8.392961876832844, | |
| "grad_norm": 0.4185238793799762, | |
| "learning_rate": 6.245682199968556e-06, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9784470945596695, | |
| "step": 1435 | |
| }, | |
| { | |
| "epoch": 8.39882697947214, | |
| "grad_norm": 0.4118391479432624, | |
| "learning_rate": 6.229123793719656e-06, | |
| "loss": 0.0615, | |
| "mean_token_accuracy": 0.978393092751503, | |
| "step": 1436 | |
| }, | |
| { | |
| "epoch": 8.404692082111437, | |
| "grad_norm": 0.3947979484149084, | |
| "learning_rate": 6.21262262938194e-06, | |
| "loss": 0.0582, | |
| "mean_token_accuracy": 0.9816270843148232, | |
| "step": 1437 | |
| }, | |
| { | |
| "epoch": 8.410557184750733, | |
| "grad_norm": 0.3995440140541391, | |
| "learning_rate": 6.196178766847969e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.9799474999308586, | |
| "step": 1438 | |
| }, | |
| { | |
| "epoch": 8.416422287390029, | |
| "grad_norm": 0.45065461106759747, | |
| "learning_rate": 6.1797922658023264e-06, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.9742324277758598, | |
| "step": 1439 | |
| }, | |
| { | |
| "epoch": 8.422287390029325, | |
| "grad_norm": 0.4231965191553397, | |
| "learning_rate": 6.16346318572139e-06, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9804536998271942, | |
| "step": 1440 | |
| }, | |
| { | |
| "epoch": 8.428152492668621, | |
| "grad_norm": 0.4374272740525697, | |
| "learning_rate": 6.147191585873128e-06, | |
| "loss": 0.07, | |
| "mean_token_accuracy": 0.9777810722589493, | |
| "step": 1441 | |
| }, | |
| { | |
| "epoch": 8.434017595307918, | |
| "grad_norm": 0.377152469018675, | |
| "learning_rate": 6.130977525316878e-06, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9798897877335548, | |
| "step": 1442 | |
| }, | |
| { | |
| "epoch": 8.439882697947214, | |
| "grad_norm": 0.3637818282003128, | |
| "learning_rate": 6.114821062903125e-06, | |
| "loss": 0.0617, | |
| "mean_token_accuracy": 0.9810713827610016, | |
| "step": 1443 | |
| }, | |
| { | |
| "epoch": 8.44574780058651, | |
| "grad_norm": 0.3566863914606614, | |
| "learning_rate": 6.098722257273303e-06, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9771608412265778, | |
| "step": 1444 | |
| }, | |
| { | |
| "epoch": 8.451612903225806, | |
| "grad_norm": 0.5411699563362643, | |
| "learning_rate": 6.082681166859579e-06, | |
| "loss": 0.0797, | |
| "mean_token_accuracy": 0.9781805574893951, | |
| "step": 1445 | |
| }, | |
| { | |
| "epoch": 8.457478005865102, | |
| "grad_norm": 0.3721832814334966, | |
| "learning_rate": 6.066697849884629e-06, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9788375273346901, | |
| "step": 1446 | |
| }, | |
| { | |
| "epoch": 8.463343108504398, | |
| "grad_norm": 0.3381336089514224, | |
| "learning_rate": 6.0507723643614415e-06, | |
| "loss": 0.0483, | |
| "mean_token_accuracy": 0.9832936450839043, | |
| "step": 1447 | |
| }, | |
| { | |
| "epoch": 8.469208211143695, | |
| "grad_norm": 0.4932957030708935, | |
| "learning_rate": 6.034904768093095e-06, | |
| "loss": 0.0642, | |
| "mean_token_accuracy": 0.9772367179393768, | |
| "step": 1448 | |
| }, | |
| { | |
| "epoch": 8.47507331378299, | |
| "grad_norm": 0.4159775829748118, | |
| "learning_rate": 6.019095118672557e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9766595289111137, | |
| "step": 1449 | |
| }, | |
| { | |
| "epoch": 8.480938416422287, | |
| "grad_norm": 0.458694278310154, | |
| "learning_rate": 6.003343473482469e-06, | |
| "loss": 0.0636, | |
| "mean_token_accuracy": 0.9791706949472427, | |
| "step": 1450 | |
| }, | |
| { | |
| "epoch": 8.486803519061583, | |
| "grad_norm": 0.45200635004622236, | |
| "learning_rate": 5.98764988969494e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9770751595497131, | |
| "step": 1451 | |
| }, | |
| { | |
| "epoch": 8.49266862170088, | |
| "grad_norm": 0.33939476846204236, | |
| "learning_rate": 5.972014424271344e-06, | |
| "loss": 0.0542, | |
| "mean_token_accuracy": 0.9825559854507446, | |
| "step": 1452 | |
| }, | |
| { | |
| "epoch": 8.498533724340176, | |
| "grad_norm": 0.40019157724081794, | |
| "learning_rate": 5.956437133962103e-06, | |
| "loss": 0.0633, | |
| "mean_token_accuracy": 0.9802030697464943, | |
| "step": 1453 | |
| }, | |
| { | |
| "epoch": 8.504398826979472, | |
| "grad_norm": 0.4903797639199131, | |
| "learning_rate": 5.94091807530649e-06, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9779242128133774, | |
| "step": 1454 | |
| }, | |
| { | |
| "epoch": 8.510263929618768, | |
| "grad_norm": 0.432188854067136, | |
| "learning_rate": 5.925457304632421e-06, | |
| "loss": 0.0672, | |
| "mean_token_accuracy": 0.9784858375787735, | |
| "step": 1455 | |
| }, | |
| { | |
| "epoch": 8.516129032258064, | |
| "grad_norm": 0.4354626927723909, | |
| "learning_rate": 5.91005487805625e-06, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9774255007505417, | |
| "step": 1456 | |
| }, | |
| { | |
| "epoch": 8.52199413489736, | |
| "grad_norm": 0.3889579760347756, | |
| "learning_rate": 5.894710851482563e-06, | |
| "loss": 0.0618, | |
| "mean_token_accuracy": 0.9819393903017044, | |
| "step": 1457 | |
| }, | |
| { | |
| "epoch": 8.527859237536656, | |
| "grad_norm": 0.4190443435761548, | |
| "learning_rate": 5.879425280603981e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9783936589956284, | |
| "step": 1458 | |
| }, | |
| { | |
| "epoch": 8.533724340175953, | |
| "grad_norm": 0.3716427805379074, | |
| "learning_rate": 5.864198220900952e-06, | |
| "loss": 0.0573, | |
| "mean_token_accuracy": 0.9804115891456604, | |
| "step": 1459 | |
| }, | |
| { | |
| "epoch": 8.539589442815249, | |
| "grad_norm": 0.4304695345860514, | |
| "learning_rate": 5.849029727641552e-06, | |
| "loss": 0.0633, | |
| "mean_token_accuracy": 0.9779717996716499, | |
| "step": 1460 | |
| }, | |
| { | |
| "epoch": 8.545454545454545, | |
| "grad_norm": 0.40629590116112413, | |
| "learning_rate": 5.833919855881286e-06, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9776366725564003, | |
| "step": 1461 | |
| }, | |
| { | |
| "epoch": 8.551319648093841, | |
| "grad_norm": 0.3863860594277557, | |
| "learning_rate": 5.818868660462886e-06, | |
| "loss": 0.0587, | |
| "mean_token_accuracy": 0.9794067367911339, | |
| "step": 1462 | |
| }, | |
| { | |
| "epoch": 8.557184750733137, | |
| "grad_norm": 0.3520313645458845, | |
| "learning_rate": 5.803876196016114e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.982727088034153, | |
| "step": 1463 | |
| }, | |
| { | |
| "epoch": 8.563049853372434, | |
| "grad_norm": 0.36907769611395586, | |
| "learning_rate": 5.788942516957561e-06, | |
| "loss": 0.0581, | |
| "mean_token_accuracy": 0.9817357435822487, | |
| "step": 1464 | |
| }, | |
| { | |
| "epoch": 8.56891495601173, | |
| "grad_norm": 0.5200410648312286, | |
| "learning_rate": 5.774067677490448e-06, | |
| "loss": 0.0703, | |
| "mean_token_accuracy": 0.9768775627017021, | |
| "step": 1465 | |
| }, | |
| { | |
| "epoch": 8.574780058651026, | |
| "grad_norm": 0.39888558956463543, | |
| "learning_rate": 5.759251731604435e-06, | |
| "loss": 0.0547, | |
| "mean_token_accuracy": 0.9802219718694687, | |
| "step": 1466 | |
| }, | |
| { | |
| "epoch": 8.580645161290322, | |
| "grad_norm": 0.44676621846863246, | |
| "learning_rate": 5.744494733075424e-06, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9792397990822792, | |
| "step": 1467 | |
| }, | |
| { | |
| "epoch": 8.586510263929618, | |
| "grad_norm": 0.3931258333397362, | |
| "learning_rate": 5.729796735465359e-06, | |
| "loss": 0.0637, | |
| "mean_token_accuracy": 0.9781016334891319, | |
| "step": 1468 | |
| }, | |
| { | |
| "epoch": 8.592375366568914, | |
| "grad_norm": 0.4755690885762794, | |
| "learning_rate": 5.7151577921220356e-06, | |
| "loss": 0.067, | |
| "mean_token_accuracy": 0.9788734018802643, | |
| "step": 1469 | |
| }, | |
| { | |
| "epoch": 8.59824046920821, | |
| "grad_norm": 0.32224410997626207, | |
| "learning_rate": 5.7005779561789046e-06, | |
| "loss": 0.0514, | |
| "mean_token_accuracy": 0.9822601303458214, | |
| "step": 1470 | |
| }, | |
| { | |
| "epoch": 8.604105571847507, | |
| "grad_norm": 0.38096415492092744, | |
| "learning_rate": 5.686057280554882e-06, | |
| "loss": 0.0595, | |
| "mean_token_accuracy": 0.9811006933450699, | |
| "step": 1471 | |
| }, | |
| { | |
| "epoch": 8.609970674486803, | |
| "grad_norm": 0.3798971384798509, | |
| "learning_rate": 5.671595817954157e-06, | |
| "loss": 0.062, | |
| "mean_token_accuracy": 0.9809722378849983, | |
| "step": 1472 | |
| }, | |
| { | |
| "epoch": 8.6158357771261, | |
| "grad_norm": 0.39712814437410365, | |
| "learning_rate": 5.657193620865997e-06, | |
| "loss": 0.0583, | |
| "mean_token_accuracy": 0.9817419350147247, | |
| "step": 1473 | |
| }, | |
| { | |
| "epoch": 8.621700879765395, | |
| "grad_norm": 0.426717484698465, | |
| "learning_rate": 5.642850741564562e-06, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9797801226377487, | |
| "step": 1474 | |
| }, | |
| { | |
| "epoch": 8.627565982404692, | |
| "grad_norm": 0.39180647548645714, | |
| "learning_rate": 5.62856723210871e-06, | |
| "loss": 0.0647, | |
| "mean_token_accuracy": 0.9771608114242554, | |
| "step": 1475 | |
| }, | |
| { | |
| "epoch": 8.633431085043988, | |
| "grad_norm": 0.4430059558951615, | |
| "learning_rate": 5.614343144341814e-06, | |
| "loss": 0.0674, | |
| "mean_token_accuracy": 0.976546123623848, | |
| "step": 1476 | |
| }, | |
| { | |
| "epoch": 8.639296187683284, | |
| "grad_norm": 0.3569576065869277, | |
| "learning_rate": 5.600178529891564e-06, | |
| "loss": 0.0555, | |
| "mean_token_accuracy": 0.9811270162463188, | |
| "step": 1477 | |
| }, | |
| { | |
| "epoch": 8.64516129032258, | |
| "grad_norm": 0.45417837250222987, | |
| "learning_rate": 5.58607344016979e-06, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9747651368379593, | |
| "step": 1478 | |
| }, | |
| { | |
| "epoch": 8.651026392961876, | |
| "grad_norm": 0.3698876433443863, | |
| "learning_rate": 5.5720279263722795e-06, | |
| "loss": 0.0592, | |
| "mean_token_accuracy": 0.9789956957101822, | |
| "step": 1479 | |
| }, | |
| { | |
| "epoch": 8.656891495601172, | |
| "grad_norm": 0.35791268214354616, | |
| "learning_rate": 5.558042039478564e-06, | |
| "loss": 0.0581, | |
| "mean_token_accuracy": 0.9798028543591499, | |
| "step": 1480 | |
| }, | |
| { | |
| "epoch": 8.662756598240469, | |
| "grad_norm": 0.45869214210455983, | |
| "learning_rate": 5.544115830251769e-06, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9762693867087364, | |
| "step": 1481 | |
| }, | |
| { | |
| "epoch": 8.668621700879765, | |
| "grad_norm": 0.39328389095358174, | |
| "learning_rate": 5.530249349238407e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9799056574702263, | |
| "step": 1482 | |
| }, | |
| { | |
| "epoch": 8.674486803519061, | |
| "grad_norm": 0.46693918116536703, | |
| "learning_rate": 5.516442646768207e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9731965810060501, | |
| "step": 1483 | |
| }, | |
| { | |
| "epoch": 8.680351906158357, | |
| "grad_norm": 0.40893126252597817, | |
| "learning_rate": 5.502695772953922e-06, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9753478020429611, | |
| "step": 1484 | |
| }, | |
| { | |
| "epoch": 8.686217008797653, | |
| "grad_norm": 0.4089615786595679, | |
| "learning_rate": 5.489008777691151e-06, | |
| "loss": 0.0629, | |
| "mean_token_accuracy": 0.981198251247406, | |
| "step": 1485 | |
| }, | |
| { | |
| "epoch": 8.69208211143695, | |
| "grad_norm": 0.44617842869601976, | |
| "learning_rate": 5.475381710658161e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9780819341540337, | |
| "step": 1486 | |
| }, | |
| { | |
| "epoch": 8.697947214076246, | |
| "grad_norm": 0.4292444383541708, | |
| "learning_rate": 5.4618146213157e-06, | |
| "loss": 0.0727, | |
| "mean_token_accuracy": 0.974052220582962, | |
| "step": 1487 | |
| }, | |
| { | |
| "epoch": 8.703812316715542, | |
| "grad_norm": 0.40014883458637857, | |
| "learning_rate": 5.448307558906822e-06, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9791484698653221, | |
| "step": 1488 | |
| }, | |
| { | |
| "epoch": 8.709677419354838, | |
| "grad_norm": 0.4054936264596516, | |
| "learning_rate": 5.434860572456711e-06, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9779806360602379, | |
| "step": 1489 | |
| }, | |
| { | |
| "epoch": 8.715542521994134, | |
| "grad_norm": 0.3718640448186527, | |
| "learning_rate": 5.421473710772496e-06, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9803246408700943, | |
| "step": 1490 | |
| }, | |
| { | |
| "epoch": 8.72140762463343, | |
| "grad_norm": 0.39050493584924245, | |
| "learning_rate": 5.408147022443077e-06, | |
| "loss": 0.0579, | |
| "mean_token_accuracy": 0.9788827151060104, | |
| "step": 1491 | |
| }, | |
| { | |
| "epoch": 8.727272727272727, | |
| "grad_norm": 0.41088568338867887, | |
| "learning_rate": 5.39488055583895e-06, | |
| "loss": 0.0681, | |
| "mean_token_accuracy": 0.9798588156700134, | |
| "step": 1492 | |
| }, | |
| { | |
| "epoch": 8.733137829912023, | |
| "grad_norm": 0.40538498630057557, | |
| "learning_rate": 5.3816743591120365e-06, | |
| "loss": 0.0624, | |
| "mean_token_accuracy": 0.9792995974421501, | |
| "step": 1493 | |
| }, | |
| { | |
| "epoch": 8.739002932551319, | |
| "grad_norm": 0.39709976977685296, | |
| "learning_rate": 5.368528480195492e-06, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9794448539614677, | |
| "step": 1494 | |
| }, | |
| { | |
| "epoch": 8.744868035190615, | |
| "grad_norm": 0.29534621119213206, | |
| "learning_rate": 5.355442966803544e-06, | |
| "loss": 0.0499, | |
| "mean_token_accuracy": 0.9821311086416245, | |
| "step": 1495 | |
| }, | |
| { | |
| "epoch": 8.750733137829911, | |
| "grad_norm": 0.44576282146133084, | |
| "learning_rate": 5.342417866431326e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9736595973372459, | |
| "step": 1496 | |
| }, | |
| { | |
| "epoch": 8.756598240469208, | |
| "grad_norm": 0.4858368189644326, | |
| "learning_rate": 5.329453226354692e-06, | |
| "loss": 0.0683, | |
| "mean_token_accuracy": 0.9792052656412125, | |
| "step": 1497 | |
| }, | |
| { | |
| "epoch": 8.762463343108504, | |
| "grad_norm": 0.39967097238505644, | |
| "learning_rate": 5.31654909363005e-06, | |
| "loss": 0.0636, | |
| "mean_token_accuracy": 0.9812125638127327, | |
| "step": 1498 | |
| }, | |
| { | |
| "epoch": 8.7683284457478, | |
| "grad_norm": 0.48415089273894846, | |
| "learning_rate": 5.303705515094187e-06, | |
| "loss": 0.0797, | |
| "mean_token_accuracy": 0.9757314994931221, | |
| "step": 1499 | |
| }, | |
| { | |
| "epoch": 8.774193548387096, | |
| "grad_norm": 0.48987771606948255, | |
| "learning_rate": 5.290922537364109e-06, | |
| "loss": 0.0778, | |
| "mean_token_accuracy": 0.9723097234964371, | |
| "step": 1500 | |
| }, | |
| { | |
| "epoch": 8.780058651026392, | |
| "grad_norm": 0.3870672590317324, | |
| "learning_rate": 5.278200206836861e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.976532444357872, | |
| "step": 1501 | |
| }, | |
| { | |
| "epoch": 8.785923753665688, | |
| "grad_norm": 0.3792488697620107, | |
| "learning_rate": 5.265538569689365e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9791921749711037, | |
| "step": 1502 | |
| }, | |
| { | |
| "epoch": 8.791788856304985, | |
| "grad_norm": 0.3755126980286468, | |
| "learning_rate": 5.25293767187825e-06, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9820508360862732, | |
| "step": 1503 | |
| }, | |
| { | |
| "epoch": 8.79765395894428, | |
| "grad_norm": 0.45791046477915304, | |
| "learning_rate": 5.240397559139685e-06, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.976809673011303, | |
| "step": 1504 | |
| }, | |
| { | |
| "epoch": 8.803519061583577, | |
| "grad_norm": 0.339257377791184, | |
| "learning_rate": 5.227918276989215e-06, | |
| "loss": 0.0596, | |
| "mean_token_accuracy": 0.9784117043018341, | |
| "step": 1505 | |
| }, | |
| { | |
| "epoch": 8.809384164222873, | |
| "grad_norm": 0.3563307355157344, | |
| "learning_rate": 5.2154998707215976e-06, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9783895686268806, | |
| "step": 1506 | |
| }, | |
| { | |
| "epoch": 8.81524926686217, | |
| "grad_norm": 0.391641788208127, | |
| "learning_rate": 5.203142385410628e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.9819121286273003, | |
| "step": 1507 | |
| }, | |
| { | |
| "epoch": 8.821114369501466, | |
| "grad_norm": 0.3732982966201132, | |
| "learning_rate": 5.190845865908987e-06, | |
| "loss": 0.0582, | |
| "mean_token_accuracy": 0.9774733334779739, | |
| "step": 1508 | |
| }, | |
| { | |
| "epoch": 8.826979472140762, | |
| "grad_norm": 0.4446554745627886, | |
| "learning_rate": 5.178610356848075e-06, | |
| "loss": 0.065, | |
| "mean_token_accuracy": 0.9783753901720047, | |
| "step": 1509 | |
| }, | |
| { | |
| "epoch": 8.832844574780058, | |
| "grad_norm": 0.40749848731190025, | |
| "learning_rate": 5.166435902637848e-06, | |
| "loss": 0.0577, | |
| "mean_token_accuracy": 0.9798924401402473, | |
| "step": 1510 | |
| }, | |
| { | |
| "epoch": 8.838709677419354, | |
| "grad_norm": 0.3711771079046037, | |
| "learning_rate": 5.154322547466658e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9817164465785027, | |
| "step": 1511 | |
| }, | |
| { | |
| "epoch": 8.84457478005865, | |
| "grad_norm": 0.4084163982169611, | |
| "learning_rate": 5.142270335301095e-06, | |
| "loss": 0.058, | |
| "mean_token_accuracy": 0.9804741069674492, | |
| "step": 1512 | |
| }, | |
| { | |
| "epoch": 8.850439882697946, | |
| "grad_norm": 0.36704184008992147, | |
| "learning_rate": 5.130279309885817e-06, | |
| "loss": 0.0596, | |
| "mean_token_accuracy": 0.9785638004541397, | |
| "step": 1513 | |
| }, | |
| { | |
| "epoch": 8.856304985337243, | |
| "grad_norm": 0.48744936724567456, | |
| "learning_rate": 5.118349514743404e-06, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9747727289795876, | |
| "step": 1514 | |
| }, | |
| { | |
| "epoch": 8.862170087976539, | |
| "grad_norm": 0.5326424417819942, | |
| "learning_rate": 5.1064809931741975e-06, | |
| "loss": 0.0786, | |
| "mean_token_accuracy": 0.974541112780571, | |
| "step": 1515 | |
| }, | |
| { | |
| "epoch": 8.868035190615835, | |
| "grad_norm": 0.43515177370388725, | |
| "learning_rate": 5.094673788256137e-06, | |
| "loss": 0.065, | |
| "mean_token_accuracy": 0.9825545847415924, | |
| "step": 1516 | |
| }, | |
| { | |
| "epoch": 8.873900293255131, | |
| "grad_norm": 0.49677928721350817, | |
| "learning_rate": 5.082927942844603e-06, | |
| "loss": 0.0721, | |
| "mean_token_accuracy": 0.9781579226255417, | |
| "step": 1517 | |
| }, | |
| { | |
| "epoch": 8.879765395894427, | |
| "grad_norm": 0.4087812579703352, | |
| "learning_rate": 5.0712434995722734e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.97551279515028, | |
| "step": 1518 | |
| }, | |
| { | |
| "epoch": 8.885630498533724, | |
| "grad_norm": 0.4484581230306622, | |
| "learning_rate": 5.059620500848964e-06, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9802887737751007, | |
| "step": 1519 | |
| }, | |
| { | |
| "epoch": 8.89149560117302, | |
| "grad_norm": 0.4349996172658484, | |
| "learning_rate": 5.048058988861455e-06, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9786894470453262, | |
| "step": 1520 | |
| }, | |
| { | |
| "epoch": 8.897360703812316, | |
| "grad_norm": 0.3788164408254738, | |
| "learning_rate": 5.0365590055733715e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.980890542268753, | |
| "step": 1521 | |
| }, | |
| { | |
| "epoch": 8.903225806451612, | |
| "grad_norm": 0.47821896076846393, | |
| "learning_rate": 5.025120592725009e-06, | |
| "loss": 0.0716, | |
| "mean_token_accuracy": 0.9776649847626686, | |
| "step": 1522 | |
| }, | |
| { | |
| "epoch": 8.909090909090908, | |
| "grad_norm": 0.4392969635422559, | |
| "learning_rate": 5.013743791833187e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.9793538227677345, | |
| "step": 1523 | |
| }, | |
| { | |
| "epoch": 8.914956011730204, | |
| "grad_norm": 0.43229020194353457, | |
| "learning_rate": 5.002428644191094e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9785308763384819, | |
| "step": 1524 | |
| }, | |
| { | |
| "epoch": 8.9208211143695, | |
| "grad_norm": 0.5650155135114429, | |
| "learning_rate": 4.991175190868148e-06, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.9788858294487, | |
| "step": 1525 | |
| }, | |
| { | |
| "epoch": 8.926686217008797, | |
| "grad_norm": 0.3791246428901692, | |
| "learning_rate": 4.9799834727098415e-06, | |
| "loss": 0.0567, | |
| "mean_token_accuracy": 0.9807322397828102, | |
| "step": 1526 | |
| }, | |
| { | |
| "epoch": 8.932551319648093, | |
| "grad_norm": 0.4199681370793567, | |
| "learning_rate": 4.968853530337587e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.979116789996624, | |
| "step": 1527 | |
| }, | |
| { | |
| "epoch": 8.93841642228739, | |
| "grad_norm": 0.33066038490437366, | |
| "learning_rate": 4.957785404148585e-06, | |
| "loss": 0.0546, | |
| "mean_token_accuracy": 0.9786024838685989, | |
| "step": 1528 | |
| }, | |
| { | |
| "epoch": 8.944281524926687, | |
| "grad_norm": 0.4365370500260649, | |
| "learning_rate": 4.946779134315662e-06, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9773358777165413, | |
| "step": 1529 | |
| }, | |
| { | |
| "epoch": 8.950146627565982, | |
| "grad_norm": 0.3908055069300888, | |
| "learning_rate": 4.935834760787133e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9808821976184845, | |
| "step": 1530 | |
| }, | |
| { | |
| "epoch": 8.95601173020528, | |
| "grad_norm": 0.42105218806769806, | |
| "learning_rate": 4.924952323286651e-06, | |
| "loss": 0.0642, | |
| "mean_token_accuracy": 0.9778595939278603, | |
| "step": 1531 | |
| }, | |
| { | |
| "epoch": 8.961876832844574, | |
| "grad_norm": 0.42944765888744796, | |
| "learning_rate": 4.91413186131307e-06, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9791755601763725, | |
| "step": 1532 | |
| }, | |
| { | |
| "epoch": 8.967741935483872, | |
| "grad_norm": 0.38789725100449346, | |
| "learning_rate": 4.9033734141402964e-06, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9773519560694695, | |
| "step": 1533 | |
| }, | |
| { | |
| "epoch": 8.973607038123166, | |
| "grad_norm": 0.39321990127885054, | |
| "learning_rate": 4.892677020817151e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9775725901126862, | |
| "step": 1534 | |
| }, | |
| { | |
| "epoch": 8.979472140762464, | |
| "grad_norm": 0.42869069521785264, | |
| "learning_rate": 4.8820427201672195e-06, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9767110124230385, | |
| "step": 1535 | |
| }, | |
| { | |
| "epoch": 8.985337243401759, | |
| "grad_norm": 0.4786492298609853, | |
| "learning_rate": 4.871470550788717e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9734204337000847, | |
| "step": 1536 | |
| }, | |
| { | |
| "epoch": 8.991202346041057, | |
| "grad_norm": 0.4053785268805127, | |
| "learning_rate": 4.860960551054352e-06, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9776445254683495, | |
| "step": 1537 | |
| }, | |
| { | |
| "epoch": 8.997067448680351, | |
| "grad_norm": 0.35094233757463134, | |
| "learning_rate": 4.850512759111177e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9781850948929787, | |
| "step": 1538 | |
| }, | |
| { | |
| "epoch": 9.0, | |
| "grad_norm": 0.35094233757463134, | |
| "learning_rate": 4.840127212880457e-06, | |
| "loss": 0.0547, | |
| "mean_token_accuracy": 0.9817498177289963, | |
| "step": 1539 | |
| }, | |
| { | |
| "epoch": 9.005865102639296, | |
| "grad_norm": 0.5237297972324702, | |
| "learning_rate": 4.82980395005753e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9797078743577003, | |
| "step": 1540 | |
| }, | |
| { | |
| "epoch": 9.011730205278592, | |
| "grad_norm": 0.44512166679517556, | |
| "learning_rate": 4.8195430081116715e-06, | |
| "loss": 0.0658, | |
| "mean_token_accuracy": 0.9778907150030136, | |
| "step": 1541 | |
| }, | |
| { | |
| "epoch": 9.017595307917889, | |
| "grad_norm": 0.3971540226360805, | |
| "learning_rate": 4.809344424285959e-06, | |
| "loss": 0.0525, | |
| "mean_token_accuracy": 0.9832166880369186, | |
| "step": 1542 | |
| }, | |
| { | |
| "epoch": 9.023460410557185, | |
| "grad_norm": 0.4344511380877736, | |
| "learning_rate": 4.799208235597129e-06, | |
| "loss": 0.0669, | |
| "mean_token_accuracy": 0.975956417620182, | |
| "step": 1543 | |
| }, | |
| { | |
| "epoch": 9.029325513196481, | |
| "grad_norm": 0.4443832424460116, | |
| "learning_rate": 4.7891344788354535e-06, | |
| "loss": 0.0629, | |
| "mean_token_accuracy": 0.9792809188365936, | |
| "step": 1544 | |
| }, | |
| { | |
| "epoch": 9.035190615835777, | |
| "grad_norm": 0.4256572665128867, | |
| "learning_rate": 4.779123190564601e-06, | |
| "loss": 0.0716, | |
| "mean_token_accuracy": 0.9784202426671982, | |
| "step": 1545 | |
| }, | |
| { | |
| "epoch": 9.041055718475073, | |
| "grad_norm": 0.4221529309761034, | |
| "learning_rate": 4.769174407121508e-06, | |
| "loss": 0.0583, | |
| "mean_token_accuracy": 0.9809971302747726, | |
| "step": 1546 | |
| }, | |
| { | |
| "epoch": 9.04692082111437, | |
| "grad_norm": 0.3734102169463082, | |
| "learning_rate": 4.7592881646162336e-06, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9773375019431114, | |
| "step": 1547 | |
| }, | |
| { | |
| "epoch": 9.052785923753666, | |
| "grad_norm": 0.447186570979957, | |
| "learning_rate": 4.749464498931852e-06, | |
| "loss": 0.0511, | |
| "mean_token_accuracy": 0.9823371693491936, | |
| "step": 1548 | |
| }, | |
| { | |
| "epoch": 9.058651026392962, | |
| "grad_norm": 0.3671698486649876, | |
| "learning_rate": 4.739703445724296e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.982838973402977, | |
| "step": 1549 | |
| }, | |
| { | |
| "epoch": 9.064516129032258, | |
| "grad_norm": 0.37297501174682546, | |
| "learning_rate": 4.730005040422253e-06, | |
| "loss": 0.0553, | |
| "mean_token_accuracy": 0.9813901260495186, | |
| "step": 1550 | |
| }, | |
| { | |
| "epoch": 9.070381231671554, | |
| "grad_norm": 0.38009096211845106, | |
| "learning_rate": 4.720369318227014e-06, | |
| "loss": 0.0554, | |
| "mean_token_accuracy": 0.9810118600726128, | |
| "step": 1551 | |
| }, | |
| { | |
| "epoch": 9.07624633431085, | |
| "grad_norm": 0.3900939913077741, | |
| "learning_rate": 4.710796314112358e-06, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.980675220489502, | |
| "step": 1552 | |
| }, | |
| { | |
| "epoch": 9.082111436950147, | |
| "grad_norm": 0.4026706110167321, | |
| "learning_rate": 4.701286062824425e-06, | |
| "loss": 0.0577, | |
| "mean_token_accuracy": 0.9805927649140358, | |
| "step": 1553 | |
| }, | |
| { | |
| "epoch": 9.087976539589443, | |
| "grad_norm": 0.45380931104767136, | |
| "learning_rate": 4.691838598881587e-06, | |
| "loss": 0.0633, | |
| "mean_token_accuracy": 0.9795587509870529, | |
| "step": 1554 | |
| }, | |
| { | |
| "epoch": 9.093841642228739, | |
| "grad_norm": 0.3600842002137639, | |
| "learning_rate": 4.68245395657432e-06, | |
| "loss": 0.0595, | |
| "mean_token_accuracy": 0.9825239703059196, | |
| "step": 1555 | |
| }, | |
| { | |
| "epoch": 9.099706744868035, | |
| "grad_norm": 0.36982100027726045, | |
| "learning_rate": 4.673132169965089e-06, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.981265015900135, | |
| "step": 1556 | |
| }, | |
| { | |
| "epoch": 9.105571847507331, | |
| "grad_norm": 0.3408794742470351, | |
| "learning_rate": 4.663873272888212e-06, | |
| "loss": 0.0533, | |
| "mean_token_accuracy": 0.9842574149370193, | |
| "step": 1557 | |
| }, | |
| { | |
| "epoch": 9.111436950146627, | |
| "grad_norm": 0.35088642747738386, | |
| "learning_rate": 4.654677298949746e-06, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.9791886806488037, | |
| "step": 1558 | |
| }, | |
| { | |
| "epoch": 9.117302052785924, | |
| "grad_norm": 0.36467197514575506, | |
| "learning_rate": 4.645544281527362e-06, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9801543280482292, | |
| "step": 1559 | |
| }, | |
| { | |
| "epoch": 9.12316715542522, | |
| "grad_norm": 0.34756702727558925, | |
| "learning_rate": 4.636474253770226e-06, | |
| "loss": 0.0509, | |
| "mean_token_accuracy": 0.9805246517062187, | |
| "step": 1560 | |
| }, | |
| { | |
| "epoch": 9.129032258064516, | |
| "grad_norm": 0.37383863828715663, | |
| "learning_rate": 4.627467248598876e-06, | |
| "loss": 0.0601, | |
| "mean_token_accuracy": 0.9800150692462921, | |
| "step": 1561 | |
| }, | |
| { | |
| "epoch": 9.134897360703812, | |
| "grad_norm": 0.389051327036322, | |
| "learning_rate": 4.618523298705101e-06, | |
| "loss": 0.0586, | |
| "mean_token_accuracy": 0.9797752350568771, | |
| "step": 1562 | |
| }, | |
| { | |
| "epoch": 9.140762463343108, | |
| "grad_norm": 0.4206126663755432, | |
| "learning_rate": 4.609642436551828e-06, | |
| "loss": 0.057, | |
| "mean_token_accuracy": 0.9813887402415276, | |
| "step": 1563 | |
| }, | |
| { | |
| "epoch": 9.146627565982405, | |
| "grad_norm": 0.35220076392965055, | |
| "learning_rate": 4.600824694373e-06, | |
| "loss": 0.0543, | |
| "mean_token_accuracy": 0.9830791279673576, | |
| "step": 1564 | |
| }, | |
| { | |
| "epoch": 9.1524926686217, | |
| "grad_norm": 0.3906579664451844, | |
| "learning_rate": 4.592070104173461e-06, | |
| "loss": 0.057, | |
| "mean_token_accuracy": 0.9816785305738449, | |
| "step": 1565 | |
| }, | |
| { | |
| "epoch": 9.158357771260997, | |
| "grad_norm": 0.3955333597592588, | |
| "learning_rate": 4.583378697728835e-06, | |
| "loss": 0.0604, | |
| "mean_token_accuracy": 0.9795950427651405, | |
| "step": 1566 | |
| }, | |
| { | |
| "epoch": 9.164222873900293, | |
| "grad_norm": 0.38507961041809957, | |
| "learning_rate": 4.574750506585419e-06, | |
| "loss": 0.0542, | |
| "mean_token_accuracy": 0.9788780435919762, | |
| "step": 1567 | |
| }, | |
| { | |
| "epoch": 9.17008797653959, | |
| "grad_norm": 0.4103182153300624, | |
| "learning_rate": 4.566185562060062e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9781305864453316, | |
| "step": 1568 | |
| }, | |
| { | |
| "epoch": 9.175953079178885, | |
| "grad_norm": 0.3887724325841942, | |
| "learning_rate": 4.557683895240052e-06, | |
| "loss": 0.0638, | |
| "mean_token_accuracy": 0.9806842058897018, | |
| "step": 1569 | |
| }, | |
| { | |
| "epoch": 9.181818181818182, | |
| "grad_norm": 0.4716208696133187, | |
| "learning_rate": 4.549245536983009e-06, | |
| "loss": 0.0596, | |
| "mean_token_accuracy": 0.9804612249135971, | |
| "step": 1570 | |
| }, | |
| { | |
| "epoch": 9.187683284457478, | |
| "grad_norm": 0.5070036957239129, | |
| "learning_rate": 4.540870517916765e-06, | |
| "loss": 0.0613, | |
| "mean_token_accuracy": 0.9818564429879189, | |
| "step": 1571 | |
| }, | |
| { | |
| "epoch": 9.193548387096774, | |
| "grad_norm": 0.4470372868463617, | |
| "learning_rate": 4.532558868439249e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9813555926084518, | |
| "step": 1572 | |
| }, | |
| { | |
| "epoch": 9.19941348973607, | |
| "grad_norm": 0.38207644337118024, | |
| "learning_rate": 4.524310618718403e-06, | |
| "loss": 0.058, | |
| "mean_token_accuracy": 0.9805998206138611, | |
| "step": 1573 | |
| }, | |
| { | |
| "epoch": 9.205278592375366, | |
| "grad_norm": 0.39814646651964014, | |
| "learning_rate": 4.516125798692037e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9801303371787071, | |
| "step": 1574 | |
| }, | |
| { | |
| "epoch": 9.211143695014663, | |
| "grad_norm": 0.43225612025999705, | |
| "learning_rate": 4.508004438067742e-06, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.9784739464521408, | |
| "step": 1575 | |
| }, | |
| { | |
| "epoch": 9.217008797653959, | |
| "grad_norm": 0.39246909818546494, | |
| "learning_rate": 4.4999465663227785e-06, | |
| "loss": 0.0548, | |
| "mean_token_accuracy": 0.9829608574509621, | |
| "step": 1576 | |
| }, | |
| { | |
| "epoch": 9.222873900293255, | |
| "grad_norm": 0.3600372585620087, | |
| "learning_rate": 4.491952212703964e-06, | |
| "loss": 0.059, | |
| "mean_token_accuracy": 0.9804084450006485, | |
| "step": 1577 | |
| }, | |
| { | |
| "epoch": 9.228739002932551, | |
| "grad_norm": 0.39264492417648206, | |
| "learning_rate": 4.484021406227576e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9805421456694603, | |
| "step": 1578 | |
| }, | |
| { | |
| "epoch": 9.234604105571847, | |
| "grad_norm": 0.42510250192373583, | |
| "learning_rate": 4.476154175679239e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9789351969957352, | |
| "step": 1579 | |
| }, | |
| { | |
| "epoch": 9.240469208211143, | |
| "grad_norm": 0.3938384763118357, | |
| "learning_rate": 4.468350549613822e-06, | |
| "loss": 0.0508, | |
| "mean_token_accuracy": 0.9829250946640968, | |
| "step": 1580 | |
| }, | |
| { | |
| "epoch": 9.24633431085044, | |
| "grad_norm": 0.4246125900175589, | |
| "learning_rate": 4.460610556355333e-06, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9760880768299103, | |
| "step": 1581 | |
| }, | |
| { | |
| "epoch": 9.252199413489736, | |
| "grad_norm": 0.36802833825061815, | |
| "learning_rate": 4.452934223996824e-06, | |
| "loss": 0.0546, | |
| "mean_token_accuracy": 0.9816494733095169, | |
| "step": 1582 | |
| }, | |
| { | |
| "epoch": 9.258064516129032, | |
| "grad_norm": 0.37411173603250736, | |
| "learning_rate": 4.445321580400281e-06, | |
| "loss": 0.0573, | |
| "mean_token_accuracy": 0.978959359228611, | |
| "step": 1583 | |
| }, | |
| { | |
| "epoch": 9.263929618768328, | |
| "grad_norm": 0.3890694859810841, | |
| "learning_rate": 4.437772653196527e-06, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.9814217388629913, | |
| "step": 1584 | |
| }, | |
| { | |
| "epoch": 9.269794721407624, | |
| "grad_norm": 0.44573942459915583, | |
| "learning_rate": 4.430287469785118e-06, | |
| "loss": 0.0739, | |
| "mean_token_accuracy": 0.9738015979528427, | |
| "step": 1585 | |
| }, | |
| { | |
| "epoch": 9.27565982404692, | |
| "grad_norm": 0.48678328598976534, | |
| "learning_rate": 4.422866057334246e-06, | |
| "loss": 0.0639, | |
| "mean_token_accuracy": 0.9811902418732643, | |
| "step": 1586 | |
| }, | |
| { | |
| "epoch": 9.281524926686217, | |
| "grad_norm": 0.41976267189188704, | |
| "learning_rate": 4.415508442780642e-06, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9764698594808578, | |
| "step": 1587 | |
| }, | |
| { | |
| "epoch": 9.287390029325513, | |
| "grad_norm": 0.4342338051618334, | |
| "learning_rate": 4.408214652829473e-06, | |
| "loss": 0.0639, | |
| "mean_token_accuracy": 0.9801175519824028, | |
| "step": 1588 | |
| }, | |
| { | |
| "epoch": 9.29325513196481, | |
| "grad_norm": 0.3584174292660337, | |
| "learning_rate": 4.400984713954253e-06, | |
| "loss": 0.0487, | |
| "mean_token_accuracy": 0.9840468168258667, | |
| "step": 1589 | |
| }, | |
| { | |
| "epoch": 9.299120234604105, | |
| "grad_norm": 0.36705643787783054, | |
| "learning_rate": 4.39381865239674e-06, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9782479926943779, | |
| "step": 1590 | |
| }, | |
| { | |
| "epoch": 9.304985337243401, | |
| "grad_norm": 0.43484590696065256, | |
| "learning_rate": 4.386716494166842e-06, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9777791127562523, | |
| "step": 1591 | |
| }, | |
| { | |
| "epoch": 9.310850439882698, | |
| "grad_norm": 0.45480469469795737, | |
| "learning_rate": 4.379678265042529e-06, | |
| "loss": 0.0623, | |
| "mean_token_accuracy": 0.9770721420645714, | |
| "step": 1592 | |
| }, | |
| { | |
| "epoch": 9.316715542521994, | |
| "grad_norm": 0.43015696322178737, | |
| "learning_rate": 4.372703990569725e-06, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.980101153254509, | |
| "step": 1593 | |
| }, | |
| { | |
| "epoch": 9.32258064516129, | |
| "grad_norm": 0.42007509787839914, | |
| "learning_rate": 4.365793696062231e-06, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.9778276085853577, | |
| "step": 1594 | |
| }, | |
| { | |
| "epoch": 9.328445747800586, | |
| "grad_norm": 0.4149482516608359, | |
| "learning_rate": 4.358947406601626e-06, | |
| "loss": 0.056, | |
| "mean_token_accuracy": 0.9816755205392838, | |
| "step": 1595 | |
| }, | |
| { | |
| "epoch": 9.334310850439882, | |
| "grad_norm": 0.34040334345484463, | |
| "learning_rate": 4.352165147037177e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9789953827857971, | |
| "step": 1596 | |
| }, | |
| { | |
| "epoch": 9.340175953079179, | |
| "grad_norm": 0.4023555043611021, | |
| "learning_rate": 4.345446941985741e-06, | |
| "loss": 0.0559, | |
| "mean_token_accuracy": 0.980284109711647, | |
| "step": 1597 | |
| }, | |
| { | |
| "epoch": 9.346041055718475, | |
| "grad_norm": 0.36962446757993983, | |
| "learning_rate": 4.338792815831698e-06, | |
| "loss": 0.0562, | |
| "mean_token_accuracy": 0.9776437729597092, | |
| "step": 1598 | |
| }, | |
| { | |
| "epoch": 9.351906158357771, | |
| "grad_norm": 0.4596662981829499, | |
| "learning_rate": 4.332202792726832e-06, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.9775255918502808, | |
| "step": 1599 | |
| }, | |
| { | |
| "epoch": 9.357771260997067, | |
| "grad_norm": 0.42975827996705523, | |
| "learning_rate": 4.3256768965902684e-06, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.9770565405488014, | |
| "step": 1600 | |
| }, | |
| { | |
| "epoch": 9.363636363636363, | |
| "grad_norm": 0.4470080718132124, | |
| "learning_rate": 4.319215151108373e-06, | |
| "loss": 0.076, | |
| "mean_token_accuracy": 0.9741783291101456, | |
| "step": 1601 | |
| }, | |
| { | |
| "epoch": 9.36950146627566, | |
| "grad_norm": 0.41580223631412677, | |
| "learning_rate": 4.312817579734673e-06, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9820943027734756, | |
| "step": 1602 | |
| }, | |
| { | |
| "epoch": 9.375366568914956, | |
| "grad_norm": 0.43117316682024365, | |
| "learning_rate": 4.306484205689768e-06, | |
| "loss": 0.0675, | |
| "mean_token_accuracy": 0.9774612933397293, | |
| "step": 1603 | |
| }, | |
| { | |
| "epoch": 9.381231671554252, | |
| "grad_norm": 0.42448166413412824, | |
| "learning_rate": 4.300215051961248e-06, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.9788782298564911, | |
| "step": 1604 | |
| }, | |
| { | |
| "epoch": 9.387096774193548, | |
| "grad_norm": 0.41207232977220787, | |
| "learning_rate": 4.2940101413036115e-06, | |
| "loss": 0.0571, | |
| "mean_token_accuracy": 0.9822210595011711, | |
| "step": 1605 | |
| }, | |
| { | |
| "epoch": 9.392961876832844, | |
| "grad_norm": 0.3969085611246392, | |
| "learning_rate": 4.287869496238174e-06, | |
| "loss": 0.0687, | |
| "mean_token_accuracy": 0.9773961007595062, | |
| "step": 1606 | |
| }, | |
| { | |
| "epoch": 9.39882697947214, | |
| "grad_norm": 0.38674351125589457, | |
| "learning_rate": 4.281793139053001e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.978922612965107, | |
| "step": 1607 | |
| }, | |
| { | |
| "epoch": 9.404692082111437, | |
| "grad_norm": 0.44522870176548146, | |
| "learning_rate": 4.275781091802811e-06, | |
| "loss": 0.0804, | |
| "mean_token_accuracy": 0.9744135141372681, | |
| "step": 1608 | |
| }, | |
| { | |
| "epoch": 9.410557184750733, | |
| "grad_norm": 0.5050856462926645, | |
| "learning_rate": 4.26983337630891e-06, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.979136548936367, | |
| "step": 1609 | |
| }, | |
| { | |
| "epoch": 9.416422287390029, | |
| "grad_norm": 0.46147350712406626, | |
| "learning_rate": 4.263950014159103e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9783479198813438, | |
| "step": 1610 | |
| }, | |
| { | |
| "epoch": 9.422287390029325, | |
| "grad_norm": 0.35759283649520646, | |
| "learning_rate": 4.258131026707618e-06, | |
| "loss": 0.0542, | |
| "mean_token_accuracy": 0.9823313504457474, | |
| "step": 1611 | |
| }, | |
| { | |
| "epoch": 9.428152492668621, | |
| "grad_norm": 0.4068410294223593, | |
| "learning_rate": 4.2523764350750305e-06, | |
| "loss": 0.0674, | |
| "mean_token_accuracy": 0.9794067889451981, | |
| "step": 1612 | |
| }, | |
| { | |
| "epoch": 9.434017595307918, | |
| "grad_norm": 0.4185738461055289, | |
| "learning_rate": 4.246686260148179e-06, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9806213080883026, | |
| "step": 1613 | |
| }, | |
| { | |
| "epoch": 9.439882697947214, | |
| "grad_norm": 0.48641041661236, | |
| "learning_rate": 4.241060522580108e-06, | |
| "loss": 0.0753, | |
| "mean_token_accuracy": 0.9752521067857742, | |
| "step": 1614 | |
| }, | |
| { | |
| "epoch": 9.44574780058651, | |
| "grad_norm": 0.4453163153136548, | |
| "learning_rate": 4.2354992427899674e-06, | |
| "loss": 0.0565, | |
| "mean_token_accuracy": 0.9813522100448608, | |
| "step": 1615 | |
| }, | |
| { | |
| "epoch": 9.451612903225806, | |
| "grad_norm": 0.40238271305787837, | |
| "learning_rate": 4.23000244096296e-06, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.978582575917244, | |
| "step": 1616 | |
| }, | |
| { | |
| "epoch": 9.457478005865102, | |
| "grad_norm": 0.3997496093412971, | |
| "learning_rate": 4.224570137050254e-06, | |
| "loss": 0.049, | |
| "mean_token_accuracy": 0.9842732697725296, | |
| "step": 1617 | |
| }, | |
| { | |
| "epoch": 9.463343108504398, | |
| "grad_norm": 0.3539750438856785, | |
| "learning_rate": 4.219202350768919e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.9794608727097511, | |
| "step": 1618 | |
| }, | |
| { | |
| "epoch": 9.469208211143695, | |
| "grad_norm": 0.38463794727155115, | |
| "learning_rate": 4.213899101601853e-06, | |
| "loss": 0.0617, | |
| "mean_token_accuracy": 0.9791621938347816, | |
| "step": 1619 | |
| }, | |
| { | |
| "epoch": 9.47507331378299, | |
| "grad_norm": 0.376559715948642, | |
| "learning_rate": 4.208660408797708e-06, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.9801753237843513, | |
| "step": 1620 | |
| }, | |
| { | |
| "epoch": 9.480938416422287, | |
| "grad_norm": 0.39527726743081754, | |
| "learning_rate": 4.203486291370821e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.980688601732254, | |
| "step": 1621 | |
| }, | |
| { | |
| "epoch": 9.486803519061583, | |
| "grad_norm": 0.4421038272813132, | |
| "learning_rate": 4.198376768101149e-06, | |
| "loss": 0.0713, | |
| "mean_token_accuracy": 0.978269450366497, | |
| "step": 1622 | |
| }, | |
| { | |
| "epoch": 9.49266862170088, | |
| "grad_norm": 0.45644675105726096, | |
| "learning_rate": 4.193331857534198e-06, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9808766692876816, | |
| "step": 1623 | |
| }, | |
| { | |
| "epoch": 9.498533724340176, | |
| "grad_norm": 0.3628850686906786, | |
| "learning_rate": 4.188351577980961e-06, | |
| "loss": 0.0541, | |
| "mean_token_accuracy": 0.9819683060050011, | |
| "step": 1624 | |
| }, | |
| { | |
| "epoch": 9.504398826979472, | |
| "grad_norm": 0.4057038844877085, | |
| "learning_rate": 4.183435947517836e-06, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.9803583696484566, | |
| "step": 1625 | |
| }, | |
| { | |
| "epoch": 9.510263929618768, | |
| "grad_norm": 0.3679923800134878, | |
| "learning_rate": 4.178584983986575e-06, | |
| "loss": 0.0505, | |
| "mean_token_accuracy": 0.9833980947732925, | |
| "step": 1626 | |
| }, | |
| { | |
| "epoch": 9.516129032258064, | |
| "grad_norm": 0.35069345182619793, | |
| "learning_rate": 4.173798704994221e-06, | |
| "loss": 0.0566, | |
| "mean_token_accuracy": 0.9815231710672379, | |
| "step": 1627 | |
| }, | |
| { | |
| "epoch": 9.52199413489736, | |
| "grad_norm": 0.3952790895573117, | |
| "learning_rate": 4.169077127913031e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9765211567282677, | |
| "step": 1628 | |
| }, | |
| { | |
| "epoch": 9.527859237536656, | |
| "grad_norm": 0.4108655452567332, | |
| "learning_rate": 4.164420269880422e-06, | |
| "loss": 0.061, | |
| "mean_token_accuracy": 0.9768633097410202, | |
| "step": 1629 | |
| }, | |
| { | |
| "epoch": 9.533724340175953, | |
| "grad_norm": 0.39953251365980913, | |
| "learning_rate": 4.159828147798914e-06, | |
| "loss": 0.0566, | |
| "mean_token_accuracy": 0.9824348241090775, | |
| "step": 1630 | |
| }, | |
| { | |
| "epoch": 9.539589442815249, | |
| "grad_norm": 0.3722601174054308, | |
| "learning_rate": 4.155300778336047e-06, | |
| "loss": 0.061, | |
| "mean_token_accuracy": 0.9787588939070702, | |
| "step": 1631 | |
| }, | |
| { | |
| "epoch": 9.545454545454545, | |
| "grad_norm": 0.4843120313142688, | |
| "learning_rate": 4.150838177924349e-06, | |
| "loss": 0.059, | |
| "mean_token_accuracy": 0.9840193539857864, | |
| "step": 1632 | |
| }, | |
| { | |
| "epoch": 9.551319648093841, | |
| "grad_norm": 0.3390691465939108, | |
| "learning_rate": 4.146440362761256e-06, | |
| "loss": 0.0612, | |
| "mean_token_accuracy": 0.9807577356696129, | |
| "step": 1633 | |
| }, | |
| { | |
| "epoch": 9.557184750733137, | |
| "grad_norm": 0.39197990083896755, | |
| "learning_rate": 4.142107348809058e-06, | |
| "loss": 0.0689, | |
| "mean_token_accuracy": 0.9762087687849998, | |
| "step": 1634 | |
| }, | |
| { | |
| "epoch": 9.563049853372434, | |
| "grad_norm": 0.42470639878785044, | |
| "learning_rate": 4.1378391517948505e-06, | |
| "loss": 0.0552, | |
| "mean_token_accuracy": 0.9833284765481949, | |
| "step": 1635 | |
| }, | |
| { | |
| "epoch": 9.56891495601173, | |
| "grad_norm": 0.4205496957044024, | |
| "learning_rate": 4.1336357872104614e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9796880483627319, | |
| "step": 1636 | |
| }, | |
| { | |
| "epoch": 9.574780058651026, | |
| "grad_norm": 0.37017733597857505, | |
| "learning_rate": 4.12949727031241e-06, | |
| "loss": 0.0634, | |
| "mean_token_accuracy": 0.9796748831868172, | |
| "step": 1637 | |
| }, | |
| { | |
| "epoch": 9.580645161290322, | |
| "grad_norm": 0.4084323347975372, | |
| "learning_rate": 4.125423616121837e-06, | |
| "loss": 0.056, | |
| "mean_token_accuracy": 0.9810463264584541, | |
| "step": 1638 | |
| }, | |
| { | |
| "epoch": 9.586510263929618, | |
| "grad_norm": 0.34366109037322107, | |
| "learning_rate": 4.121414839424464e-06, | |
| "loss": 0.0572, | |
| "mean_token_accuracy": 0.9815482944250107, | |
| "step": 1639 | |
| }, | |
| { | |
| "epoch": 9.592375366568914, | |
| "grad_norm": 0.4338152320928124, | |
| "learning_rate": 4.117470954770529e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9788042083382607, | |
| "step": 1640 | |
| }, | |
| { | |
| "epoch": 9.59824046920821, | |
| "grad_norm": 0.31292211012562876, | |
| "learning_rate": 4.1135919764747454e-06, | |
| "loss": 0.0549, | |
| "mean_token_accuracy": 0.9811399206519127, | |
| "step": 1641 | |
| }, | |
| { | |
| "epoch": 9.604105571847507, | |
| "grad_norm": 0.3803411380006481, | |
| "learning_rate": 4.109777918616235e-06, | |
| "loss": 0.0624, | |
| "mean_token_accuracy": 0.9828116968274117, | |
| "step": 1642 | |
| }, | |
| { | |
| "epoch": 9.609970674486803, | |
| "grad_norm": 0.3997887550659777, | |
| "learning_rate": 4.106028795038487e-06, | |
| "loss": 0.0644, | |
| "mean_token_accuracy": 0.9773474633693695, | |
| "step": 1643 | |
| }, | |
| { | |
| "epoch": 9.6158357771261, | |
| "grad_norm": 0.45750394844022496, | |
| "learning_rate": 4.102344619349307e-06, | |
| "loss": 0.0754, | |
| "mean_token_accuracy": 0.9736003801226616, | |
| "step": 1644 | |
| }, | |
| { | |
| "epoch": 9.621700879765395, | |
| "grad_norm": 0.4527628966223273, | |
| "learning_rate": 4.098725404920763e-06, | |
| "loss": 0.071, | |
| "mean_token_accuracy": 0.9779922664165497, | |
| "step": 1645 | |
| }, | |
| { | |
| "epoch": 9.627565982404692, | |
| "grad_norm": 0.4896423936917076, | |
| "learning_rate": 4.095171164889143e-06, | |
| "loss": 0.0592, | |
| "mean_token_accuracy": 0.9796406477689743, | |
| "step": 1646 | |
| }, | |
| { | |
| "epoch": 9.633431085043988, | |
| "grad_norm": 0.36707664122576916, | |
| "learning_rate": 4.091681912154903e-06, | |
| "loss": 0.0598, | |
| "mean_token_accuracy": 0.9778028726577759, | |
| "step": 1647 | |
| }, | |
| { | |
| "epoch": 9.639296187683284, | |
| "grad_norm": 0.483706919881383, | |
| "learning_rate": 4.088257659382619e-06, | |
| "loss": 0.0805, | |
| "mean_token_accuracy": 0.9733972549438477, | |
| "step": 1648 | |
| }, | |
| { | |
| "epoch": 9.64516129032258, | |
| "grad_norm": 0.49231494455117975, | |
| "learning_rate": 4.0848984190009495e-06, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9761296063661575, | |
| "step": 1649 | |
| }, | |
| { | |
| "epoch": 9.651026392961876, | |
| "grad_norm": 0.3301630427949935, | |
| "learning_rate": 4.081604203202577e-06, | |
| "loss": 0.0525, | |
| "mean_token_accuracy": 0.9833341687917709, | |
| "step": 1650 | |
| }, | |
| { | |
| "epoch": 9.656891495601172, | |
| "grad_norm": 0.3566237115058431, | |
| "learning_rate": 4.078375023944175e-06, | |
| "loss": 0.0603, | |
| "mean_token_accuracy": 0.9809886366128922, | |
| "step": 1651 | |
| }, | |
| { | |
| "epoch": 9.662756598240469, | |
| "grad_norm": 0.42277619963529384, | |
| "learning_rate": 4.0752108929463625e-06, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9733963310718536, | |
| "step": 1652 | |
| }, | |
| { | |
| "epoch": 9.668621700879765, | |
| "grad_norm": 0.46959145501834476, | |
| "learning_rate": 4.072111821693655e-06, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9796183854341507, | |
| "step": 1653 | |
| }, | |
| { | |
| "epoch": 9.674486803519061, | |
| "grad_norm": 0.40768391294874307, | |
| "learning_rate": 4.069077821434429e-06, | |
| "loss": 0.069, | |
| "mean_token_accuracy": 0.9786179140210152, | |
| "step": 1654 | |
| }, | |
| { | |
| "epoch": 9.680351906158357, | |
| "grad_norm": 0.4324953028254213, | |
| "learning_rate": 4.06610890318088e-06, | |
| "loss": 0.0567, | |
| "mean_token_accuracy": 0.9801836982369423, | |
| "step": 1655 | |
| }, | |
| { | |
| "epoch": 9.686217008797653, | |
| "grad_norm": 0.3104747760584785, | |
| "learning_rate": 4.063205077708986e-06, | |
| "loss": 0.0571, | |
| "mean_token_accuracy": 0.980313815176487, | |
| "step": 1656 | |
| }, | |
| { | |
| "epoch": 9.69208211143695, | |
| "grad_norm": 0.45565777906402427, | |
| "learning_rate": 4.060366355558456e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9771944209933281, | |
| "step": 1657 | |
| }, | |
| { | |
| "epoch": 9.697947214076246, | |
| "grad_norm": 0.40108477777956064, | |
| "learning_rate": 4.057592747032707e-06, | |
| "loss": 0.0757, | |
| "mean_token_accuracy": 0.9765976965427399, | |
| "step": 1658 | |
| }, | |
| { | |
| "epoch": 9.703812316715542, | |
| "grad_norm": 0.4160024663073042, | |
| "learning_rate": 4.054884262198816e-06, | |
| "loss": 0.0536, | |
| "mean_token_accuracy": 0.9806254580616951, | |
| "step": 1659 | |
| }, | |
| { | |
| "epoch": 9.709677419354838, | |
| "grad_norm": 0.32521915471745955, | |
| "learning_rate": 4.052240910887493e-06, | |
| "loss": 0.0583, | |
| "mean_token_accuracy": 0.9815412387251854, | |
| "step": 1660 | |
| }, | |
| { | |
| "epoch": 9.715542521994134, | |
| "grad_norm": 0.36235203791702825, | |
| "learning_rate": 4.049662702693031e-06, | |
| "loss": 0.059, | |
| "mean_token_accuracy": 0.979137234389782, | |
| "step": 1661 | |
| }, | |
| { | |
| "epoch": 9.72140762463343, | |
| "grad_norm": 0.4092475894856, | |
| "learning_rate": 4.047149646973288e-06, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.9769327864050865, | |
| "step": 1662 | |
| }, | |
| { | |
| "epoch": 9.727272727272727, | |
| "grad_norm": 0.38561476103820486, | |
| "learning_rate": 4.044701752849639e-06, | |
| "loss": 0.0568, | |
| "mean_token_accuracy": 0.9809937924146652, | |
| "step": 1663 | |
| }, | |
| { | |
| "epoch": 9.733137829912023, | |
| "grad_norm": 0.35666343819836077, | |
| "learning_rate": 4.042319029206954e-06, | |
| "loss": 0.0566, | |
| "mean_token_accuracy": 0.9806043654680252, | |
| "step": 1664 | |
| }, | |
| { | |
| "epoch": 9.739002932551319, | |
| "grad_norm": 0.3515754455773045, | |
| "learning_rate": 4.040001484693553e-06, | |
| "loss": 0.0545, | |
| "mean_token_accuracy": 0.9808334335684776, | |
| "step": 1665 | |
| }, | |
| { | |
| "epoch": 9.744868035190615, | |
| "grad_norm": 0.42662693644488175, | |
| "learning_rate": 4.037749127721191e-06, | |
| "loss": 0.0585, | |
| "mean_token_accuracy": 0.980788104236126, | |
| "step": 1666 | |
| }, | |
| { | |
| "epoch": 9.750733137829911, | |
| "grad_norm": 0.3414509899736225, | |
| "learning_rate": 4.03556196646501e-06, | |
| "loss": 0.0554, | |
| "mean_token_accuracy": 0.9821551591157913, | |
| "step": 1667 | |
| }, | |
| { | |
| "epoch": 9.756598240469208, | |
| "grad_norm": 0.38406704462610125, | |
| "learning_rate": 4.033440008863528e-06, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9784510284662247, | |
| "step": 1668 | |
| }, | |
| { | |
| "epoch": 9.762463343108504, | |
| "grad_norm": 0.39541896479899535, | |
| "learning_rate": 4.031383262618588e-06, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.9789265245199203, | |
| "step": 1669 | |
| }, | |
| { | |
| "epoch": 9.7683284457478, | |
| "grad_norm": 0.44476429549220314, | |
| "learning_rate": 4.0293917351953505e-06, | |
| "loss": 0.0611, | |
| "mean_token_accuracy": 0.9808274731040001, | |
| "step": 1670 | |
| }, | |
| { | |
| "epoch": 9.774193548387096, | |
| "grad_norm": 0.3761626945474277, | |
| "learning_rate": 4.027465433822255e-06, | |
| "loss": 0.0565, | |
| "mean_token_accuracy": 0.9801682159304619, | |
| "step": 1671 | |
| }, | |
| { | |
| "epoch": 9.780058651026392, | |
| "grad_norm": 0.3849732857892804, | |
| "learning_rate": 4.025604365490999e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9807082116603851, | |
| "step": 1672 | |
| }, | |
| { | |
| "epoch": 9.785923753665688, | |
| "grad_norm": 0.3654616184162646, | |
| "learning_rate": 4.0238085369565085e-06, | |
| "loss": 0.0596, | |
| "mean_token_accuracy": 0.9807479158043861, | |
| "step": 1673 | |
| }, | |
| { | |
| "epoch": 9.791788856304985, | |
| "grad_norm": 0.36013060483944914, | |
| "learning_rate": 4.022077954736916e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.9814224913716316, | |
| "step": 1674 | |
| }, | |
| { | |
| "epoch": 9.79765395894428, | |
| "grad_norm": 0.4312325517010621, | |
| "learning_rate": 4.020412625113535e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.9808504059910774, | |
| "step": 1675 | |
| }, | |
| { | |
| "epoch": 9.803519061583577, | |
| "grad_norm": 0.4150291627126331, | |
| "learning_rate": 4.018812554130839e-06, | |
| "loss": 0.0749, | |
| "mean_token_accuracy": 0.9782455414533615, | |
| "step": 1676 | |
| }, | |
| { | |
| "epoch": 9.809384164222873, | |
| "grad_norm": 0.4560487035026618, | |
| "learning_rate": 4.01727774759644e-06, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9782388657331467, | |
| "step": 1677 | |
| }, | |
| { | |
| "epoch": 9.81524926686217, | |
| "grad_norm": 0.4118036587851704, | |
| "learning_rate": 4.0158082110810695e-06, | |
| "loss": 0.0573, | |
| "mean_token_accuracy": 0.9800616651773453, | |
| "step": 1678 | |
| }, | |
| { | |
| "epoch": 9.821114369501466, | |
| "grad_norm": 0.3973369869246667, | |
| "learning_rate": 4.014403949918545e-06, | |
| "loss": 0.0592, | |
| "mean_token_accuracy": 0.9799008816480637, | |
| "step": 1679 | |
| }, | |
| { | |
| "epoch": 9.826979472140762, | |
| "grad_norm": 0.41088958891964444, | |
| "learning_rate": 4.0130649692057715e-06, | |
| "loss": 0.0635, | |
| "mean_token_accuracy": 0.9786461591720581, | |
| "step": 1680 | |
| }, | |
| { | |
| "epoch": 9.832844574780058, | |
| "grad_norm": 0.4143453814451513, | |
| "learning_rate": 4.01179127380271e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9771526828408241, | |
| "step": 1681 | |
| }, | |
| { | |
| "epoch": 9.838709677419354, | |
| "grad_norm": 0.380721992706745, | |
| "learning_rate": 4.010582868332353e-06, | |
| "loss": 0.0536, | |
| "mean_token_accuracy": 0.9828857779502869, | |
| "step": 1682 | |
| }, | |
| { | |
| "epoch": 9.84457478005865, | |
| "grad_norm": 0.39817461806336085, | |
| "learning_rate": 4.009439757180732e-06, | |
| "loss": 0.062, | |
| "mean_token_accuracy": 0.9773422405123711, | |
| "step": 1683 | |
| }, | |
| { | |
| "epoch": 9.850439882697946, | |
| "grad_norm": 0.4300740276902467, | |
| "learning_rate": 4.008361944496875e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9794062152504921, | |
| "step": 1684 | |
| }, | |
| { | |
| "epoch": 9.856304985337243, | |
| "grad_norm": 0.44892781542634275, | |
| "learning_rate": 4.00734943419281e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9764518886804581, | |
| "step": 1685 | |
| }, | |
| { | |
| "epoch": 9.862170087976539, | |
| "grad_norm": 0.44639082159205296, | |
| "learning_rate": 4.006402229943534e-06, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.9779683649539948, | |
| "step": 1686 | |
| }, | |
| { | |
| "epoch": 9.868035190615835, | |
| "grad_norm": 0.3674527728427102, | |
| "learning_rate": 4.005520335187023e-06, | |
| "loss": 0.065, | |
| "mean_token_accuracy": 0.9792613238096237, | |
| "step": 1687 | |
| }, | |
| { | |
| "epoch": 9.873900293255131, | |
| "grad_norm": 0.4474897338230872, | |
| "learning_rate": 4.004703753124195e-06, | |
| "loss": 0.0653, | |
| "mean_token_accuracy": 0.9801800698041916, | |
| "step": 1688 | |
| }, | |
| { | |
| "epoch": 9.879765395894427, | |
| "grad_norm": 0.36367963546103926, | |
| "learning_rate": 4.003952486718913e-06, | |
| "loss": 0.0545, | |
| "mean_token_accuracy": 0.9815207943320274, | |
| "step": 1689 | |
| }, | |
| { | |
| "epoch": 9.885630498533724, | |
| "grad_norm": 0.35053871103024664, | |
| "learning_rate": 4.003266538697973e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9800309017300606, | |
| "step": 1690 | |
| }, | |
| { | |
| "epoch": 9.89149560117302, | |
| "grad_norm": 0.3548403767867107, | |
| "learning_rate": 4.002645911551086e-06, | |
| "loss": 0.055, | |
| "mean_token_accuracy": 0.9804805964231491, | |
| "step": 1691 | |
| }, | |
| { | |
| "epoch": 9.897360703812316, | |
| "grad_norm": 0.355690224320949, | |
| "learning_rate": 4.002090607530882e-06, | |
| "loss": 0.0611, | |
| "mean_token_accuracy": 0.9794489368796349, | |
| "step": 1692 | |
| }, | |
| { | |
| "epoch": 9.903225806451612, | |
| "grad_norm": 0.41235923928942947, | |
| "learning_rate": 4.001600628652887e-06, | |
| "loss": 0.0745, | |
| "mean_token_accuracy": 0.974931038916111, | |
| "step": 1693 | |
| }, | |
| { | |
| "epoch": 9.909090909090908, | |
| "grad_norm": 0.41724843581429083, | |
| "learning_rate": 4.001175976695527e-06, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.9750220105051994, | |
| "step": 1694 | |
| }, | |
| { | |
| "epoch": 9.914956011730204, | |
| "grad_norm": 0.4054950367550597, | |
| "learning_rate": 4.000816653200117e-06, | |
| "loss": 0.0529, | |
| "mean_token_accuracy": 0.9841560870409012, | |
| "step": 1695 | |
| }, | |
| { | |
| "epoch": 9.9208211143695, | |
| "grad_norm": 0.3962265652117708, | |
| "learning_rate": 4.000522659470857e-06, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.9802296236157417, | |
| "step": 1696 | |
| }, | |
| { | |
| "epoch": 9.926686217008797, | |
| "grad_norm": 0.48407966179185175, | |
| "learning_rate": 4.000293996574826e-06, | |
| "loss": 0.0788, | |
| "mean_token_accuracy": 0.9757120013237, | |
| "step": 1697 | |
| }, | |
| { | |
| "epoch": 9.932551319648093, | |
| "grad_norm": 0.445480333364879, | |
| "learning_rate": 4.000130665341977e-06, | |
| "loss": 0.0739, | |
| "mean_token_accuracy": 0.976716510951519, | |
| "step": 1698 | |
| }, | |
| { | |
| "epoch": 9.93841642228739, | |
| "grad_norm": 0.3844841575441866, | |
| "learning_rate": 4.000032666365136e-06, | |
| "loss": 0.0587, | |
| "mean_token_accuracy": 0.9814257994294167, | |
| "step": 1699 | |
| }, | |
| { | |
| "epoch": 9.944281524926687, | |
| "grad_norm": 0.36797610892426286, | |
| "learning_rate": 4.000000000000001e-06, | |
| "loss": 0.0585, | |
| "mean_token_accuracy": 0.9790479391813278, | |
| "step": 1700 | |
| }, | |
| { | |
| "epoch": 9.944281524926687, | |
| "step": 1700, | |
| "total_flos": 16679922416640.0, | |
| "train_loss": 0.20836789276889142, | |
| "train_runtime": 36621.1179, | |
| "train_samples_per_second": 1.49, | |
| "train_steps_per_second": 0.046 | |
| } | |
| ], | |
| "logging_steps": 1, | |
| "max_steps": 1700, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 10, | |
| "save_steps": 200, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 16679922416640.0, | |
| "train_batch_size": 1, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |
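
The summary entry at the end of the log can be sanity-checked with a little arithmetic. The sketch below is a minimal illustration, not part of the original training code: it copies `max_steps`, `train_runtime`, `train_samples_per_second` and `train_steps_per_second` from the log above and derives the step rate and an approximate number of samples per optimizer step. The variable names are chosen here for illustration, and because the logged rates are rounded, the derived samples-per-step figure is only an estimate.

```python
# Values copied from the final summary entry of the log above.
max_steps = 1700
train_runtime_s = 36621.1179      # "train_runtime", in seconds
samples_per_second = 1.49         # "train_samples_per_second" (rounded in the log)
steps_per_second = 0.046          # "train_steps_per_second" (rounded in the log)

# Step rate recomputed from the step count and the runtime.
derived_steps_per_second = max_steps / train_runtime_s

# Approximate samples consumed per optimizer step. This is an estimate:
# it relies on the rounded samples-per-second figure from the log.
approx_samples_per_step = samples_per_second * train_runtime_s / max_steps

print(f"derived steps/s: {derived_steps_per_second:.4f}")
print(f"approx samples per step: {approx_samples_per_step:.1f}")
```

With `train_batch_size` logged as 1, a samples-per-step estimate well above 1 would suggest that the effective batch comes from gradient accumulation, multiple devices, or both; the log itself does not say which.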