Text Generation
Transformers
Safetensors
llama
Generated from Trainer
trl
sft
conversational
text-generation-inference
Instructions to use cfei621/OlympicCoder-32B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use cfei621/OlympicCoder-32B with Transformers:
    # Use a pipeline as a high-level helper
    from transformers import pipeline

    pipe = pipeline("text-generation", model="cfei621/OlympicCoder-32B")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    pipe(messages)

    # Load model directly
    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("cfei621/OlympicCoder-32B")
    model = AutoModelForCausalLM.from_pretrained("cfei621/OlympicCoder-32B")
    messages = [
        {"role": "user", "content": "Who are you?"},
    ]
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    outputs = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use cfei621/OlympicCoder-32B with vLLM:
Install from pip and serve model
    # Install vLLM from pip:
    pip install vllm

    # Start the vLLM server:
    vllm serve "cfei621/OlympicCoder-32B"

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:8000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "cfei621/OlympicCoder-32B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker
    docker model run hf.co/cfei621/OlympicCoder-32B
- SGLang
How to use cfei621/OlympicCoder-32B with SGLang:
Install from pip and serve model
    # Install SGLang from pip:
    pip install sglang

    # Start the SGLang server:
    python3 -m sglang.launch_server \
        --model-path "cfei621/OlympicCoder-32B" \
        --host 0.0.0.0 \
        --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "cfei621/OlympicCoder-32B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
Use Docker images
    docker run --gpus all \
        --shm-size 32g \
        -p 30000:30000 \
        -v ~/.cache/huggingface:/root/.cache/huggingface \
        --env "HF_TOKEN=<secret>" \
        --ipc=host \
        lmsysorg/sglang:latest \
        python3 -m sglang.launch_server \
            --model-path "cfei621/OlympicCoder-32B" \
            --host 0.0.0.0 \
            --port 30000

    # Call the server using curl (OpenAI-compatible API):
    curl -X POST "http://localhost:30000/v1/chat/completions" \
        -H "Content-Type: application/json" \
        --data '{
            "model": "cfei621/OlympicCoder-32B",
            "messages": [
                {
                    "role": "user",
                    "content": "What is the capital of France?"
                }
            ]
        }'
- Docker Model Runner
How to use cfei621/OlympicCoder-32B with Docker Model Runner:
    docker model run hf.co/cfei621/OlympicCoder-32B
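The trainer state that follows logs a `learning_rate` for every optimizer step. The logged values look consistent with a linear warmup to 4e-5 over the first 51 steps, followed by a cosine decay toward a floor of 10% of the peak over the 1700 total steps. The file itself does not state any of this: the warmup length, the peak, and the floor fraction are assumptions inferred by fitting the logged numbers, and only the total step count is taken from `global_step`. A minimal sketch that checks this guess against a few logged entries (the function name `scheduled_lr` and the constants are hypothetical):

```python
import math

# Assumptions inferred from the logged values, not stated in the file.
PEAK_LR = 4e-5       # rate logged at step 51, where the ramp stops
WARMUP_STEPS = 51    # number of linear warmup steps
TOTAL_STEPS = 1700   # taken from "global_step" in the trainer state
MIN_LR_RATE = 0.1    # assumed floor, as a fraction of the peak rate


def scheduled_lr(step: int) -> float:
    """Linear warmup, then cosine decay down to MIN_LR_RATE * PEAK_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATE + (1.0 - MIN_LR_RATE) * cosine)


# A few (step, learning_rate) pairs copied from log_history.
logged = {
    1: 7.843137254901962e-07,
    25: 1.9607843137254903e-05,
    51: 4e-05,
    52: 3.999996733363487e-05,
    100: 3.992162497605583e-05,
    136: 3.976450075721003e-05,
}

for step, lr in logged.items():
    predicted = scheduled_lr(step)
    # Print the logged rate, the reconstructed rate, and their gap.
    print(step, lr, predicted, abs(lr - predicted))
```

If the printed gaps are negligible compared with the rates themselves, the assumed schedule is a reasonable description of the log. It remains a fit to the data, not a statement of the configuration actually used for training.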
| { | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 9.944281524926687, | |
| "eval_steps": 500, | |
| "global_step": 1700, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "epoch": 0.005865102639296188, | |
| "grad_norm": 36.851659359587856, | |
| "learning_rate": 7.843137254901962e-07, | |
| "loss": 2.4374, | |
| "mean_token_accuracy": 0.5096197873353958, | |
| "step": 1 | |
| }, | |
| { | |
| "epoch": 0.011730205278592375, | |
| "grad_norm": 37.00745341487253, | |
| "learning_rate": 1.5686274509803923e-06, | |
| "loss": 2.5183, | |
| "mean_token_accuracy": 0.5060627236962318, | |
| "step": 2 | |
| }, | |
| { | |
| "epoch": 0.017595307917888565, | |
| "grad_norm": 33.06806834510508, | |
| "learning_rate": 2.3529411764705885e-06, | |
| "loss": 2.3759, | |
| "mean_token_accuracy": 0.5072058513760567, | |
| "step": 3 | |
| }, | |
| { | |
| "epoch": 0.02346041055718475, | |
| "grad_norm": 24.17744815738214, | |
| "learning_rate": 3.1372549019607846e-06, | |
| "loss": 2.291, | |
| "mean_token_accuracy": 0.5175964683294296, | |
| "step": 4 | |
| }, | |
| { | |
| "epoch": 0.02932551319648094, | |
| "grad_norm": 15.532179533817953, | |
| "learning_rate": 3.92156862745098e-06, | |
| "loss": 2.0552, | |
| "mean_token_accuracy": 0.5564254969358444, | |
| "step": 5 | |
| }, | |
| { | |
| "epoch": 0.03519061583577713, | |
| "grad_norm": 12.400572852662727, | |
| "learning_rate": 4.705882352941177e-06, | |
| "loss": 1.8973, | |
| "mean_token_accuracy": 0.5799022987484932, | |
| "step": 6 | |
| }, | |
| { | |
| "epoch": 0.04105571847507331, | |
| "grad_norm": 12.714649590022084, | |
| "learning_rate": 5.4901960784313735e-06, | |
| "loss": 1.8896, | |
| "mean_token_accuracy": 0.5862231701612473, | |
| "step": 7 | |
| }, | |
| { | |
| "epoch": 0.0469208211143695, | |
| "grad_norm": 15.580490797392397, | |
| "learning_rate": 6.274509803921569e-06, | |
| "loss": 1.7358, | |
| "mean_token_accuracy": 0.5966581851243973, | |
| "step": 8 | |
| }, | |
| { | |
| "epoch": 0.05278592375366569, | |
| "grad_norm": 11.702112786205797, | |
| "learning_rate": 7.058823529411766e-06, | |
| "loss": 1.5665, | |
| "mean_token_accuracy": 0.6283580958843231, | |
| "step": 9 | |
| }, | |
| { | |
| "epoch": 0.05865102639296188, | |
| "grad_norm": 8.70207348765521, | |
| "learning_rate": 7.84313725490196e-06, | |
| "loss": 1.4954, | |
| "mean_token_accuracy": 0.6435549259185791, | |
| "step": 10 | |
| }, | |
| { | |
| "epoch": 0.06451612903225806, | |
| "grad_norm": 10.43187874858921, | |
| "learning_rate": 8.627450980392157e-06, | |
| "loss": 1.3849, | |
| "mean_token_accuracy": 0.661979116499424, | |
| "step": 11 | |
| }, | |
| { | |
| "epoch": 0.07038123167155426, | |
| "grad_norm": 8.446666521329538, | |
| "learning_rate": 9.411764705882354e-06, | |
| "loss": 1.3824, | |
| "mean_token_accuracy": 0.6618056520819664, | |
| "step": 12 | |
| }, | |
| { | |
| "epoch": 0.07624633431085044, | |
| "grad_norm": 8.252951365329222, | |
| "learning_rate": 1.0196078431372549e-05, | |
| "loss": 1.3473, | |
| "mean_token_accuracy": 0.6602734103798866, | |
| "step": 13 | |
| }, | |
| { | |
| "epoch": 0.08211143695014662, | |
| "grad_norm": 6.8948482176419015, | |
| "learning_rate": 1.0980392156862747e-05, | |
| "loss": 1.4417, | |
| "mean_token_accuracy": 0.6434540897607803, | |
| "step": 14 | |
| }, | |
| { | |
| "epoch": 0.08797653958944282, | |
| "grad_norm": 6.539460909791532, | |
| "learning_rate": 1.1764705882352942e-05, | |
| "loss": 1.4346, | |
| "mean_token_accuracy": 0.6494264975190163, | |
| "step": 15 | |
| }, | |
| { | |
| "epoch": 0.093841642228739, | |
| "grad_norm": 5.434354265326855, | |
| "learning_rate": 1.2549019607843138e-05, | |
| "loss": 1.3035, | |
| "mean_token_accuracy": 0.6783203706145287, | |
| "step": 16 | |
| }, | |
| { | |
| "epoch": 0.09970674486803519, | |
| "grad_norm": 4.7148773508202515, | |
| "learning_rate": 1.3333333333333333e-05, | |
| "loss": 1.2174, | |
| "mean_token_accuracy": 0.6834078431129456, | |
| "step": 17 | |
| }, | |
| { | |
| "epoch": 0.10557184750733138, | |
| "grad_norm": 5.873881991004589, | |
| "learning_rate": 1.4117647058823532e-05, | |
| "loss": 1.1852, | |
| "mean_token_accuracy": 0.6986361294984818, | |
| "step": 18 | |
| }, | |
| { | |
| "epoch": 0.11143695014662756, | |
| "grad_norm": 25.463977260503608, | |
| "learning_rate": 1.4901960784313726e-05, | |
| "loss": 1.4844, | |
| "mean_token_accuracy": 0.6396931707859039, | |
| "step": 19 | |
| }, | |
| { | |
| "epoch": 0.11730205278592376, | |
| "grad_norm": 7.1572160909660125, | |
| "learning_rate": 1.568627450980392e-05, | |
| "loss": 1.2791, | |
| "mean_token_accuracy": 0.6703987568616867, | |
| "step": 20 | |
| }, | |
| { | |
| "epoch": 0.12316715542521994, | |
| "grad_norm": 4.095861178832352, | |
| "learning_rate": 1.647058823529412e-05, | |
| "loss": 1.2234, | |
| "mean_token_accuracy": 0.6897248178720474, | |
| "step": 21 | |
| }, | |
| { | |
| "epoch": 0.12903225806451613, | |
| "grad_norm": 4.773764142523696, | |
| "learning_rate": 1.7254901960784314e-05, | |
| "loss": 1.2771, | |
| "mean_token_accuracy": 0.6836260929703712, | |
| "step": 22 | |
| }, | |
| { | |
| "epoch": 0.1348973607038123, | |
| "grad_norm": 4.037667101234835, | |
| "learning_rate": 1.8039215686274513e-05, | |
| "loss": 1.1467, | |
| "mean_token_accuracy": 0.7134711667895317, | |
| "step": 23 | |
| }, | |
| { | |
| "epoch": 0.14076246334310852, | |
| "grad_norm": 4.935180182733481, | |
| "learning_rate": 1.8823529411764708e-05, | |
| "loss": 1.2157, | |
| "mean_token_accuracy": 0.6998878344893456, | |
| "step": 24 | |
| }, | |
| { | |
| "epoch": 0.1466275659824047, | |
| "grad_norm": 4.721141853747017, | |
| "learning_rate": 1.9607843137254903e-05, | |
| "loss": 1.1784, | |
| "mean_token_accuracy": 0.6964201852679253, | |
| "step": 25 | |
| }, | |
| { | |
| "epoch": 0.15249266862170088, | |
| "grad_norm": 5.246228951758431, | |
| "learning_rate": 2.0392156862745097e-05, | |
| "loss": 1.204, | |
| "mean_token_accuracy": 0.7094837054610252, | |
| "step": 26 | |
| }, | |
| { | |
| "epoch": 0.15835777126099707, | |
| "grad_norm": 4.682236592289397, | |
| "learning_rate": 2.1176470588235296e-05, | |
| "loss": 1.2207, | |
| "mean_token_accuracy": 0.7027621790766716, | |
| "step": 27 | |
| }, | |
| { | |
| "epoch": 0.16422287390029325, | |
| "grad_norm": 3.9239160089159455, | |
| "learning_rate": 2.1960784313725494e-05, | |
| "loss": 1.0857, | |
| "mean_token_accuracy": 0.7144983857870102, | |
| "step": 28 | |
| }, | |
| { | |
| "epoch": 0.17008797653958943, | |
| "grad_norm": 4.527625873838982, | |
| "learning_rate": 2.274509803921569e-05, | |
| "loss": 1.1507, | |
| "mean_token_accuracy": 0.7080343216657639, | |
| "step": 29 | |
| }, | |
| { | |
| "epoch": 0.17595307917888564, | |
| "grad_norm": 4.522012886880006, | |
| "learning_rate": 2.3529411764705884e-05, | |
| "loss": 1.2317, | |
| "mean_token_accuracy": 0.6883119121193886, | |
| "step": 30 | |
| }, | |
| { | |
| "epoch": 0.18181818181818182, | |
| "grad_norm": 4.0469531644008505, | |
| "learning_rate": 2.431372549019608e-05, | |
| "loss": 1.3669, | |
| "mean_token_accuracy": 0.6731580421328545, | |
| "step": 31 | |
| }, | |
| { | |
| "epoch": 0.187683284457478, | |
| "grad_norm": 3.9549406080521745, | |
| "learning_rate": 2.5098039215686277e-05, | |
| "loss": 1.113, | |
| "mean_token_accuracy": 0.7163452804088593, | |
| "step": 32 | |
| }, | |
| { | |
| "epoch": 0.1935483870967742, | |
| "grad_norm": 4.515614395996245, | |
| "learning_rate": 2.5882352941176475e-05, | |
| "loss": 1.2258, | |
| "mean_token_accuracy": 0.694293312728405, | |
| "step": 33 | |
| }, | |
| { | |
| "epoch": 0.19941348973607037, | |
| "grad_norm": 3.7187249983876667, | |
| "learning_rate": 2.6666666666666667e-05, | |
| "loss": 1.0837, | |
| "mean_token_accuracy": 0.72011748701334, | |
| "step": 34 | |
| }, | |
| { | |
| "epoch": 0.20527859237536658, | |
| "grad_norm": 4.216460194700125, | |
| "learning_rate": 2.7450980392156865e-05, | |
| "loss": 1.1397, | |
| "mean_token_accuracy": 0.7128811702132225, | |
| "step": 35 | |
| }, | |
| { | |
| "epoch": 0.21114369501466276, | |
| "grad_norm": 5.046740107610314, | |
| "learning_rate": 2.8235294117647063e-05, | |
| "loss": 1.1826, | |
| "mean_token_accuracy": 0.7074279710650444, | |
| "step": 36 | |
| }, | |
| { | |
| "epoch": 0.21700879765395895, | |
| "grad_norm": 3.998671819509439, | |
| "learning_rate": 2.9019607843137258e-05, | |
| "loss": 1.0887, | |
| "mean_token_accuracy": 0.7202624008059502, | |
| "step": 37 | |
| }, | |
| { | |
| "epoch": 0.22287390029325513, | |
| "grad_norm": 3.82846591040134, | |
| "learning_rate": 2.9803921568627453e-05, | |
| "loss": 1.0302, | |
| "mean_token_accuracy": 0.7288055121898651, | |
| "step": 38 | |
| }, | |
| { | |
| "epoch": 0.2287390029325513, | |
| "grad_norm": 4.328947102808582, | |
| "learning_rate": 3.0588235294117644e-05, | |
| "loss": 1.3474, | |
| "mean_token_accuracy": 0.665394015610218, | |
| "step": 39 | |
| }, | |
| { | |
| "epoch": 0.23460410557184752, | |
| "grad_norm": 3.534740245743921, | |
| "learning_rate": 3.137254901960784e-05, | |
| "loss": 1.0633, | |
| "mean_token_accuracy": 0.7198921740055084, | |
| "step": 40 | |
| }, | |
| { | |
| "epoch": 0.2404692082111437, | |
| "grad_norm": 3.392870597157443, | |
| "learning_rate": 3.215686274509804e-05, | |
| "loss": 1.0412, | |
| "mean_token_accuracy": 0.7338574528694153, | |
| "step": 41 | |
| }, | |
| { | |
| "epoch": 0.24633431085043989, | |
| "grad_norm": 3.8940128199078816, | |
| "learning_rate": 3.294117647058824e-05, | |
| "loss": 1.0445, | |
| "mean_token_accuracy": 0.7283758819103241, | |
| "step": 42 | |
| }, | |
| { | |
| "epoch": 0.25219941348973607, | |
| "grad_norm": 3.3975334500995986, | |
| "learning_rate": 3.372549019607844e-05, | |
| "loss": 0.8871, | |
| "mean_token_accuracy": 0.7578002512454987, | |
| "step": 43 | |
| }, | |
| { | |
| "epoch": 0.25806451612903225, | |
| "grad_norm": 3.4126700803478633, | |
| "learning_rate": 3.450980392156863e-05, | |
| "loss": 1.0618, | |
| "mean_token_accuracy": 0.7234469875693321, | |
| "step": 44 | |
| }, | |
| { | |
| "epoch": 0.26392961876832843, | |
| "grad_norm": 3.8586208099678645, | |
| "learning_rate": 3.529411764705883e-05, | |
| "loss": 0.9222, | |
| "mean_token_accuracy": 0.749978631734848, | |
| "step": 45 | |
| }, | |
| { | |
| "epoch": 0.2697947214076246, | |
| "grad_norm": 3.9582516129682803, | |
| "learning_rate": 3.6078431372549025e-05, | |
| "loss": 1.1156, | |
| "mean_token_accuracy": 0.7032267674803734, | |
| "step": 46 | |
| }, | |
| { | |
| "epoch": 0.2756598240469208, | |
| "grad_norm": 3.8710398537085586, | |
| "learning_rate": 3.686274509803922e-05, | |
| "loss": 1.0898, | |
| "mean_token_accuracy": 0.7087654024362564, | |
| "step": 47 | |
| }, | |
| { | |
| "epoch": 0.28152492668621704, | |
| "grad_norm": 3.689167131330777, | |
| "learning_rate": 3.7647058823529415e-05, | |
| "loss": 1.0075, | |
| "mean_token_accuracy": 0.7357660159468651, | |
| "step": 48 | |
| }, | |
| { | |
| "epoch": 0.2873900293255132, | |
| "grad_norm": 3.514337462533497, | |
| "learning_rate": 3.8431372549019614e-05, | |
| "loss": 1.0222, | |
| "mean_token_accuracy": 0.7442046403884888, | |
| "step": 49 | |
| }, | |
| { | |
| "epoch": 0.2932551319648094, | |
| "grad_norm": 3.8762873582868718, | |
| "learning_rate": 3.9215686274509805e-05, | |
| "loss": 1.23, | |
| "mean_token_accuracy": 0.6920453310012817, | |
| "step": 50 | |
| }, | |
| { | |
| "epoch": 0.2991202346041056, | |
| "grad_norm": 3.0719439332779332, | |
| "learning_rate": 4e-05, | |
| "loss": 0.9398, | |
| "mean_token_accuracy": 0.7474829107522964, | |
| "step": 51 | |
| }, | |
| { | |
| "epoch": 0.30498533724340177, | |
| "grad_norm": 3.5924564436025497, | |
| "learning_rate": 3.999996733363487e-05, | |
| "loss": 1.0803, | |
| "mean_token_accuracy": 0.7261487022042274, | |
| "step": 52 | |
| }, | |
| { | |
| "epoch": 0.31085043988269795, | |
| "grad_norm": 3.416094380030762, | |
| "learning_rate": 3.9999869334658026e-05, | |
| "loss": 1.0064, | |
| "mean_token_accuracy": 0.7352629378437996, | |
| "step": 53 | |
| }, | |
| { | |
| "epoch": 0.31671554252199413, | |
| "grad_norm": 3.391864669571874, | |
| "learning_rate": 3.9999706003425177e-05, | |
| "loss": 1.0714, | |
| "mean_token_accuracy": 0.7223077043890953, | |
| "step": 54 | |
| }, | |
| { | |
| "epoch": 0.3225806451612903, | |
| "grad_norm": 3.706914591081663, | |
| "learning_rate": 3.999947734052915e-05, | |
| "loss": 1.1963, | |
| "mean_token_accuracy": 0.704016849398613, | |
| "step": 55 | |
| }, | |
| { | |
| "epoch": 0.3284457478005865, | |
| "grad_norm": 3.372340432488623, | |
| "learning_rate": 3.999918334679989e-05, | |
| "loss": 1.0763, | |
| "mean_token_accuracy": 0.7157147005200386, | |
| "step": 56 | |
| }, | |
| { | |
| "epoch": 0.3343108504398827, | |
| "grad_norm": 3.36158570045305, | |
| "learning_rate": 3.999882402330448e-05, | |
| "loss": 1.0139, | |
| "mean_token_accuracy": 0.7262187004089355, | |
| "step": 57 | |
| }, | |
| { | |
| "epoch": 0.34017595307917886, | |
| "grad_norm": 3.451164603835625, | |
| "learning_rate": 3.999839937134712e-05, | |
| "loss": 0.9333, | |
| "mean_token_accuracy": 0.7455613538622856, | |
| "step": 58 | |
| }, | |
| { | |
| "epoch": 0.3460410557184751, | |
| "grad_norm": 3.9810759690964876, | |
| "learning_rate": 3.999790939246912e-05, | |
| "loss": 1.2167, | |
| "mean_token_accuracy": 0.6962130814790726, | |
| "step": 59 | |
| }, | |
| { | |
| "epoch": 0.3519061583577713, | |
| "grad_norm": 3.519609525609912, | |
| "learning_rate": 3.999735408844892e-05, | |
| "loss": 0.9685, | |
| "mean_token_accuracy": 0.7478121444582939, | |
| "step": 60 | |
| }, | |
| { | |
| "epoch": 0.35777126099706746, | |
| "grad_norm": 3.1213934449857277, | |
| "learning_rate": 3.999673346130203e-05, | |
| "loss": 1.0734, | |
| "mean_token_accuracy": 0.7303350269794464, | |
| "step": 61 | |
| }, | |
| { | |
| "epoch": 0.36363636363636365, | |
| "grad_norm": 3.4796741528734656, | |
| "learning_rate": 3.999604751328109e-05, | |
| "loss": 0.9029, | |
| "mean_token_accuracy": 0.7691665366292, | |
| "step": 62 | |
| }, | |
| { | |
| "epoch": 0.36950146627565983, | |
| "grad_norm": 3.452794361053103, | |
| "learning_rate": 3.999529624687581e-05, | |
| "loss": 0.8333, | |
| "mean_token_accuracy": 0.7859309017658234, | |
| "step": 63 | |
| }, | |
| { | |
| "epoch": 0.375366568914956, | |
| "grad_norm": 3.388415827235031, | |
| "learning_rate": 3.999447966481298e-05, | |
| "loss": 1.0312, | |
| "mean_token_accuracy": 0.7555889338254929, | |
| "step": 64 | |
| }, | |
| { | |
| "epoch": 0.3812316715542522, | |
| "grad_norm": 3.573995994060177, | |
| "learning_rate": 3.999359777005647e-05, | |
| "loss": 1.1018, | |
| "mean_token_accuracy": 0.711833767592907, | |
| "step": 65 | |
| }, | |
| { | |
| "epoch": 0.3870967741935484, | |
| "grad_norm": 3.3622463956437576, | |
| "learning_rate": 3.999265056580719e-05, | |
| "loss": 0.8182, | |
| "mean_token_accuracy": 0.7698798477649689, | |
| "step": 66 | |
| }, | |
| { | |
| "epoch": 0.39296187683284456, | |
| "grad_norm": 3.8430356274770063, | |
| "learning_rate": 3.999163805550313e-05, | |
| "loss": 1.2049, | |
| "mean_token_accuracy": 0.7203006744384766, | |
| "step": 67 | |
| }, | |
| { | |
| "epoch": 0.39882697947214074, | |
| "grad_norm": 3.3760389403116977, | |
| "learning_rate": 3.9990560242819274e-05, | |
| "loss": 1.0742, | |
| "mean_token_accuracy": 0.7316281497478485, | |
| "step": 68 | |
| }, | |
| { | |
| "epoch": 0.4046920821114369, | |
| "grad_norm": 3.0177034229887947, | |
| "learning_rate": 3.9989417131667647e-05, | |
| "loss": 0.8724, | |
| "mean_token_accuracy": 0.7838785424828529, | |
| "step": 69 | |
| }, | |
| { | |
| "epoch": 0.41055718475073316, | |
| "grad_norm": 2.9210681589288323, | |
| "learning_rate": 3.9988208726197293e-05, | |
| "loss": 0.8838, | |
| "mean_token_accuracy": 0.7562002912163734, | |
| "step": 70 | |
| }, | |
| { | |
| "epoch": 0.41642228739002934, | |
| "grad_norm": 4.12729120204305, | |
| "learning_rate": 3.998693503079423e-05, | |
| "loss": 1.1041, | |
| "mean_token_accuracy": 0.7364362850785255, | |
| "step": 71 | |
| }, | |
| { | |
| "epoch": 0.4222873900293255, | |
| "grad_norm": 3.6965716329786513, | |
| "learning_rate": 3.998559605008146e-05, | |
| "loss": 0.9396, | |
| "mean_token_accuracy": 0.7411833629012108, | |
| "step": 72 | |
| }, | |
| { | |
| "epoch": 0.4281524926686217, | |
| "grad_norm": 3.515036685731384, | |
| "learning_rate": 3.9984191788918936e-05, | |
| "loss": 1.0019, | |
| "mean_token_accuracy": 0.7481007277965546, | |
| "step": 73 | |
| }, | |
| { | |
| "epoch": 0.4340175953079179, | |
| "grad_norm": 3.2827315115719475, | |
| "learning_rate": 3.998272225240356e-05, | |
| "loss": 1.0716, | |
| "mean_token_accuracy": 0.7445208355784416, | |
| "step": 74 | |
| }, | |
| { | |
| "epoch": 0.4398826979472141, | |
| "grad_norm": 3.4002738542391864, | |
| "learning_rate": 3.9981187445869165e-05, | |
| "loss": 0.9189, | |
| "mean_token_accuracy": 0.7811929360032082, | |
| "step": 75 | |
| }, | |
| { | |
| "epoch": 0.44574780058651026, | |
| "grad_norm": 3.189019628719659, | |
| "learning_rate": 3.9979587374886466e-05, | |
| "loss": 1.0077, | |
| "mean_token_accuracy": 0.7385336235165596, | |
| "step": 76 | |
| }, | |
| { | |
| "epoch": 0.45161290322580644, | |
| "grad_norm": 3.567260269970604, | |
| "learning_rate": 3.997792204526309e-05, | |
| "loss": 0.9266, | |
| "mean_token_accuracy": 0.7527733370661736, | |
| "step": 77 | |
| }, | |
| { | |
| "epoch": 0.4574780058651026, | |
| "grad_norm": 2.8177642527128977, | |
| "learning_rate": 3.99761914630435e-05, | |
| "loss": 0.8658, | |
| "mean_token_accuracy": 0.7683205679059029, | |
| "step": 78 | |
| }, | |
| { | |
| "epoch": 0.4633431085043988, | |
| "grad_norm": 2.977084955813338, | |
| "learning_rate": 3.997439563450901e-05, | |
| "loss": 0.8314, | |
| "mean_token_accuracy": 0.7646985054016113, | |
| "step": 79 | |
| }, | |
| { | |
| "epoch": 0.46920821114369504, | |
| "grad_norm": 3.326612634601017, | |
| "learning_rate": 3.997253456617775e-05, | |
| "loss": 0.9131, | |
| "mean_token_accuracy": 0.7705802395939827, | |
| "step": 80 | |
| }, | |
| { | |
| "epoch": 0.4750733137829912, | |
| "grad_norm": 2.8536925285281827, | |
| "learning_rate": 3.997060826480465e-05, | |
| "loss": 0.8028, | |
| "mean_token_accuracy": 0.7799694091081619, | |
| "step": 81 | |
| }, | |
| { | |
| "epoch": 0.4809384164222874, | |
| "grad_norm": 2.7871810127043863, | |
| "learning_rate": 3.9968616737381414e-05, | |
| "loss": 0.9226, | |
| "mean_token_accuracy": 0.7754707932472229, | |
| "step": 82 | |
| }, | |
| { | |
| "epoch": 0.4868035190615836, | |
| "grad_norm": 2.7373917465082354, | |
| "learning_rate": 3.996655999113647e-05, | |
| "loss": 0.7868, | |
| "mean_token_accuracy": 0.7968426421284676, | |
| "step": 83 | |
| }, | |
| { | |
| "epoch": 0.49266862170087977, | |
| "grad_norm": 2.6733857555268847, | |
| "learning_rate": 3.9964438033534994e-05, | |
| "loss": 0.6813, | |
| "mean_token_accuracy": 0.806500993669033, | |
| "step": 84 | |
| }, | |
| { | |
| "epoch": 0.49853372434017595, | |
| "grad_norm": 2.710796729205208, | |
| "learning_rate": 3.996225087227881e-05, | |
| "loss": 0.8327, | |
| "mean_token_accuracy": 0.7808489948511124, | |
| "step": 85 | |
| }, | |
| { | |
| "epoch": 0.5043988269794721, | |
| "grad_norm": 2.767538588379973, | |
| "learning_rate": 3.995999851530645e-05, | |
| "loss": 0.8104, | |
| "mean_token_accuracy": 0.8030309081077576, | |
| "step": 86 | |
| }, | |
| { | |
| "epoch": 0.5102639296187683, | |
| "grad_norm": 2.791748527828644, | |
| "learning_rate": 3.995768097079305e-05, | |
| "loss": 0.8377, | |
| "mean_token_accuracy": 0.7796871438622475, | |
| "step": 87 | |
| }, | |
| { | |
| "epoch": 0.5161290322580645, | |
| "grad_norm": 17.10160874448148, | |
| "learning_rate": 3.9955298247150365e-05, | |
| "loss": 1.0538, | |
| "mean_token_accuracy": 0.7283467650413513, | |
| "step": 88 | |
| }, | |
| { | |
| "epoch": 0.5219941348973607, | |
| "grad_norm": 2.9697344797921774, | |
| "learning_rate": 3.9952850353026715e-05, | |
| "loss": 0.8371, | |
| "mean_token_accuracy": 0.7595526427030563, | |
| "step": 89 | |
| }, | |
| { | |
| "epoch": 0.5278592375366569, | |
| "grad_norm": 3.063089786027371, | |
| "learning_rate": 3.9950337297306976e-05, | |
| "loss": 0.9252, | |
| "mean_token_accuracy": 0.765510194003582, | |
| "step": 90 | |
| }, | |
| { | |
| "epoch": 0.533724340175953, | |
| "grad_norm": 3.2593335067842117, | |
| "learning_rate": 3.994775908911251e-05, | |
| "loss": 0.9852, | |
| "mean_token_accuracy": 0.7499447464942932, | |
| "step": 91 | |
| }, | |
| { | |
| "epoch": 0.5395894428152492, | |
| "grad_norm": 2.755070128063194, | |
| "learning_rate": 3.9945115737801183e-05, | |
| "loss": 0.805, | |
| "mean_token_accuracy": 0.779099851846695, | |
| "step": 92 | |
| }, | |
| { | |
| "epoch": 0.5454545454545454, | |
| "grad_norm": 3.0713510061837743, | |
| "learning_rate": 3.99424072529673e-05, | |
| "loss": 0.9714, | |
| "mean_token_accuracy": 0.7578205242753029, | |
| "step": 93 | |
| }, | |
| { | |
| "epoch": 0.5513196480938416, | |
| "grad_norm": 5.577700081592657, | |
| "learning_rate": 3.993963364444155e-05, | |
| "loss": 0.8266, | |
| "mean_token_accuracy": 0.7803476750850677, | |
| "step": 94 | |
| }, | |
| { | |
| "epoch": 0.5571847507331378, | |
| "grad_norm": 3.0701912646852225, | |
| "learning_rate": 3.9936794922291015e-05, | |
| "loss": 0.9948, | |
| "mean_token_accuracy": 0.7452556863427162, | |
| "step": 95 | |
| }, | |
| { | |
| "epoch": 0.5630498533724341, | |
| "grad_norm": 17.60337365182807, | |
| "learning_rate": 3.993389109681912e-05, | |
| "loss": 1.0502, | |
| "mean_token_accuracy": 0.7418598681688309, | |
| "step": 96 | |
| }, | |
| { | |
| "epoch": 0.5689149560117303, | |
| "grad_norm": 3.5846812500371357, | |
| "learning_rate": 3.993092217856557e-05, | |
| "loss": 0.8611, | |
| "mean_token_accuracy": 0.7773077115416527, | |
| "step": 97 | |
| }, | |
| { | |
| "epoch": 0.5747800586510264, | |
| "grad_norm": 6.77146659060288, | |
| "learning_rate": 3.9927888178306346e-05, | |
| "loss": 0.9778, | |
| "mean_token_accuracy": 0.7604374513030052, | |
| "step": 98 | |
| }, | |
| { | |
| "epoch": 0.5806451612903226, | |
| "grad_norm": 3.736494622645263, | |
| "learning_rate": 3.992478910705364e-05, | |
| "loss": 0.9402, | |
| "mean_token_accuracy": 0.7550177425146103, | |
| "step": 99 | |
| }, | |
| { | |
| "epoch": 0.5865102639296188, | |
| "grad_norm": 2.675434314951168, | |
| "learning_rate": 3.992162497605583e-05, | |
| "loss": 0.7708, | |
| "mean_token_accuracy": 0.8071942776441574, | |
| "step": 100 | |
| }, | |
| { | |
| "epoch": 0.592375366568915, | |
| "grad_norm": 3.3217948172570586, | |
| "learning_rate": 3.991839579679742e-05, | |
| "loss": 0.8784, | |
| "mean_token_accuracy": 0.7734089642763138, | |
| "step": 101 | |
| }, | |
| { | |
| "epoch": 0.5982404692082112, | |
| "grad_norm": 2.771224155637984, | |
| "learning_rate": 3.991510158099905e-05, | |
| "loss": 0.63, | |
| "mean_token_accuracy": 0.8300730064511299, | |
| "step": 102 | |
| }, | |
| { | |
| "epoch": 0.6041055718475073, | |
| "grad_norm": 2.778513902291998, | |
| "learning_rate": 3.991174234061738e-05, | |
| "loss": 0.7114, | |
| "mean_token_accuracy": 0.8191840052604675, | |
| "step": 103 | |
| }, | |
| { | |
| "epoch": 0.6099706744868035, | |
| "grad_norm": 2.9362931569310113, | |
| "learning_rate": 3.9908318087845104e-05, | |
| "loss": 0.8523, | |
| "mean_token_accuracy": 0.7773655205965042, | |
| "step": 104 | |
| }, | |
| { | |
| "epoch": 0.6158357771260997, | |
| "grad_norm": 2.9316191976481516, | |
| "learning_rate": 3.990482883511086e-05, | |
| "loss": 0.6483, | |
| "mean_token_accuracy": 0.8267587870359421, | |
| "step": 105 | |
| }, | |
| { | |
| "epoch": 0.6217008797653959, | |
| "grad_norm": 3.0440297107545917, | |
| "learning_rate": 3.990127459507924e-05, | |
| "loss": 0.7311, | |
| "mean_token_accuracy": 0.8009310364723206, | |
| "step": 106 | |
| }, | |
| { | |
| "epoch": 0.6275659824046921, | |
| "grad_norm": 2.8590753777344324, | |
| "learning_rate": 3.98976553806507e-05, | |
| "loss": 0.7072, | |
| "mean_token_accuracy": 0.805036373436451, | |
| "step": 107 | |
| }, | |
| { | |
| "epoch": 0.6334310850439883, | |
| "grad_norm": 2.6771045583267137, | |
| "learning_rate": 3.989397120496152e-05, | |
| "loss": 0.5646, | |
| "mean_token_accuracy": 0.8565261512994766, | |
| "step": 108 | |
| }, | |
| { | |
| "epoch": 0.6392961876832844, | |
| "grad_norm": 3.0186022781962074, | |
| "learning_rate": 3.989022208138377e-05, | |
| "loss": 0.6418, | |
| "mean_token_accuracy": 0.83050137758255, | |
| "step": 109 | |
| }, | |
| { | |
| "epoch": 0.6451612903225806, | |
| "grad_norm": 3.8525032256000267, | |
| "learning_rate": 3.9886408023525256e-05, | |
| "loss": 0.9126, | |
| "mean_token_accuracy": 0.7831842973828316, | |
| "step": 110 | |
| }, | |
| { | |
| "epoch": 0.6510263929618768, | |
| "grad_norm": 3.1696627914446314, | |
| "learning_rate": 3.9882529045229475e-05, | |
| "loss": 0.9495, | |
| "mean_token_accuracy": 0.7556833699345589, | |
| "step": 111 | |
| }, | |
| { | |
| "epoch": 0.656891495601173, | |
| "grad_norm": 3.0391907313935658, | |
| "learning_rate": 3.987858516057554e-05, | |
| "loss": 0.6691, | |
| "mean_token_accuracy": 0.8235155344009399, | |
| "step": 112 | |
| }, | |
| { | |
| "epoch": 0.6627565982404692, | |
| "grad_norm": 3.1884729753748267, | |
| "learning_rate": 3.9874576383878165e-05, | |
| "loss": 0.7276, | |
| "mean_token_accuracy": 0.8132852613925934, | |
| "step": 113 | |
| }, | |
| { | |
| "epoch": 0.6686217008797654, | |
| "grad_norm": 2.8702285262921188, | |
| "learning_rate": 3.9870502729687594e-05, | |
| "loss": 0.7586, | |
| "mean_token_accuracy": 0.8025686517357826, | |
| "step": 114 | |
| }, | |
| { | |
| "epoch": 0.6744868035190615, | |
| "grad_norm": 3.477327705325461, | |
| "learning_rate": 3.986636421278954e-05, | |
| "loss": 0.7962, | |
| "mean_token_accuracy": 0.801344022154808, | |
| "step": 115 | |
| }, | |
| { | |
| "epoch": 0.6803519061583577, | |
| "grad_norm": 2.478525996153529, | |
| "learning_rate": 3.986216084820515e-05, | |
| "loss": 0.5867, | |
| "mean_token_accuracy": 0.8358008116483688, | |
| "step": 116 | |
| }, | |
| { | |
| "epoch": 0.6862170087976539, | |
| "grad_norm": 2.954408831168468, | |
| "learning_rate": 3.985789265119095e-05, | |
| "loss": 0.642, | |
| "mean_token_accuracy": 0.8179620429873466, | |
| "step": 117 | |
| }, | |
| { | |
| "epoch": 0.6920821114369502, | |
| "grad_norm": 2.448877211660101, | |
| "learning_rate": 3.985355963723875e-05, | |
| "loss": 0.5627, | |
| "mean_token_accuracy": 0.8517102301120758, | |
| "step": 118 | |
| }, | |
| { | |
| "epoch": 0.6979472140762464, | |
| "grad_norm": 2.79210837705298, | |
| "learning_rate": 3.9849161822075655e-05, | |
| "loss": 0.627, | |
| "mean_token_accuracy": 0.824343703687191, | |
| "step": 119 | |
| }, | |
| { | |
| "epoch": 0.7038123167155426, | |
| "grad_norm": 3.135497073478835, | |
| "learning_rate": 3.984469922166396e-05, | |
| "loss": 0.7399, | |
| "mean_token_accuracy": 0.8141955435276031, | |
| "step": 120 | |
| }, | |
| { | |
| "epoch": 0.7096774193548387, | |
| "grad_norm": 3.296087451604132, | |
| "learning_rate": 3.984017185220109e-05, | |
| "loss": 0.8949, | |
| "mean_token_accuracy": 0.7819836810231209, | |
| "step": 121 | |
| }, | |
| { | |
| "epoch": 0.7155425219941349, | |
| "grad_norm": 2.7562945396523584, | |
| "learning_rate": 3.9835579730119576e-05, | |
| "loss": 0.805, | |
| "mean_token_accuracy": 0.7919190227985382, | |
| "step": 122 | |
| }, | |
| { | |
| "epoch": 0.7214076246334311, | |
| "grad_norm": 2.667758292238495, | |
| "learning_rate": 3.9830922872086974e-05, | |
| "loss": 0.8093, | |
| "mean_token_accuracy": 0.806427076458931, | |
| "step": 123 | |
| }, | |
| { | |
| "epoch": 0.7272727272727273, | |
| "grad_norm": 2.8478048457167584, | |
| "learning_rate": 3.9826201295005784e-05, | |
| "loss": 0.8689, | |
| "mean_token_accuracy": 0.7796251475811005, | |
| "step": 124 | |
| }, | |
| { | |
| "epoch": 0.7331378299120235, | |
| "grad_norm": 2.88793329176012, | |
| "learning_rate": 3.982141501601343e-05, | |
| "loss": 0.8385, | |
| "mean_token_accuracy": 0.779462069272995, | |
| "step": 125 | |
| }, | |
| { | |
| "epoch": 0.7390029325513197, | |
| "grad_norm": 2.6368519953403418, | |
| "learning_rate": 3.9816564052482164e-05, | |
| "loss": 0.741, | |
| "mean_token_accuracy": 0.7980659455060959, | |
| "step": 126 | |
| }, | |
| { | |
| "epoch": 0.7448680351906158, | |
| "grad_norm": 2.6609506778610172, | |
| "learning_rate": 3.981164842201904e-05, | |
| "loss": 0.81, | |
| "mean_token_accuracy": 0.8032805398106575, | |
| "step": 127 | |
| }, | |
| { | |
| "epoch": 0.750733137829912, | |
| "grad_norm": 2.6663223208590776, | |
| "learning_rate": 3.9806668142465804e-05, | |
| "loss": 0.8363, | |
| "mean_token_accuracy": 0.8015436008572578, | |
| "step": 128 | |
| }, | |
| { | |
| "epoch": 0.7565982404692082, | |
| "grad_norm": 2.7057881563332487, | |
| "learning_rate": 3.9801623231898856e-05, | |
| "loss": 0.6372, | |
| "mean_token_accuracy": 0.8209338411688805, | |
| "step": 129 | |
| }, | |
| { | |
| "epoch": 0.7624633431085044, | |
| "grad_norm": 2.5687358685227664, | |
| "learning_rate": 3.9796513708629186e-05, | |
| "loss": 0.6557, | |
| "mean_token_accuracy": 0.82688008248806, | |
| "step": 130 | |
| }, | |
| { | |
| "epoch": 0.7683284457478006, | |
| "grad_norm": 2.6773701427099668, | |
| "learning_rate": 3.979133959120229e-05, | |
| "loss": 0.626, | |
| "mean_token_accuracy": 0.8373893424868584, | |
| "step": 131 | |
| }, | |
| { | |
| "epoch": 0.7741935483870968, | |
| "grad_norm": 2.587857152937776, | |
| "learning_rate": 3.9786100898398145e-05, | |
| "loss": 0.662, | |
| "mean_token_accuracy": 0.8253741338849068, | |
| "step": 132 | |
| }, | |
| { | |
| "epoch": 0.7800586510263929, | |
| "grad_norm": 2.5567903495608557, | |
| "learning_rate": 3.9780797649231085e-05, | |
| "loss": 0.7087, | |
| "mean_token_accuracy": 0.8196479603648186, | |
| "step": 133 | |
| }, | |
| { | |
| "epoch": 0.7859237536656891, | |
| "grad_norm": 2.8459978973895055, | |
| "learning_rate": 3.9775429862949745e-05, | |
| "loss": 0.8041, | |
| "mean_token_accuracy": 0.7983852028846741, | |
| "step": 134 | |
| }, | |
| { | |
| "epoch": 0.7917888563049853, | |
| "grad_norm": 2.687549165432222, | |
| "learning_rate": 3.976999755903704e-05, | |
| "loss": 0.7966, | |
| "mean_token_accuracy": 0.7903356328606606, | |
| "step": 135 | |
| }, | |
| { | |
| "epoch": 0.7976539589442815, | |
| "grad_norm": 2.3804933775980297, | |
| "learning_rate": 3.976450075721003e-05, | |
| "loss": 0.6608, | |
| "mean_token_accuracy": 0.8316953554749489, | |
| "step": 136 | |
| }, | |
| { | |
| "epoch": 0.8035190615835777, | |
| "grad_norm": 2.670873668569676, | |
| "learning_rate": 3.975893947741989e-05, | |
| "loss": 0.6061, | |
| "mean_token_accuracy": 0.8426604494452477, | |
| "step": 137 | |
| }, | |
| { | |
| "epoch": 0.8093841642228738, | |
| "grad_norm": 2.5007269896547784, | |
| "learning_rate": 3.9753313739851824e-05, | |
| "loss": 0.7976, | |
| "mean_token_accuracy": 0.796313926577568, | |
| "step": 138 | |
| }, | |
| { | |
| "epoch": 0.8152492668621701, | |
| "grad_norm": 2.9023488126536483, | |
| "learning_rate": 3.974762356492498e-05, | |
| "loss": 0.9283, | |
| "mean_token_accuracy": 0.7767031267285347, | |
| "step": 139 | |
| }, | |
| { | |
| "epoch": 0.8211143695014663, | |
| "grad_norm": 2.3838468185396082, | |
| "learning_rate": 3.974186897329239e-05, | |
| "loss": 0.6223, | |
| "mean_token_accuracy": 0.8510593101382256, | |
| "step": 140 | |
| }, | |
| { | |
| "epoch": 0.8269794721407625, | |
| "grad_norm": 2.707549081974321, | |
| "learning_rate": 3.97360499858409e-05, | |
| "loss": 0.6268, | |
| "mean_token_accuracy": 0.8382851779460907, | |
| "step": 141 | |
| }, | |
| { | |
| "epoch": 0.8328445747800587, | |
| "grad_norm": 3.7838908615103692, | |
| "learning_rate": 3.9730166623691096e-05, | |
| "loss": 0.8547, | |
| "mean_token_accuracy": 0.7846512198448181, | |
| "step": 142 | |
| }, | |
| { | |
| "epoch": 0.8387096774193549, | |
| "grad_norm": 2.5548687454598604, | |
| "learning_rate": 3.9724218908197194e-05, | |
| "loss": 0.5932, | |
| "mean_token_accuracy": 0.83430977165699, | |
| "step": 143 | |
| }, | |
| { | |
| "epoch": 0.844574780058651, | |
| "grad_norm": 3.199919744808126, | |
| "learning_rate": 3.971820686094701e-05, | |
| "loss": 0.9231, | |
| "mean_token_accuracy": 0.7737978771328926, | |
| "step": 144 | |
| }, | |
| { | |
| "epoch": 0.8504398826979472, | |
| "grad_norm": 2.8910753925453934, | |
| "learning_rate": 3.971213050376183e-05, | |
| "loss": 0.8144, | |
| "mean_token_accuracy": 0.7877362817525864, | |
| "step": 145 | |
| }, | |
| { | |
| "epoch": 0.8563049853372434, | |
| "grad_norm": 2.3603427477860395, | |
| "learning_rate": 3.9705989858696387e-05, | |
| "loss": 0.5849, | |
| "mean_token_accuracy": 0.8466823399066925, | |
| "step": 146 | |
| }, | |
| { | |
| "epoch": 0.8621700879765396, | |
| "grad_norm": 2.304209736108249, | |
| "learning_rate": 3.969978494803876e-05, | |
| "loss": 0.5765, | |
| "mean_token_accuracy": 0.841646671295166, | |
| "step": 147 | |
| }, | |
| { | |
| "epoch": 0.8680351906158358, | |
| "grad_norm": 2.6376352843496926, | |
| "learning_rate": 3.969351579431024e-05, | |
| "loss": 0.6115, | |
| "mean_token_accuracy": 0.8385377004742622, | |
| "step": 148 | |
| }, | |
| { | |
| "epoch": 0.873900293255132, | |
| "grad_norm": 2.7296649476772807, | |
| "learning_rate": 3.968718242026533e-05, | |
| "loss": 0.6022, | |
| "mean_token_accuracy": 0.8428103923797607, | |
| "step": 149 | |
| }, | |
| { | |
| "epoch": 0.8797653958944281, | |
| "grad_norm": 2.2084195177668, | |
| "learning_rate": 3.968078484889163e-05, | |
| "loss": 0.4707, | |
| "mean_token_accuracy": 0.8667677119374275, | |
| "step": 150 | |
| }, | |
| { | |
| "epoch": 0.8856304985337243, | |
| "grad_norm": 2.7078580526227976, | |
| "learning_rate": 3.9674323103409736e-05, | |
| "loss": 0.6618, | |
| "mean_token_accuracy": 0.8220425173640251, | |
| "step": 151 | |
| }, | |
| { | |
| "epoch": 0.8914956011730205, | |
| "grad_norm": 2.840612532292078, | |
| "learning_rate": 3.966779720727317e-05, | |
| "loss": 0.8325, | |
| "mean_token_accuracy": 0.7944880649447441, | |
| "step": 152 | |
| }, | |
| { | |
| "epoch": 0.8973607038123167, | |
| "grad_norm": 2.567048273653788, | |
| "learning_rate": 3.9661207184168305e-05, | |
| "loss": 0.6274, | |
| "mean_token_accuracy": 0.8346653878688812, | |
| "step": 153 | |
| }, | |
| { | |
| "epoch": 0.9032258064516129, | |
| "grad_norm": 2.4868154973678545, | |
| "learning_rate": 3.9654553058014265e-05, | |
| "loss": 0.7639, | |
| "mean_token_accuracy": 0.8028886467218399, | |
| "step": 154 | |
| }, | |
| { | |
| "epoch": 0.9090909090909091, | |
| "grad_norm": 2.347709315999483, | |
| "learning_rate": 3.9647834852962825e-05, | |
| "loss": 0.6179, | |
| "mean_token_accuracy": 0.8445746973156929, | |
| "step": 155 | |
| }, | |
| { | |
| "epoch": 0.9149560117302052, | |
| "grad_norm": 2.7197025616507804, | |
| "learning_rate": 3.964105259339838e-05, | |
| "loss": 0.8655, | |
| "mean_token_accuracy": 0.7824793308973312, | |
| "step": 156 | |
| }, | |
| { | |
| "epoch": 0.9208211143695014, | |
| "grad_norm": 2.2237444753974063, | |
| "learning_rate": 3.9634206303937773e-05, | |
| "loss": 0.5044, | |
| "mean_token_accuracy": 0.8604275360703468, | |
| "step": 157 | |
| }, | |
| { | |
| "epoch": 0.9266862170087976, | |
| "grad_norm": 2.0464596548682663, | |
| "learning_rate": 3.962729600943028e-05, | |
| "loss": 0.5075, | |
| "mean_token_accuracy": 0.8705720156431198, | |
| "step": 158 | |
| }, | |
| { | |
| "epoch": 0.9325513196480938, | |
| "grad_norm": 2.8654977822500167, | |
| "learning_rate": 3.962032173495748e-05, | |
| "loss": 0.5989, | |
| "mean_token_accuracy": 0.8563590124249458, | |
| "step": 159 | |
| }, | |
| { | |
| "epoch": 0.9384164222873901, | |
| "grad_norm": 2.095493932904228, | |
| "learning_rate": 3.961328350583316e-05, | |
| "loss": 0.5158, | |
| "mean_token_accuracy": 0.8581189513206482, | |
| "step": 160 | |
| }, | |
| { | |
| "epoch": 0.9442815249266863, | |
| "grad_norm": 2.4110117754062923, | |
| "learning_rate": 3.960618134760327e-05, | |
| "loss": 0.6664, | |
| "mean_token_accuracy": 0.8309664577245712, | |
| "step": 161 | |
| }, | |
| { | |
| "epoch": 0.9501466275659824, | |
| "grad_norm": 2.130310237043839, | |
| "learning_rate": 3.959901528604575e-05, | |
| "loss": 0.438, | |
| "mean_token_accuracy": 0.8650497943162918, | |
| "step": 162 | |
| }, | |
| { | |
| "epoch": 0.9560117302052786, | |
| "grad_norm": 2.7110395857688534, | |
| "learning_rate": 3.959178534717053e-05, | |
| "loss": 0.7384, | |
| "mean_token_accuracy": 0.8108371719717979, | |
| "step": 163 | |
| }, | |
| { | |
| "epoch": 0.9618768328445748, | |
| "grad_norm": 2.3266814167894356, | |
| "learning_rate": 3.9584491557219366e-05, | |
| "loss": 0.692, | |
| "mean_token_accuracy": 0.8317501842975616, | |
| "step": 164 | |
| }, | |
| { | |
| "epoch": 0.967741935483871, | |
| "grad_norm": 2.227229660768314, | |
| "learning_rate": 3.957713394266576e-05, | |
| "loss": 0.5823, | |
| "mean_token_accuracy": 0.8335886374115944, | |
| "step": 165 | |
| }, | |
| { | |
| "epoch": 0.9736070381231672, | |
| "grad_norm": 2.5029650929135907, | |
| "learning_rate": 3.956971253021489e-05, | |
| "loss": 0.5433, | |
| "mean_token_accuracy": 0.8478502333164215, | |
| "step": 166 | |
| }, | |
| { | |
| "epoch": 0.9794721407624634, | |
| "grad_norm": 2.5935021305346884, | |
| "learning_rate": 3.956222734680348e-05, | |
| "loss": 0.6178, | |
| "mean_token_accuracy": 0.839194655418396, | |
| "step": 167 | |
| }, | |
| { | |
| "epoch": 0.9853372434017595, | |
| "grad_norm": 2.4490076558704454, | |
| "learning_rate": 3.955467841959972e-05, | |
| "loss": 0.6454, | |
| "mean_token_accuracy": 0.8422679975628853, | |
| "step": 168 | |
| }, | |
| { | |
| "epoch": 0.9912023460410557, | |
| "grad_norm": 2.3280083402193092, | |
| "learning_rate": 3.954706577600318e-05, | |
| "loss": 0.6046, | |
| "mean_token_accuracy": 0.8274341821670532, | |
| "step": 169 | |
| }, | |
| { | |
| "epoch": 0.9970674486803519, | |
| "grad_norm": 2.1986709980959795, | |
| "learning_rate": 3.953938944364467e-05, | |
| "loss": 0.7424, | |
| "mean_token_accuracy": 0.8125108182430267, | |
| "step": 170 | |
| }, | |
| { | |
| "epoch": 1.0, | |
| "grad_norm": 2.1986709980959795, | |
| "learning_rate": 3.953164945038618e-05, | |
| "loss": 0.6563, | |
| "mean_token_accuracy": 0.8526751548051834, | |
| "step": 171 | |
| }, | |
| { | |
| "epoch": 1.0058651026392962, | |
| "grad_norm": 3.2772084342860626, | |
| "learning_rate": 3.952384582432076e-05, | |
| "loss": 0.4182, | |
| "mean_token_accuracy": 0.8776650503277779, | |
| "step": 172 | |
| }, | |
| { | |
| "epoch": 1.0117302052785924, | |
| "grad_norm": 2.5235581743376145, | |
| "learning_rate": 3.9515978593772426e-05, | |
| "loss": 0.3567, | |
| "mean_token_accuracy": 0.8968701064586639, | |
| "step": 173 | |
| }, | |
| { | |
| "epoch": 1.0175953079178885, | |
| "grad_norm": 2.297685151844413, | |
| "learning_rate": 3.9508047787296034e-05, | |
| "loss": 0.2678, | |
| "mean_token_accuracy": 0.9174044728279114, | |
| "step": 174 | |
| }, | |
| { | |
| "epoch": 1.0234604105571847, | |
| "grad_norm": 2.2108090381629717, | |
| "learning_rate": 3.9500053433677226e-05, | |
| "loss": 0.3188, | |
| "mean_token_accuracy": 0.9073121473193169, | |
| "step": 175 | |
| }, | |
| { | |
| "epoch": 1.029325513196481, | |
| "grad_norm": 2.0055668794708144, | |
| "learning_rate": 3.949199556193226e-05, | |
| "loss": 0.3792, | |
| "mean_token_accuracy": 0.8863450139760971, | |
| "step": 176 | |
| }, | |
| { | |
| "epoch": 1.035190615835777, | |
| "grad_norm": 2.1578927026914116, | |
| "learning_rate": 3.948387420130796e-05, | |
| "loss": 0.268, | |
| "mean_token_accuracy": 0.926617331802845, | |
| "step": 177 | |
| }, | |
| { | |
| "epoch": 1.0410557184750733, | |
| "grad_norm": 2.107269518420395, | |
| "learning_rate": 3.94756893812816e-05, | |
| "loss": 0.3907, | |
| "mean_token_accuracy": 0.8920909613370895, | |
| "step": 178 | |
| }, | |
| { | |
| "epoch": 1.0469208211143695, | |
| "grad_norm": 2.528904479053577, | |
| "learning_rate": 3.946744113156075e-05, | |
| "loss": 0.3055, | |
| "mean_token_accuracy": 0.8956907019019127, | |
| "step": 179 | |
| }, | |
| { | |
| "epoch": 1.0527859237536656, | |
| "grad_norm": 2.3965215093079557, | |
| "learning_rate": 3.945912948208324e-05, | |
| "loss": 0.4794, | |
| "mean_token_accuracy": 0.8625766634941101, | |
| "step": 180 | |
| }, | |
| { | |
| "epoch": 1.0586510263929618, | |
| "grad_norm": 2.4900117595217726, | |
| "learning_rate": 3.9450754463016994e-05, | |
| "loss": 0.4096, | |
| "mean_token_accuracy": 0.8826896324753761, | |
| "step": 181 | |
| }, | |
| { | |
| "epoch": 1.064516129032258, | |
| "grad_norm": 2.4103564653720153, | |
| "learning_rate": 3.9442316104759955e-05, | |
| "loss": 0.3678, | |
| "mean_token_accuracy": 0.9017849788069725, | |
| "step": 182 | |
| }, | |
| { | |
| "epoch": 1.0703812316715542, | |
| "grad_norm": 1.861075597205315, | |
| "learning_rate": 3.943381443793994e-05, | |
| "loss": 0.4068, | |
| "mean_token_accuracy": 0.8929754197597504, | |
| "step": 183 | |
| }, | |
| { | |
| "epoch": 1.0762463343108504, | |
| "grad_norm": 2.4115954465623477, | |
| "learning_rate": 3.9425249493414585e-05, | |
| "loss": 0.6112, | |
| "mean_token_accuracy": 0.8370330631732941, | |
| "step": 184 | |
| }, | |
| { | |
| "epoch": 1.0821114369501466, | |
| "grad_norm": 4.567207793686968, | |
| "learning_rate": 3.941662130227118e-05, | |
| "loss": 0.4997, | |
| "mean_token_accuracy": 0.8561821803450584, | |
| "step": 185 | |
| }, | |
| { | |
| "epoch": 1.0879765395894427, | |
| "grad_norm": 2.4183931239021184, | |
| "learning_rate": 3.940792989582654e-05, | |
| "loss": 0.4121, | |
| "mean_token_accuracy": 0.8887381628155708, | |
| "step": 186 | |
| }, | |
| { | |
| "epoch": 1.093841642228739, | |
| "grad_norm": 1.970433465247901, | |
| "learning_rate": 3.939917530562701e-05, | |
| "loss": 0.3054, | |
| "mean_token_accuracy": 0.9093934372067451, | |
| "step": 187 | |
| }, | |
| { | |
| "epoch": 1.099706744868035, | |
| "grad_norm": 2.0703604995218243, | |
| "learning_rate": 3.939035756344818e-05, | |
| "loss": 0.3744, | |
| "mean_token_accuracy": 0.9033533856272697, | |
| "step": 188 | |
| }, | |
| { | |
| "epoch": 1.1055718475073313, | |
| "grad_norm": 2.101467735227217, | |
| "learning_rate": 3.93814767012949e-05, | |
| "loss": 0.4006, | |
| "mean_token_accuracy": 0.8802796825766563, | |
| "step": 189 | |
| }, | |
| { | |
| "epoch": 1.1114369501466275, | |
| "grad_norm": 2.455916126917878, | |
| "learning_rate": 3.937253275140113e-05, | |
| "loss": 0.2646, | |
| "mean_token_accuracy": 0.923853725194931, | |
| "step": 190 | |
| }, | |
| { | |
| "epoch": 1.1173020527859236, | |
| "grad_norm": 2.1535792407926446, | |
| "learning_rate": 3.936352574622978e-05, | |
| "loss": 0.2866, | |
| "mean_token_accuracy": 0.9053800255060196, | |
| "step": 191 | |
| }, | |
| { | |
| "epoch": 1.1231671554252198, | |
| "grad_norm": 1.775502805179243, | |
| "learning_rate": 3.9354455718472646e-05, | |
| "loss": 0.396, | |
| "mean_token_accuracy": 0.8891168534755707, | |
| "step": 192 | |
| }, | |
| { | |
| "epoch": 1.129032258064516, | |
| "grad_norm": 2.5487239609300123, | |
| "learning_rate": 3.934532270105026e-05, | |
| "loss": 0.4241, | |
| "mean_token_accuracy": 0.8907202184200287, | |
| "step": 193 | |
| }, | |
| { | |
| "epoch": 1.1348973607038122, | |
| "grad_norm": 2.7940170270563534, | |
| "learning_rate": 3.933612672711179e-05, | |
| "loss": 0.4029, | |
| "mean_token_accuracy": 0.8868995606899261, | |
| "step": 194 | |
| }, | |
| { | |
| "epoch": 1.1407624633431086, | |
| "grad_norm": 2.2167275228202854, | |
| "learning_rate": 3.9326867830034915e-05, | |
| "loss": 0.4188, | |
| "mean_token_accuracy": 0.8792544528841972, | |
| "step": 195 | |
| }, | |
| { | |
| "epoch": 1.1466275659824048, | |
| "grad_norm": 2.304107316447215, | |
| "learning_rate": 3.931754604342568e-05, | |
| "loss": 0.3405, | |
| "mean_token_accuracy": 0.8975262865424156, | |
| "step": 196 | |
| }, | |
| { | |
| "epoch": 1.152492668621701, | |
| "grad_norm": 1.9938659279661923, | |
| "learning_rate": 3.930816140111842e-05, | |
| "loss": 0.2866, | |
| "mean_token_accuracy": 0.909386046230793, | |
| "step": 197 | |
| }, | |
| { | |
| "epoch": 1.1583577712609971, | |
| "grad_norm": 2.1198141801508283, | |
| "learning_rate": 3.929871393717558e-05, | |
| "loss": 0.3839, | |
| "mean_token_accuracy": 0.9004511907696724, | |
| "step": 198 | |
| }, | |
| { | |
| "epoch": 1.1642228739002933, | |
| "grad_norm": 2.7406704106077573, | |
| "learning_rate": 3.9289203685887644e-05, | |
| "loss": 0.4047, | |
| "mean_token_accuracy": 0.8865419179201126, | |
| "step": 199 | |
| }, | |
| { | |
| "epoch": 1.1700879765395895, | |
| "grad_norm": 2.5501983405080817, | |
| "learning_rate": 3.927963068177299e-05, | |
| "loss": 0.4452, | |
| "mean_token_accuracy": 0.8658623695373535, | |
| "step": 200 | |
| }, | |
| { | |
| "epoch": 1.1759530791788857, | |
| "grad_norm": 2.5615420993615925, | |
| "learning_rate": 3.926999495957775e-05, | |
| "loss": 0.5242, | |
| "mean_token_accuracy": 0.8514630421996117, | |
| "step": 201 | |
| }, | |
| { | |
| "epoch": 1.1818181818181819, | |
| "grad_norm": 2.2053976497237104, | |
| "learning_rate": 3.9260296554275704e-05, | |
| "loss": 0.5312, | |
| "mean_token_accuracy": 0.8589539080858231, | |
| "step": 202 | |
| }, | |
| { | |
| "epoch": 1.187683284457478, | |
| "grad_norm": 2.112972722429536, | |
| "learning_rate": 3.925053550106815e-05, | |
| "loss": 0.3756, | |
| "mean_token_accuracy": 0.8889518976211548, | |
| "step": 203 | |
| }, | |
| { | |
| "epoch": 1.1935483870967742, | |
| "grad_norm": 1.9377609554676252, | |
| "learning_rate": 3.9240711835383766e-05, | |
| "loss": 0.3399, | |
| "mean_token_accuracy": 0.8915588706731796, | |
| "step": 204 | |
| }, | |
| { | |
| "epoch": 1.1994134897360704, | |
| "grad_norm": 2.21970803239968, | |
| "learning_rate": 3.9230825592878494e-05, | |
| "loss": 0.3734, | |
| "mean_token_accuracy": 0.8954818993806839, | |
| "step": 205 | |
| }, | |
| { | |
| "epoch": 1.2052785923753666, | |
| "grad_norm": 2.4553958141146444, | |
| "learning_rate": 3.92208768094354e-05, | |
| "loss": 0.3015, | |
| "mean_token_accuracy": 0.9162927344441414, | |
| "step": 206 | |
| }, | |
| { | |
| "epoch": 1.2111436950146628, | |
| "grad_norm": 1.9800319673267526, | |
| "learning_rate": 3.921086552116455e-05, | |
| "loss": 0.3349, | |
| "mean_token_accuracy": 0.9030973836779594, | |
| "step": 207 | |
| }, | |
| { | |
| "epoch": 1.217008797653959, | |
| "grad_norm": 2.1007218191930335, | |
| "learning_rate": 3.920079176440288e-05, | |
| "loss": 0.3028, | |
| "mean_token_accuracy": 0.916605718433857, | |
| "step": 208 | |
| }, | |
| { | |
| "epoch": 1.2228739002932552, | |
| "grad_norm": 2.547001474595905, | |
| "learning_rate": 3.9190655575714045e-05, | |
| "loss": 0.5017, | |
| "mean_token_accuracy": 0.8750176280736923, | |
| "step": 209 | |
| }, | |
| { | |
| "epoch": 1.2287390029325513, | |
| "grad_norm": 2.4821613370193045, | |
| "learning_rate": 3.918045699188833e-05, | |
| "loss": 0.3779, | |
| "mean_token_accuracy": 0.8920472636818886, | |
| "step": 210 | |
| }, | |
| { | |
| "epoch": 1.2346041055718475, | |
| "grad_norm": 1.9926481632824355, | |
| "learning_rate": 3.9170196049942474e-05, | |
| "loss": 0.3206, | |
| "mean_token_accuracy": 0.9034112691879272, | |
| "step": 211 | |
| }, | |
| { | |
| "epoch": 1.2404692082111437, | |
| "grad_norm": 1.8149346899691008, | |
| "learning_rate": 3.915987278711954e-05, | |
| "loss": 0.2996, | |
| "mean_token_accuracy": 0.9074381738901138, | |
| "step": 212 | |
| }, | |
| { | |
| "epoch": 1.2463343108504399, | |
| "grad_norm": 1.690371089487332, | |
| "learning_rate": 3.914948724088883e-05, | |
| "loss": 0.4503, | |
| "mean_token_accuracy": 0.8817943632602692, | |
| "step": 213 | |
| }, | |
| { | |
| "epoch": 1.252199413489736, | |
| "grad_norm": 2.3425310839455933, | |
| "learning_rate": 3.913903944894565e-05, | |
| "loss": 0.3848, | |
| "mean_token_accuracy": 0.8884705975651741, | |
| "step": 214 | |
| }, | |
| { | |
| "epoch": 1.2580645161290323, | |
| "grad_norm": 1.8082230444778677, | |
| "learning_rate": 3.912852944921129e-05, | |
| "loss": 0.3576, | |
| "mean_token_accuracy": 0.8995041996240616, | |
| "step": 215 | |
| }, | |
| { | |
| "epoch": 1.2639296187683284, | |
| "grad_norm": 2.156095363781735, | |
| "learning_rate": 3.911795727983279e-05, | |
| "loss": 0.3768, | |
| "mean_token_accuracy": 0.9000616893172264, | |
| "step": 216 | |
| }, | |
| { | |
| "epoch": 1.2697947214076246, | |
| "grad_norm": 2.056525651901662, | |
| "learning_rate": 3.910732297918285e-05, | |
| "loss": 0.4354, | |
| "mean_token_accuracy": 0.8829119503498077, | |
| "step": 217 | |
| }, | |
| { | |
| "epoch": 1.2756598240469208, | |
| "grad_norm": 2.569771491183553, | |
| "learning_rate": 3.90966265858597e-05, | |
| "loss": 0.4653, | |
| "mean_token_accuracy": 0.8823570907115936, | |
| "step": 218 | |
| }, | |
| { | |
| "epoch": 1.281524926686217, | |
| "grad_norm": 2.132704573151928, | |
| "learning_rate": 3.908586813868693e-05, | |
| "loss": 0.4343, | |
| "mean_token_accuracy": 0.884559653699398, | |
| "step": 219 | |
| }, | |
| { | |
| "epoch": 1.2873900293255132, | |
| "grad_norm": 2.4720965949429736, | |
| "learning_rate": 3.9075047676713354e-05, | |
| "loss": 0.4649, | |
| "mean_token_accuracy": 0.874346137046814, | |
| "step": 220 | |
| }, | |
| { | |
| "epoch": 1.2932551319648093, | |
| "grad_norm": 2.0429710071111704, | |
| "learning_rate": 3.9064165239212874e-05, | |
| "loss": 0.4416, | |
| "mean_token_accuracy": 0.8792015537619591, | |
| "step": 221 | |
| }, | |
| { | |
| "epoch": 1.2991202346041055, | |
| "grad_norm": 2.0216506902906413, | |
| "learning_rate": 3.905322086568434e-05, | |
| "loss": 0.4349, | |
| "mean_token_accuracy": 0.8829491958022118, | |
| "step": 222 | |
| }, | |
| { | |
| "epoch": 1.3049853372434017, | |
| "grad_norm": 2.7526271970057388, | |
| "learning_rate": 3.904221459585142e-05, | |
| "loss": 0.3743, | |
| "mean_token_accuracy": 0.887954942882061, | |
| "step": 223 | |
| }, | |
| { | |
| "epoch": 1.310850439882698, | |
| "grad_norm": 2.0368517762814884, | |
| "learning_rate": 3.903114646966242e-05, | |
| "loss": 0.4253, | |
| "mean_token_accuracy": 0.8919025957584381, | |
| "step": 224 | |
| }, | |
| { | |
| "epoch": 1.316715542521994, | |
| "grad_norm": 1.9783869883280967, | |
| "learning_rate": 3.9020016527290166e-05, | |
| "loss": 0.3979, | |
| "mean_token_accuracy": 0.8797405809164047, | |
| "step": 225 | |
| }, | |
| { | |
| "epoch": 1.3225806451612903, | |
| "grad_norm": 1.7298586661221587, | |
| "learning_rate": 3.900882480913185e-05, | |
| "loss": 0.2768, | |
| "mean_token_accuracy": 0.9170741513371468, | |
| "step": 226 | |
| }, | |
| { | |
| "epoch": 1.3284457478005864, | |
| "grad_norm": 1.9723648447844537, | |
| "learning_rate": 3.899757135580891e-05, | |
| "loss": 0.6138, | |
| "mean_token_accuracy": 0.8599758371710777, | |
| "step": 227 | |
| }, | |
| { | |
| "epoch": 1.3343108504398826, | |
| "grad_norm": 3.9254570692519297, | |
| "learning_rate": 3.898625620816681e-05, | |
| "loss": 0.3718, | |
| "mean_token_accuracy": 0.887211762368679, | |
| "step": 228 | |
| }, | |
| { | |
| "epoch": 1.3401759530791788, | |
| "grad_norm": 2.399705196668241, | |
| "learning_rate": 3.8974879407275e-05, | |
| "loss": 0.5152, | |
| "mean_token_accuracy": 0.858745202422142, | |
| "step": 229 | |
| }, | |
| { | |
| "epoch": 1.3460410557184752, | |
| "grad_norm": 2.5271848770288563, | |
| "learning_rate": 3.896344099442663e-05, | |
| "loss": 0.3902, | |
| "mean_token_accuracy": 0.8907849490642548, | |
| "step": 230 | |
| }, | |
| { | |
| "epoch": 1.3519061583577714, | |
| "grad_norm": 1.9404506617545383, | |
| "learning_rate": 3.895194101113855e-05, | |
| "loss": 0.3174, | |
| "mean_token_accuracy": 0.8844568431377411, | |
| "step": 231 | |
| }, | |
| { | |
| "epoch": 1.3577712609970676, | |
| "grad_norm": 2.0665741764137984, | |
| "learning_rate": 3.894037949915104e-05, | |
| "loss": 0.361, | |
| "mean_token_accuracy": 0.9073267132043839, | |
| "step": 232 | |
| }, | |
| { | |
| "epoch": 1.3636363636363638, | |
| "grad_norm": 1.7521385020924563, | |
| "learning_rate": 3.8928756500427735e-05, | |
| "loss": 0.3729, | |
| "mean_token_accuracy": 0.8856314644217491, | |
| "step": 233 | |
| }, | |
| { | |
| "epoch": 1.36950146627566, | |
| "grad_norm": 4.272339166861924, | |
| "learning_rate": 3.89170720571554e-05, | |
| "loss": 0.3891, | |
| "mean_token_accuracy": 0.8926250636577606, | |
| "step": 234 | |
| }, | |
| { | |
| "epoch": 1.3753665689149561, | |
| "grad_norm": 2.0844003036828957, | |
| "learning_rate": 3.890532621174387e-05, | |
| "loss": 0.3451, | |
| "mean_token_accuracy": 0.8915408626198769, | |
| "step": 235 | |
| }, | |
| { | |
| "epoch": 1.3812316715542523, | |
| "grad_norm": 1.9289770077407407, | |
| "learning_rate": 3.8893519006825806e-05, | |
| "loss": 0.3459, | |
| "mean_token_accuracy": 0.8924594298005104, | |
| "step": 236 | |
| }, | |
| { | |
| "epoch": 1.3870967741935485, | |
| "grad_norm": 2.2301290012638133, | |
| "learning_rate": 3.88816504852566e-05, | |
| "loss": 0.3396, | |
| "mean_token_accuracy": 0.9005770832300186, | |
| "step": 237 | |
| }, | |
| { | |
| "epoch": 1.3929618768328447, | |
| "grad_norm": 2.171361614601455, | |
| "learning_rate": 3.886972069011419e-05, | |
| "loss": 0.5869, | |
| "mean_token_accuracy": 0.8503864109516144, | |
| "step": 238 | |
| }, | |
| { | |
| "epoch": 1.3988269794721409, | |
| "grad_norm": 2.6775249334268616, | |
| "learning_rate": 3.885772966469891e-05, | |
| "loss": 0.366, | |
| "mean_token_accuracy": 0.8903994932770729, | |
| "step": 239 | |
| }, | |
| { | |
| "epoch": 1.404692082111437, | |
| "grad_norm": 4.079621344494869, | |
| "learning_rate": 3.884567745253335e-05, | |
| "loss": 0.2753, | |
| "mean_token_accuracy": 0.9139163419604301, | |
| "step": 240 | |
| }, | |
| { | |
| "epoch": 1.4105571847507332, | |
| "grad_norm": 1.776096950687329, | |
| "learning_rate": 3.8833564097362157e-05, | |
| "loss": 0.4289, | |
| "mean_token_accuracy": 0.8798037841916084, | |
| "step": 241 | |
| }, | |
| { | |
| "epoch": 1.4164222873900294, | |
| "grad_norm": 1.834917253645093, | |
| "learning_rate": 3.8821389643151924e-05, | |
| "loss": 0.2886, | |
| "mean_token_accuracy": 0.9188757091760635, | |
| "step": 242 | |
| }, | |
| { | |
| "epoch": 1.4222873900293256, | |
| "grad_norm": 1.902514045474563, | |
| "learning_rate": 3.880915413409102e-05, | |
| "loss": 0.3386, | |
| "mean_token_accuracy": 0.9072437360882759, | |
| "step": 243 | |
| }, | |
| { | |
| "epoch": 1.4281524926686218, | |
| "grad_norm": 1.6924480184967565, | |
| "learning_rate": 3.879685761458938e-05, | |
| "loss": 0.4339, | |
| "mean_token_accuracy": 0.8632926791906357, | |
| "step": 244 | |
| }, | |
| { | |
| "epoch": 1.434017595307918, | |
| "grad_norm": 1.911568055017853, | |
| "learning_rate": 3.8784500129278405e-05, | |
| "loss": 0.2714, | |
| "mean_token_accuracy": 0.9203036949038506, | |
| "step": 245 | |
| }, | |
| { | |
| "epoch": 1.4398826979472141, | |
| "grad_norm": 1.9969412513036282, | |
| "learning_rate": 3.877208172301079e-05, | |
| "loss": 0.4284, | |
| "mean_token_accuracy": 0.8679738715291023, | |
| "step": 246 | |
| }, | |
| { | |
| "epoch": 1.4457478005865103, | |
| "grad_norm": 1.787789982945381, | |
| "learning_rate": 3.875960244086032e-05, | |
| "loss": 0.345, | |
| "mean_token_accuracy": 0.9001687616109848, | |
| "step": 247 | |
| }, | |
| { | |
| "epoch": 1.4516129032258065, | |
| "grad_norm": 2.1914713980298757, | |
| "learning_rate": 3.8747062328121756e-05, | |
| "loss": 0.3879, | |
| "mean_token_accuracy": 0.8989882245659828, | |
| "step": 248 | |
| }, | |
| { | |
| "epoch": 1.4574780058651027, | |
| "grad_norm": 1.7197818602016577, | |
| "learning_rate": 3.873446143031064e-05, | |
| "loss": 0.2785, | |
| "mean_token_accuracy": 0.9228793308138847, | |
| "step": 249 | |
| }, | |
| { | |
| "epoch": 1.4633431085043989, | |
| "grad_norm": 1.9165780404977792, | |
| "learning_rate": 3.872179979316314e-05, | |
| "loss": 0.3064, | |
| "mean_token_accuracy": 0.908599853515625, | |
| "step": 250 | |
| }, | |
| { | |
| "epoch": 1.469208211143695, | |
| "grad_norm": 1.6704715586382999, | |
| "learning_rate": 3.870907746263589e-05, | |
| "loss": 0.2689, | |
| "mean_token_accuracy": 0.9201952144503593, | |
| "step": 251 | |
| }, | |
| { | |
| "epoch": 1.4750733137829912, | |
| "grad_norm": 1.6995923765648426, | |
| "learning_rate": 3.869629448490582e-05, | |
| "loss": 0.3333, | |
| "mean_token_accuracy": 0.9088203087449074, | |
| "step": 252 | |
| }, | |
| { | |
| "epoch": 1.4809384164222874, | |
| "grad_norm": 1.564555577131926, | |
| "learning_rate": 3.868345090636995e-05, | |
| "loss": 0.3582, | |
| "mean_token_accuracy": 0.9006411507725716, | |
| "step": 253 | |
| }, | |
| { | |
| "epoch": 1.4868035190615836, | |
| "grad_norm": 1.944612250081467, | |
| "learning_rate": 3.867054677364531e-05, | |
| "loss": 0.347, | |
| "mean_token_accuracy": 0.8948578387498856, | |
| "step": 254 | |
| }, | |
| { | |
| "epoch": 1.4926686217008798, | |
| "grad_norm": 1.9767571384054532, | |
| "learning_rate": 3.865758213356868e-05, | |
| "loss": 0.3588, | |
| "mean_token_accuracy": 0.892326109111309, | |
| "step": 255 | |
| }, | |
| { | |
| "epoch": 1.498533724340176, | |
| "grad_norm": 2.1786507345414625, | |
| "learning_rate": 3.8644557033196456e-05, | |
| "loss": 0.3443, | |
| "mean_token_accuracy": 0.900772362947464, | |
| "step": 256 | |
| }, | |
| { | |
| "epoch": 1.5043988269794721, | |
| "grad_norm": 1.727787016452984, | |
| "learning_rate": 3.8631471519804514e-05, | |
| "loss": 0.3939, | |
| "mean_token_accuracy": 0.899748906493187, | |
| "step": 257 | |
| }, | |
| { | |
| "epoch": 1.5102639296187683, | |
| "grad_norm": 2.2435616485276584, | |
| "learning_rate": 3.861832564088797e-05, | |
| "loss": 0.4526, | |
| "mean_token_accuracy": 0.8786605149507523, | |
| "step": 258 | |
| }, | |
| { | |
| "epoch": 1.5161290322580645, | |
| "grad_norm": 2.1226358626723485, | |
| "learning_rate": 3.860511944416105e-05, | |
| "loss": 0.2974, | |
| "mean_token_accuracy": 0.911198802292347, | |
| "step": 259 | |
| }, | |
| { | |
| "epoch": 1.5219941348973607, | |
| "grad_norm": 1.9277458511435361, | |
| "learning_rate": 3.859185297755693e-05, | |
| "loss": 0.3021, | |
| "mean_token_accuracy": 0.9092141538858414, | |
| "step": 260 | |
| }, | |
| { | |
| "epoch": 1.5278592375366569, | |
| "grad_norm": 1.5812796708802024, | |
| "learning_rate": 3.857852628922751e-05, | |
| "loss": 0.2656, | |
| "mean_token_accuracy": 0.9256365522742271, | |
| "step": 261 | |
| }, | |
| { | |
| "epoch": 1.533724340175953, | |
| "grad_norm": 2.037937113721398, | |
| "learning_rate": 3.856513942754329e-05, | |
| "loss": 0.3173, | |
| "mean_token_accuracy": 0.904233492910862, | |
| "step": 262 | |
| }, | |
| { | |
| "epoch": 1.5395894428152492, | |
| "grad_norm": 1.6287239034648493, | |
| "learning_rate": 3.8551692441093183e-05, | |
| "loss": 0.2402, | |
| "mean_token_accuracy": 0.927992507815361, | |
| "step": 263 | |
| }, | |
| { | |
| "epoch": 1.5454545454545454, | |
| "grad_norm": 1.7234547040810766, | |
| "learning_rate": 3.85381853786843e-05, | |
| "loss": 0.4069, | |
| "mean_token_accuracy": 0.8724810630083084, | |
| "step": 264 | |
| }, | |
| { | |
| "epoch": 1.5513196480938416, | |
| "grad_norm": 1.9738399030452272, | |
| "learning_rate": 3.852461828934184e-05, | |
| "loss": 0.3796, | |
| "mean_token_accuracy": 0.8994789123535156, | |
| "step": 265 | |
| }, | |
| { | |
| "epoch": 1.5571847507331378, | |
| "grad_norm": 1.7419666030613636, | |
| "learning_rate": 3.851099122230885e-05, | |
| "loss": 0.2919, | |
| "mean_token_accuracy": 0.9129809066653252, | |
| "step": 266 | |
| }, | |
| { | |
| "epoch": 1.563049853372434, | |
| "grad_norm": 1.628062291876651, | |
| "learning_rate": 3.849730422704608e-05, | |
| "loss": 0.4191, | |
| "mean_token_accuracy": 0.8906111344695091, | |
| "step": 267 | |
| }, | |
| { | |
| "epoch": 1.5689149560117301, | |
| "grad_norm": 2.0803530282960305, | |
| "learning_rate": 3.84835573532318e-05, | |
| "loss": 0.2656, | |
| "mean_token_accuracy": 0.9183074086904526, | |
| "step": 268 | |
| }, | |
| { | |
| "epoch": 1.5747800586510263, | |
| "grad_norm": 1.8607789222018924, | |
| "learning_rate": 3.84697506507616e-05, | |
| "loss": 0.3953, | |
| "mean_token_accuracy": 0.8888256177306175, | |
| "step": 269 | |
| }, | |
| { | |
| "epoch": 1.5806451612903225, | |
| "grad_norm": 2.130794994174186, | |
| "learning_rate": 3.845588416974824e-05, | |
| "loss": 0.3848, | |
| "mean_token_accuracy": 0.9071919843554497, | |
| "step": 270 | |
| }, | |
| { | |
| "epoch": 1.5865102639296187, | |
| "grad_norm": 1.9020220222823165, | |
| "learning_rate": 3.844195796052144e-05, | |
| "loss": 0.3578, | |
| "mean_token_accuracy": 0.9020521864295006, | |
| "step": 271 | |
| }, | |
| { | |
| "epoch": 1.5923753665689149, | |
| "grad_norm": 1.8658553019399349, | |
| "learning_rate": 3.8427972073627724e-05, | |
| "loss": 0.5285, | |
| "mean_token_accuracy": 0.8661686107516289, | |
| "step": 272 | |
| }, | |
| { | |
| "epoch": 1.598240469208211, | |
| "grad_norm": 2.0730360398150642, | |
| "learning_rate": 3.841392655983021e-05, | |
| "loss": 0.2402, | |
| "mean_token_accuracy": 0.9231050238013268, | |
| "step": 273 | |
| }, | |
| { | |
| "epoch": 1.6041055718475072, | |
| "grad_norm": 1.34555292540441, | |
| "learning_rate": 3.8399821470108444e-05, | |
| "loss": 0.2042, | |
| "mean_token_accuracy": 0.9374109655618668, | |
| "step": 274 | |
| }, | |
| { | |
| "epoch": 1.6099706744868034, | |
| "grad_norm": 1.9527205991987846, | |
| "learning_rate": 3.838565685565819e-05, | |
| "loss": 0.4687, | |
| "mean_token_accuracy": 0.8773292899131775, | |
| "step": 275 | |
| }, | |
| { | |
| "epoch": 1.6158357771260996, | |
| "grad_norm": 1.8883224403836536, | |
| "learning_rate": 3.8371432767891295e-05, | |
| "loss": 0.3526, | |
| "mean_token_accuracy": 0.903610423207283, | |
| "step": 276 | |
| }, | |
| { | |
| "epoch": 1.6217008797653958, | |
| "grad_norm": 1.7546704899486176, | |
| "learning_rate": 3.8357149258435444e-05, | |
| "loss": 0.2904, | |
| "mean_token_accuracy": 0.9173143953084946, | |
| "step": 277 | |
| }, | |
| { | |
| "epoch": 1.627565982404692, | |
| "grad_norm": 2.0511854336671473, | |
| "learning_rate": 3.8342806379134005e-05, | |
| "loss": 0.4361, | |
| "mean_token_accuracy": 0.8790148869156837, | |
| "step": 278 | |
| }, | |
| { | |
| "epoch": 1.6334310850439882, | |
| "grad_norm": 2.0568091503722035, | |
| "learning_rate": 3.8328404182045854e-05, | |
| "loss": 0.3654, | |
| "mean_token_accuracy": 0.9027048945426941, | |
| "step": 279 | |
| }, | |
| { | |
| "epoch": 1.6392961876832843, | |
| "grad_norm": 2.0230684624428235, | |
| "learning_rate": 3.831394271944512e-05, | |
| "loss": 0.358, | |
| "mean_token_accuracy": 0.9075604230165482, | |
| "step": 280 | |
| }, | |
| { | |
| "epoch": 1.6451612903225805, | |
| "grad_norm": 1.894319774470119, | |
| "learning_rate": 3.82994220438211e-05, | |
| "loss": 0.3609, | |
| "mean_token_accuracy": 0.8942530304193497, | |
| "step": 281 | |
| }, | |
| { | |
| "epoch": 1.6510263929618767, | |
| "grad_norm": 1.8981276803059295, | |
| "learning_rate": 3.828484220787797e-05, | |
| "loss": 0.3854, | |
| "mean_token_accuracy": 0.89190324395895, | |
| "step": 282 | |
| }, | |
| { | |
| "epoch": 1.6568914956011729, | |
| "grad_norm": 2.2732377477966326, | |
| "learning_rate": 3.8270203264534644e-05, | |
| "loss": 0.4659, | |
| "mean_token_accuracy": 0.8762017115950584, | |
| "step": 283 | |
| }, | |
| { | |
| "epoch": 1.662756598240469, | |
| "grad_norm": 1.836975254349824, | |
| "learning_rate": 3.8255505266924585e-05, | |
| "loss": 0.3508, | |
| "mean_token_accuracy": 0.897898942232132, | |
| "step": 284 | |
| }, | |
| { | |
| "epoch": 1.6686217008797652, | |
| "grad_norm": 1.8958858575917799, | |
| "learning_rate": 3.824074826839557e-05, | |
| "loss": 0.2678, | |
| "mean_token_accuracy": 0.9237991869449615, | |
| "step": 285 | |
| }, | |
| { | |
| "epoch": 1.6744868035190614, | |
| "grad_norm": 2.4908800796256765, | |
| "learning_rate": 3.822593232250956e-05, | |
| "loss": 0.4807, | |
| "mean_token_accuracy": 0.8728378862142563, | |
| "step": 286 | |
| }, | |
| { | |
| "epoch": 1.6803519061583576, | |
| "grad_norm": 2.449596026859127, | |
| "learning_rate": 3.8211057483042446e-05, | |
| "loss": 0.5197, | |
| "mean_token_accuracy": 0.8701219037175179, | |
| "step": 287 | |
| }, | |
| { | |
| "epoch": 1.6862170087976538, | |
| "grad_norm": 2.126362403478398, | |
| "learning_rate": 3.8196123803983895e-05, | |
| "loss": 0.3782, | |
| "mean_token_accuracy": 0.8976033478975296, | |
| "step": 288 | |
| }, | |
| { | |
| "epoch": 1.6920821114369502, | |
| "grad_norm": 1.9552263402442664, | |
| "learning_rate": 3.818113133953712e-05, | |
| "loss": 0.3354, | |
| "mean_token_accuracy": 0.9046521782875061, | |
| "step": 289 | |
| }, | |
| { | |
| "epoch": 1.6979472140762464, | |
| "grad_norm": 1.559218994516208, | |
| "learning_rate": 3.816608014411872e-05, | |
| "loss": 0.2451, | |
| "mean_token_accuracy": 0.9280602782964706, | |
| "step": 290 | |
| }, | |
| { | |
| "epoch": 1.7038123167155426, | |
| "grad_norm": 1.5955884312797561, | |
| "learning_rate": 3.815097027235845e-05, | |
| "loss": 0.3444, | |
| "mean_token_accuracy": 0.8998289778828621, | |
| "step": 291 | |
| }, | |
| { | |
| "epoch": 1.7096774193548387, | |
| "grad_norm": 1.923537625514968, | |
| "learning_rate": 3.813580177909906e-05, | |
| "loss": 0.2937, | |
| "mean_token_accuracy": 0.9103951752185822, | |
| "step": 292 | |
| }, | |
| { | |
| "epoch": 1.715542521994135, | |
| "grad_norm": 1.627753503508524, | |
| "learning_rate": 3.8120574719396023e-05, | |
| "loss": 0.3034, | |
| "mean_token_accuracy": 0.918914794921875, | |
| "step": 293 | |
| }, | |
| { | |
| "epoch": 1.721407624633431, | |
| "grad_norm": 2.3014152350526924, | |
| "learning_rate": 3.810528914851745e-05, | |
| "loss": 0.4585, | |
| "mean_token_accuracy": 0.878827765583992, | |
| "step": 294 | |
| }, | |
| { | |
| "epoch": 1.7272727272727273, | |
| "grad_norm": 1.9488757659462697, | |
| "learning_rate": 3.808994512194376e-05, | |
| "loss": 0.3841, | |
| "mean_token_accuracy": 0.8855108916759491, | |
| "step": 295 | |
| }, | |
| { | |
| "epoch": 1.7331378299120235, | |
| "grad_norm": 1.8510675195890514, | |
| "learning_rate": 3.807454269536758e-05, | |
| "loss": 0.4001, | |
| "mean_token_accuracy": 0.8860180526971817, | |
| "step": 296 | |
| }, | |
| { | |
| "epoch": 1.7390029325513197, | |
| "grad_norm": 1.9187885250043828, | |
| "learning_rate": 3.805908192469351e-05, | |
| "loss": 0.2789, | |
| "mean_token_accuracy": 0.9083529412746429, | |
| "step": 297 | |
| }, | |
| { | |
| "epoch": 1.7448680351906158, | |
| "grad_norm": 1.9174627349215367, | |
| "learning_rate": 3.80435628660379e-05, | |
| "loss": 0.3674, | |
| "mean_token_accuracy": 0.8953339979052544, | |
| "step": 298 | |
| }, | |
| { | |
| "epoch": 1.750733137829912, | |
| "grad_norm": 2.0113269897331416, | |
| "learning_rate": 3.802798557572867e-05, | |
| "loss": 0.3684, | |
| "mean_token_accuracy": 0.8994771614670753, | |
| "step": 299 | |
| }, | |
| { | |
| "epoch": 1.7565982404692082, | |
| "grad_norm": 2.2118619867672207, | |
| "learning_rate": 3.801235011030506e-05, | |
| "loss": 0.3636, | |
| "mean_token_accuracy": 0.896511547267437, | |
| "step": 300 | |
| }, | |
| { | |
| "epoch": 1.7624633431085044, | |
| "grad_norm": 1.7539387925625358, | |
| "learning_rate": 3.799665652651754e-05, | |
| "loss": 0.2227, | |
| "mean_token_accuracy": 0.9379914700984955, | |
| "step": 301 | |
| }, | |
| { | |
| "epoch": 1.7683284457478006, | |
| "grad_norm": 1.6438481796571975, | |
| "learning_rate": 3.7980904881327446e-05, | |
| "loss": 0.3014, | |
| "mean_token_accuracy": 0.9185338690876961, | |
| "step": 302 | |
| }, | |
| { | |
| "epoch": 1.7741935483870968, | |
| "grad_norm": 1.9612873325388276, | |
| "learning_rate": 3.796509523190691e-05, | |
| "loss": 0.3237, | |
| "mean_token_accuracy": 0.9047940298914909, | |
| "step": 303 | |
| }, | |
| { | |
| "epoch": 1.780058651026393, | |
| "grad_norm": 1.7622576932704024, | |
| "learning_rate": 3.794922763563857e-05, | |
| "loss": 0.2483, | |
| "mean_token_accuracy": 0.9288651943206787, | |
| "step": 304 | |
| }, | |
| { | |
| "epoch": 1.7859237536656891, | |
| "grad_norm": 2.176660314874238, | |
| "learning_rate": 3.793330215011538e-05, | |
| "loss": 0.3704, | |
| "mean_token_accuracy": 0.9133122861385345, | |
| "step": 305 | |
| }, | |
| { | |
| "epoch": 1.7917888563049853, | |
| "grad_norm": 1.8647037847416894, | |
| "learning_rate": 3.791731883314043e-05, | |
| "loss": 0.3288, | |
| "mean_token_accuracy": 0.9017655923962593, | |
| "step": 306 | |
| }, | |
| { | |
| "epoch": 1.7976539589442815, | |
| "grad_norm": 2.0287011162165647, | |
| "learning_rate": 3.790127774272671e-05, | |
| "loss": 0.2683, | |
| "mean_token_accuracy": 0.9209354743361473, | |
| "step": 307 | |
| }, | |
| { | |
| "epoch": 1.8035190615835777, | |
| "grad_norm": 1.5719010823135073, | |
| "learning_rate": 3.7885178937096884e-05, | |
| "loss": 0.4283, | |
| "mean_token_accuracy": 0.8839877769351006, | |
| "step": 308 | |
| }, | |
| { | |
| "epoch": 1.8093841642228738, | |
| "grad_norm": 2.060543329624031, | |
| "learning_rate": 3.7869022474683125e-05, | |
| "loss": 0.4768, | |
| "mean_token_accuracy": 0.8875997290015221, | |
| "step": 309 | |
| }, | |
| { | |
| "epoch": 1.8152492668621703, | |
| "grad_norm": 2.350189411212048, | |
| "learning_rate": 3.7852808414126876e-05, | |
| "loss": 0.4034, | |
| "mean_token_accuracy": 0.8856799080967903, | |
| "step": 310 | |
| }, | |
| { | |
| "epoch": 1.8211143695014664, | |
| "grad_norm": 1.5440198943153403, | |
| "learning_rate": 3.783653681427861e-05, | |
| "loss": 0.2551, | |
| "mean_token_accuracy": 0.9266308322548866, | |
| "step": 311 | |
| }, | |
| { | |
| "epoch": 1.8269794721407626, | |
| "grad_norm": 2.8616037129091634, | |
| "learning_rate": 3.7820207734197676e-05, | |
| "loss": 0.3565, | |
| "mean_token_accuracy": 0.8989051878452301, | |
| "step": 312 | |
| }, | |
| { | |
| "epoch": 1.8328445747800588, | |
| "grad_norm": 1.655020538470568, | |
| "learning_rate": 3.780382123315203e-05, | |
| "loss": 0.2381, | |
| "mean_token_accuracy": 0.932477205991745, | |
| "step": 313 | |
| }, | |
| { | |
| "epoch": 1.838709677419355, | |
| "grad_norm": 1.634046108030118, | |
| "learning_rate": 3.778737737061807e-05, | |
| "loss": 0.3528, | |
| "mean_token_accuracy": 0.901824451982975, | |
| "step": 314 | |
| }, | |
| { | |
| "epoch": 1.8445747800586512, | |
| "grad_norm": 1.8055968405353495, | |
| "learning_rate": 3.777087620628035e-05, | |
| "loss": 0.2607, | |
| "mean_token_accuracy": 0.9298326820135117, | |
| "step": 315 | |
| }, | |
| { | |
| "epoch": 1.8504398826979473, | |
| "grad_norm": 1.565331504140896, | |
| "learning_rate": 3.775431780003145e-05, | |
| "loss": 0.2588, | |
| "mean_token_accuracy": 0.9298610910773277, | |
| "step": 316 | |
| }, | |
| { | |
| "epoch": 1.8563049853372435, | |
| "grad_norm": 2.4969352173136725, | |
| "learning_rate": 3.7737702211971684e-05, | |
| "loss": 0.2831, | |
| "mean_token_accuracy": 0.9310391396284103, | |
| "step": 317 | |
| }, | |
| { | |
| "epoch": 1.8621700879765397, | |
| "grad_norm": 1.6415395253616827, | |
| "learning_rate": 3.772102950240895e-05, | |
| "loss": 0.2813, | |
| "mean_token_accuracy": 0.9285251498222351, | |
| "step": 318 | |
| }, | |
| { | |
| "epoch": 1.868035190615836, | |
| "grad_norm": 1.9305348853876765, | |
| "learning_rate": 3.770429973185842e-05, | |
| "loss": 0.3427, | |
| "mean_token_accuracy": 0.9079779386520386, | |
| "step": 319 | |
| }, | |
| { | |
| "epoch": 1.873900293255132, | |
| "grad_norm": 1.9953718686593638, | |
| "learning_rate": 3.768751296104243e-05, | |
| "loss": 0.2254, | |
| "mean_token_accuracy": 0.9326372891664505, | |
| "step": 320 | |
| }, | |
| { | |
| "epoch": 1.8797653958944283, | |
| "grad_norm": 1.320578324916696, | |
| "learning_rate": 3.767066925089017e-05, | |
| "loss": 0.3231, | |
| "mean_token_accuracy": 0.9050979241728783, | |
| "step": 321 | |
| }, | |
| { | |
| "epoch": 1.8856304985337244, | |
| "grad_norm": 1.630307723589402, | |
| "learning_rate": 3.765376866253749e-05, | |
| "loss": 0.2295, | |
| "mean_token_accuracy": 0.9237135499715805, | |
| "step": 322 | |
| }, | |
| { | |
| "epoch": 1.8914956011730206, | |
| "grad_norm": 1.8687730597426178, | |
| "learning_rate": 3.763681125732672e-05, | |
| "loss": 0.2979, | |
| "mean_token_accuracy": 0.9019790291786194, | |
| "step": 323 | |
| }, | |
| { | |
| "epoch": 1.8973607038123168, | |
| "grad_norm": 1.9421616274693134, | |
| "learning_rate": 3.7619797096806386e-05, | |
| "loss": 0.3197, | |
| "mean_token_accuracy": 0.9111294597387314, | |
| "step": 324 | |
| }, | |
| { | |
| "epoch": 1.903225806451613, | |
| "grad_norm": 1.7208098178259272, | |
| "learning_rate": 3.7602726242731016e-05, | |
| "loss": 0.366, | |
| "mean_token_accuracy": 0.9027081355452538, | |
| "step": 325 | |
| }, | |
| { | |
| "epoch": 1.9090909090909092, | |
| "grad_norm": 2.006658041095116, | |
| "learning_rate": 3.758559875706092e-05, | |
| "loss": 0.2679, | |
| "mean_token_accuracy": 0.9249737039208412, | |
| "step": 326 | |
| }, | |
| { | |
| "epoch": 1.9149560117302054, | |
| "grad_norm": 1.4026289668561007, | |
| "learning_rate": 3.756841470196195e-05, | |
| "loss": 0.3585, | |
| "mean_token_accuracy": 0.9043615832924843, | |
| "step": 327 | |
| }, | |
| { | |
| "epoch": 1.9208211143695015, | |
| "grad_norm": 1.694343333897619, | |
| "learning_rate": 3.7551174139805284e-05, | |
| "loss": 0.3979, | |
| "mean_token_accuracy": 0.8909201622009277, | |
| "step": 328 | |
| }, | |
| { | |
| "epoch": 1.9266862170087977, | |
| "grad_norm": 1.8540786148822945, | |
| "learning_rate": 3.75338771331672e-05, | |
| "loss": 0.369, | |
| "mean_token_accuracy": 0.8931919634342194, | |
| "step": 329 | |
| }, | |
| { | |
| "epoch": 1.932551319648094, | |
| "grad_norm": 1.8043389433770838, | |
| "learning_rate": 3.7516523744828856e-05, | |
| "loss": 0.4165, | |
| "mean_token_accuracy": 0.8895757496356964, | |
| "step": 330 | |
| }, | |
| { | |
| "epoch": 1.93841642228739, | |
| "grad_norm": 1.7007383330359485, | |
| "learning_rate": 3.7499114037776036e-05, | |
| "loss": 0.3277, | |
| "mean_token_accuracy": 0.8950676620006561, | |
| "step": 331 | |
| }, | |
| { | |
| "epoch": 1.9442815249266863, | |
| "grad_norm": 1.8371484503426208, | |
| "learning_rate": 3.748164807519894e-05, | |
| "loss": 0.4762, | |
| "mean_token_accuracy": 0.8785777390003204, | |
| "step": 332 | |
| }, | |
| { | |
| "epoch": 1.9501466275659824, | |
| "grad_norm": 1.9576933523351108, | |
| "learning_rate": 3.746412592049197e-05, | |
| "loss": 0.3305, | |
| "mean_token_accuracy": 0.9062154516577721, | |
| "step": 333 | |
| }, | |
| { | |
| "epoch": 1.9560117302052786, | |
| "grad_norm": 1.4572901805688738, | |
| "learning_rate": 3.7446547637253464e-05, | |
| "loss": 0.2221, | |
| "mean_token_accuracy": 0.9385078474879265, | |
| "step": 334 | |
| }, | |
| { | |
| "epoch": 1.9618768328445748, | |
| "grad_norm": 1.7468272756441137, | |
| "learning_rate": 3.742891328928549e-05, | |
| "loss": 0.3222, | |
| "mean_token_accuracy": 0.9129680246114731, | |
| "step": 335 | |
| }, | |
| { | |
| "epoch": 1.967741935483871, | |
| "grad_norm": 1.273850432393817, | |
| "learning_rate": 3.74112229405936e-05, | |
| "loss": 0.2864, | |
| "mean_token_accuracy": 0.9150940924882889, | |
| "step": 336 | |
| }, | |
| { | |
| "epoch": 1.9736070381231672, | |
| "grad_norm": 1.6128401848475018, | |
| "learning_rate": 3.739347665538664e-05, | |
| "loss": 0.3245, | |
| "mean_token_accuracy": 0.9111495912075043, | |
| "step": 337 | |
| }, | |
| { | |
| "epoch": 1.9794721407624634, | |
| "grad_norm": 1.8812573264821617, | |
| "learning_rate": 3.7375674498076445e-05, | |
| "loss": 0.4184, | |
| "mean_token_accuracy": 0.8886971697211266, | |
| "step": 338 | |
| }, | |
| { | |
| "epoch": 1.9853372434017595, | |
| "grad_norm": 2.12365255006061, | |
| "learning_rate": 3.7357816533277646e-05, | |
| "loss": 0.3149, | |
| "mean_token_accuracy": 0.9159363061189651, | |
| "step": 339 | |
| }, | |
| { | |
| "epoch": 1.9912023460410557, | |
| "grad_norm": 1.6503920768703404, | |
| "learning_rate": 3.733990282580745e-05, | |
| "loss": 0.3169, | |
| "mean_token_accuracy": 0.9083178415894508, | |
| "step": 340 | |
| }, | |
| { | |
| "epoch": 1.997067448680352, | |
| "grad_norm": 1.7743070899939994, | |
| "learning_rate": 3.732193344068539e-05, | |
| "loss": 0.3129, | |
| "mean_token_accuracy": 0.9127020239830017, | |
| "step": 341 | |
| }, | |
| { | |
| "epoch": 2.0, | |
| "grad_norm": 2.5233914351994167, | |
| "learning_rate": 3.7303908443133054e-05, | |
| "loss": 0.2022, | |
| "mean_token_accuracy": 0.9405500143766403, | |
| "step": 342 | |
| }, | |
| { | |
| "epoch": 2.005865102639296, | |
| "grad_norm": 1.6693513338029358, | |
| "learning_rate": 3.728582789857393e-05, | |
| "loss": 0.2042, | |
| "mean_token_accuracy": 0.9472683444619179, | |
| "step": 343 | |
| }, | |
| { | |
| "epoch": 2.0117302052785924, | |
| "grad_norm": 1.6340282087097624, | |
| "learning_rate": 3.726769187263308e-05, | |
| "loss": 0.2367, | |
| "mean_token_accuracy": 0.9289553239941597, | |
| "step": 344 | |
| }, | |
| { | |
| "epoch": 2.0175953079178885, | |
| "grad_norm": 1.2768152185347872, | |
| "learning_rate": 3.724950043113695e-05, | |
| "loss": 0.1532, | |
| "mean_token_accuracy": 0.9543287456035614, | |
| "step": 345 | |
| }, | |
| { | |
| "epoch": 2.0234604105571847, | |
| "grad_norm": 1.4037689102542255, | |
| "learning_rate": 3.723125364011313e-05, | |
| "loss": 0.1561, | |
| "mean_token_accuracy": 0.9580844938755035, | |
| "step": 346 | |
| }, | |
| { | |
| "epoch": 2.029325513196481, | |
| "grad_norm": 1.4427756644988328, | |
| "learning_rate": 3.7212951565790094e-05, | |
| "loss": 0.1636, | |
| "mean_token_accuracy": 0.9480728134512901, | |
| "step": 347 | |
| }, | |
| { | |
| "epoch": 2.035190615835777, | |
| "grad_norm": 1.7720364579189496, | |
| "learning_rate": 3.7194594274597e-05, | |
| "loss": 0.1859, | |
| "mean_token_accuracy": 0.940785750746727, | |
| "step": 348 | |
| }, | |
| { | |
| "epoch": 2.0410557184750733, | |
| "grad_norm": 1.6271299438235751, | |
| "learning_rate": 3.7176181833163385e-05, | |
| "loss": 0.1929, | |
| "mean_token_accuracy": 0.9450634941458702, | |
| "step": 349 | |
| }, | |
| { | |
| "epoch": 2.0469208211143695, | |
| "grad_norm": 1.7097076323820675, | |
| "learning_rate": 3.7157714308318966e-05, | |
| "loss": 0.1878, | |
| "mean_token_accuracy": 0.948820598423481, | |
| "step": 350 | |
| }, | |
| { | |
| "epoch": 2.0527859237536656, | |
| "grad_norm": 2.0125156134083753, | |
| "learning_rate": 3.713919176709343e-05, | |
| "loss": 0.217, | |
| "mean_token_accuracy": 0.9398675486445427, | |
| "step": 351 | |
| }, | |
| { | |
| "epoch": 2.058651026392962, | |
| "grad_norm": 1.513801237369741, | |
| "learning_rate": 3.712061427671609e-05, | |
| "loss": 0.1608, | |
| "mean_token_accuracy": 0.9536798968911171, | |
| "step": 352 | |
| }, | |
| { | |
| "epoch": 2.064516129032258, | |
| "grad_norm": 1.7205171569969244, | |
| "learning_rate": 3.710198190461575e-05, | |
| "loss": 0.1991, | |
| "mean_token_accuracy": 0.9481086954474449, | |
| "step": 353 | |
| }, | |
| { | |
| "epoch": 2.070381231671554, | |
| "grad_norm": 1.6612412428232648, | |
| "learning_rate": 3.7083294718420394e-05, | |
| "loss": 0.1958, | |
| "mean_token_accuracy": 0.94148388504982, | |
| "step": 354 | |
| }, | |
| { | |
| "epoch": 2.0762463343108504, | |
| "grad_norm": 1.6267501719327955, | |
| "learning_rate": 3.706455278595696e-05, | |
| "loss": 0.1845, | |
| "mean_token_accuracy": 0.9427782371640205, | |
| "step": 355 | |
| }, | |
| { | |
| "epoch": 2.0821114369501466, | |
| "grad_norm": 1.5866786415809722, | |
| "learning_rate": 3.7045756175251086e-05, | |
| "loss": 0.1861, | |
| "mean_token_accuracy": 0.942706435918808, | |
| "step": 356 | |
| }, | |
| { | |
| "epoch": 2.0879765395894427, | |
| "grad_norm": 1.5012592202625572, | |
| "learning_rate": 3.7026904954526884e-05, | |
| "loss": 0.161, | |
| "mean_token_accuracy": 0.9514438956975937, | |
| "step": 357 | |
| }, | |
| { | |
| "epoch": 2.093841642228739, | |
| "grad_norm": 1.341766760682961, | |
| "learning_rate": 3.7007999192206676e-05, | |
| "loss": 0.1527, | |
| "mean_token_accuracy": 0.9533711373806, | |
| "step": 358 | |
| }, | |
| { | |
| "epoch": 2.099706744868035, | |
| "grad_norm": 1.320127310766138, | |
| "learning_rate": 3.698903895691073e-05, | |
| "loss": 0.1811, | |
| "mean_token_accuracy": 0.9414249658584595, | |
| "step": 359 | |
| }, | |
| { | |
| "epoch": 2.1055718475073313, | |
| "grad_norm": 1.6248661841477403, | |
| "learning_rate": 3.697002431745706e-05, | |
| "loss": 0.1874, | |
| "mean_token_accuracy": 0.9447718486189842, | |
| "step": 360 | |
| }, | |
| { | |
| "epoch": 2.1114369501466275, | |
| "grad_norm": 1.6590729632222045, | |
| "learning_rate": 3.695095534286111e-05, | |
| "loss": 0.201, | |
| "mean_token_accuracy": 0.945906400680542, | |
| "step": 361 | |
| }, | |
| { | |
| "epoch": 2.1173020527859236, | |
| "grad_norm": 1.5275030906913138, | |
| "learning_rate": 3.693183210233557e-05, | |
| "loss": 0.1859, | |
| "mean_token_accuracy": 0.9475782960653305, | |
| "step": 362 | |
| }, | |
| { | |
| "epoch": 2.12316715542522, | |
| "grad_norm": 1.982158053574525, | |
| "learning_rate": 3.691265466529007e-05, | |
| "loss": 0.1766, | |
| "mean_token_accuracy": 0.9372566938400269, | |
| "step": 363 | |
| }, | |
| { | |
| "epoch": 2.129032258064516, | |
| "grad_norm": 1.9819879546520034, | |
| "learning_rate": 3.689342310133097e-05, | |
| "loss": 0.1494, | |
| "mean_token_accuracy": 0.9539884477853775, | |
| "step": 364 | |
| }, | |
| { | |
| "epoch": 2.134897360703812, | |
| "grad_norm": 1.5751039070401411, | |
| "learning_rate": 3.687413748026108e-05, | |
| "loss": 0.1735, | |
| "mean_token_accuracy": 0.9495319053530693, | |
| "step": 365 | |
| }, | |
| { | |
| "epoch": 2.1407624633431084, | |
| "grad_norm": 1.4585998236912616, | |
| "learning_rate": 3.68547978720794e-05, | |
| "loss": 0.1734, | |
| "mean_token_accuracy": 0.9489102214574814, | |
| "step": 366 | |
| }, | |
| { | |
| "epoch": 2.1466275659824046, | |
| "grad_norm": 1.5221638866423055, | |
| "learning_rate": 3.683540434698093e-05, | |
| "loss": 0.1723, | |
| "mean_token_accuracy": 0.9449229016900063, | |
| "step": 367 | |
| }, | |
| { | |
| "epoch": 2.1524926686217007, | |
| "grad_norm": 1.481044246133322, | |
| "learning_rate": 3.681595697535629e-05, | |
| "loss": 0.1545, | |
| "mean_token_accuracy": 0.9535570293664932, | |
| "step": 368 | |
| }, | |
| { | |
| "epoch": 2.158357771260997, | |
| "grad_norm": 1.5478884613470323, | |
| "learning_rate": 3.6796455827791614e-05, | |
| "loss": 0.16, | |
| "mean_token_accuracy": 0.9515524953603745, | |
| "step": 369 | |
| }, | |
| { | |
| "epoch": 2.164222873900293, | |
| "grad_norm": 1.5557336850213985, | |
| "learning_rate": 3.677690097506819e-05, | |
| "loss": 0.1896, | |
| "mean_token_accuracy": 0.9467919543385506, | |
| "step": 370 | |
| }, | |
| { | |
| "epoch": 2.1700879765395893, | |
| "grad_norm": 1.361152442880772, | |
| "learning_rate": 3.6757292488162224e-05, | |
| "loss": 0.1785, | |
| "mean_token_accuracy": 0.9459036141633987, | |
| "step": 371 | |
| }, | |
| { | |
| "epoch": 2.1759530791788855, | |
| "grad_norm": 1.6123633223927731, | |
| "learning_rate": 3.673763043824461e-05, | |
| "loss": 0.2162, | |
| "mean_token_accuracy": 0.9359398260712624, | |
| "step": 372 | |
| }, | |
| { | |
| "epoch": 2.1818181818181817, | |
| "grad_norm": 1.812017533620379, | |
| "learning_rate": 3.671791489668065e-05, | |
| "loss": 0.2023, | |
| "mean_token_accuracy": 0.9429754018783569, | |
| "step": 373 | |
| }, | |
| { | |
| "epoch": 2.187683284457478, | |
| "grad_norm": 1.6502036792084445, | |
| "learning_rate": 3.6698145935029794e-05, | |
| "loss": 0.1845, | |
| "mean_token_accuracy": 0.9518390074372292, | |
| "step": 374 | |
| }, | |
| { | |
| "epoch": 2.193548387096774, | |
| "grad_norm": 1.545266807845691, | |
| "learning_rate": 3.66783236250454e-05, | |
| "loss": 0.1823, | |
| "mean_token_accuracy": 0.9467039182782173, | |
| "step": 375 | |
| }, | |
| { | |
| "epoch": 2.19941348973607, | |
| "grad_norm": 1.6477618113035692, | |
| "learning_rate": 3.665844803867443e-05, | |
| "loss": 0.2145, | |
| "mean_token_accuracy": 0.9425032809376717, | |
| "step": 376 | |
| }, | |
| { | |
| "epoch": 2.2052785923753664, | |
| "grad_norm": 1.5604687026013282, | |
| "learning_rate": 3.663851924805725e-05, | |
| "loss": 0.1997, | |
| "mean_token_accuracy": 0.9416602700948715, | |
| "step": 377 | |
| }, | |
| { | |
| "epoch": 2.2111436950146626, | |
| "grad_norm": 1.550988527543912, | |
| "learning_rate": 3.66185373255273e-05, | |
| "loss": 0.1757, | |
| "mean_token_accuracy": 0.9436255395412445, | |
| "step": 378 | |
| }, | |
| { | |
| "epoch": 2.2170087976539588, | |
| "grad_norm": 1.2327976923264585, | |
| "learning_rate": 3.6598502343610906e-05, | |
| "loss": 0.1482, | |
| "mean_token_accuracy": 0.9562384858727455, | |
| "step": 379 | |
| }, | |
| { | |
| "epoch": 2.222873900293255, | |
| "grad_norm": 1.6237415577928969, | |
| "learning_rate": 3.657841437502697e-05, | |
| "loss": 0.2253, | |
| "mean_token_accuracy": 0.9289913475513458, | |
| "step": 380 | |
| }, | |
| { | |
| "epoch": 2.228739002932551, | |
| "grad_norm": 1.8607660215120763, | |
| "learning_rate": 3.6558273492686686e-05, | |
| "loss": 0.2089, | |
| "mean_token_accuracy": 0.9374095126986504, | |
| "step": 381 | |
| }, | |
| { | |
| "epoch": 2.2346041055718473, | |
| "grad_norm": 1.360745609090226, | |
| "learning_rate": 3.6538079769693334e-05, | |
| "loss": 0.1671, | |
| "mean_token_accuracy": 0.9510230720043182, | |
| "step": 382 | |
| }, | |
| { | |
| "epoch": 2.2404692082111435, | |
| "grad_norm": 1.257785870906961, | |
| "learning_rate": 3.6517833279341954e-05, | |
| "loss": 0.1522, | |
| "mean_token_accuracy": 0.9563170224428177, | |
| "step": 383 | |
| }, | |
| { | |
| "epoch": 2.2463343108504397, | |
| "grad_norm": 1.3172632320528042, | |
| "learning_rate": 3.649753409511916e-05, | |
| "loss": 0.1561, | |
| "mean_token_accuracy": 0.9573013335466385, | |
| "step": 384 | |
| }, | |
| { | |
| "epoch": 2.252199413489736, | |
| "grad_norm": 1.5928604842461005, | |
| "learning_rate": 3.6477182290702766e-05, | |
| "loss": 0.1973, | |
| "mean_token_accuracy": 0.9359570667147636, | |
| "step": 385 | |
| }, | |
| { | |
| "epoch": 2.258064516129032, | |
| "grad_norm": 1.6603723493729567, | |
| "learning_rate": 3.645677793996161e-05, | |
| "loss": 0.1963, | |
| "mean_token_accuracy": 0.9404471442103386, | |
| "step": 386 | |
| }, | |
| { | |
| "epoch": 2.263929618768328, | |
| "grad_norm": 1.6206060274389453, | |
| "learning_rate": 3.643632111695525e-05, | |
| "loss": 0.1939, | |
| "mean_token_accuracy": 0.9425263479351997, | |
| "step": 387 | |
| }, | |
| { | |
| "epoch": 2.2697947214076244, | |
| "grad_norm": 1.6657671791715885, | |
| "learning_rate": 3.6415811895933685e-05, | |
| "loss": 0.1856, | |
| "mean_token_accuracy": 0.9430495277047157, | |
| "step": 388 | |
| }, | |
| { | |
| "epoch": 2.2756598240469206, | |
| "grad_norm": 1.2712705911697364, | |
| "learning_rate": 3.639525035133712e-05, | |
| "loss": 0.1552, | |
| "mean_token_accuracy": 0.9557842835783958, | |
| "step": 389 | |
| }, | |
| { | |
| "epoch": 2.281524926686217, | |
| "grad_norm": 1.5205774866081296, | |
| "learning_rate": 3.637463655779563e-05, | |
| "loss": 0.1895, | |
| "mean_token_accuracy": 0.9432679936289787, | |
| "step": 390 | |
| }, | |
| { | |
| "epoch": 2.2873900293255134, | |
| "grad_norm": 1.4033621614345595, | |
| "learning_rate": 3.6353970590128975e-05, | |
| "loss": 0.1686, | |
| "mean_token_accuracy": 0.9537108987569809, | |
| "step": 391 | |
| }, | |
| { | |
| "epoch": 2.2932551319648096, | |
| "grad_norm": 1.5519352504831416, | |
| "learning_rate": 3.633325252334628e-05, | |
| "loss": 0.1778, | |
| "mean_token_accuracy": 0.9453356117010117, | |
| "step": 392 | |
| }, | |
| { | |
| "epoch": 2.2991202346041058, | |
| "grad_norm": 1.5128374434884113, | |
| "learning_rate": 3.6312482432645746e-05, | |
| "loss": 0.1947, | |
| "mean_token_accuracy": 0.9424128010869026, | |
| "step": 393 | |
| }, | |
| { | |
| "epoch": 2.304985337243402, | |
| "grad_norm": 1.390174798784679, | |
| "learning_rate": 3.6291660393414414e-05, | |
| "loss": 0.1598, | |
| "mean_token_accuracy": 0.9516776651144028, | |
| "step": 394 | |
| }, | |
| { | |
| "epoch": 2.310850439882698, | |
| "grad_norm": 1.5333512021648155, | |
| "learning_rate": 3.6270786481227885e-05, | |
| "loss": 0.1986, | |
| "mean_token_accuracy": 0.9424618110060692, | |
| "step": 395 | |
| }, | |
| { | |
| "epoch": 2.3167155425219943, | |
| "grad_norm": 1.3874329069918987, | |
| "learning_rate": 3.624986077185003e-05, | |
| "loss": 0.1862, | |
| "mean_token_accuracy": 0.9473976641893387, | |
| "step": 396 | |
| }, | |
| { | |
| "epoch": 2.3225806451612905, | |
| "grad_norm": 1.483625628532948, | |
| "learning_rate": 3.622888334123272e-05, | |
| "loss": 0.1714, | |
| "mean_token_accuracy": 0.9534925371408463, | |
| "step": 397 | |
| }, | |
| { | |
| "epoch": 2.3284457478005867, | |
| "grad_norm": 1.4383158878361242, | |
| "learning_rate": 3.620785426551555e-05, | |
| "loss": 0.1634, | |
| "mean_token_accuracy": 0.9539240747690201, | |
| "step": 398 | |
| }, | |
| { | |
| "epoch": 2.334310850439883, | |
| "grad_norm": 1.1273627628727119, | |
| "learning_rate": 3.618677362102558e-05, | |
| "loss": 0.1316, | |
| "mean_token_accuracy": 0.961469940841198, | |
| "step": 399 | |
| }, | |
| { | |
| "epoch": 2.340175953079179, | |
| "grad_norm": 1.5220516996155353, | |
| "learning_rate": 3.616564148427703e-05, | |
| "loss": 0.1792, | |
| "mean_token_accuracy": 0.9433942660689354, | |
| "step": 400 | |
| }, | |
| { | |
| "epoch": 2.346041055718475, | |
| "grad_norm": 1.4608955929711822, | |
| "learning_rate": 3.614445793197103e-05, | |
| "loss": 0.1638, | |
| "mean_token_accuracy": 0.9492457285523415, | |
| "step": 401 | |
| }, | |
| { | |
| "epoch": 2.3519061583577714, | |
| "grad_norm": 1.5063014900643235, | |
| "learning_rate": 3.61232230409953e-05, | |
| "loss": 0.1816, | |
| "mean_token_accuracy": 0.9452838078141212, | |
| "step": 402 | |
| }, | |
| { | |
| "epoch": 2.3577712609970676, | |
| "grad_norm": 1.5879916738204585, | |
| "learning_rate": 3.6101936888423936e-05, | |
| "loss": 0.199, | |
| "mean_token_accuracy": 0.9505869597196579, | |
| "step": 403 | |
| }, | |
| { | |
| "epoch": 2.3636363636363638, | |
| "grad_norm": 1.6190060093628065, | |
| "learning_rate": 3.6080599551517076e-05, | |
| "loss": 0.1829, | |
| "mean_token_accuracy": 0.9452532529830933, | |
| "step": 404 | |
| }, | |
| { | |
| "epoch": 2.36950146627566, | |
| "grad_norm": 1.591052066609119, | |
| "learning_rate": 3.605921110772063e-05, | |
| "loss": 0.1952, | |
| "mean_token_accuracy": 0.9436501488089561, | |
| "step": 405 | |
| }, | |
| { | |
| "epoch": 2.375366568914956, | |
| "grad_norm": 1.4651530670386483, | |
| "learning_rate": 3.603777163466601e-05, | |
| "loss": 0.1626, | |
| "mean_token_accuracy": 0.949164867401123, | |
| "step": 406 | |
| }, | |
| { | |
| "epoch": 2.3812316715542523, | |
| "grad_norm": 1.5197490454848215, | |
| "learning_rate": 3.6016281210169844e-05, | |
| "loss": 0.1892, | |
| "mean_token_accuracy": 0.9424419403076172, | |
| "step": 407 | |
| }, | |
| { | |
| "epoch": 2.3870967741935485, | |
| "grad_norm": 1.5088850567602756, | |
| "learning_rate": 3.599473991223369e-05, | |
| "loss": 0.1816, | |
| "mean_token_accuracy": 0.9471491128206253, | |
| "step": 408 | |
| }, | |
| { | |
| "epoch": 2.3929618768328447, | |
| "grad_norm": 1.6258161225406804, | |
| "learning_rate": 3.5973147819043765e-05, | |
| "loss": 0.2049, | |
| "mean_token_accuracy": 0.9332642704248428, | |
| "step": 409 | |
| }, | |
| { | |
| "epoch": 2.398826979472141, | |
| "grad_norm": 1.4444906258969137, | |
| "learning_rate": 3.595150500897065e-05, | |
| "loss": 0.1983, | |
| "mean_token_accuracy": 0.9415558204054832, | |
| "step": 410 | |
| }, | |
| { | |
| "epoch": 2.404692082111437, | |
| "grad_norm": 1.6160102435014505, | |
| "learning_rate": 3.5929811560569e-05, | |
| "loss": 0.2069, | |
| "mean_token_accuracy": 0.9475803673267365, | |
| "step": 411 | |
| }, | |
| { | |
| "epoch": 2.410557184750733, | |
| "grad_norm": 1.3447926697756785, | |
| "learning_rate": 3.590806755257726e-05, | |
| "loss": 0.1654, | |
| "mean_token_accuracy": 0.94842179864645, | |
| "step": 412 | |
| }, | |
| { | |
| "epoch": 2.4164222873900294, | |
| "grad_norm": 1.4798870575264065, | |
| "learning_rate": 3.5886273063917426e-05, | |
| "loss": 0.178, | |
| "mean_token_accuracy": 0.9414676427841187, | |
| "step": 413 | |
| }, | |
| { | |
| "epoch": 2.4222873900293256, | |
| "grad_norm": 1.519064400272912, | |
| "learning_rate": 3.586442817369467e-05, | |
| "loss": 0.1913, | |
| "mean_token_accuracy": 0.9318802356719971, | |
| "step": 414 | |
| }, | |
| { | |
| "epoch": 2.4281524926686218, | |
| "grad_norm": 1.3625628764482547, | |
| "learning_rate": 3.5842532961197114e-05, | |
| "loss": 0.1462, | |
| "mean_token_accuracy": 0.9540624096989632, | |
| "step": 415 | |
| }, | |
| { | |
| "epoch": 2.434017595307918, | |
| "grad_norm": 1.7373315854675837, | |
| "learning_rate": 3.582058750589555e-05, | |
| "loss": 0.2119, | |
| "mean_token_accuracy": 0.9377310499548912, | |
| "step": 416 | |
| }, | |
| { | |
| "epoch": 2.439882697947214, | |
| "grad_norm": 1.830130965645857, | |
| "learning_rate": 3.579859188744311e-05, | |
| "loss": 0.2505, | |
| "mean_token_accuracy": 0.9223317727446556, | |
| "step": 417 | |
| }, | |
| { | |
| "epoch": 2.4457478005865103, | |
| "grad_norm": 1.5168358594406, | |
| "learning_rate": 3.5776546185675014e-05, | |
| "loss": 0.1973, | |
| "mean_token_accuracy": 0.9436168894171715, | |
| "step": 418 | |
| }, | |
| { | |
| "epoch": 2.4516129032258065, | |
| "grad_norm": 1.7875580792704173, | |
| "learning_rate": 3.5754450480608244e-05, | |
| "loss": 0.2109, | |
| "mean_token_accuracy": 0.9373271837830544, | |
| "step": 419 | |
| }, | |
| { | |
| "epoch": 2.4574780058651027, | |
| "grad_norm": 1.5883528874075437, | |
| "learning_rate": 3.5732304852441294e-05, | |
| "loss": 0.2147, | |
| "mean_token_accuracy": 0.936000183224678, | |
| "step": 420 | |
| }, | |
| { | |
| "epoch": 2.463343108504399, | |
| "grad_norm": 1.8065099011242889, | |
| "learning_rate": 3.571010938155386e-05, | |
| "loss": 0.2619, | |
| "mean_token_accuracy": 0.9260461702942848, | |
| "step": 421 | |
| }, | |
| { | |
| "epoch": 2.469208211143695, | |
| "grad_norm": 1.6418092211911852, | |
| "learning_rate": 3.5687864148506515e-05, | |
| "loss": 0.1867, | |
| "mean_token_accuracy": 0.9402009174227715, | |
| "step": 422 | |
| }, | |
| { | |
| "epoch": 2.4750733137829912, | |
| "grad_norm": 1.381906338016382, | |
| "learning_rate": 3.566556923404048e-05, | |
| "loss": 0.1505, | |
| "mean_token_accuracy": 0.955479122698307, | |
| "step": 423 | |
| }, | |
| { | |
| "epoch": 2.4809384164222874, | |
| "grad_norm": 1.213287645392281, | |
| "learning_rate": 3.5643224719077294e-05, | |
| "loss": 0.1565, | |
| "mean_token_accuracy": 0.9560166001319885, | |
| "step": 424 | |
| }, | |
| { | |
| "epoch": 2.4868035190615836, | |
| "grad_norm": 1.375404431939761, | |
| "learning_rate": 3.5620830684718515e-05, | |
| "loss": 0.1706, | |
| "mean_token_accuracy": 0.9489535689353943, | |
| "step": 425 | |
| }, | |
| { | |
| "epoch": 2.4926686217008798, | |
| "grad_norm": 1.2465742978161383, | |
| "learning_rate": 3.5598387212245456e-05, | |
| "loss": 0.1606, | |
| "mean_token_accuracy": 0.9529033079743385, | |
| "step": 426 | |
| }, | |
| { | |
| "epoch": 2.498533724340176, | |
| "grad_norm": 1.4045133580111415, | |
| "learning_rate": 3.5575894383118846e-05, | |
| "loss": 0.1873, | |
| "mean_token_accuracy": 0.9499105364084244, | |
| "step": 427 | |
| }, | |
| { | |
| "epoch": 2.504398826979472, | |
| "grad_norm": 1.372490625148224, | |
| "learning_rate": 3.5553352278978574e-05, | |
| "loss": 0.1816, | |
| "mean_token_accuracy": 0.9412456303834915, | |
| "step": 428 | |
| }, | |
| { | |
| "epoch": 2.5102639296187683, | |
| "grad_norm": 1.5354559639804668, | |
| "learning_rate": 3.553076098164337e-05, | |
| "loss": 0.1774, | |
| "mean_token_accuracy": 0.9477965086698532, | |
| "step": 429 | |
| }, | |
| { | |
| "epoch": 2.5161290322580645, | |
| "grad_norm": 1.4778203902809248, | |
| "learning_rate": 3.5508120573110516e-05, | |
| "loss": 0.1958, | |
| "mean_token_accuracy": 0.9413925185799599, | |
| "step": 430 | |
| }, | |
| { | |
| "epoch": 2.5219941348973607, | |
| "grad_norm": 1.2514914338303438, | |
| "learning_rate": 3.548543113555557e-05, | |
| "loss": 0.1433, | |
| "mean_token_accuracy": 0.9577402174472809, | |
| "step": 431 | |
| }, | |
| { | |
| "epoch": 2.527859237536657, | |
| "grad_norm": 1.3305108203635878, | |
| "learning_rate": 3.5462692751332014e-05, | |
| "loss": 0.1778, | |
| "mean_token_accuracy": 0.9476026743650436, | |
| "step": 432 | |
| }, | |
| { | |
| "epoch": 2.533724340175953, | |
| "grad_norm": 1.1742572503020614, | |
| "learning_rate": 3.5439905502970996e-05, | |
| "loss": 0.1366, | |
| "mean_token_accuracy": 0.9584061652421951, | |
| "step": 433 | |
| }, | |
| { | |
| "epoch": 2.5395894428152492, | |
| "grad_norm": 1.230197597142779, | |
| "learning_rate": 3.541706947318103e-05, | |
| "loss": 0.157, | |
| "mean_token_accuracy": 0.9515904188156128, | |
| "step": 434 | |
| }, | |
| { | |
| "epoch": 2.5454545454545454, | |
| "grad_norm": 1.5660487728940458, | |
| "learning_rate": 3.539418474484768e-05, | |
| "loss": 0.2105, | |
| "mean_token_accuracy": 0.9401791095733643, | |
| "step": 435 | |
| }, | |
| { | |
| "epoch": 2.5513196480938416, | |
| "grad_norm": 1.553521419511032, | |
| "learning_rate": 3.537125140103327e-05, | |
| "loss": 0.1949, | |
| "mean_token_accuracy": 0.9433295205235481, | |
| "step": 436 | |
| }, | |
| { | |
| "epoch": 2.557184750733138, | |
| "grad_norm": 1.2711027329016076, | |
| "learning_rate": 3.534826952497657e-05, | |
| "loss": 0.1594, | |
| "mean_token_accuracy": 0.9537053257226944, | |
| "step": 437 | |
| }, | |
| { | |
| "epoch": 2.563049853372434, | |
| "grad_norm": 1.5314862483881024, | |
| "learning_rate": 3.5325239200092505e-05, | |
| "loss": 0.1896, | |
| "mean_token_accuracy": 0.9418998286128044, | |
| "step": 438 | |
| }, | |
| { | |
| "epoch": 2.56891495601173, | |
| "grad_norm": 1.5637934300785588, | |
| "learning_rate": 3.5302160509971866e-05, | |
| "loss": 0.2203, | |
| "mean_token_accuracy": 0.9355995953083038, | |
| "step": 439 | |
| }, | |
| { | |
| "epoch": 2.5747800586510263, | |
| "grad_norm": 1.4714637306784724, | |
| "learning_rate": 3.5279033538380974e-05, | |
| "loss": 0.1911, | |
| "mean_token_accuracy": 0.9404008537530899, | |
| "step": 440 | |
| }, | |
| { | |
| "epoch": 2.5806451612903225, | |
| "grad_norm": 1.2122932740008598, | |
| "learning_rate": 3.5255858369261385e-05, | |
| "loss": 0.1289, | |
| "mean_token_accuracy": 0.9619543105363846, | |
| "step": 441 | |
| }, | |
| { | |
| "epoch": 2.5865102639296187, | |
| "grad_norm": 1.505105430019669, | |
| "learning_rate": 3.523263508672961e-05, | |
| "loss": 0.203, | |
| "mean_token_accuracy": 0.939716748893261, | |
| "step": 442 | |
| }, | |
| { | |
| "epoch": 2.592375366568915, | |
| "grad_norm": 1.4166574253102397, | |
| "learning_rate": 3.520936377507679e-05, | |
| "loss": 0.1776, | |
| "mean_token_accuracy": 0.9428360909223557, | |
| "step": 443 | |
| }, | |
| { | |
| "epoch": 2.598240469208211, | |
| "grad_norm": 1.6648203766423546, | |
| "learning_rate": 3.5186044518768376e-05, | |
| "loss": 0.2281, | |
| "mean_token_accuracy": 0.9257840067148209, | |
| "step": 444 | |
| }, | |
| { | |
| "epoch": 2.6041055718475072, | |
| "grad_norm": 1.5336110988003941, | |
| "learning_rate": 3.5162677402443864e-05, | |
| "loss": 0.1976, | |
| "mean_token_accuracy": 0.938742958009243, | |
| "step": 445 | |
| }, | |
| { | |
| "epoch": 2.6099706744868034, | |
| "grad_norm": 1.4379546067534612, | |
| "learning_rate": 3.513926251091644e-05, | |
| "loss": 0.1971, | |
| "mean_token_accuracy": 0.9414190128445625, | |
| "step": 446 | |
| }, | |
| { | |
| "epoch": 2.6158357771260996, | |
| "grad_norm": 1.339537757201692, | |
| "learning_rate": 3.51157999291727e-05, | |
| "loss": 0.1814, | |
| "mean_token_accuracy": 0.9400840774178505, | |
| "step": 447 | |
| }, | |
| { | |
| "epoch": 2.621700879765396, | |
| "grad_norm": 1.9285181710925006, | |
| "learning_rate": 3.509228974237235e-05, | |
| "loss": 0.2581, | |
| "mean_token_accuracy": 0.9255566149950027, | |
| "step": 448 | |
| }, | |
| { | |
| "epoch": 2.627565982404692, | |
| "grad_norm": 1.424766252886363, | |
| "learning_rate": 3.506873203584787e-05, | |
| "loss": 0.1849, | |
| "mean_token_accuracy": 0.9422862157225609, | |
| "step": 449 | |
| }, | |
| { | |
| "epoch": 2.633431085043988, | |
| "grad_norm": 1.203628816210816, | |
| "learning_rate": 3.504512689510422e-05, | |
| "loss": 0.1477, | |
| "mean_token_accuracy": 0.9604510739445686, | |
| "step": 450 | |
| }, | |
| { | |
| "epoch": 2.6392961876832843, | |
| "grad_norm": 1.3126311664229962, | |
| "learning_rate": 3.5021474405818525e-05, | |
| "loss": 0.1648, | |
| "mean_token_accuracy": 0.9464056417346001, | |
| "step": 451 | |
| }, | |
| { | |
| "epoch": 2.6451612903225805, | |
| "grad_norm": 1.5683417522165155, | |
| "learning_rate": 3.499777465383977e-05, | |
| "loss": 0.2085, | |
| "mean_token_accuracy": 0.9423654973506927, | |
| "step": 452 | |
| }, | |
| { | |
| "epoch": 2.6510263929618767, | |
| "grad_norm": 1.7124391179299574, | |
| "learning_rate": 3.497402772518848e-05, | |
| "loss": 0.2197, | |
| "mean_token_accuracy": 0.9286736026406288, | |
| "step": 453 | |
| }, | |
| { | |
| "epoch": 2.656891495601173, | |
| "grad_norm": 1.2928522615635958, | |
| "learning_rate": 3.4950233706056415e-05, | |
| "loss": 0.1646, | |
| "mean_token_accuracy": 0.9472004845738411, | |
| "step": 454 | |
| }, | |
| { | |
| "epoch": 2.662756598240469, | |
| "grad_norm": 1.6263040380796228, | |
| "learning_rate": 3.4926392682806265e-05, | |
| "loss": 0.2055, | |
| "mean_token_accuracy": 0.9389370456337929, | |
| "step": 455 | |
| }, | |
| { | |
| "epoch": 2.6686217008797652, | |
| "grad_norm": 1.5520555225769765, | |
| "learning_rate": 3.490250474197131e-05, | |
| "loss": 0.1979, | |
| "mean_token_accuracy": 0.9429981112480164, | |
| "step": 456 | |
| }, | |
| { | |
| "epoch": 2.6744868035190614, | |
| "grad_norm": 1.520377373848019, | |
| "learning_rate": 3.4878569970255116e-05, | |
| "loss": 0.1817, | |
| "mean_token_accuracy": 0.9455464854836464, | |
| "step": 457 | |
| }, | |
| { | |
| "epoch": 2.6803519061583576, | |
| "grad_norm": 1.6159409423049518, | |
| "learning_rate": 3.485458845453125e-05, | |
| "loss": 0.1919, | |
| "mean_token_accuracy": 0.9442776069045067, | |
| "step": 458 | |
| }, | |
| { | |
| "epoch": 2.686217008797654, | |
| "grad_norm": 1.536515683927506, | |
| "learning_rate": 3.483056028184293e-05, | |
| "loss": 0.1514, | |
| "mean_token_accuracy": 0.9531370028853416, | |
| "step": 459 | |
| }, | |
| { | |
| "epoch": 2.6920821114369504, | |
| "grad_norm": 1.5998217769260568, | |
| "learning_rate": 3.4806485539402716e-05, | |
| "loss": 0.1886, | |
| "mean_token_accuracy": 0.940488263964653, | |
| "step": 460 | |
| }, | |
| { | |
| "epoch": 2.6979472140762466, | |
| "grad_norm": 1.2681668634545191, | |
| "learning_rate": 3.4782364314592186e-05, | |
| "loss": 0.1491, | |
| "mean_token_accuracy": 0.956335611641407, | |
| "step": 461 | |
| }, | |
| { | |
| "epoch": 2.703812316715543, | |
| "grad_norm": 1.618911195594093, | |
| "learning_rate": 3.475819669496167e-05, | |
| "loss": 0.1757, | |
| "mean_token_accuracy": 0.9452315121889114, | |
| "step": 462 | |
| }, | |
| { | |
| "epoch": 2.709677419354839, | |
| "grad_norm": 1.356819620967187, | |
| "learning_rate": 3.473398276822985e-05, | |
| "loss": 0.1736, | |
| "mean_token_accuracy": 0.9476948976516724, | |
| "step": 463 | |
| }, | |
| { | |
| "epoch": 2.715542521994135, | |
| "grad_norm": 1.6105950634922812, | |
| "learning_rate": 3.47097226222835e-05, | |
| "loss": 0.2048, | |
| "mean_token_accuracy": 0.9410142675042152, | |
| "step": 464 | |
| }, | |
| { | |
| "epoch": 2.7214076246334313, | |
| "grad_norm": 1.441168567633283, | |
| "learning_rate": 3.468541634517716e-05, | |
| "loss": 0.1733, | |
| "mean_token_accuracy": 0.9470472857356071, | |
| "step": 465 | |
| }, | |
| { | |
| "epoch": 2.7272727272727275, | |
| "grad_norm": 1.219154125397772, | |
| "learning_rate": 3.4661064025132796e-05, | |
| "loss": 0.1407, | |
| "mean_token_accuracy": 0.9544103369116783, | |
| "step": 466 | |
| }, | |
| { | |
| "epoch": 2.7331378299120237, | |
| "grad_norm": 1.735956549135718, | |
| "learning_rate": 3.463666575053949e-05, | |
| "loss": 0.2209, | |
| "mean_token_accuracy": 0.9372992217540741, | |
| "step": 467 | |
| }, | |
| { | |
| "epoch": 2.73900293255132, | |
| "grad_norm": 1.0723058902393905, | |
| "learning_rate": 3.4612221609953126e-05, | |
| "loss": 0.1465, | |
| "mean_token_accuracy": 0.9592811986804008, | |
| "step": 468 | |
| }, | |
| { | |
| "epoch": 2.744868035190616, | |
| "grad_norm": 1.266106561576398, | |
| "learning_rate": 3.4587731692096065e-05, | |
| "loss": 0.1616, | |
| "mean_token_accuracy": 0.9548813477158546, | |
| "step": 469 | |
| }, | |
| { | |
| "epoch": 2.7507331378299122, | |
| "grad_norm": 1.5709556980717925, | |
| "learning_rate": 3.4563196085856815e-05, | |
| "loss": 0.2231, | |
| "mean_token_accuracy": 0.9351802319288254, | |
| "step": 470 | |
| }, | |
| { | |
| "epoch": 2.7565982404692084, | |
| "grad_norm": 1.6033382298160437, | |
| "learning_rate": 3.4538614880289724e-05, | |
| "loss": 0.2066, | |
| "mean_token_accuracy": 0.9434246122837067, | |
| "step": 471 | |
| }, | |
| { | |
| "epoch": 2.7624633431085046, | |
| "grad_norm": 1.2791754585446349, | |
| "learning_rate": 3.4513988164614635e-05, | |
| "loss": 0.1566, | |
| "mean_token_accuracy": 0.9580570235848427, | |
| "step": 472 | |
| }, | |
| { | |
| "epoch": 2.768328445747801, | |
| "grad_norm": 1.1734761099061624, | |
| "learning_rate": 3.4489316028216584e-05, | |
| "loss": 0.1465, | |
| "mean_token_accuracy": 0.9552415683865547, | |
| "step": 473 | |
| }, | |
| { | |
| "epoch": 2.774193548387097, | |
| "grad_norm": 1.2022871722577544, | |
| "learning_rate": 3.446459856064545e-05, | |
| "loss": 0.1517, | |
| "mean_token_accuracy": 0.9557396098971367, | |
| "step": 474 | |
| }, | |
| { | |
| "epoch": 2.780058651026393, | |
| "grad_norm": 1.6434058067923774, | |
| "learning_rate": 3.443983585161568e-05, | |
| "loss": 0.2084, | |
| "mean_token_accuracy": 0.9364630356431007, | |
| "step": 475 | |
| }, | |
| { | |
| "epoch": 2.7859237536656893, | |
| "grad_norm": 1.2827362600117416, | |
| "learning_rate": 3.441502799100588e-05, | |
| "loss": 0.1602, | |
| "mean_token_accuracy": 0.960084393620491, | |
| "step": 476 | |
| }, | |
| { | |
| "epoch": 2.7917888563049855, | |
| "grad_norm": 1.4811407319146348, | |
| "learning_rate": 3.439017506885858e-05, | |
| "loss": 0.1867, | |
| "mean_token_accuracy": 0.948580302298069, | |
| "step": 477 | |
| }, | |
| { | |
| "epoch": 2.7976539589442817, | |
| "grad_norm": 1.4737225017755107, | |
| "learning_rate": 3.436527717537985e-05, | |
| "loss": 0.1942, | |
| "mean_token_accuracy": 0.9491779133677483, | |
| "step": 478 | |
| }, | |
| { | |
| "epoch": 2.803519061583578, | |
| "grad_norm": 1.4212779057861695, | |
| "learning_rate": 3.434033440093899e-05, | |
| "loss": 0.1898, | |
| "mean_token_accuracy": 0.9374239072203636, | |
| "step": 479 | |
| }, | |
| { | |
| "epoch": 2.809384164222874, | |
| "grad_norm": 1.4834687519316923, | |
| "learning_rate": 3.431534683606818e-05, | |
| "loss": 0.2084, | |
| "mean_token_accuracy": 0.9470439925789833, | |
| "step": 480 | |
| }, | |
| { | |
| "epoch": 2.8152492668621703, | |
| "grad_norm": 1.3536053573095326, | |
| "learning_rate": 3.4290314571462214e-05, | |
| "loss": 0.1664, | |
| "mean_token_accuracy": 0.9505494609475136, | |
| "step": 481 | |
| }, | |
| { | |
| "epoch": 2.8211143695014664, | |
| "grad_norm": 1.4654358825691278, | |
| "learning_rate": 3.426523769797808e-05, | |
| "loss": 0.1719, | |
| "mean_token_accuracy": 0.9511361643671989, | |
| "step": 482 | |
| }, | |
| { | |
| "epoch": 2.8269794721407626, | |
| "grad_norm": 1.534865173849534, | |
| "learning_rate": 3.424011630663472e-05, | |
| "loss": 0.2035, | |
| "mean_token_accuracy": 0.9370648711919785, | |
| "step": 483 | |
| }, | |
| { | |
| "epoch": 2.832844574780059, | |
| "grad_norm": 1.5975723094077148, | |
| "learning_rate": 3.421495048861262e-05, | |
| "loss": 0.1841, | |
| "mean_token_accuracy": 0.9466887265443802, | |
| "step": 484 | |
| }, | |
| { | |
| "epoch": 2.838709677419355, | |
| "grad_norm": 1.380939441601992, | |
| "learning_rate": 3.418974033525355e-05, | |
| "loss": 0.1764, | |
| "mean_token_accuracy": 0.9512438848614693, | |
| "step": 485 | |
| }, | |
| { | |
| "epoch": 2.844574780058651, | |
| "grad_norm": 1.44638081838838, | |
| "learning_rate": 3.416448593806019e-05, | |
| "loss": 0.2103, | |
| "mean_token_accuracy": 0.9422366619110107, | |
| "step": 486 | |
| }, | |
| { | |
| "epoch": 2.8504398826979473, | |
| "grad_norm": 1.3027147165778983, | |
| "learning_rate": 3.4139187388695774e-05, | |
| "loss": 0.1751, | |
| "mean_token_accuracy": 0.9415677487850189, | |
| "step": 487 | |
| }, | |
| { | |
| "epoch": 2.8563049853372435, | |
| "grad_norm": 1.442286918678071, | |
| "learning_rate": 3.411384477898385e-05, | |
| "loss": 0.1784, | |
| "mean_token_accuracy": 0.9486387521028519, | |
| "step": 488 | |
| }, | |
| { | |
| "epoch": 2.8621700879765397, | |
| "grad_norm": 1.1864310979576265, | |
| "learning_rate": 3.408845820090784e-05, | |
| "loss": 0.1641, | |
| "mean_token_accuracy": 0.9475547969341278, | |
| "step": 489 | |
| }, | |
| { | |
| "epoch": 2.868035190615836, | |
| "grad_norm": 1.5938391452894787, | |
| "learning_rate": 3.406302774661077e-05, | |
| "loss": 0.2135, | |
| "mean_token_accuracy": 0.9362959042191505, | |
| "step": 490 | |
| }, | |
| { | |
| "epoch": 2.873900293255132, | |
| "grad_norm": 1.8187898306480366, | |
| "learning_rate": 3.403755350839492e-05, | |
| "loss": 0.2208, | |
| "mean_token_accuracy": 0.9381490647792816, | |
| "step": 491 | |
| }, | |
| { | |
| "epoch": 2.8797653958944283, | |
| "grad_norm": 1.0176203937436277, | |
| "learning_rate": 3.401203557872149e-05, | |
| "loss": 0.131, | |
| "mean_token_accuracy": 0.963982343673706, | |
| "step": 492 | |
| }, | |
| { | |
| "epoch": 2.8856304985337244, | |
| "grad_norm": 1.2944881554086942, | |
| "learning_rate": 3.398647405021026e-05, | |
| "loss": 0.1791, | |
| "mean_token_accuracy": 0.9504873305559158, | |
| "step": 493 | |
| }, | |
| { | |
| "epoch": 2.8914956011730206, | |
| "grad_norm": 1.7979282502116185, | |
| "learning_rate": 3.396086901563925e-05, | |
| "loss": 0.2224, | |
| "mean_token_accuracy": 0.9352485239505768, | |
| "step": 494 | |
| }, | |
| { | |
| "epoch": 2.897360703812317, | |
| "grad_norm": 1.1750005366326866, | |
| "learning_rate": 3.3935220567944395e-05, | |
| "loss": 0.1545, | |
| "mean_token_accuracy": 0.950984425842762, | |
| "step": 495 | |
| }, | |
| { | |
| "epoch": 2.903225806451613, | |
| "grad_norm": 1.5949219047411234, | |
| "learning_rate": 3.39095288002192e-05, | |
| "loss": 0.2162, | |
| "mean_token_accuracy": 0.9337630718946457, | |
| "step": 496 | |
| }, | |
| { | |
| "epoch": 2.909090909090909, | |
| "grad_norm": 1.3433147451555392, | |
| "learning_rate": 3.3883793805714406e-05, | |
| "loss": 0.1659, | |
| "mean_token_accuracy": 0.9481701776385307, | |
| "step": 497 | |
| }, | |
| { | |
| "epoch": 2.9149560117302054, | |
| "grad_norm": 1.5996494396127081, | |
| "learning_rate": 3.3858015677837656e-05, | |
| "loss": 0.2149, | |
| "mean_token_accuracy": 0.9419268667697906, | |
| "step": 498 | |
| }, | |
| { | |
| "epoch": 2.9208211143695015, | |
| "grad_norm": 1.3050500946865204, | |
| "learning_rate": 3.3832194510153126e-05, | |
| "loss": 0.1924, | |
| "mean_token_accuracy": 0.9457806721329689, | |
| "step": 499 | |
| }, | |
| { | |
| "epoch": 2.9266862170087977, | |
| "grad_norm": 1.3938033327743358, | |
| "learning_rate": 3.380633039638125e-05, | |
| "loss": 0.1871, | |
| "mean_token_accuracy": 0.9511330723762512, | |
| "step": 500 | |
| }, | |
| { | |
| "epoch": 2.932551319648094, | |
| "grad_norm": 1.7111259960860905, | |
| "learning_rate": 3.37804234303983e-05, | |
| "loss": 0.2148, | |
| "mean_token_accuracy": 0.9375216588377953, | |
| "step": 501 | |
| }, | |
| { | |
| "epoch": 2.93841642228739, | |
| "grad_norm": 1.5676984126616746, | |
| "learning_rate": 3.37544737062361e-05, | |
| "loss": 0.2069, | |
| "mean_token_accuracy": 0.9395978674292564, | |
| "step": 502 | |
| }, | |
| { | |
| "epoch": 2.9442815249266863, | |
| "grad_norm": 1.3024069551411093, | |
| "learning_rate": 3.372848131808167e-05, | |
| "loss": 0.1775, | |
| "mean_token_accuracy": 0.94802475720644, | |
| "step": 503 | |
| }, | |
| { | |
| "epoch": 2.9501466275659824, | |
| "grad_norm": 1.6781903083104455, | |
| "learning_rate": 3.370244636027688e-05, | |
| "loss": 0.1932, | |
| "mean_token_accuracy": 0.9439413473010063, | |
| "step": 504 | |
| }, | |
| { | |
| "epoch": 2.9560117302052786, | |
| "grad_norm": 1.2084010552335631, | |
| "learning_rate": 3.367636892731812e-05, | |
| "loss": 0.1705, | |
| "mean_token_accuracy": 0.9432723671197891, | |
| "step": 505 | |
| }, | |
| { | |
| "epoch": 2.961876832844575, | |
| "grad_norm": 1.4876928960475226, | |
| "learning_rate": 3.365024911385593e-05, | |
| "loss": 0.1669, | |
| "mean_token_accuracy": 0.9567491114139557, | |
| "step": 506 | |
| }, | |
| { | |
| "epoch": 2.967741935483871, | |
| "grad_norm": 1.4129569433402656, | |
| "learning_rate": 3.362408701469469e-05, | |
| "loss": 0.1779, | |
| "mean_token_accuracy": 0.9438919052481651, | |
| "step": 507 | |
| }, | |
| { | |
| "epoch": 2.973607038123167, | |
| "grad_norm": 1.4008768614077696, | |
| "learning_rate": 3.359788272479225e-05, | |
| "loss": 0.1988, | |
| "mean_token_accuracy": 0.9402952864766121, | |
| "step": 508 | |
| }, | |
| { | |
| "epoch": 2.9794721407624634, | |
| "grad_norm": 1.2490328336150844, | |
| "learning_rate": 3.35716363392596e-05, | |
| "loss": 0.1846, | |
| "mean_token_accuracy": 0.9396477043628693, | |
| "step": 509 | |
| }, | |
| { | |
| "epoch": 2.9853372434017595, | |
| "grad_norm": 1.8703998203747494, | |
| "learning_rate": 3.354534795336052e-05, | |
| "loss": 0.267, | |
| "mean_token_accuracy": 0.9222179055213928, | |
| "step": 510 | |
| }, | |
| { | |
| "epoch": 2.9912023460410557, | |
| "grad_norm": 1.3688615608986425, | |
| "learning_rate": 3.351901766251123e-05, | |
| "loss": 0.189, | |
| "mean_token_accuracy": 0.9435115680098534, | |
| "step": 511 | |
| }, | |
| { | |
| "epoch": 2.997067448680352, | |
| "grad_norm": 1.7584850077090424, | |
| "learning_rate": 3.349264556228006e-05, | |
| "loss": 0.2295, | |
| "mean_token_accuracy": 0.9362204223871231, | |
| "step": 512 | |
| }, | |
| { | |
| "epoch": 3.0, | |
| "grad_norm": 1.7584850077090424, | |
| "learning_rate": 3.3466231748387077e-05, | |
| "loss": 0.2552, | |
| "mean_token_accuracy": 0.9123900383710861, | |
| "step": 513 | |
| }, | |
| { | |
| "epoch": 3.005865102639296, | |
| "grad_norm": 2.1995675752841906, | |
| "learning_rate": 3.343977631670376e-05, | |
| "loss": 0.1173, | |
| "mean_token_accuracy": 0.9707249626517296, | |
| "step": 514 | |
| }, | |
| { | |
| "epoch": 3.0117302052785924, | |
| "grad_norm": 1.1331312828942772, | |
| "learning_rate": 3.341327936325264e-05, | |
| "loss": 0.146, | |
| "mean_token_accuracy": 0.9599213600158691, | |
| "step": 515 | |
| }, | |
| { | |
| "epoch": 3.0175953079178885, | |
| "grad_norm": 1.066284672763136, | |
| "learning_rate": 3.338674098420695e-05, | |
| "loss": 0.1109, | |
| "mean_token_accuracy": 0.965148963034153, | |
| "step": 516 | |
| }, | |
| { | |
| "epoch": 3.0234604105571847, | |
| "grad_norm": 1.0074944736951232, | |
| "learning_rate": 3.33601612758903e-05, | |
| "loss": 0.1283, | |
| "mean_token_accuracy": 0.9635849222540855, | |
| "step": 517 | |
| }, | |
| { | |
| "epoch": 3.029325513196481, | |
| "grad_norm": 1.2258415386092127, | |
| "learning_rate": 3.3333540334776286e-05, | |
| "loss": 0.1329, | |
| "mean_token_accuracy": 0.9582738950848579, | |
| "step": 518 | |
| }, | |
| { | |
| "epoch": 3.035190615835777, | |
| "grad_norm": 1.0778695406875511, | |
| "learning_rate": 3.330687825748818e-05, | |
| "loss": 0.118, | |
| "mean_token_accuracy": 0.966682031750679, | |
| "step": 519 | |
| }, | |
| { | |
| "epoch": 3.0410557184750733, | |
| "grad_norm": 1.0364880624857191, | |
| "learning_rate": 3.328017514079855e-05, | |
| "loss": 0.1216, | |
| "mean_token_accuracy": 0.9677037075161934, | |
| "step": 520 | |
| }, | |
| { | |
| "epoch": 3.0469208211143695, | |
| "grad_norm": 1.1236650847876453, | |
| "learning_rate": 3.325343108162893e-05, | |
| "loss": 0.1116, | |
| "mean_token_accuracy": 0.9645716473460197, | |
| "step": 521 | |
| }, | |
| { | |
| "epoch": 3.0527859237536656, | |
| "grad_norm": 1.1551918085779922, | |
| "learning_rate": 3.3226646177049446e-05, | |
| "loss": 0.1425, | |
| "mean_token_accuracy": 0.963393472135067, | |
| "step": 522 | |
| }, | |
| { | |
| "epoch": 3.058651026392962, | |
| "grad_norm": 1.2648329188721996, | |
| "learning_rate": 3.3199820524278485e-05, | |
| "loss": 0.1299, | |
| "mean_token_accuracy": 0.9622451663017273, | |
| "step": 523 | |
| }, | |
| { | |
| "epoch": 3.064516129032258, | |
| "grad_norm": 1.325697584264539, | |
| "learning_rate": 3.317295422068234e-05, | |
| "loss": 0.1263, | |
| "mean_token_accuracy": 0.9606219977140427, | |
| "step": 524 | |
| }, | |
| { | |
| "epoch": 3.070381231671554, | |
| "grad_norm": 1.130741369640608, | |
| "learning_rate": 3.314604736377484e-05, | |
| "loss": 0.0992, | |
| "mean_token_accuracy": 0.9693296328186989, | |
| "step": 525 | |
| }, | |
| { | |
| "epoch": 3.0762463343108504, | |
| "grad_norm": 0.8955946893420774, | |
| "learning_rate": 3.3119100051217005e-05, | |
| "loss": 0.0895, | |
| "mean_token_accuracy": 0.9741750881075859, | |
| "step": 526 | |
| }, | |
| { | |
| "epoch": 3.0821114369501466, | |
| "grad_norm": 1.0121677552730586, | |
| "learning_rate": 3.3092112380816696e-05, | |
| "loss": 0.1324, | |
| "mean_token_accuracy": 0.9608965739607811, | |
| "step": 527 | |
| }, | |
| { | |
| "epoch": 3.0879765395894427, | |
| "grad_norm": 1.060519939355375, | |
| "learning_rate": 3.306508445052826e-05, | |
| "loss": 0.1335, | |
| "mean_token_accuracy": 0.9608549624681473, | |
| "step": 528 | |
| }, | |
| { | |
| "epoch": 3.093841642228739, | |
| "grad_norm": 1.3572690475788862, | |
| "learning_rate": 3.303801635845216e-05, | |
| "loss": 0.1198, | |
| "mean_token_accuracy": 0.9593236073851585, | |
| "step": 529 | |
| }, | |
| { | |
| "epoch": 3.099706744868035, | |
| "grad_norm": 1.1464512354729663, | |
| "learning_rate": 3.301090820283465e-05, | |
| "loss": 0.1367, | |
| "mean_token_accuracy": 0.9591869786381721, | |
| "step": 530 | |
| }, | |
| { | |
| "epoch": 3.1055718475073313, | |
| "grad_norm": 1.2823035767765665, | |
| "learning_rate": 3.298376008206739e-05, | |
| "loss": 0.113, | |
| "mean_token_accuracy": 0.9657314345240593, | |
| "step": 531 | |
| }, | |
| { | |
| "epoch": 3.1114369501466275, | |
| "grad_norm": 1.0403767904872765, | |
| "learning_rate": 3.295657209468707e-05, | |
| "loss": 0.112, | |
| "mean_token_accuracy": 0.9685370773077011, | |
| "step": 532 | |
| }, | |
| { | |
| "epoch": 3.1173020527859236, | |
| "grad_norm": 1.15498558857311, | |
| "learning_rate": 3.2929344339375125e-05, | |
| "loss": 0.159, | |
| "mean_token_accuracy": 0.9546686410903931, | |
| "step": 533 | |
| }, | |
| { | |
| "epoch": 3.12316715542522, | |
| "grad_norm": 1.3853285053067845, | |
| "learning_rate": 3.290207691495731e-05, | |
| "loss": 0.1386, | |
| "mean_token_accuracy": 0.9635892808437347, | |
| "step": 534 | |
| }, | |
| { | |
| "epoch": 3.129032258064516, | |
| "grad_norm": 1.0913823383162526, | |
| "learning_rate": 3.2874769920403355e-05, | |
| "loss": 0.1264, | |
| "mean_token_accuracy": 0.9625230208039284, | |
| "step": 535 | |
| }, | |
| { | |
| "epoch": 3.134897360703812, | |
| "grad_norm": 1.0080636135286674, | |
| "learning_rate": 3.2847423454826616e-05, | |
| "loss": 0.1255, | |
| "mean_token_accuracy": 0.9642705172300339, | |
| "step": 536 | |
| }, | |
| { | |
| "epoch": 3.1407624633431084, | |
| "grad_norm": 1.3384707349035834, | |
| "learning_rate": 3.2820037617483734e-05, | |
| "loss": 0.1438, | |
| "mean_token_accuracy": 0.958034835755825, | |
| "step": 537 | |
| }, | |
| { | |
| "epoch": 3.1466275659824046, | |
| "grad_norm": 1.3200934144993954, | |
| "learning_rate": 3.2792612507774224e-05, | |
| "loss": 0.1273, | |
| "mean_token_accuracy": 0.9666285142302513, | |
| "step": 538 | |
| }, | |
| { | |
| "epoch": 3.1524926686217007, | |
| "grad_norm": 1.2829027100001151, | |
| "learning_rate": 3.2765148225240176e-05, | |
| "loss": 0.1325, | |
| "mean_token_accuracy": 0.9617817550897598, | |
| "step": 539 | |
| }, | |
| { | |
| "epoch": 3.158357771260997, | |
| "grad_norm": 1.2749195398082585, | |
| "learning_rate": 3.273764486956583e-05, | |
| "loss": 0.1396, | |
| "mean_token_accuracy": 0.9613074734807014, | |
| "step": 540 | |
| }, | |
| { | |
| "epoch": 3.164222873900293, | |
| "grad_norm": 1.8832759061462432, | |
| "learning_rate": 3.2710102540577256e-05, | |
| "loss": 0.1359, | |
| "mean_token_accuracy": 0.9631753191351891, | |
| "step": 541 | |
| }, | |
| { | |
| "epoch": 3.1700879765395893, | |
| "grad_norm": 1.3360701106894814, | |
| "learning_rate": 3.268252133824198e-05, | |
| "loss": 0.1511, | |
| "mean_token_accuracy": 0.9591165855526924, | |
| "step": 542 | |
| }, | |
| { | |
| "epoch": 3.1759530791788855, | |
| "grad_norm": 1.0402537478170215, | |
| "learning_rate": 3.2654901362668656e-05, | |
| "loss": 0.1131, | |
| "mean_token_accuracy": 0.9691095277667046, | |
| "step": 543 | |
| }, | |
| { | |
| "epoch": 3.1818181818181817, | |
| "grad_norm": 0.9125849553677988, | |
| "learning_rate": 3.262724271410661e-05, | |
| "loss": 0.1236, | |
| "mean_token_accuracy": 0.9633389338850975, | |
| "step": 544 | |
| }, | |
| { | |
| "epoch": 3.187683284457478, | |
| "grad_norm": 1.3682014683553543, | |
| "learning_rate": 3.2599545492945584e-05, | |
| "loss": 0.1454, | |
| "mean_token_accuracy": 0.9624962136149406, | |
| "step": 545 | |
| }, | |
| { | |
| "epoch": 3.193548387096774, | |
| "grad_norm": 1.3325018585193886, | |
| "learning_rate": 3.257180979971529e-05, | |
| "loss": 0.1328, | |
| "mean_token_accuracy": 0.9589158594608307, | |
| "step": 546 | |
| }, | |
| { | |
| "epoch": 3.19941348973607, | |
| "grad_norm": 1.0097361820340343, | |
| "learning_rate": 3.25440357350851e-05, | |
| "loss": 0.144, | |
| "mean_token_accuracy": 0.959138460457325, | |
| "step": 547 | |
| }, | |
| { | |
| "epoch": 3.2052785923753664, | |
| "grad_norm": 1.323820448706562, | |
| "learning_rate": 3.251622339986366e-05, | |
| "loss": 0.1345, | |
| "mean_token_accuracy": 0.9602171406149864, | |
| "step": 548 | |
| }, | |
| { | |
| "epoch": 3.2111436950146626, | |
| "grad_norm": 1.2388531251498671, | |
| "learning_rate": 3.24883728949985e-05, | |
| "loss": 0.1327, | |
| "mean_token_accuracy": 0.9595544189214706, | |
| "step": 549 | |
| }, | |
| { | |
| "epoch": 3.2170087976539588, | |
| "grad_norm": 1.0310891892624763, | |
| "learning_rate": 3.2460484321575714e-05, | |
| "loss": 0.1168, | |
| "mean_token_accuracy": 0.9652495458722115, | |
| "step": 550 | |
| }, | |
| { | |
| "epoch": 3.222873900293255, | |
| "grad_norm": 1.4715352145558056, | |
| "learning_rate": 3.2432557780819556e-05, | |
| "loss": 0.1132, | |
| "mean_token_accuracy": 0.9686658978462219, | |
| "step": 551 | |
| }, | |
| { | |
| "epoch": 3.228739002932551, | |
| "grad_norm": 1.1419372965685666, | |
| "learning_rate": 3.240459337409209e-05, | |
| "loss": 0.1374, | |
| "mean_token_accuracy": 0.9581699892878532, | |
| "step": 552 | |
| }, | |
| { | |
| "epoch": 3.2346041055718473, | |
| "grad_norm": 1.065263225106119, | |
| "learning_rate": 3.237659120289282e-05, | |
| "loss": 0.1221, | |
| "mean_token_accuracy": 0.9630059227347374, | |
| "step": 553 | |
| }, | |
| { | |
| "epoch": 3.2404692082111435, | |
| "grad_norm": 1.4131885769796437, | |
| "learning_rate": 3.2348551368858315e-05, | |
| "loss": 0.1315, | |
| "mean_token_accuracy": 0.9611981958150864, | |
| "step": 554 | |
| }, | |
| { | |
| "epoch": 3.2463343108504397, | |
| "grad_norm": 1.1809907629919947, | |
| "learning_rate": 3.2320473973761845e-05, | |
| "loss": 0.1451, | |
| "mean_token_accuracy": 0.9615741968154907, | |
| "step": 555 | |
| }, | |
| { | |
| "epoch": 3.252199413489736, | |
| "grad_norm": 1.2393516472369321, | |
| "learning_rate": 3.229235911951303e-05, | |
| "loss": 0.1331, | |
| "mean_token_accuracy": 0.9634448885917664, | |
| "step": 556 | |
| }, | |
| { | |
| "epoch": 3.258064516129032, | |
| "grad_norm": 1.172077426283371, | |
| "learning_rate": 3.2264206908157425e-05, | |
| "loss": 0.1116, | |
| "mean_token_accuracy": 0.9665073528885841, | |
| "step": 557 | |
| }, | |
| { | |
| "epoch": 3.263929618768328, | |
| "grad_norm": 0.935534799272401, | |
| "learning_rate": 3.2236017441876185e-05, | |
| "loss": 0.133, | |
| "mean_token_accuracy": 0.9626928493380547, | |
| "step": 558 | |
| }, | |
| { | |
| "epoch": 3.2697947214076244, | |
| "grad_norm": 1.2339532266360087, | |
| "learning_rate": 3.220779082298569e-05, | |
| "loss": 0.1448, | |
| "mean_token_accuracy": 0.9604134261608124, | |
| "step": 559 | |
| }, | |
| { | |
| "epoch": 3.2756598240469206, | |
| "grad_norm": 1.329389534253784, | |
| "learning_rate": 3.2179527153937165e-05, | |
| "loss": 0.1369, | |
| "mean_token_accuracy": 0.9576560631394386, | |
| "step": 560 | |
| }, | |
| { | |
| "epoch": 3.281524926686217, | |
| "grad_norm": 1.156112312255903, | |
| "learning_rate": 3.2151226537316315e-05, | |
| "loss": 0.1158, | |
| "mean_token_accuracy": 0.9680624380707741, | |
| "step": 561 | |
| }, | |
| { | |
| "epoch": 3.2873900293255134, | |
| "grad_norm": 1.00465704885852, | |
| "learning_rate": 3.212288907584296e-05, | |
| "loss": 0.1212, | |
| "mean_token_accuracy": 0.9627675563097, | |
| "step": 562 | |
| }, | |
| { | |
| "epoch": 3.2932551319648096, | |
| "grad_norm": 1.3728468957769566, | |
| "learning_rate": 3.209451487237062e-05, | |
| "loss": 0.1641, | |
| "mean_token_accuracy": 0.9527490735054016, | |
| "step": 563 | |
| }, | |
| { | |
| "epoch": 3.2991202346041058, | |
| "grad_norm": 1.4900435856401562, | |
| "learning_rate": 3.206610402988621e-05, | |
| "loss": 0.1214, | |
| "mean_token_accuracy": 0.9644360318779945, | |
| "step": 564 | |
| }, | |
| { | |
| "epoch": 3.304985337243402, | |
| "grad_norm": 1.0411190450621626, | |
| "learning_rate": 3.20376566515096e-05, | |
| "loss": 0.1249, | |
| "mean_token_accuracy": 0.9645769596099854, | |
| "step": 565 | |
| }, | |
| { | |
| "epoch": 3.310850439882698, | |
| "grad_norm": 1.0952909991789344, | |
| "learning_rate": 3.20091728404933e-05, | |
| "loss": 0.1105, | |
| "mean_token_accuracy": 0.9669332653284073, | |
| "step": 566 | |
| }, | |
| { | |
| "epoch": 3.3167155425219943, | |
| "grad_norm": 0.9567829518623286, | |
| "learning_rate": 3.1980652700222024e-05, | |
| "loss": 0.1217, | |
| "mean_token_accuracy": 0.9665136188268661, | |
| "step": 567 | |
| }, | |
| { | |
| "epoch": 3.3225806451612905, | |
| "grad_norm": 1.054550029779028, | |
| "learning_rate": 3.195209633421237e-05, | |
| "loss": 0.1319, | |
| "mean_token_accuracy": 0.959479384124279, | |
| "step": 568 | |
| }, | |
| { | |
| "epoch": 3.3284457478005867, | |
| "grad_norm": 1.3054160354686266, | |
| "learning_rate": 3.192350384611242e-05, | |
| "loss": 0.1488, | |
| "mean_token_accuracy": 0.9539597705006599, | |
| "step": 569 | |
| }, | |
| { | |
| "epoch": 3.334310850439883, | |
| "grad_norm": 1.2721386924556894, | |
| "learning_rate": 3.1894875339701354e-05, | |
| "loss": 0.1147, | |
| "mean_token_accuracy": 0.9713543429970741, | |
| "step": 570 | |
| }, | |
| { | |
| "epoch": 3.340175953079179, | |
| "grad_norm": 1.1485327319472471, | |
| "learning_rate": 3.186621091888909e-05, | |
| "loss": 0.1427, | |
| "mean_token_accuracy": 0.9585296213626862, | |
| "step": 571 | |
| }, | |
| { | |
| "epoch": 3.346041055718475, | |
| "grad_norm": 1.1203261027735305, | |
| "learning_rate": 3.183751068771588e-05, | |
| "loss": 0.1331, | |
| "mean_token_accuracy": 0.9624553993344307, | |
| "step": 572 | |
| }, | |
| { | |
| "epoch": 3.3519061583577714, | |
| "grad_norm": 1.3902351538919793, | |
| "learning_rate": 3.180877475035199e-05, | |
| "loss": 0.1117, | |
| "mean_token_accuracy": 0.9641240537166595, | |
| "step": 573 | |
| }, | |
| { | |
| "epoch": 3.3577712609970676, | |
| "grad_norm": 0.9631311874139538, | |
| "learning_rate": 3.178000321109727e-05, | |
| "loss": 0.155, | |
| "mean_token_accuracy": 0.9597619101405144, | |
| "step": 574 | |
| }, | |
| { | |
| "epoch": 3.3636363636363638, | |
| "grad_norm": 1.1864850108982898, | |
| "learning_rate": 3.175119617438078e-05, | |
| "loss": 0.1393, | |
| "mean_token_accuracy": 0.9603194743394852, | |
| "step": 575 | |
| }, | |
| { | |
| "epoch": 3.36950146627566, | |
| "grad_norm": 1.389013217967289, | |
| "learning_rate": 3.172235374476043e-05, | |
| "loss": 0.1252, | |
| "mean_token_accuracy": 0.9618887901306152, | |
| "step": 576 | |
| }, | |
| { | |
| "epoch": 3.375366568914956, | |
| "grad_norm": 1.1171938023918093, | |
| "learning_rate": 3.169347602692259e-05, | |
| "loss": 0.1534, | |
| "mean_token_accuracy": 0.9561700448393822, | |
| "step": 577 | |
| }, | |
| { | |
| "epoch": 3.3812316715542523, | |
| "grad_norm": 1.7092816964612618, | |
| "learning_rate": 3.166456312568171e-05, | |
| "loss": 0.1325, | |
| "mean_token_accuracy": 0.9581268802285194, | |
| "step": 578 | |
| }, | |
| { | |
| "epoch": 3.3870967741935485, | |
| "grad_norm": 1.625258065533391, | |
| "learning_rate": 3.1635615145979955e-05, | |
| "loss": 0.1511, | |
| "mean_token_accuracy": 0.9582557752728462, | |
| "step": 579 | |
| }, | |
| { | |
| "epoch": 3.3929618768328447, | |
| "grad_norm": 1.1331081762433888, | |
| "learning_rate": 3.160663219288679e-05, | |
| "loss": 0.1215, | |
| "mean_token_accuracy": 0.9631478041410446, | |
| "step": 580 | |
| }, | |
| { | |
| "epoch": 3.398826979472141, | |
| "grad_norm": 1.1978807973332801, | |
| "learning_rate": 3.157761437159863e-05, | |
| "loss": 0.152, | |
| "mean_token_accuracy": 0.9533499404788017, | |
| "step": 581 | |
| }, | |
| { | |
| "epoch": 3.404692082111437, | |
| "grad_norm": 1.4814381302951631, | |
| "learning_rate": 3.1548561787438445e-05, | |
| "loss": 0.1301, | |
| "mean_token_accuracy": 0.9643861651420593, | |
| "step": 582 | |
| }, | |
| { | |
| "epoch": 3.410557184750733, | |
| "grad_norm": 0.8692151588216875, | |
| "learning_rate": 3.15194745458554e-05, | |
| "loss": 0.1218, | |
| "mean_token_accuracy": 0.9649089276790619, | |
| "step": 583 | |
| }, | |
| { | |
| "epoch": 3.4164222873900294, | |
| "grad_norm": 1.0681213749638105, | |
| "learning_rate": 3.149035275242441e-05, | |
| "loss": 0.1139, | |
| "mean_token_accuracy": 0.9638341292738914, | |
| "step": 584 | |
| }, | |
| { | |
| "epoch": 3.4222873900293256, | |
| "grad_norm": 1.1840279179274757, | |
| "learning_rate": 3.1461196512845834e-05, | |
| "loss": 0.156, | |
| "mean_token_accuracy": 0.9580442979931831, | |
| "step": 585 | |
| }, | |
| { | |
| "epoch": 3.4281524926686218, | |
| "grad_norm": 1.1755866291718038, | |
| "learning_rate": 3.143200593294504e-05, | |
| "loss": 0.1299, | |
| "mean_token_accuracy": 0.9646251425147057, | |
| "step": 586 | |
| }, | |
| { | |
| "epoch": 3.434017595307918, | |
| "grad_norm": 1.2021040143685826, | |
| "learning_rate": 3.1402781118672065e-05, | |
| "loss": 0.1452, | |
| "mean_token_accuracy": 0.960387721657753, | |
| "step": 587 | |
| }, | |
| { | |
| "epoch": 3.439882697947214, | |
| "grad_norm": 1.2658177391122667, | |
| "learning_rate": 3.137352217610115e-05, | |
| "loss": 0.1318, | |
| "mean_token_accuracy": 0.9596589729189873, | |
| "step": 588 | |
| }, | |
| { | |
| "epoch": 3.4457478005865103, | |
| "grad_norm": 1.161450002460737, | |
| "learning_rate": 3.1344229211430465e-05, | |
| "loss": 0.1424, | |
| "mean_token_accuracy": 0.9619096294045448, | |
| "step": 589 | |
| }, | |
| { | |
| "epoch": 3.4516129032258065, | |
| "grad_norm": 1.1764503462466256, | |
| "learning_rate": 3.131490233098164e-05, | |
| "loss": 0.1086, | |
| "mean_token_accuracy": 0.9722210243344307, | |
| "step": 590 | |
| }, | |
| { | |
| "epoch": 3.4574780058651027, | |
| "grad_norm": 1.0200394197140978, | |
| "learning_rate": 3.1285541641199383e-05, | |
| "loss": 0.1298, | |
| "mean_token_accuracy": 0.9627177715301514, | |
| "step": 591 | |
| }, | |
| { | |
| "epoch": 3.463343108504399, | |
| "grad_norm": 1.0891847830733288, | |
| "learning_rate": 3.1256147248651166e-05, | |
| "loss": 0.1121, | |
| "mean_token_accuracy": 0.9680601879954338, | |
| "step": 592 | |
| }, | |
| { | |
| "epoch": 3.469208211143695, | |
| "grad_norm": 1.2405132487377248, | |
| "learning_rate": 3.122671926002675e-05, | |
| "loss": 0.1446, | |
| "mean_token_accuracy": 0.9562314078211784, | |
| "step": 593 | |
| }, | |
| { | |
| "epoch": 3.4750733137829912, | |
| "grad_norm": 1.0110802653353386, | |
| "learning_rate": 3.119725778213785e-05, | |
| "loss": 0.1387, | |
| "mean_token_accuracy": 0.9566402360796928, | |
| "step": 594 | |
| }, | |
| { | |
| "epoch": 3.4809384164222874, | |
| "grad_norm": 1.1974141187037954, | |
| "learning_rate": 3.116776292191774e-05, | |
| "loss": 0.1597, | |
| "mean_token_accuracy": 0.9548348411917686, | |
| "step": 595 | |
| }, | |
| { | |
| "epoch": 3.4868035190615836, | |
| "grad_norm": 1.1165196094909804, | |
| "learning_rate": 3.1138234786420834e-05, | |
| "loss": 0.1148, | |
| "mean_token_accuracy": 0.9665255323052406, | |
| "step": 596 | |
| }, | |
| { | |
| "epoch": 3.4926686217008798, | |
| "grad_norm": 1.0169170518057122, | |
| "learning_rate": 3.110867348282235e-05, | |
| "loss": 0.1295, | |
| "mean_token_accuracy": 0.9588236883282661, | |
| "step": 597 | |
| }, | |
| { | |
| "epoch": 3.498533724340176, | |
| "grad_norm": 1.153587103488231, | |
| "learning_rate": 3.107907911841787e-05, | |
| "loss": 0.1223, | |
| "mean_token_accuracy": 0.9607840701937675, | |
| "step": 598 | |
| }, | |
| { | |
| "epoch": 3.504398826979472, | |
| "grad_norm": 1.087199024371378, | |
| "learning_rate": 3.104945180062301e-05, | |
| "loss": 0.1179, | |
| "mean_token_accuracy": 0.9659548401832581, | |
| "step": 599 | |
| }, | |
| { | |
| "epoch": 3.5102639296187683, | |
| "grad_norm": 1.0293281607388844, | |
| "learning_rate": 3.1019791636972936e-05, | |
| "loss": 0.1238, | |
| "mean_token_accuracy": 0.9601342305541039, | |
| "step": 600 | |
| }, | |
| { | |
| "epoch": 3.5161290322580645, | |
| "grad_norm": 1.031917003106208, | |
| "learning_rate": 3.099009873512208e-05, | |
| "loss": 0.1261, | |
| "mean_token_accuracy": 0.9628048911690712, | |
| "step": 601 | |
| }, | |
| { | |
| "epoch": 3.5219941348973607, | |
| "grad_norm": 0.9953120358900215, | |
| "learning_rate": 3.0960373202843685e-05, | |
| "loss": 0.1124, | |
| "mean_token_accuracy": 0.9684698581695557, | |
| "step": 602 | |
| }, | |
| { | |
| "epoch": 3.527859237536657, | |
| "grad_norm": 1.1759720953251884, | |
| "learning_rate": 3.093061514802943e-05, | |
| "loss": 0.1526, | |
| "mean_token_accuracy": 0.9586720243096352, | |
| "step": 603 | |
| }, | |
| { | |
| "epoch": 3.533724340175953, | |
| "grad_norm": 1.2509364663673235, | |
| "learning_rate": 3.090082467868901e-05, | |
| "loss": 0.1153, | |
| "mean_token_accuracy": 0.9655359834432602, | |
| "step": 604 | |
| }, | |
| { | |
| "epoch": 3.5395894428152492, | |
| "grad_norm": 1.184032879476615, | |
| "learning_rate": 3.087100190294983e-05, | |
| "loss": 0.1387, | |
| "mean_token_accuracy": 0.9585594981908798, | |
| "step": 605 | |
| }, | |
| { | |
| "epoch": 3.5454545454545454, | |
| "grad_norm": 1.2334299730537033, | |
| "learning_rate": 3.0841146929056505e-05, | |
| "loss": 0.1336, | |
| "mean_token_accuracy": 0.9633355215191841, | |
| "step": 606 | |
| }, | |
| { | |
| "epoch": 3.5513196480938416, | |
| "grad_norm": 1.3935758743568316, | |
| "learning_rate": 3.0811259865370535e-05, | |
| "loss": 0.1196, | |
| "mean_token_accuracy": 0.9636550173163414, | |
| "step": 607 | |
| }, | |
| { | |
| "epoch": 3.557184750733138, | |
| "grad_norm": 1.2490583488517504, | |
| "learning_rate": 3.07813408203699e-05, | |
| "loss": 0.1272, | |
| "mean_token_accuracy": 0.9602638855576515, | |
| "step": 608 | |
| }, | |
| { | |
| "epoch": 3.563049853372434, | |
| "grad_norm": 1.0415243566749401, | |
| "learning_rate": 3.075138990264863e-05, | |
| "loss": 0.1651, | |
| "mean_token_accuracy": 0.9521684423089027, | |
| "step": 609 | |
| }, | |
| { | |
| "epoch": 3.56891495601173, | |
| "grad_norm": 1.3002877876142311, | |
| "learning_rate": 3.072140722091648e-05, | |
| "loss": 0.1157, | |
| "mean_token_accuracy": 0.9622488170862198, | |
| "step": 610 | |
| }, | |
| { | |
| "epoch": 3.5747800586510263, | |
| "grad_norm": 1.1177745433027027, | |
| "learning_rate": 3.0691392883998455e-05, | |
| "loss": 0.1553, | |
| "mean_token_accuracy": 0.9571073427796364, | |
| "step": 611 | |
| }, | |
| { | |
| "epoch": 3.5806451612903225, | |
| "grad_norm": 1.097455875690594, | |
| "learning_rate": 3.0661347000834496e-05, | |
| "loss": 0.1207, | |
| "mean_token_accuracy": 0.966348297894001, | |
| "step": 612 | |
| }, | |
| { | |
| "epoch": 3.5865102639296187, | |
| "grad_norm": 0.92337518870407, | |
| "learning_rate": 3.063126968047901e-05, | |
| "loss": 0.1241, | |
| "mean_token_accuracy": 0.9607681557536125, | |
| "step": 613 | |
| }, | |
| { | |
| "epoch": 3.592375366568915, | |
| "grad_norm": 1.1556433274194684, | |
| "learning_rate": 3.060116103210053e-05, | |
| "loss": 0.103, | |
| "mean_token_accuracy": 0.9667187258601189, | |
| "step": 614 | |
| }, | |
| { | |
| "epoch": 3.598240469208211, | |
| "grad_norm": 0.8803863695333437, | |
| "learning_rate": 3.057102116498129e-05, | |
| "loss": 0.1184, | |
| "mean_token_accuracy": 0.9620434492826462, | |
| "step": 615 | |
| }, | |
| { | |
| "epoch": 3.6041055718475072, | |
| "grad_norm": 1.1066441109805365, | |
| "learning_rate": 3.0540850188516826e-05, | |
| "loss": 0.1278, | |
| "mean_token_accuracy": 0.9623741805553436, | |
| "step": 616 | |
| }, | |
| { | |
| "epoch": 3.6099706744868034, | |
| "grad_norm": 0.9565613650928324, | |
| "learning_rate": 3.051064821221561e-05, | |
| "loss": 0.1007, | |
| "mean_token_accuracy": 0.9718799218535423, | |
| "step": 617 | |
| }, | |
| { | |
| "epoch": 3.6158357771260996, | |
| "grad_norm": 1.1056816724683267, | |
| "learning_rate": 3.0480415345698606e-05, | |
| "loss": 0.1548, | |
| "mean_token_accuracy": 0.9537332579493523, | |
| "step": 618 | |
| }, | |
| { | |
| "epoch": 3.621700879765396, | |
| "grad_norm": 1.1557201136270507, | |
| "learning_rate": 3.045015169869892e-05, | |
| "loss": 0.1302, | |
| "mean_token_accuracy": 0.9655744209885597, | |
| "step": 619 | |
| }, | |
| { | |
| "epoch": 3.627565982404692, | |
| "grad_norm": 1.0673108801225792, | |
| "learning_rate": 3.0419857381061355e-05, | |
| "loss": 0.1308, | |
| "mean_token_accuracy": 0.9608449414372444, | |
| "step": 620 | |
| }, | |
| { | |
| "epoch": 3.633431085043988, | |
| "grad_norm": 0.9556116815559285, | |
| "learning_rate": 3.0389532502742066e-05, | |
| "loss": 0.1151, | |
| "mean_token_accuracy": 0.9625567346811295, | |
| "step": 621 | |
| }, | |
| { | |
| "epoch": 3.6392961876832843, | |
| "grad_norm": 1.2310880687685863, | |
| "learning_rate": 3.0359177173808104e-05, | |
| "loss": 0.1298, | |
| "mean_token_accuracy": 0.9569227620959282, | |
| "step": 622 | |
| }, | |
| { | |
| "epoch": 3.6451612903225805, | |
| "grad_norm": 1.2080524487010553, | |
| "learning_rate": 3.032879150443705e-05, | |
| "loss": 0.1309, | |
| "mean_token_accuracy": 0.9636347144842148, | |
| "step": 623 | |
| }, | |
| { | |
| "epoch": 3.6510263929618767, | |
| "grad_norm": 1.1297303257876061, | |
| "learning_rate": 3.029837560491662e-05, | |
| "loss": 0.1222, | |
| "mean_token_accuracy": 0.9660240784287453, | |
| "step": 624 | |
| }, | |
| { | |
| "epoch": 3.656891495601173, | |
| "grad_norm": 1.1107968747130714, | |
| "learning_rate": 3.0267929585644236e-05, | |
| "loss": 0.1432, | |
| "mean_token_accuracy": 0.9562357887625694, | |
| "step": 625 | |
| }, | |
| { | |
| "epoch": 3.662756598240469, | |
| "grad_norm": 1.0536153520517821, | |
| "learning_rate": 3.0237453557126656e-05, | |
| "loss": 0.1141, | |
| "mean_token_accuracy": 0.9647841155529022, | |
| "step": 626 | |
| }, | |
| { | |
| "epoch": 3.6686217008797652, | |
| "grad_norm": 0.9835257570608045, | |
| "learning_rate": 3.020694762997956e-05, | |
| "loss": 0.1219, | |
| "mean_token_accuracy": 0.9642433151602745, | |
| "step": 627 | |
| }, | |
| { | |
| "epoch": 3.6744868035190614, | |
| "grad_norm": 0.9339989411565239, | |
| "learning_rate": 3.017641191492714e-05, | |
| "loss": 0.1064, | |
| "mean_token_accuracy": 0.9676420092582703, | |
| "step": 628 | |
| }, | |
| { | |
| "epoch": 3.6803519061583576, | |
| "grad_norm": 0.9153048871411351, | |
| "learning_rate": 3.0145846522801703e-05, | |
| "loss": 0.1108, | |
| "mean_token_accuracy": 0.9640639424324036, | |
| "step": 629 | |
| }, | |
| { | |
| "epoch": 3.686217008797654, | |
| "grad_norm": 1.1702691358319068, | |
| "learning_rate": 3.0115251564543287e-05, | |
| "loss": 0.1546, | |
| "mean_token_accuracy": 0.9555616602301598, | |
| "step": 630 | |
| }, | |
| { | |
| "epoch": 3.6920821114369504, | |
| "grad_norm": 1.4176417497299738, | |
| "learning_rate": 3.008462715119922e-05, | |
| "loss": 0.1784, | |
| "mean_token_accuracy": 0.9492093324661255, | |
| "step": 631 | |
| }, | |
| { | |
| "epoch": 3.6979472140762466, | |
| "grad_norm": 1.6185985821173499, | |
| "learning_rate": 3.0053973393923768e-05, | |
| "loss": 0.1197, | |
| "mean_token_accuracy": 0.9643609151244164, | |
| "step": 632 | |
| }, | |
| { | |
| "epoch": 3.703812316715543, | |
| "grad_norm": 1.1281455495325605, | |
| "learning_rate": 3.0023290403977694e-05, | |
| "loss": 0.1435, | |
| "mean_token_accuracy": 0.9565146565437317, | |
| "step": 633 | |
| }, | |
| { | |
| "epoch": 3.709677419354839, | |
| "grad_norm": 1.3100047889969146, | |
| "learning_rate": 2.9992578292727842e-05, | |
| "loss": 0.1398, | |
| "mean_token_accuracy": 0.9574306532740593, | |
| "step": 634 | |
| }, | |
| { | |
| "epoch": 3.715542521994135, | |
| "grad_norm": 0.9544359363476402, | |
| "learning_rate": 2.9961837171646778e-05, | |
| "loss": 0.1197, | |
| "mean_token_accuracy": 0.9641690477728844, | |
| "step": 635 | |
| }, | |
| { | |
| "epoch": 3.7214076246334313, | |
| "grad_norm": 1.3561935467580721, | |
| "learning_rate": 2.993106715231237e-05, | |
| "loss": 0.1426, | |
| "mean_token_accuracy": 0.9598167091608047, | |
| "step": 636 | |
| }, | |
| { | |
| "epoch": 3.7272727272727275, | |
| "grad_norm": 1.3660455931780044, | |
| "learning_rate": 2.9900268346407336e-05, | |
| "loss": 0.1389, | |
| "mean_token_accuracy": 0.9577052295207977, | |
| "step": 637 | |
| }, | |
| { | |
| "epoch": 3.7331378299120237, | |
| "grad_norm": 1.2249643635751344, | |
| "learning_rate": 2.986944086571893e-05, | |
| "loss": 0.1589, | |
| "mean_token_accuracy": 0.9522917121648788, | |
| "step": 638 | |
| }, | |
| { | |
| "epoch": 3.73900293255132, | |
| "grad_norm": 1.1782525356070939, | |
| "learning_rate": 2.983858482213843e-05, | |
| "loss": 0.1173, | |
| "mean_token_accuracy": 0.9656160920858383, | |
| "step": 639 | |
| }, | |
| { | |
| "epoch": 3.744868035190616, | |
| "grad_norm": 0.9773028695412396, | |
| "learning_rate": 2.9807700327660834e-05, | |
| "loss": 0.1376, | |
| "mean_token_accuracy": 0.9574134349822998, | |
| "step": 640 | |
| }, | |
| { | |
| "epoch": 3.7507331378299122, | |
| "grad_norm": 1.3161873189358173, | |
| "learning_rate": 2.977678749438437e-05, | |
| "loss": 0.134, | |
| "mean_token_accuracy": 0.9588236212730408, | |
| "step": 641 | |
| }, | |
| { | |
| "epoch": 3.7565982404692084, | |
| "grad_norm": 0.9442201655104536, | |
| "learning_rate": 2.9745846434510146e-05, | |
| "loss": 0.1181, | |
| "mean_token_accuracy": 0.965650200843811, | |
| "step": 642 | |
| }, | |
| { | |
| "epoch": 3.7624633431085046, | |
| "grad_norm": 1.1777506571798453, | |
| "learning_rate": 2.9714877260341705e-05, | |
| "loss": 0.1451, | |
| "mean_token_accuracy": 0.9522349908947945, | |
| "step": 643 | |
| }, | |
| { | |
| "epoch": 3.768328445747801, | |
| "grad_norm": 0.8478782526737215, | |
| "learning_rate": 2.9683880084284648e-05, | |
| "loss": 0.0946, | |
| "mean_token_accuracy": 0.9708394706249237, | |
| "step": 644 | |
| }, | |
| { | |
| "epoch": 3.774193548387097, | |
| "grad_norm": 1.1002209843660342, | |
| "learning_rate": 2.96528550188462e-05, | |
| "loss": 0.1452, | |
| "mean_token_accuracy": 0.9609697312116623, | |
| "step": 645 | |
| }, | |
| { | |
| "epoch": 3.780058651026393, | |
| "grad_norm": 1.0977670049624841, | |
| "learning_rate": 2.962180217663483e-05, | |
| "loss": 0.1444, | |
| "mean_token_accuracy": 0.95732332020998, | |
| "step": 646 | |
| }, | |
| { | |
| "epoch": 3.7859237536656893, | |
| "grad_norm": 1.2094117212667317, | |
| "learning_rate": 2.95907216703598e-05, | |
| "loss": 0.131, | |
| "mean_token_accuracy": 0.9598681703209877, | |
| "step": 647 | |
| }, | |
| { | |
| "epoch": 3.7917888563049855, | |
| "grad_norm": 1.0971502312685675, | |
| "learning_rate": 2.9559613612830797e-05, | |
| "loss": 0.1468, | |
| "mean_token_accuracy": 0.9556907713413239, | |
| "step": 648 | |
| }, | |
| { | |
| "epoch": 3.7976539589442817, | |
| "grad_norm": 1.0613202166159812, | |
| "learning_rate": 2.952847811695751e-05, | |
| "loss": 0.1305, | |
| "mean_token_accuracy": 0.965257078409195, | |
| "step": 649 | |
| }, | |
| { | |
| "epoch": 3.803519061583578, | |
| "grad_norm": 1.0099659020872942, | |
| "learning_rate": 2.9497315295749218e-05, | |
| "loss": 0.1319, | |
| "mean_token_accuracy": 0.9646901562809944, | |
| "step": 650 | |
| }, | |
| { | |
| "epoch": 3.809384164222874, | |
| "grad_norm": 1.2808080436278302, | |
| "learning_rate": 2.9466125262314368e-05, | |
| "loss": 0.1712, | |
| "mean_token_accuracy": 0.9523722976446152, | |
| "step": 651 | |
| }, | |
| { | |
| "epoch": 3.8152492668621703, | |
| "grad_norm": 1.7647260306652934, | |
| "learning_rate": 2.9434908129860193e-05, | |
| "loss": 0.1405, | |
| "mean_token_accuracy": 0.9618514180183411, | |
| "step": 652 | |
| }, | |
| { | |
| "epoch": 3.8211143695014664, | |
| "grad_norm": 1.1534843084754791, | |
| "learning_rate": 2.9403664011692276e-05, | |
| "loss": 0.1544, | |
| "mean_token_accuracy": 0.954981729388237, | |
| "step": 653 | |
| }, | |
| { | |
| "epoch": 3.8269794721407626, | |
| "grad_norm": 1.2939646511087926, | |
| "learning_rate": 2.9372393021214134e-05, | |
| "loss": 0.1588, | |
| "mean_token_accuracy": 0.9539552256464958, | |
| "step": 654 | |
| }, | |
| { | |
| "epoch": 3.832844574780059, | |
| "grad_norm": 1.188278943842319, | |
| "learning_rate": 2.9341095271926842e-05, | |
| "loss": 0.1359, | |
| "mean_token_accuracy": 0.9616807624697685, | |
| "step": 655 | |
| }, | |
| { | |
| "epoch": 3.838709677419355, | |
| "grad_norm": 1.0856399855222887, | |
| "learning_rate": 2.930977087742859e-05, | |
| "loss": 0.1249, | |
| "mean_token_accuracy": 0.9609912484884262, | |
| "step": 656 | |
| }, | |
| { | |
| "epoch": 3.844574780058651, | |
| "grad_norm": 0.9803813921122001, | |
| "learning_rate": 2.9278419951414277e-05, | |
| "loss": 0.144, | |
| "mean_token_accuracy": 0.96265958994627, | |
| "step": 657 | |
| }, | |
| { | |
| "epoch": 3.8504398826979473, | |
| "grad_norm": 1.0825588438292633, | |
| "learning_rate": 2.9247042607675105e-05, | |
| "loss": 0.1355, | |
| "mean_token_accuracy": 0.9593116492033005, | |
| "step": 658 | |
| }, | |
| { | |
| "epoch": 3.8563049853372435, | |
| "grad_norm": 0.7272092106468886, | |
| "learning_rate": 2.9215638960098164e-05, | |
| "loss": 0.0831, | |
| "mean_token_accuracy": 0.9734610915184021, | |
| "step": 659 | |
| }, | |
| { | |
| "epoch": 3.8621700879765397, | |
| "grad_norm": 0.7981455863455788, | |
| "learning_rate": 2.9184209122665996e-05, | |
| "loss": 0.1209, | |
| "mean_token_accuracy": 0.9625027552247047, | |
| "step": 660 | |
| }, | |
| { | |
| "epoch": 3.868035190615836, | |
| "grad_norm": 1.0644155114590241, | |
| "learning_rate": 2.915275320945623e-05, | |
| "loss": 0.1342, | |
| "mean_token_accuracy": 0.962261438369751, | |
| "step": 661 | |
| }, | |
| { | |
| "epoch": 3.873900293255132, | |
| "grad_norm": 1.3058968639705204, | |
| "learning_rate": 2.9121271334641127e-05, | |
| "loss": 0.1375, | |
| "mean_token_accuracy": 0.9608859419822693, | |
| "step": 662 | |
| }, | |
| { | |
| "epoch": 3.8797653958944283, | |
| "grad_norm": 1.1888950650427426, | |
| "learning_rate": 2.908976361248717e-05, | |
| "loss": 0.1335, | |
| "mean_token_accuracy": 0.9634340405464172, | |
| "step": 663 | |
| }, | |
| { | |
| "epoch": 3.8856304985337244, | |
| "grad_norm": 0.9694975031185143, | |
| "learning_rate": 2.9058230157354674e-05, | |
| "loss": 0.1444, | |
| "mean_token_accuracy": 0.9566892609000206, | |
| "step": 664 | |
| }, | |
| { | |
| "epoch": 3.8914956011730206, | |
| "grad_norm": 1.1575330596885278, | |
| "learning_rate": 2.902667108369734e-05, | |
| "loss": 0.1252, | |
| "mean_token_accuracy": 0.9613283276557922, | |
| "step": 665 | |
| }, | |
| { | |
| "epoch": 3.897360703812317, | |
| "grad_norm": 1.1277073069882797, | |
| "learning_rate": 2.8995086506061862e-05, | |
| "loss": 0.1279, | |
| "mean_token_accuracy": 0.9618002399802208, | |
| "step": 666 | |
| }, | |
| { | |
| "epoch": 3.903225806451613, | |
| "grad_norm": 1.121397297414267, | |
| "learning_rate": 2.896347653908749e-05, | |
| "loss": 0.1151, | |
| "mean_token_accuracy": 0.9665411338210106, | |
| "step": 667 | |
| }, | |
| { | |
| "epoch": 3.909090909090909, | |
| "grad_norm": 0.9012462300498526, | |
| "learning_rate": 2.8931841297505657e-05, | |
| "loss": 0.1127, | |
| "mean_token_accuracy": 0.9637566730380058, | |
| "step": 668 | |
| }, | |
| { | |
| "epoch": 3.9149560117302054, | |
| "grad_norm": 0.9109562227585535, | |
| "learning_rate": 2.8900180896139503e-05, | |
| "loss": 0.0955, | |
| "mean_token_accuracy": 0.969886414706707, | |
| "step": 669 | |
| }, | |
| { | |
| "epoch": 3.9208211143695015, | |
| "grad_norm": 1.1687790994100906, | |
| "learning_rate": 2.8868495449903498e-05, | |
| "loss": 0.0936, | |
| "mean_token_accuracy": 0.9717061296105385, | |
| "step": 670 | |
| }, | |
| { | |
| "epoch": 3.9266862170087977, | |
| "grad_norm": 0.7755929180254743, | |
| "learning_rate": 2.8836785073803014e-05, | |
| "loss": 0.1094, | |
| "mean_token_accuracy": 0.9656796231865883, | |
| "step": 671 | |
| }, | |
| { | |
| "epoch": 3.932551319648094, | |
| "grad_norm": 0.9446774901165217, | |
| "learning_rate": 2.880504988293391e-05, | |
| "loss": 0.1344, | |
| "mean_token_accuracy": 0.9652603045105934, | |
| "step": 672 | |
| }, | |
| { | |
| "epoch": 3.93841642228739, | |
| "grad_norm": 1.111729269543401, | |
| "learning_rate": 2.8773289992482115e-05, | |
| "loss": 0.1215, | |
| "mean_token_accuracy": 0.9619983360171318, | |
| "step": 673 | |
| }, | |
| { | |
| "epoch": 3.9442815249266863, | |
| "grad_norm": 1.3795999777982066, | |
| "learning_rate": 2.87415055177232e-05, | |
| "loss": 0.1157, | |
| "mean_token_accuracy": 0.9651115536689758, | |
| "step": 674 | |
| }, | |
| { | |
| "epoch": 3.9501466275659824, | |
| "grad_norm": 1.0118979486417967, | |
| "learning_rate": 2.870969657402197e-05, | |
| "loss": 0.1345, | |
| "mean_token_accuracy": 0.9584571942687035, | |
| "step": 675 | |
| }, | |
| { | |
| "epoch": 3.9560117302052786, | |
| "grad_norm": 1.326426843091007, | |
| "learning_rate": 2.867786327683205e-05, | |
| "loss": 0.158, | |
| "mean_token_accuracy": 0.9536311253905296, | |
| "step": 676 | |
| }, | |
| { | |
| "epoch": 3.961876832844575, | |
| "grad_norm": 0.9390313577441797, | |
| "learning_rate": 2.864600574169545e-05, | |
| "loss": 0.1238, | |
| "mean_token_accuracy": 0.9652499556541443, | |
| "step": 677 | |
| }, | |
| { | |
| "epoch": 3.967741935483871, | |
| "grad_norm": 1.0821150137608206, | |
| "learning_rate": 2.861412408424216e-05, | |
| "loss": 0.1205, | |
| "mean_token_accuracy": 0.9640393927693367, | |
| "step": 678 | |
| }, | |
| { | |
| "epoch": 3.973607038123167, | |
| "grad_norm": 1.1046019376949232, | |
| "learning_rate": 2.8582218420189706e-05, | |
| "loss": 0.1419, | |
| "mean_token_accuracy": 0.9601811021566391, | |
| "step": 679 | |
| }, | |
| { | |
| "epoch": 3.9794721407624634, | |
| "grad_norm": 1.3616091121662732, | |
| "learning_rate": 2.855028886534278e-05, | |
| "loss": 0.1511, | |
| "mean_token_accuracy": 0.9565163180232048, | |
| "step": 680 | |
| }, | |
| { | |
| "epoch": 3.9853372434017595, | |
| "grad_norm": 1.2082659273486551, | |
| "learning_rate": 2.851833553559276e-05, | |
| "loss": 0.1271, | |
| "mean_token_accuracy": 0.9636896699666977, | |
| "step": 681 | |
| }, | |
| { | |
| "epoch": 3.9912023460410557, | |
| "grad_norm": 1.0850403614399995, | |
| "learning_rate": 2.848635854691733e-05, | |
| "loss": 0.1368, | |
| "mean_token_accuracy": 0.9602132365107536, | |
| "step": 682 | |
| }, | |
| { | |
| "epoch": 3.997067448680352, | |
| "grad_norm": 0.9493712048082743, | |
| "learning_rate": 2.8454358015380046e-05, | |
| "loss": 0.0986, | |
| "mean_token_accuracy": 0.9702668786048889, | |
| "step": 683 | |
| }, | |
| { | |
| "epoch": 4.0, | |
| "grad_norm": 1.558059945609864, | |
| "learning_rate": 2.8422334057129913e-05, | |
| "loss": 0.1191, | |
| "mean_token_accuracy": 0.9623651206493378, | |
| "step": 684 | |
| }, | |
| { | |
| "epoch": 4.005865102639296, | |
| "grad_norm": 0.842862500749142, | |
| "learning_rate": 2.8390286788400967e-05, | |
| "loss": 0.0957, | |
| "mean_token_accuracy": 0.9664551466703415, | |
| "step": 685 | |
| }, | |
| { | |
| "epoch": 4.011730205278592, | |
| "grad_norm": 0.6795835077138793, | |
| "learning_rate": 2.8358216325511847e-05, | |
| "loss": 0.0801, | |
| "mean_token_accuracy": 0.9761439710855484, | |
| "step": 686 | |
| }, | |
| { | |
| "epoch": 4.0175953079178885, | |
| "grad_norm": 1.1456567621753069, | |
| "learning_rate": 2.832612278486538e-05, | |
| "loss": 0.1198, | |
| "mean_token_accuracy": 0.963876485824585, | |
| "step": 687 | |
| }, | |
| { | |
| "epoch": 4.023460410557185, | |
| "grad_norm": 7.8979363537657035, | |
| "learning_rate": 2.8294006282948165e-05, | |
| "loss": 0.2698, | |
| "mean_token_accuracy": 0.9692637398838997, | |
| "step": 688 | |
| }, | |
| { | |
| "epoch": 4.029325513196481, | |
| "grad_norm": 1.1045222161731982, | |
| "learning_rate": 2.8261866936330123e-05, | |
| "loss": 0.1049, | |
| "mean_token_accuracy": 0.965548038482666, | |
| "step": 689 | |
| }, | |
| { | |
| "epoch": 4.035190615835777, | |
| "grad_norm": 1.2554508151838817, | |
| "learning_rate": 2.8229704861664113e-05, | |
| "loss": 0.1036, | |
| "mean_token_accuracy": 0.9706982374191284, | |
| "step": 690 | |
| }, | |
| { | |
| "epoch": 4.041055718475073, | |
| "grad_norm": 1.1041277073123883, | |
| "learning_rate": 2.8197520175685462e-05, | |
| "loss": 0.0998, | |
| "mean_token_accuracy": 0.9741102978587151, | |
| "step": 691 | |
| }, | |
| { | |
| "epoch": 4.0469208211143695, | |
| "grad_norm": 0.8006560792352735, | |
| "learning_rate": 2.8165312995211596e-05, | |
| "loss": 0.0927, | |
| "mean_token_accuracy": 0.9737097099423409, | |
| "step": 692 | |
| }, | |
| { | |
| "epoch": 4.052785923753666, | |
| "grad_norm": 0.8655138713705752, | |
| "learning_rate": 2.813308343714156e-05, | |
| "loss": 0.084, | |
| "mean_token_accuracy": 0.9713312909007072, | |
| "step": 693 | |
| }, | |
| { | |
| "epoch": 4.058651026392962, | |
| "grad_norm": 0.9516836003751642, | |
| "learning_rate": 2.810083161845564e-05, | |
| "loss": 0.111, | |
| "mean_token_accuracy": 0.9674246981739998, | |
| "step": 694 | |
| }, | |
| { | |
| "epoch": 4.064516129032258, | |
| "grad_norm": 0.8720519712759406, | |
| "learning_rate": 2.8068557656214913e-05, | |
| "loss": 0.0866, | |
| "mean_token_accuracy": 0.9744721204042435, | |
| "step": 695 | |
| }, | |
| { | |
| "epoch": 4.070381231671554, | |
| "grad_norm": 0.6222607867296382, | |
| "learning_rate": 2.8036261667560826e-05, | |
| "loss": 0.0932, | |
| "mean_token_accuracy": 0.9742299988865852, | |
| "step": 696 | |
| }, | |
| { | |
| "epoch": 4.07624633431085, | |
| "grad_norm": 0.7960989210401829, | |
| "learning_rate": 2.8003943769714776e-05, | |
| "loss": 0.1093, | |
| "mean_token_accuracy": 0.9663450121879578, | |
| "step": 697 | |
| }, | |
| { | |
| "epoch": 4.0821114369501466, | |
| "grad_norm": 1.1019502994755033, | |
| "learning_rate": 2.7971604079977673e-05, | |
| "loss": 0.1185, | |
| "mean_token_accuracy": 0.96397615224123, | |
| "step": 698 | |
| }, | |
| { | |
| "epoch": 4.087976539589443, | |
| "grad_norm": 0.7470501123132239, | |
| "learning_rate": 2.793924271572954e-05, | |
| "loss": 0.0873, | |
| "mean_token_accuracy": 0.9746774211525917, | |
| "step": 699 | |
| }, | |
| { | |
| "epoch": 4.093841642228739, | |
| "grad_norm": 0.8010505075241883, | |
| "learning_rate": 2.7906859794429047e-05, | |
| "loss": 0.0995, | |
| "mean_token_accuracy": 0.9679571017622948, | |
| "step": 700 | |
| }, | |
| { | |
| "epoch": 4.099706744868035, | |
| "grad_norm": 0.8813053810673643, | |
| "learning_rate": 2.787445543361313e-05, | |
| "loss": 0.0925, | |
| "mean_token_accuracy": 0.9711523503065109, | |
| "step": 701 | |
| }, | |
| { | |
| "epoch": 4.105571847507331, | |
| "grad_norm": 1.1338479600606155, | |
| "learning_rate": 2.7842029750896525e-05, | |
| "loss": 0.1293, | |
| "mean_token_accuracy": 0.9662352055311203, | |
| "step": 702 | |
| }, | |
| { | |
| "epoch": 4.1114369501466275, | |
| "grad_norm": 1.1524380891393422, | |
| "learning_rate": 2.7809582863971373e-05, | |
| "loss": 0.1123, | |
| "mean_token_accuracy": 0.9671714976429939, | |
| "step": 703 | |
| }, | |
| { | |
| "epoch": 4.117302052785924, | |
| "grad_norm": 0.9880154262384708, | |
| "learning_rate": 2.777711489060676e-05, | |
| "loss": 0.1175, | |
| "mean_token_accuracy": 0.9668581932783127, | |
| "step": 704 | |
| }, | |
| { | |
| "epoch": 4.12316715542522, | |
| "grad_norm": 0.8689806304769432, | |
| "learning_rate": 2.7744625948648316e-05, | |
| "loss": 0.0974, | |
| "mean_token_accuracy": 0.9735423848032951, | |
| "step": 705 | |
| }, | |
| { | |
| "epoch": 4.129032258064516, | |
| "grad_norm": 0.941884543587427, | |
| "learning_rate": 2.7712116156017783e-05, | |
| "loss": 0.1003, | |
| "mean_token_accuracy": 0.9737986624240875, | |
| "step": 706 | |
| }, | |
| { | |
| "epoch": 4.134897360703812, | |
| "grad_norm": 0.9450484339416181, | |
| "learning_rate": 2.7679585630712585e-05, | |
| "loss": 0.1072, | |
| "mean_token_accuracy": 0.9653824865818024, | |
| "step": 707 | |
| }, | |
| { | |
| "epoch": 4.140762463343108, | |
| "grad_norm": 0.7072259315634002, | |
| "learning_rate": 2.764703449080538e-05, | |
| "loss": 0.0998, | |
| "mean_token_accuracy": 0.9729399904608727, | |
| "step": 708 | |
| }, | |
| { | |
| "epoch": 4.146627565982405, | |
| "grad_norm": 0.950898645982569, | |
| "learning_rate": 2.761446285444366e-05, | |
| "loss": 0.1123, | |
| "mean_token_accuracy": 0.9680687040090561, | |
| "step": 709 | |
| }, | |
| { | |
| "epoch": 4.152492668621701, | |
| "grad_norm": 0.738092645530511, | |
| "learning_rate": 2.758187083984931e-05, | |
| "loss": 0.0884, | |
| "mean_token_accuracy": 0.9752467274665833, | |
| "step": 710 | |
| }, | |
| { | |
| "epoch": 4.158357771260997, | |
| "grad_norm": 1.0204267654285655, | |
| "learning_rate": 2.754925856531819e-05, | |
| "loss": 0.1179, | |
| "mean_token_accuracy": 0.9653342291712761, | |
| "step": 711 | |
| }, | |
| { | |
| "epoch": 4.164222873900293, | |
| "grad_norm": 0.9770563801600024, | |
| "learning_rate": 2.7516626149219678e-05, | |
| "loss": 0.0976, | |
| "mean_token_accuracy": 0.9727638140320778, | |
| "step": 712 | |
| }, | |
| { | |
| "epoch": 4.170087976539589, | |
| "grad_norm": 1.0725235604843808, | |
| "learning_rate": 2.7483973709996267e-05, | |
| "loss": 0.1082, | |
| "mean_token_accuracy": 0.9659662619233131, | |
| "step": 713 | |
| }, | |
| { | |
| "epoch": 4.1759530791788855, | |
| "grad_norm": 0.8958740426822329, | |
| "learning_rate": 2.7451301366163116e-05, | |
| "loss": 0.1224, | |
| "mean_token_accuracy": 0.9643898904323578, | |
| "step": 714 | |
| }, | |
| { | |
| "epoch": 4.181818181818182, | |
| "grad_norm": 0.6766982108825446, | |
| "learning_rate": 2.741860923630765e-05, | |
| "loss": 0.0815, | |
| "mean_token_accuracy": 0.9771873876452446, | |
| "step": 715 | |
| }, | |
| { | |
| "epoch": 4.187683284457478, | |
| "grad_norm": 1.048072499756525, | |
| "learning_rate": 2.7385897439089086e-05, | |
| "loss": 0.1171, | |
| "mean_token_accuracy": 0.9654709845781326, | |
| "step": 716 | |
| }, | |
| { | |
| "epoch": 4.193548387096774, | |
| "grad_norm": 0.8912351764553597, | |
| "learning_rate": 2.735316609323804e-05, | |
| "loss": 0.1169, | |
| "mean_token_accuracy": 0.9646259918808937, | |
| "step": 717 | |
| }, | |
| { | |
| "epoch": 4.19941348973607, | |
| "grad_norm": 0.9390469569028205, | |
| "learning_rate": 2.7320415317556085e-05, | |
| "loss": 0.1046, | |
| "mean_token_accuracy": 0.9704753458499908, | |
| "step": 718 | |
| }, | |
| { | |
| "epoch": 4.205278592375366, | |
| "grad_norm": 0.8560329829131477, | |
| "learning_rate": 2.72876452309153e-05, | |
| "loss": 0.0841, | |
| "mean_token_accuracy": 0.9733430370688438, | |
| "step": 719 | |
| }, | |
| { | |
| "epoch": 4.211143695014663, | |
| "grad_norm": 0.8933889301548646, | |
| "learning_rate": 2.7254855952257867e-05, | |
| "loss": 0.102, | |
| "mean_token_accuracy": 0.9689731150865555, | |
| "step": 720 | |
| }, | |
| { | |
| "epoch": 4.217008797653959, | |
| "grad_norm": 1.102744978983958, | |
| "learning_rate": 2.7222047600595626e-05, | |
| "loss": 0.1391, | |
| "mean_token_accuracy": 0.9602887108922005, | |
| "step": 721 | |
| }, | |
| { | |
| "epoch": 4.222873900293255, | |
| "grad_norm": 0.7769746447260107, | |
| "learning_rate": 2.718922029500965e-05, | |
| "loss": 0.1051, | |
| "mean_token_accuracy": 0.9703327119350433, | |
| "step": 722 | |
| }, | |
| { | |
| "epoch": 4.228739002932551, | |
| "grad_norm": 0.8830152108120385, | |
| "learning_rate": 2.7156374154649787e-05, | |
| "loss": 0.1, | |
| "mean_token_accuracy": 0.9667406901717186, | |
| "step": 723 | |
| }, | |
| { | |
| "epoch": 4.234604105571847, | |
| "grad_norm": 0.9571228011975423, | |
| "learning_rate": 2.7123509298734267e-05, | |
| "loss": 0.1015, | |
| "mean_token_accuracy": 0.970310315489769, | |
| "step": 724 | |
| }, | |
| { | |
| "epoch": 4.2404692082111435, | |
| "grad_norm": 0.9452784130844039, | |
| "learning_rate": 2.7090625846549247e-05, | |
| "loss": 0.1046, | |
| "mean_token_accuracy": 0.9675817862153053, | |
| "step": 725 | |
| }, | |
| { | |
| "epoch": 4.24633431085044, | |
| "grad_norm": 1.0691326279227324, | |
| "learning_rate": 2.705772391744837e-05, | |
| "loss": 0.123, | |
| "mean_token_accuracy": 0.970280796289444, | |
| "step": 726 | |
| }, | |
| { | |
| "epoch": 4.252199413489736, | |
| "grad_norm": 0.9002102935983834, | |
| "learning_rate": 2.7024803630852362e-05, | |
| "loss": 0.1127, | |
| "mean_token_accuracy": 0.9712927043437958, | |
| "step": 727 | |
| }, | |
| { | |
| "epoch": 4.258064516129032, | |
| "grad_norm": 1.1839994161421517, | |
| "learning_rate": 2.699186510624856e-05, | |
| "loss": 0.1183, | |
| "mean_token_accuracy": 0.9680011197924614, | |
| "step": 728 | |
| }, | |
| { | |
| "epoch": 4.263929618768328, | |
| "grad_norm": 0.9930191904934016, | |
| "learning_rate": 2.6958908463190506e-05, | |
| "loss": 0.1281, | |
| "mean_token_accuracy": 0.9630226120352745, | |
| "step": 729 | |
| }, | |
| { | |
| "epoch": 4.269794721407624, | |
| "grad_norm": 0.9676556571129576, | |
| "learning_rate": 2.6925933821297497e-05, | |
| "loss": 0.1171, | |
| "mean_token_accuracy": 0.9654005244374275, | |
| "step": 730 | |
| }, | |
| { | |
| "epoch": 4.275659824046921, | |
| "grad_norm": 1.1227004079371043, | |
| "learning_rate": 2.6892941300254176e-05, | |
| "loss": 0.1151, | |
| "mean_token_accuracy": 0.9704194962978363, | |
| "step": 731 | |
| }, | |
| { | |
| "epoch": 4.281524926686217, | |
| "grad_norm": 0.938382034990061, | |
| "learning_rate": 2.685993101981007e-05, | |
| "loss": 0.1058, | |
| "mean_token_accuracy": 0.968720979988575, | |
| "step": 732 | |
| }, | |
| { | |
| "epoch": 4.287390029325513, | |
| "grad_norm": 0.9381935165005029, | |
| "learning_rate": 2.6826903099779157e-05, | |
| "loss": 0.1034, | |
| "mean_token_accuracy": 0.9666604846715927, | |
| "step": 733 | |
| }, | |
| { | |
| "epoch": 4.293255131964809, | |
| "grad_norm": 0.9325917617385888, | |
| "learning_rate": 2.679385766003945e-05, | |
| "loss": 0.1131, | |
| "mean_token_accuracy": 0.9669909775257111, | |
| "step": 734 | |
| }, | |
| { | |
| "epoch": 4.299120234604105, | |
| "grad_norm": 0.831981688674269, | |
| "learning_rate": 2.676079482053255e-05, | |
| "loss": 0.1094, | |
| "mean_token_accuracy": 0.9680541455745697, | |
| "step": 735 | |
| }, | |
| { | |
| "epoch": 4.3049853372434015, | |
| "grad_norm": 1.1028897779609337, | |
| "learning_rate": 2.6727714701263212e-05, | |
| "loss": 0.1122, | |
| "mean_token_accuracy": 0.9693733751773834, | |
| "step": 736 | |
| }, | |
| { | |
| "epoch": 4.310850439882698, | |
| "grad_norm": 0.8459161475890958, | |
| "learning_rate": 2.669461742229891e-05, | |
| "loss": 0.0998, | |
| "mean_token_accuracy": 0.9723160266876221, | |
| "step": 737 | |
| }, | |
| { | |
| "epoch": 4.316715542521994, | |
| "grad_norm": 0.8842003257292316, | |
| "learning_rate": 2.6661503103769404e-05, | |
| "loss": 0.1023, | |
| "mean_token_accuracy": 0.971040166914463, | |
| "step": 738 | |
| }, | |
| { | |
| "epoch": 4.32258064516129, | |
| "grad_norm": 0.9346441595866154, | |
| "learning_rate": 2.6628371865866286e-05, | |
| "loss": 0.1192, | |
| "mean_token_accuracy": 0.9664890840649605, | |
| "step": 739 | |
| }, | |
| { | |
| "epoch": 4.328445747800586, | |
| "grad_norm": 0.7353206267430158, | |
| "learning_rate": 2.6595223828842578e-05, | |
| "loss": 0.1031, | |
| "mean_token_accuracy": 0.9708716943860054, | |
| "step": 740 | |
| }, | |
| { | |
| "epoch": 4.334310850439882, | |
| "grad_norm": 0.8150751328388751, | |
| "learning_rate": 2.6562059113012253e-05, | |
| "loss": 0.0953, | |
| "mean_token_accuracy": 0.9729399308562279, | |
| "step": 741 | |
| }, | |
| { | |
| "epoch": 4.340175953079179, | |
| "grad_norm": 0.800064137257197, | |
| "learning_rate": 2.6528877838749853e-05, | |
| "loss": 0.0888, | |
| "mean_token_accuracy": 0.9730389565229416, | |
| "step": 742 | |
| }, | |
| { | |
| "epoch": 4.346041055718475, | |
| "grad_norm": 0.8094750752535848, | |
| "learning_rate": 2.6495680126489984e-05, | |
| "loss": 0.099, | |
| "mean_token_accuracy": 0.9690711572766304, | |
| "step": 743 | |
| }, | |
| { | |
| "epoch": 4.351906158357771, | |
| "grad_norm": 0.7956878136167119, | |
| "learning_rate": 2.6462466096726954e-05, | |
| "loss": 0.1071, | |
| "mean_token_accuracy": 0.9694640338420868, | |
| "step": 744 | |
| }, | |
| { | |
| "epoch": 4.357771260997067, | |
| "grad_norm": 0.9902999973042317, | |
| "learning_rate": 2.6429235870014256e-05, | |
| "loss": 0.0993, | |
| "mean_token_accuracy": 0.9691943004727364, | |
| "step": 745 | |
| }, | |
| { | |
| "epoch": 4.363636363636363, | |
| "grad_norm": 1.138966726928597, | |
| "learning_rate": 2.639598956696421e-05, | |
| "loss": 0.1296, | |
| "mean_token_accuracy": 0.9653717577457428, | |
| "step": 746 | |
| }, | |
| { | |
| "epoch": 4.3695014662756595, | |
| "grad_norm": 0.8390006741125879, | |
| "learning_rate": 2.6362727308247458e-05, | |
| "loss": 0.0895, | |
| "mean_token_accuracy": 0.9693955853581429, | |
| "step": 747 | |
| }, | |
| { | |
| "epoch": 4.375366568914956, | |
| "grad_norm": 0.9486724629738932, | |
| "learning_rate": 2.6329449214592568e-05, | |
| "loss": 0.1134, | |
| "mean_token_accuracy": 0.9696745052933693, | |
| "step": 748 | |
| }, | |
| { | |
| "epoch": 4.381231671554252, | |
| "grad_norm": 1.1974747553210443, | |
| "learning_rate": 2.6296155406785578e-05, | |
| "loss": 0.1097, | |
| "mean_token_accuracy": 0.9677048176527023, | |
| "step": 749 | |
| }, | |
| { | |
| "epoch": 4.387096774193548, | |
| "grad_norm": 0.8717615502118474, | |
| "learning_rate": 2.6262846005669572e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9716934859752655, | |
| "step": 750 | |
| }, | |
| { | |
| "epoch": 4.392961876832844, | |
| "grad_norm": 0.8679886161549661, | |
| "learning_rate": 2.6229521132144212e-05, | |
| "loss": 0.0983, | |
| "mean_token_accuracy": 0.9671519547700882, | |
| "step": 751 | |
| }, | |
| { | |
| "epoch": 4.39882697947214, | |
| "grad_norm": 0.9182101559650703, | |
| "learning_rate": 2.619618090716534e-05, | |
| "loss": 0.1026, | |
| "mean_token_accuracy": 0.9699187725782394, | |
| "step": 752 | |
| }, | |
| { | |
| "epoch": 4.404692082111437, | |
| "grad_norm": 0.7557881709721908, | |
| "learning_rate": 2.61628254517445e-05, | |
| "loss": 0.0937, | |
| "mean_token_accuracy": 0.9722714796662331, | |
| "step": 753 | |
| }, | |
| { | |
| "epoch": 4.410557184750733, | |
| "grad_norm": 0.9753894611598296, | |
| "learning_rate": 2.612945488694853e-05, | |
| "loss": 0.1196, | |
| "mean_token_accuracy": 0.9658621773123741, | |
| "step": 754 | |
| }, | |
| { | |
| "epoch": 4.416422287390029, | |
| "grad_norm": 0.8489499216990712, | |
| "learning_rate": 2.6096069333899094e-05, | |
| "loss": 0.1004, | |
| "mean_token_accuracy": 0.9738951250910759, | |
| "step": 755 | |
| }, | |
| { | |
| "epoch": 4.422287390029325, | |
| "grad_norm": 1.0767290195457695, | |
| "learning_rate": 2.6062668913772275e-05, | |
| "loss": 0.1339, | |
| "mean_token_accuracy": 0.9635637626051903, | |
| "step": 756 | |
| }, | |
| { | |
| "epoch": 4.428152492668621, | |
| "grad_norm": 2.284013657163814, | |
| "learning_rate": 2.60292537477981e-05, | |
| "loss": 0.1072, | |
| "mean_token_accuracy": 0.9691743478178978, | |
| "step": 757 | |
| }, | |
| { | |
| "epoch": 4.4340175953079175, | |
| "grad_norm": 0.9671090732833222, | |
| "learning_rate": 2.5995823957260132e-05, | |
| "loss": 0.127, | |
| "mean_token_accuracy": 0.9612483829259872, | |
| "step": 758 | |
| }, | |
| { | |
| "epoch": 4.439882697947214, | |
| "grad_norm": 0.7101994331100041, | |
| "learning_rate": 2.596237966349501e-05, | |
| "loss": 0.0992, | |
| "mean_token_accuracy": 0.968992717564106, | |
| "step": 759 | |
| }, | |
| { | |
| "epoch": 4.44574780058651, | |
| "grad_norm": 0.7693183821795423, | |
| "learning_rate": 2.592892098789201e-05, | |
| "loss": 0.0911, | |
| "mean_token_accuracy": 0.9723697900772095, | |
| "step": 760 | |
| }, | |
| { | |
| "epoch": 4.451612903225806, | |
| "grad_norm": 1.4733487666954843, | |
| "learning_rate": 2.589544805189261e-05, | |
| "loss": 0.0984, | |
| "mean_token_accuracy": 0.9718485102057457, | |
| "step": 761 | |
| }, | |
| { | |
| "epoch": 4.457478005865102, | |
| "grad_norm": 1.1763193238951348, | |
| "learning_rate": 2.5861960976990056e-05, | |
| "loss": 0.0965, | |
| "mean_token_accuracy": 0.9736879169940948, | |
| "step": 762 | |
| }, | |
| { | |
| "epoch": 4.463343108504398, | |
| "grad_norm": 1.1738256998389192, | |
| "learning_rate": 2.5828459884728898e-05, | |
| "loss": 0.1122, | |
| "mean_token_accuracy": 0.9685143828392029, | |
| "step": 763 | |
| }, | |
| { | |
| "epoch": 4.469208211143695, | |
| "grad_norm": 0.8326366862614876, | |
| "learning_rate": 2.5794944896704572e-05, | |
| "loss": 0.0956, | |
| "mean_token_accuracy": 0.971512608230114, | |
| "step": 764 | |
| }, | |
| { | |
| "epoch": 4.475073313782991, | |
| "grad_norm": 0.6126184789066507, | |
| "learning_rate": 2.5761416134562955e-05, | |
| "loss": 0.0904, | |
| "mean_token_accuracy": 0.9716848284006119, | |
| "step": 765 | |
| }, | |
| { | |
| "epoch": 4.480938416422287, | |
| "grad_norm": 0.9032550905603639, | |
| "learning_rate": 2.5727873719999904e-05, | |
| "loss": 0.0974, | |
| "mean_token_accuracy": 0.9702484384179115, | |
| "step": 766 | |
| }, | |
| { | |
| "epoch": 4.486803519061583, | |
| "grad_norm": 0.8188882179465061, | |
| "learning_rate": 2.569431777476084e-05, | |
| "loss": 0.1032, | |
| "mean_token_accuracy": 0.9705123081803322, | |
| "step": 767 | |
| }, | |
| { | |
| "epoch": 4.492668621700879, | |
| "grad_norm": 0.7156230718197097, | |
| "learning_rate": 2.566074842064029e-05, | |
| "loss": 0.0836, | |
| "mean_token_accuracy": 0.9748258590698242, | |
| "step": 768 | |
| }, | |
| { | |
| "epoch": 4.4985337243401755, | |
| "grad_norm": 0.7243676974968629, | |
| "learning_rate": 2.562716577948145e-05, | |
| "loss": 0.0913, | |
| "mean_token_accuracy": 0.9733809903264046, | |
| "step": 769 | |
| }, | |
| { | |
| "epoch": 4.504398826979472, | |
| "grad_norm": 0.956927225584052, | |
| "learning_rate": 2.5593569973175757e-05, | |
| "loss": 0.109, | |
| "mean_token_accuracy": 0.9674596711993217, | |
| "step": 770 | |
| }, | |
| { | |
| "epoch": 4.510263929618768, | |
| "grad_norm": 0.839214059168221, | |
| "learning_rate": 2.5559961123662405e-05, | |
| "loss": 0.1025, | |
| "mean_token_accuracy": 0.9712679237127304, | |
| "step": 771 | |
| }, | |
| { | |
| "epoch": 4.516129032258064, | |
| "grad_norm": 0.9417512208409332, | |
| "learning_rate": 2.5526339352927956e-05, | |
| "loss": 0.1198, | |
| "mean_token_accuracy": 0.9672188460826874, | |
| "step": 772 | |
| }, | |
| { | |
| "epoch": 4.52199413489736, | |
| "grad_norm": 0.9006219316570291, | |
| "learning_rate": 2.5492704783005847e-05, | |
| "loss": 0.1067, | |
| "mean_token_accuracy": 0.9678135216236115, | |
| "step": 773 | |
| }, | |
| { | |
| "epoch": 4.527859237536656, | |
| "grad_norm": 1.5476058537705526, | |
| "learning_rate": 2.5459057535975985e-05, | |
| "loss": 0.1365, | |
| "mean_token_accuracy": 0.9669143557548523, | |
| "step": 774 | |
| }, | |
| { | |
| "epoch": 4.533724340175953, | |
| "grad_norm": 0.956697552558829, | |
| "learning_rate": 2.542539773396429e-05, | |
| "loss": 0.1115, | |
| "mean_token_accuracy": 0.9633852392435074, | |
| "step": 775 | |
| }, | |
| { | |
| "epoch": 4.539589442815249, | |
| "grad_norm": 1.197546908803636, | |
| "learning_rate": 2.5391725499142253e-05, | |
| "loss": 0.1357, | |
| "mean_token_accuracy": 0.9662409871816635, | |
| "step": 776 | |
| }, | |
| { | |
| "epoch": 4.545454545454545, | |
| "grad_norm": 0.7588534732881831, | |
| "learning_rate": 2.535804095372648e-05, | |
| "loss": 0.0901, | |
| "mean_token_accuracy": 0.9716041162610054, | |
| "step": 777 | |
| }, | |
| { | |
| "epoch": 4.551319648093841, | |
| "grad_norm": 0.8780606997286142, | |
| "learning_rate": 2.5324344219978273e-05, | |
| "loss": 0.0973, | |
| "mean_token_accuracy": 0.9695460423827171, | |
| "step": 778 | |
| }, | |
| { | |
| "epoch": 4.557184750733137, | |
| "grad_norm": 0.8045552377061164, | |
| "learning_rate": 2.5290635420203162e-05, | |
| "loss": 0.1021, | |
| "mean_token_accuracy": 0.9703366085886955, | |
| "step": 779 | |
| }, | |
| { | |
| "epoch": 4.563049853372434, | |
| "grad_norm": 0.9358205564032939, | |
| "learning_rate": 2.525691467675048e-05, | |
| "loss": 0.1159, | |
| "mean_token_accuracy": 0.9681701958179474, | |
| "step": 780 | |
| }, | |
| { | |
| "epoch": 4.568914956011731, | |
| "grad_norm": 0.7554713795930256, | |
| "learning_rate": 2.5223182112012897e-05, | |
| "loss": 0.101, | |
| "mean_token_accuracy": 0.9721342325210571, | |
| "step": 781 | |
| }, | |
| { | |
| "epoch": 4.574780058651027, | |
| "grad_norm": 0.5804191186796761, | |
| "learning_rate": 2.5189437848426016e-05, | |
| "loss": 0.0809, | |
| "mean_token_accuracy": 0.9757244363427162, | |
| "step": 782 | |
| }, | |
| { | |
| "epoch": 4.580645161290323, | |
| "grad_norm": 1.0167268132012308, | |
| "learning_rate": 2.515568200846787e-05, | |
| "loss": 0.1193, | |
| "mean_token_accuracy": 0.9649367853999138, | |
| "step": 783 | |
| }, | |
| { | |
| "epoch": 4.586510263929619, | |
| "grad_norm": 0.9322914799762485, | |
| "learning_rate": 2.5121914714658526e-05, | |
| "loss": 0.1034, | |
| "mean_token_accuracy": 0.9690342247486115, | |
| "step": 784 | |
| }, | |
| { | |
| "epoch": 4.592375366568915, | |
| "grad_norm": 0.7802061746954657, | |
| "learning_rate": 2.5088136089559636e-05, | |
| "loss": 0.0898, | |
| "mean_token_accuracy": 0.9727255925536156, | |
| "step": 785 | |
| }, | |
| { | |
| "epoch": 4.5982404692082115, | |
| "grad_norm": 0.5844702630089805, | |
| "learning_rate": 2.5054346255773952e-05, | |
| "loss": 0.0783, | |
| "mean_token_accuracy": 0.9748023822903633, | |
| "step": 786 | |
| }, | |
| { | |
| "epoch": 4.604105571847508, | |
| "grad_norm": 0.7971317025860829, | |
| "learning_rate": 2.502054533594493e-05, | |
| "loss": 0.0911, | |
| "mean_token_accuracy": 0.9730968326330185, | |
| "step": 787 | |
| }, | |
| { | |
| "epoch": 4.609970674486804, | |
| "grad_norm": 0.7592860917285904, | |
| "learning_rate": 2.4986733452756264e-05, | |
| "loss": 0.0994, | |
| "mean_token_accuracy": 0.9701942130923271, | |
| "step": 788 | |
| }, | |
| { | |
| "epoch": 4.6158357771261, | |
| "grad_norm": 0.8974946786275971, | |
| "learning_rate": 2.495291072893142e-05, | |
| "loss": 0.1094, | |
| "mean_token_accuracy": 0.9706485345959663, | |
| "step": 789 | |
| }, | |
| { | |
| "epoch": 4.621700879765396, | |
| "grad_norm": 0.9918330093432357, | |
| "learning_rate": 2.4919077287233237e-05, | |
| "loss": 0.1131, | |
| "mean_token_accuracy": 0.9685276672244072, | |
| "step": 790 | |
| }, | |
| { | |
| "epoch": 4.627565982404692, | |
| "grad_norm": 0.7579797556002252, | |
| "learning_rate": 2.4885233250463445e-05, | |
| "loss": 0.0999, | |
| "mean_token_accuracy": 0.9702299237251282, | |
| "step": 791 | |
| }, | |
| { | |
| "epoch": 4.633431085043989, | |
| "grad_norm": 0.8102978461438669, | |
| "learning_rate": 2.485137874146222e-05, | |
| "loss": 0.1013, | |
| "mean_token_accuracy": 0.9672133773565292, | |
| "step": 792 | |
| }, | |
| { | |
| "epoch": 4.639296187683285, | |
| "grad_norm": 0.9750006795393742, | |
| "learning_rate": 2.4817513883107762e-05, | |
| "loss": 0.1169, | |
| "mean_token_accuracy": 0.963016502559185, | |
| "step": 793 | |
| }, | |
| { | |
| "epoch": 4.645161290322581, | |
| "grad_norm": 0.6784007500449948, | |
| "learning_rate": 2.4783638798315822e-05, | |
| "loss": 0.0879, | |
| "mean_token_accuracy": 0.9738304242491722, | |
| "step": 794 | |
| }, | |
| { | |
| "epoch": 4.651026392961877, | |
| "grad_norm": 0.9884915387381938, | |
| "learning_rate": 2.4749753610039288e-05, | |
| "loss": 0.0928, | |
| "mean_token_accuracy": 0.9698395952582359, | |
| "step": 795 | |
| }, | |
| { | |
| "epoch": 4.656891495601173, | |
| "grad_norm": 0.804556017449245, | |
| "learning_rate": 2.4715858441267706e-05, | |
| "loss": 0.0949, | |
| "mean_token_accuracy": 0.9707165434956551, | |
| "step": 796 | |
| }, | |
| { | |
| "epoch": 4.6627565982404695, | |
| "grad_norm": 1.2020438772374247, | |
| "learning_rate": 2.4681953415026845e-05, | |
| "loss": 0.1252, | |
| "mean_token_accuracy": 0.9631582796573639, | |
| "step": 797 | |
| }, | |
| { | |
| "epoch": 4.668621700879766, | |
| "grad_norm": 0.8366726200311924, | |
| "learning_rate": 2.464803865437826e-05, | |
| "loss": 0.0954, | |
| "mean_token_accuracy": 0.9677169546484947, | |
| "step": 798 | |
| }, | |
| { | |
| "epoch": 4.674486803519062, | |
| "grad_norm": 1.0400561866526332, | |
| "learning_rate": 2.461411428241883e-05, | |
| "loss": 0.1256, | |
| "mean_token_accuracy": 0.964072935283184, | |
| "step": 799 | |
| }, | |
| { | |
| "epoch": 4.680351906158358, | |
| "grad_norm": 0.9623680214082836, | |
| "learning_rate": 2.4580180422280325e-05, | |
| "loss": 0.1042, | |
| "mean_token_accuracy": 0.9697307646274567, | |
| "step": 800 | |
| }, | |
| { | |
| "epoch": 4.686217008797654, | |
| "grad_norm": 0.9936707065124232, | |
| "learning_rate": 2.4546237197128955e-05, | |
| "loss": 0.1056, | |
| "mean_token_accuracy": 0.9692076668143272, | |
| "step": 801 | |
| }, | |
| { | |
| "epoch": 4.69208211143695, | |
| "grad_norm": 0.7547232442792623, | |
| "learning_rate": 2.451228473016492e-05, | |
| "loss": 0.1019, | |
| "mean_token_accuracy": 0.9720749333500862, | |
| "step": 802 | |
| }, | |
| { | |
| "epoch": 4.697947214076247, | |
| "grad_norm": 0.6539188611720734, | |
| "learning_rate": 2.447832314462196e-05, | |
| "loss": 0.0988, | |
| "mean_token_accuracy": 0.9708434194326401, | |
| "step": 803 | |
| }, | |
| { | |
| "epoch": 4.703812316715543, | |
| "grad_norm": 0.787149153305339, | |
| "learning_rate": 2.444435256376692e-05, | |
| "loss": 0.0969, | |
| "mean_token_accuracy": 0.9698523208498955, | |
| "step": 804 | |
| }, | |
| { | |
| "epoch": 4.709677419354839, | |
| "grad_norm": 0.9214253887479193, | |
| "learning_rate": 2.4410373110899278e-05, | |
| "loss": 0.0831, | |
| "mean_token_accuracy": 0.9752322733402252, | |
| "step": 805 | |
| }, | |
| { | |
| "epoch": 4.715542521994135, | |
| "grad_norm": 0.96985990470589, | |
| "learning_rate": 2.4376384909350735e-05, | |
| "loss": 0.1133, | |
| "mean_token_accuracy": 0.9648006334900856, | |
| "step": 806 | |
| }, | |
| { | |
| "epoch": 4.721407624633431, | |
| "grad_norm": 0.7705243515793144, | |
| "learning_rate": 2.434238808248472e-05, | |
| "loss": 0.0932, | |
| "mean_token_accuracy": 0.9724510610103607, | |
| "step": 807 | |
| }, | |
| { | |
| "epoch": 4.7272727272727275, | |
| "grad_norm": 0.8385330995599455, | |
| "learning_rate": 2.4308382753696e-05, | |
| "loss": 0.101, | |
| "mean_token_accuracy": 0.9696679040789604, | |
| "step": 808 | |
| }, | |
| { | |
| "epoch": 4.733137829912024, | |
| "grad_norm": 0.9962089643314436, | |
| "learning_rate": 2.4274369046410183e-05, | |
| "loss": 0.1164, | |
| "mean_token_accuracy": 0.9683369919657707, | |
| "step": 809 | |
| }, | |
| { | |
| "epoch": 4.73900293255132, | |
| "grad_norm": 0.7772156088138843, | |
| "learning_rate": 2.4240347084083284e-05, | |
| "loss": 0.096, | |
| "mean_token_accuracy": 0.9717853516340256, | |
| "step": 810 | |
| }, | |
| { | |
| "epoch": 4.744868035190616, | |
| "grad_norm": 3.9377209302415968, | |
| "learning_rate": 2.4206316990201288e-05, | |
| "loss": 0.1192, | |
| "mean_token_accuracy": 0.9630280062556267, | |
| "step": 811 | |
| }, | |
| { | |
| "epoch": 4.750733137829912, | |
| "grad_norm": 1.052866557087443, | |
| "learning_rate": 2.4172278888279686e-05, | |
| "loss": 0.1171, | |
| "mean_token_accuracy": 0.9658652395009995, | |
| "step": 812 | |
| }, | |
| { | |
| "epoch": 4.756598240469208, | |
| "grad_norm": 0.8332989638125091, | |
| "learning_rate": 2.4138232901863053e-05, | |
| "loss": 0.1067, | |
| "mean_token_accuracy": 0.9677643477916718, | |
| "step": 813 | |
| }, | |
| { | |
| "epoch": 4.762463343108505, | |
| "grad_norm": 0.7676532634589678, | |
| "learning_rate": 2.4104179154524557e-05, | |
| "loss": 0.0861, | |
| "mean_token_accuracy": 0.9748517572879791, | |
| "step": 814 | |
| }, | |
| { | |
| "epoch": 4.768328445747801, | |
| "grad_norm": 0.7491375439484187, | |
| "learning_rate": 2.4070117769865554e-05, | |
| "loss": 0.0935, | |
| "mean_token_accuracy": 0.9721824452280998, | |
| "step": 815 | |
| }, | |
| { | |
| "epoch": 4.774193548387097, | |
| "grad_norm": 0.7082397718002087, | |
| "learning_rate": 2.403604887151512e-05, | |
| "loss": 0.0986, | |
| "mean_token_accuracy": 0.9698486328125, | |
| "step": 816 | |
| }, | |
| { | |
| "epoch": 4.780058651026393, | |
| "grad_norm": 0.8602958678227502, | |
| "learning_rate": 2.400197258312959e-05, | |
| "loss": 0.0968, | |
| "mean_token_accuracy": 0.9724563658237457, | |
| "step": 817 | |
| }, | |
| { | |
| "epoch": 4.785923753665689, | |
| "grad_norm": 0.7826309437388894, | |
| "learning_rate": 2.3967889028392115e-05, | |
| "loss": 0.0854, | |
| "mean_token_accuracy": 0.9723799675703049, | |
| "step": 818 | |
| }, | |
| { | |
| "epoch": 4.7917888563049855, | |
| "grad_norm": 0.8084978346561186, | |
| "learning_rate": 2.3933798331012255e-05, | |
| "loss": 0.1024, | |
| "mean_token_accuracy": 0.9672785773873329, | |
| "step": 819 | |
| }, | |
| { | |
| "epoch": 4.797653958944282, | |
| "grad_norm": 0.8416916669164977, | |
| "learning_rate": 2.3899700614725458e-05, | |
| "loss": 0.1021, | |
| "mean_token_accuracy": 0.9661071300506592, | |
| "step": 820 | |
| }, | |
| { | |
| "epoch": 4.803519061583578, | |
| "grad_norm": 0.8279382705506013, | |
| "learning_rate": 2.3865596003292674e-05, | |
| "loss": 0.101, | |
| "mean_token_accuracy": 0.9684831723570824, | |
| "step": 821 | |
| }, | |
| { | |
| "epoch": 4.809384164222874, | |
| "grad_norm": 0.8786095068042057, | |
| "learning_rate": 2.3831484620499867e-05, | |
| "loss": 0.1074, | |
| "mean_token_accuracy": 0.9685612991452217, | |
| "step": 822 | |
| }, | |
| { | |
| "epoch": 4.81524926686217, | |
| "grad_norm": 1.0040505276363547, | |
| "learning_rate": 2.3797366590157565e-05, | |
| "loss": 0.1195, | |
| "mean_token_accuracy": 0.9612911120057106, | |
| "step": 823 | |
| }, | |
| { | |
| "epoch": 4.821114369501466, | |
| "grad_norm": 0.8072349724088913, | |
| "learning_rate": 2.3763242036100457e-05, | |
| "loss": 0.0987, | |
| "mean_token_accuracy": 0.9705159142613411, | |
| "step": 824 | |
| }, | |
| { | |
| "epoch": 4.826979472140763, | |
| "grad_norm": 0.9064708195199968, | |
| "learning_rate": 2.372911108218688e-05, | |
| "loss": 0.1018, | |
| "mean_token_accuracy": 0.9692626893520355, | |
| "step": 825 | |
| }, | |
| { | |
| "epoch": 4.832844574780059, | |
| "grad_norm": 1.0817212148035031, | |
| "learning_rate": 2.3694973852298425e-05, | |
| "loss": 0.1185, | |
| "mean_token_accuracy": 0.9651465937495232, | |
| "step": 826 | |
| }, | |
| { | |
| "epoch": 4.838709677419355, | |
| "grad_norm": 0.8543951865046087, | |
| "learning_rate": 2.3660830470339436e-05, | |
| "loss": 0.1029, | |
| "mean_token_accuracy": 0.9690027311444283, | |
| "step": 827 | |
| }, | |
| { | |
| "epoch": 4.844574780058651, | |
| "grad_norm": 0.6162780385937312, | |
| "learning_rate": 2.362668106023661e-05, | |
| "loss": 0.0916, | |
| "mean_token_accuracy": 0.9742050394415855, | |
| "step": 828 | |
| }, | |
| { | |
| "epoch": 4.850439882697947, | |
| "grad_norm": 0.9398325918918323, | |
| "learning_rate": 2.3592525745938515e-05, | |
| "loss": 0.1007, | |
| "mean_token_accuracy": 0.9716757461428642, | |
| "step": 829 | |
| }, | |
| { | |
| "epoch": 4.8563049853372435, | |
| "grad_norm": 0.7602257778379286, | |
| "learning_rate": 2.355836465141513e-05, | |
| "loss": 0.0794, | |
| "mean_token_accuracy": 0.9756976217031479, | |
| "step": 830 | |
| }, | |
| { | |
| "epoch": 4.86217008797654, | |
| "grad_norm": 0.9022011629225589, | |
| "learning_rate": 2.3524197900657447e-05, | |
| "loss": 0.1169, | |
| "mean_token_accuracy": 0.9684564545750618, | |
| "step": 831 | |
| }, | |
| { | |
| "epoch": 4.868035190615836, | |
| "grad_norm": 0.650860880701899, | |
| "learning_rate": 2.3490025617676966e-05, | |
| "loss": 0.0837, | |
| "mean_token_accuracy": 0.975691981613636, | |
| "step": 832 | |
| }, | |
| { | |
| "epoch": 4.873900293255132, | |
| "grad_norm": 0.8971676331873251, | |
| "learning_rate": 2.3455847926505283e-05, | |
| "loss": 0.1129, | |
| "mean_token_accuracy": 0.9650413244962692, | |
| "step": 833 | |
| }, | |
| { | |
| "epoch": 4.879765395894428, | |
| "grad_norm": 0.937427582210325, | |
| "learning_rate": 2.3421664951193596e-05, | |
| "loss": 0.1098, | |
| "mean_token_accuracy": 0.9684758335351944, | |
| "step": 834 | |
| }, | |
| { | |
| "epoch": 4.885630498533724, | |
| "grad_norm": 0.8814505193033745, | |
| "learning_rate": 2.3387476815812313e-05, | |
| "loss": 0.1032, | |
| "mean_token_accuracy": 0.9702811613678932, | |
| "step": 835 | |
| }, | |
| { | |
| "epoch": 4.891495601173021, | |
| "grad_norm": 0.8136347217934341, | |
| "learning_rate": 2.3353283644450556e-05, | |
| "loss": 0.1049, | |
| "mean_token_accuracy": 0.9681308940052986, | |
| "step": 836 | |
| }, | |
| { | |
| "epoch": 4.897360703812317, | |
| "grad_norm": 1.0361749212065177, | |
| "learning_rate": 2.3319085561215724e-05, | |
| "loss": 0.1088, | |
| "mean_token_accuracy": 0.9681940153241158, | |
| "step": 837 | |
| }, | |
| { | |
| "epoch": 4.903225806451613, | |
| "grad_norm": 0.7246651123258426, | |
| "learning_rate": 2.328488269023305e-05, | |
| "loss": 0.0938, | |
| "mean_token_accuracy": 0.9740229845046997, | |
| "step": 838 | |
| }, | |
| { | |
| "epoch": 4.909090909090909, | |
| "grad_norm": 0.9582355610327213, | |
| "learning_rate": 2.3250675155645136e-05, | |
| "loss": 0.0994, | |
| "mean_token_accuracy": 0.9708064943552017, | |
| "step": 839 | |
| }, | |
| { | |
| "epoch": 4.914956011730205, | |
| "grad_norm": 0.8597667767361529, | |
| "learning_rate": 2.3216463081611525e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9721712917089462, | |
| "step": 840 | |
| }, | |
| { | |
| "epoch": 4.9208211143695015, | |
| "grad_norm": 0.9203287999724475, | |
| "learning_rate": 2.3182246592308235e-05, | |
| "loss": 0.1134, | |
| "mean_token_accuracy": 0.9678708836436272, | |
| "step": 841 | |
| }, | |
| { | |
| "epoch": 4.926686217008798, | |
| "grad_norm": 0.7731015246866831, | |
| "learning_rate": 2.314802581192728e-05, | |
| "loss": 0.0909, | |
| "mean_token_accuracy": 0.9706169962882996, | |
| "step": 842 | |
| }, | |
| { | |
| "epoch": 4.932551319648094, | |
| "grad_norm": 0.9691449435397647, | |
| "learning_rate": 2.311380086467629e-05, | |
| "loss": 0.1138, | |
| "mean_token_accuracy": 0.9661426022648811, | |
| "step": 843 | |
| }, | |
| { | |
| "epoch": 4.93841642228739, | |
| "grad_norm": 0.72854018802925, | |
| "learning_rate": 2.3079571874778e-05, | |
| "loss": 0.1007, | |
| "mean_token_accuracy": 0.9667091220617294, | |
| "step": 844 | |
| }, | |
| { | |
| "epoch": 4.944281524926686, | |
| "grad_norm": 0.6600580029561475, | |
| "learning_rate": 2.304533896646981e-05, | |
| "loss": 0.083, | |
| "mean_token_accuracy": 0.9744493290781975, | |
| "step": 845 | |
| }, | |
| { | |
| "epoch": 4.9501466275659824, | |
| "grad_norm": 0.8077377461501789, | |
| "learning_rate": 2.3011102264003354e-05, | |
| "loss": 0.0992, | |
| "mean_token_accuracy": 0.9715530574321747, | |
| "step": 846 | |
| }, | |
| { | |
| "epoch": 4.956011730205279, | |
| "grad_norm": 0.8026305302127565, | |
| "learning_rate": 2.2976861891644045e-05, | |
| "loss": 0.0941, | |
| "mean_token_accuracy": 0.9704462736845016, | |
| "step": 847 | |
| }, | |
| { | |
| "epoch": 4.961876832844575, | |
| "grad_norm": 0.839509361357775, | |
| "learning_rate": 2.2942617973670596e-05, | |
| "loss": 0.0955, | |
| "mean_token_accuracy": 0.970968209207058, | |
| "step": 848 | |
| }, | |
| { | |
| "epoch": 4.967741935483871, | |
| "grad_norm": 1.0376899814733738, | |
| "learning_rate": 2.2908370634374603e-05, | |
| "loss": 0.1387, | |
| "mean_token_accuracy": 0.9611359685659409, | |
| "step": 849 | |
| }, | |
| { | |
| "epoch": 4.973607038123167, | |
| "grad_norm": 0.9686754058096747, | |
| "learning_rate": 2.287411999806007e-05, | |
| "loss": 0.1103, | |
| "mean_token_accuracy": 0.9722190052270889, | |
| "step": 850 | |
| }, | |
| { | |
| "epoch": 4.979472140762463, | |
| "grad_norm": 0.8862617626596047, | |
| "learning_rate": 2.2839866189042983e-05, | |
| "loss": 0.0936, | |
| "mean_token_accuracy": 0.9709000736474991, | |
| "step": 851 | |
| }, | |
| { | |
| "epoch": 4.9853372434017595, | |
| "grad_norm": 0.8477877839763255, | |
| "learning_rate": 2.2805609331650826e-05, | |
| "loss": 0.1113, | |
| "mean_token_accuracy": 0.9674248471856117, | |
| "step": 852 | |
| }, | |
| { | |
| "epoch": 4.991202346041056, | |
| "grad_norm": 0.8866585241499966, | |
| "learning_rate": 2.2771349550222158e-05, | |
| "loss": 0.1004, | |
| "mean_token_accuracy": 0.9717306420207024, | |
| "step": 853 | |
| }, | |
| { | |
| "epoch": 4.997067448680352, | |
| "grad_norm": 0.6095610289437057, | |
| "learning_rate": 2.273708696910616e-05, | |
| "loss": 0.0823, | |
| "mean_token_accuracy": 0.9722500741481781, | |
| "step": 854 | |
| }, | |
| { | |
| "epoch": 5.0, | |
| "grad_norm": 0.6095610289437057, | |
| "learning_rate": 2.2702821712662147e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9760673493146896, | |
| "step": 855 | |
| }, | |
| { | |
| "epoch": 5.005865102639296, | |
| "grad_norm": 0.8433148437880541, | |
| "learning_rate": 2.2668553905259168e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9737215414643288, | |
| "step": 856 | |
| }, | |
| { | |
| "epoch": 5.011730205278592, | |
| "grad_norm": 0.5159271864203083, | |
| "learning_rate": 2.2634283671275523e-05, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.977745771408081, | |
| "step": 857 | |
| }, | |
| { | |
| "epoch": 5.0175953079178885, | |
| "grad_norm": 0.6773169070419274, | |
| "learning_rate": 2.2600011135098323e-05, | |
| "loss": 0.0807, | |
| "mean_token_accuracy": 0.9754450246691704, | |
| "step": 858 | |
| }, | |
| { | |
| "epoch": 5.023460410557185, | |
| "grad_norm": 0.6566845303303651, | |
| "learning_rate": 2.2565736421123035e-05, | |
| "loss": 0.0987, | |
| "mean_token_accuracy": 0.973513200879097, | |
| "step": 859 | |
| }, | |
| { | |
| "epoch": 5.029325513196481, | |
| "grad_norm": 1.0725755922059608, | |
| "learning_rate": 2.253145965375302e-05, | |
| "loss": 0.1049, | |
| "mean_token_accuracy": 0.9684402048587799, | |
| "step": 860 | |
| }, | |
| { | |
| "epoch": 5.035190615835777, | |
| "grad_norm": 0.7950554728335751, | |
| "learning_rate": 2.2497180957399108e-05, | |
| "loss": 0.1012, | |
| "mean_token_accuracy": 0.9666916355490685, | |
| "step": 861 | |
| }, | |
| { | |
| "epoch": 5.041055718475073, | |
| "grad_norm": 0.8569776292433064, | |
| "learning_rate": 2.246290045647912e-05, | |
| "loss": 0.0788, | |
| "mean_token_accuracy": 0.97861647605896, | |
| "step": 862 | |
| }, | |
| { | |
| "epoch": 5.0469208211143695, | |
| "grad_norm": 0.6247190584519964, | |
| "learning_rate": 2.242861827541742e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9780908152461052, | |
| "step": 863 | |
| }, | |
| { | |
| "epoch": 5.052785923753666, | |
| "grad_norm": 0.7794311246553162, | |
| "learning_rate": 2.2394334538644494e-05, | |
| "loss": 0.08, | |
| "mean_token_accuracy": 0.9768287613987923, | |
| "step": 864 | |
| }, | |
| { | |
| "epoch": 5.058651026392962, | |
| "grad_norm": 0.8297165299310638, | |
| "learning_rate": 2.2360049370596454e-05, | |
| "loss": 0.095, | |
| "mean_token_accuracy": 0.9734896495938301, | |
| "step": 865 | |
| }, | |
| { | |
| "epoch": 5.064516129032258, | |
| "grad_norm": 0.7979594762098186, | |
| "learning_rate": 2.2325762895714616e-05, | |
| "loss": 0.087, | |
| "mean_token_accuracy": 0.9727722704410553, | |
| "step": 866 | |
| }, | |
| { | |
| "epoch": 5.070381231671554, | |
| "grad_norm": 0.6589844664319869, | |
| "learning_rate": 2.2291475238445033e-05, | |
| "loss": 0.0901, | |
| "mean_token_accuracy": 0.9736703634262085, | |
| "step": 867 | |
| }, | |
| { | |
| "epoch": 5.07624633431085, | |
| "grad_norm": 2.042890482434605, | |
| "learning_rate": 2.225718652323805e-05, | |
| "loss": 0.0853, | |
| "mean_token_accuracy": 0.9707973897457123, | |
| "step": 868 | |
| }, | |
| { | |
| "epoch": 5.0821114369501466, | |
| "grad_norm": 0.9022788360958364, | |
| "learning_rate": 2.2222896874547856e-05, | |
| "loss": 0.1093, | |
| "mean_token_accuracy": 0.97090233117342, | |
| "step": 869 | |
| }, | |
| { | |
| "epoch": 5.087976539589443, | |
| "grad_norm": 0.773657493300953, | |
| "learning_rate": 2.2188606416832035e-05, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9799191579222679, | |
| "step": 870 | |
| }, | |
| { | |
| "epoch": 5.093841642228739, | |
| "grad_norm": 0.5823987344196742, | |
| "learning_rate": 2.2154315274551093e-05, | |
| "loss": 0.0934, | |
| "mean_token_accuracy": 0.9723178669810295, | |
| "step": 871 | |
| }, | |
| { | |
| "epoch": 5.099706744868035, | |
| "grad_norm": 0.6786086304917306, | |
| "learning_rate": 2.2120023572168026e-05, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.9762729480862617, | |
| "step": 872 | |
| }, | |
| { | |
| "epoch": 5.105571847507331, | |
| "grad_norm": 0.7089902588796277, | |
| "learning_rate": 2.208573143414787e-05, | |
| "loss": 0.079, | |
| "mean_token_accuracy": 0.9801476299762726, | |
| "step": 873 | |
| }, | |
| { | |
| "epoch": 5.1114369501466275, | |
| "grad_norm": 0.7326859159400232, | |
| "learning_rate": 2.2051438984957234e-05, | |
| "loss": 0.0888, | |
| "mean_token_accuracy": 0.9746009409427643, | |
| "step": 874 | |
| }, | |
| { | |
| "epoch": 5.117302052785924, | |
| "grad_norm": 0.6965082129251428, | |
| "learning_rate": 2.2017146349063855e-05, | |
| "loss": 0.0902, | |
| "mean_token_accuracy": 0.9732745662331581, | |
| "step": 875 | |
| }, | |
| { | |
| "epoch": 5.12316715542522, | |
| "grad_norm": 0.6205158598733057, | |
| "learning_rate": 2.1982853650936154e-05, | |
| "loss": 0.0846, | |
| "mean_token_accuracy": 0.9761428236961365, | |
| "step": 876 | |
| }, | |
| { | |
| "epoch": 5.129032258064516, | |
| "grad_norm": 0.6306034174676098, | |
| "learning_rate": 2.1948561015042772e-05, | |
| "loss": 0.089, | |
| "mean_token_accuracy": 0.9759569987654686, | |
| "step": 877 | |
| }, | |
| { | |
| "epoch": 5.134897360703812, | |
| "grad_norm": 0.7374044872193467, | |
| "learning_rate": 2.1914268565852134e-05, | |
| "loss": 0.1016, | |
| "mean_token_accuracy": 0.9703678116202354, | |
| "step": 878 | |
| }, | |
| { | |
| "epoch": 5.140762463343108, | |
| "grad_norm": 0.65191353602101, | |
| "learning_rate": 2.1879976427831983e-05, | |
| "loss": 0.0834, | |
| "mean_token_accuracy": 0.9758019149303436, | |
| "step": 879 | |
| }, | |
| { | |
| "epoch": 5.146627565982405, | |
| "grad_norm": 0.7506644384525571, | |
| "learning_rate": 2.1845684725448916e-05, | |
| "loss": 0.0956, | |
| "mean_token_accuracy": 0.9727787226438522, | |
| "step": 880 | |
| }, | |
| { | |
| "epoch": 5.152492668621701, | |
| "grad_norm": 0.7900929308120329, | |
| "learning_rate": 2.181139358316797e-05, | |
| "loss": 0.0857, | |
| "mean_token_accuracy": 0.9730308651924133, | |
| "step": 881 | |
| }, | |
| { | |
| "epoch": 5.158357771260997, | |
| "grad_norm": 0.6620605131204909, | |
| "learning_rate": 2.1777103125452146e-05, | |
| "loss": 0.0848, | |
| "mean_token_accuracy": 0.9756103083491325, | |
| "step": 882 | |
| }, | |
| { | |
| "epoch": 5.164222873900293, | |
| "grad_norm": 0.8667893086755721, | |
| "learning_rate": 2.1742813476761958e-05, | |
| "loss": 0.1017, | |
| "mean_token_accuracy": 0.9711831659078598, | |
| "step": 883 | |
| }, | |
| { | |
| "epoch": 5.170087976539589, | |
| "grad_norm": 1.7675263240094965, | |
| "learning_rate": 2.1708524761554973e-05, | |
| "loss": 0.0837, | |
| "mean_token_accuracy": 0.9736537113785744, | |
| "step": 884 | |
| }, | |
| { | |
| "epoch": 5.1759530791788855, | |
| "grad_norm": 0.5726645519349775, | |
| "learning_rate": 2.1674237104285393e-05, | |
| "loss": 0.0827, | |
| "mean_token_accuracy": 0.9737741276621819, | |
| "step": 885 | |
| }, | |
| { | |
| "epoch": 5.181818181818182, | |
| "grad_norm": 0.7145760041095544, | |
| "learning_rate": 2.1639950629403552e-05, | |
| "loss": 0.0766, | |
| "mean_token_accuracy": 0.9776047691702843, | |
| "step": 886 | |
| }, | |
| { | |
| "epoch": 5.187683284457478, | |
| "grad_norm": 0.8323224710208923, | |
| "learning_rate": 2.1605665461355515e-05, | |
| "loss": 0.0865, | |
| "mean_token_accuracy": 0.975310780107975, | |
| "step": 887 | |
| }, | |
| { | |
| "epoch": 5.193548387096774, | |
| "grad_norm": 0.7231739554172291, | |
| "learning_rate": 2.1571381724582588e-05, | |
| "loss": 0.0929, | |
| "mean_token_accuracy": 0.9715317115187645, | |
| "step": 888 | |
| }, | |
| { | |
| "epoch": 5.19941348973607, | |
| "grad_norm": 0.909867276658091, | |
| "learning_rate": 2.153709954352089e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9758851453661919, | |
| "step": 889 | |
| }, | |
| { | |
| "epoch": 5.205278592375366, | |
| "grad_norm": 0.8188932670265725, | |
| "learning_rate": 2.15028190426009e-05, | |
| "loss": 0.0918, | |
| "mean_token_accuracy": 0.9724317267537117, | |
| "step": 890 | |
| }, | |
| { | |
| "epoch": 5.211143695014663, | |
| "grad_norm": 0.5670047382739889, | |
| "learning_rate": 2.1468540346246986e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9727488458156586, | |
| "step": 891 | |
| }, | |
| { | |
| "epoch": 5.217008797653959, | |
| "grad_norm": 0.9745874503233469, | |
| "learning_rate": 2.143426357887697e-05, | |
| "loss": 0.0982, | |
| "mean_token_accuracy": 0.9717305302619934, | |
| "step": 892 | |
| }, | |
| { | |
| "epoch": 5.222873900293255, | |
| "grad_norm": 0.7154601998505632, | |
| "learning_rate": 2.139998886490169e-05, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.9806524217128754, | |
| "step": 893 | |
| }, | |
| { | |
| "epoch": 5.228739002932551, | |
| "grad_norm": 0.5296720463890491, | |
| "learning_rate": 2.136571632872449e-05, | |
| "loss": 0.0877, | |
| "mean_token_accuracy": 0.9748145341873169, | |
| "step": 894 | |
| }, | |
| { | |
| "epoch": 5.234604105571847, | |
| "grad_norm": 0.8692292566562774, | |
| "learning_rate": 2.1331446094740845e-05, | |
| "loss": 0.1122, | |
| "mean_token_accuracy": 0.9713733717799187, | |
| "step": 895 | |
| }, | |
| { | |
| "epoch": 5.2404692082111435, | |
| "grad_norm": 0.9200261999499213, | |
| "learning_rate": 2.1297178287337865e-05, | |
| "loss": 0.1022, | |
| "mean_token_accuracy": 0.9730254113674164, | |
| "step": 896 | |
| }, | |
| { | |
| "epoch": 5.24633431085044, | |
| "grad_norm": 0.8792886399633073, | |
| "learning_rate": 2.1262913030893855e-05, | |
| "loss": 0.103, | |
| "mean_token_accuracy": 0.9704674929380417, | |
| "step": 897 | |
| }, | |
| { | |
| "epoch": 5.252199413489736, | |
| "grad_norm": 0.9771077901678256, | |
| "learning_rate": 2.1228650449777848e-05, | |
| "loss": 0.0925, | |
| "mean_token_accuracy": 0.972001887857914, | |
| "step": 898 | |
| }, | |
| { | |
| "epoch": 5.258064516129032, | |
| "grad_norm": 0.7113805154464254, | |
| "learning_rate": 2.1194390668349186e-05, | |
| "loss": 0.0945, | |
| "mean_token_accuracy": 0.9730637967586517, | |
| "step": 899 | |
| }, | |
| { | |
| "epoch": 5.263929618768328, | |
| "grad_norm": 0.7719540492961735, | |
| "learning_rate": 2.116013381095703e-05, | |
| "loss": 0.0844, | |
| "mean_token_accuracy": 0.9765499979257584, | |
| "step": 900 | |
| }, | |
| { | |
| "epoch": 5.269794721407624, | |
| "grad_norm": 0.5930886595639958, | |
| "learning_rate": 2.112588000193994e-05, | |
| "loss": 0.0775, | |
| "mean_token_accuracy": 0.9765195026993752, | |
| "step": 901 | |
| }, | |
| { | |
| "epoch": 5.275659824046921, | |
| "grad_norm": 0.823286226023431, | |
| "learning_rate": 2.1091629365625403e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9732163697481155, | |
| "step": 902 | |
| }, | |
| { | |
| "epoch": 5.281524926686217, | |
| "grad_norm": 0.9117273368025238, | |
| "learning_rate": 2.105738202632941e-05, | |
| "loss": 0.0965, | |
| "mean_token_accuracy": 0.973268635571003, | |
| "step": 903 | |
| }, | |
| { | |
| "epoch": 5.287390029325513, | |
| "grad_norm": 0.6499800912484074, | |
| "learning_rate": 2.1023138108355957e-05, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9788585156202316, | |
| "step": 904 | |
| }, | |
| { | |
| "epoch": 5.293255131964809, | |
| "grad_norm": 0.8350311296266436, | |
| "learning_rate": 2.098889773599665e-05, | |
| "loss": 0.1091, | |
| "mean_token_accuracy": 0.9673629701137543, | |
| "step": 905 | |
| }, | |
| { | |
| "epoch": 5.299120234604105, | |
| "grad_norm": 0.8758013377718885, | |
| "learning_rate": 2.0954661033530193e-05, | |
| "loss": 0.0832, | |
| "mean_token_accuracy": 0.9746165573596954, | |
| "step": 906 | |
| }, | |
| { | |
| "epoch": 5.3049853372434015, | |
| "grad_norm": 0.7318638597492443, | |
| "learning_rate": 2.0920428125222004e-05, | |
| "loss": 0.0881, | |
| "mean_token_accuracy": 0.9729284644126892, | |
| "step": 907 | |
| }, | |
| { | |
| "epoch": 5.310850439882698, | |
| "grad_norm": 0.5207542607039044, | |
| "learning_rate": 2.0886199135323712e-05, | |
| "loss": 0.0907, | |
| "mean_token_accuracy": 0.9727116152644157, | |
| "step": 908 | |
| }, | |
| { | |
| "epoch": 5.316715542521994, | |
| "grad_norm": 0.8002194745455268, | |
| "learning_rate": 2.085197418807272e-05, | |
| "loss": 0.0903, | |
| "mean_token_accuracy": 0.9738972410559654, | |
| "step": 909 | |
| }, | |
| { | |
| "epoch": 5.32258064516129, | |
| "grad_norm": 0.7003191473454057, | |
| "learning_rate": 2.0817753407691774e-05, | |
| "loss": 0.0904, | |
| "mean_token_accuracy": 0.9728701040148735, | |
| "step": 910 | |
| }, | |
| { | |
| "epoch": 5.328445747800586, | |
| "grad_norm": 0.6997923911492139, | |
| "learning_rate": 2.0783536918388477e-05, | |
| "loss": 0.1048, | |
| "mean_token_accuracy": 0.9690842255949974, | |
| "step": 911 | |
| }, | |
| { | |
| "epoch": 5.334310850439882, | |
| "grad_norm": 1.573742684312648, | |
| "learning_rate": 2.0749324844354867e-05, | |
| "loss": 0.091, | |
| "mean_token_accuracy": 0.9728436172008514, | |
| "step": 912 | |
| }, | |
| { | |
| "epoch": 5.340175953079179, | |
| "grad_norm": 0.6744667083171827, | |
| "learning_rate": 2.0715117309766953e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9753686562180519, | |
| "step": 913 | |
| }, | |
| { | |
| "epoch": 5.346041055718475, | |
| "grad_norm": 0.5591477289746339, | |
| "learning_rate": 2.068091443878428e-05, | |
| "loss": 0.0939, | |
| "mean_token_accuracy": 0.9701094627380371, | |
| "step": 914 | |
| }, | |
| { | |
| "epoch": 5.351906158357771, | |
| "grad_norm": 0.8062620386979885, | |
| "learning_rate": 2.064671635554945e-05, | |
| "loss": 0.104, | |
| "mean_token_accuracy": 0.9726444035768509, | |
| "step": 915 | |
| }, | |
| { | |
| "epoch": 5.357771260997067, | |
| "grad_norm": 0.7825370740045777, | |
| "learning_rate": 2.0612523184187693e-05, | |
| "loss": 0.0815, | |
| "mean_token_accuracy": 0.9739454016089439, | |
| "step": 916 | |
| }, | |
| { | |
| "epoch": 5.363636363636363, | |
| "grad_norm": 0.5988992551643588, | |
| "learning_rate": 2.057833504880641e-05, | |
| "loss": 0.0945, | |
| "mean_token_accuracy": 0.9690568298101425, | |
| "step": 917 | |
| }, | |
| { | |
| "epoch": 5.3695014662756595, | |
| "grad_norm": 0.8655470524535265, | |
| "learning_rate": 2.054415207349473e-05, | |
| "loss": 0.1016, | |
| "mean_token_accuracy": 0.9704934805631638, | |
| "step": 918 | |
| }, | |
| { | |
| "epoch": 5.375366568914956, | |
| "grad_norm": 0.6363959385625477, | |
| "learning_rate": 2.0509974382323043e-05, | |
| "loss": 0.0866, | |
| "mean_token_accuracy": 0.9745766744017601, | |
| "step": 919 | |
| }, | |
| { | |
| "epoch": 5.381231671554252, | |
| "grad_norm": 0.6258477209760515, | |
| "learning_rate": 2.047580209934256e-05, | |
| "loss": 0.0846, | |
| "mean_token_accuracy": 0.9737055748701096, | |
| "step": 920 | |
| }, | |
| { | |
| "epoch": 5.387096774193548, | |
| "grad_norm": 0.6613535414703848, | |
| "learning_rate": 2.0441635348584876e-05, | |
| "loss": 0.0881, | |
| "mean_token_accuracy": 0.9749500304460526, | |
| "step": 921 | |
| }, | |
| { | |
| "epoch": 5.392961876832844, | |
| "grad_norm": 0.7602735845477481, | |
| "learning_rate": 2.0407474254061498e-05, | |
| "loss": 0.1114, | |
| "mean_token_accuracy": 0.9694864973425865, | |
| "step": 922 | |
| }, | |
| { | |
| "epoch": 5.39882697947214, | |
| "grad_norm": 0.626037285164579, | |
| "learning_rate": 2.0373318939763397e-05, | |
| "loss": 0.0877, | |
| "mean_token_accuracy": 0.972472071647644, | |
| "step": 923 | |
| }, | |
| { | |
| "epoch": 5.404692082111437, | |
| "grad_norm": 0.9112477803463636, | |
| "learning_rate": 2.033916952966057e-05, | |
| "loss": 0.0858, | |
| "mean_token_accuracy": 0.9744387790560722, | |
| "step": 924 | |
| }, | |
| { | |
| "epoch": 5.410557184750733, | |
| "grad_norm": 0.7421628592141191, | |
| "learning_rate": 2.0305026147701584e-05, | |
| "loss": 0.0863, | |
| "mean_token_accuracy": 0.9712925776839256, | |
| "step": 925 | |
| }, | |
| { | |
| "epoch": 5.416422287390029, | |
| "grad_norm": 0.6515695297727665, | |
| "learning_rate": 2.0270888917813124e-05, | |
| "loss": 0.0776, | |
| "mean_token_accuracy": 0.9756393432617188, | |
| "step": 926 | |
| }, | |
| { | |
| "epoch": 5.422287390029325, | |
| "grad_norm": 0.8628434185290488, | |
| "learning_rate": 2.0236757963899548e-05, | |
| "loss": 0.0984, | |
| "mean_token_accuracy": 0.973811186850071, | |
| "step": 927 | |
| }, | |
| { | |
| "epoch": 5.428152492668621, | |
| "grad_norm": 1.1299749301147368, | |
| "learning_rate": 2.020263340984244e-05, | |
| "loss": 0.0918, | |
| "mean_token_accuracy": 0.9751879572868347, | |
| "step": 928 | |
| }, | |
| { | |
| "epoch": 5.4340175953079175, | |
| "grad_norm": 0.6428804437106626, | |
| "learning_rate": 2.0168515379500145e-05, | |
| "loss": 0.0828, | |
| "mean_token_accuracy": 0.9709775000810623, | |
| "step": 929 | |
| }, | |
| { | |
| "epoch": 5.439882697947214, | |
| "grad_norm": 0.7470209883447461, | |
| "learning_rate": 2.0134403996707338e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9749180600047112, | |
| "step": 930 | |
| }, | |
| { | |
| "epoch": 5.44574780058651, | |
| "grad_norm": 0.4642855476533981, | |
| "learning_rate": 2.0100299385274547e-05, | |
| "loss": 0.0902, | |
| "mean_token_accuracy": 0.9726503938436508, | |
| "step": 931 | |
| }, | |
| { | |
| "epoch": 5.451612903225806, | |
| "grad_norm": 0.7606298055914933, | |
| "learning_rate": 2.0066201668987757e-05, | |
| "loss": 0.0923, | |
| "mean_token_accuracy": 0.9702809303998947, | |
| "step": 932 | |
| }, | |
| { | |
| "epoch": 5.457478005865102, | |
| "grad_norm": 0.6008762056162931, | |
| "learning_rate": 2.0032110971607894e-05, | |
| "loss": 0.0887, | |
| "mean_token_accuracy": 0.9757914617657661, | |
| "step": 933 | |
| }, | |
| { | |
| "epoch": 5.463343108504398, | |
| "grad_norm": 0.6078060941097535, | |
| "learning_rate": 1.999802741687042e-05, | |
| "loss": 0.0931, | |
| "mean_token_accuracy": 0.9730753004550934, | |
| "step": 934 | |
| }, | |
| { | |
| "epoch": 5.469208211143695, | |
| "grad_norm": 0.5758717205478393, | |
| "learning_rate": 1.9963951128484886e-05, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9766752049326897, | |
| "step": 935 | |
| }, | |
| { | |
| "epoch": 5.475073313782991, | |
| "grad_norm": 0.6785426496703266, | |
| "learning_rate": 1.9929882230134452e-05, | |
| "loss": 0.0877, | |
| "mean_token_accuracy": 0.9692973420023918, | |
| "step": 936 | |
| }, | |
| { | |
| "epoch": 5.480938416422287, | |
| "grad_norm": 0.728290541857711, | |
| "learning_rate": 1.9895820845475445e-05, | |
| "loss": 0.0969, | |
| "mean_token_accuracy": 0.9711208865046501, | |
| "step": 937 | |
| }, | |
| { | |
| "epoch": 5.486803519061583, | |
| "grad_norm": 0.6533361112946945, | |
| "learning_rate": 1.9861767098136956e-05, | |
| "loss": 0.0804, | |
| "mean_token_accuracy": 0.9775801599025726, | |
| "step": 938 | |
| }, | |
| { | |
| "epoch": 5.492668621700879, | |
| "grad_norm": 0.8201669009430897, | |
| "learning_rate": 1.982772111172032e-05, | |
| "loss": 0.0874, | |
| "mean_token_accuracy": 0.9733827859163284, | |
| "step": 939 | |
| }, | |
| { | |
| "epoch": 5.4985337243401755, | |
| "grad_norm": 0.5158612094205587, | |
| "learning_rate": 1.9793683009798718e-05, | |
| "loss": 0.0747, | |
| "mean_token_accuracy": 0.9773239716887474, | |
| "step": 940 | |
| }, | |
| { | |
| "epoch": 5.504398826979472, | |
| "grad_norm": 0.6241066832474991, | |
| "learning_rate": 1.975965291591672e-05, | |
| "loss": 0.0978, | |
| "mean_token_accuracy": 0.9694525748491287, | |
| "step": 941 | |
| }, | |
| { | |
| "epoch": 5.510263929618768, | |
| "grad_norm": 0.7158903642944286, | |
| "learning_rate": 1.9725630953589823e-05, | |
| "loss": 0.0896, | |
| "mean_token_accuracy": 0.9751381054520607, | |
| "step": 942 | |
| }, | |
| { | |
| "epoch": 5.516129032258064, | |
| "grad_norm": 0.6500428056572851, | |
| "learning_rate": 1.9691617246304007e-05, | |
| "loss": 0.0906, | |
| "mean_token_accuracy": 0.9697719290852547, | |
| "step": 943 | |
| }, | |
| { | |
| "epoch": 5.52199413489736, | |
| "grad_norm": 0.6757819265845151, | |
| "learning_rate": 1.9657611917515287e-05, | |
| "loss": 0.0946, | |
| "mean_token_accuracy": 0.9740458875894547, | |
| "step": 944 | |
| }, | |
| { | |
| "epoch": 5.527859237536656, | |
| "grad_norm": 0.6467851667757466, | |
| "learning_rate": 1.962361509064928e-05, | |
| "loss": 0.0779, | |
| "mean_token_accuracy": 0.9767004624009132, | |
| "step": 945 | |
| }, | |
| { | |
| "epoch": 5.533724340175953, | |
| "grad_norm": 0.6929353877478521, | |
| "learning_rate": 1.958962688910073e-05, | |
| "loss": 0.0772, | |
| "mean_token_accuracy": 0.9737014174461365, | |
| "step": 946 | |
| }, | |
| { | |
| "epoch": 5.539589442815249, | |
| "grad_norm": 0.554178700724065, | |
| "learning_rate": 1.9555647436233093e-05, | |
| "loss": 0.0833, | |
| "mean_token_accuracy": 0.9781376793980598, | |
| "step": 947 | |
| }, | |
| { | |
| "epoch": 5.545454545454545, | |
| "grad_norm": 0.731670261601833, | |
| "learning_rate": 1.9521676855378045e-05, | |
| "loss": 0.0813, | |
| "mean_token_accuracy": 0.977122388780117, | |
| "step": 948 | |
| }, | |
| { | |
| "epoch": 5.551319648093841, | |
| "grad_norm": 0.7428121682204718, | |
| "learning_rate": 1.9487715269835082e-05, | |
| "loss": 0.0821, | |
| "mean_token_accuracy": 0.9735124111175537, | |
| "step": 949 | |
| }, | |
| { | |
| "epoch": 5.557184750733137, | |
| "grad_norm": 0.5862393003846442, | |
| "learning_rate": 1.945376280287105e-05, | |
| "loss": 0.0907, | |
| "mean_token_accuracy": 0.9709196835756302, | |
| "step": 950 | |
| }, | |
| { | |
| "epoch": 5.563049853372434, | |
| "grad_norm": 0.7517377301766143, | |
| "learning_rate": 1.9419819577719684e-05, | |
| "loss": 0.0864, | |
| "mean_token_accuracy": 0.9729499071836472, | |
| "step": 951 | |
| }, | |
| { | |
| "epoch": 5.568914956011731, | |
| "grad_norm": 0.7630913812828717, | |
| "learning_rate": 1.9385885717581182e-05, | |
| "loss": 0.094, | |
| "mean_token_accuracy": 0.9722026437520981, | |
| "step": 952 | |
| }, | |
| { | |
| "epoch": 5.574780058651027, | |
| "grad_norm": 0.5631421776070452, | |
| "learning_rate": 1.935196134562175e-05, | |
| "loss": 0.0836, | |
| "mean_token_accuracy": 0.9755956828594208, | |
| "step": 953 | |
| }, | |
| { | |
| "epoch": 5.580645161290323, | |
| "grad_norm": 0.6601421450613268, | |
| "learning_rate": 1.931804658497316e-05, | |
| "loss": 0.086, | |
| "mean_token_accuracy": 0.9734816625714302, | |
| "step": 954 | |
| }, | |
| { | |
| "epoch": 5.586510263929619, | |
| "grad_norm": 0.7963802408293887, | |
| "learning_rate": 1.9284141558732296e-05, | |
| "loss": 0.0892, | |
| "mean_token_accuracy": 0.9736581519246101, | |
| "step": 955 | |
| }, | |
| { | |
| "epoch": 5.592375366568915, | |
| "grad_norm": 0.6925565662697808, | |
| "learning_rate": 1.925024638996071e-05, | |
| "loss": 0.0906, | |
| "mean_token_accuracy": 0.9748614057898521, | |
| "step": 956 | |
| }, | |
| { | |
| "epoch": 5.5982404692082115, | |
| "grad_norm": 0.4469096595424996, | |
| "learning_rate": 1.9216361201684174e-05, | |
| "loss": 0.0767, | |
| "mean_token_accuracy": 0.9786844179034233, | |
| "step": 957 | |
| }, | |
| { | |
| "epoch": 5.604105571847508, | |
| "grad_norm": 0.6688127687388462, | |
| "learning_rate": 1.918248611689224e-05, | |
| "loss": 0.0805, | |
| "mean_token_accuracy": 0.9765229448676109, | |
| "step": 958 | |
| }, | |
| { | |
| "epoch": 5.609970674486804, | |
| "grad_norm": 0.6855039450437475, | |
| "learning_rate": 1.9148621258537782e-05, | |
| "loss": 0.0889, | |
| "mean_token_accuracy": 0.9715561494231224, | |
| "step": 959 | |
| }, | |
| { | |
| "epoch": 5.6158357771261, | |
| "grad_norm": 0.8037823314037765, | |
| "learning_rate": 1.911476674953656e-05, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9763055369257927, | |
| "step": 960 | |
| }, | |
| { | |
| "epoch": 5.621700879765396, | |
| "grad_norm": 0.5073819720734495, | |
| "learning_rate": 1.9080922712766762e-05, | |
| "loss": 0.0911, | |
| "mean_token_accuracy": 0.9702242463827133, | |
| "step": 961 | |
| }, | |
| { | |
| "epoch": 5.627565982404692, | |
| "grad_norm": 0.5329313662136718, | |
| "learning_rate": 1.904708927106858e-05, | |
| "loss": 0.0897, | |
| "mean_token_accuracy": 0.9713274911046028, | |
| "step": 962 | |
| }, | |
| { | |
| "epoch": 5.633431085043989, | |
| "grad_norm": 0.6822929726272469, | |
| "learning_rate": 1.9013266547243742e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9777653217315674, | |
| "step": 963 | |
| }, | |
| { | |
| "epoch": 5.639296187683285, | |
| "grad_norm": 0.5806622127690888, | |
| "learning_rate": 1.8979454664055068e-05, | |
| "loss": 0.0857, | |
| "mean_token_accuracy": 0.9744703099131584, | |
| "step": 964 | |
| }, | |
| { | |
| "epoch": 5.645161290322581, | |
| "grad_norm": 0.8665695970177962, | |
| "learning_rate": 1.894565374422605e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9745295867323875, | |
| "step": 965 | |
| }, | |
| { | |
| "epoch": 5.651026392961877, | |
| "grad_norm": 0.6266246362289842, | |
| "learning_rate": 1.891186391044037e-05, | |
| "loss": 0.0947, | |
| "mean_token_accuracy": 0.9700672402977943, | |
| "step": 966 | |
| }, | |
| { | |
| "epoch": 5.656891495601173, | |
| "grad_norm": 0.7989074355855846, | |
| "learning_rate": 1.887808528534148e-05, | |
| "loss": 0.0873, | |
| "mean_token_accuracy": 0.9718985334038734, | |
| "step": 967 | |
| }, | |
| { | |
| "epoch": 5.6627565982404695, | |
| "grad_norm": 0.4353989783282138, | |
| "learning_rate": 1.884431799153214e-05, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.978706993162632, | |
| "step": 968 | |
| }, | |
| { | |
| "epoch": 5.668621700879766, | |
| "grad_norm": 0.8266065947284112, | |
| "learning_rate": 1.8810562151573993e-05, | |
| "loss": 0.0935, | |
| "mean_token_accuracy": 0.972965806722641, | |
| "step": 969 | |
| }, | |
| { | |
| "epoch": 5.674486803519062, | |
| "grad_norm": 0.6820629959510945, | |
| "learning_rate": 1.8776817887987105e-05, | |
| "loss": 0.0921, | |
| "mean_token_accuracy": 0.9734281525015831, | |
| "step": 970 | |
| }, | |
| { | |
| "epoch": 5.680351906158358, | |
| "grad_norm": 0.5623312424943822, | |
| "learning_rate": 1.8743085323249527e-05, | |
| "loss": 0.0889, | |
| "mean_token_accuracy": 0.9733134210109711, | |
| "step": 971 | |
| }, | |
| { | |
| "epoch": 5.686217008797654, | |
| "grad_norm": 0.5194482005569641, | |
| "learning_rate": 1.870936457979684e-05, | |
| "loss": 0.0908, | |
| "mean_token_accuracy": 0.9747659862041473, | |
| "step": 972 | |
| }, | |
| { | |
| "epoch": 5.69208211143695, | |
| "grad_norm": 0.5875433250388189, | |
| "learning_rate": 1.8675655780021733e-05, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9777504205703735, | |
| "step": 973 | |
| }, | |
| { | |
| "epoch": 5.697947214076247, | |
| "grad_norm": 0.5039670849995016, | |
| "learning_rate": 1.8641959046273525e-05, | |
| "loss": 0.0922, | |
| "mean_token_accuracy": 0.9732639566063881, | |
| "step": 974 | |
| }, | |
| { | |
| "epoch": 5.703812316715543, | |
| "grad_norm": 0.6502810650481761, | |
| "learning_rate": 1.8608274500857756e-05, | |
| "loss": 0.0916, | |
| "mean_token_accuracy": 0.9737720489501953, | |
| "step": 975 | |
| }, | |
| { | |
| "epoch": 5.709677419354839, | |
| "grad_norm": 0.7462538481750892, | |
| "learning_rate": 1.8574602266035714e-05, | |
| "loss": 0.0705, | |
| "mean_token_accuracy": 0.9790600091218948, | |
| "step": 976 | |
| }, | |
| { | |
| "epoch": 5.715542521994135, | |
| "grad_norm": 0.6951076012758072, | |
| "learning_rate": 1.854094246402402e-05, | |
| "loss": 0.1029, | |
| "mean_token_accuracy": 0.9681782871484756, | |
| "step": 977 | |
| }, | |
| { | |
| "epoch": 5.721407624633431, | |
| "grad_norm": 0.7289075041674313, | |
| "learning_rate": 1.8507295216994162e-05, | |
| "loss": 0.0745, | |
| "mean_token_accuracy": 0.9782817512750626, | |
| "step": 978 | |
| }, | |
| { | |
| "epoch": 5.7272727272727275, | |
| "grad_norm": 0.6170230938347478, | |
| "learning_rate": 1.8473660647072053e-05, | |
| "loss": 0.0936, | |
| "mean_token_accuracy": 0.9697033986449242, | |
| "step": 979 | |
| }, | |
| { | |
| "epoch": 5.733137829912024, | |
| "grad_norm": 0.6286851176517133, | |
| "learning_rate": 1.8440038876337597e-05, | |
| "loss": 0.0738, | |
| "mean_token_accuracy": 0.9746864810585976, | |
| "step": 980 | |
| }, | |
| { | |
| "epoch": 5.73900293255132, | |
| "grad_norm": 0.6924654681455282, | |
| "learning_rate": 1.8406430026824252e-05, | |
| "loss": 0.0874, | |
| "mean_token_accuracy": 0.9719264730811119, | |
| "step": 981 | |
| }, | |
| { | |
| "epoch": 5.744868035190616, | |
| "grad_norm": 0.6386975637341502, | |
| "learning_rate": 1.837283422051855e-05, | |
| "loss": 0.0824, | |
| "mean_token_accuracy": 0.9745275229215622, | |
| "step": 982 | |
| }, | |
| { | |
| "epoch": 5.750733137829912, | |
| "grad_norm": 0.7673873081774819, | |
| "learning_rate": 1.8339251579359713e-05, | |
| "loss": 0.0899, | |
| "mean_token_accuracy": 0.9738566502928734, | |
| "step": 983 | |
| }, | |
| { | |
| "epoch": 5.756598240469208, | |
| "grad_norm": 0.5320830282669105, | |
| "learning_rate": 1.8305682225239167e-05, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9771842882037163, | |
| "step": 984 | |
| }, | |
| { | |
| "epoch": 5.762463343108505, | |
| "grad_norm": 0.7839551637151186, | |
| "learning_rate": 1.8272126280000102e-05, | |
| "loss": 0.1027, | |
| "mean_token_accuracy": 0.9691510125994682, | |
| "step": 985 | |
| }, | |
| { | |
| "epoch": 5.768328445747801, | |
| "grad_norm": 0.659651301924526, | |
| "learning_rate": 1.823858386543705e-05, | |
| "loss": 0.0801, | |
| "mean_token_accuracy": 0.9774763435125351, | |
| "step": 986 | |
| }, | |
| { | |
| "epoch": 5.774193548387097, | |
| "grad_norm": 0.684337831256596, | |
| "learning_rate": 1.8205055103295434e-05, | |
| "loss": 0.0918, | |
| "mean_token_accuracy": 0.9704063758254051, | |
| "step": 987 | |
| }, | |
| { | |
| "epoch": 5.780058651026393, | |
| "grad_norm": 0.6780204534520761, | |
| "learning_rate": 1.8171540115271108e-05, | |
| "loss": 0.0966, | |
| "mean_token_accuracy": 0.9674130603671074, | |
| "step": 988 | |
| }, | |
| { | |
| "epoch": 5.785923753665689, | |
| "grad_norm": 0.7688483149907289, | |
| "learning_rate": 1.813803902300995e-05, | |
| "loss": 0.0899, | |
| "mean_token_accuracy": 0.9712028130888939, | |
| "step": 989 | |
| }, | |
| { | |
| "epoch": 5.7917888563049855, | |
| "grad_norm": 0.5787977118824297, | |
| "learning_rate": 1.8104551948107395e-05, | |
| "loss": 0.0877, | |
| "mean_token_accuracy": 0.9761911928653717, | |
| "step": 990 | |
| }, | |
| { | |
| "epoch": 5.797653958944282, | |
| "grad_norm": 0.7838858854358897, | |
| "learning_rate": 1.8071079012107997e-05, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9778272584080696, | |
| "step": 991 | |
| }, | |
| { | |
| "epoch": 5.803519061583578, | |
| "grad_norm": 0.5204651103182804, | |
| "learning_rate": 1.8037620336504993e-05, | |
| "loss": 0.0884, | |
| "mean_token_accuracy": 0.9750376492738724, | |
| "step": 992 | |
| }, | |
| { | |
| "epoch": 5.809384164222874, | |
| "grad_norm": 0.6959107859348317, | |
| "learning_rate": 1.8004176042739877e-05, | |
| "loss": 0.0812, | |
| "mean_token_accuracy": 0.9772611781954765, | |
| "step": 993 | |
| }, | |
| { | |
| "epoch": 5.81524926686217, | |
| "grad_norm": 0.5986772400066928, | |
| "learning_rate": 1.797074625220191e-05, | |
| "loss": 0.0842, | |
| "mean_token_accuracy": 0.9771137833595276, | |
| "step": 994 | |
| }, | |
| { | |
| "epoch": 5.821114369501466, | |
| "grad_norm": 0.4693360660469766, | |
| "learning_rate": 1.7937331086227737e-05, | |
| "loss": 0.0889, | |
| "mean_token_accuracy": 0.9712493047118187, | |
| "step": 995 | |
| }, | |
| { | |
| "epoch": 5.826979472140763, | |
| "grad_norm": 0.7490861103636983, | |
| "learning_rate": 1.790393066610091e-05, | |
| "loss": 0.0952, | |
| "mean_token_accuracy": 0.971531830728054, | |
| "step": 996 | |
| }, | |
| { | |
| "epoch": 5.832844574780059, | |
| "grad_norm": 0.6288451112815866, | |
| "learning_rate": 1.787054511305148e-05, | |
| "loss": 0.0893, | |
| "mean_token_accuracy": 0.9741169288754463, | |
| "step": 997 | |
| }, | |
| { | |
| "epoch": 5.838709677419355, | |
| "grad_norm": 0.7702435889994751, | |
| "learning_rate": 1.7837174548255504e-05, | |
| "loss": 0.0948, | |
| "mean_token_accuracy": 0.9711467698216438, | |
| "step": 998 | |
| }, | |
| { | |
| "epoch": 5.844574780058651, | |
| "grad_norm": 0.5823123249879522, | |
| "learning_rate": 1.7803819092834668e-05, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.975670225918293, | |
| "step": 999 | |
| }, | |
| { | |
| "epoch": 5.850439882697947, | |
| "grad_norm": 0.6981130982253064, | |
| "learning_rate": 1.7770478867855797e-05, | |
| "loss": 0.084, | |
| "mean_token_accuracy": 0.9753496646881104, | |
| "step": 1000 | |
| }, | |
| { | |
| "epoch": 5.8563049853372435, | |
| "grad_norm": 0.7372940110063448, | |
| "learning_rate": 1.7737153994330437e-05, | |
| "loss": 0.1116, | |
| "mean_token_accuracy": 0.9673419818282127, | |
| "step": 1001 | |
| }, | |
| { | |
| "epoch": 5.86217008797654, | |
| "grad_norm": 0.8006525910133377, | |
| "learning_rate": 1.7703844593214427e-05, | |
| "loss": 0.081, | |
| "mean_token_accuracy": 0.974897563457489, | |
| "step": 1002 | |
| }, | |
| { | |
| "epoch": 5.868035190615836, | |
| "grad_norm": 0.5561798341071992, | |
| "learning_rate": 1.7670550785407444e-05, | |
| "loss": 0.0701, | |
| "mean_token_accuracy": 0.9756550714373589, | |
| "step": 1003 | |
| }, | |
| { | |
| "epoch": 5.873900293255132, | |
| "grad_norm": 0.5615991213783584, | |
| "learning_rate": 1.7637272691752548e-05, | |
| "loss": 0.0829, | |
| "mean_token_accuracy": 0.97202467918396, | |
| "step": 1004 | |
| }, | |
| { | |
| "epoch": 5.879765395894428, | |
| "grad_norm": 0.6399659767936104, | |
| "learning_rate": 1.7604010433035793e-05, | |
| "loss": 0.0895, | |
| "mean_token_accuracy": 0.9714025035500526, | |
| "step": 1005 | |
| }, | |
| { | |
| "epoch": 5.885630498533724, | |
| "grad_norm": 0.6345682852590906, | |
| "learning_rate": 1.7570764129985747e-05, | |
| "loss": 0.0829, | |
| "mean_token_accuracy": 0.9739578440785408, | |
| "step": 1006 | |
| }, | |
| { | |
| "epoch": 5.891495601173021, | |
| "grad_norm": 0.8095757391274871, | |
| "learning_rate": 1.7537533903273055e-05, | |
| "loss": 0.1017, | |
| "mean_token_accuracy": 0.9723379909992218, | |
| "step": 1007 | |
| }, | |
| { | |
| "epoch": 5.897360703812317, | |
| "grad_norm": 0.829159717201638, | |
| "learning_rate": 1.7504319873510014e-05, | |
| "loss": 0.1032, | |
| "mean_token_accuracy": 0.9734954759478569, | |
| "step": 1008 | |
| }, | |
| { | |
| "epoch": 5.903225806451613, | |
| "grad_norm": 0.6932420286406421, | |
| "learning_rate": 1.7471122161250153e-05, | |
| "loss": 0.0863, | |
| "mean_token_accuracy": 0.9713395535945892, | |
| "step": 1009 | |
| }, | |
| { | |
| "epoch": 5.909090909090909, | |
| "grad_norm": 0.7242637421768542, | |
| "learning_rate": 1.743794088698775e-05, | |
| "loss": 0.1025, | |
| "mean_token_accuracy": 0.9717629998922348, | |
| "step": 1010 | |
| }, | |
| { | |
| "epoch": 5.914956011730205, | |
| "grad_norm": 0.6812715069301856, | |
| "learning_rate": 1.7404776171157428e-05, | |
| "loss": 0.0901, | |
| "mean_token_accuracy": 0.9726547002792358, | |
| "step": 1011 | |
| }, | |
| { | |
| "epoch": 5.9208211143695015, | |
| "grad_norm": 0.47494390794003294, | |
| "learning_rate": 1.7371628134133716e-05, | |
| "loss": 0.1006, | |
| "mean_token_accuracy": 0.9703837037086487, | |
| "step": 1012 | |
| }, | |
| { | |
| "epoch": 5.926686217008798, | |
| "grad_norm": 0.7988053128511963, | |
| "learning_rate": 1.73384968962306e-05, | |
| "loss": 0.087, | |
| "mean_token_accuracy": 0.9710239991545677, | |
| "step": 1013 | |
| }, | |
| { | |
| "epoch": 5.932551319648094, | |
| "grad_norm": 0.7048215584079881, | |
| "learning_rate": 1.7305382577701088e-05, | |
| "loss": 0.0912, | |
| "mean_token_accuracy": 0.972258172929287, | |
| "step": 1014 | |
| }, | |
| { | |
| "epoch": 5.93841642228739, | |
| "grad_norm": 0.5987039942872129, | |
| "learning_rate": 1.7272285298736787e-05, | |
| "loss": 0.0812, | |
| "mean_token_accuracy": 0.9739760607481003, | |
| "step": 1015 | |
| }, | |
| { | |
| "epoch": 5.944281524926686, | |
| "grad_norm": 0.7669426180702338, | |
| "learning_rate": 1.7239205179467453e-05, | |
| "loss": 0.0928, | |
| "mean_token_accuracy": 0.9731807708740234, | |
| "step": 1016 | |
| }, | |
| { | |
| "epoch": 5.9501466275659824, | |
| "grad_norm": 0.8816952587467959, | |
| "learning_rate": 1.720614233996056e-05, | |
| "loss": 0.1119, | |
| "mean_token_accuracy": 0.9651398956775665, | |
| "step": 1017 | |
| }, | |
| { | |
| "epoch": 5.956011730205279, | |
| "grad_norm": 0.7904778378822598, | |
| "learning_rate": 1.7173096900220852e-05, | |
| "loss": 0.0862, | |
| "mean_token_accuracy": 0.9707507267594337, | |
| "step": 1018 | |
| }, | |
| { | |
| "epoch": 5.961876832844575, | |
| "grad_norm": 0.5993372916237711, | |
| "learning_rate": 1.7140068980189943e-05, | |
| "loss": 0.1027, | |
| "mean_token_accuracy": 0.9691429063677788, | |
| "step": 1019 | |
| }, | |
| { | |
| "epoch": 5.967741935483871, | |
| "grad_norm": 0.7894401534507262, | |
| "learning_rate": 1.710705869974583e-05, | |
| "loss": 0.0878, | |
| "mean_token_accuracy": 0.9715762436389923, | |
| "step": 1020 | |
| }, | |
| { | |
| "epoch": 5.973607038123167, | |
| "grad_norm": 0.5007290522624128, | |
| "learning_rate": 1.7074066178702512e-05, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9772191122174263, | |
| "step": 1021 | |
| }, | |
| { | |
| "epoch": 5.979472140762463, | |
| "grad_norm": 1.0706418477291928, | |
| "learning_rate": 1.7041091536809506e-05, | |
| "loss": 0.0899, | |
| "mean_token_accuracy": 0.9734712392091751, | |
| "step": 1022 | |
| }, | |
| { | |
| "epoch": 5.9853372434017595, | |
| "grad_norm": 0.5132866545703823, | |
| "learning_rate": 1.7008134893751446e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9778529033064842, | |
| "step": 1023 | |
| }, | |
| { | |
| "epoch": 5.991202346041056, | |
| "grad_norm": 0.3934917191270626, | |
| "learning_rate": 1.697519636914765e-05, | |
| "loss": 0.0752, | |
| "mean_token_accuracy": 0.976367898285389, | |
| "step": 1024 | |
| }, | |
| { | |
| "epoch": 5.997067448680352, | |
| "grad_norm": 0.5893645990583302, | |
| "learning_rate": 1.6942276082551634e-05, | |
| "loss": 0.1045, | |
| "mean_token_accuracy": 0.968669667840004, | |
| "step": 1025 | |
| }, | |
| { | |
| "epoch": 6.0, | |
| "grad_norm": 1.3504099609374147, | |
| "learning_rate": 1.6909374153450762e-05, | |
| "loss": 0.098, | |
| "mean_token_accuracy": 0.9726896286010742, | |
| "step": 1026 | |
| }, | |
| { | |
| "epoch": 6.005865102639296, | |
| "grad_norm": 0.598228971963004, | |
| "learning_rate": 1.6876490701265736e-05, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.979034774005413, | |
| "step": 1027 | |
| }, | |
| { | |
| "epoch": 6.011730205278592, | |
| "grad_norm": 0.48280619668172964, | |
| "learning_rate": 1.684362584535022e-05, | |
| "loss": 0.0765, | |
| "mean_token_accuracy": 0.9782651886343956, | |
| "step": 1028 | |
| }, | |
| { | |
| "epoch": 6.0175953079178885, | |
| "grad_norm": 0.466644575352627, | |
| "learning_rate": 1.6810779704990358e-05, | |
| "loss": 0.0756, | |
| "mean_token_accuracy": 0.9769785478711128, | |
| "step": 1029 | |
| }, | |
| { | |
| "epoch": 6.023460410557185, | |
| "grad_norm": 0.47183688005778257, | |
| "learning_rate": 1.677795239940438e-05, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.9806296676397324, | |
| "step": 1030 | |
| }, | |
| { | |
| "epoch": 6.029325513196481, | |
| "grad_norm": 0.5718220069789671, | |
| "learning_rate": 1.674514404774214e-05, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9777848049998283, | |
| "step": 1031 | |
| }, | |
| { | |
| "epoch": 6.035190615835777, | |
| "grad_norm": 0.5179785510961876, | |
| "learning_rate": 1.671235476908471e-05, | |
| "loss": 0.0743, | |
| "mean_token_accuracy": 0.9780596271157265, | |
| "step": 1032 | |
| }, | |
| { | |
| "epoch": 6.041055718475073, | |
| "grad_norm": 0.6142614536588886, | |
| "learning_rate": 1.6679584682443924e-05, | |
| "loss": 0.0792, | |
| "mean_token_accuracy": 0.97633446007967, | |
| "step": 1033 | |
| }, | |
| { | |
| "epoch": 6.0469208211143695, | |
| "grad_norm": 0.41142915451620954, | |
| "learning_rate": 1.6646833906761965e-05, | |
| "loss": 0.0689, | |
| "mean_token_accuracy": 0.9772084280848503, | |
| "step": 1034 | |
| }, | |
| { | |
| "epoch": 6.052785923753666, | |
| "grad_norm": 0.4847833155252248, | |
| "learning_rate": 1.661410256091092e-05, | |
| "loss": 0.0732, | |
| "mean_token_accuracy": 0.9788917377591133, | |
| "step": 1035 | |
| }, | |
| { | |
| "epoch": 6.058651026392962, | |
| "grad_norm": 0.6440065513831064, | |
| "learning_rate": 1.658139076369236e-05, | |
| "loss": 0.0795, | |
| "mean_token_accuracy": 0.9781024977564812, | |
| "step": 1036 | |
| }, | |
| { | |
| "epoch": 6.064516129032258, | |
| "grad_norm": 0.4806652284113589, | |
| "learning_rate": 1.6548698633836893e-05, | |
| "loss": 0.0711, | |
| "mean_token_accuracy": 0.9742519408464432, | |
| "step": 1037 | |
| }, | |
| { | |
| "epoch": 6.070381231671554, | |
| "grad_norm": 0.49812439107885775, | |
| "learning_rate": 1.6516026290003746e-05, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.9813807904720306, | |
| "step": 1038 | |
| }, | |
| { | |
| "epoch": 6.07624633431085, | |
| "grad_norm": 0.4946026191358958, | |
| "learning_rate": 1.6483373850780328e-05, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9776654466986656, | |
| "step": 1039 | |
| }, | |
| { | |
| "epoch": 6.0821114369501466, | |
| "grad_norm": 0.38914184460055723, | |
| "learning_rate": 1.645074143468181e-05, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9806255549192429, | |
| "step": 1040 | |
| }, | |
| { | |
| "epoch": 6.087976539589443, | |
| "grad_norm": 0.6214256536736328, | |
| "learning_rate": 1.6418129160150692e-05, | |
| "loss": 0.078, | |
| "mean_token_accuracy": 0.9744777455925941, | |
| "step": 1041 | |
| }, | |
| { | |
| "epoch": 6.093841642228739, | |
| "grad_norm": 0.42727507511348983, | |
| "learning_rate": 1.6385537145556346e-05, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9816567301750183, | |
| "step": 1042 | |
| }, | |
| { | |
| "epoch": 6.099706744868035, | |
| "grad_norm": 0.5862832205156056, | |
| "learning_rate": 1.6352965509194634e-05, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9801078513264656, | |
| "step": 1043 | |
| }, | |
| { | |
| "epoch": 6.105571847507331, | |
| "grad_norm": 0.5540609521688267, | |
| "learning_rate": 1.6320414369287427e-05, | |
| "loss": 0.0694, | |
| "mean_token_accuracy": 0.9785455390810966, | |
| "step": 1044 | |
| }, | |
| { | |
| "epoch": 6.1114369501466275, | |
| "grad_norm": 0.5305278819763478, | |
| "learning_rate": 1.6287883843982223e-05, | |
| "loss": 0.0745, | |
| "mean_token_accuracy": 0.9784517213702202, | |
| "step": 1045 | |
| }, | |
| { | |
| "epoch": 6.117302052785924, | |
| "grad_norm": 0.5324605295420759, | |
| "learning_rate": 1.625537405135169e-05, | |
| "loss": 0.0883, | |
| "mean_token_accuracy": 0.9728142842650414, | |
| "step": 1046 | |
| }, | |
| { | |
| "epoch": 6.12316715542522, | |
| "grad_norm": 0.48156545050119387, | |
| "learning_rate": 1.622288510939325e-05, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.9753900468349457, | |
| "step": 1047 | |
| }, | |
| { | |
| "epoch": 6.129032258064516, | |
| "grad_norm": 0.5403475317147802, | |
| "learning_rate": 1.619041713602864e-05, | |
| "loss": 0.086, | |
| "mean_token_accuracy": 0.9756969884037971, | |
| "step": 1048 | |
| }, | |
| { | |
| "epoch": 6.134897360703812, | |
| "grad_norm": 0.6164994865776425, | |
| "learning_rate": 1.6157970249103484e-05, | |
| "loss": 0.0822, | |
| "mean_token_accuracy": 0.97352235019207, | |
| "step": 1049 | |
| }, | |
| { | |
| "epoch": 6.140762463343108, | |
| "grad_norm": 0.5479826751146541, | |
| "learning_rate": 1.612554456638688e-05, | |
| "loss": 0.0803, | |
| "mean_token_accuracy": 0.9752313643693924, | |
| "step": 1050 | |
| }, | |
| { | |
| "epoch": 6.146627565982405, | |
| "grad_norm": 0.8355035708152437, | |
| "learning_rate": 1.6093140205570962e-05, | |
| "loss": 0.1034, | |
| "mean_token_accuracy": 0.9721332043409348, | |
| "step": 1051 | |
| }, | |
| { | |
| "epoch": 6.152492668621701, | |
| "grad_norm": 0.5699260258595685, | |
| "learning_rate": 1.6060757284270474e-05, | |
| "loss": 0.088, | |
| "mean_token_accuracy": 0.9704829677939415, | |
| "step": 1052 | |
| }, | |
| { | |
| "epoch": 6.158357771260997, | |
| "grad_norm": 0.4016074967594273, | |
| "learning_rate": 1.6028395920022336e-05, | |
| "loss": 0.061, | |
| "mean_token_accuracy": 0.9768156260251999, | |
| "step": 1053 | |
| }, | |
| { | |
| "epoch": 6.164222873900293, | |
| "grad_norm": 0.6274851394247467, | |
| "learning_rate": 1.5996056230285237e-05, | |
| "loss": 0.0776, | |
| "mean_token_accuracy": 0.9752253219485283, | |
| "step": 1054 | |
| }, | |
| { | |
| "epoch": 6.170087976539589, | |
| "grad_norm": 0.41979905539903684, | |
| "learning_rate": 1.596373833243918e-05, | |
| "loss": 0.0758, | |
| "mean_token_accuracy": 0.9752494245767593, | |
| "step": 1055 | |
| }, | |
| { | |
| "epoch": 6.1759530791788855, | |
| "grad_norm": 0.5371249434282453, | |
| "learning_rate": 1.593144234378509e-05, | |
| "loss": 0.0814, | |
| "mean_token_accuracy": 0.9740007221698761, | |
| "step": 1056 | |
| }, | |
| { | |
| "epoch": 6.181818181818182, | |
| "grad_norm": 0.5958655099860268, | |
| "learning_rate": 1.5899168381544362e-05, | |
| "loss": 0.076, | |
| "mean_token_accuracy": 0.9766522198915482, | |
| "step": 1057 | |
| }, | |
| { | |
| "epoch": 6.187683284457478, | |
| "grad_norm": 0.5620879511590664, | |
| "learning_rate": 1.5866916562858444e-05, | |
| "loss": 0.0747, | |
| "mean_token_accuracy": 0.9761717393994331, | |
| "step": 1058 | |
| }, | |
| { | |
| "epoch": 6.193548387096774, | |
| "grad_norm": 0.41663290523753393, | |
| "learning_rate": 1.5834687004788406e-05, | |
| "loss": 0.0719, | |
| "mean_token_accuracy": 0.9762328043580055, | |
| "step": 1059 | |
| }, | |
| { | |
| "epoch": 6.19941348973607, | |
| "grad_norm": 0.45751013839163385, | |
| "learning_rate": 1.5802479824314537e-05, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9730842038989067, | |
| "step": 1060 | |
| }, | |
| { | |
| "epoch": 6.205278592375366, | |
| "grad_norm": 0.6061591524965043, | |
| "learning_rate": 1.5770295138335896e-05, | |
| "loss": 0.0675, | |
| "mean_token_accuracy": 0.9795668348670006, | |
| "step": 1061 | |
| }, | |
| { | |
| "epoch": 6.211143695014663, | |
| "grad_norm": 0.48653084806168795, | |
| "learning_rate": 1.573813306366988e-05, | |
| "loss": 0.0669, | |
| "mean_token_accuracy": 0.9805929064750671, | |
| "step": 1062 | |
| }, | |
| { | |
| "epoch": 6.217008797653959, | |
| "grad_norm": 0.7918965125848585, | |
| "learning_rate": 1.5705993717051838e-05, | |
| "loss": 0.0966, | |
| "mean_token_accuracy": 0.9726722538471222, | |
| "step": 1063 | |
| }, | |
| { | |
| "epoch": 6.222873900293255, | |
| "grad_norm": 0.4360806881193703, | |
| "learning_rate": 1.567387721513462e-05, | |
| "loss": 0.0733, | |
| "mean_token_accuracy": 0.9755243211984634, | |
| "step": 1064 | |
| }, | |
| { | |
| "epoch": 6.228739002932551, | |
| "grad_norm": 0.4507922929333053, | |
| "learning_rate": 1.5641783674488155e-05, | |
| "loss": 0.0788, | |
| "mean_token_accuracy": 0.9779406636953354, | |
| "step": 1065 | |
| }, | |
| { | |
| "epoch": 6.234604105571847, | |
| "grad_norm": 0.524440368997054, | |
| "learning_rate": 1.5609713211599035e-05, | |
| "loss": 0.0886, | |
| "mean_token_accuracy": 0.9735754728317261, | |
| "step": 1066 | |
| }, | |
| { | |
| "epoch": 6.2404692082111435, | |
| "grad_norm": 0.4933573360395662, | |
| "learning_rate": 1.557766594287009e-05, | |
| "loss": 0.0814, | |
| "mean_token_accuracy": 0.9745327085256577, | |
| "step": 1067 | |
| }, | |
| { | |
| "epoch": 6.24633431085044, | |
| "grad_norm": 0.6374490569807274, | |
| "learning_rate": 1.554564198461996e-05, | |
| "loss": 0.1006, | |
| "mean_token_accuracy": 0.9676776975393295, | |
| "step": 1068 | |
| }, | |
| { | |
| "epoch": 6.252199413489736, | |
| "grad_norm": 0.503016875609656, | |
| "learning_rate": 1.5513641453082672e-05, | |
| "loss": 0.0743, | |
| "mean_token_accuracy": 0.9755501598119736, | |
| "step": 1069 | |
| }, | |
| { | |
| "epoch": 6.258064516129032, | |
| "grad_norm": 0.6725586143220211, | |
| "learning_rate": 1.5481664464407246e-05, | |
| "loss": 0.0786, | |
| "mean_token_accuracy": 0.9803177490830421, | |
| "step": 1070 | |
| }, | |
| { | |
| "epoch": 6.263929618768328, | |
| "grad_norm": 0.3845258686542439, | |
| "learning_rate": 1.5449711134657224e-05, | |
| "loss": 0.0814, | |
| "mean_token_accuracy": 0.9737701192498207, | |
| "step": 1071 | |
| }, | |
| { | |
| "epoch": 6.269794721407624, | |
| "grad_norm": 0.5801290016339792, | |
| "learning_rate": 1.5417781579810296e-05, | |
| "loss": 0.0854, | |
| "mean_token_accuracy": 0.9750698879361153, | |
| "step": 1072 | |
| }, | |
| { | |
| "epoch": 6.275659824046921, | |
| "grad_norm": 0.46851452988304226, | |
| "learning_rate": 1.5385875915757846e-05, | |
| "loss": 0.0688, | |
| "mean_token_accuracy": 0.9781253635883331, | |
| "step": 1073 | |
| }, | |
| { | |
| "epoch": 6.281524926686217, | |
| "grad_norm": 0.6003054181463484, | |
| "learning_rate": 1.535399425830456e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9752677977085114, | |
| "step": 1074 | |
| }, | |
| { | |
| "epoch": 6.287390029325513, | |
| "grad_norm": 0.4922352812959743, | |
| "learning_rate": 1.5322136723167957e-05, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9730022326111794, | |
| "step": 1075 | |
| }, | |
| { | |
| "epoch": 6.293255131964809, | |
| "grad_norm": 0.5503056897332197, | |
| "learning_rate": 1.5290303425978036e-05, | |
| "loss": 0.0742, | |
| "mean_token_accuracy": 0.978991761803627, | |
| "step": 1076 | |
| }, | |
| { | |
| "epoch": 6.299120234604105, | |
| "grad_norm": 0.48722504997778493, | |
| "learning_rate": 1.525849448227681e-05, | |
| "loss": 0.0808, | |
| "mean_token_accuracy": 0.9766300916671753, | |
| "step": 1077 | |
| }, | |
| { | |
| "epoch": 6.3049853372434015, | |
| "grad_norm": 0.5168974806482571, | |
| "learning_rate": 1.5226710007517894e-05, | |
| "loss": 0.0912, | |
| "mean_token_accuracy": 0.9702020585536957, | |
| "step": 1078 | |
| }, | |
| { | |
| "epoch": 6.310850439882698, | |
| "grad_norm": 0.427948023862911, | |
| "learning_rate": 1.5194950117066097e-05, | |
| "loss": 0.0674, | |
| "mean_token_accuracy": 0.9776515811681747, | |
| "step": 1079 | |
| }, | |
| { | |
| "epoch": 6.316715542521994, | |
| "grad_norm": 0.5486267408576772, | |
| "learning_rate": 1.5163214926196995e-05, | |
| "loss": 0.0935, | |
| "mean_token_accuracy": 0.97169990837574, | |
| "step": 1080 | |
| }, | |
| { | |
| "epoch": 6.32258064516129, | |
| "grad_norm": 0.4875960684561732, | |
| "learning_rate": 1.5131504550096515e-05, | |
| "loss": 0.079, | |
| "mean_token_accuracy": 0.9735133945941925, | |
| "step": 1081 | |
| }, | |
| { | |
| "epoch": 6.328445747800586, | |
| "grad_norm": 0.6244028161363354, | |
| "learning_rate": 1.5099819103860504e-05, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9777925983071327, | |
| "step": 1082 | |
| }, | |
| { | |
| "epoch": 6.334310850439882, | |
| "grad_norm": 0.5481634686271034, | |
| "learning_rate": 1.5068158702494348e-05, | |
| "loss": 0.067, | |
| "mean_token_accuracy": 0.9792685136198997, | |
| "step": 1083 | |
| }, | |
| { | |
| "epoch": 6.340175953079179, | |
| "grad_norm": 0.37298288716380845, | |
| "learning_rate": 1.5036523460912511e-05, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9814745262265205, | |
| "step": 1084 | |
| }, | |
| { | |
| "epoch": 6.346041055718475, | |
| "grad_norm": 0.36936094294826227, | |
| "learning_rate": 1.5004913493938147e-05, | |
| "loss": 0.07, | |
| "mean_token_accuracy": 0.9765683263540268, | |
| "step": 1085 | |
| }, | |
| { | |
| "epoch": 6.351906158357771, | |
| "grad_norm": 0.7399682764563025, | |
| "learning_rate": 1.4973328916302667e-05, | |
| "loss": 0.0947, | |
| "mean_token_accuracy": 0.9703186228871346, | |
| "step": 1086 | |
| }, | |
| { | |
| "epoch": 6.357771260997067, | |
| "grad_norm": 0.5453093945017612, | |
| "learning_rate": 1.4941769842645335e-05, | |
| "loss": 0.0831, | |
| "mean_token_accuracy": 0.9727587401866913, | |
| "step": 1087 | |
| }, | |
| { | |
| "epoch": 6.363636363636363, | |
| "grad_norm": 0.7601852801488714, | |
| "learning_rate": 1.4910236387512837e-05, | |
| "loss": 0.0773, | |
| "mean_token_accuracy": 0.975163146853447, | |
| "step": 1088 | |
| }, | |
| { | |
| "epoch": 6.3695014662756595, | |
| "grad_norm": 0.8980917715317281, | |
| "learning_rate": 1.487872866535888e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.9759137481451035, | |
| "step": 1089 | |
| }, | |
| { | |
| "epoch": 6.375366568914956, | |
| "grad_norm": 0.5099633970627233, | |
| "learning_rate": 1.4847246790543773e-05, | |
| "loss": 0.075, | |
| "mean_token_accuracy": 0.9745916873216629, | |
| "step": 1090 | |
| }, | |
| { | |
| "epoch": 6.381231671554252, | |
| "grad_norm": 0.6323169570307837, | |
| "learning_rate": 1.4815790877334007e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9740422070026398, | |
| "step": 1091 | |
| }, | |
| { | |
| "epoch": 6.387096774193548, | |
| "grad_norm": 0.6403999041432936, | |
| "learning_rate": 1.4784361039901844e-05, | |
| "loss": 0.0865, | |
| "mean_token_accuracy": 0.9765808507800102, | |
| "step": 1092 | |
| }, | |
| { | |
| "epoch": 6.392961876832844, | |
| "grad_norm": 0.7462715070778873, | |
| "learning_rate": 1.47529573923249e-05, | |
| "loss": 0.0688, | |
| "mean_token_accuracy": 0.978026993572712, | |
| "step": 1093 | |
| }, | |
| { | |
| "epoch": 6.39882697947214, | |
| "grad_norm": 0.5694658566557079, | |
| "learning_rate": 1.472158004858573e-05, | |
| "loss": 0.075, | |
| "mean_token_accuracy": 0.9749011695384979, | |
| "step": 1094 | |
| }, | |
| { | |
| "epoch": 6.404692082111437, | |
| "grad_norm": 0.5450236136544335, | |
| "learning_rate": 1.4690229122571419e-05, | |
| "loss": 0.0929, | |
| "mean_token_accuracy": 0.9710717871785164, | |
| "step": 1095 | |
| }, | |
| { | |
| "epoch": 6.410557184750733, | |
| "grad_norm": 0.41354829760386663, | |
| "learning_rate": 1.4658904728073169e-05, | |
| "loss": 0.0644, | |
| "mean_token_accuracy": 0.9808862060308456, | |
| "step": 1096 | |
| }, | |
| { | |
| "epoch": 6.416422287390029, | |
| "grad_norm": 0.5695348455497171, | |
| "learning_rate": 1.4627606978785878e-05, | |
| "loss": 0.0787, | |
| "mean_token_accuracy": 0.9755074679851532, | |
| "step": 1097 | |
| }, | |
| { | |
| "epoch": 6.422287390029325, | |
| "grad_norm": 0.6053385591283429, | |
| "learning_rate": 1.4596335988307736e-05, | |
| "loss": 0.0891, | |
| "mean_token_accuracy": 0.9750615283846855, | |
| "step": 1098 | |
| }, | |
| { | |
| "epoch": 6.428152492668621, | |
| "grad_norm": 0.43427458206289293, | |
| "learning_rate": 1.4565091870139814e-05, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.978523463010788, | |
| "step": 1099 | |
| }, | |
| { | |
| "epoch": 6.4340175953079175, | |
| "grad_norm": 0.7745882954070427, | |
| "learning_rate": 1.4533874737685638e-05, | |
| "loss": 0.1, | |
| "mean_token_accuracy": 0.9720165580511093, | |
| "step": 1100 | |
| }, | |
| { | |
| "epoch": 6.439882697947214, | |
| "grad_norm": 0.5568880003543828, | |
| "learning_rate": 1.450268470425079e-05, | |
| "loss": 0.0727, | |
| "mean_token_accuracy": 0.9786621853709221, | |
| "step": 1101 | |
| }, | |
| { | |
| "epoch": 6.44574780058651, | |
| "grad_norm": 0.6102606978496696, | |
| "learning_rate": 1.4471521883042492e-05, | |
| "loss": 0.0797, | |
| "mean_token_accuracy": 0.9762641340494156, | |
| "step": 1102 | |
| }, | |
| { | |
| "epoch": 6.451612903225806, | |
| "grad_norm": 0.5940527203426741, | |
| "learning_rate": 1.4440386387169207e-05, | |
| "loss": 0.0812, | |
| "mean_token_accuracy": 0.9767268747091293, | |
| "step": 1103 | |
| }, | |
| { | |
| "epoch": 6.457478005865102, | |
| "grad_norm": 0.45629024567204896, | |
| "learning_rate": 1.4409278329640218e-05, | |
| "loss": 0.0815, | |
| "mean_token_accuracy": 0.9736464098095894, | |
| "step": 1104 | |
| }, | |
| { | |
| "epoch": 6.463343108504398, | |
| "grad_norm": 0.5152284877743204, | |
| "learning_rate": 1.4378197823365186e-05, | |
| "loss": 0.0793, | |
| "mean_token_accuracy": 0.9755809679627419, | |
| "step": 1105 | |
| }, | |
| { | |
| "epoch": 6.469208211143695, | |
| "grad_norm": 0.5481643128231853, | |
| "learning_rate": 1.4347144981153807e-05, | |
| "loss": 0.0957, | |
| "mean_token_accuracy": 0.9682923331856728, | |
| "step": 1106 | |
| }, | |
| { | |
| "epoch": 6.475073313782991, | |
| "grad_norm": 0.38056702827336675, | |
| "learning_rate": 1.4316119915715363e-05, | |
| "loss": 0.0638, | |
| "mean_token_accuracy": 0.9792584180831909, | |
| "step": 1107 | |
| }, | |
| { | |
| "epoch": 6.480938416422287, | |
| "grad_norm": 0.6374492323620196, | |
| "learning_rate": 1.42851227396583e-05, | |
| "loss": 0.0848, | |
| "mean_token_accuracy": 0.9738872051239014, | |
| "step": 1108 | |
| }, | |
| { | |
| "epoch": 6.486803519061583, | |
| "grad_norm": 0.4774557252930596, | |
| "learning_rate": 1.4254153565489861e-05, | |
| "loss": 0.0847, | |
| "mean_token_accuracy": 0.9743882343173027, | |
| "step": 1109 | |
| }, | |
| { | |
| "epoch": 6.492668621700879, | |
| "grad_norm": 0.5011837903979812, | |
| "learning_rate": 1.4223212505615634e-05, | |
| "loss": 0.0729, | |
| "mean_token_accuracy": 0.9778528362512589, | |
| "step": 1110 | |
| }, | |
| { | |
| "epoch": 6.4985337243401755, | |
| "grad_norm": 0.33804780404464596, | |
| "learning_rate": 1.4192299672339167e-05, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9776539281010628, | |
| "step": 1111 | |
| }, | |
| { | |
| "epoch": 6.504398826979472, | |
| "grad_norm": 0.5528074933384356, | |
| "learning_rate": 1.4161415177861568e-05, | |
| "loss": 0.0812, | |
| "mean_token_accuracy": 0.9744323939085007, | |
| "step": 1112 | |
| }, | |
| { | |
| "epoch": 6.510263929618768, | |
| "grad_norm": 0.457506401160417, | |
| "learning_rate": 1.4130559134281074e-05, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.9784527495503426, | |
| "step": 1113 | |
| }, | |
| { | |
| "epoch": 6.516129032258064, | |
| "grad_norm": 0.4355792645298778, | |
| "learning_rate": 1.4099731653592668e-05, | |
| "loss": 0.0714, | |
| "mean_token_accuracy": 0.9768256545066833, | |
| "step": 1114 | |
| }, | |
| { | |
| "epoch": 6.52199413489736, | |
| "grad_norm": 0.5847017963309225, | |
| "learning_rate": 1.406893284768764e-05, | |
| "loss": 0.0957, | |
| "mean_token_accuracy": 0.9728905037045479, | |
| "step": 1115 | |
| }, | |
| { | |
| "epoch": 6.527859237536656, | |
| "grad_norm": 0.4781949377668264, | |
| "learning_rate": 1.4038162828353223e-05, | |
| "loss": 0.0836, | |
| "mean_token_accuracy": 0.9729764312505722, | |
| "step": 1116 | |
| }, | |
| { | |
| "epoch": 6.533724340175953, | |
| "grad_norm": 0.39863238666699835, | |
| "learning_rate": 1.4007421707272167e-05, | |
| "loss": 0.0791, | |
| "mean_token_accuracy": 0.9747898653149605, | |
| "step": 1117 | |
| }, | |
| { | |
| "epoch": 6.539589442815249, | |
| "grad_norm": 0.42134641962040914, | |
| "learning_rate": 1.3976709596022313e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.9754165560007095, | |
| "step": 1118 | |
| }, | |
| { | |
| "epoch": 6.545454545454545, | |
| "grad_norm": 0.55326137317192, | |
| "learning_rate": 1.3946026606076232e-05, | |
| "loss": 0.0801, | |
| "mean_token_accuracy": 0.9780076220631599, | |
| "step": 1119 | |
| }, | |
| { | |
| "epoch": 6.551319648093841, | |
| "grad_norm": 0.5124262743538484, | |
| "learning_rate": 1.3915372848800784e-05, | |
| "loss": 0.0813, | |
| "mean_token_accuracy": 0.9773962423205376, | |
| "step": 1120 | |
| }, | |
| { | |
| "epoch": 6.557184750733137, | |
| "grad_norm": 0.3791238325929412, | |
| "learning_rate": 1.388474843545672e-05, | |
| "loss": 0.0672, | |
| "mean_token_accuracy": 0.9768983200192451, | |
| "step": 1121 | |
| }, | |
| { | |
| "epoch": 6.563049853372434, | |
| "grad_norm": 0.5267515124456588, | |
| "learning_rate": 1.3854153477198305e-05, | |
| "loss": 0.0975, | |
| "mean_token_accuracy": 0.96617691218853, | |
| "step": 1122 | |
| }, | |
| { | |
| "epoch": 6.568914956011731, | |
| "grad_norm": 0.43681836428663207, | |
| "learning_rate": 1.3823588085072865e-05, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.9760182648897171, | |
| "step": 1123 | |
| }, | |
| { | |
| "epoch": 6.574780058651027, | |
| "grad_norm": 0.49103511141121264, | |
| "learning_rate": 1.3793052370020441e-05, | |
| "loss": 0.0855, | |
| "mean_token_accuracy": 0.9750201031565666, | |
| "step": 1124 | |
| }, | |
| { | |
| "epoch": 6.580645161290323, | |
| "grad_norm": 0.5269957140084237, | |
| "learning_rate": 1.3762546442873343e-05, | |
| "loss": 0.0795, | |
| "mean_token_accuracy": 0.9769143760204315, | |
| "step": 1125 | |
| }, | |
| { | |
| "epoch": 6.586510263929619, | |
| "grad_norm": 0.5181206552351602, | |
| "learning_rate": 1.3732070414355766e-05, | |
| "loss": 0.0787, | |
| "mean_token_accuracy": 0.977984108030796, | |
| "step": 1126 | |
| }, | |
| { | |
| "epoch": 6.592375366568915, | |
| "grad_norm": 0.5708151076864234, | |
| "learning_rate": 1.370162439508339e-05, | |
| "loss": 0.0693, | |
| "mean_token_accuracy": 0.9770340025424957, | |
| "step": 1127 | |
| }, | |
| { | |
| "epoch": 6.5982404692082115, | |
| "grad_norm": 0.5470399050655503, | |
| "learning_rate": 1.367120849556296e-05, | |
| "loss": 0.0777, | |
| "mean_token_accuracy": 0.9765413254499435, | |
| "step": 1128 | |
| }, | |
| { | |
| "epoch": 6.604105571847508, | |
| "grad_norm": 0.5386549797651184, | |
| "learning_rate": 1.3640822826191907e-05, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.981001041829586, | |
| "step": 1129 | |
| }, | |
| { | |
| "epoch": 6.609970674486804, | |
| "grad_norm": 0.5851693910491862, | |
| "learning_rate": 1.361046749725794e-05, | |
| "loss": 0.0828, | |
| "mean_token_accuracy": 0.9728478416800499, | |
| "step": 1130 | |
| }, | |
| { | |
| "epoch": 6.6158357771261, | |
| "grad_norm": 0.43900153373855055, | |
| "learning_rate": 1.3580142618938647e-05, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9815235733985901, | |
| "step": 1131 | |
| }, | |
| { | |
| "epoch": 6.621700879765396, | |
| "grad_norm": 0.49986443142734005, | |
| "learning_rate": 1.354984830130109e-05, | |
| "loss": 0.079, | |
| "mean_token_accuracy": 0.972543366253376, | |
| "step": 1132 | |
| }, | |
| { | |
| "epoch": 6.627565982404692, | |
| "grad_norm": 0.34788685161884764, | |
| "learning_rate": 1.3519584654301401e-05, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9763465449213982, | |
| "step": 1133 | |
| }, | |
| { | |
| "epoch": 6.633431085043989, | |
| "grad_norm": 0.4055813660781002, | |
| "learning_rate": 1.3489351787784398e-05, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9773242846131325, | |
| "step": 1134 | |
| }, | |
| { | |
| "epoch": 6.639296187683285, | |
| "grad_norm": 0.5243402543409744, | |
| "learning_rate": 1.3459149811483178e-05, | |
| "loss": 0.0802, | |
| "mean_token_accuracy": 0.9739445820450783, | |
| "step": 1135 | |
| }, | |
| { | |
| "epoch": 6.645161290322581, | |
| "grad_norm": 0.5784909436096058, | |
| "learning_rate": 1.342897883501872e-05, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9781769141554832, | |
| "step": 1136 | |
| }, | |
| { | |
| "epoch": 6.651026392961877, | |
| "grad_norm": 0.5642611585182853, | |
| "learning_rate": 1.3398838967899477e-05, | |
| "loss": 0.0757, | |
| "mean_token_accuracy": 0.9764586612582207, | |
| "step": 1137 | |
| }, | |
| { | |
| "epoch": 6.656891495601173, | |
| "grad_norm": 0.5776079482395293, | |
| "learning_rate": 1.3368730319520992e-05, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9767368286848068, | |
| "step": 1138 | |
| }, | |
| { | |
| "epoch": 6.6627565982404695, | |
| "grad_norm": 0.5020885081440374, | |
| "learning_rate": 1.3338652999165511e-05, | |
| "loss": 0.0828, | |
| "mean_token_accuracy": 0.9737912267446518, | |
| "step": 1139 | |
| }, | |
| { | |
| "epoch": 6.668621700879766, | |
| "grad_norm": 0.4204132575953816, | |
| "learning_rate": 1.3308607116001549e-05, | |
| "loss": 0.0673, | |
| "mean_token_accuracy": 0.9794393181800842, | |
| "step": 1140 | |
| }, | |
| { | |
| "epoch": 6.674486803519062, | |
| "grad_norm": 0.33560567758456566, | |
| "learning_rate": 1.3278592779083534e-05, | |
| "loss": 0.0622, | |
| "mean_token_accuracy": 0.982670783996582, | |
| "step": 1141 | |
| }, | |
| { | |
| "epoch": 6.680351906158358, | |
| "grad_norm": 0.44043001628693984, | |
| "learning_rate": 1.324861009735138e-05, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.976046696305275, | |
| "step": 1142 | |
| }, | |
| { | |
| "epoch": 6.686217008797654, | |
| "grad_norm": 0.4471342618318792, | |
| "learning_rate": 1.3218659179630112e-05, | |
| "loss": 0.0801, | |
| "mean_token_accuracy": 0.9754833951592445, | |
| "step": 1143 | |
| }, | |
| { | |
| "epoch": 6.69208211143695, | |
| "grad_norm": 0.4923308485055896, | |
| "learning_rate": 1.3188740134629469e-05, | |
| "loss": 0.0773, | |
| "mean_token_accuracy": 0.9752767756581306, | |
| "step": 1144 | |
| }, | |
| { | |
| "epoch": 6.697947214076247, | |
| "grad_norm": 0.38739961314807136, | |
| "learning_rate": 1.3158853070943499e-05, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9790047481656075, | |
| "step": 1145 | |
| }, | |
| { | |
| "epoch": 6.703812316715543, | |
| "grad_norm": 0.5382902596883963, | |
| "learning_rate": 1.3128998097050174e-05, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9773894026875496, | |
| "step": 1146 | |
| }, | |
| { | |
| "epoch": 6.709677419354839, | |
| "grad_norm": 0.33644056058754507, | |
| "learning_rate": 1.3099175321310993e-05, | |
| "loss": 0.0731, | |
| "mean_token_accuracy": 0.9767147675156593, | |
| "step": 1147 | |
| }, | |
| { | |
| "epoch": 6.715542521994135, | |
| "grad_norm": 0.4103102513526391, | |
| "learning_rate": 1.3069384851970584e-05, | |
| "loss": 0.072, | |
| "mean_token_accuracy": 0.9766853898763657, | |
| "step": 1148 | |
| }, | |
| { | |
| "epoch": 6.721407624633431, | |
| "grad_norm": 0.4939880953827209, | |
| "learning_rate": 1.3039626797156321e-05, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.9746366888284683, | |
| "step": 1149 | |
| }, | |
| { | |
| "epoch": 6.7272727272727275, | |
| "grad_norm": 0.5170797931844157, | |
| "learning_rate": 1.3009901264877924e-05, | |
| "loss": 0.0742, | |
| "mean_token_accuracy": 0.9808530285954475, | |
| "step": 1150 | |
| }, | |
| { | |
| "epoch": 6.733137829912024, | |
| "grad_norm": 0.43733602892668694, | |
| "learning_rate": 1.298020836302707e-05, | |
| "loss": 0.0737, | |
| "mean_token_accuracy": 0.9757565036416054, | |
| "step": 1151 | |
| }, | |
| { | |
| "epoch": 6.73900293255132, | |
| "grad_norm": 0.4364034894757785, | |
| "learning_rate": 1.2950548199376999e-05, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9793206825852394, | |
| "step": 1152 | |
| }, | |
| { | |
| "epoch": 6.744868035190616, | |
| "grad_norm": 0.46877766717768593, | |
| "learning_rate": 1.292092088158213e-05, | |
| "loss": 0.0802, | |
| "mean_token_accuracy": 0.9757286831736565, | |
| "step": 1153 | |
| }, | |
| { | |
| "epoch": 6.750733137829912, | |
| "grad_norm": 0.5253930018569454, | |
| "learning_rate": 1.2891326517177663e-05, | |
| "loss": 0.0639, | |
| "mean_token_accuracy": 0.9813364297151566, | |
| "step": 1154 | |
| }, | |
| { | |
| "epoch": 6.756598240469208, | |
| "grad_norm": 0.7472840164089494, | |
| "learning_rate": 1.2861765213579177e-05, | |
| "loss": 0.0785, | |
| "mean_token_accuracy": 0.9729948118329048, | |
| "step": 1155 | |
| }, | |
| { | |
| "epoch": 6.762463343108505, | |
| "grad_norm": 0.4671253491191335, | |
| "learning_rate": 1.2832237078082272e-05, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.9766018316149712, | |
| "step": 1156 | |
| }, | |
| { | |
| "epoch": 6.768328445747801, | |
| "grad_norm": 0.48121955835735647, | |
| "learning_rate": 1.2802742217862156e-05, | |
| "loss": 0.0789, | |
| "mean_token_accuracy": 0.9764746427536011, | |
| "step": 1157 | |
| }, | |
| { | |
| "epoch": 6.774193548387097, | |
| "grad_norm": 0.49066051899276214, | |
| "learning_rate": 1.2773280739973255e-05, | |
| "loss": 0.0763, | |
| "mean_token_accuracy": 0.9759164974093437, | |
| "step": 1158 | |
| }, | |
| { | |
| "epoch": 6.780058651026393, | |
| "grad_norm": 0.6208662162621278, | |
| "learning_rate": 1.2743852751348833e-05, | |
| "loss": 0.076, | |
| "mean_token_accuracy": 0.9792628586292267, | |
| "step": 1159 | |
| }, | |
| { | |
| "epoch": 6.785923753665689, | |
| "grad_norm": 0.353932410029375, | |
| "learning_rate": 1.2714458358800612e-05, | |
| "loss": 0.0532, | |
| "mean_token_accuracy": 0.9837256073951721, | |
| "step": 1160 | |
| }, | |
| { | |
| "epoch": 6.7917888563049855, | |
| "grad_norm": 0.49853873282358013, | |
| "learning_rate": 1.2685097669018362e-05, | |
| "loss": 0.0852, | |
| "mean_token_accuracy": 0.9731195271015167, | |
| "step": 1161 | |
| }, | |
| { | |
| "epoch": 6.797653958944282, | |
| "grad_norm": 0.44880495835263245, | |
| "learning_rate": 1.265577078856953e-05, | |
| "loss": 0.0828, | |
| "mean_token_accuracy": 0.9722541272640228, | |
| "step": 1162 | |
| }, | |
| { | |
| "epoch": 6.803519061583578, | |
| "grad_norm": 0.5569097040356273, | |
| "learning_rate": 1.2626477823898843e-05, | |
| "loss": 0.0859, | |
| "mean_token_accuracy": 0.9755456000566483, | |
| "step": 1163 | |
| }, | |
| { | |
| "epoch": 6.809384164222874, | |
| "grad_norm": 0.3498951131477967, | |
| "learning_rate": 1.2597218881327944e-05, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9757883995771408, | |
| "step": 1164 | |
| }, | |
| { | |
| "epoch": 6.81524926686217, | |
| "grad_norm": 0.5849400834625669, | |
| "learning_rate": 1.2567994067054961e-05, | |
| "loss": 0.0765, | |
| "mean_token_accuracy": 0.9759183302521706, | |
| "step": 1165 | |
| }, | |
| { | |
| "epoch": 6.821114369501466, | |
| "grad_norm": 0.381799734503235, | |
| "learning_rate": 1.2538803487154177e-05, | |
| "loss": 0.0711, | |
| "mean_token_accuracy": 0.9773522317409515, | |
| "step": 1166 | |
| }, | |
| { | |
| "epoch": 6.826979472140763, | |
| "grad_norm": 0.7880188378909819, | |
| "learning_rate": 1.25096472475756e-05, | |
| "loss": 0.0826, | |
| "mean_token_accuracy": 0.972911424934864, | |
| "step": 1167 | |
| }, | |
| { | |
| "epoch": 6.832844574780059, | |
| "grad_norm": 0.3352929350228359, | |
| "learning_rate": 1.248052545414461e-05, | |
| "loss": 0.0722, | |
| "mean_token_accuracy": 0.9781019762158394, | |
| "step": 1168 | |
| }, | |
| { | |
| "epoch": 6.838709677419355, | |
| "grad_norm": 0.4486648642428875, | |
| "learning_rate": 1.2451438212561556e-05, | |
| "loss": 0.086, | |
| "mean_token_accuracy": 0.9684128165245056, | |
| "step": 1169 | |
| }, | |
| { | |
| "epoch": 6.844574780058651, | |
| "grad_norm": 0.5337445397907, | |
| "learning_rate": 1.2422385628401377e-05, | |
| "loss": 0.0784, | |
| "mean_token_accuracy": 0.975617341697216, | |
| "step": 1170 | |
| }, | |
| { | |
| "epoch": 6.850439882697947, | |
| "grad_norm": 0.5023072492519426, | |
| "learning_rate": 1.2393367807113217e-05, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9774403125047684, | |
| "step": 1171 | |
| }, | |
| { | |
| "epoch": 6.8563049853372435, | |
| "grad_norm": 0.5478108519507331, | |
| "learning_rate": 1.236438485402005e-05, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.9768877327442169, | |
| "step": 1172 | |
| }, | |
| { | |
| "epoch": 6.86217008797654, | |
| "grad_norm": 0.34421809417617816, | |
| "learning_rate": 1.2335436874318293e-05, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9794752076268196, | |
| "step": 1173 | |
| }, | |
| { | |
| "epoch": 6.868035190615836, | |
| "grad_norm": 0.4818321119164012, | |
| "learning_rate": 1.2306523973077416e-05, | |
| "loss": 0.0854, | |
| "mean_token_accuracy": 0.9749922305345535, | |
| "step": 1174 | |
| }, | |
| { | |
| "epoch": 6.873900293255132, | |
| "grad_norm": 0.44140692461936487, | |
| "learning_rate": 1.2277646255239572e-05, | |
| "loss": 0.0822, | |
| "mean_token_accuracy": 0.9771685898303986, | |
| "step": 1175 | |
| }, | |
| { | |
| "epoch": 6.879765395894428, | |
| "grad_norm": 0.4122931627899861, | |
| "learning_rate": 1.2248803825619224e-05, | |
| "loss": 0.0777, | |
| "mean_token_accuracy": 0.9765308573842049, | |
| "step": 1176 | |
| }, | |
| { | |
| "epoch": 6.885630498533724, | |
| "grad_norm": 0.5185863368160872, | |
| "learning_rate": 1.2219996788902734e-05, | |
| "loss": 0.0751, | |
| "mean_token_accuracy": 0.9782401323318481, | |
| "step": 1177 | |
| }, | |
| { | |
| "epoch": 6.891495601173021, | |
| "grad_norm": 0.4704417201653903, | |
| "learning_rate": 1.2191225249648016e-05, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9757214114069939, | |
| "step": 1178 | |
| }, | |
| { | |
| "epoch": 6.897360703812317, | |
| "grad_norm": 0.4464835699482663, | |
| "learning_rate": 1.216248931228413e-05, | |
| "loss": 0.0816, | |
| "mean_token_accuracy": 0.9743654951453209, | |
| "step": 1179 | |
| }, | |
| { | |
| "epoch": 6.903225806451613, | |
| "grad_norm": 0.5291322182737432, | |
| "learning_rate": 1.2133789081110927e-05, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9801248833537102, | |
| "step": 1180 | |
| }, | |
| { | |
| "epoch": 6.909090909090909, | |
| "grad_norm": 0.6972005324832886, | |
| "learning_rate": 1.2105124660298655e-05, | |
| "loss": 0.0744, | |
| "mean_token_accuracy": 0.9761540293693542, | |
| "step": 1181 | |
| }, | |
| { | |
| "epoch": 6.914956011730205, | |
| "grad_norm": 0.48751263996047123, | |
| "learning_rate": 1.2076496153887587e-05, | |
| "loss": 0.0707, | |
| "mean_token_accuracy": 0.979655809700489, | |
| "step": 1182 | |
| }, | |
| { | |
| "epoch": 6.9208211143695015, | |
| "grad_norm": 0.35181511064354204, | |
| "learning_rate": 1.2047903665787633e-05, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.9796392843127251, | |
| "step": 1183 | |
| }, | |
| { | |
| "epoch": 6.926686217008798, | |
| "grad_norm": 0.4519666851787502, | |
| "learning_rate": 1.2019347299777981e-05, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.980430044233799, | |
| "step": 1184 | |
| }, | |
| { | |
| "epoch": 6.932551319648094, | |
| "grad_norm": 0.6175167131951601, | |
| "learning_rate": 1.199082715950671e-05, | |
| "loss": 0.0867, | |
| "mean_token_accuracy": 0.9735838696360588, | |
| "step": 1185 | |
| }, | |
| { | |
| "epoch": 6.93841642228739, | |
| "grad_norm": 0.5244767936392362, | |
| "learning_rate": 1.1962343348490407e-05, | |
| "loss": 0.0772, | |
| "mean_token_accuracy": 0.9765205755829811, | |
| "step": 1186 | |
| }, | |
| { | |
| "epoch": 6.944281524926686, | |
| "grad_norm": 0.5219220102479316, | |
| "learning_rate": 1.1933895970113798e-05, | |
| "loss": 0.0794, | |
| "mean_token_accuracy": 0.9774181470274925, | |
| "step": 1187 | |
| }, | |
| { | |
| "epoch": 6.9501466275659824, | |
| "grad_norm": 0.4556906562978321, | |
| "learning_rate": 1.1905485127629387e-05, | |
| "loss": 0.0818, | |
| "mean_token_accuracy": 0.9761421829462051, | |
| "step": 1188 | |
| }, | |
| { | |
| "epoch": 6.956011730205279, | |
| "grad_norm": 0.45508332549291347, | |
| "learning_rate": 1.1877110924157046e-05, | |
| "loss": 0.0718, | |
| "mean_token_accuracy": 0.977460503578186, | |
| "step": 1189 | |
| }, | |
| { | |
| "epoch": 6.961876832844575, | |
| "grad_norm": 0.4432193649311906, | |
| "learning_rate": 1.1848773462683684e-05, | |
| "loss": 0.0797, | |
| "mean_token_accuracy": 0.9757311940193176, | |
| "step": 1190 | |
| }, | |
| { | |
| "epoch": 6.967741935483871, | |
| "grad_norm": 0.48985886044639837, | |
| "learning_rate": 1.1820472846062842e-05, | |
| "loss": 0.0733, | |
| "mean_token_accuracy": 0.976937510073185, | |
| "step": 1191 | |
| }, | |
| { | |
| "epoch": 6.973607038123167, | |
| "grad_norm": 0.40019240637175396, | |
| "learning_rate": 1.1792209177014317e-05, | |
| "loss": 0.0756, | |
| "mean_token_accuracy": 0.9791741147637367, | |
| "step": 1192 | |
| }, | |
| { | |
| "epoch": 6.979472140762463, | |
| "grad_norm": 0.45859725535588075, | |
| "learning_rate": 1.1763982558123823e-05, | |
| "loss": 0.0786, | |
| "mean_token_accuracy": 0.9758159667253494, | |
| "step": 1193 | |
| }, | |
| { | |
| "epoch": 6.9853372434017595, | |
| "grad_norm": 0.6776544697191254, | |
| "learning_rate": 1.1735793091842583e-05, | |
| "loss": 0.0821, | |
| "mean_token_accuracy": 0.974973164498806, | |
| "step": 1194 | |
| }, | |
| { | |
| "epoch": 6.991202346041056, | |
| "grad_norm": 0.43821555095436865, | |
| "learning_rate": 1.1707640880486975e-05, | |
| "loss": 0.0894, | |
| "mean_token_accuracy": 0.9700244292616844, | |
| "step": 1195 | |
| }, | |
| { | |
| "epoch": 6.997067448680352, | |
| "grad_norm": 0.3282294489959217, | |
| "learning_rate": 1.1679526026238155e-05, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.9817990660667419, | |
| "step": 1196 | |
| }, | |
| { | |
| "epoch": 7.0, | |
| "grad_norm": 0.3282294489959217, | |
| "learning_rate": 1.165144863114169e-05, | |
| "loss": 0.0702, | |
| "mean_token_accuracy": 0.9798854738473892, | |
| "step": 1197 | |
| }, | |
| { | |
| "epoch": 7.005865102639296, | |
| "grad_norm": 0.6912206326178958, | |
| "learning_rate": 1.1623408797107185e-05, | |
| "loss": 0.0767, | |
| "mean_token_accuracy": 0.9741964638233185, | |
| "step": 1198 | |
| }, | |
| { | |
| "epoch": 7.011730205278592, | |
| "grad_norm": 0.38262777975799617, | |
| "learning_rate": 1.1595406625907914e-05, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9813871160149574, | |
| "step": 1199 | |
| }, | |
| { | |
| "epoch": 7.0175953079178885, | |
| "grad_norm": 0.3500785949484349, | |
| "learning_rate": 1.1567442219180446e-05, | |
| "loss": 0.0615, | |
| "mean_token_accuracy": 0.9810681790113449, | |
| "step": 1200 | |
| }, | |
| { | |
| "epoch": 7.023460410557185, | |
| "grad_norm": 0.23047356622297646, | |
| "learning_rate": 1.153951567842429e-05, | |
| "loss": 0.0545, | |
| "mean_token_accuracy": 0.9834885075688362, | |
| "step": 1201 | |
| }, | |
| { | |
| "epoch": 7.029325513196481, | |
| "grad_norm": 0.3641697975570479, | |
| "learning_rate": 1.1511627105001501e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.9782865270972252, | |
| "step": 1202 | |
| }, | |
| { | |
| "epoch": 7.035190615835777, | |
| "grad_norm": 0.4077718026992906, | |
| "learning_rate": 1.1483776600136344e-05, | |
| "loss": 0.0754, | |
| "mean_token_accuracy": 0.975783459842205, | |
| "step": 1203 | |
| }, | |
| { | |
| "epoch": 7.041055718475073, | |
| "grad_norm": 0.3328632538786026, | |
| "learning_rate": 1.1455964264914906e-05, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9806740134954453, | |
| "step": 1204 | |
| }, | |
| { | |
| "epoch": 7.0469208211143695, | |
| "grad_norm": 0.3565482658377984, | |
| "learning_rate": 1.142819020028472e-05, | |
| "loss": 0.0765, | |
| "mean_token_accuracy": 0.9753721952438354, | |
| "step": 1205 | |
| }, | |
| { | |
| "epoch": 7.052785923753666, | |
| "grad_norm": 0.513820218763689, | |
| "learning_rate": 1.140045450705443e-05, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9776762798428535, | |
| "step": 1206 | |
| }, | |
| { | |
| "epoch": 7.058651026392962, | |
| "grad_norm": 0.33746029088525126, | |
| "learning_rate": 1.13727572858934e-05, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.9811746403574944, | |
| "step": 1207 | |
| }, | |
| { | |
| "epoch": 7.064516129032258, | |
| "grad_norm": 0.839529082956056, | |
| "learning_rate": 1.1345098637331356e-05, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9804647564888, | |
| "step": 1208 | |
| }, | |
| { | |
| "epoch": 7.070381231671554, | |
| "grad_norm": 0.4512949406596747, | |
| "learning_rate": 1.1317478661758022e-05, | |
| "loss": 0.0808, | |
| "mean_token_accuracy": 0.9737851694226265, | |
| "step": 1209 | |
| }, | |
| { | |
| "epoch": 7.07624633431085, | |
| "grad_norm": 0.3599939117700741, | |
| "learning_rate": 1.1289897459422756e-05, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.9815982356667519, | |
| "step": 1210 | |
| }, | |
| { | |
| "epoch": 7.0821114369501466, | |
| "grad_norm": 0.45954124827480125, | |
| "learning_rate": 1.126235513043418e-05, | |
| "loss": 0.0771, | |
| "mean_token_accuracy": 0.9773466065526009, | |
| "step": 1211 | |
| }, | |
| { | |
| "epoch": 7.087976539589443, | |
| "grad_norm": 0.4170706025787579, | |
| "learning_rate": 1.1234851774759828e-05, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.9824484139680862, | |
| "step": 1212 | |
| }, | |
| { | |
| "epoch": 7.093841642228739, | |
| "grad_norm": 0.3152206043073185, | |
| "learning_rate": 1.1207387492225772e-05, | |
| "loss": 0.0664, | |
| "mean_token_accuracy": 0.978251039981842, | |
| "step": 1213 | |
| }, | |
| { | |
| "epoch": 7.099706744868035, | |
| "grad_norm": 0.34231567251230943, | |
| "learning_rate": 1.1179962382516268e-05, | |
| "loss": 0.0747, | |
| "mean_token_accuracy": 0.9778275638818741, | |
| "step": 1214 | |
| }, | |
| { | |
| "epoch": 7.105571847507331, | |
| "grad_norm": 0.38154737914773634, | |
| "learning_rate": 1.1152576545173388e-05, | |
| "loss": 0.0661, | |
| "mean_token_accuracy": 0.9792356640100479, | |
| "step": 1215 | |
| }, | |
| { | |
| "epoch": 7.1114369501466275, | |
| "grad_norm": 0.38506430145241727, | |
| "learning_rate": 1.1125230079596654e-05, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.9793743640184402, | |
| "step": 1216 | |
| }, | |
| { | |
| "epoch": 7.117302052785924, | |
| "grad_norm": 0.29298285325763024, | |
| "learning_rate": 1.10979230850427e-05, | |
| "loss": 0.0692, | |
| "mean_token_accuracy": 0.9798607155680656, | |
| "step": 1217 | |
| }, | |
| { | |
| "epoch": 7.12316715542522, | |
| "grad_norm": 0.5131708622489167, | |
| "learning_rate": 1.1070655660624876e-05, | |
| "loss": 0.0762, | |
| "mean_token_accuracy": 0.9752130582928658, | |
| "step": 1218 | |
| }, | |
| { | |
| "epoch": 7.129032258064516, | |
| "grad_norm": 0.3530156134365258, | |
| "learning_rate": 1.1043427905312933e-05, | |
| "loss": 0.0764, | |
| "mean_token_accuracy": 0.9781614691019058, | |
| "step": 1219 | |
| }, | |
| { | |
| "epoch": 7.134897360703812, | |
| "grad_norm": 0.6047250936241889, | |
| "learning_rate": 1.1016239917932618e-05, | |
| "loss": 0.0705, | |
| "mean_token_accuracy": 0.97938072681427, | |
| "step": 1220 | |
| }, | |
| { | |
| "epoch": 7.140762463343108, | |
| "grad_norm": 0.43104388838903596, | |
| "learning_rate": 1.098909179716535e-05, | |
| "loss": 0.0743, | |
| "mean_token_accuracy": 0.9750777781009674, | |
| "step": 1221 | |
| }, | |
| { | |
| "epoch": 7.146627565982405, | |
| "grad_norm": 0.36410154505783404, | |
| "learning_rate": 1.096198364154784e-05, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.9790825098752975, | |
| "step": 1222 | |
| }, | |
| { | |
| "epoch": 7.152492668621701, | |
| "grad_norm": 0.34833475970741407, | |
| "learning_rate": 1.0934915549471747e-05, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9804369062185287, | |
| "step": 1223 | |
| }, | |
| { | |
| "epoch": 7.158357771260997, | |
| "grad_norm": 0.38353938112532193, | |
| "learning_rate": 1.0907887619183308e-05, | |
| "loss": 0.0622, | |
| "mean_token_accuracy": 0.9804348051548004, | |
| "step": 1224 | |
| }, | |
| { | |
| "epoch": 7.164222873900293, | |
| "grad_norm": 0.3798952101192259, | |
| "learning_rate": 1.0880899948783002e-05, | |
| "loss": 0.0692, | |
| "mean_token_accuracy": 0.9765540808439255, | |
| "step": 1225 | |
| }, | |
| { | |
| "epoch": 7.170087976539589, | |
| "grad_norm": 0.3386618138568785, | |
| "learning_rate": 1.0853952636225165e-05, | |
| "loss": 0.0681, | |
| "mean_token_accuracy": 0.9791587889194489, | |
| "step": 1226 | |
| }, | |
| { | |
| "epoch": 7.1759530791788855, | |
| "grad_norm": 0.3955364270043223, | |
| "learning_rate": 1.0827045779317662e-05, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9806070253252983, | |
| "step": 1227 | |
| }, | |
| { | |
| "epoch": 7.181818181818182, | |
| "grad_norm": 0.475027525545252, | |
| "learning_rate": 1.080017947572152e-05, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.9796654880046844, | |
| "step": 1228 | |
| }, | |
| { | |
| "epoch": 7.187683284457478, | |
| "grad_norm": 0.3715636804821357, | |
| "learning_rate": 1.0773353822950563e-05, | |
| "loss": 0.0778, | |
| "mean_token_accuracy": 0.9782843813300133, | |
| "step": 1229 | |
| }, | |
| { | |
| "epoch": 7.193548387096774, | |
| "grad_norm": 0.5439705897468128, | |
| "learning_rate": 1.074656891837108e-05, | |
| "loss": 0.058, | |
| "mean_token_accuracy": 0.9818157479166985, | |
| "step": 1230 | |
| }, | |
| { | |
| "epoch": 7.19941348973607, | |
| "grad_norm": 0.2837476614356044, | |
| "learning_rate": 1.0719824859201457e-05, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9786864891648293, | |
| "step": 1231 | |
| }, | |
| { | |
| "epoch": 7.205278592375366, | |
| "grad_norm": 0.39130123442883724, | |
| "learning_rate": 1.0693121742511828e-05, | |
| "loss": 0.0819, | |
| "mean_token_accuracy": 0.9728628844022751, | |
| "step": 1232 | |
| }, | |
| { | |
| "epoch": 7.211143695014663, | |
| "grad_norm": 0.399415818262287, | |
| "learning_rate": 1.0666459665223718e-05, | |
| "loss": 0.0684, | |
| "mean_token_accuracy": 0.97879458963871, | |
| "step": 1233 | |
| }, | |
| { | |
| "epoch": 7.217008797653959, | |
| "grad_norm": 0.3830316696582617, | |
| "learning_rate": 1.0639838724109708e-05, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9789978414773941, | |
| "step": 1234 | |
| }, | |
| { | |
| "epoch": 7.222873900293255, | |
| "grad_norm": 0.28801687651420393, | |
| "learning_rate": 1.0613259015793056e-05, | |
| "loss": 0.0561, | |
| "mean_token_accuracy": 0.9807603433728218, | |
| "step": 1235 | |
| }, | |
| { | |
| "epoch": 7.228739002932551, | |
| "grad_norm": 0.42958082726412133, | |
| "learning_rate": 1.0586720636747368e-05, | |
| "loss": 0.0762, | |
| "mean_token_accuracy": 0.9768098592758179, | |
| "step": 1236 | |
| }, | |
| { | |
| "epoch": 7.234604105571847, | |
| "grad_norm": 0.3079954178226424, | |
| "learning_rate": 1.0560223683296244e-05, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9774455577135086, | |
| "step": 1237 | |
| }, | |
| { | |
| "epoch": 7.2404692082111435, | |
| "grad_norm": 0.5264608888186644, | |
| "learning_rate": 1.0533768251612924e-05, | |
| "loss": 0.0729, | |
| "mean_token_accuracy": 0.9767781645059586, | |
| "step": 1238 | |
| }, | |
| { | |
| "epoch": 7.24633431085044, | |
| "grad_norm": 0.41160315181988266, | |
| "learning_rate": 1.0507354437719938e-05, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9828227832913399, | |
| "step": 1239 | |
| }, | |
| { | |
| "epoch": 7.252199413489736, | |
| "grad_norm": 0.3872454574394867, | |
| "learning_rate": 1.0480982337488768e-05, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9776423200964928, | |
| "step": 1240 | |
| }, | |
| { | |
| "epoch": 7.258064516129032, | |
| "grad_norm": 0.45164649038274723, | |
| "learning_rate": 1.0454652046639486e-05, | |
| "loss": 0.077, | |
| "mean_token_accuracy": 0.9758302047848701, | |
| "step": 1241 | |
| }, | |
| { | |
| "epoch": 7.263929618768328, | |
| "grad_norm": 0.4139469705680152, | |
| "learning_rate": 1.0428363660740407e-05, | |
| "loss": 0.0708, | |
| "mean_token_accuracy": 0.9761909395456314, | |
| "step": 1242 | |
| }, | |
| { | |
| "epoch": 7.269794721407624, | |
| "grad_norm": 0.3840838494270553, | |
| "learning_rate": 1.0402117275207757e-05, | |
| "loss": 0.0754, | |
| "mean_token_accuracy": 0.9756297990679741, | |
| "step": 1243 | |
| }, | |
| { | |
| "epoch": 7.275659824046921, | |
| "grad_norm": 0.3808573639604821, | |
| "learning_rate": 1.0375912985305319e-05, | |
| "loss": 0.068, | |
| "mean_token_accuracy": 0.9781527444720268, | |
| "step": 1244 | |
| }, | |
| { | |
| "epoch": 7.281524926686217, | |
| "grad_norm": 0.4206553824325304, | |
| "learning_rate": 1.0349750886144077e-05, | |
| "loss": 0.0688, | |
| "mean_token_accuracy": 0.9749187454581261, | |
| "step": 1245 | |
| }, | |
| { | |
| "epoch": 7.287390029325513, | |
| "grad_norm": 0.33265484291156516, | |
| "learning_rate": 1.0323631072681888e-05, | |
| "loss": 0.0658, | |
| "mean_token_accuracy": 0.9795495644211769, | |
| "step": 1246 | |
| }, | |
| { | |
| "epoch": 7.293255131964809, | |
| "grad_norm": 0.3178171971797074, | |
| "learning_rate": 1.0297553639723123e-05, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9814764708280563, | |
| "step": 1247 | |
| }, | |
| { | |
| "epoch": 7.299120234604105, | |
| "grad_norm": 0.3779867551878193, | |
| "learning_rate": 1.027151868191834e-05, | |
| "loss": 0.0745, | |
| "mean_token_accuracy": 0.9745180755853653, | |
| "step": 1248 | |
| }, | |
| { | |
| "epoch": 7.3049853372434015, | |
| "grad_norm": 0.4292350836002161, | |
| "learning_rate": 1.0245526293763908e-05, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.974598154425621, | |
| "step": 1249 | |
| }, | |
| { | |
| "epoch": 7.310850439882698, | |
| "grad_norm": 0.35067941778112494, | |
| "learning_rate": 1.0219576569601707e-05, | |
| "loss": 0.0793, | |
| "mean_token_accuracy": 0.9769597128033638, | |
| "step": 1250 | |
| }, | |
| { | |
| "epoch": 7.316715542521994, | |
| "grad_norm": 0.39563806400869955, | |
| "learning_rate": 1.0193669603618757e-05, | |
| "loss": 0.0742, | |
| "mean_token_accuracy": 0.976021520793438, | |
| "step": 1251 | |
| }, | |
| { | |
| "epoch": 7.32258064516129, | |
| "grad_norm": 0.46712769922706915, | |
| "learning_rate": 1.0167805489846873e-05, | |
| "loss": 0.0638, | |
| "mean_token_accuracy": 0.9811434298753738, | |
| "step": 1252 | |
| }, | |
| { | |
| "epoch": 7.328445747800586, | |
| "grad_norm": 0.37339627555491145, | |
| "learning_rate": 1.0141984322162353e-05, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9801758378744125, | |
| "step": 1253 | |
| }, | |
| { | |
| "epoch": 7.334310850439882, | |
| "grad_norm": 0.3044155287959229, | |
| "learning_rate": 1.0116206194285598e-05, | |
| "loss": 0.0719, | |
| "mean_token_accuracy": 0.9779629483819008, | |
| "step": 1254 | |
| }, | |
| { | |
| "epoch": 7.340175953079179, | |
| "grad_norm": 0.39714577819514213, | |
| "learning_rate": 1.0090471199780812e-05, | |
| "loss": 0.0774, | |
| "mean_token_accuracy": 0.9755602031946182, | |
| "step": 1255 | |
| }, | |
| { | |
| "epoch": 7.346041055718475, | |
| "grad_norm": 0.4570877974493308, | |
| "learning_rate": 1.0064779432055616e-05, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9759530946612358, | |
| "step": 1256 | |
| }, | |
| { | |
| "epoch": 7.351906158357771, | |
| "grad_norm": 0.4205327589510046, | |
| "learning_rate": 1.0039130984360761e-05, | |
| "loss": 0.0683, | |
| "mean_token_accuracy": 0.9774453565478325, | |
| "step": 1257 | |
| }, | |
| { | |
| "epoch": 7.357771260997067, | |
| "grad_norm": 0.3684606740091315, | |
| "learning_rate": 1.0013525949789745e-05, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9773312881588936, | |
| "step": 1258 | |
| }, | |
| { | |
| "epoch": 7.363636363636363, | |
| "grad_norm": 0.3187355898488441, | |
| "learning_rate": 9.987964421278512e-06, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9804589822888374, | |
| "step": 1259 | |
| }, | |
| { | |
| "epoch": 7.3695014662756595, | |
| "grad_norm": 0.4023244952241441, | |
| "learning_rate": 9.962446491605084e-06, | |
| "loss": 0.07, | |
| "mean_token_accuracy": 0.975837953388691, | |
| "step": 1260 | |
| }, | |
| { | |
| "epoch": 7.375366568914956, | |
| "grad_norm": 0.5341557069269197, | |
| "learning_rate": 9.936972253389235e-06, | |
| "loss": 0.0646, | |
| "mean_token_accuracy": 0.9801123738288879, | |
| "step": 1261 | |
| }, | |
| { | |
| "epoch": 7.381231671554252, | |
| "grad_norm": 0.3564112574933918, | |
| "learning_rate": 9.911541799092162e-06, | |
| "loss": 0.0724, | |
| "mean_token_accuracy": 0.9757182076573372, | |
| "step": 1262 | |
| }, | |
| { | |
| "epoch": 7.387096774193548, | |
| "grad_norm": 0.2346449467553508, | |
| "learning_rate": 9.88615522101615e-06, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.978550560772419, | |
| "step": 1263 | |
| }, | |
| { | |
| "epoch": 7.392961876832844, | |
| "grad_norm": 0.3271500195862692, | |
| "learning_rate": 9.860812611304225e-06, | |
| "loss": 0.0596, | |
| "mean_token_accuracy": 0.9800273403525352, | |
| "step": 1264 | |
| }, | |
| { | |
| "epoch": 7.39882697947214, | |
| "grad_norm": 0.29052657488602945, | |
| "learning_rate": 9.835514061939822e-06, | |
| "loss": 0.0562, | |
| "mean_token_accuracy": 0.9814219176769257, | |
| "step": 1265 | |
| }, | |
| { | |
| "epoch": 7.404692082111437, | |
| "grad_norm": 0.32197042134754045, | |
| "learning_rate": 9.810259664746454e-06, | |
| "loss": 0.0694, | |
| "mean_token_accuracy": 0.9777977392077446, | |
| "step": 1266 | |
| }, | |
| { | |
| "epoch": 7.410557184750733, | |
| "grad_norm": 0.48962506212270573, | |
| "learning_rate": 9.785049511387383e-06, | |
| "loss": 0.0683, | |
| "mean_token_accuracy": 0.975383386015892, | |
| "step": 1267 | |
| }, | |
| { | |
| "epoch": 7.416422287390029, | |
| "grad_norm": 0.42616315538478555, | |
| "learning_rate": 9.759883693365287e-06, | |
| "loss": 0.0748, | |
| "mean_token_accuracy": 0.976491704583168, | |
| "step": 1268 | |
| }, | |
| { | |
| "epoch": 7.422287390029325, | |
| "grad_norm": 0.5004041655379637, | |
| "learning_rate": 9.734762302021923e-06, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9809972047805786, | |
| "step": 1269 | |
| }, | |
| { | |
| "epoch": 7.428152492668621, | |
| "grad_norm": 0.41423957139830336, | |
| "learning_rate": 9.709685428537794e-06, | |
| "loss": 0.065, | |
| "mean_token_accuracy": 0.982276625931263, | |
| "step": 1270 | |
| }, | |
| { | |
| "epoch": 7.4340175953079175, | |
| "grad_norm": 0.4463820203523401, | |
| "learning_rate": 9.684653163931823e-06, | |
| "loss": 0.0748, | |
| "mean_token_accuracy": 0.9780172407627106, | |
| "step": 1271 | |
| }, | |
| { | |
| "epoch": 7.439882697947214, | |
| "grad_norm": 0.32726953724707947, | |
| "learning_rate": 9.659665599061019e-06, | |
| "loss": 0.0785, | |
| "mean_token_accuracy": 0.9734633192420006, | |
| "step": 1272 | |
| }, | |
| { | |
| "epoch": 7.44574780058651, | |
| "grad_norm": 0.41686111725495745, | |
| "learning_rate": 9.634722824620154e-06, | |
| "loss": 0.0595, | |
| "mean_token_accuracy": 0.9789967909455299, | |
| "step": 1273 | |
| }, | |
| { | |
| "epoch": 7.451612903225806, | |
| "grad_norm": 0.25252483483957594, | |
| "learning_rate": 9.609824931141423e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.9781973287463188, | |
| "step": 1274 | |
| }, | |
| { | |
| "epoch": 7.457478005865102, | |
| "grad_norm": 0.35317028243851656, | |
| "learning_rate": 9.584972008994123e-06, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.980886660516262, | |
| "step": 1275 | |
| }, | |
| { | |
| "epoch": 7.463343108504398, | |
| "grad_norm": 0.3588023504100938, | |
| "learning_rate": 9.560164148384328e-06, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.9760325774550438, | |
| "step": 1276 | |
| }, | |
| { | |
| "epoch": 7.469208211143695, | |
| "grad_norm": 0.42465358965052863, | |
| "learning_rate": 9.53540143935455e-06, | |
| "loss": 0.072, | |
| "mean_token_accuracy": 0.9772544130682945, | |
| "step": 1277 | |
| }, | |
| { | |
| "epoch": 7.475073313782991, | |
| "grad_norm": 0.40089777234913554, | |
| "learning_rate": 9.510683971783425e-06, | |
| "loss": 0.0904, | |
| "mean_token_accuracy": 0.9742084890604019, | |
| "step": 1278 | |
| }, | |
| { | |
| "epoch": 7.480938416422287, | |
| "grad_norm": 0.5755039890987124, | |
| "learning_rate": 9.486011835385372e-06, | |
| "loss": 0.0484, | |
| "mean_token_accuracy": 0.9854866787791252, | |
| "step": 1279 | |
| }, | |
| { | |
| "epoch": 7.486803519061583, | |
| "grad_norm": 0.26318968566984136, | |
| "learning_rate": 9.461385119710282e-06, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9743320271372795, | |
| "step": 1280 | |
| }, | |
| { | |
| "epoch": 7.492668621700879, | |
| "grad_norm": 0.32345568363196076, | |
| "learning_rate": 9.436803914143189e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9740675911307335, | |
| "step": 1281 | |
| }, | |
| { | |
| "epoch": 7.4985337243401755, | |
| "grad_norm": 0.3359866371891183, | |
| "learning_rate": 9.41226830790394e-06, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9786439761519432, | |
| "step": 1282 | |
| }, | |
| { | |
| "epoch": 7.504398826979472, | |
| "grad_norm": 0.28449592977664995, | |
| "learning_rate": 9.387778390046881e-06, | |
| "loss": 0.0644, | |
| "mean_token_accuracy": 0.9785284176468849, | |
| "step": 1283 | |
| }, | |
| { | |
| "epoch": 7.510263929618768, | |
| "grad_norm": 0.26900051978234174, | |
| "learning_rate": 9.363334249460519e-06, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9803383573889732, | |
| "step": 1284 | |
| }, | |
| { | |
| "epoch": 7.516129032258064, | |
| "grad_norm": 0.32220420095575997, | |
| "learning_rate": 9.338935974867213e-06, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9765875190496445, | |
| "step": 1285 | |
| }, | |
| { | |
| "epoch": 7.52199413489736, | |
| "grad_norm": 0.5224308102926841, | |
| "learning_rate": 9.314583654822844e-06, | |
| "loss": 0.0751, | |
| "mean_token_accuracy": 0.9766501858830452, | |
| "step": 1286 | |
| }, | |
| { | |
| "epoch": 7.527859237536656, | |
| "grad_norm": 0.5037316754055287, | |
| "learning_rate": 9.290277377716503e-06, | |
| "loss": 0.0796, | |
| "mean_token_accuracy": 0.9741085171699524, | |
| "step": 1287 | |
| }, | |
| { | |
| "epoch": 7.533724340175953, | |
| "grad_norm": 0.42377305298738516, | |
| "learning_rate": 9.266017231770155e-06, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9798842146992683, | |
| "step": 1288 | |
| }, | |
| { | |
| "epoch": 7.539589442815249, | |
| "grad_norm": 0.2573789224293022, | |
| "learning_rate": 9.241803305038333e-06, | |
| "loss": 0.0744, | |
| "mean_token_accuracy": 0.9776958003640175, | |
| "step": 1289 | |
| }, | |
| { | |
| "epoch": 7.545454545454545, | |
| "grad_norm": 0.2965363860970641, | |
| "learning_rate": 9.217635685407813e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9799975752830505, | |
| "step": 1290 | |
| }, | |
| { | |
| "epoch": 7.551319648093841, | |
| "grad_norm": 0.32631428816151475, | |
| "learning_rate": 9.19351446059729e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9817659631371498, | |
| "step": 1291 | |
| }, | |
| { | |
| "epoch": 7.557184750733137, | |
| "grad_norm": 0.3125871316017961, | |
| "learning_rate": 9.16943971815708e-06, | |
| "loss": 0.0636, | |
| "mean_token_accuracy": 0.9800999537110329, | |
| "step": 1292 | |
| }, | |
| { | |
| "epoch": 7.563049853372434, | |
| "grad_norm": 0.2734057475439993, | |
| "learning_rate": 9.145411545468756e-06, | |
| "loss": 0.0587, | |
| "mean_token_accuracy": 0.9795428663492203, | |
| "step": 1293 | |
| }, | |
| { | |
| "epoch": 7.568914956011731, | |
| "grad_norm": 0.2604148879143986, | |
| "learning_rate": 9.121430029744893e-06, | |
| "loss": 0.062, | |
| "mean_token_accuracy": 0.9805775061249733, | |
| "step": 1294 | |
| }, | |
| { | |
| "epoch": 7.574780058651027, | |
| "grad_norm": 0.31702286505675215, | |
| "learning_rate": 9.097495258028703e-06, | |
| "loss": 0.0693, | |
| "mean_token_accuracy": 0.9769936874508858, | |
| "step": 1295 | |
| }, | |
| { | |
| "epoch": 7.580645161290323, | |
| "grad_norm": 0.3605926122000164, | |
| "learning_rate": 9.073607317193742e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9788065627217293, | |
| "step": 1296 | |
| }, | |
| { | |
| "epoch": 7.586510263929619, | |
| "grad_norm": 0.2535069897924097, | |
| "learning_rate": 9.049766293943589e-06, | |
| "loss": 0.0706, | |
| "mean_token_accuracy": 0.9772609323263168, | |
| "step": 1297 | |
| }, | |
| { | |
| "epoch": 7.592375366568915, | |
| "grad_norm": 0.4109347943974271, | |
| "learning_rate": 9.025972274811527e-06, | |
| "loss": 0.0684, | |
| "mean_token_accuracy": 0.9789599850773811, | |
| "step": 1298 | |
| }, | |
| { | |
| "epoch": 7.5982404692082115, | |
| "grad_norm": 0.2871829801660241, | |
| "learning_rate": 9.002225346160238e-06, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.9778304621577263, | |
| "step": 1299 | |
| }, | |
| { | |
| "epoch": 7.604105571847508, | |
| "grad_norm": 0.32674313911500424, | |
| "learning_rate": 8.97852559418148e-06, | |
| "loss": 0.0637, | |
| "mean_token_accuracy": 0.9782585576176643, | |
| "step": 1300 | |
| }, | |
| { | |
| "epoch": 7.609970674486804, | |
| "grad_norm": 0.34327052126908203, | |
| "learning_rate": 8.954873104895787e-06, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9815321713685989, | |
| "step": 1301 | |
| }, | |
| { | |
| "epoch": 7.6158357771261, | |
| "grad_norm": 0.33708528837651625, | |
| "learning_rate": 8.931267964152132e-06, | |
| "loss": 0.0688, | |
| "mean_token_accuracy": 0.9765809625387192, | |
| "step": 1302 | |
| }, | |
| { | |
| "epoch": 7.621700879765396, | |
| "grad_norm": 0.5041195790884399, | |
| "learning_rate": 8.907710257627651e-06, | |
| "loss": 0.0676, | |
| "mean_token_accuracy": 0.9783420264720917, | |
| "step": 1303 | |
| }, | |
| { | |
| "epoch": 7.627565982404692, | |
| "grad_norm": 0.2989590349789707, | |
| "learning_rate": 8.884200070827303e-06, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9762579947710037, | |
| "step": 1304 | |
| }, | |
| { | |
| "epoch": 7.633431085043989, | |
| "grad_norm": 0.4094635271370586, | |
| "learning_rate": 8.86073748908357e-06, | |
| "loss": 0.0631, | |
| "mean_token_accuracy": 0.9789704233407974, | |
| "step": 1305 | |
| }, | |
| { | |
| "epoch": 7.639296187683285, | |
| "grad_norm": 0.27117288447287735, | |
| "learning_rate": 8.837322597556146e-06, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.9795557036995888, | |
| "step": 1306 | |
| }, | |
| { | |
| "epoch": 7.645161290322581, | |
| "grad_norm": 0.4138735803384374, | |
| "learning_rate": 8.813955481231633e-06, | |
| "loss": 0.0751, | |
| "mean_token_accuracy": 0.9753428846597672, | |
| "step": 1307 | |
| }, | |
| { | |
| "epoch": 7.651026392961877, | |
| "grad_norm": 0.30570757377595903, | |
| "learning_rate": 8.790636224923221e-06, | |
| "loss": 0.068, | |
| "mean_token_accuracy": 0.9780451580882072, | |
| "step": 1308 | |
| }, | |
| { | |
| "epoch": 7.656891495601173, | |
| "grad_norm": 0.3261846476792855, | |
| "learning_rate": 8.767364913270399e-06, | |
| "loss": 0.0768, | |
| "mean_token_accuracy": 0.9772769138216972, | |
| "step": 1309 | |
| }, | |
| { | |
| "epoch": 7.6627565982404695, | |
| "grad_norm": 0.45484472620842326, | |
| "learning_rate": 8.744141630738624e-06, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9788528978824615, | |
| "step": 1310 | |
| }, | |
| { | |
| "epoch": 7.668621700879766, | |
| "grad_norm": 0.274785803295409, | |
| "learning_rate": 8.720966461619038e-06, | |
| "loss": 0.0801, | |
| "mean_token_accuracy": 0.9781883060932159, | |
| "step": 1311 | |
| }, | |
| { | |
| "epoch": 7.674486803519062, | |
| "grad_norm": 0.5798168184695972, | |
| "learning_rate": 8.69783949002814e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9807113409042358, | |
| "step": 1312 | |
| }, | |
| { | |
| "epoch": 7.680351906158358, | |
| "grad_norm": 0.30491164949294325, | |
| "learning_rate": 8.6747607999075e-06, | |
| "loss": 0.0565, | |
| "mean_token_accuracy": 0.9811301380395889, | |
| "step": 1313 | |
| }, | |
| { | |
| "epoch": 7.686217008797654, | |
| "grad_norm": 0.38152749678409453, | |
| "learning_rate": 8.651730475023435e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9776108860969543, | |
| "step": 1314 | |
| }, | |
| { | |
| "epoch": 7.69208211143695, | |
| "grad_norm": 0.41854013127871614, | |
| "learning_rate": 8.628748598966739e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9768750295042992, | |
| "step": 1315 | |
| }, | |
| { | |
| "epoch": 7.697947214076247, | |
| "grad_norm": 0.45499424698989827, | |
| "learning_rate": 8.605815255152323e-06, | |
| "loss": 0.0791, | |
| "mean_token_accuracy": 0.9720618352293968, | |
| "step": 1316 | |
| }, | |
| { | |
| "epoch": 7.703812316715543, | |
| "grad_norm": 0.30913926089594507, | |
| "learning_rate": 8.582930526818973e-06, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9765586853027344, | |
| "step": 1317 | |
| }, | |
| { | |
| "epoch": 7.709677419354839, | |
| "grad_norm": 0.5304284265697857, | |
| "learning_rate": 8.560094497029008e-06, | |
| "loss": 0.0723, | |
| "mean_token_accuracy": 0.9783492982387543, | |
| "step": 1318 | |
| }, | |
| { | |
| "epoch": 7.715542521994135, | |
| "grad_norm": 0.44193236209697745, | |
| "learning_rate": 8.537307248667992e-06, | |
| "loss": 0.0664, | |
| "mean_token_accuracy": 0.9785650745034218, | |
| "step": 1319 | |
| }, | |
| { | |
| "epoch": 7.721407624633431, | |
| "grad_norm": 0.3801324857485209, | |
| "learning_rate": 8.514568864444432e-06, | |
| "loss": 0.0765, | |
| "mean_token_accuracy": 0.9757534116506577, | |
| "step": 1320 | |
| }, | |
| { | |
| "epoch": 7.7272727272727275, | |
| "grad_norm": 0.2924649756631599, | |
| "learning_rate": 8.491879426889483e-06, | |
| "loss": 0.0635, | |
| "mean_token_accuracy": 0.9798558130860329, | |
| "step": 1321 | |
| }, | |
| { | |
| "epoch": 7.733137829912024, | |
| "grad_norm": 0.46934446068300567, | |
| "learning_rate": 8.469239018356636e-06, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9742545410990715, | |
| "step": 1322 | |
| }, | |
| { | |
| "epoch": 7.73900293255132, | |
| "grad_norm": 0.42318394489101063, | |
| "learning_rate": 8.446647721021435e-06, | |
| "loss": 0.0852, | |
| "mean_token_accuracy": 0.9734436348080635, | |
| "step": 1323 | |
| }, | |
| { | |
| "epoch": 7.744868035190616, | |
| "grad_norm": 0.4709962099950871, | |
| "learning_rate": 8.424105616881161e-06, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9796672537922859, | |
| "step": 1324 | |
| }, | |
| { | |
| "epoch": 7.750733137829912, | |
| "grad_norm": 0.5052962775096801, | |
| "learning_rate": 8.40161278775455e-06, | |
| "loss": 0.079, | |
| "mean_token_accuracy": 0.9778957739472389, | |
| "step": 1325 | |
| }, | |
| { | |
| "epoch": 7.756598240469208, | |
| "grad_norm": 0.794953716358727, | |
| "learning_rate": 8.379169315281485e-06, | |
| "loss": 0.0723, | |
| "mean_token_accuracy": 0.9770526960492134, | |
| "step": 1326 | |
| }, | |
| { | |
| "epoch": 7.762463343108505, | |
| "grad_norm": 0.4708633571052823, | |
| "learning_rate": 8.356775280922708e-06, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9773171544075012, | |
| "step": 1327 | |
| }, | |
| { | |
| "epoch": 7.768328445747801, | |
| "grad_norm": 0.2977066686368222, | |
| "learning_rate": 8.334430765959522e-06, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9756058230996132, | |
| "step": 1328 | |
| }, | |
| { | |
| "epoch": 7.774193548387097, | |
| "grad_norm": 0.34688096761186954, | |
| "learning_rate": 8.312135851493494e-06, | |
| "loss": 0.0764, | |
| "mean_token_accuracy": 0.9759645387530327, | |
| "step": 1329 | |
| }, | |
| { | |
| "epoch": 7.780058651026393, | |
| "grad_norm": 0.37759510575185457, | |
| "learning_rate": 8.28989061844615e-06, | |
| "loss": 0.0559, | |
| "mean_token_accuracy": 0.9825332537293434, | |
| "step": 1330 | |
| }, | |
| { | |
| "epoch": 7.785923753665689, | |
| "grad_norm": 0.25548539074972576, | |
| "learning_rate": 8.267695147558705e-06, | |
| "loss": 0.0753, | |
| "mean_token_accuracy": 0.9778474643826485, | |
| "step": 1331 | |
| }, | |
| { | |
| "epoch": 7.7917888563049855, | |
| "grad_norm": 0.3247899940629153, | |
| "learning_rate": 8.245549519391758e-06, | |
| "loss": 0.0763, | |
| "mean_token_accuracy": 0.9769897162914276, | |
| "step": 1332 | |
| }, | |
| { | |
| "epoch": 7.797653958944282, | |
| "grad_norm": 0.4165530275778508, | |
| "learning_rate": 8.22345381432499e-06, | |
| "loss": 0.0704, | |
| "mean_token_accuracy": 0.9790071472525597, | |
| "step": 1333 | |
| }, | |
| { | |
| "epoch": 7.803519061583578, | |
| "grad_norm": 0.3026205083529094, | |
| "learning_rate": 8.201408112556893e-06, | |
| "loss": 0.0669, | |
| "mean_token_accuracy": 0.9787802696228027, | |
| "step": 1334 | |
| }, | |
| { | |
| "epoch": 7.809384164222874, | |
| "grad_norm": 0.35726307580476363, | |
| "learning_rate": 8.179412494104457e-06, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9751449227333069, | |
| "step": 1335 | |
| }, | |
| { | |
| "epoch": 7.81524926686217, | |
| "grad_norm": 0.4509292195222995, | |
| "learning_rate": 8.15746703880289e-06, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9777436852455139, | |
| "step": 1336 | |
| }, | |
| { | |
| "epoch": 7.821114369501466, | |
| "grad_norm": 0.2686585360899267, | |
| "learning_rate": 8.135571826305339e-06, | |
| "loss": 0.0604, | |
| "mean_token_accuracy": 0.9788657277822495, | |
| "step": 1337 | |
| }, | |
| { | |
| "epoch": 7.826979472140763, | |
| "grad_norm": 0.4818930537808581, | |
| "learning_rate": 8.113726936082576e-06, | |
| "loss": 0.0806, | |
| "mean_token_accuracy": 0.9743361845612526, | |
| "step": 1338 | |
| }, | |
| { | |
| "epoch": 7.832844574780059, | |
| "grad_norm": 0.49508100194206756, | |
| "learning_rate": 8.091932447422737e-06, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9731877073645592, | |
| "step": 1339 | |
| }, | |
| { | |
| "epoch": 7.838709677419355, | |
| "grad_norm": 0.29711247347821884, | |
| "learning_rate": 8.070188439431005e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9789380878210068, | |
| "step": 1340 | |
| }, | |
| { | |
| "epoch": 7.844574780058651, | |
| "grad_norm": 0.3397541983633956, | |
| "learning_rate": 8.048494991029352e-06, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.9785341024398804, | |
| "step": 1341 | |
| }, | |
| { | |
| "epoch": 7.850439882697947, | |
| "grad_norm": 0.480376564004997, | |
| "learning_rate": 8.02685218095624e-06, | |
| "loss": 0.0727, | |
| "mean_token_accuracy": 0.9771670550107956, | |
| "step": 1342 | |
| }, | |
| { | |
| "epoch": 7.8563049853372435, | |
| "grad_norm": 0.40889844988462554, | |
| "learning_rate": 8.005260087766318e-06, | |
| "loss": 0.0724, | |
| "mean_token_accuracy": 0.9762912541627884, | |
| "step": 1343 | |
| }, | |
| { | |
| "epoch": 7.86217008797654, | |
| "grad_norm": 0.30638673540582606, | |
| "learning_rate": 7.983718789830167e-06, | |
| "loss": 0.0723, | |
| "mean_token_accuracy": 0.9767410978674889, | |
| "step": 1344 | |
| }, | |
| { | |
| "epoch": 7.868035190615836, | |
| "grad_norm": 0.3766404101128721, | |
| "learning_rate": 7.962228365333999e-06, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9778221324086189, | |
| "step": 1345 | |
| }, | |
| { | |
| "epoch": 7.873900293255132, | |
| "grad_norm": 0.3875864282647274, | |
| "learning_rate": 7.940788892279375e-06, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9775504246354103, | |
| "step": 1346 | |
| }, | |
| { | |
| "epoch": 7.879765395894428, | |
| "grad_norm": 0.3486193538218716, | |
| "learning_rate": 7.919400448482928e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9794389456510544, | |
| "step": 1347 | |
| }, | |
| { | |
| "epoch": 7.885630498533724, | |
| "grad_norm": 0.2955046780982513, | |
| "learning_rate": 7.898063111576066e-06, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.9779496192932129, | |
| "step": 1348 | |
| }, | |
| { | |
| "epoch": 7.891495601173021, | |
| "grad_norm": 0.3234432215282191, | |
| "learning_rate": 7.876776959004706e-06, | |
| "loss": 0.0846, | |
| "mean_token_accuracy": 0.9718790277838707, | |
| "step": 1349 | |
| }, | |
| { | |
| "epoch": 7.897360703812317, | |
| "grad_norm": 0.35172886904487044, | |
| "learning_rate": 7.855542068028981e-06, | |
| "loss": 0.0644, | |
| "mean_token_accuracy": 0.9782620742917061, | |
| "step": 1350 | |
| }, | |
| { | |
| "epoch": 7.903225806451613, | |
| "grad_norm": 0.2484032872765626, | |
| "learning_rate": 7.834358515722977e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.9796766042709351, | |
| "step": 1351 | |
| }, | |
| { | |
| "epoch": 7.909090909090909, | |
| "grad_norm": 0.3189929514946592, | |
| "learning_rate": 7.813226378974427e-06, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.976318895816803, | |
| "step": 1352 | |
| }, | |
| { | |
| "epoch": 7.914956011730205, | |
| "grad_norm": 0.32186645853668916, | |
| "learning_rate": 7.792145734484455e-06, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.9768570438027382, | |
| "step": 1353 | |
| }, | |
| { | |
| "epoch": 7.9208211143695015, | |
| "grad_norm": 0.28509948839859406, | |
| "learning_rate": 7.771116658767286e-06, | |
| "loss": 0.0729, | |
| "mean_token_accuracy": 0.9778455495834351, | |
| "step": 1354 | |
| }, | |
| { | |
| "epoch": 7.926686217008798, | |
| "grad_norm": 0.31897252819873134, | |
| "learning_rate": 7.750139228149978e-06, | |
| "loss": 0.0782, | |
| "mean_token_accuracy": 0.9735360145568848, | |
| "step": 1355 | |
| }, | |
| { | |
| "epoch": 7.932551319648094, | |
| "grad_norm": 0.4594664457151054, | |
| "learning_rate": 7.729213518772121e-06, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.980343259871006, | |
| "step": 1356 | |
| }, | |
| { | |
| "epoch": 7.93841642228739, | |
| "grad_norm": 0.37424885169765215, | |
| "learning_rate": 7.708339606585591e-06, | |
| "loss": 0.0698, | |
| "mean_token_accuracy": 0.9746110588312149, | |
| "step": 1357 | |
| }, | |
| { | |
| "epoch": 7.944281524926686, | |
| "grad_norm": 0.3355832207100877, | |
| "learning_rate": 7.687517567354266e-06, | |
| "loss": 0.0829, | |
| "mean_token_accuracy": 0.9746036380529404, | |
| "step": 1358 | |
| }, | |
| { | |
| "epoch": 7.9501466275659824, | |
| "grad_norm": 0.37310850047357214, | |
| "learning_rate": 7.66674747665373e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9776176363229752, | |
| "step": 1359 | |
| }, | |
| { | |
| "epoch": 7.956011730205279, | |
| "grad_norm": 0.34627662256254366, | |
| "learning_rate": 7.646029409871029e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9754548743367195, | |
| "step": 1360 | |
| }, | |
| { | |
| "epoch": 7.961876832844575, | |
| "grad_norm": 0.2924727330598229, | |
| "learning_rate": 7.625363442204379e-06, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9818178787827492, | |
| "step": 1361 | |
| }, | |
| { | |
| "epoch": 7.967741935483871, | |
| "grad_norm": 0.37066267600800856, | |
| "learning_rate": 7.604749648662892e-06, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.9784936085343361, | |
| "step": 1362 | |
| }, | |
| { | |
| "epoch": 7.973607038123167, | |
| "grad_norm": 0.2726562666079756, | |
| "learning_rate": 7.584188104066317e-06, | |
| "loss": 0.0586, | |
| "mean_token_accuracy": 0.9794305935502052, | |
| "step": 1363 | |
| }, | |
| { | |
| "epoch": 7.979472140762463, | |
| "grad_norm": 0.46100400117349094, | |
| "learning_rate": 7.563678883044754e-06, | |
| "loss": 0.0843, | |
| "mean_token_accuracy": 0.9759550020098686, | |
| "step": 1364 | |
| }, | |
| { | |
| "epoch": 7.9853372434017595, | |
| "grad_norm": 0.46096923720257865, | |
| "learning_rate": 7.5432220600383935e-06, | |
| "loss": 0.0832, | |
| "mean_token_accuracy": 0.9736724197864532, | |
| "step": 1365 | |
| }, | |
| { | |
| "epoch": 7.991202346041056, | |
| "grad_norm": 0.3537209449519693, | |
| "learning_rate": 7.522817709297241e-06, | |
| "loss": 0.0676, | |
| "mean_token_accuracy": 0.9790577068924904, | |
| "step": 1366 | |
| }, | |
| { | |
| "epoch": 7.997067448680352, | |
| "grad_norm": 0.5417913017909552, | |
| "learning_rate": 7.502465904880849e-06, | |
| "loss": 0.0717, | |
| "mean_token_accuracy": 0.9783544093370438, | |
| "step": 1367 | |
| }, | |
| { | |
| "epoch": 8.0, | |
| "grad_norm": 0.6652206583753479, | |
| "learning_rate": 7.482166720658046e-06, | |
| "loss": 0.0685, | |
| "mean_token_accuracy": 0.980608344078064, | |
| "step": 1368 | |
| }, | |
| { | |
| "epoch": 8.005865102639296, | |
| "grad_norm": 0.33294774705553826, | |
| "learning_rate": 7.461920230306674e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.9774910733103752, | |
| "step": 1369 | |
| }, | |
| { | |
| "epoch": 8.011730205278592, | |
| "grad_norm": 0.4730798517874754, | |
| "learning_rate": 7.441726507313318e-06, | |
| "loss": 0.0586, | |
| "mean_token_accuracy": 0.9798407405614853, | |
| "step": 1370 | |
| }, | |
| { | |
| "epoch": 8.017595307917889, | |
| "grad_norm": 0.4072909224715183, | |
| "learning_rate": 7.421585624973033e-06, | |
| "loss": 0.0685, | |
| "mean_token_accuracy": 0.9780117720365524, | |
| "step": 1371 | |
| }, | |
| { | |
| "epoch": 8.023460410557185, | |
| "grad_norm": 0.23290629620105716, | |
| "learning_rate": 7.4014976563890915e-06, | |
| "loss": 0.0558, | |
| "mean_token_accuracy": 0.9813942089676857, | |
| "step": 1372 | |
| }, | |
| { | |
| "epoch": 8.029325513196481, | |
| "grad_norm": 0.23204586036189112, | |
| "learning_rate": 7.381462674472702e-06, | |
| "loss": 0.0562, | |
| "mean_token_accuracy": 0.9837755858898163, | |
| "step": 1373 | |
| }, | |
| { | |
| "epoch": 8.035190615835777, | |
| "grad_norm": 0.26570656097468354, | |
| "learning_rate": 7.36148075194276e-06, | |
| "loss": 0.0575, | |
| "mean_token_accuracy": 0.9810594543814659, | |
| "step": 1374 | |
| }, | |
| { | |
| "epoch": 8.041055718475073, | |
| "grad_norm": 0.2534234167607354, | |
| "learning_rate": 7.341551961325574e-06, | |
| "loss": 0.0579, | |
| "mean_token_accuracy": 0.981083907186985, | |
| "step": 1375 | |
| }, | |
| { | |
| "epoch": 8.04692082111437, | |
| "grad_norm": 0.22482533363895343, | |
| "learning_rate": 7.3216763749546025e-06, | |
| "loss": 0.0536, | |
| "mean_token_accuracy": 0.9840012043714523, | |
| "step": 1376 | |
| }, | |
| { | |
| "epoch": 8.052785923753666, | |
| "grad_norm": 0.3539307170644396, | |
| "learning_rate": 7.301854064970202e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9816621989011765, | |
| "step": 1377 | |
| }, | |
| { | |
| "epoch": 8.058651026392962, | |
| "grad_norm": 0.3133359023630968, | |
| "learning_rate": 7.282085103319349e-06, | |
| "loss": 0.0603, | |
| "mean_token_accuracy": 0.9810803234577179, | |
| "step": 1378 | |
| }, | |
| { | |
| "epoch": 8.064516129032258, | |
| "grad_norm": 0.2856689566183232, | |
| "learning_rate": 7.2623695617553934e-06, | |
| "loss": 0.0665, | |
| "mean_token_accuracy": 0.9795011952519417, | |
| "step": 1379 | |
| }, | |
| { | |
| "epoch": 8.070381231671554, | |
| "grad_norm": 0.32524667631472975, | |
| "learning_rate": 7.242707511837781e-06, | |
| "loss": 0.0566, | |
| "mean_token_accuracy": 0.9811884611845016, | |
| "step": 1380 | |
| }, | |
| { | |
| "epoch": 8.07624633431085, | |
| "grad_norm": 0.22155385666931476, | |
| "learning_rate": 7.223099024931817e-06, | |
| "loss": 0.0533, | |
| "mean_token_accuracy": 0.9849221631884575, | |
| "step": 1381 | |
| }, | |
| { | |
| "epoch": 8.082111436950147, | |
| "grad_norm": 0.2639512786516192, | |
| "learning_rate": 7.203544172208387e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9811254888772964, | |
| "step": 1382 | |
| }, | |
| { | |
| "epoch": 8.087976539589443, | |
| "grad_norm": 0.4099521690739927, | |
| "learning_rate": 7.184043024643712e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.980625793337822, | |
| "step": 1383 | |
| }, | |
| { | |
| "epoch": 8.093841642228739, | |
| "grad_norm": 0.2746270859860697, | |
| "learning_rate": 7.16459565301908e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9808760434389114, | |
| "step": 1384 | |
| }, | |
| { | |
| "epoch": 8.099706744868035, | |
| "grad_norm": 0.3336582722499348, | |
| "learning_rate": 7.145202127920598e-06, | |
| "loss": 0.0709, | |
| "mean_token_accuracy": 0.9770251661539078, | |
| "step": 1385 | |
| }, | |
| { | |
| "epoch": 8.105571847507331, | |
| "grad_norm": 0.3159956433838349, | |
| "learning_rate": 7.125862519738924e-06, | |
| "loss": 0.0601, | |
| "mean_token_accuracy": 0.9790099188685417, | |
| "step": 1386 | |
| }, | |
| { | |
| "epoch": 8.111436950146627, | |
| "grad_norm": 0.27751907752313126, | |
| "learning_rate": 7.106576898669031e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9811584055423737, | |
| "step": 1387 | |
| }, | |
| { | |
| "epoch": 8.117302052785924, | |
| "grad_norm": 0.3229154865255592, | |
| "learning_rate": 7.087345334709931e-06, | |
| "loss": 0.0661, | |
| "mean_token_accuracy": 0.9760906621813774, | |
| "step": 1388 | |
| }, | |
| { | |
| "epoch": 8.12316715542522, | |
| "grad_norm": 0.302786075924986, | |
| "learning_rate": 7.068167897664433e-06, | |
| "loss": 0.0672, | |
| "mean_token_accuracy": 0.9771555885672569, | |
| "step": 1389 | |
| }, | |
| { | |
| "epoch": 8.129032258064516, | |
| "grad_norm": 0.34533438808047695, | |
| "learning_rate": 7.0490446571388925e-06, | |
| "loss": 0.0714, | |
| "mean_token_accuracy": 0.978884294629097, | |
| "step": 1390 | |
| }, | |
| { | |
| "epoch": 8.134897360703812, | |
| "grad_norm": 0.27542127742914396, | |
| "learning_rate": 7.0299756825429465e-06, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9800280183553696, | |
| "step": 1391 | |
| }, | |
| { | |
| "epoch": 8.140762463343108, | |
| "grad_norm": 0.2371774920683244, | |
| "learning_rate": 7.010961043089277e-06, | |
| "loss": 0.0511, | |
| "mean_token_accuracy": 0.9846341237425804, | |
| "step": 1392 | |
| }, | |
| { | |
| "epoch": 8.146627565982405, | |
| "grad_norm": 0.24268229094962124, | |
| "learning_rate": 6.992000807793333e-06, | |
| "loss": 0.0578, | |
| "mean_token_accuracy": 0.9803787469863892, | |
| "step": 1393 | |
| }, | |
| { | |
| "epoch": 8.1524926686217, | |
| "grad_norm": 0.3643306900167006, | |
| "learning_rate": 6.973095045473124e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9794165417551994, | |
| "step": 1394 | |
| }, | |
| { | |
| "epoch": 8.158357771260997, | |
| "grad_norm": 0.2947855628136845, | |
| "learning_rate": 6.954243824748922e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9798598140478134, | |
| "step": 1395 | |
| }, | |
| { | |
| "epoch": 8.164222873900293, | |
| "grad_norm": 0.24965567868424024, | |
| "learning_rate": 6.93544721404305e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.979515366256237, | |
| "step": 1396 | |
| }, | |
| { | |
| "epoch": 8.17008797653959, | |
| "grad_norm": 0.33114852320813654, | |
| "learning_rate": 6.916705281579612e-06, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.9791374951601028, | |
| "step": 1397 | |
| }, | |
| { | |
| "epoch": 8.175953079178885, | |
| "grad_norm": 0.29068631223410873, | |
| "learning_rate": 6.898018095384252e-06, | |
| "loss": 0.075, | |
| "mean_token_accuracy": 0.976100243628025, | |
| "step": 1398 | |
| }, | |
| { | |
| "epoch": 8.181818181818182, | |
| "grad_norm": 0.3513269186709242, | |
| "learning_rate": 6.879385723283913e-06, | |
| "loss": 0.0624, | |
| "mean_token_accuracy": 0.9797561913728714, | |
| "step": 1399 | |
| }, | |
| { | |
| "epoch": 8.187683284457478, | |
| "grad_norm": 0.23415965488620152, | |
| "learning_rate": 6.8608082329065775e-06, | |
| "loss": 0.061, | |
| "mean_token_accuracy": 0.9810163378715515, | |
| "step": 1400 | |
| }, | |
| { | |
| "epoch": 8.193548387096774, | |
| "grad_norm": 0.2887897085336456, | |
| "learning_rate": 6.842285691681032e-06, | |
| "loss": 0.0702, | |
| "mean_token_accuracy": 0.9803082495927811, | |
| "step": 1401 | |
| }, | |
| { | |
| "epoch": 8.19941348973607, | |
| "grad_norm": 0.31338662556740365, | |
| "learning_rate": 6.8238181668366244e-06, | |
| "loss": 0.0544, | |
| "mean_token_accuracy": 0.9798052236437798, | |
| "step": 1402 | |
| }, | |
| { | |
| "epoch": 8.205278592375366, | |
| "grad_norm": 0.26118312597759613, | |
| "learning_rate": 6.805405725403006e-06, | |
| "loss": 0.069, | |
| "mean_token_accuracy": 0.9780385494232178, | |
| "step": 1403 | |
| }, | |
| { | |
| "epoch": 8.211143695014663, | |
| "grad_norm": 0.25966801207442913, | |
| "learning_rate": 6.787048434209906e-06, | |
| "loss": 0.0642, | |
| "mean_token_accuracy": 0.9776748195290565, | |
| "step": 1404 | |
| }, | |
| { | |
| "epoch": 8.217008797653959, | |
| "grad_norm": 0.2944460445098481, | |
| "learning_rate": 6.768746359886882e-06, | |
| "loss": 0.0635, | |
| "mean_token_accuracy": 0.9799527376890182, | |
| "step": 1405 | |
| }, | |
| { | |
| "epoch": 8.222873900293255, | |
| "grad_norm": 0.25208455985617567, | |
| "learning_rate": 6.750499568863061e-06, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9808372110128403, | |
| "step": 1406 | |
| }, | |
| { | |
| "epoch": 8.228739002932551, | |
| "grad_norm": 0.28741775439220846, | |
| "learning_rate": 6.732308127366931e-06, | |
| "loss": 0.0734, | |
| "mean_token_accuracy": 0.9774434566497803, | |
| "step": 1407 | |
| }, | |
| { | |
| "epoch": 8.234604105571847, | |
| "grad_norm": 0.3289864539935258, | |
| "learning_rate": 6.714172101426077e-06, | |
| "loss": 0.0683, | |
| "mean_token_accuracy": 0.9762885868549347, | |
| "step": 1408 | |
| }, | |
| { | |
| "epoch": 8.240469208211143, | |
| "grad_norm": 0.23736345312152154, | |
| "learning_rate": 6.696091556866948e-06, | |
| "loss": 0.0511, | |
| "mean_token_accuracy": 0.9827252328395844, | |
| "step": 1409 | |
| }, | |
| { | |
| "epoch": 8.24633431085044, | |
| "grad_norm": 0.2977687940939231, | |
| "learning_rate": 6.678066559314622e-06, | |
| "loss": 0.0722, | |
| "mean_token_accuracy": 0.9760833904147148, | |
| "step": 1410 | |
| }, | |
| { | |
| "epoch": 8.252199413489736, | |
| "grad_norm": 0.3982183165411704, | |
| "learning_rate": 6.660097174192556e-06, | |
| "loss": 0.0674, | |
| "mean_token_accuracy": 0.9767892211675644, | |
| "step": 1411 | |
| }, | |
| { | |
| "epoch": 8.258064516129032, | |
| "grad_norm": 0.24845009306240756, | |
| "learning_rate": 6.642183466722363e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9782160073518753, | |
| "step": 1412 | |
| }, | |
| { | |
| "epoch": 8.263929618768328, | |
| "grad_norm": 0.25047139942879637, | |
| "learning_rate": 6.624325501923565e-06, | |
| "loss": 0.0656, | |
| "mean_token_accuracy": 0.9789937734603882, | |
| "step": 1413 | |
| }, | |
| { | |
| "epoch": 8.269794721407624, | |
| "grad_norm": 0.3582552285931662, | |
| "learning_rate": 6.606523344613362e-06, | |
| "loss": 0.0728, | |
| "mean_token_accuracy": 0.974414773285389, | |
| "step": 1414 | |
| }, | |
| { | |
| "epoch": 8.27565982404692, | |
| "grad_norm": 0.23281831399691813, | |
| "learning_rate": 6.588777059406397e-06, | |
| "loss": 0.0633, | |
| "mean_token_accuracy": 0.9821778386831284, | |
| "step": 1415 | |
| }, | |
| { | |
| "epoch": 8.281524926686217, | |
| "grad_norm": 0.21783610764412928, | |
| "learning_rate": 6.571086710714516e-06, | |
| "loss": 0.0523, | |
| "mean_token_accuracy": 0.9819829240441322, | |
| "step": 1416 | |
| }, | |
| { | |
| "epoch": 8.287390029325513, | |
| "grad_norm": 0.26618761667003027, | |
| "learning_rate": 6.553452362746543e-06, | |
| "loss": 0.0688, | |
| "mean_token_accuracy": 0.9776479974389076, | |
| "step": 1417 | |
| }, | |
| { | |
| "epoch": 8.29325513196481, | |
| "grad_norm": 0.3307461688951337, | |
| "learning_rate": 6.5358740795080335e-06, | |
| "loss": 0.0753, | |
| "mean_token_accuracy": 0.9744124263525009, | |
| "step": 1418 | |
| }, | |
| { | |
| "epoch": 8.299120234604105, | |
| "grad_norm": 0.3877945336373047, | |
| "learning_rate": 6.518351924801061e-06, | |
| "loss": 0.0712, | |
| "mean_token_accuracy": 0.9782202020287514, | |
| "step": 1419 | |
| }, | |
| { | |
| "epoch": 8.304985337243401, | |
| "grad_norm": 0.21829692333074224, | |
| "learning_rate": 6.500885962223969e-06, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9835516288876534, | |
| "step": 1420 | |
| }, | |
| { | |
| "epoch": 8.310850439882698, | |
| "grad_norm": 0.2737695133055407, | |
| "learning_rate": 6.483476255171146e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9777635931968689, | |
| "step": 1421 | |
| }, | |
| { | |
| "epoch": 8.316715542521994, | |
| "grad_norm": 0.3156742313219525, | |
| "learning_rate": 6.4661228668328015e-06, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9810267016291618, | |
| "step": 1422 | |
| }, | |
| { | |
| "epoch": 8.32258064516129, | |
| "grad_norm": 0.29043044693215986, | |
| "learning_rate": 6.448825860194722e-06, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.9785061553120613, | |
| "step": 1423 | |
| }, | |
| { | |
| "epoch": 8.328445747800586, | |
| "grad_norm": 0.20232305876185708, | |
| "learning_rate": 6.431585298038057e-06, | |
| "loss": 0.0474, | |
| "mean_token_accuracy": 0.9857818335294724, | |
| "step": 1424 | |
| }, | |
| { | |
| "epoch": 8.334310850439882, | |
| "grad_norm": 0.2604739184613935, | |
| "learning_rate": 6.414401242939087e-06, | |
| "loss": 0.0682, | |
| "mean_token_accuracy": 0.9787532687187195, | |
| "step": 1425 | |
| }, | |
| { | |
| "epoch": 8.340175953079179, | |
| "grad_norm": 0.28996723077653824, | |
| "learning_rate": 6.397273757268987e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.981784000992775, | |
| "step": 1426 | |
| }, | |
| { | |
| "epoch": 8.346041055718475, | |
| "grad_norm": 0.2826379503854459, | |
| "learning_rate": 6.380202903193616e-06, | |
| "loss": 0.0741, | |
| "mean_token_accuracy": 0.9775265082716942, | |
| "step": 1427 | |
| }, | |
| { | |
| "epoch": 8.351906158357771, | |
| "grad_norm": 0.3247668563192037, | |
| "learning_rate": 6.363188742673281e-06, | |
| "loss": 0.0656, | |
| "mean_token_accuracy": 0.978668600320816, | |
| "step": 1428 | |
| }, | |
| { | |
| "epoch": 8.357771260997067, | |
| "grad_norm": 0.2769284132122931, | |
| "learning_rate": 6.346231337462513e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9785289466381073, | |
| "step": 1429 | |
| }, | |
| { | |
| "epoch": 8.363636363636363, | |
| "grad_norm": 0.35215981895710224, | |
| "learning_rate": 6.329330749109839e-06, | |
| "loss": 0.0798, | |
| "mean_token_accuracy": 0.9739778935909271, | |
| "step": 1430 | |
| }, | |
| { | |
| "epoch": 8.36950146627566, | |
| "grad_norm": 0.2895199872166017, | |
| "learning_rate": 6.312487038957573e-06, | |
| "loss": 0.0661, | |
| "mean_token_accuracy": 0.9788866117596626, | |
| "step": 1431 | |
| }, | |
| { | |
| "epoch": 8.375366568914956, | |
| "grad_norm": 0.30231800425923677, | |
| "learning_rate": 6.295700268141579e-06, | |
| "loss": 0.0573, | |
| "mean_token_accuracy": 0.981073260307312, | |
| "step": 1432 | |
| }, | |
| { | |
| "epoch": 8.381231671554252, | |
| "grad_norm": 0.2861524694796837, | |
| "learning_rate": 6.2789704975910574e-06, | |
| "loss": 0.0551, | |
| "mean_token_accuracy": 0.9809359386563301, | |
| "step": 1433 | |
| }, | |
| { | |
| "epoch": 8.387096774193548, | |
| "grad_norm": 0.27965800064820345, | |
| "learning_rate": 6.262297788028316e-06, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9791011437773705, | |
| "step": 1434 | |
| }, | |
| { | |
| "epoch": 8.392961876832844, | |
| "grad_norm": 0.2567178381963598, | |
| "learning_rate": 6.245682199968556e-06, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.977754257619381, | |
| "step": 1435 | |
| }, | |
| { | |
| "epoch": 8.39882697947214, | |
| "grad_norm": 0.252089258921102, | |
| "learning_rate": 6.229123793719656e-06, | |
| "loss": 0.0629, | |
| "mean_token_accuracy": 0.9788324162364006, | |
| "step": 1436 | |
| }, | |
| { | |
| "epoch": 8.404692082111437, | |
| "grad_norm": 0.2498020291209392, | |
| "learning_rate": 6.21262262938194e-06, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9818584844470024, | |
| "step": 1437 | |
| }, | |
| { | |
| "epoch": 8.410557184750733, | |
| "grad_norm": 0.3067068353713511, | |
| "learning_rate": 6.196178766847969e-06, | |
| "loss": 0.0635, | |
| "mean_token_accuracy": 0.977524109184742, | |
| "step": 1438 | |
| }, | |
| { | |
| "epoch": 8.416422287390029, | |
| "grad_norm": 0.279049594579598, | |
| "learning_rate": 6.1797922658023264e-06, | |
| "loss": 0.0748, | |
| "mean_token_accuracy": 0.9750313237309456, | |
| "step": 1439 | |
| }, | |
| { | |
| "epoch": 8.422287390029325, | |
| "grad_norm": 0.2428016087284499, | |
| "learning_rate": 6.16346318572139e-06, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9801136925816536, | |
| "step": 1440 | |
| }, | |
| { | |
| "epoch": 8.428152492668621, | |
| "grad_norm": 0.3642333111467534, | |
| "learning_rate": 6.147191585873128e-06, | |
| "loss": 0.0722, | |
| "mean_token_accuracy": 0.9765574038028717, | |
| "step": 1441 | |
| }, | |
| { | |
| "epoch": 8.434017595307918, | |
| "grad_norm": 0.2452302120598073, | |
| "learning_rate": 6.130977525316878e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9802148938179016, | |
| "step": 1442 | |
| }, | |
| { | |
| "epoch": 8.439882697947214, | |
| "grad_norm": 0.23546548291503536, | |
| "learning_rate": 6.114821062903125e-06, | |
| "loss": 0.0634, | |
| "mean_token_accuracy": 0.9796994179487228, | |
| "step": 1443 | |
| }, | |
| { | |
| "epoch": 8.44574780058651, | |
| "grad_norm": 0.34248040168779487, | |
| "learning_rate": 6.098722257273303e-06, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9773654565215111, | |
| "step": 1444 | |
| }, | |
| { | |
| "epoch": 8.451612903225806, | |
| "grad_norm": 0.3267948785376941, | |
| "learning_rate": 6.082681166859579e-06, | |
| "loss": 0.074, | |
| "mean_token_accuracy": 0.9782762229442596, | |
| "step": 1445 | |
| }, | |
| { | |
| "epoch": 8.457478005865102, | |
| "grad_norm": 0.24944133665569335, | |
| "learning_rate": 6.066697849884629e-06, | |
| "loss": 0.0678, | |
| "mean_token_accuracy": 0.9791956692934036, | |
| "step": 1446 | |
| }, | |
| { | |
| "epoch": 8.463343108504398, | |
| "grad_norm": 0.22140226528324586, | |
| "learning_rate": 6.0507723643614415e-06, | |
| "loss": 0.0484, | |
| "mean_token_accuracy": 0.9839403405785561, | |
| "step": 1447 | |
| }, | |
| { | |
| "epoch": 8.469208211143695, | |
| "grad_norm": 0.3931794295766522, | |
| "learning_rate": 6.034904768093095e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.9783346280455589, | |
| "step": 1448 | |
| }, | |
| { | |
| "epoch": 8.47507331378299, | |
| "grad_norm": 0.28221103656025875, | |
| "learning_rate": 6.019095118672557e-06, | |
| "loss": 0.072, | |
| "mean_token_accuracy": 0.9760680794715881, | |
| "step": 1449 | |
| }, | |
| { | |
| "epoch": 8.480938416422287, | |
| "grad_norm": 0.35500720339693564, | |
| "learning_rate": 6.003343473482469e-06, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9795534163713455, | |
| "step": 1450 | |
| }, | |
| { | |
| "epoch": 8.486803519061583, | |
| "grad_norm": 0.44206392730367217, | |
| "learning_rate": 5.98764988969494e-06, | |
| "loss": 0.0712, | |
| "mean_token_accuracy": 0.9759310409426689, | |
| "step": 1451 | |
| }, | |
| { | |
| "epoch": 8.49266862170088, | |
| "grad_norm": 0.2884122822560015, | |
| "learning_rate": 5.972014424271344e-06, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.9818243160843849, | |
| "step": 1452 | |
| }, | |
| { | |
| "epoch": 8.498533724340176, | |
| "grad_norm": 0.24169715217818574, | |
| "learning_rate": 5.956437133962103e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9796706140041351, | |
| "step": 1453 | |
| }, | |
| { | |
| "epoch": 8.504398826979472, | |
| "grad_norm": 0.3567541736942327, | |
| "learning_rate": 5.94091807530649e-06, | |
| "loss": 0.0676, | |
| "mean_token_accuracy": 0.976375125348568, | |
| "step": 1454 | |
| }, | |
| { | |
| "epoch": 8.510263929618768, | |
| "grad_norm": 0.2660628076236119, | |
| "learning_rate": 5.925457304632421e-06, | |
| "loss": 0.068, | |
| "mean_token_accuracy": 0.9772655889391899, | |
| "step": 1455 | |
| }, | |
| { | |
| "epoch": 8.516129032258064, | |
| "grad_norm": 0.3586723680013531, | |
| "learning_rate": 5.91005487805625e-06, | |
| "loss": 0.0781, | |
| "mean_token_accuracy": 0.975320614874363, | |
| "step": 1456 | |
| }, | |
| { | |
| "epoch": 8.52199413489736, | |
| "grad_norm": 0.2531027806770582, | |
| "learning_rate": 5.894710851482563e-06, | |
| "loss": 0.0638, | |
| "mean_token_accuracy": 0.9815174117684364, | |
| "step": 1457 | |
| }, | |
| { | |
| "epoch": 8.527859237536656, | |
| "grad_norm": 0.25474047304957326, | |
| "learning_rate": 5.879425280603981e-06, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9790660366415977, | |
| "step": 1458 | |
| }, | |
| { | |
| "epoch": 8.533724340175953, | |
| "grad_norm": 0.29693469905798703, | |
| "learning_rate": 5.864198220900952e-06, | |
| "loss": 0.0593, | |
| "mean_token_accuracy": 0.9797492399811745, | |
| "step": 1459 | |
| }, | |
| { | |
| "epoch": 8.539589442815249, | |
| "grad_norm": 0.28442650282700044, | |
| "learning_rate": 5.849029727641552e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9790033251047134, | |
| "step": 1460 | |
| }, | |
| { | |
| "epoch": 8.545454545454545, | |
| "grad_norm": 0.29862524636179977, | |
| "learning_rate": 5.833919855881286e-06, | |
| "loss": 0.0687, | |
| "mean_token_accuracy": 0.9779817909002304, | |
| "step": 1461 | |
| }, | |
| { | |
| "epoch": 8.551319648093841, | |
| "grad_norm": 0.2766831332028397, | |
| "learning_rate": 5.818868660462886e-06, | |
| "loss": 0.0615, | |
| "mean_token_accuracy": 0.9803951904177666, | |
| "step": 1462 | |
| }, | |
| { | |
| "epoch": 8.557184750733137, | |
| "grad_norm": 0.23218479374158082, | |
| "learning_rate": 5.803876196016114e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.9823366180062294, | |
| "step": 1463 | |
| }, | |
| { | |
| "epoch": 8.563049853372434, | |
| "grad_norm": 0.2464367593263216, | |
| "learning_rate": 5.788942516957561e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9808334708213806, | |
| "step": 1464 | |
| }, | |
| { | |
| "epoch": 8.56891495601173, | |
| "grad_norm": 0.30192138899810556, | |
| "learning_rate": 5.774067677490448e-06, | |
| "loss": 0.0719, | |
| "mean_token_accuracy": 0.9759406819939613, | |
| "step": 1465 | |
| }, | |
| { | |
| "epoch": 8.574780058651026, | |
| "grad_norm": 0.3158998321878276, | |
| "learning_rate": 5.759251731604435e-06, | |
| "loss": 0.0559, | |
| "mean_token_accuracy": 0.9803323373198509, | |
| "step": 1466 | |
| }, | |
| { | |
| "epoch": 8.580645161290322, | |
| "grad_norm": 0.31413085082208875, | |
| "learning_rate": 5.744494733075424e-06, | |
| "loss": 0.067, | |
| "mean_token_accuracy": 0.9777957648038864, | |
| "step": 1467 | |
| }, | |
| { | |
| "epoch": 8.586510263929618, | |
| "grad_norm": 0.2555349585045622, | |
| "learning_rate": 5.729796735465359e-06, | |
| "loss": 0.0647, | |
| "mean_token_accuracy": 0.9778914302587509, | |
| "step": 1468 | |
| }, | |
| { | |
| "epoch": 8.592375366568914, | |
| "grad_norm": 0.26756135754205695, | |
| "learning_rate": 5.7151577921220356e-06, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9788284227252007, | |
| "step": 1469 | |
| }, | |
| { | |
| "epoch": 8.59824046920821, | |
| "grad_norm": 0.244072542005457, | |
| "learning_rate": 5.7005779561789046e-06, | |
| "loss": 0.0536, | |
| "mean_token_accuracy": 0.9816020429134369, | |
| "step": 1470 | |
| }, | |
| { | |
| "epoch": 8.604105571847507, | |
| "grad_norm": 0.23753259941339863, | |
| "learning_rate": 5.686057280554882e-06, | |
| "loss": 0.0588, | |
| "mean_token_accuracy": 0.9804639518260956, | |
| "step": 1471 | |
| }, | |
| { | |
| "epoch": 8.609970674486803, | |
| "grad_norm": 0.2801786700225434, | |
| "learning_rate": 5.671595817954157e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9800218939781189, | |
| "step": 1472 | |
| }, | |
| { | |
| "epoch": 8.6158357771261, | |
| "grad_norm": 0.2970190736822563, | |
| "learning_rate": 5.657193620865997e-06, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9817324727773666, | |
| "step": 1473 | |
| }, | |
| { | |
| "epoch": 8.621700879765395, | |
| "grad_norm": 0.36583022561839285, | |
| "learning_rate": 5.642850741564562e-06, | |
| "loss": 0.0714, | |
| "mean_token_accuracy": 0.9777909740805626, | |
| "step": 1474 | |
| }, | |
| { | |
| "epoch": 8.627565982404692, | |
| "grad_norm": 0.2688933336929723, | |
| "learning_rate": 5.62856723210871e-06, | |
| "loss": 0.0669, | |
| "mean_token_accuracy": 0.9783223196864128, | |
| "step": 1475 | |
| }, | |
| { | |
| "epoch": 8.633431085043988, | |
| "grad_norm": 0.3036719866897421, | |
| "learning_rate": 5.614343144341814e-06, | |
| "loss": 0.07, | |
| "mean_token_accuracy": 0.9752448201179504, | |
| "step": 1476 | |
| }, | |
| { | |
| "epoch": 8.639296187683284, | |
| "grad_norm": 0.2737493833380235, | |
| "learning_rate": 5.600178529891564e-06, | |
| "loss": 0.0581, | |
| "mean_token_accuracy": 0.9796081408858299, | |
| "step": 1477 | |
| }, | |
| { | |
| "epoch": 8.64516129032258, | |
| "grad_norm": 0.3011765116046612, | |
| "learning_rate": 5.58607344016979e-06, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9739836007356644, | |
| "step": 1478 | |
| }, | |
| { | |
| "epoch": 8.651026392961876, | |
| "grad_norm": 0.26931220609417367, | |
| "learning_rate": 5.5720279263722795e-06, | |
| "loss": 0.0613, | |
| "mean_token_accuracy": 0.9778261408209801, | |
| "step": 1479 | |
| }, | |
| { | |
| "epoch": 8.656891495601172, | |
| "grad_norm": 0.28849384987997934, | |
| "learning_rate": 5.558042039478564e-06, | |
| "loss": 0.0604, | |
| "mean_token_accuracy": 0.979794979095459, | |
| "step": 1480 | |
| }, | |
| { | |
| "epoch": 8.662756598240469, | |
| "grad_norm": 0.4481433032150989, | |
| "learning_rate": 5.544115830251769e-06, | |
| "loss": 0.0769, | |
| "mean_token_accuracy": 0.9757302552461624, | |
| "step": 1481 | |
| }, | |
| { | |
| "epoch": 8.668621700879765, | |
| "grad_norm": 0.24979918369391133, | |
| "learning_rate": 5.530249349238407e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9798487946391106, | |
| "step": 1482 | |
| }, | |
| { | |
| "epoch": 8.674486803519061, | |
| "grad_norm": 0.41387409378504497, | |
| "learning_rate": 5.516442646768207e-06, | |
| "loss": 0.0753, | |
| "mean_token_accuracy": 0.973071850836277, | |
| "step": 1483 | |
| }, | |
| { | |
| "epoch": 8.680351906158357, | |
| "grad_norm": 0.25828185399248105, | |
| "learning_rate": 5.502695772953922e-06, | |
| "loss": 0.0724, | |
| "mean_token_accuracy": 0.9765612930059433, | |
| "step": 1484 | |
| }, | |
| { | |
| "epoch": 8.686217008797653, | |
| "grad_norm": 0.26377515536943485, | |
| "learning_rate": 5.489008777691151e-06, | |
| "loss": 0.0633, | |
| "mean_token_accuracy": 0.9816120192408562, | |
| "step": 1485 | |
| }, | |
| { | |
| "epoch": 8.69208211143695, | |
| "grad_norm": 0.33227377501750116, | |
| "learning_rate": 5.475381710658161e-06, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9787192866206169, | |
| "step": 1486 | |
| }, | |
| { | |
| "epoch": 8.697947214076246, | |
| "grad_norm": 0.2817841011781313, | |
| "learning_rate": 5.4618146213157e-06, | |
| "loss": 0.0738, | |
| "mean_token_accuracy": 0.9738593846559525, | |
| "step": 1487 | |
| }, | |
| { | |
| "epoch": 8.703812316715542, | |
| "grad_norm": 0.35285024202029736, | |
| "learning_rate": 5.448307558906822e-06, | |
| "loss": 0.0704, | |
| "mean_token_accuracy": 0.9770059287548065, | |
| "step": 1488 | |
| }, | |
| { | |
| "epoch": 8.709677419354838, | |
| "grad_norm": 0.2562314555826694, | |
| "learning_rate": 5.434860572456711e-06, | |
| "loss": 0.0625, | |
| "mean_token_accuracy": 0.9778687655925751, | |
| "step": 1489 | |
| }, | |
| { | |
| "epoch": 8.715542521994134, | |
| "grad_norm": 0.2685199038032354, | |
| "learning_rate": 5.421473710772496e-06, | |
| "loss": 0.0656, | |
| "mean_token_accuracy": 0.9799297451972961, | |
| "step": 1490 | |
| }, | |
| { | |
| "epoch": 8.72140762463343, | |
| "grad_norm": 0.23958679172858152, | |
| "learning_rate": 5.408147022443077e-06, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.9788787066936493, | |
| "step": 1491 | |
| }, | |
| { | |
| "epoch": 8.727272727272727, | |
| "grad_norm": 0.37532852839555414, | |
| "learning_rate": 5.39488055583895e-06, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9793713614344597, | |
| "step": 1492 | |
| }, | |
| { | |
| "epoch": 8.733137829912023, | |
| "grad_norm": 0.30168346493046544, | |
| "learning_rate": 5.3816743591120365e-06, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9784124940633774, | |
| "step": 1493 | |
| }, | |
| { | |
| "epoch": 8.739002932551319, | |
| "grad_norm": 0.32283133480358384, | |
| "learning_rate": 5.368528480195492e-06, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9787509068846703, | |
| "step": 1494 | |
| }, | |
| { | |
| "epoch": 8.744868035190615, | |
| "grad_norm": 0.18242889955187325, | |
| "learning_rate": 5.355442966803544e-06, | |
| "loss": 0.0514, | |
| "mean_token_accuracy": 0.9821577444672585, | |
| "step": 1495 | |
| }, | |
| { | |
| "epoch": 8.750733137829911, | |
| "grad_norm": 0.3571788534447416, | |
| "learning_rate": 5.342417866431326e-06, | |
| "loss": 0.073, | |
| "mean_token_accuracy": 0.9728782624006271, | |
| "step": 1496 | |
| }, | |
| { | |
| "epoch": 8.756598240469208, | |
| "grad_norm": 0.32907123108892783, | |
| "learning_rate": 5.329453226354692e-06, | |
| "loss": 0.068, | |
| "mean_token_accuracy": 0.9787439778447151, | |
| "step": 1497 | |
| }, | |
| { | |
| "epoch": 8.762463343108504, | |
| "grad_norm": 0.23510121292718625, | |
| "learning_rate": 5.31654909363005e-06, | |
| "loss": 0.0637, | |
| "mean_token_accuracy": 0.98048335313797, | |
| "step": 1498 | |
| }, | |
| { | |
| "epoch": 8.7683284457478, | |
| "grad_norm": 0.3292238101609866, | |
| "learning_rate": 5.303705515094187e-06, | |
| "loss": 0.0817, | |
| "mean_token_accuracy": 0.9763019159436226, | |
| "step": 1499 | |
| }, | |
| { | |
| "epoch": 8.774193548387096, | |
| "grad_norm": 0.31772186010158904, | |
| "learning_rate": 5.290922537364109e-06, | |
| "loss": 0.0784, | |
| "mean_token_accuracy": 0.9719724953174591, | |
| "step": 1500 | |
| }, | |
| { | |
| "epoch": 8.780058651026392, | |
| "grad_norm": 0.43601699445852304, | |
| "learning_rate": 5.278200206836861e-06, | |
| "loss": 0.0697, | |
| "mean_token_accuracy": 0.9765214696526527, | |
| "step": 1501 | |
| }, | |
| { | |
| "epoch": 8.785923753665688, | |
| "grad_norm": 0.24631726988703956, | |
| "learning_rate": 5.265538569689365e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9786100387573242, | |
| "step": 1502 | |
| }, | |
| { | |
| "epoch": 8.791788856304985, | |
| "grad_norm": 0.23348947782863988, | |
| "learning_rate": 5.25293767187825e-06, | |
| "loss": 0.0599, | |
| "mean_token_accuracy": 0.9810864999890327, | |
| "step": 1503 | |
| }, | |
| { | |
| "epoch": 8.79765395894428, | |
| "grad_norm": 0.3059628956328224, | |
| "learning_rate": 5.240397559139685e-06, | |
| "loss": 0.07, | |
| "mean_token_accuracy": 0.9774987623095512, | |
| "step": 1504 | |
| }, | |
| { | |
| "epoch": 8.803519061583577, | |
| "grad_norm": 0.2271743851090028, | |
| "learning_rate": 5.227918276989215e-06, | |
| "loss": 0.0613, | |
| "mean_token_accuracy": 0.9777540266513824, | |
| "step": 1505 | |
| }, | |
| { | |
| "epoch": 8.809384164222873, | |
| "grad_norm": 0.24351844409606843, | |
| "learning_rate": 5.2154998707215976e-06, | |
| "loss": 0.0626, | |
| "mean_token_accuracy": 0.9778146594762802, | |
| "step": 1506 | |
| }, | |
| { | |
| "epoch": 8.81524926686217, | |
| "grad_norm": 0.32814994186809865, | |
| "learning_rate": 5.203142385410628e-06, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.9819656610488892, | |
| "step": 1507 | |
| }, | |
| { | |
| "epoch": 8.821114369501466, | |
| "grad_norm": 0.24451284850690339, | |
| "learning_rate": 5.190845865908987e-06, | |
| "loss": 0.0597, | |
| "mean_token_accuracy": 0.9768688082695007, | |
| "step": 1508 | |
| }, | |
| { | |
| "epoch": 8.826979472140762, | |
| "grad_norm": 0.35278726245532943, | |
| "learning_rate": 5.178610356848075e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.977623425424099, | |
| "step": 1509 | |
| }, | |
| { | |
| "epoch": 8.832844574780058, | |
| "grad_norm": 0.26703275114310665, | |
| "learning_rate": 5.166435902637848e-06, | |
| "loss": 0.0577, | |
| "mean_token_accuracy": 0.9784714952111244, | |
| "step": 1510 | |
| }, | |
| { | |
| "epoch": 8.838709677419354, | |
| "grad_norm": 0.248994392474072, | |
| "learning_rate": 5.154322547466658e-06, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.9805686995387077, | |
| "step": 1511 | |
| }, | |
| { | |
| "epoch": 8.84457478005865, | |
| "grad_norm": 0.2515813584107477, | |
| "learning_rate": 5.142270335301095e-06, | |
| "loss": 0.0598, | |
| "mean_token_accuracy": 0.9793656393885612, | |
| "step": 1512 | |
| }, | |
| { | |
| "epoch": 8.850439882697946, | |
| "grad_norm": 0.2800317772193338, | |
| "learning_rate": 5.130279309885817e-06, | |
| "loss": 0.0611, | |
| "mean_token_accuracy": 0.9782811179757118, | |
| "step": 1513 | |
| }, | |
| { | |
| "epoch": 8.856304985337243, | |
| "grad_norm": 0.3103625834439301, | |
| "learning_rate": 5.118349514743404e-06, | |
| "loss": 0.0761, | |
| "mean_token_accuracy": 0.9753215909004211, | |
| "step": 1514 | |
| }, | |
| { | |
| "epoch": 8.862170087976539, | |
| "grad_norm": 0.3526394713823649, | |
| "learning_rate": 5.1064809931741975e-06, | |
| "loss": 0.08, | |
| "mean_token_accuracy": 0.9744403213262558, | |
| "step": 1515 | |
| }, | |
| { | |
| "epoch": 8.868035190615835, | |
| "grad_norm": 0.2560684899292793, | |
| "learning_rate": 5.094673788256137e-06, | |
| "loss": 0.0603, | |
| "mean_token_accuracy": 0.9820585176348686, | |
| "step": 1516 | |
| }, | |
| { | |
| "epoch": 8.873900293255131, | |
| "grad_norm": 0.29971977388113474, | |
| "learning_rate": 5.082927942844603e-06, | |
| "loss": 0.0725, | |
| "mean_token_accuracy": 0.9782021939754486, | |
| "step": 1517 | |
| }, | |
| { | |
| "epoch": 8.879765395894427, | |
| "grad_norm": 0.2637842941875261, | |
| "learning_rate": 5.0712434995722734e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9771728515625, | |
| "step": 1518 | |
| }, | |
| { | |
| "epoch": 8.885630498533724, | |
| "grad_norm": 0.314860671900927, | |
| "learning_rate": 5.059620500848964e-06, | |
| "loss": 0.067, | |
| "mean_token_accuracy": 0.9802365675568581, | |
| "step": 1519 | |
| }, | |
| { | |
| "epoch": 8.89149560117302, | |
| "grad_norm": 0.2764897492165904, | |
| "learning_rate": 5.048058988861455e-06, | |
| "loss": 0.066, | |
| "mean_token_accuracy": 0.9786038100719452, | |
| "step": 1520 | |
| }, | |
| { | |
| "epoch": 8.897360703812316, | |
| "grad_norm": 0.22492681630529185, | |
| "learning_rate": 5.0365590055733715e-06, | |
| "loss": 0.0603, | |
| "mean_token_accuracy": 0.9813904538750648, | |
| "step": 1521 | |
| }, | |
| { | |
| "epoch": 8.903225806451612, | |
| "grad_norm": 0.3147598263869332, | |
| "learning_rate": 5.025120592725009e-06, | |
| "loss": 0.0735, | |
| "mean_token_accuracy": 0.9776310920715332, | |
| "step": 1522 | |
| }, | |
| { | |
| "epoch": 8.909090909090908, | |
| "grad_norm": 0.34074469808815616, | |
| "learning_rate": 5.013743791833187e-06, | |
| "loss": 0.0694, | |
| "mean_token_accuracy": 0.979078084230423, | |
| "step": 1523 | |
| }, | |
| { | |
| "epoch": 8.914956011730204, | |
| "grad_norm": 0.26152297778605377, | |
| "learning_rate": 5.002428644191094e-06, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9802012667059898, | |
| "step": 1524 | |
| }, | |
| { | |
| "epoch": 8.9208211143695, | |
| "grad_norm": 0.2578124682100099, | |
| "learning_rate": 4.991175190868148e-06, | |
| "loss": 0.0684, | |
| "mean_token_accuracy": 0.9790066555142403, | |
| "step": 1525 | |
| }, | |
| { | |
| "epoch": 8.926686217008797, | |
| "grad_norm": 0.2376683233781485, | |
| "learning_rate": 4.9799834727098415e-06, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9810256138443947, | |
| "step": 1526 | |
| }, | |
| { | |
| "epoch": 8.932551319648093, | |
| "grad_norm": 0.28798216028853235, | |
| "learning_rate": 4.968853530337587e-06, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9783130586147308, | |
| "step": 1527 | |
| }, | |
| { | |
| "epoch": 8.93841642228739, | |
| "grad_norm": 0.21284234625947226, | |
| "learning_rate": 4.957785404148585e-06, | |
| "loss": 0.0564, | |
| "mean_token_accuracy": 0.9768381863832474, | |
| "step": 1528 | |
| }, | |
| { | |
| "epoch": 8.944281524926687, | |
| "grad_norm": 0.28902188952693547, | |
| "learning_rate": 4.946779134315662e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9778866022825241, | |
| "step": 1529 | |
| }, | |
| { | |
| "epoch": 8.950146627565982, | |
| "grad_norm": 0.24256775933774233, | |
| "learning_rate": 4.935834760787133e-06, | |
| "loss": 0.0648, | |
| "mean_token_accuracy": 0.9800405576825142, | |
| "step": 1530 | |
| }, | |
| { | |
| "epoch": 8.95601173020528, | |
| "grad_norm": 0.2761848112737178, | |
| "learning_rate": 4.924952323286651e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9786167815327644, | |
| "step": 1531 | |
| }, | |
| { | |
| "epoch": 8.961876832844574, | |
| "grad_norm": 0.29420548049004036, | |
| "learning_rate": 4.91413186131307e-06, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.979533426463604, | |
| "step": 1532 | |
| }, | |
| { | |
| "epoch": 8.967741935483872, | |
| "grad_norm": 0.26436955721941907, | |
| "learning_rate": 4.9033734141402964e-06, | |
| "loss": 0.0671, | |
| "mean_token_accuracy": 0.9764868766069412, | |
| "step": 1533 | |
| }, | |
| { | |
| "epoch": 8.973607038123166, | |
| "grad_norm": 0.2454668416645823, | |
| "learning_rate": 4.892677020817151e-06, | |
| "loss": 0.0626, | |
| "mean_token_accuracy": 0.9776885583996773, | |
| "step": 1534 | |
| }, | |
| { | |
| "epoch": 8.979472140762464, | |
| "grad_norm": 0.274662605031312, | |
| "learning_rate": 4.8820427201672195e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.9772161841392517, | |
| "step": 1535 | |
| }, | |
| { | |
| "epoch": 8.985337243401759, | |
| "grad_norm": 0.46063707991159386, | |
| "learning_rate": 4.871470550788717e-06, | |
| "loss": 0.0773, | |
| "mean_token_accuracy": 0.9726253524422646, | |
| "step": 1536 | |
| }, | |
| { | |
| "epoch": 8.991202346041057, | |
| "grad_norm": 0.2598327356469058, | |
| "learning_rate": 4.860960551054352e-06, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9794004708528519, | |
| "step": 1537 | |
| }, | |
| { | |
| "epoch": 8.997067448680351, | |
| "grad_norm": 0.22718132289500315, | |
| "learning_rate": 4.850512759111177e-06, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.9775163903832436, | |
| "step": 1538 | |
| }, | |
| { | |
| "epoch": 9.0, | |
| "grad_norm": 0.22718132289500315, | |
| "learning_rate": 4.840127212880457e-06, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.9819877296686172, | |
| "step": 1539 | |
| }, | |
| { | |
| "epoch": 9.005865102639296, | |
| "grad_norm": 0.3373449133841611, | |
| "learning_rate": 4.82980395005753e-06, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9807324483990669, | |
| "step": 1540 | |
| }, | |
| { | |
| "epoch": 9.011730205278592, | |
| "grad_norm": 0.2660171293852756, | |
| "learning_rate": 4.8195430081116715e-06, | |
| "loss": 0.0652, | |
| "mean_token_accuracy": 0.9796838536858559, | |
| "step": 1541 | |
| }, | |
| { | |
| "epoch": 9.017595307917889, | |
| "grad_norm": 0.22636028477722886, | |
| "learning_rate": 4.809344424285959e-06, | |
| "loss": 0.0533, | |
| "mean_token_accuracy": 0.9821007177233696, | |
| "step": 1542 | |
| }, | |
| { | |
| "epoch": 9.023460410557185, | |
| "grad_norm": 0.26586548560149015, | |
| "learning_rate": 4.799208235597129e-06, | |
| "loss": 0.0677, | |
| "mean_token_accuracy": 0.9777081608772278, | |
| "step": 1543 | |
| }, | |
| { | |
| "epoch": 9.029325513196481, | |
| "grad_norm": 0.2777149628567429, | |
| "learning_rate": 4.7891344788354535e-06, | |
| "loss": 0.0635, | |
| "mean_token_accuracy": 0.9795635268092155, | |
| "step": 1544 | |
| }, | |
| { | |
| "epoch": 9.035190615835777, | |
| "grad_norm": 0.3049973622375955, | |
| "learning_rate": 4.779123190564601e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9791153743863106, | |
| "step": 1545 | |
| }, | |
| { | |
| "epoch": 9.041055718475073, | |
| "grad_norm": 0.2593981713494098, | |
| "learning_rate": 4.769174407121508e-06, | |
| "loss": 0.0592, | |
| "mean_token_accuracy": 0.9796656146645546, | |
| "step": 1546 | |
| }, | |
| { | |
| "epoch": 9.04692082111437, | |
| "grad_norm": 0.2288825869864337, | |
| "learning_rate": 4.7592881646162336e-06, | |
| "loss": 0.0728, | |
| "mean_token_accuracy": 0.9769270122051239, | |
| "step": 1547 | |
| }, | |
| { | |
| "epoch": 9.052785923753666, | |
| "grad_norm": 0.28065390101886106, | |
| "learning_rate": 4.749464498931852e-06, | |
| "loss": 0.0512, | |
| "mean_token_accuracy": 0.9819205701351166, | |
| "step": 1548 | |
| }, | |
| { | |
| "epoch": 9.058651026392962, | |
| "grad_norm": 0.23404322036009262, | |
| "learning_rate": 4.739703445724296e-06, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.9835373759269714, | |
| "step": 1549 | |
| }, | |
| { | |
| "epoch": 9.064516129032258, | |
| "grad_norm": 0.2359254198323517, | |
| "learning_rate": 4.730005040422253e-06, | |
| "loss": 0.0562, | |
| "mean_token_accuracy": 0.9820079058408737, | |
| "step": 1550 | |
| }, | |
| { | |
| "epoch": 9.070381231671554, | |
| "grad_norm": 0.251373443947105, | |
| "learning_rate": 4.720369318227014e-06, | |
| "loss": 0.0562, | |
| "mean_token_accuracy": 0.9808452799916267, | |
| "step": 1551 | |
| }, | |
| { | |
| "epoch": 9.07624633431085, | |
| "grad_norm": 0.28719476489184603, | |
| "learning_rate": 4.710796314112358e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9796859174966812, | |
| "step": 1552 | |
| }, | |
| { | |
| "epoch": 9.082111436950147, | |
| "grad_norm": 0.2642294407118322, | |
| "learning_rate": 4.701286062824425e-06, | |
| "loss": 0.0582, | |
| "mean_token_accuracy": 0.9804991409182549, | |
| "step": 1553 | |
| }, | |
| { | |
| "epoch": 9.087976539589443, | |
| "grad_norm": 0.2775262974710003, | |
| "learning_rate": 4.691838598881587e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9793780148029327, | |
| "step": 1554 | |
| }, | |
| { | |
| "epoch": 9.093841642228739, | |
| "grad_norm": 0.2170466478297633, | |
| "learning_rate": 4.68245395657432e-06, | |
| "loss": 0.0591, | |
| "mean_token_accuracy": 0.9822722375392914, | |
| "step": 1555 | |
| }, | |
| { | |
| "epoch": 9.099706744868035, | |
| "grad_norm": 0.22836245350689496, | |
| "learning_rate": 4.673132169965089e-06, | |
| "loss": 0.058, | |
| "mean_token_accuracy": 0.9818886816501617, | |
| "step": 1556 | |
| }, | |
| { | |
| "epoch": 9.105571847507331, | |
| "grad_norm": 0.20963749864182238, | |
| "learning_rate": 4.663873272888212e-06, | |
| "loss": 0.0539, | |
| "mean_token_accuracy": 0.9836770445108414, | |
| "step": 1557 | |
| }, | |
| { | |
| "epoch": 9.111436950146627, | |
| "grad_norm": 0.21765369676261487, | |
| "learning_rate": 4.654677298949746e-06, | |
| "loss": 0.0598, | |
| "mean_token_accuracy": 0.9772438853979111, | |
| "step": 1558 | |
| }, | |
| { | |
| "epoch": 9.117302052785924, | |
| "grad_norm": 0.22920276916129523, | |
| "learning_rate": 4.645544281527362e-06, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.9798052459955215, | |
| "step": 1559 | |
| }, | |
| { | |
| "epoch": 9.12316715542522, | |
| "grad_norm": 0.2248070439208056, | |
| "learning_rate": 4.636474253770226e-06, | |
| "loss": 0.0512, | |
| "mean_token_accuracy": 0.9813929721713066, | |
| "step": 1560 | |
| }, | |
| { | |
| "epoch": 9.129032258064516, | |
| "grad_norm": 0.23392902750221167, | |
| "learning_rate": 4.627467248598876e-06, | |
| "loss": 0.0608, | |
| "mean_token_accuracy": 0.9810582026839256, | |
| "step": 1561 | |
| }, | |
| { | |
| "epoch": 9.134897360703812, | |
| "grad_norm": 0.35068933786793133, | |
| "learning_rate": 4.618523298705101e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.9805464595556259, | |
| "step": 1562 | |
| }, | |
| { | |
| "epoch": 9.140762463343108, | |
| "grad_norm": 0.2639000982065526, | |
| "learning_rate": 4.609642436551828e-06, | |
| "loss": 0.0598, | |
| "mean_token_accuracy": 0.9803919643163681, | |
| "step": 1563 | |
| }, | |
| { | |
| "epoch": 9.146627565982405, | |
| "grad_norm": 0.23878300312488335, | |
| "learning_rate": 4.600824694373e-06, | |
| "loss": 0.0547, | |
| "mean_token_accuracy": 0.983349584043026, | |
| "step": 1564 | |
| }, | |
| { | |
| "epoch": 9.1524926686217, | |
| "grad_norm": 0.245763989099376, | |
| "learning_rate": 4.592070104173461e-06, | |
| "loss": 0.0577, | |
| "mean_token_accuracy": 0.9818849116563797, | |
| "step": 1565 | |
| }, | |
| { | |
| "epoch": 9.158357771260997, | |
| "grad_norm": 0.23279489695026834, | |
| "learning_rate": 4.583378697728835e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.979814425110817, | |
| "step": 1566 | |
| }, | |
| { | |
| "epoch": 9.164222873900293, | |
| "grad_norm": 0.2356722070822205, | |
| "learning_rate": 4.574750506585419e-06, | |
| "loss": 0.0548, | |
| "mean_token_accuracy": 0.9795899465680122, | |
| "step": 1567 | |
| }, | |
| { | |
| "epoch": 9.17008797653959, | |
| "grad_norm": 0.3100951878952791, | |
| "learning_rate": 4.566185562060062e-06, | |
| "loss": 0.0645, | |
| "mean_token_accuracy": 0.9785728603601456, | |
| "step": 1568 | |
| }, | |
| { | |
| "epoch": 9.175953079178885, | |
| "grad_norm": 0.23758456109764423, | |
| "learning_rate": 4.557683895240052e-06, | |
| "loss": 0.0715, | |
| "mean_token_accuracy": 0.9810535088181496, | |
| "step": 1569 | |
| }, | |
| { | |
| "epoch": 9.181818181818182, | |
| "grad_norm": 0.397590962919886, | |
| "learning_rate": 4.549245536983009e-06, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9803250953555107, | |
| "step": 1570 | |
| }, | |
| { | |
| "epoch": 9.187683284457478, | |
| "grad_norm": 0.2709295526230137, | |
| "learning_rate": 4.540870517916765e-06, | |
| "loss": 0.0604, | |
| "mean_token_accuracy": 0.9819755181670189, | |
| "step": 1571 | |
| }, | |
| { | |
| "epoch": 9.193548387096774, | |
| "grad_norm": 0.26365206186856915, | |
| "learning_rate": 4.532558868439249e-06, | |
| "loss": 0.0625, | |
| "mean_token_accuracy": 0.9806329160928726, | |
| "step": 1572 | |
| }, | |
| { | |
| "epoch": 9.19941348973607, | |
| "grad_norm": 0.23180936574802335, | |
| "learning_rate": 4.524310618718403e-06, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.9812423959374428, | |
| "step": 1573 | |
| }, | |
| { | |
| "epoch": 9.205278592375366, | |
| "grad_norm": 0.2479537384786009, | |
| "learning_rate": 4.516125798692037e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9807397574186325, | |
| "step": 1574 | |
| }, | |
| { | |
| "epoch": 9.211143695014663, | |
| "grad_norm": 0.2703042461210012, | |
| "learning_rate": 4.508004438067742e-06, | |
| "loss": 0.0654, | |
| "mean_token_accuracy": 0.9777510985732079, | |
| "step": 1575 | |
| }, | |
| { | |
| "epoch": 9.217008797653959, | |
| "grad_norm": 0.24641143582704136, | |
| "learning_rate": 4.4999465663227785e-06, | |
| "loss": 0.055, | |
| "mean_token_accuracy": 0.9833300933241844, | |
| "step": 1576 | |
| }, | |
| { | |
| "epoch": 9.222873900293255, | |
| "grad_norm": 0.3188227218692964, | |
| "learning_rate": 4.491952212703964e-06, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9781920313835144, | |
| "step": 1577 | |
| }, | |
| { | |
| "epoch": 9.228739002932551, | |
| "grad_norm": 0.24287084230549283, | |
| "learning_rate": 4.484021406227576e-06, | |
| "loss": 0.0672, | |
| "mean_token_accuracy": 0.9793609604239464, | |
| "step": 1578 | |
| }, | |
| { | |
| "epoch": 9.234604105571847, | |
| "grad_norm": 0.5712888100286926, | |
| "learning_rate": 4.476154175679239e-06, | |
| "loss": 0.0651, | |
| "mean_token_accuracy": 0.9787584096193314, | |
| "step": 1579 | |
| }, | |
| { | |
| "epoch": 9.240469208211143, | |
| "grad_norm": 0.24728813746312206, | |
| "learning_rate": 4.468350549613822e-06, | |
| "loss": 0.0518, | |
| "mean_token_accuracy": 0.9819226488471031, | |
| "step": 1580 | |
| }, | |
| { | |
| "epoch": 9.24633431085044, | |
| "grad_norm": 0.31373054201391787, | |
| "learning_rate": 4.460610556355333e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.9764752313494682, | |
| "step": 1581 | |
| }, | |
| { | |
| "epoch": 9.252199413489736, | |
| "grad_norm": 0.2385955767542648, | |
| "learning_rate": 4.452934223996824e-06, | |
| "loss": 0.0551, | |
| "mean_token_accuracy": 0.982081227004528, | |
| "step": 1582 | |
| }, | |
| { | |
| "epoch": 9.258064516129032, | |
| "grad_norm": 0.22005978869984863, | |
| "learning_rate": 4.445321580400281e-06, | |
| "loss": 0.0577, | |
| "mean_token_accuracy": 0.9792909696698189, | |
| "step": 1583 | |
| }, | |
| { | |
| "epoch": 9.263929618768328, | |
| "grad_norm": 0.24484280057621527, | |
| "learning_rate": 4.437772653196527e-06, | |
| "loss": 0.064, | |
| "mean_token_accuracy": 0.9810920730233192, | |
| "step": 1584 | |
| }, | |
| { | |
| "epoch": 9.269794721407624, | |
| "grad_norm": 0.2744886690898688, | |
| "learning_rate": 4.430287469785118e-06, | |
| "loss": 0.0749, | |
| "mean_token_accuracy": 0.9745521992444992, | |
| "step": 1585 | |
| }, | |
| { | |
| "epoch": 9.27565982404692, | |
| "grad_norm": 0.30860823590247866, | |
| "learning_rate": 4.422866057334246e-06, | |
| "loss": 0.0646, | |
| "mean_token_accuracy": 0.9808021262288094, | |
| "step": 1586 | |
| }, | |
| { | |
| "epoch": 9.281524926686217, | |
| "grad_norm": 0.2704520421355319, | |
| "learning_rate": 4.415508442780642e-06, | |
| "loss": 0.0712, | |
| "mean_token_accuracy": 0.9760096520185471, | |
| "step": 1587 | |
| }, | |
| { | |
| "epoch": 9.287390029325513, | |
| "grad_norm": 0.424541508090357, | |
| "learning_rate": 4.408214652829473e-06, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.9811082854866982, | |
| "step": 1588 | |
| }, | |
| { | |
| "epoch": 9.29325513196481, | |
| "grad_norm": 0.2165123109041628, | |
| "learning_rate": 4.400984713954253e-06, | |
| "loss": 0.0502, | |
| "mean_token_accuracy": 0.9843787923455238, | |
| "step": 1589 | |
| }, | |
| { | |
| "epoch": 9.299120234604105, | |
| "grad_norm": 0.2700478826426142, | |
| "learning_rate": 4.39381865239674e-06, | |
| "loss": 0.0688, | |
| "mean_token_accuracy": 0.9778083339333534, | |
| "step": 1590 | |
| }, | |
| { | |
| "epoch": 9.304985337243401, | |
| "grad_norm": 0.2938143417822229, | |
| "learning_rate": 4.386716494166842e-06, | |
| "loss": 0.0647, | |
| "mean_token_accuracy": 0.9770649150013924, | |
| "step": 1591 | |
| }, | |
| { | |
| "epoch": 9.310850439882698, | |
| "grad_norm": 0.3118340992348746, | |
| "learning_rate": 4.379678265042529e-06, | |
| "loss": 0.0636, | |
| "mean_token_accuracy": 0.9765070602297783, | |
| "step": 1592 | |
| }, | |
| { | |
| "epoch": 9.316715542521994, | |
| "grad_norm": 0.2695776935953318, | |
| "learning_rate": 4.372703990569725e-06, | |
| "loss": 0.0634, | |
| "mean_token_accuracy": 0.9807394593954086, | |
| "step": 1593 | |
| }, | |
| { | |
| "epoch": 9.32258064516129, | |
| "grad_norm": 0.32864370703275114, | |
| "learning_rate": 4.365793696062231e-06, | |
| "loss": 0.0659, | |
| "mean_token_accuracy": 0.9772561341524124, | |
| "step": 1594 | |
| }, | |
| { | |
| "epoch": 9.328445747800586, | |
| "grad_norm": 0.2561966735650709, | |
| "learning_rate": 4.358947406601626e-06, | |
| "loss": 0.0566, | |
| "mean_token_accuracy": 0.9803736731410027, | |
| "step": 1595 | |
| }, | |
| { | |
| "epoch": 9.334310850439882, | |
| "grad_norm": 0.21566529835698572, | |
| "learning_rate": 4.352165147037177e-06, | |
| "loss": 0.0622, | |
| "mean_token_accuracy": 0.9788841158151627, | |
| "step": 1596 | |
| }, | |
| { | |
| "epoch": 9.340175953079179, | |
| "grad_norm": 0.2560030399776893, | |
| "learning_rate": 4.345446941985741e-06, | |
| "loss": 0.0571, | |
| "mean_token_accuracy": 0.9801218211650848, | |
| "step": 1597 | |
| }, | |
| { | |
| "epoch": 9.346041055718475, | |
| "grad_norm": 0.23202327830121788, | |
| "learning_rate": 4.338792815831698e-06, | |
| "loss": 0.0576, | |
| "mean_token_accuracy": 0.9768727198243141, | |
| "step": 1598 | |
| }, | |
| { | |
| "epoch": 9.351906158357771, | |
| "grad_norm": 0.28701737565493496, | |
| "learning_rate": 4.332202792726832e-06, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9776144996285439, | |
| "step": 1599 | |
| }, | |
| { | |
| "epoch": 9.357771260997067, | |
| "grad_norm": 0.26539530331958283, | |
| "learning_rate": 4.3256768965902684e-06, | |
| "loss": 0.0649, | |
| "mean_token_accuracy": 0.977261483669281, | |
| "step": 1600 | |
| }, | |
| { | |
| "epoch": 9.363636363636363, | |
| "grad_norm": 0.28930665981214404, | |
| "learning_rate": 4.319215151108373e-06, | |
| "loss": 0.0768, | |
| "mean_token_accuracy": 0.9744983091950417, | |
| "step": 1601 | |
| }, | |
| { | |
| "epoch": 9.36950146627566, | |
| "grad_norm": 0.26360393051635667, | |
| "learning_rate": 4.312817579734673e-06, | |
| "loss": 0.0604, | |
| "mean_token_accuracy": 0.9822754934430122, | |
| "step": 1602 | |
| }, | |
| { | |
| "epoch": 9.375366568914956, | |
| "grad_norm": 0.26032508841711777, | |
| "learning_rate": 4.306484205689768e-06, | |
| "loss": 0.0672, | |
| "mean_token_accuracy": 0.9772569611668587, | |
| "step": 1603 | |
| }, | |
| { | |
| "epoch": 9.381231671554252, | |
| "grad_norm": 0.2589814486685344, | |
| "learning_rate": 4.300215051961248e-06, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9801448434591293, | |
| "step": 1604 | |
| }, | |
| { | |
| "epoch": 9.387096774193548, | |
| "grad_norm": 0.25436834908109707, | |
| "learning_rate": 4.2940101413036115e-06, | |
| "loss": 0.058, | |
| "mean_token_accuracy": 0.9820843636989594, | |
| "step": 1605 | |
| }, | |
| { | |
| "epoch": 9.392961876832844, | |
| "grad_norm": 0.2864933202075239, | |
| "learning_rate": 4.287869496238174e-06, | |
| "loss": 0.0699, | |
| "mean_token_accuracy": 0.9779408723115921, | |
| "step": 1606 | |
| }, | |
| { | |
| "epoch": 9.39882697947214, | |
| "grad_norm": 0.24064124268536968, | |
| "learning_rate": 4.281793139053001e-06, | |
| "loss": 0.0598, | |
| "mean_token_accuracy": 0.9796174690127373, | |
| "step": 1607 | |
| }, | |
| { | |
| "epoch": 9.404692082111437, | |
| "grad_norm": 0.26540587503625385, | |
| "learning_rate": 4.275781091802811e-06, | |
| "loss": 0.0803, | |
| "mean_token_accuracy": 0.9738316759467125, | |
| "step": 1608 | |
| }, | |
| { | |
| "epoch": 9.410557184750733, | |
| "grad_norm": 0.32309718949209215, | |
| "learning_rate": 4.26983337630891e-06, | |
| "loss": 0.0641, | |
| "mean_token_accuracy": 0.9797775819897652, | |
| "step": 1609 | |
| }, | |
| { | |
| "epoch": 9.416422287390029, | |
| "grad_norm": 0.3022671879036554, | |
| "learning_rate": 4.263950014159103e-06, | |
| "loss": 0.0662, | |
| "mean_token_accuracy": 0.9769906178116798, | |
| "step": 1610 | |
| }, | |
| { | |
| "epoch": 9.422287390029325, | |
| "grad_norm": 0.25312838489835093, | |
| "learning_rate": 4.258131026707618e-06, | |
| "loss": 0.0564, | |
| "mean_token_accuracy": 0.9816729798913002, | |
| "step": 1611 | |
| }, | |
| { | |
| "epoch": 9.428152492668621, | |
| "grad_norm": 0.24041663444550618, | |
| "learning_rate": 4.2523764350750305e-06, | |
| "loss": 0.067, | |
| "mean_token_accuracy": 0.9787357822060585, | |
| "step": 1612 | |
| }, | |
| { | |
| "epoch": 9.434017595307918, | |
| "grad_norm": 0.24088936329489447, | |
| "learning_rate": 4.246686260148179e-06, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9813699051737785, | |
| "step": 1613 | |
| }, | |
| { | |
| "epoch": 9.439882697947214, | |
| "grad_norm": 0.2902013485039333, | |
| "learning_rate": 4.241060522580108e-06, | |
| "loss": 0.0752, | |
| "mean_token_accuracy": 0.9762661457061768, | |
| "step": 1614 | |
| }, | |
| { | |
| "epoch": 9.44574780058651, | |
| "grad_norm": 0.26593653778779147, | |
| "learning_rate": 4.2354992427899674e-06, | |
| "loss": 0.0574, | |
| "mean_token_accuracy": 0.9807288646697998, | |
| "step": 1615 | |
| }, | |
| { | |
| "epoch": 9.451612903225806, | |
| "grad_norm": 0.27473853653381974, | |
| "learning_rate": 4.23000244096296e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9793464988470078, | |
| "step": 1616 | |
| }, | |
| { | |
| "epoch": 9.457478005865102, | |
| "grad_norm": 0.23339911568703745, | |
| "learning_rate": 4.224570137050254e-06, | |
| "loss": 0.0492, | |
| "mean_token_accuracy": 0.9841224849224091, | |
| "step": 1617 | |
| }, | |
| { | |
| "epoch": 9.463343108504398, | |
| "grad_norm": 0.22630713206222056, | |
| "learning_rate": 4.219202350768919e-06, | |
| "loss": 0.0625, | |
| "mean_token_accuracy": 0.9775300472974777, | |
| "step": 1618 | |
| }, | |
| { | |
| "epoch": 9.469208211143695, | |
| "grad_norm": 0.23955620707093253, | |
| "learning_rate": 4.213899101601853e-06, | |
| "loss": 0.0627, | |
| "mean_token_accuracy": 0.9794919416308403, | |
| "step": 1619 | |
| }, | |
| { | |
| "epoch": 9.47507331378299, | |
| "grad_norm": 0.2457762790452542, | |
| "learning_rate": 4.208660408797708e-06, | |
| "loss": 0.0624, | |
| "mean_token_accuracy": 0.9798106178641319, | |
| "step": 1620 | |
| }, | |
| { | |
| "epoch": 9.480938416422287, | |
| "grad_norm": 0.2431662748020246, | |
| "learning_rate": 4.203486291370821e-06, | |
| "loss": 0.0603, | |
| "mean_token_accuracy": 0.9812995418906212, | |
| "step": 1621 | |
| }, | |
| { | |
| "epoch": 9.486803519061583, | |
| "grad_norm": 0.2723357831813373, | |
| "learning_rate": 4.198376768101149e-06, | |
| "loss": 0.0726, | |
| "mean_token_accuracy": 0.9783790037035942, | |
| "step": 1622 | |
| }, | |
| { | |
| "epoch": 9.49266862170088, | |
| "grad_norm": 0.3520178956023517, | |
| "learning_rate": 4.193331857534198e-06, | |
| "loss": 0.0589, | |
| "mean_token_accuracy": 0.980711355805397, | |
| "step": 1623 | |
| }, | |
| { | |
| "epoch": 9.498533724340176, | |
| "grad_norm": 0.22927005303270442, | |
| "learning_rate": 4.188351577980961e-06, | |
| "loss": 0.0549, | |
| "mean_token_accuracy": 0.9831141978502274, | |
| "step": 1624 | |
| }, | |
| { | |
| "epoch": 9.504398826979472, | |
| "grad_norm": 0.238973278958498, | |
| "learning_rate": 4.183435947517836e-06, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.979368269443512, | |
| "step": 1625 | |
| }, | |
| { | |
| "epoch": 9.510263929618768, | |
| "grad_norm": 0.22548713924712482, | |
| "learning_rate": 4.178584983986575e-06, | |
| "loss": 0.0515, | |
| "mean_token_accuracy": 0.9827914386987686, | |
| "step": 1626 | |
| }, | |
| { | |
| "epoch": 9.516129032258064, | |
| "grad_norm": 0.2155318232305598, | |
| "learning_rate": 4.173798704994221e-06, | |
| "loss": 0.0571, | |
| "mean_token_accuracy": 0.9815772697329521, | |
| "step": 1627 | |
| }, | |
| { | |
| "epoch": 9.52199413489736, | |
| "grad_norm": 0.2525525304429569, | |
| "learning_rate": 4.169077127913031e-06, | |
| "loss": 0.0663, | |
| "mean_token_accuracy": 0.9765826910734177, | |
| "step": 1628 | |
| }, | |
| { | |
| "epoch": 9.527859237536656, | |
| "grad_norm": 0.22780826504919216, | |
| "learning_rate": 4.164420269880422e-06, | |
| "loss": 0.0614, | |
| "mean_token_accuracy": 0.9760891944169998, | |
| "step": 1629 | |
| }, | |
| { | |
| "epoch": 9.533724340175953, | |
| "grad_norm": 0.2531272072086709, | |
| "learning_rate": 4.159828147798914e-06, | |
| "loss": 0.0569, | |
| "mean_token_accuracy": 0.9820951670408249, | |
| "step": 1630 | |
| }, | |
| { | |
| "epoch": 9.539589442815249, | |
| "grad_norm": 0.24380644152767034, | |
| "learning_rate": 4.155300778336047e-06, | |
| "loss": 0.0617, | |
| "mean_token_accuracy": 0.9779926687479019, | |
| "step": 1631 | |
| }, | |
| { | |
| "epoch": 9.545454545454545, | |
| "grad_norm": 0.28384668257122436, | |
| "learning_rate": 4.150838177924349e-06, | |
| "loss": 0.0599, | |
| "mean_token_accuracy": 0.9832035973668098, | |
| "step": 1632 | |
| }, | |
| { | |
| "epoch": 9.551319648093841, | |
| "grad_norm": 0.22414889555068185, | |
| "learning_rate": 4.146440362761256e-06, | |
| "loss": 0.0621, | |
| "mean_token_accuracy": 0.9802982956171036, | |
| "step": 1633 | |
| }, | |
| { | |
| "epoch": 9.557184750733137, | |
| "grad_norm": 0.23750844736936616, | |
| "learning_rate": 4.142107348809058e-06, | |
| "loss": 0.0696, | |
| "mean_token_accuracy": 0.9758076518774033, | |
| "step": 1634 | |
| }, | |
| { | |
| "epoch": 9.563049853372434, | |
| "grad_norm": 0.25194978074346364, | |
| "learning_rate": 4.1378391517948505e-06, | |
| "loss": 0.0564, | |
| "mean_token_accuracy": 0.9831016063690186, | |
| "step": 1635 | |
| }, | |
| { | |
| "epoch": 9.56891495601173, | |
| "grad_norm": 0.24188617681913702, | |
| "learning_rate": 4.1336357872104614e-06, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9811783134937286, | |
| "step": 1636 | |
| }, | |
| { | |
| "epoch": 9.574780058651026, | |
| "grad_norm": 0.24307965914320542, | |
| "learning_rate": 4.12949727031241e-06, | |
| "loss": 0.0632, | |
| "mean_token_accuracy": 0.9804549887776375, | |
| "step": 1637 | |
| }, | |
| { | |
| "epoch": 9.580645161290322, | |
| "grad_norm": 0.2209724909626426, | |
| "learning_rate": 4.125423616121837e-06, | |
| "loss": 0.0562, | |
| "mean_token_accuracy": 0.9810581132769585, | |
| "step": 1638 | |
| }, | |
| { | |
| "epoch": 9.586510263929618, | |
| "grad_norm": 0.19695458594988682, | |
| "learning_rate": 4.121414839424464e-06, | |
| "loss": 0.0581, | |
| "mean_token_accuracy": 0.9823091104626656, | |
| "step": 1639 | |
| }, | |
| { | |
| "epoch": 9.592375366568914, | |
| "grad_norm": 0.25761669321110686, | |
| "learning_rate": 4.117470954770529e-06, | |
| "loss": 0.0668, | |
| "mean_token_accuracy": 0.9789463207125664, | |
| "step": 1640 | |
| }, | |
| { | |
| "epoch": 9.59824046920821, | |
| "grad_norm": 0.19877925033471974, | |
| "learning_rate": 4.1135919764747454e-06, | |
| "loss": 0.056, | |
| "mean_token_accuracy": 0.9804951846599579, | |
| "step": 1641 | |
| }, | |
| { | |
| "epoch": 9.604105571847507, | |
| "grad_norm": 0.22402798094527665, | |
| "learning_rate": 4.109777918616235e-06, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.9828111082315445, | |
| "step": 1642 | |
| }, | |
| { | |
| "epoch": 9.609970674486803, | |
| "grad_norm": 0.2454189049005734, | |
| "learning_rate": 4.106028795038487e-06, | |
| "loss": 0.0657, | |
| "mean_token_accuracy": 0.977626658976078, | |
| "step": 1643 | |
| }, | |
| { | |
| "epoch": 9.6158357771261, | |
| "grad_norm": 0.3023925092807269, | |
| "learning_rate": 4.102344619349307e-06, | |
| "loss": 0.0759, | |
| "mean_token_accuracy": 0.9734556525945663, | |
| "step": 1644 | |
| }, | |
| { | |
| "epoch": 9.621700879765395, | |
| "grad_norm": 0.2774980485563469, | |
| "learning_rate": 4.098725404920763e-06, | |
| "loss": 0.072, | |
| "mean_token_accuracy": 0.9775623232126236, | |
| "step": 1645 | |
| }, | |
| { | |
| "epoch": 9.627565982404692, | |
| "grad_norm": 0.30333570428748763, | |
| "learning_rate": 4.095171164889143e-06, | |
| "loss": 0.0605, | |
| "mean_token_accuracy": 0.9795369878411293, | |
| "step": 1646 | |
| }, | |
| { | |
| "epoch": 9.633431085043988, | |
| "grad_norm": 0.2356131369923837, | |
| "learning_rate": 4.091681912154903e-06, | |
| "loss": 0.0619, | |
| "mean_token_accuracy": 0.9773572832345963, | |
| "step": 1647 | |
| }, | |
| { | |
| "epoch": 9.639296187683284, | |
| "grad_norm": 0.3972197577494257, | |
| "learning_rate": 4.088257659382619e-06, | |
| "loss": 0.085, | |
| "mean_token_accuracy": 0.972993515431881, | |
| "step": 1648 | |
| }, | |
| { | |
| "epoch": 9.64516129032258, | |
| "grad_norm": 0.2869208785212869, | |
| "learning_rate": 4.0848984190009495e-06, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9753344133496284, | |
| "step": 1649 | |
| }, | |
| { | |
| "epoch": 9.651026392961876, | |
| "grad_norm": 0.21142625693572772, | |
| "learning_rate": 4.081604203202577e-06, | |
| "loss": 0.0534, | |
| "mean_token_accuracy": 0.983475349843502, | |
| "step": 1650 | |
| }, | |
| { | |
| "epoch": 9.656891495601172, | |
| "grad_norm": 0.21591702452821918, | |
| "learning_rate": 4.078375023944175e-06, | |
| "loss": 0.0603, | |
| "mean_token_accuracy": 0.9807283952832222, | |
| "step": 1651 | |
| }, | |
| { | |
| "epoch": 9.662756598240469, | |
| "grad_norm": 0.2510568273569073, | |
| "learning_rate": 4.0752108929463625e-06, | |
| "loss": 0.0718, | |
| "mean_token_accuracy": 0.9732875376939774, | |
| "step": 1652 | |
| }, | |
| { | |
| "epoch": 9.668621700879765, | |
| "grad_norm": 0.2863797336182336, | |
| "learning_rate": 4.072111821693655e-06, | |
| "loss": 0.0666, | |
| "mean_token_accuracy": 0.9797648787498474, | |
| "step": 1653 | |
| }, | |
| { | |
| "epoch": 9.674486803519061, | |
| "grad_norm": 0.38771617138049674, | |
| "learning_rate": 4.069077821434429e-06, | |
| "loss": 0.0695, | |
| "mean_token_accuracy": 0.9792503714561462, | |
| "step": 1654 | |
| }, | |
| { | |
| "epoch": 9.680351906158357, | |
| "grad_norm": 0.2900452759214232, | |
| "learning_rate": 4.06610890318088e-06, | |
| "loss": 0.0581, | |
| "mean_token_accuracy": 0.9800984635949135, | |
| "step": 1655 | |
| }, | |
| { | |
| "epoch": 9.686217008797653, | |
| "grad_norm": 0.20255555978860618, | |
| "learning_rate": 4.063205077708986e-06, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.9810278192162514, | |
| "step": 1656 | |
| }, | |
| { | |
| "epoch": 9.69208211143695, | |
| "grad_norm": 0.29768882425093995, | |
| "learning_rate": 4.060366355558456e-06, | |
| "loss": 0.0658, | |
| "mean_token_accuracy": 0.976965144276619, | |
| "step": 1657 | |
| }, | |
| { | |
| "epoch": 9.697947214076246, | |
| "grad_norm": 0.279989351630936, | |
| "learning_rate": 4.057592747032707e-06, | |
| "loss": 0.0769, | |
| "mean_token_accuracy": 0.9744569063186646, | |
| "step": 1658 | |
| }, | |
| { | |
| "epoch": 9.703812316715542, | |
| "grad_norm": 0.264998673954461, | |
| "learning_rate": 4.054884262198816e-06, | |
| "loss": 0.0545, | |
| "mean_token_accuracy": 0.980218268930912, | |
| "step": 1659 | |
| }, | |
| { | |
| "epoch": 9.709677419354838, | |
| "grad_norm": 0.2122981558643174, | |
| "learning_rate": 4.052240910887493e-06, | |
| "loss": 0.0593, | |
| "mean_token_accuracy": 0.9808973520994186, | |
| "step": 1660 | |
| }, | |
| { | |
| "epoch": 9.715542521994134, | |
| "grad_norm": 0.23351588515140767, | |
| "learning_rate": 4.049662702693031e-06, | |
| "loss": 0.0594, | |
| "mean_token_accuracy": 0.9794720560312271, | |
| "step": 1661 | |
| }, | |
| { | |
| "epoch": 9.72140762463343, | |
| "grad_norm": 0.23921226195250828, | |
| "learning_rate": 4.047149646973288e-06, | |
| "loss": 0.0615, | |
| "mean_token_accuracy": 0.9779072403907776, | |
| "step": 1662 | |
| }, | |
| { | |
| "epoch": 9.727272727272727, | |
| "grad_norm": 0.2719726622731542, | |
| "learning_rate": 4.044701752849639e-06, | |
| "loss": 0.059, | |
| "mean_token_accuracy": 0.9809428751468658, | |
| "step": 1663 | |
| }, | |
| { | |
| "epoch": 9.733137829912023, | |
| "grad_norm": 0.22023511820730315, | |
| "learning_rate": 4.042319029206954e-06, | |
| "loss": 0.0573, | |
| "mean_token_accuracy": 0.9804383143782616, | |
| "step": 1664 | |
| }, | |
| { | |
| "epoch": 9.739002932551319, | |
| "grad_norm": 0.2645304598593257, | |
| "learning_rate": 4.040001484693553e-06, | |
| "loss": 0.0567, | |
| "mean_token_accuracy": 0.9822018891572952, | |
| "step": 1665 | |
| }, | |
| { | |
| "epoch": 9.744868035190615, | |
| "grad_norm": 0.26898083015274943, | |
| "learning_rate": 4.037749127721191e-06, | |
| "loss": 0.0602, | |
| "mean_token_accuracy": 0.9816476553678513, | |
| "step": 1666 | |
| }, | |
| { | |
| "epoch": 9.750733137829911, | |
| "grad_norm": 0.2148855087589927, | |
| "learning_rate": 4.03556196646501e-06, | |
| "loss": 0.0557, | |
| "mean_token_accuracy": 0.9825252592563629, | |
| "step": 1667 | |
| }, | |
| { | |
| "epoch": 9.756598240469208, | |
| "grad_norm": 0.258599519645029, | |
| "learning_rate": 4.033440008863528e-06, | |
| "loss": 0.0686, | |
| "mean_token_accuracy": 0.9784804806113243, | |
| "step": 1668 | |
| }, | |
| { | |
| "epoch": 9.762463343108504, | |
| "grad_norm": 0.2561946434849189, | |
| "learning_rate": 4.031383262618588e-06, | |
| "loss": 0.0691, | |
| "mean_token_accuracy": 0.9773758798837662, | |
| "step": 1669 | |
| }, | |
| { | |
| "epoch": 9.7683284457478, | |
| "grad_norm": 0.2762399259206808, | |
| "learning_rate": 4.0293917351953505e-06, | |
| "loss": 0.0618, | |
| "mean_token_accuracy": 0.9803431034088135, | |
| "step": 1670 | |
| }, | |
| { | |
| "epoch": 9.774193548387096, | |
| "grad_norm": 0.355930013011736, | |
| "learning_rate": 4.027465433822255e-06, | |
| "loss": 0.0584, | |
| "mean_token_accuracy": 0.9785107672214508, | |
| "step": 1671 | |
| }, | |
| { | |
| "epoch": 9.780058651026392, | |
| "grad_norm": 0.24895531218985303, | |
| "learning_rate": 4.025604365490999e-06, | |
| "loss": 0.0609, | |
| "mean_token_accuracy": 0.9814844280481339, | |
| "step": 1672 | |
| }, | |
| { | |
| "epoch": 9.785923753665688, | |
| "grad_norm": 0.2340877368955121, | |
| "learning_rate": 4.0238085369565085e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.9819305539131165, | |
| "step": 1673 | |
| }, | |
| { | |
| "epoch": 9.791788856304985, | |
| "grad_norm": 0.21828086268261218, | |
| "learning_rate": 4.022077954736916e-06, | |
| "loss": 0.0604, | |
| "mean_token_accuracy": 0.9818969219923019, | |
| "step": 1674 | |
| }, | |
| { | |
| "epoch": 9.79765395894428, | |
| "grad_norm": 0.2658598614336904, | |
| "learning_rate": 4.020412625113535e-06, | |
| "loss": 0.0607, | |
| "mean_token_accuracy": 0.9813483133912086, | |
| "step": 1675 | |
| }, | |
| { | |
| "epoch": 9.803519061583577, | |
| "grad_norm": 0.2602573672355344, | |
| "learning_rate": 4.018812554130839e-06, | |
| "loss": 0.0753, | |
| "mean_token_accuracy": 0.9781445488333702, | |
| "step": 1676 | |
| }, | |
| { | |
| "epoch": 9.809384164222873, | |
| "grad_norm": 0.2801981767369684, | |
| "learning_rate": 4.01727774759644e-06, | |
| "loss": 0.0667, | |
| "mean_token_accuracy": 0.9778248742222786, | |
| "step": 1677 | |
| }, | |
| { | |
| "epoch": 9.81524926686217, | |
| "grad_norm": 0.2703585276782977, | |
| "learning_rate": 4.0158082110810695e-06, | |
| "loss": 0.06, | |
| "mean_token_accuracy": 0.9797122403979301, | |
| "step": 1678 | |
| }, | |
| { | |
| "epoch": 9.821114369501466, | |
| "grad_norm": 0.31803486063306685, | |
| "learning_rate": 4.014403949918545e-06, | |
| "loss": 0.0599, | |
| "mean_token_accuracy": 0.9799430221319199, | |
| "step": 1679 | |
| }, | |
| { | |
| "epoch": 9.826979472140762, | |
| "grad_norm": 0.2820420179958132, | |
| "learning_rate": 4.0130649692057715e-06, | |
| "loss": 0.0643, | |
| "mean_token_accuracy": 0.9790126010775566, | |
| "step": 1680 | |
| }, | |
| { | |
| "epoch": 9.832844574780058, | |
| "grad_norm": 0.25148689534618957, | |
| "learning_rate": 4.01179127380271e-06, | |
| "loss": 0.0684, | |
| "mean_token_accuracy": 0.9769551530480385, | |
| "step": 1681 | |
| }, | |
| { | |
| "epoch": 9.838709677419354, | |
| "grad_norm": 0.2379631065406326, | |
| "learning_rate": 4.010582868332353e-06, | |
| "loss": 0.0538, | |
| "mean_token_accuracy": 0.9826568216085434, | |
| "step": 1682 | |
| }, | |
| { | |
| "epoch": 9.84457478005865, | |
| "grad_norm": 0.27305533671205534, | |
| "learning_rate": 4.009439757180732e-06, | |
| "loss": 0.0628, | |
| "mean_token_accuracy": 0.9774700924754143, | |
| "step": 1683 | |
| }, | |
| { | |
| "epoch": 9.850439882697946, | |
| "grad_norm": 0.2964998847163188, | |
| "learning_rate": 4.008361944496875e-06, | |
| "loss": 0.063, | |
| "mean_token_accuracy": 0.9798463135957718, | |
| "step": 1684 | |
| }, | |
| { | |
| "epoch": 9.856304985337243, | |
| "grad_norm": 0.2965938750339812, | |
| "learning_rate": 4.00734943419281e-06, | |
| "loss": 0.0736, | |
| "mean_token_accuracy": 0.9754137769341469, | |
| "step": 1685 | |
| }, | |
| { | |
| "epoch": 9.862170087976539, | |
| "grad_norm": 0.28260924704377915, | |
| "learning_rate": 4.006402229943534e-06, | |
| "loss": 0.0647, | |
| "mean_token_accuracy": 0.9779395908117294, | |
| "step": 1686 | |
| }, | |
| { | |
| "epoch": 9.868035190615835, | |
| "grad_norm": 0.23371054288209903, | |
| "learning_rate": 4.005520335187023e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9794416725635529, | |
| "step": 1687 | |
| }, | |
| { | |
| "epoch": 9.873900293255131, | |
| "grad_norm": 0.2612525488005821, | |
| "learning_rate": 4.004703753124195e-06, | |
| "loss": 0.0655, | |
| "mean_token_accuracy": 0.9788842275738716, | |
| "step": 1688 | |
| }, | |
| { | |
| "epoch": 9.879765395894427, | |
| "grad_norm": 0.215311765680089, | |
| "learning_rate": 4.003952486718913e-06, | |
| "loss": 0.0552, | |
| "mean_token_accuracy": 0.9810579568147659, | |
| "step": 1689 | |
| }, | |
| { | |
| "epoch": 9.885630498533724, | |
| "grad_norm": 0.2504746637761669, | |
| "learning_rate": 4.003266538697973e-06, | |
| "loss": 0.0616, | |
| "mean_token_accuracy": 0.978359691798687, | |
| "step": 1690 | |
| }, | |
| { | |
| "epoch": 9.89149560117302, | |
| "grad_norm": 0.3033572404106571, | |
| "learning_rate": 4.002645911551086e-06, | |
| "loss": 0.0551, | |
| "mean_token_accuracy": 0.9803460240364075, | |
| "step": 1691 | |
| }, | |
| { | |
| "epoch": 9.897360703812316, | |
| "grad_norm": 0.2266595329570515, | |
| "learning_rate": 4.002090607530882e-06, | |
| "loss": 0.0626, | |
| "mean_token_accuracy": 0.9790749177336693, | |
| "step": 1692 | |
| }, | |
| { | |
| "epoch": 9.903225806451612, | |
| "grad_norm": 0.25758436439961796, | |
| "learning_rate": 4.001600628652887e-06, | |
| "loss": 0.0751, | |
| "mean_token_accuracy": 0.9739339500665665, | |
| "step": 1693 | |
| }, | |
| { | |
| "epoch": 9.909090909090908, | |
| "grad_norm": 0.2730797336241935, | |
| "learning_rate": 4.001175976695527e-06, | |
| "loss": 0.0679, | |
| "mean_token_accuracy": 0.9755261316895485, | |
| "step": 1694 | |
| }, | |
| { | |
| "epoch": 9.914956011730204, | |
| "grad_norm": 0.24102622763373174, | |
| "learning_rate": 4.000816653200117e-06, | |
| "loss": 0.0533, | |
| "mean_token_accuracy": 0.9845007807016373, | |
| "step": 1695 | |
| }, | |
| { | |
| "epoch": 9.9208211143695, | |
| "grad_norm": 0.2813958161148632, | |
| "learning_rate": 4.000522659470857e-06, | |
| "loss": 0.0606, | |
| "mean_token_accuracy": 0.9797054156661034, | |
| "step": 1696 | |
| }, | |
| { | |
| "epoch": 9.926686217008797, | |
| "grad_norm": 0.3285137376648042, | |
| "learning_rate": 4.000293996574826e-06, | |
| "loss": 0.0799, | |
| "mean_token_accuracy": 0.9748532995581627, | |
| "step": 1697 | |
| }, | |
| { | |
| "epoch": 9.932551319648093, | |
| "grad_norm": 0.28907914247281624, | |
| "learning_rate": 4.000130665341977e-06, | |
| "loss": 0.0749, | |
| "mean_token_accuracy": 0.9762020409107208, | |
| "step": 1698 | |
| }, | |
| { | |
| "epoch": 9.93841642228739, | |
| "grad_norm": 0.2506289774234882, | |
| "learning_rate": 4.000032666365136e-06, | |
| "loss": 0.0595, | |
| "mean_token_accuracy": 0.9811992347240448, | |
| "step": 1699 | |
| }, | |
| { | |
| "epoch": 9.944281524926687, | |
| "grad_norm": 0.23581196926582984, | |
| "learning_rate": 4.000000000000001e-06, | |
| "loss": 0.0598, | |
| "mean_token_accuracy": 0.9796453937888145, | |
| "step": 1700 | |
| }, | |
| { | |
| "epoch": 9.944281524926687, | |
| "step": 1700, | |
| "total_flos": 13396901806080.0, | |
| "train_loss": 0.21142534818062012, | |
| "train_runtime": 61693.825, | |
| "train_samples_per_second": 0.884, | |
| "train_steps_per_second": 0.028 | |
| } | |
| ], | |
| "logging_steps": 1, | |
| "max_steps": 1700, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 10, | |
| "save_steps": 200, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 13396901806080.0, | |
| "train_batch_size": 1, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |
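The log above can be summarised programmatically. What follows is a minimal sketch, assuming the file is a Hugging Face `Trainer` state dump (commonly saved as `trainer_state.json`; the filename is an assumption, since the listing does not name it). The helper `summarize_log_history` is hypothetical and not part of any library. It relies only on the keys visible in the log: `log_history`, `loss`, `learning_rate`, `mean_token_accuracy`, and `step`. The sample dictionary copies the last two logged steps and the final summary entry from the listing.

```python
import json
from statistics import mean


def summarize_log_history(state, last_n=5):
    """Summarise the per-step entries of a parsed trainer-state dictionary.

    Hypothetical helper for illustration; not part of transformers or trl.
    """
    # Per-step entries carry a "loss" key. The final summary entry carries
    # "train_loss" instead, so it is filtered out here.
    steps = [entry for entry in state["log_history"] if "loss" in entry]
    tail = steps[-last_n:]
    return {
        "logged_steps": len(steps),
        "final_step": steps[-1]["step"],
        "final_lr": steps[-1]["learning_rate"],
        "mean_tail_loss": mean(entry["loss"] for entry in tail),
        "mean_tail_accuracy": mean(entry["mean_token_accuracy"] for entry in tail),
    }


# To read the real file instead (path is an assumption):
# with open("trainer_state.json") as f:
#     state = json.load(f)

# Small in-memory sample copied from the end of the log above.
sample_state = {
    "log_history": [
        {"epoch": 9.93841642228739, "grad_norm": 0.2506289774234882,
         "learning_rate": 4.000032666365136e-06, "loss": 0.0595,
         "mean_token_accuracy": 0.9811992347240448, "step": 1699},
        {"epoch": 9.944281524926687, "grad_norm": 0.23581196926582984,
         "learning_rate": 4.000000000000001e-06, "loss": 0.0598,
         "mean_token_accuracy": 0.9796453937888145, "step": 1700},
        {"epoch": 9.944281524926687, "step": 1700,
         "total_flos": 13396901806080.0, "train_loss": 0.21142534818062012,
         "train_runtime": 61693.825, "train_samples_per_second": 0.884,
         "train_steps_per_second": 0.028},
    ],
    "max_steps": 1700,
}

summary = summarize_log_history(sample_state, last_n=2)
print(json.dumps(summary, indent=2))
```

Filtering on the `loss` key keeps the run-level `train_loss` (an average over the whole run) from being mixed into the per-step statistics, which matters here because the late-run step losses are far below the run average.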