Vikaspandey582003 committed 13 days ago

Commit c00d096 (verified) · 1 parent: 7a51852

checkpoint step 50

Files changed (6)

checkpoint-50/adapter_model.safetensors +1 -1
checkpoint-50/optimizer.pt +1 -1
checkpoint-50/rng_state.pth +1 -1
checkpoint-50/scheduler.pt +1 -1
checkpoint-50/trainer_state.json +190 -190
checkpoint-50/training_args.bin +1 -1

checkpoint-50/adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8774212fba617daa40bef4cef915e047bc2e6feee4ceebd5580bf1ee2fb50370
+oid sha256:a8f6b3da54d6d5e4c4c6546f5e250e766dc426eb113dc1c45ab4ed567ffd48b1
 size 80792880
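
The binary files in this commit are stored as Git LFS pointers, so each diff only changes the `oid sha256:` line while `size` stays the same. A minimal sketch of checking a downloaded file against such a pointer follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is the new side of the diff above.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the size and hash named by the pointer."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so large checkpoints are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the new side of the adapter_model.safetensors diff.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:a8f6b3da54d6d5e4c4c6546f5e250e766dc426eb113dc1c45ab4ed567ffd48b1
size 80792880
"""

pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("checkpoint-50/adapter_model.safetensors", pointer)` on a local copy would then report whether that copy corresponds to this commit rather than its parent.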

checkpoint-50/optimizer.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0f91af0fa995043cb9e216dca74e80d09ee819a9d274a9dcf260d0411a6b48ed
+oid sha256:29b4a30cf81e86d5a32d909da97f25a65e4690b495e874547c51260ea02d771e
 size 161816251

checkpoint-50/rng_state.pth CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:42c8957cbd17b37e5391f10035f189ea0492f94bda207d033f16b09cc832dbcf
+oid sha256:77323eeff9b7a4a4a795c260f92519ee32bd1dec272ef8a510f9120534b72ca2
 size 14645

checkpoint-50/scheduler.pt CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:259b1303d82fa78c2e55eeb7df6096e0b57593f81a8f6658f2d1675d51e39965
+oid sha256:78a83610ba4b367fed3f2c1f69cd0080618139093c0ae8eb9639608a8f1d40eb
 size 1465

checkpoint-50/trainer_state.json CHANGED

@@ -2,7 +2,7 @@
   "best_global_step": null,
   "best_metric": null,
   "best_model_checkpoint": null,
-  "epoch": 0.0021551724137931034,
+  "epoch": 25.0,
   "eval_steps": 500,
   "global_step": 50,
   "is_hyper_param_search": false,
@@ -15,26 +15,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 130.0,
-      "completions/max_terminated_length": 130.0,
-      "completions/mean_length": 68.5,
-      "completions/mean_terminated_length": 68.5,
-      "completions/min_length": 26.8,
-      "completions/min_terminated_length": 26.8,
-      "entropy": 0.25826080311089755,
-      "epoch": 0.00021551724137931034,
-      "frac_reward_zero_std": 0.1,
-      "grad_norm": 0.28125,
+      "completions/clipped_ratio": 0.1,
+      "completions/max_length": 193.0,
+      "completions/max_terminated_length": 163.0,
+      "completions/mean_length": 103.975,
+      "completions/mean_terminated_length": 90.39881134033203,
+      "completions/min_length": 17.6,
+      "completions/min_terminated_length": 17.6,
+      "entropy": 0.1820149033330381,
+      "epoch": 2.5,
+      "frac_reward_zero_std": 0.5,
+      "grad_norm": 0.25,
       "learning_rate": 1.0000000000000002e-06,
-      "loss": 0.10238287448883057,
-      "num_tokens": 12064.0,
-      "reward": 0.3290319949388504,
-      "reward_std": 0.40028320252895355,
-      "rewards/reward_fn/mean": 0.3290319949388504,
-      "rewards/reward_fn/std": 0.400283208489418,
+      "loss": 0.0794088900089264,
+      "num_tokens": 14355.0,
+      "reward": 0.15020800828933717,
+      "reward_std": 0.6376187483081595,
+      "rewards/reward_fn/mean": 0.15020800828933717,
+      "rewards/reward_fn/std": 0.6376187764341011,
       "step": 5,
-      "step_time": 22.194597821800016
+      "step_time": 30.522303848797673
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -42,26 +42,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 87.6,
-      "completions/max_terminated_length": 87.6,
-      "completions/mean_length": 48.525,
-      "completions/mean_terminated_length": 48.525,
-      "completions/min_length": 17.0,
-      "completions/min_terminated_length": 17.0,
-      "entropy": 0.2673827801831067,
-      "epoch": 0.0004310344827586207,
-      "frac_reward_zero_std": 0.1,
-      "grad_norm": 0.484375,
+      "completions/clipped_ratio": 0.025,
+      "completions/max_length": 151.2,
+      "completions/max_terminated_length": 141.6,
+      "completions/mean_length": 80.075,
+      "completions/mean_terminated_length": 77.78928833007812,
+      "completions/min_length": 17.8,
+      "completions/min_terminated_length": 17.8,
+      "entropy": 0.16221200795844198,
+      "epoch": 5.0,
+      "frac_reward_zero_std": 0.5,
+      "grad_norm": 0.228515625,
       "learning_rate": 2.25e-06,
-      "loss": 0.05849265456199646,
-      "num_tokens": 23089.0,
-      "reward": 0.45969198942184447,
-      "reward_std": 0.2795014828443527,
-      "rewards/reward_fn/mean": 0.45969198942184447,
-      "rewards/reward_fn/std": 0.27950150668621065,
+      "loss": 0.07169516086578369,
+      "num_tokens": 27462.0,
+      "reward": 0.4012819856405258,
+      "reward_std": 0.4128362699819263,
+      "rewards/reward_fn/mean": 0.4012819856405258,
+      "rewards/reward_fn/std": 0.4128363010211615,
       "step": 10,
-      "step_time": 16.331819552399928
+      "step_time": 25.01976956339822
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -69,26 +69,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 85.6,
-      "completions/max_terminated_length": 85.6,
-      "completions/mean_length": 47.35,
-      "completions/mean_terminated_length": 47.35,
-      "completions/min_length": 25.0,
-      "completions/min_terminated_length": 25.0,
-      "entropy": 0.20174795808270574,
-      "epoch": 0.000646551724137931,
-      "frac_reward_zero_std": 0.4,
-      "grad_norm": 0.66015625,
+      "completions/clipped_ratio": 0.025,
+      "completions/max_length": 185.6,
+      "completions/max_terminated_length": 171.6,
+      "completions/mean_length": 83.65,
+      "completions/mean_terminated_length": 79.14285736083984,
+      "completions/min_length": 16.6,
+      "completions/min_terminated_length": 16.6,
+      "entropy": 0.14225535104051232,
+      "epoch": 7.5,
+      "frac_reward_zero_std": 0.3,
+      "grad_norm": 0.265625,
       "learning_rate": 3.5e-06,
-      "loss": -0.03555725216865539,
-      "num_tokens": 34239.0,
-      "reward": 0.5545999944210053,
-      "reward_std": 0.32832055240869523,
-      "rewards/reward_fn/mean": 0.5545999944210053,
-      "rewards/reward_fn/std": 0.3283205583691597,
+      "loss": 0.0717179834842682,
+      "num_tokens": 40740.0,
+      "reward": 0.10207997858524323,
+      "reward_std": 0.7454913818277419,
+      "rewards/reward_fn/mean": 0.10207997858524323,
+      "rewards/reward_fn/std": 0.7454914333298802,
       "step": 15,
-      "step_time": 16.44816417159991
+      "step_time": 29.626422570000432
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -96,26 +96,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 98.2,
-      "completions/max_terminated_length": 98.2,
-      "completions/mean_length": 53.6,
-      "completions/mean_terminated_length": 53.6,
-      "completions/min_length": 25.0,
-      "completions/min_terminated_length": 25.0,
-      "entropy": 0.2620095924474299,
-      "epoch": 0.0008620689655172414,
-      "frac_reward_zero_std": 0.2,
-      "grad_norm": 0.5390625,
+      "completions/clipped_ratio": 0.1,
+      "completions/max_length": 202.0,
+      "completions/max_terminated_length": 163.0,
+      "completions/mean_length": 98.85,
+      "completions/mean_terminated_length": 81.79285888671875,
+      "completions/min_length": 19.2,
+      "completions/min_terminated_length": 19.2,
+      "entropy": 0.1534867493668571,
+      "epoch": 10.0,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 0.2431640625,
       "learning_rate": 4.75e-06,
-      "loss": 0.017401468753814698,
-      "num_tokens": 45399.0,
-      "reward": 0.4466240078210831,
-      "reward_std": 0.27573536019772293,
-      "rewards/reward_fn/mean": 0.4466240078210831,
-      "rewards/reward_fn/std": 0.27573537137359383,
+      "loss": 0.09726614952087402,
+      "num_tokens": 54862.0,
+      "reward": 0.25181599259376525,
+      "reward_std": 0.6045640033902601,
+      "rewards/reward_fn/mean": 0.25181599259376525,
+      "rewards/reward_fn/std": 0.6045640454394743,
       "step": 20,
-      "step_time": 17.988385831799953
+      "step_time": 31.625062462999267
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -123,26 +123,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 123.8,
-      "completions/max_terminated_length": 123.8,
-      "completions/mean_length": 59.15,
-      "completions/mean_terminated_length": 59.15,
-      "completions/min_length": 21.8,
-      "completions/min_terminated_length": 21.8,
-      "entropy": 0.33701689867302775,
-      "epoch": 0.0010775862068965517,
-      "frac_reward_zero_std": 0.0,
-      "grad_norm": 0.74609375,
-      "learning_rate": 4.965517241379311e-06,
-      "loss": -0.02674364447593689,
-      "num_tokens": 56993.0,
-      "reward": 0.5006439983844757,
-      "reward_std": 0.2351181447505951,
-      "rewards/reward_fn/mean": 0.5006439983844757,
-      "rewards/reward_fn/std": 0.23511814773082734,
+      "completions/clipped_ratio": 0.125,
+      "completions/max_length": 233.6,
+      "completions/max_terminated_length": 194.4,
+      "completions/mean_length": 99.525,
+      "completions/mean_terminated_length": 77.78690643310547,
+      "completions/min_length": 18.8,
+      "completions/min_terminated_length": 18.8,
+      "entropy": 0.1538564210291952,
+      "epoch": 12.5,
+      "frac_reward_zero_std": 0.6,
+      "grad_norm": 0.0,
+      "learning_rate": 4.981481481481482e-06,
+      "loss": 0.13929661512374877,
+      "num_tokens": 69039.0,
+      "reward": 0.0006859898567199707,
+      "reward_std": 1.0108385920524596,
+      "rewards/reward_fn/mean": 0.0006859898567199707,
+      "rewards/reward_fn/std": 1.0108386158943177,
       "step": 25,
-      "step_time": 21.11461545459997
+      "step_time": 35.65783783460065
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -150,26 +150,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 110.2,
-      "completions/max_terminated_length": 110.2,
-      "completions/mean_length": 47.95,
-      "completions/mean_terminated_length": 47.95,
-      "completions/min_length": 15.8,
-      "completions/min_terminated_length": 15.8,
-      "entropy": 0.24504410615190864,
-      "epoch": 0.001293103448275862,
-      "frac_reward_zero_std": 0.4,
-      "grad_norm": 0.92578125,
-      "learning_rate": 4.922413793103449e-06,
-      "loss": -0.05023183822631836,
-      "num_tokens": 67659.0,
-      "reward": 0.6457239985466003,
-      "reward_std": 0.11179900387069211,
-      "rewards/reward_fn/mean": 0.6457239985466003,
-      "rewards/reward_fn/std": 0.11179901438299567,
+      "completions/clipped_ratio": 0.05,
+      "completions/max_length": 181.8,
+      "completions/max_terminated_length": 153.0,
+      "completions/mean_length": 65.05,
+      "completions/mean_terminated_length": 56.03928680419922,
+      "completions/min_length": 16.6,
+      "completions/min_terminated_length": 16.6,
+      "entropy": 0.13224927680566906,
+      "epoch": 15.0,
+      "frac_reward_zero_std": 0.5,
+      "grad_norm": 0.1875,
+      "learning_rate": 4.958333333333334e-06,
+      "loss": -0.003383058309555054,
+      "num_tokens": 81545.0,
+      "reward": 0.3994179755449295,
+      "reward_std": 0.5622525057464373,
+      "rewards/reward_fn/mean": 0.3994179755449295,
+      "rewards/reward_fn/std": 0.56225254482124,
       "step": 30,
-      "step_time": 19.134268762199827
+      "step_time": 29.112151033400732
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -177,26 +177,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.1,
-      "completions/max_length": 150.8,
-      "completions/max_terminated_length": 120.2,
-      "completions/mean_length": 73.375,
-      "completions/mean_terminated_length": 54.36666717529297,
-      "completions/min_length": 15.2,
-      "completions/min_terminated_length": 15.2,
-      "entropy": 0.23285924410447478,
-      "epoch": 0.0015086206896551724,
-      "frac_reward_zero_std": 0.3,
-      "grad_norm": 0.58203125,
-      "learning_rate": 4.879310344827586e-06,
-      "loss": -0.05149807929992676,
-      "num_tokens": 79734.0,
-      "reward": 0.46414998471736907,
-      "reward_std": 0.4071369742392562,
-      "rewards/reward_fn/mean": 0.46414998471736907,
-      "rewards/reward_fn/std": 0.40713699695188554,
+      "completions/clipped_ratio": 0.05,
+      "completions/max_length": 131.0,
+      "completions/max_terminated_length": 128.6,
+      "completions/mean_length": 61.075,
+      "completions/mean_terminated_length": 53.282144165039064,
+      "completions/min_length": 17.6,
+      "completions/min_terminated_length": 17.6,
+      "entropy": 0.11974610288161784,
+      "epoch": 17.5,
+      "frac_reward_zero_std": 0.7,
+      "grad_norm": 0.220703125,
+      "learning_rate": 4.935185185185186e-06,
+      "loss": 0.004663025587797165,
+      "num_tokens": 93892.0,
+      "reward": 0.3469119846820831,
+      "reward_std": 0.5775202971824911,
+      "rewards/reward_fn/mean": 0.3469119846820831,
+      "rewards/reward_fn/std": 0.5775203009106917,
       "step": 35,
-      "step_time": 24.679699858600042
+      "step_time": 22.457519923200017
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -204,26 +204,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 125.4,
-      "completions/max_terminated_length": 125.4,
-      "completions/mean_length": 68.925,
-      "completions/mean_terminated_length": 68.925,
-      "completions/min_length": 26.0,
-      "completions/min_terminated_length": 26.0,
-      "entropy": 0.3110098702833056,
-      "epoch": 0.0017241379310344827,
-      "frac_reward_zero_std": 0.1,
-      "grad_norm": 0.419921875,
-      "learning_rate": 4.836206896551724e-06,
-      "loss": 0.030162644386291505,
-      "num_tokens": 91695.0,
-      "reward": 0.5143539935350419,
-      "reward_std": 0.2790891878306866,
-      "rewards/reward_fn/mean": 0.5143539935350419,
-      "rewards/reward_fn/std": 0.279089218378067,
+      "completions/clipped_ratio": 0.075,
+      "completions/max_length": 179.6,
+      "completions/max_terminated_length": 131.2,
+      "completions/mean_length": 78.025,
+      "completions/mean_terminated_length": 65.70357513427734,
+      "completions/min_length": 19.0,
+      "completions/min_terminated_length": 19.0,
+      "entropy": 0.12550681543070824,
+      "epoch": 20.0,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 0.4609375,
+      "learning_rate": 4.9120370370370375e-06,
+      "loss": 0.036792796850204465,
+      "num_tokens": 107209.0,
+      "reward": 0.3498039901256561,
+      "reward_std": 0.7408117946935817,
+      "rewards/reward_fn/mean": 0.3498039901256561,
+      "rewards/reward_fn/std": 0.7408118456369266,
       "step": 40,
-      "step_time": 21.565513409599998
+      "step_time": 28.692756705999635
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -231,26 +231,26 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 100.0,
-      "completions/max_terminated_length": 100.0,
-      "completions/mean_length": 53.875,
-      "completions/mean_terminated_length": 53.875,
-      "completions/min_length": 16.6,
-      "completions/min_terminated_length": 16.6,
-      "entropy": 0.23047098610550165,
-      "epoch": 0.001939655172413793,
-      "frac_reward_zero_std": 0.4,
-      "grad_norm": 1.15625,
-      "learning_rate": 4.793103448275862e-06,
-      "loss": -0.04350074529647827,
-      "num_tokens": 102994.0,
-      "reward": 0.6398920059204102,
-      "reward_std": 0.2279826147481799,
-      "rewards/reward_fn/mean": 0.6398920059204102,
-      "rewards/reward_fn/std": 0.22798261437565087,
+      "completions/clipped_ratio": 0.025,
+      "completions/max_length": 189.0,
+      "completions/max_terminated_length": 178.8,
+      "completions/mean_length": 92.675,
+      "completions/mean_terminated_length": 89.51071472167969,
+      "completions/min_length": 17.8,
+      "completions/min_terminated_length": 17.8,
+      "entropy": 0.15165529411751777,
+      "epoch": 22.5,
+      "frac_reward_zero_std": 0.5,
+      "grad_norm": 0.232421875,
+      "learning_rate": 4.888888888888889e-06,
+      "loss": -0.004727205634117127,
+      "num_tokens": 120868.0,
+      "reward": 0.30072798430919645,
+      "reward_std": 0.7614144545921591,
+      "rewards/reward_fn/mean": 0.30072798430919645,
+      "rewards/reward_fn/std": 0.7614145380415721,
       "step": 45,
-      "step_time": 18.23975218899968
+      "step_time": 30.053675796201425
     },
     {
       "clip_ratio/high_max": 0.0,
@@ -258,32 +258,32 @@
       "clip_ratio/low_mean": 0.0,
       "clip_ratio/low_min": 0.0,
       "clip_ratio/region_mean": 0.0,
-      "completions/clipped_ratio": 0.0,
-      "completions/max_length": 133.4,
-      "completions/max_terminated_length": 133.4,
-      "completions/mean_length": 75.975,
-      "completions/mean_terminated_length": 75.975,
-      "completions/min_length": 23.8,
-      "completions/min_terminated_length": 23.8,
-      "entropy": 0.34613882582634686,
-      "epoch": 0.0021551724137931034,
-      "frac_reward_zero_std": 0.2,
-      "grad_norm": 0.1318359375,
-      "learning_rate": 4.75e-06,
-      "loss": -0.03668657541275024,
-      "num_tokens": 115289.0,
-      "reward": 0.33042599707841874,
-      "reward_std": 0.5918701648712158,
-      "rewards/reward_fn/mean": 0.33042599707841874,
-      "rewards/reward_fn/std": 0.5918702006340026,
+      "completions/clipped_ratio": 0.075,
+      "completions/max_length": 208.6,
+      "completions/max_terminated_length": 189.6,
+      "completions/mean_length": 102.625,
+      "completions/mean_terminated_length": 90.11666870117188,
+      "completions/min_length": 18.8,
+      "completions/min_terminated_length": 18.8,
+      "entropy": 0.15195838457439095,
+      "epoch": 25.0,
+      "frac_reward_zero_std": 0.6,
+      "grad_norm": 0.208984375,
+      "learning_rate": 4.865740740740741e-06,
+      "loss": 0.04500017166137695,
+      "num_tokens": 135121.0,
+      "reward": 0.5009119868278503,
+      "reward_std": 0.5213470441231038,
+      "rewards/reward_fn/mean": 0.5009119868278503,
+      "rewards/reward_fn/std": 0.5213470560469432,
       "step": 50,
-      "step_time": 22.43372436519985
+      "step_time": 32.55890014459801
     }
   ],
   "logging_steps": 5,
-  "max_steps": 600,
-  "num_input_tokens_seen": 115289,
-  "num_train_epochs": 1,
+  "max_steps": 1100,
+  "num_input_tokens_seen": 135121,
+  "num_train_epochs": 550,
   "save_steps": 50,
   "stateful_callbacks": {
     "TrainerControl": {
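
The `learning_rate` and `epoch` values in this diff move together with `max_steps` and `num_train_epochs`. The sketch below checks that the logged learning rates are consistent with a linear warmup followed by linear decay, and that the new `max_steps` follows from the new epoch count. The peak rate of 5e-6, the 20 warmup steps, and the one-step offset in `logged_lr` are assumptions inferred from the logged numbers; none of them is stated in the diff, and the function names are hypothetical.

```python
PEAK_LR = 5e-6      # assumption: inferred from the logged values
WARMUP_STEPS = 20   # assumption: inferred from the logged values


def linear_schedule(step: int, max_steps: int) -> float:
    """Linear warmup to PEAK_LR, then linear decay to zero at max_steps."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR * max(0, max_steps - step) / (max_steps - WARMUP_STEPS)


def logged_lr(global_step: int, max_steps: int) -> float:
    """Assumption: the value logged at global_step is the schedule at global_step - 1."""
    return linear_schedule(global_step - 1, max_steps)


# Learning rates copied from the old (max_steps 600) and new (max_steps 1100) sides.
OLD_LOGGED = {5: 1.0000000000000002e-06, 20: 4.75e-06,
              25: 4.965517241379311e-06, 50: 4.75e-06}
NEW_LOGGED = {5: 1.0000000000000002e-06, 20: 4.75e-06,
              25: 4.981481481481482e-06, 50: 4.865740740740741e-06}


def max_abs_error(logged: dict, max_steps: int) -> float:
    """Largest gap between the logged values and the assumed schedule."""
    return max(abs(logged_lr(step, max_steps) - lr) for step, lr in logged.items())


old_error = max_abs_error(OLD_LOGGED, 600)
new_error = max_abs_error(NEW_LOGGED, 1100)

# New side: epoch 25.0 at global_step 50 means 2 optimizer steps per epoch,
# so 550 epochs correspond to 1100 steps.
steps_per_epoch_new = 50 / 25.0
implied_max_steps_new = 550 * steps_per_epoch_new

print(f"old max error: {old_error:.2e}, new max error: {new_error:.2e}")
print(f"implied max_steps (new): {implied_max_steps_new:.0f}")
```

Under these assumptions the warmup values are identical on both sides, and the two runs only diverge from step 25 onward, where the decay slope depends on `max_steps`.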

checkpoint-50/training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:0cf50531ca359d39fbfa05dc0896e03532a83d53b029182b6d7f757efab0c97a
+oid sha256:354b92cb3fc579afc26aea1b098796649abd9be8f2b683beb58a14a6bab0c7d4
 size 7185
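
To compare runs like the two versions of `checkpoint-50/trainer_state.json` in this commit, it can help to reduce the logged entries to a few columns. The following is a minimal sketch; it assumes the per-step entries live under a top-level `log_history` list (that key falls outside the hunks shown in the diff), and `summarize` and `load_state` are hypothetical helper names. The sample values are copied from the new side of the diff for steps 5 and 50.

```python
import json
from pathlib import Path


def load_state(path) -> dict:
    """Read a trainer_state.json file into a dict."""
    return json.loads(Path(path).read_text())


def summarize(state: dict) -> list[tuple[int, float, float]]:
    """Return (step, reward, reward_std) for every logged entry that has a reward."""
    rows = []
    for entry in state.get("log_history", []):  # assumption: entries are stored here
        if "reward" in entry:
            rows.append((entry["step"], entry["reward"], entry["reward_std"]))
    return rows


# Two entries copied from the new side of the diff, trimmed to the fields used here.
sample_state = {
    "log_history": [
        {"step": 5, "reward": 0.15020800828933717, "reward_std": 0.6376187483081595},
        {"step": 50, "reward": 0.5009119868278503, "reward_std": 0.5213470441231038},
    ]
}

rows = summarize(sample_state)
for step, reward, reward_std in rows:
    print(f"step {step:>3}  reward {reward:.3f}  reward_std {reward_std:.3f}")
```

With a local checkout, `summarize(load_state("checkpoint-50/trainer_state.json"))` would produce the same table for all ten logged steps.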