
Vikaspandey582003 committed 14 days ago

Commit c826b46 (verified)

1 Parent(s): 3d90663

checkpoint step 100

Files changed (12)

.gitattributes +1 -0
checkpoint-100/README.md +209 -0
checkpoint-100/adapter_config.json +48 -0
checkpoint-100/adapter_model.safetensors +3 -0
checkpoint-100/chat_template.jinja +54 -0
checkpoint-100/optimizer.pt +3 -0
checkpoint-100/rng_state.pth +3 -0
checkpoint-100/scheduler.pt +3 -0
checkpoint-100/tokenizer.json +3 -0
checkpoint-100/tokenizer_config.json +16 -0
checkpoint-100/trainer_state.json +574 -0
checkpoint-100/training_args.bin +3 -0

.gitattributes CHANGED

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 checkpoint-50/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+checkpoint-100/tokenizer.json filter=lfs diff=lfs merge=lfs -text

checkpoint-100/README.md ADDED

	@@ -0,0 +1,209 @@

+---
+base_model: unsloth/Qwen2.5-7B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:unsloth/Qwen2.5-7B-Instruct
+- grpo
+- lora
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.19.1

checkpoint-100/adapter_config.json ADDED

	@@ -0,0 +1,48 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "down_proj",
+    "up_proj",
+    "k_proj",
+    "v_proj",
+    "gate_proj",
+    "q_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}

checkpoint-100/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8cd01891a0c7be67ac5edaf7e64cd5b8c289aad95fdfe736615c1c796c81a3a4
+size 80792880
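
As a sanity check on the pointer above, the 80,792,880-byte size is consistent with rank-16 LoRA matrices on all seven target modules stored at 2 bytes per parameter. The sketch below is a back-of-the-envelope estimate, not something recorded in this commit: the layer dimensions (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of width 128) are assumed from the published Qwen2.5-7B configuration, and the 16-bit storage dtype is also an assumption.

```python
# Assumed Qwen2.5-7B dimensions (from the base model's published config,
# not from this commit).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads of width 128

RANK = 16  # "r" in adapter_config.json

# (in_features, out_features) for each entry in "target_modules".
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, rank, num_layers):
    """LoRA adds A (rank x in) and B (out x rank) to every wrapped linear layer."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


params = lora_param_count(TARGET_SHAPES, RANK, NUM_LAYERS)
payload_bytes = params * 2  # assumption: weights stored in a 16-bit dtype
file_bytes = 80_792_880     # "size" field of the LFS pointer above

print(params)                      # 40370176
print(payload_bytes)               # 80740352
print(file_bytes - payload_bytes)  # 52528
```

Under these assumptions the adapter holds about 40.4M trainable parameters, and the remaining 52,528 bytes would be accounted for by the safetensors header. With `lora_alpha` 16, `r` 16, and `use_rslora` false, the LoRA scaling factor `lora_alpha / r` is 1.0.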

checkpoint-100/chat_template.jinja ADDED

	@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
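
To make the template above easier to follow, here is a minimal Python sketch of its no-tools branch only. It is a simplified re-implementation for illustration, not the code the tokenizer runs: `render_chatml` and `DEFAULT_SYSTEM` are hypothetical names, and tool definitions, tool calls, and tool responses are deliberately not handled.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chatml(messages, add_generation_prompt=True):
    """Mimic the template's no-tools path for user/system/assistant messages."""
    first_is_system = bool(messages) and messages[0]["role"] == "system"
    system_text = messages[0]["content"] if first_is_system else DEFAULT_SYSTEM
    parts = [f"<|im_start|>system\n{system_text}<|im_end|>\n"]

    for index, message in enumerate(messages):
        role = message["role"]
        if role == "system" and index == 0:
            continue  # already emitted as the leading system block
        if role not in ("user", "system", "assistant"):
            raise ValueError(f"role {role!r} is outside this simplified sketch")
        parts.append(f"<|im_start|>{role}\n{message['content']}<|im_end|>\n")

    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to answer
    return "".join(parts)


prompt = render_chatml([{"role": "user", "content": "Hi"}])
print(prompt)
```

When no system message is supplied, the default Qwen system prompt is injected, so the rendered prompt always starts with a system turn.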

checkpoint-100/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:89d2fabfe757f06f6c97bf1cfc73f5cd99d15c29239ed9b8753e76795bd90089
+size 161816251
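
The three added lines above are a Git LFS pointer rather than the optimizer state itself; the actual file (about 162 MB) lives in LFS storage and is identified by its SHA-256 digest. As a small illustration, the sketch below parses such a pointer into its fields. The helper name `parse_lfs_pointer` is hypothetical, and the parser only covers the three keys that appear in this diff.

```python
def parse_lfs_pointer(text):
    """Split a Git LFS pointer file into version, hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file in bytes
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:89d2fabfe757f06f6c97bf1cfc73f5cd99d15c29239ed9b8753e76795bd90089
size 161816251
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The same layout applies to the other LFS-tracked files in this commit (`adapter_model.safetensors`, `rng_state.pth`, `scheduler.pt`, `tokenizer.json`, `training_args.bin`).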

checkpoint-100/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6916902e277ce55ca646bbc2a4102d0233b6ca808546520322cd03b29af78fc7
+size 14645

checkpoint-100/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dcb92b79e96aeec298f0fc6c8c49006695ffcba4575b6a1308c167be334c5c37
+size 1465

checkpoint-100/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+size 11421892

checkpoint-100/tokenizer_config.json ADDED

	@@ -0,0 +1,16 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 32768,
+  "pad_token": "<|vision_pad|>",
+  "padding_side": "left",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
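
The `"padding_side": "left"` setting above means shorter prompts in a batch are padded at the front, so every prompt ends at the same position and generation continues directly from real tokens. The sketch below illustrates the effect with plain lists. It is not the tokenizer's implementation: `pad_batch` is a hypothetical helper and the token IDs are made up.

```python
def pad_batch(sequences, pad_id, padding_side="left"):
    """Pad token-ID lists to a common length and build matching attention masks."""
    width = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        gap = width - len(seq)
        if padding_side == "left":
            padded.append([pad_id] * gap + list(seq))
            masks.append([0] * gap + [1] * len(seq))
        else:
            padded.append(list(seq) + [pad_id] * gap)
            masks.append([1] * len(seq) + [0] * gap)
    return padded, masks


# Hypothetical token IDs; 0 stands in for the ID of the configured pad token.
input_ids, attention_mask = pad_batch([[11, 12, 13], [21]], pad_id=0)
print(input_ids)
print(attention_mask)
```

With left padding the last column of `input_ids` always holds a real token, which is the position a decoder-only model generates from.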

checkpoint-100/trainer_state.json ADDED

	@@ -0,0 +1,574 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.004310344827586207,
+  "eval_steps": 500,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 130.0,
+      "completions/max_terminated_length": 130.0,
+      "completions/mean_length": 68.5,
+      "completions/mean_terminated_length": 68.5,
+      "completions/min_length": 26.8,
+      "completions/min_terminated_length": 26.8,
+      "entropy": 0.25826080311089755,
+      "epoch": 0.00021551724137931034,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.28125,
+      "learning_rate": 1.0000000000000002e-06,
+      "loss": 0.10238287448883057,
+      "num_tokens": 12064.0,
+      "reward": 0.3290319949388504,
+      "reward_std": 0.40028320252895355,
+      "rewards/reward_fn/mean": 0.3290319949388504,
+      "rewards/reward_fn/std": 0.400283208489418,
+      "step": 5,
+      "step_time": 22.194597821800016
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 87.6,
+      "completions/max_terminated_length": 87.6,
+      "completions/mean_length": 48.525,
+      "completions/mean_terminated_length": 48.525,
+      "completions/min_length": 17.0,
+      "completions/min_terminated_length": 17.0,
+      "entropy": 0.2673827801831067,
+      "epoch": 0.0004310344827586207,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.484375,
+      "learning_rate": 2.25e-06,
+      "loss": 0.05849265456199646,
+      "num_tokens": 23089.0,
+      "reward": 0.45969198942184447,
+      "reward_std": 0.2795014828443527,
+      "rewards/reward_fn/mean": 0.45969198942184447,
+      "rewards/reward_fn/std": 0.27950150668621065,
+      "step": 10,
+      "step_time": 16.331819552399928
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 85.6,
+      "completions/max_terminated_length": 85.6,
+      "completions/mean_length": 47.35,
+      "completions/mean_terminated_length": 47.35,
+      "completions/min_length": 25.0,
+      "completions/min_terminated_length": 25.0,
+      "entropy": 0.20174795808270574,
+      "epoch": 0.000646551724137931,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 0.66015625,
+      "learning_rate": 3.5e-06,
+      "loss": -0.03555725216865539,
+      "num_tokens": 34239.0,
+      "reward": 0.5545999944210053,
+      "reward_std": 0.32832055240869523,
+      "rewards/reward_fn/mean": 0.5545999944210053,
+      "rewards/reward_fn/std": 0.3283205583691597,
+      "step": 15,
+      "step_time": 16.44816417159991
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 98.2,
+      "completions/max_terminated_length": 98.2,
+      "completions/mean_length": 53.6,
+      "completions/mean_terminated_length": 53.6,
+      "completions/min_length": 25.0,
+      "completions/min_terminated_length": 25.0,
+      "entropy": 0.2620095924474299,
+      "epoch": 0.0008620689655172414,
+      "frac_reward_zero_std": 0.2,
+      "grad_norm": 0.5390625,
+      "learning_rate": 4.75e-06,
+      "loss": 0.017401468753814698,
+      "num_tokens": 45399.0,
+      "reward": 0.4466240078210831,
+      "reward_std": 0.27573536019772293,
+      "rewards/reward_fn/mean": 0.4466240078210831,
+      "rewards/reward_fn/std": 0.27573537137359383,
+      "step": 20,
+      "step_time": 17.988385831799953
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 123.8,
+      "completions/max_terminated_length": 123.8,
+      "completions/mean_length": 59.15,
+      "completions/mean_terminated_length": 59.15,
+      "completions/min_length": 21.8,
+      "completions/min_terminated_length": 21.8,
+      "entropy": 0.33701689867302775,
+      "epoch": 0.0010775862068965517,
+      "frac_reward_zero_std": 0.0,
+      "grad_norm": 0.74609375,
+      "learning_rate": 4.965517241379311e-06,
+      "loss": -0.02674364447593689,
+      "num_tokens": 56993.0,
+      "reward": 0.5006439983844757,
+      "reward_std": 0.2351181447505951,
+      "rewards/reward_fn/mean": 0.5006439983844757,
+      "rewards/reward_fn/std": 0.23511814773082734,
+      "step": 25,
+      "step_time": 21.11461545459997
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 110.2,
+      "completions/max_terminated_length": 110.2,
+      "completions/mean_length": 47.95,
+      "completions/mean_terminated_length": 47.95,
+      "completions/min_length": 15.8,
+      "completions/min_terminated_length": 15.8,
+      "entropy": 0.24504410615190864,
+      "epoch": 0.001293103448275862,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 0.92578125,
+      "learning_rate": 4.922413793103449e-06,
+      "loss": -0.05023183822631836,
+      "num_tokens": 67659.0,
+      "reward": 0.6457239985466003,
+      "reward_std": 0.11179900387069211,
+      "rewards/reward_fn/mean": 0.6457239985466003,
+      "rewards/reward_fn/std": 0.11179901438299567,
+      "step": 30,
+      "step_time": 19.134268762199827
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.1,
+      "completions/max_length": 150.8,
+      "completions/max_terminated_length": 120.2,
+      "completions/mean_length": 73.375,
+      "completions/mean_terminated_length": 54.36666717529297,
+      "completions/min_length": 15.2,
+      "completions/min_terminated_length": 15.2,
+      "entropy": 0.23285924410447478,
+      "epoch": 0.0015086206896551724,
+      "frac_reward_zero_std": 0.3,
+      "grad_norm": 0.58203125,
+      "learning_rate": 4.879310344827586e-06,
+      "loss": -0.05149807929992676,
+      "num_tokens": 79734.0,
+      "reward": 0.46414998471736907,
+      "reward_std": 0.4071369742392562,
+      "rewards/reward_fn/mean": 0.46414998471736907,
+      "rewards/reward_fn/std": 0.40713699695188554,
+      "step": 35,
+      "step_time": 24.679699858600042
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 125.4,
+      "completions/max_terminated_length": 125.4,
+      "completions/mean_length": 68.925,
+      "completions/mean_terminated_length": 68.925,
+      "completions/min_length": 26.0,
+      "completions/min_terminated_length": 26.0,
+      "entropy": 0.3110098702833056,
+      "epoch": 0.0017241379310344827,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.419921875,
+      "learning_rate": 4.836206896551724e-06,
+      "loss": 0.030162644386291505,
+      "num_tokens": 91695.0,
+      "reward": 0.5143539935350419,
+      "reward_std": 0.2790891878306866,
+      "rewards/reward_fn/mean": 0.5143539935350419,
+      "rewards/reward_fn/std": 0.279089218378067,
+      "step": 40,
+      "step_time": 21.565513409599998
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 100.0,
+      "completions/max_terminated_length": 100.0,
+      "completions/mean_length": 53.875,
+      "completions/mean_terminated_length": 53.875,
+      "completions/min_length": 16.6,
+      "completions/min_terminated_length": 16.6,
+      "entropy": 0.23047098610550165,
+      "epoch": 0.001939655172413793,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 1.15625,
+      "learning_rate": 4.793103448275862e-06,
+      "loss": -0.04350074529647827,
+      "num_tokens": 102994.0,
+      "reward": 0.6398920059204102,
+      "reward_std": 0.2279826147481799,
+      "rewards/reward_fn/mean": 0.6398920059204102,
+      "rewards/reward_fn/std": 0.22798261437565087,
+      "step": 45,
+      "step_time": 18.23975218899968
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 133.4,
+      "completions/max_terminated_length": 133.4,
+      "completions/mean_length": 75.975,
+      "completions/mean_terminated_length": 75.975,
+      "completions/min_length": 23.8,
+      "completions/min_terminated_length": 23.8,
+      "entropy": 0.34613882582634686,
+      "epoch": 0.0021551724137931034,
+      "frac_reward_zero_std": 0.2,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 4.75e-06,
+      "loss": -0.03668657541275024,
+      "num_tokens": 115289.0,
+      "reward": 0.33042599707841874,
+      "reward_std": 0.5918701648712158,
+      "rewards/reward_fn/mean": 0.33042599707841874,
+      "rewards/reward_fn/std": 0.5918702006340026,
+      "step": 50,
+      "step_time": 22.43372436519985
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 92.2,
+      "completions/max_terminated_length": 92.2,
+      "completions/mean_length": 50.65,
+      "completions/mean_terminated_length": 50.65,
+      "completions/min_length": 18.0,
+      "completions/min_terminated_length": 18.0,
+      "entropy": 0.3391244841972366,
+      "epoch": 0.0023706896551724138,
+      "frac_reward_zero_std": 0.3,
+      "grad_norm": 0.39453125,
+      "learning_rate": 4.706896551724138e-06,
+      "loss": -0.0521969735622406,
+      "num_tokens": 126111.0,
+      "reward": 0.6139639973640442,
+      "reward_std": 0.18208687230944634,
+      "rewards/reward_fn/mean": 0.6139639973640442,
+      "rewards/reward_fn/std": 0.18208688013255597,
+      "step": 55,
+      "step_time": 16.889007100199933
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.025,
+      "completions/max_length": 164.0,
+      "completions/max_terminated_length": 162.4,
+      "completions/mean_length": 82.1,
+      "completions/mean_terminated_length": 78.87857208251953,
+      "completions/min_length": 29.4,
+      "completions/min_terminated_length": 29.4,
+      "entropy": 0.28240502553526314,
+      "epoch": 0.002586206896551724,
+      "frac_reward_zero_std": 0.5,
+      "grad_norm": 0.2373046875,
+      "learning_rate": 4.663793103448276e-06,
+      "loss": -0.0108717679977417,
+      "num_tokens": 138667.0,
+      "reward": 0.6616780042648316,
+      "reward_std": 0.18088504523038865,
+      "rewards/reward_fn/mean": 0.6616780042648316,
+      "rewards/reward_fn/std": 0.18088504672050476,
+      "step": 60,
+      "step_time": 26.46498533039994
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.05,
+      "completions/max_length": 134.8,
+      "completions/max_terminated_length": 133.0,
+      "completions/mean_length": 69.55,
+      "completions/mean_terminated_length": 63.233334350585935,
+      "completions/min_length": 15.8,
+      "completions/min_terminated_length": 15.8,
+      "entropy": 0.3688268234953284,
+      "epoch": 0.0028017241379310344,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.2216796875,
+      "learning_rate": 4.620689655172414e-06,
+      "loss": 0.09875075221061706,
+      "num_tokens": 150585.0,
+      "reward": 0.5008320093154908,
+      "reward_std": 0.2221744753420353,
+      "rewards/reward_fn/mean": 0.5008320093154908,
+      "rewards/reward_fn/std": 0.22217447832226753,
+      "step": 65,
+      "step_time": 22.69940989719953
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.05,
+      "completions/max_length": 149.4,
+      "completions/max_terminated_length": 141.4,
+      "completions/mean_length": 82.125,
+      "completions/mean_terminated_length": 76.2107162475586,
+      "completions/min_length": 34.8,
+      "completions/min_terminated_length": 34.8,
+      "entropy": 0.2419425747357309,
+      "epoch": 0.003017241379310345,
+      "frac_reward_zero_std": 0.3,
+      "grad_norm": 0.5859375,
+      "learning_rate": 4.577586206896552e-06,
+      "loss": -0.06608393192291259,
+      "num_tokens": 162726.0,
+      "reward": 0.35277000069618225,
+      "reward_std": 0.37774557205848397,
+      "rewards/reward_fn/mean": 0.35277000069618225,
+      "rewards/reward_fn/std": 0.37774557354860006,
+      "step": 70,
+      "step_time": 24.30778650340026
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.025,
+      "completions/max_length": 137.6,
+      "completions/max_terminated_length": 135.6,
+      "completions/mean_length": 77.55,
+      "completions/mean_terminated_length": 74.46785888671874,
+      "completions/min_length": 41.8,
+      "completions/min_terminated_length": 41.8,
+      "entropy": 0.35793030727654696,
+      "epoch": 0.003232758620689655,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.63671875,
+      "learning_rate": 4.53448275862069e-06,
+      "loss": 0.10921311378479004,
+      "num_tokens": 175364.0,
+      "reward": 0.15934600830078124,
+      "reward_std": 0.5374872148036957,
+      "rewards/reward_fn/mean": 0.15934600830078124,
+      "rewards/reward_fn/std": 0.5374872386455536,
+      "step": 75,
+      "step_time": 23.099894465200123
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 95.6,
+      "completions/max_terminated_length": 95.6,
+      "completions/mean_length": 53.175,
+      "completions/mean_terminated_length": 53.175,
+      "completions/min_length": 15.8,
+      "completions/min_terminated_length": 15.8,
+      "entropy": 0.2758270605234429,
+      "epoch": 0.0034482758620689655,
+      "frac_reward_zero_std": 0.3,
+      "grad_norm": 0.443359375,
+      "learning_rate": 4.4913793103448275e-06,
+      "loss": 0.11924625635147094,
+      "num_tokens": 186407.0,
+      "reward": 0.6774819850921631,
+      "reward_std": 0.11054287778679281,
+      "rewards/reward_fn/mean": 0.6774819850921631,
+      "rewards/reward_fn/std": 0.11054288879968226,
+      "step": 80,
+      "step_time": 17.348075130399685
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 137.6,
+      "completions/max_terminated_length": 137.6,
+      "completions/mean_length": 68.25,
+      "completions/mean_terminated_length": 68.25,
+      "completions/min_length": 30.0,
+      "completions/min_terminated_length": 30.0,
+      "entropy": 0.2696234828326851,
+      "epoch": 0.003663793103448276,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.287109375,
+      "learning_rate": 4.4482758620689656e-06,
+      "loss": -0.03822631537914276,
+      "num_tokens": 198241.0,
+      "reward": 0.5305620029568672,
+      "reward_std": 0.4029154841089621,
+      "rewards/reward_fn/mean": 0.5305620029568672,
+      "rewards/reward_fn/std": 0.4029154877178371,
+      "step": 85,
+      "step_time": 22.834117503599828
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 131.2,
+      "completions/max_terminated_length": 131.2,
+      "completions/mean_length": 77.45,
+      "completions/mean_terminated_length": 77.45,
+      "completions/min_length": 19.8,
+      "completions/min_terminated_length": 19.8,
+      "entropy": 0.3707127865403891,
+      "epoch": 0.003879310344827586,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.50390625,
+      "learning_rate": 4.405172413793104e-06,
+      "loss": 0.035861659049987796,
+      "num_tokens": 210935.0,
+      "reward": 0.3616240084171295,
+      "reward_std": 0.48762375079095366,
+      "rewards/reward_fn/mean": 0.3616240084171295,
+      "rewards/reward_fn/std": 0.48762374445796014,
+      "step": 90,
+      "step_time": 22.48314213299982
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.075,
+      "completions/max_length": 118.2,
+      "completions/max_terminated_length": 116.0,
+      "completions/mean_length": 66.9,
+      "completions/mean_terminated_length": 56.7,
+      "completions/min_length": 25.0,
+      "completions/min_terminated_length": 25.0,
+      "entropy": 0.342819757014513,
+      "epoch": 0.004094827586206897,
+      "frac_reward_zero_std": 0.2,
+      "grad_norm": 0.78125,
+      "learning_rate": 4.362068965517242e-06,
+      "loss": -0.051665693521499634,
+      "num_tokens": 222647.0,
+      "reward": 0.46663198471069334,
+      "reward_std": 0.3400567059754394,
+      "rewards/reward_fn/mean": 0.46663198471069334,
+      "rewards/reward_fn/std": 0.34005671155173334,
+      "step": 95,
+      "step_time": 20.31996373819993
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 82.2,
+      "completions/max_terminated_length": 82.2,
+      "completions/mean_length": 39.975,
+      "completions/mean_terminated_length": 39.975,
+      "completions/min_length": 16.4,
+      "completions/min_terminated_length": 16.4,
+      "entropy": 0.25842549465596676,
+      "epoch": 0.004310344827586207,
+      "frac_reward_zero_std": 0.2,
+      "grad_norm": 0.64453125,
+      "learning_rate": 4.31896551724138e-06,
+      "loss": 0.029389530420303345,
+      "num_tokens": 233466.0,
+      "reward": 0.5307899951934815,
+      "reward_std": 0.28180397795513273,
+      "rewards/reward_fn/mean": 0.5307899951934815,
+      "rewards/reward_fn/std": 0.28180398060940204,
+      "step": 100,
+      "step_time": 15.753981888000453
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 600,
+  "num_input_tokens_seen": 233466,
+  "num_train_epochs": 1,
+  "save_steps": 50,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}

checkpoint-100/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0cf50531ca359d39fbfa05dc0896e03532a83d53b029182b6d7f757efab0c97a
+size 7185