Instructions to use Vikaspandey582003/echo-calibration-adapter with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Vikaspandey582003/echo-calibration-adapter with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base_model, "Vikaspandey582003/echo-calibration-adapter")

Transformers

How to use Vikaspandey582003/echo-calibration-adapter with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Vikaspandey582003/echo-calibration-adapter")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Vikaspandey582003/echo-calibration-adapter", dtype="auto")

Local Apps

vLLM

How to use Vikaspandey582003/echo-calibration-adapter with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Vikaspandey582003/echo-calibration-adapter"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Vikaspandey582003/echo-calibration-adapter",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/Vikaspandey582003/echo-calibration-adapter

SGLang

How to use Vikaspandey582003/echo-calibration-adapter with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Vikaspandey582003/echo-calibration-adapter" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Vikaspandey582003/echo-calibration-adapter",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Vikaspandey582003/echo-calibration-adapter" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Vikaspandey582003/echo-calibration-adapter",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Vikaspandey582003/echo-calibration-adapter with Docker Model Runner:
```
docker model run hf.co/Vikaspandey582003/echo-calibration-adapter
```

Vikaspandey582003 committed 15 days ago

Commit 3d90663 (verified)

1 Parent(s): b9dae38

checkpoint step 50

Files changed (12)

.gitattributes +1 -0
checkpoint-50/README.md +209 -0
checkpoint-50/adapter_config.json +48 -0
checkpoint-50/adapter_model.safetensors +3 -0
checkpoint-50/chat_template.jinja +54 -0
checkpoint-50/optimizer.pt +3 -0
checkpoint-50/rng_state.pth +3 -0
checkpoint-50/scheduler.pt +3 -0
checkpoint-50/tokenizer.json +3 -0
checkpoint-50/tokenizer_config.json +16 -0
checkpoint-50/trainer_state.json +304 -0
checkpoint-50/training_args.bin +3 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text

+checkpoint-50/tokenizer.json filter=lfs diff=lfs merge=lfs -text

checkpoint-50/README.md ADDED

@@ -0,0 +1,209 @@

+---
+base_model: unsloth/Qwen2.5-7B-Instruct
+library_name: peft
+pipeline_tag: text-generation
+tags:
+- base_model:adapter:unsloth/Qwen2.5-7B-Instruct
+- grpo
+- lora
+- transformers
+- trl
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.19.1

checkpoint-50/adapter_config.json ADDED

@@ -0,0 +1,48 @@

+{
+  "alora_invocation_tokens": null,
+  "alpha_pattern": {},
+  "arrow_config": null,
+  "auto_mapping": null,
+  "base_model_name_or_path": "unsloth/Qwen2.5-7B-Instruct",
+  "bias": "none",
+  "corda_config": null,
+  "ensure_weight_tying": false,
+  "eva_config": null,
+  "exclude_modules": null,
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_bias": false,
+  "lora_dropout": 0.05,
+  "lora_ga_config": null,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "peft_version": "0.19.1",
+  "qalora_group_size": 16,
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "down_proj",
+    "up_proj",
+    "k_proj",
+    "v_proj",
+    "gate_proj",
+    "q_proj",
+    "o_proj"
+  ],
+  "target_parameters": null,
+  "task_type": "CAUSAL_LM",
+  "trainable_token_indices": null,
+  "use_bdlora": null,
+  "use_dora": false,
+  "use_qalora": false,
+  "use_rslora": false
+}
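
As a rough sanity check on this config, the sketch below counts the LoRA parameters implied by `r` and `target_modules`. The layer shapes (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of size 128) are assumptions taken from the published Qwen2.5-7B configuration; they are not part of this commit.

```python
# Assumed Qwen2.5-7B shapes (not part of this commit).
HIDDEN = 3584
INTERMEDIATE = 18944
NUM_LAYERS = 28
KV_DIM = 4 * 128  # 4 key/value heads of size 128

# Values read from adapter_config.json.
R = 16
LORA_ALPHA = 16

# (in_features, out_features) for each targeted projection.
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_param_count(shapes, r, num_layers):
    """Each adapted projection adds A (r x in) and B (out x r)."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * num_layers


total_params = lora_param_count(TARGET_SHAPES, R, NUM_LAYERS)
scaling = LORA_ALPHA / R  # use_rslora is false, so the scale is alpha / r
print(total_params, scaling)
```

Under these assumptions the adapter holds about 40.4M parameters. At 2 bytes per parameter that is roughly 80.7 MB, close to the 80,792,880-byte `adapter_model.safetensors` in this commit, which suggests (but does not prove) 16-bit storage.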

checkpoint-50/adapter_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8774212fba617daa40bef4cef915e047bc2e6feee4ceebd5580bf1ee2fb50370
+size 80792880
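
The three lines above are a Git LFS pointer, not the weights themselves. A minimal sketch of how such a pointer could be parsed and checked against downloaded bytes follows; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and chosen only for illustration.

```python
import hashlib

# Pointer text copied from the diff above.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:8774212fba617daa40bef4cef915e047bc2e6feee4ceebd5580bf1ee2fb50370
size 80792880
"""


def parse_lfs_pointer(text):
    """Split each 'key value' line, then split the oid into algorithm and digest."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
print(pointer["algo"], pointer["size"])
```

`matches_pointer` takes the whole payload as bytes to keep the sketch short; for a file of this size, hashing in chunks would be the more practical choice.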

checkpoint-50/chat_template.jinja ADDED

@@ -0,0 +1,54 @@

+{%- if tools %}
+    {{- '<|im_start|>system\n' }}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- messages[0]['content'] }}
+    {%- else %}
+        {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}
+    {%- endif %}
+    {{- "\n\n# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
+    {%- for tool in tools %}
+        {{- "\n" }}
+        {{- tool | tojson }}
+    {%- endfor %}
+    {{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
+{%- else %}
+    {%- if messages[0]['role'] == 'system' %}
+        {{- '<|im_start|>system\n' + messages[0]['content'] + '<|im_end|>\n' }}
+    {%- else %}
+        {{- '<|im_start|>system\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\n' }}
+    {%- endif %}
+{%- endif %}
+{%- for message in messages %}
+    {%- if (message.role == "user") or (message.role == "system" and not loop.first) or (message.role == "assistant" and not message.tool_calls) %}
+        {{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
+    {%- elif message.role == "assistant" %}
+        {{- '<|im_start|>' + message.role }}
+        {%- if message.content %}
+            {{- '\n' + message.content }}
+        {%- endif %}
+        {%- for tool_call in message.tool_calls %}
+            {%- if tool_call.function is defined %}
+                {%- set tool_call = tool_call.function %}
+            {%- endif %}
+            {{- '\n<tool_call>\n{"name": "' }}
+            {{- tool_call.name }}
+            {{- '", "arguments": ' }}
+            {{- tool_call.arguments | tojson }}
+            {{- '}\n</tool_call>' }}
+        {%- endfor %}
+        {{- '<|im_end|>\n' }}
+    {%- elif message.role == "tool" %}
+        {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != "tool") %}
+            {{- '<|im_start|>user' }}
+        {%- endif %}
+        {{- '\n<tool_response>\n' }}
+        {{- message.content }}
+        {{- '\n</tool_response>' }}
+        {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
+            {{- '<|im_end|>\n' }}
+        {%- endif %}
+    {%- endif %}
+{%- endfor %}
+{%- if add_generation_prompt %}
+    {{- '<|im_start|>assistant\n' }}
+{%- endif %}
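
To make the template's output concrete, here is a simplified Python re-implementation of its no-tools branch. It is a sketch only: it handles plain system, user, and assistant turns and ignores tool definitions, tool calls, and tool responses, so it is not a substitute for rendering the Jinja file itself. The function name `render_chat` is hypothetical.

```python
DEFAULT_SYSTEM = "You are Qwen, created by Alibaba Cloud. You are a helpful assistant."


def render_chat(messages, add_generation_prompt=False):
    """Mirror the template's no-tools path for system/user/assistant turns."""
    # A leading system message replaces the default system prompt.
    if messages and messages[0]["role"] == "system":
        system = messages[0]["content"]
    else:
        system = DEFAULT_SYSTEM
    out = "<|im_start|>system\n" + system + "<|im_end|>\n"

    for index, message in enumerate(messages):
        # The first system message was already emitted above.
        if index == 0 and message["role"] == "system":
            continue
        out += "<|im_start|>" + message["role"] + "\n" + message["content"] + "<|im_end|>\n"

    # Open an assistant turn so the model continues from there.
    if add_generation_prompt:
        out += "<|im_start|>assistant\n"
    return out


prompt = render_chat([{"role": "user", "content": "Who are you?"}], add_generation_prompt=True)
print(prompt)
```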

checkpoint-50/optimizer.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0f91af0fa995043cb9e216dca74e80d09ee819a9d274a9dcf260d0411a6b48ed
+size 161816251

checkpoint-50/rng_state.pth ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:42c8957cbd17b37e5391f10035f189ea0492f94bda207d033f16b09cc832dbcf
+size 14645

checkpoint-50/scheduler.pt ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:259b1303d82fa78c2e55eeb7df6096e0b57593f81a8f6658f2d1675d51e39965
+size 1465

checkpoint-50/tokenizer.json ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fd169731d2cbde95e10bf356d66d5997fd885dd8dbb6fb4684da3f23b2585d8
+size 11421892

checkpoint-50/tokenizer_config.json ADDED

@@ -0,0 +1,16 @@

+{
+  "add_prefix_space": false,
+  "backend": "tokenizers",
+  "bos_token": null,
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "is_local": false,
+  "local_files_only": false,
+  "model_max_length": 32768,
+  "pad_token": "<|vision_pad|>",
+  "padding_side": "left",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}
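
The config sets `"padding_side": "left"`, which places pad tokens before the prompt so that the last position of every row is a real token. The sketch below illustrates that layout on plain lists of token ids. The ids and the pad id `0` are hypothetical placeholders; the real id of `<|vision_pad|>` comes from the tokenizer and is not shown in this diff.

```python
def left_pad(batch, pad_id):
    """Pad every sequence on the left to the longest length in the batch."""
    width = max(len(ids) for ids in batch)
    input_ids = [[pad_id] * (width - len(ids)) + list(ids) for ids in batch]
    # 0 marks padding, 1 marks real tokens.
    attention_mask = [[0] * (width - len(ids)) + [1] * len(ids) for ids in batch]
    return input_ids, attention_mask


PAD_ID = 0  # hypothetical placeholder, not the real <|vision_pad|> id
ids, mask = left_pad([[11, 12, 13], [21]], PAD_ID)
print(ids)
print(mask)
```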

checkpoint-50/trainer_state.json ADDED

@@ -0,0 +1,304 @@

+{
+  "best_global_step": null,
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.0021551724137931034,
+  "eval_steps": 500,
+  "global_step": 50,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 130.0,
+      "completions/max_terminated_length": 130.0,
+      "completions/mean_length": 68.5,
+      "completions/mean_terminated_length": 68.5,
+      "completions/min_length": 26.8,
+      "completions/min_terminated_length": 26.8,
+      "entropy": 0.25826080311089755,
+      "epoch": 0.00021551724137931034,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.28125,
+      "learning_rate": 1.0000000000000002e-06,
+      "loss": 0.10238287448883057,
+      "num_tokens": 12064.0,
+      "reward": 0.3290319949388504,
+      "reward_std": 0.40028320252895355,
+      "rewards/reward_fn/mean": 0.3290319949388504,
+      "rewards/reward_fn/std": 0.400283208489418,
+      "step": 5,
+      "step_time": 22.194597821800016
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 87.6,
+      "completions/max_terminated_length": 87.6,
+      "completions/mean_length": 48.525,
+      "completions/mean_terminated_length": 48.525,
+      "completions/min_length": 17.0,
+      "completions/min_terminated_length": 17.0,
+      "entropy": 0.2673827801831067,
+      "epoch": 0.0004310344827586207,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.484375,
+      "learning_rate": 2.25e-06,
+      "loss": 0.05849265456199646,
+      "num_tokens": 23089.0,
+      "reward": 0.45969198942184447,
+      "reward_std": 0.2795014828443527,
+      "rewards/reward_fn/mean": 0.45969198942184447,
+      "rewards/reward_fn/std": 0.27950150668621065,
+      "step": 10,
+      "step_time": 16.331819552399928
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 85.6,
+      "completions/max_terminated_length": 85.6,
+      "completions/mean_length": 47.35,
+      "completions/mean_terminated_length": 47.35,
+      "completions/min_length": 25.0,
+      "completions/min_terminated_length": 25.0,
+      "entropy": 0.20174795808270574,
+      "epoch": 0.000646551724137931,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 0.66015625,
+      "learning_rate": 3.5e-06,
+      "loss": -0.03555725216865539,
+      "num_tokens": 34239.0,
+      "reward": 0.5545999944210053,
+      "reward_std": 0.32832055240869523,
+      "rewards/reward_fn/mean": 0.5545999944210053,
+      "rewards/reward_fn/std": 0.3283205583691597,
+      "step": 15,
+      "step_time": 16.44816417159991
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 98.2,
+      "completions/max_terminated_length": 98.2,
+      "completions/mean_length": 53.6,
+      "completions/mean_terminated_length": 53.6,
+      "completions/min_length": 25.0,
+      "completions/min_terminated_length": 25.0,
+      "entropy": 0.2620095924474299,
+      "epoch": 0.0008620689655172414,
+      "frac_reward_zero_std": 0.2,
+      "grad_norm": 0.5390625,
+      "learning_rate": 4.75e-06,
+      "loss": 0.017401468753814698,
+      "num_tokens": 45399.0,
+      "reward": 0.4466240078210831,
+      "reward_std": 0.27573536019772293,
+      "rewards/reward_fn/mean": 0.4466240078210831,
+      "rewards/reward_fn/std": 0.27573537137359383,
+      "step": 20,
+      "step_time": 17.988385831799953
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 123.8,
+      "completions/max_terminated_length": 123.8,
+      "completions/mean_length": 59.15,
+      "completions/mean_terminated_length": 59.15,
+      "completions/min_length": 21.8,
+      "completions/min_terminated_length": 21.8,
+      "entropy": 0.33701689867302775,
+      "epoch": 0.0010775862068965517,
+      "frac_reward_zero_std": 0.0,
+      "grad_norm": 0.74609375,
+      "learning_rate": 4.965517241379311e-06,
+      "loss": -0.02674364447593689,
+      "num_tokens": 56993.0,
+      "reward": 0.5006439983844757,
+      "reward_std": 0.2351181447505951,
+      "rewards/reward_fn/mean": 0.5006439983844757,
+      "rewards/reward_fn/std": 0.23511814773082734,
+      "step": 25,
+      "step_time": 21.11461545459997
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 110.2,
+      "completions/max_terminated_length": 110.2,
+      "completions/mean_length": 47.95,
+      "completions/mean_terminated_length": 47.95,
+      "completions/min_length": 15.8,
+      "completions/min_terminated_length": 15.8,
+      "entropy": 0.24504410615190864,
+      "epoch": 0.001293103448275862,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 0.92578125,
+      "learning_rate": 4.922413793103449e-06,
+      "loss": -0.05023183822631836,
+      "num_tokens": 67659.0,
+      "reward": 0.6457239985466003,
+      "reward_std": 0.11179900387069211,
+      "rewards/reward_fn/mean": 0.6457239985466003,
+      "rewards/reward_fn/std": 0.11179901438299567,
+      "step": 30,
+      "step_time": 19.134268762199827
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.1,
+      "completions/max_length": 150.8,
+      "completions/max_terminated_length": 120.2,
+      "completions/mean_length": 73.375,
+      "completions/mean_terminated_length": 54.36666717529297,
+      "completions/min_length": 15.2,
+      "completions/min_terminated_length": 15.2,
+      "entropy": 0.23285924410447478,
+      "epoch": 0.0015086206896551724,
+      "frac_reward_zero_std": 0.3,
+      "grad_norm": 0.58203125,
+      "learning_rate": 4.879310344827586e-06,
+      "loss": -0.05149807929992676,
+      "num_tokens": 79734.0,
+      "reward": 0.46414998471736907,
+      "reward_std": 0.4071369742392562,
+      "rewards/reward_fn/mean": 0.46414998471736907,
+      "rewards/reward_fn/std": 0.40713699695188554,
+      "step": 35,
+      "step_time": 24.679699858600042
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 125.4,
+      "completions/max_terminated_length": 125.4,
+      "completions/mean_length": 68.925,
+      "completions/mean_terminated_length": 68.925,
+      "completions/min_length": 26.0,
+      "completions/min_terminated_length": 26.0,
+      "entropy": 0.3110098702833056,
+      "epoch": 0.0017241379310344827,
+      "frac_reward_zero_std": 0.1,
+      "grad_norm": 0.419921875,
+      "learning_rate": 4.836206896551724e-06,
+      "loss": 0.030162644386291505,
+      "num_tokens": 91695.0,
+      "reward": 0.5143539935350419,
+      "reward_std": 0.2790891878306866,
+      "rewards/reward_fn/mean": 0.5143539935350419,
+      "rewards/reward_fn/std": 0.279089218378067,
+      "step": 40,
+      "step_time": 21.565513409599998
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 100.0,
+      "completions/max_terminated_length": 100.0,
+      "completions/mean_length": 53.875,
+      "completions/mean_terminated_length": 53.875,
+      "completions/min_length": 16.6,
+      "completions/min_terminated_length": 16.6,
+      "entropy": 0.23047098610550165,
+      "epoch": 0.001939655172413793,
+      "frac_reward_zero_std": 0.4,
+      "grad_norm": 1.15625,
+      "learning_rate": 4.793103448275862e-06,
+      "loss": -0.04350074529647827,
+      "num_tokens": 102994.0,
+      "reward": 0.6398920059204102,
+      "reward_std": 0.2279826147481799,
+      "rewards/reward_fn/mean": 0.6398920059204102,
+      "rewards/reward_fn/std": 0.22798261437565087,
+      "step": 45,
+      "step_time": 18.23975218899968
+    },
+    {
+      "clip_ratio/high_max": 0.0,
+      "clip_ratio/high_mean": 0.0,
+      "clip_ratio/low_mean": 0.0,
+      "clip_ratio/low_min": 0.0,
+      "clip_ratio/region_mean": 0.0,
+      "completions/clipped_ratio": 0.0,
+      "completions/max_length": 133.4,
+      "completions/max_terminated_length": 133.4,
+      "completions/mean_length": 75.975,
+      "completions/mean_terminated_length": 75.975,
+      "completions/min_length": 23.8,
+      "completions/min_terminated_length": 23.8,
+      "entropy": 0.34613882582634686,
+      "epoch": 0.0021551724137931034,
+      "frac_reward_zero_std": 0.2,
+      "grad_norm": 0.1318359375,
+      "learning_rate": 4.75e-06,
+      "loss": -0.03668657541275024,
+      "num_tokens": 115289.0,
+      "reward": 0.33042599707841874,
+      "reward_std": 0.5918701648712158,
+      "rewards/reward_fn/mean": 0.33042599707841874,
+      "rewards/reward_fn/std": 0.5918702006340026,
+      "step": 50,
+      "step_time": 22.43372436519985
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 600,
+  "num_input_tokens_seen": 115289,
+  "num_train_epochs": 1,
+  "save_steps": 50,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 0.0,
+  "train_batch_size": 1,
+  "trial_name": null,
+  "trial_params": null
+}
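
The logged `learning_rate` values look like a linear warmup followed by a linear decay to zero at `max_steps`. The sketch below checks that reading against the log. The peak rate of 5e-6 and the 20 warmup steps are assumptions inferred from the numbers, since `training_args.bin` is not readable in this diff; the one-step offset (the value logged at step N matches the schedule evaluated at N - 1) is likewise an observation from the log, not a documented fact.

```python
PEAK_LR = 5e-6      # assumption inferred from the logged values
WARMUP_STEPS = 20   # assumption inferred from the logged values
MAX_STEPS = 600     # "max_steps" in trainer_state.json

# step -> learning_rate, copied from log_history.
LOGGED = {
    5: 1.0000000000000002e-06,
    10: 2.25e-06,
    15: 3.5e-06,
    20: 4.75e-06,
    25: 4.965517241379311e-06,
    30: 4.922413793103449e-06,
    35: 4.879310344827586e-06,
    40: 4.836206896551724e-06,
    45: 4.793103448275862e-06,
    50: 4.75e-06,
}


def linear_warmup_decay(scheduler_step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=MAX_STEPS):
    """Linear ramp from 0 to peak, then linear decay to 0 at `total`."""
    if scheduler_step < warmup:
        return peak * scheduler_step / warmup
    return peak * max(0.0, (total - scheduler_step) / (total - warmup))


for step, logged_lr in LOGGED.items():
    predicted = linear_warmup_decay(step - 1)
    print(step, predicted, abs(predicted - logged_lr) < 1e-12)
```

All ten logged values agree with this schedule to within floating-point error, so the guess is at least consistent with the checkpoint, even though the actual scheduler configuration is not visible here.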

checkpoint-50/training_args.bin ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0cf50531ca359d39fbfa05dc0896e03532a83d53b029182b6d7f757efab0c97a
+size 7185