Adding Evaluation Results
This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr
The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.
If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions
README.md (changed):
```diff
@@ -2,7 +2,115 @@
 license: other
 license_name: qwen
 license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B
+model-index:
+- name: Qwenftmodel
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 17.29
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 14.04
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 7.7
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 0.89
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 4.61
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 14.88
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
+      name: Open LLM Leaderboard
 ---
 Fine-Tuned Qwen 2.5-Coder-1.5B is a causal language model fine-tuned for generating contextually relevant responses. The base model, Qwen/Qwen2.5-Coder-1.5B, features a Transformer-based architecture with 1.5 billion parameters. The model was fine-tuned on a custom dataset named subset5, consisting of prompt-response pairs tokenized with a maximum sequence length of 128 tokens. During training, inputs were padded and truncated appropriately, and labels were aligned for causal language modeling. Key hyperparameters included a learning rate of 2e-5, batch size of 1, gradient accumulation steps of 32, and 3 epochs. The AdamW optimizer was used, with weight decay set to 0.01. Training was performed on CPU without CUDA.
 
 The model can be used for tasks like answering questions, completing sentences, or generating responses. For usage, load the model and tokenizer with the Hugging Face Transformers library, tokenize your input prompt, and generate responses with the model’s generate method. Example input-output pairs demonstrate the model’s ability to generate concise, informative answers. However, the model should not be used for harmful, malicious, or unethical content, and users are responsible for adhering to applicable laws and ethical standards.
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sumink__Qwenftmodel)
+
+|Metric             |Value|
+|-------------------|----:|
+|Avg.               | 9.90|
+|IFEval (0-Shot)    |17.29|
+|BBH (3-Shot)       |14.04|
+|MATH Lvl 5 (4-Shot)| 7.70|
+|GPQA (0-shot)      | 0.89|
+|MuSR (0-shot)      | 4.61|
+|MMLU-PRO (5-shot)  |14.88|
+
```
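The model card above describes usage only in prose (load the model and tokenizer, tokenize the prompt, call `generate`). A minimal sketch of that workflow with the Hugging Face Transformers library, assuming the fine-tuned weights are published at `sumink/Qwenftmodel` (the repository this PR targets) and that `transformers` plus a PyTorch CPU build are installed:

```python
# Minimal usage sketch for the fine-tuned model described in the card.
# Assumption: the weights live at "sumink/Qwenftmodel", the repo this PR targets.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sumink/Qwenftmodel"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # card says CPU training; CPU inference works too

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")

# The card notes a 128-token training context, so keep generations short.
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt.
reply = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(reply)
```

Greedy decoding (`do_sample=False`) is used here for reproducibility; sampling parameters can be added once the model's behavior has been checked.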