leaderboard-pr-bot committed · verified
Commit 9ad800e · 1 parent: 7fe96b0

Adding Evaluation Results


This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1): README.md (+109, −1)
README.md CHANGED
license: other
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B
model-index:
- name: Qwenftmodel
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: IFEval (0-Shot)
      type: HuggingFaceH4/ifeval
      args:
        num_few_shot: 0
    metrics:
    - type: inst_level_strict_acc and prompt_level_strict_acc
      value: 17.29
      name: strict accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: BBH (3-Shot)
      type: BBH
      args:
        num_few_shot: 3
    metrics:
    - type: acc_norm
      value: 14.04
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MATH Lvl 5 (4-Shot)
      type: hendrycks/competition_math
      args:
        num_few_shot: 4
    metrics:
    - type: exact_match
      value: 7.7
      name: exact match
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GPQA (0-shot)
      type: Idavidrein/gpqa
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 0.89
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MuSR (0-shot)
      type: TAUR-Lab/MuSR
      args:
        num_few_shot: 0
    metrics:
    - type: acc_norm
      value: 4.61
      name: acc_norm
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU-PRO (5-shot)
      type: TIGER-Lab/MMLU-Pro
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 14.88
      name: accuracy
    source:
      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=sumink/Qwenftmodel
      name: Open LLM Leaderboard
---
Fine-Tuned Qwen 2.5-Coder-1.5B is a causal language model tuned to generate contextually relevant responses. The base model, Qwen/Qwen2.5-Coder-1.5B, is a Transformer with 1.5 billion parameters. It was fine-tuned on a custom dataset named subset5, consisting of prompt-response pairs tokenized to a maximum sequence length of 128 tokens; inputs were padded and truncated accordingly, and labels were aligned for causal language modeling. Key hyperparameters: learning rate 2e-5, batch size 1 with 32 gradient-accumulation steps (effective batch size 32), 3 epochs, and the AdamW optimizer with weight decay 0.01. Training was performed on CPU, without CUDA.
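The preprocessing described above can be sketched as follows. This is a minimal, dependency-free illustration, not the exact training script: the helper name and token IDs are made up, and `-100` is the conventional ignore index used by causal-LM cross-entropy losses so that padding does not contribute to the loss.

```python
# Sketch of preparing one prompt-response example for causal-LM fine-tuning:
# truncate/pad token ids to 128, build an attention mask, and copy input_ids
# to labels with padding positions masked to -100 (the usual ignore index).
MAX_LEN = 128
PAD_ID = 0  # assumed pad token id, for illustration only

def prepare_example(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    ids = token_ids[:max_len]                 # truncate to the max length
    attention_mask = [1] * len(ids)           # 1 = real token
    pad = max_len - len(ids)
    ids = ids + [pad_id] * pad                # right-pad to max length
    attention_mask = attention_mask + [0] * pad
    # labels mirror input_ids, but padding is masked out of the loss
    labels = [t if m == 1 else -100 for t, m in zip(ids, attention_mask)]
    return {"input_ids": ids, "attention_mask": attention_mask, "labels": labels}

ex = prepare_example([5, 9, 12])
```

The model's `generate` method later shifts labels internally, so aligned (identical) `labels` and `input_ids` are all the trainer needs here.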

The model can be used for tasks such as answering questions, completing sentences, or generating responses: load the model and tokenizer with the Hugging Face Transformers library, tokenize your input prompt, and generate a response with the model's `generate` method. Example input-output pairs demonstrate the model's ability to generate concise, informative answers. The model should not be used to produce harmful, malicious, or unethical content, and users are responsible for complying with applicable laws and ethical standards.
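A minimal usage sketch along those lines. The repo id `sumink/Qwenftmodel` is taken from the leaderboard links in this PR and the prompt is illustrative; adjust both as needed.

```python
# Hedged usage sketch: load the fine-tuned model and tokenizer with
# Hugging Face Transformers, then generate a response for a prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "sumink/Qwenftmodel"  # repo id implied by the leaderboard links
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a Python function that reverses a string."
inputs = tokenizer(prompt, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Greedy decoding is the default; pass e.g. `do_sample=True` and `temperature` to `generate` for sampled outputs.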
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_sumink__Qwenftmodel).

| Metric              | Value |
|---------------------|------:|
| Avg.                |  9.90 |
| IFEval (0-Shot)     | 17.29 |
| BBH (3-Shot)        | 14.04 |
| MATH Lvl 5 (4-Shot) |  7.70 |
| GPQA (0-shot)       |  0.89 |
| MuSR (0-shot)       |  4.61 |
| MMLU-PRO (5-shot)   | 14.88 |
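As a sanity check, the reported average is simply the arithmetic mean of the six benchmark scores:

```python
# Illustrative check: Avg. is the mean of the six benchmark scores above.
scores = [17.29, 14.04, 7.70, 0.89, 4.61, 14.88]
avg = round(sum(scores) / len(scores), 2)
print(avg)  # 9.9
```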