schonsense committed · Commit 7996105 · verified · 1 Parent(s): ec5db72

Update README.md
---
base_model:
- CrucibleLab/L3.3-70B-Loki-V2.0
- meta-llama/Llama-3.1-70B
- schonsense/Tropoplectic
library_name: transformers
tags:
- mergekit
- merge
---
# Bragi3

Too sloppy for my tastes.

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
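
A merge like this is normally reproduced by feeding the configuration from the Configuration section to mergekit's `mergekit-yaml` entry point. A minimal sketch, assuming the config is saved as `config.yaml` and `./Bragi3` is an arbitrary output directory (both names are illustrative, not from this card):

```shell
# Install mergekit, then run the merge from the saved config file.
# --cuda performs the tensor math on GPU when one is available.
pip install mergekit
mergekit-yaml config.yaml ./Bragi3 --cuda
```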

## Merge Details
### Merge Method

This model was merged using the NuSLERP merge method, with [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) as the base model.
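
NuSLERP builds on spherical linear interpolation (SLERP) of weight tensors. As an illustration of the underlying idea only, here is a minimal SLERP over flattened tensors; this is a simplification, since mergekit's `nuslerp` also supports operating on task vectors relative to the base model and optional row/column-wise treatment of matrices:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors at fraction t.

    Illustrative sketch only -- not mergekit's actual nuslerp implementation.
    """
    a_flat = a.ravel().astype(np.float64)
    b_flat = b.ravel().astype(np.float64)
    # Angle between the two tensors, measured on the unit hypersphere.
    a_n = a_flat / np.linalg.norm(a_flat)
    b_n = b_flat / np.linalg.norm(b_flat)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < 1e-6:
        # Nearly parallel tensors: fall back to plain linear interpolation.
        return ((1.0 - t) * a_flat + t * b_flat).reshape(a.shape)
    so = np.sin(omega)
    out = (np.sin((1.0 - t) * omega) / so) * a_flat + (np.sin(t * omega) / so) * b_flat
    return out.reshape(a.shape)
```

Unlike linear averaging, SLERP follows the arc between the two tensors, preserving magnitude along the interpolation path rather than cutting through the interior of the sphere.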

### Models Merged

The following models were included in the merge:
* [CrucibleLab/L3.3-70B-Loki-V2.0](https://huggingface.co/CrucibleLab/L3.3-70B-Loki-V2.0)
* [schonsense/Tropoplectic](https://huggingface.co/schonsense/Tropoplectic)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
models:
  - model: CrucibleLab/L3.3-70B-Loki-V2.0
    parameters:
      weight:
        - filter: q_proj
          value: [0.80, 0.30, 0.30, 0.30, 0.8]
        - filter: k_proj
          value: [0.70, 0.20, 0.20, 0.20, 0.7]
        - filter: v_proj
          value: [0.80, 0.40, 0.40, 0.40, 0.8]
        - filter: o_proj
          value: [0.90, 0.80, 0.80, 0.80, 0.9]
        - filter: gate_proj
          value: [0.80, 0.20, 0.20, 0.20, 0.8]
        - filter: up_proj
          value: [0.80, 0.30, 0.30, 0.30, 0.8]
        - filter: down_proj
          value: [0.90, 0.80, 0.80, 0.80, 0.9]
        - filter: lm_head
          value: 0.95
        - value: 1

  - model: schonsense/Tropoplectic
    parameters:
      weight:
        - filter: q_proj
          value: [0.20, 0.70, 0.70, 0.70, 0.2]
        - filter: k_proj
          value: [0.30, 0.80, 0.80, 0.80, 0.3]
        - filter: v_proj
          value: [0.20, 0.60, 0.60, 0.60, 0.2]
        - filter: o_proj
          value: [0.10, 0.25, 0.25, 0.25, 0.1]
        - filter: gate_proj
          value: [0.20, 0.80, 0.80, 0.80, 0.2]
        - filter: up_proj
          value: [0.20, 0.70, 0.70, 0.70, 0.2]
        - filter: down_proj
          value: [0.10, 0.25, 0.25, 0.25, 0.1]
        - filter: lm_head
          value: 0.05
        - value: 0

base_model: meta-llama/Llama-3.1-70B
merge_method: nuslerp

parameters:
  normalize: false
  int8_mask: false
  rescale: false

dtype: float32
out_dtype: bfloat16

chat_template: llama3
tokenizer:
  source: union
  pad_to_multiple_of: 8
```
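
The five-element `value` lists above are layer gradients: mergekit interpolates a short list of anchor weights into one weight per transformer layer, so each model contributes strongly at the ends of the stack and weakly (or vice versa) in the middle. A sketch of that expansion, assuming piecewise-linear interpolation and an 80-layer decoder stack as in Llama-3 70B (both assumptions, not stated in this card):

```python
import numpy as np

def layer_weights(anchors, num_layers=80):
    """Expand a short gradient (e.g. [0.80, 0.30, 0.30, 0.30, 0.8]) into one
    weight per layer, linearly interpolated between evenly spaced anchors.
    num_layers=80 is an illustrative assumption for a Llama-3 70B stack."""
    xs = np.linspace(0, num_layers - 1, num=len(anchors))  # anchor positions
    return np.interp(np.arange(num_layers), xs, anchors)

# Loki's q_proj gradient: strong in the first/last layers, weak in the middle.
w = layer_weights([0.80, 0.30, 0.30, 0.30, 0.80])
```

Note how the Loki and Tropoplectic gradients roughly mirror each other (e.g. `q_proj` [0.80, 0.30, ...] vs [0.20, 0.70, ...]), handing the middle layers to Tropoplectic and the outer layers to Loki.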