darkc0de committed on
Commit ce69da1 · verified · 1 Parent(s): b994c52

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +281 -197
README.md CHANGED
@@ -1,199 +1,283 @@
1
  ---
2
- library_name: transformers
3
- tags: []
4
  ---
5
-
6
- # Model Card for Model ID
7
-
8
- <!-- Provide a quick summary of what the model is/does. -->
9
-
10
-
11
-
12
- ## Model Details
13
-
14
- ### Model Description
15
-
16
- <!-- Provide a longer summary of what this model is. -->
17
-
18
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
19
-
20
- - **Developed by:** [More Information Needed]
21
- - **Funded by [optional]:** [More Information Needed]
22
- - **Shared by [optional]:** [More Information Needed]
23
- - **Model type:** [More Information Needed]
24
- - **Language(s) (NLP):** [More Information Needed]
25
- - **License:** [More Information Needed]
26
- - **Finetuned from model [optional]:** [More Information Needed]
27
-
28
- ### Model Sources [optional]
29
-
30
- <!-- Provide the basic links for the model. -->
31
-
32
- - **Repository:** [More Information Needed]
33
- - **Paper [optional]:** [More Information Needed]
34
- - **Demo [optional]:** [More Information Needed]
35
-
36
- ## Uses
37
-
38
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
39
-
40
- ### Direct Use
41
-
42
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
43
-
44
- [More Information Needed]
45
-
46
- ### Downstream Use [optional]
47
-
48
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
49
-
50
- [More Information Needed]
51
-
52
- ### Out-of-Scope Use
53
-
54
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
55
-
56
- [More Information Needed]
57
-
58
- ## Bias, Risks, and Limitations
59
-
60
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
61
-
62
- [More Information Needed]
63
-
64
- ### Recommendations
65
-
66
- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
67
-
68
- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
69
-
70
- ## How to Get Started with the Model
71
-
72
- Use the code below to get started with the model.
73
-
74
- [More Information Needed]
75
-
76
- ## Training Details
77
-
78
- ### Training Data
79
-
80
- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
81
-
82
- [More Information Needed]
83
-
84
- ### Training Procedure
85
-
86
- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
87
-
88
- #### Preprocessing [optional]
89
-
90
- [More Information Needed]
91
-
92
-
93
- #### Training Hyperparameters
94
-
95
- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
96
-
97
- #### Speeds, Sizes, Times [optional]
98
-
99
- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
100
-
101
- [More Information Needed]
102
-
103
- ## Evaluation
104
-
105
- <!-- This section describes the evaluation protocols and provides the results. -->
106
-
107
- ### Testing Data, Factors & Metrics
108
-
109
- #### Testing Data
110
-
111
- <!-- This should link to a Dataset Card if possible. -->
112
-
113
- [More Information Needed]
114
-
115
- #### Factors
116
-
117
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
118
-
119
- [More Information Needed]
120
-
121
- #### Metrics
122
-
123
- <!-- These are the evaluation metrics being used, ideally with a description of why. -->
124
-
125
- [More Information Needed]
126
-
127
- ### Results
128
-
129
- [More Information Needed]
130
-
131
- #### Summary
132
-
133
-
134
-
135
- ## Model Examination [optional]
136
-
137
- <!-- Relevant interpretability work for the model goes here -->
138
-
139
- [More Information Needed]
140
-
141
- ## Environmental Impact
142
-
143
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
144
-
145
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
146
-
147
- - **Hardware Type:** [More Information Needed]
148
- - **Hours used:** [More Information Needed]
149
- - **Cloud Provider:** [More Information Needed]
150
- - **Compute Region:** [More Information Needed]
151
- - **Carbon Emitted:** [More Information Needed]
152
-
153
- ## Technical Specifications [optional]
154
-
155
- ### Model Architecture and Objective
156
-
157
- [More Information Needed]
158
-
159
- ### Compute Infrastructure
160
-
161
- [More Information Needed]
162
-
163
- #### Hardware
164
-
165
- [More Information Needed]
166
-
167
- #### Software
168
-
169
- [More Information Needed]
170
-
171
- ## Citation [optional]
172
-
173
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
174
-
175
- **BibTeX:**
176
-
177
- [More Information Needed]
178
-
179
- **APA:**
180
-
181
- [More Information Needed]
182
-
183
- ## Glossary [optional]
184
-
185
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
186
-
187
- [More Information Needed]
188
-
189
- ## More Information [optional]
190
-
191
- [More Information Needed]
192
-
193
- ## Model Card Authors [optional]
194
-
195
- [More Information Needed]
196
-
197
- ## Model Card Contact
198
-
199
- [More Information Needed]
1
  ---
2
+ license: apache-2.0
3
+ base_model: mistralai/Mistral-Medium-3.5-128B
4
+ tags:
5
+ - mistral
6
+ - mistral-3.5
7
+ - text-only
8
+ - bf16
9
+ - 128b
10
+ - heretic
11
+ - uncensored
12
+ - decensored
13
+ - abliterated
14
  ---
15
+ # This is a decensored version of [Darkhn/Mistral-Medium-3.5-128B-BF16-Text-Only](https://huggingface.co/Darkhn/Mistral-Medium-3.5-128B-BF16-Text-Only), made using [Heretic](https://github.com/p-e-w/heretic) v1.2.0
16
+
17
+ ## Abliteration parameters
18
+
19
+ | Parameter | Value |
20
+ | :-------- | :---: |
21
+ | **direction_index** | 43.15 |
22
+ | **attn.o_proj.max_weight** | 1.48 |
23
+ | **attn.o_proj.max_weight_position** | 59.65 |
24
+ | **attn.o_proj.min_weight** | 1.44 |
25
+ | **attn.o_proj.min_weight_distance** | 48.02 |
26
+ | **mlp.down_proj.max_weight** | 1.21 |
27
+ | **mlp.down_proj.max_weight_position** | 54.75 |
28
+ | **mlp.down_proj.min_weight** | 0.30 |
29
+ | **mlp.down_proj.min_weight_distance** | 50.44 |
30
+
31
+ ## Performance
32
+
33
+ | Metric | This model | Original model ([Darkhn/Mistral-Medium-3.5-128B-BF16-Text-Only](https://huggingface.co/Darkhn/Mistral-Medium-3.5-128B-BF16-Text-Only)) |
34
+ | :----- | :--------: | :---------------------------: |
35
+ | **KL divergence** | 0.0220 | 0 *(by definition)* |
36
+ | **Refusals** | 9/100 | 98/100 |
37
+
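The KL divergence row compares the output distribution of this model with that of the original. How the figure was measured is not stated here, so the following is only a minimal sketch of the generic definition, KL(p || q), over two next-token distributions. The logits are made-up values and the helper names `softmax` and `kl_divergence` are illustrative, not part of Heretic.

```python
import math

def softmax(logits):
    """Turn raw logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

# Made-up next-token logits over a three-token vocabulary.
original_logits = [2.0, 1.0, 0.1]
modified_logits = [1.9, 1.1, 0.1]

p = softmax(original_logits)
q = softmax(modified_logits)

print(kl_divergence(p, p))  # identical distributions give zero
print(kl_divergence(p, q))  # small positive value for a small shift
```

A value of 0 for the original model follows directly from comparing a distribution with itself, which is why the table marks it "by definition".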
38
+ -----
39
+
40
+
41
+ <style>
42
+ body {
43
+ font-family: 'Quicksand', sans-serif;
44
+ background: linear-gradient(135deg, #4a1e00 0%, #1c0a00 100%);
45
+ color: #F5EFE6;
46
+ margin: 0;
47
+ padding: 0;
48
+ font-size: 16px;
49
+ }
50
+
51
+ h1, h2, h3, h4, summary {
52
+ font-family: 'Cinzel', serif;
53
+ }
54
+
55
+ .container {
56
+ margin: 20px auto;
57
+ max-width: 900px;
58
+ background-color: rgba(28, 22, 18, 0.95);
59
+ padding: 30px;
60
+ border-radius: 12px;
61
+ box-shadow: 0 4px 20px rgba(255, 140, 0, 0.15);
62
+ border: 1px solid rgba(255, 140, 0, 0.2);
63
+ outline: 1px solid rgba(255, 140, 0, 0.5);
64
+ outline-offset: -1px;
65
+ position: relative;
66
+ }
67
+
68
+ .container::before {
69
+ content: '';
70
+ position: absolute;
71
+ top: -1px;
72
+ left: -1px;
73
+ right: -1px;
74
+ bottom: -1px;
75
+ border: 1px solid rgba(255, 165, 0, 0.98);
76
+ border-radius: 12px;
77
+ pointer-events: none;
78
+ animation: borderGlow 2.5s ease-in-out infinite;
79
+ }
80
+
81
+ @keyframes borderGlow {
82
+ 0% { box-shadow: 0 0 5px rgba(255, 165, 0, 0.98); }
83
+ 50% { box-shadow: 0 0 12px rgba(255, 165, 0, 0.98); }
84
+ 100% { box-shadow: 0 0 5px rgba(255, 165, 0, 0.98); }
85
+ }
86
+
87
+ .header h1 {
88
+ font-size: 32px;
89
+ color: #FFA500;
90
+ margin: 0 0 20px 0;
91
+ text-align: center;
92
+ text-shadow: 0 0 12px rgba(255, 100, 0, 0.6);
93
+ }
94
+
95
+ a {
96
+ color: #FFD700;
97
+ text-decoration: none;
98
+ transition: color 0.3s ease;
99
+ }
100
+
101
+ .button {
102
+ display: inline-block;
103
+ background-color: #E55B00;
104
+ color: #FFFFFF;
105
+ padding: 12px 24px;
106
+ border-radius: 5px;
107
+ cursor: pointer;
108
+ text-decoration: none;
109
+ font-family: 'Cinzel', serif;
110
+ font-weight: 600;
111
+ transition: all 0.3s ease;
112
+ border: 1px solid transparent;
113
+ }
114
+
115
+ .button:hover {
116
+ background-color: #FF8C00;
117
+ box-shadow: 0 0 15px rgba(255, 140, 0, 0.5);
118
+ transform: translateY(-2px);
119
+ }
120
+
121
+ .info-card {
122
+ background: rgba(45, 35, 25, 0.95);
123
+ border: 1px solid rgba(255, 140, 0, 0.2);
124
+ border-radius: 8px;
125
+ overflow: hidden;
126
+ margin-bottom: 25px;
127
+ }
128
+
129
+ .info-header {
130
+ background: rgba(255, 140, 0, 0.1);
131
+ padding: 20px;
132
+ border-bottom: 1px solid rgba(255, 140, 0, 0.2);
133
+ }
134
+
135
+ .info-header h3 {
136
+ color: #FFA500;
137
+ margin: 0 0 10px 0;
138
+ font-size: 22px;
139
+ }
140
+
141
+ .card-content {
142
+ padding: 20px;
143
+ line-height: 1.7;
144
+ }
145
+
146
+ .card-content ul {
147
+ list-style: none;
148
+ padding-left: 20px;
149
+ }
150
+
151
+ .card-content li::before {
152
+ content: '✦';
153
+ color: #FFD700;
154
+ font-weight: bold;
155
+ display: inline-block;
156
+ width: 1em;
157
+ margin-left: -1.2em;
158
+ }
159
+
160
+ .card-content strong {
161
+ color: #FFD700;
162
+ }
163
+
164
+ /* Update to the note card */
165
+ .note-card {
166
+ border: 1px solid #FFA500;
167
+ box-shadow: 0 0 10px rgba(255, 165, 0, 0.1);
168
+ }
169
+
170
+ .note-header {
171
+ background: rgba(255, 165, 0, 0.1);
172
+ }
173
+
174
+ .note-header h3 {
175
+ color: #FFA500;
176
+ text-align: center;
177
+ }
178
+
179
+ .support-section {
180
+ text-align: center;
181
+ margin-top: 40px;
182
+ background: rgba(45, 35, 25, 0.95);
183
+ border: 1px solid rgba(255, 140, 0, 0.2);
184
+ border-radius: 8px;
185
+ padding: 20px;
186
+ }
187
+
188
+ summary {
189
+ cursor: pointer;
190
+ list-style: none;
191
+ outline: none;
192
+ display: flex;
193
+ align-items: center;
194
+ }
195
+
196
+ summary::before {
197
+ content: '▶';
198
+ font-size: 1.2em;
199
+ color: #FFA500;
200
+ margin-right: 15px;
201
+ transition: transform 0.2s ease;
202
+ }
203
+
204
+ details[open] > summary::before {
205
+ transform: rotate(90deg);
206
+ }
207
+
208
+ h2 {
209
+ color: #FFA500;
210
+ border-bottom: 1px solid rgba(255, 140, 0, 0.2);
211
+ padding-bottom: 10px;
212
+ margin-bottom: 20px;
213
+ }
214
+ </style>
215
+
216
+ <div class="container">
217
+ <link href="https://fonts.googleapis.com/css2?family=Cinzel:wght@400;500;600&family=Quicksand:wght@400;500&display=swap" rel="stylesheet">
218
+
219
+ <div class="header">
220
+ <h1>Mistral-Medium-3.5-128B-BF16-Text-Only</h1>
221
+ </div>
222
+
223
+ <div class="info">
224
+
225
+ <div class="info-card note-card">
226
+ <div class="info-header note-header">
227
+ <h3>📜 Technical Architecture Note</h3>
228
+ </div>
229
+ <div class="card-content">
230
+ <p>This model has been converted from <strong>Mistral3ForConditionalGeneration</strong> (Multimodal) to <strong>MistralForCausalLM</strong> (Standard Text-Only). With this change, standard fine-tuning libraries such as <em>Axolotl, Unsloth, and Hugging Face Transformers</em> can load it as an ordinary causal language model, with no custom vision-encoder handling required.</p>
231
+ </div>
232
+ </div>
233
+
234
+ <div class="support-section">
235
+ <p><strong>Help me feed the data beast! Taking commissions for universe-specific models.</strong></p>
236
+ <a href="https://ko-fi.com/som1tokmynam" target="_blank" class="button">
237
+ Support on Ko-fi
238
+ </a>
239
+ </div>
240
+
241
+ <div class="section-container">
242
+ <details open>
243
+ <summary><h2>Model Description</h2></summary>
244
+ <div class="info-card">
245
+ <div class="card-content">
246
+ <p>This is a processed version of <strong>Mistral-Medium-3.5-128B</strong> designed for users who prioritize text-only performance and ease of fine-tuning.</p>
247
+ <p><strong>Modification Details:</strong></p>
248
+ <ul>
249
+ <li><strong>Precision Upscale:</strong> Converted from <strong>FP8</strong> weights to <strong>BF16</strong>, giving training a 16-bit brain-float representation for stable gradient updates. The upcast widens the storage format; it does not add information beyond what the FP8 weights already held.</li>
250
+ <li><strong>Vision Layer Stripping:</strong> All vision encoders and multimodal projection layers have been removed, significantly reducing memory overhead during inference and training for text-only tasks.</li>
251
+ <li><strong>Architecture Re-mapping:</strong> The configuration has been modified to use <code>MistralForCausalLM</code>, allowing it to be treated as a standard dense language model.</li>
252
+ </ul>
253
+ </div>
254
+ </div>
255
+ </details>
256
+ </div>
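The architecture re-mapping listed above can be pictured as promoting the nested text settings of a multimodal config to the top level and changing the declared architecture. The sketch below works on a plain dictionary. The `text_config` and `vision_config` key names, the function name `to_text_only_config`, and the truncated example values are assumptions for illustration, not a record of how this conversion was actually performed.

```python
import copy

def to_text_only_config(multimodal_config: dict) -> dict:
    """Return a text-only config built from the nested text settings.

    Assumes a layout with "text_config" and "vision_config" keys;
    the real config file may be organised differently.
    """
    text_config = copy.deepcopy(multimodal_config["text_config"])
    # Declare the standard causal-LM class instead of the multimodal one.
    text_config["architectures"] = ["MistralForCausalLM"]
    # Record the BF16 storage format named in the model description.
    text_config["torch_dtype"] = "bfloat16"
    return text_config

# Hypothetical, heavily truncated multimodal config for illustration only.
example = {
    "architectures": ["Mistral3ForConditionalGeneration"],
    "text_config": {"hidden_size": 1024, "num_hidden_layers": 4},
    "vision_config": {"hidden_size": 256},
}

text_only = to_text_only_config(example)
print(text_only["architectures"])
print("vision_config" in text_only)
```

The function copies the text settings rather than editing them in place, so the original multimodal config stays intact for comparison. A real conversion would also have to drop the vision tensors from the checkpoint and rename the remaining weights, which this sketch does not attempt.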
257
+
258
+ <div class="section-container">
259
+ <details>
260
+ <summary><h2>Purpose &amp; Usage</h2></summary>
261
+ <div class="info-card">
262
+ <div class="card-content">
263
+ <p>This model is intended to serve as a <strong>clean base for fine-tuning</strong>. With the vision components removed, more VRAM is available for sequence length or batch size. It also works for text-only chat and reasoning out of the box.</p>
264
+ </div>
265
+ </div>
266
+ </details>
267
+ </div>
268
+
269
+ <div class="section-container">
270
+ <details>
271
+ <summary><h2>Acknowledgements</h2></summary>
272
+ <div class="info-card">
273
+ <div class="card-content">
274
+ <ul>
275
+ <li>Credit to <strong>Mistral AI</strong> for the original Mistral-Medium-3.5-128B architecture.</li>
276
+ </ul>
277
+ </div>
278
+ </div>
279
+ </details>
280
+ </div>
281
+
282
+ </div>
283
+ </div>
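For planning hardware around the FP8-to-BF16 conversion and the VRAM remarks above, a rough weight-memory estimate can help. The sketch below assumes about 128 billion parameters, taken only from the model name, and counts weights alone. Activations, KV cache, gradients and optimizer state are not included. The names `BYTES_PER_PARAM` and `weight_memory_gib` are illustrative.

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"fp8": 1, "bf16": 2, "fp32": 4}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

# Assumption: roughly 128e9 parameters, inferred from the "128B" in the name.
NUM_PARAMS = 128e9

for dtype in ("fp8", "bf16"):
    print(f"{dtype}: {weight_memory_gib(NUM_PARAMS, dtype):.1f} GiB")
```

Under these assumptions, BF16 weights take twice the storage of FP8 weights, so the conversion trades memory for a format that training libraries handle more readily.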