shaneMattner committed
Commit 11f9b93 · verified · 1 Parent(s): 80e0957

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +12 -14
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
  base_model: Qwen/Qwen3.6-35B-A3B
  pipeline_tag: text-generation
  model-index:
- - name: Qwen3.6-35B-A3B-SSD
+ - name: Qwen3.6-35B-A3B-RFT
  results:
  - task:
  type: text-generation
@@ -34,13 +34,11 @@ model-index:
  value: 0.985
  ---

- # Qwen3.6-35B-A3B-SSD
+ # Qwen3.6-35B-A3B-RFT

  A fine-tuned version of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) using **Rejection Fine-Tuning (RFT) on self-generated data**, inspired by the [Simple Self-Distillation (SSD)](https://arxiv.org/abs/2604.01193) paper. The LoRA adapter has been merged into the base weights -- this is a standard bf16 model ready for direct use or quantization.

- > **Note on the repo name**: The repo is named "SSD" because the project started as an SSD replication, but our method deviates from pure SSD in a key way (see below). We kept the name for continuity.
-
- ## What We Actually Did (RFT, Not Pure SSD)
+ ## Method (RFT, Not Pure SSD)

  Our method is **inspired by** the SSD paper ("Embarrassingly Simple Self-Distillation Improves Code Generation", arxiv 2604.01193) but differs in a critical way:

@@ -148,12 +146,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

  model = AutoModelForCausalLM.from_pretrained(
-     "shaneMattner/Qwen3.6-35B-A3B-SSD",
+     "shaneMattner/Qwen3.6-35B-A3B-RFT",
      torch_dtype=torch.bfloat16,
      device_map="auto",
      attn_implementation="eager",
  )
- tokenizer = AutoTokenizer.from_pretrained("shaneMattner/Qwen3.6-35B-A3B-SSD")
+ tokenizer = AutoTokenizer.from_pretrained("shaneMattner/Qwen3.6-35B-A3B-RFT")

  messages = [
      {"role": "user", "content": "Write a Python function to merge two sorted lists into one sorted list."}
@@ -173,7 +171,7 @@ pip install mlx-lm
  ```python
  from mlx_lm import load, generate

- model, tokenizer = load("shaneMattner/Qwen3.6-35B-A3B-SSD")
+ model, tokenizer = load("shaneMattner/Qwen3.6-35B-A3B-RFT")
  response = generate(
      model,
      tokenizer,
@@ -188,8 +186,8 @@ Or quantize first for faster inference:
  ```bash
  # Convert to 6-bit MLX format
  python -m mlx_lm.convert \
-     --hf-path shaneMattner/Qwen3.6-35B-A3B-SSD \
-     --mlx-path Qwen3.6-35B-A3B-SSD-6bit \
+     --hf-path shaneMattner/Qwen3.6-35B-A3B-RFT \
+     --mlx-path Qwen3.6-35B-A3B-RFT-6bit \
      -q --q-bits 6
  ```

@@ -201,10 +199,10 @@ Convert to GGUF for use with llama.cpp, Ollama, or other GGUF-compatible tools:

  ```bash
  # Clone llama.cpp and convert
- python convert_hf_to_gguf.py shaneMattner/Qwen3.6-35B-A3B-SSD --outtype bf16
+ python convert_hf_to_gguf.py shaneMattner/Qwen3.6-35B-A3B-RFT --outtype bf16

  # Quantize to desired format
- ./llama-quantize Qwen3.6-35B-A3B-SSD-bf16.gguf Qwen3.6-35B-A3B-SSD-Q4_K_M.gguf Q4_K_M
+ ./llama-quantize Qwen3.6-35B-A3B-RFT-bf16.gguf Qwen3.6-35B-A3B-RFT-Q4_K_M.gguf Q4_K_M
  ```

  ## Limitations
@@ -229,10 +227,10 @@ If you use this model, please cite:

  ```bibtex
  @misc{mattner2026qwen36rft,
-   title={Qwen3.6-35B-A3B-SSD: Rejection Fine-Tuned Qwen3.6 for Coding},
+   title={Qwen3.6-35B-A3B-RFT: Rejection Fine-Tuned Qwen3.6 for Coding},
    author={Shane Mattner},
    year={2026},
-   url={https://huggingface.co/shaneMattner/Qwen3.6-35B-A3B-SSD}
+   url={https://huggingface.co/shaneMattner/Qwen3.6-35B-A3B-RFT}
  }
  ```
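The README's "Rejection Fine-Tuning (RFT) on self-generated data" amounts to a sample, filter, fine-tune loop: sample several completions per prompt from the base model, keep only the ones that pass that prompt's tests, and run supervised fine-tuning on the survivors. Below is a minimal sketch of the data-selection step only; the sampler callable and per-prompt test check are illustrative assumptions, not the repo's actual tooling.

```python
# Sketch of the rejection step in RFT: keep only self-generated completions
# that pass their prompt's tests, and write them out as SFT chat records.
# `sample` and each item's `tests` are hypothetical stand-ins.
import json
from typing import Callable

def build_rft_dataset(
    prompts: list[dict],                      # each: {"prompt": str, "tests": Callable[[str], bool]}
    sample: Callable[[str, int], list[str]],  # base-model sampler: (prompt, k) -> k completions
    k: int = 8,
    out_path: str = "rft_sft_data.jsonl",
) -> int:
    """Return the number of accepted (prompt, completion) pairs written."""
    kept = 0
    with open(out_path, "w") as f:
        for item in prompts:
            for completion in sample(item["prompt"], k):
                if item["tests"](completion):  # rejection step: discard failing samples
                    record = {
                        "messages": [
                            {"role": "user", "content": item["prompt"]},
                            {"role": "assistant", "content": completion},
                        ]
                    }
                    f.write(json.dumps(record) + "\n")
                    kept += 1
    return kept
```

The resulting JSONL of chat-format records can then be fed to any standard SFT or LoRA trainer before merging the adapter back into the base weights, as the model card describes.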