Upload README.md with huggingface_hub
README.md CHANGED
@@ -18,7 +18,7 @@ tags:
 base_model: Qwen/Qwen3.6-35B-A3B
 pipeline_tag: text-generation
 model-index:
-- name: Qwen3.6-35B-A3B-
+- name: Qwen3.6-35B-A3B-RFT
   results:
   - task:
       type: text-generation
@@ -34,13 +34,11 @@ model-index:
       value: 0.985
 ---
 
-# Qwen3.6-35B-A3B-
+# Qwen3.6-35B-A3B-RFT
 
 A fine-tuned version of [Qwen/Qwen3.6-35B-A3B](https://huggingface.co/Qwen/Qwen3.6-35B-A3B) using **Rejection Fine-Tuning (RFT) on self-generated data**, inspired by the [Simple Self-Distillation (SSD)](https://arxiv.org/abs/2604.01193) paper. The LoRA adapter has been merged into the base weights -- this is a standard bf16 model ready for direct use or quantization.
 
-
-
-## What We Actually Did (RFT, Not Pure SSD)
+## Method (RFT, Not Pure SSD)
 
 Our method is **inspired by** the SSD paper ("Embarrassingly Simple Self-Distillation Improves Code Generation", arxiv 2604.01193) but differs in a critical way:
 
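The new heading names the method: Rejection Fine-Tuning on self-generated data. The README's enumeration of differences from SSD continues past this diff window, but the recipe the heading refers to is, at its core, rejection sampling followed by supervised fine-tuning. A minimal sketch of that loop, with hypothetical helper names (the author's actual pipeline is not part of this diff):

```python
# Sketch of Rejection Fine-Tuning (RFT) data collection, as described above.
# All names here are illustrative; the author's real pipeline is not shown.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class CodingTask:
    prompt: str
    tests: List[Callable[[str], bool]]  # executable checks for a completion

def collect_rft_data(sample: Callable[[str], str], tasks: List[CodingTask], k: int = 8):
    """Sample k completions per task from the model itself and keep only
    those that pass every test (rejection sampling)."""
    kept = []
    for task in tasks:
        for _ in range(k):
            completion = sample(task.prompt)  # temperature > 0 sampling
            if all(test(completion) for test in task.tests):
                kept.append({"prompt": task.prompt, "completion": completion})
                break  # one verified sample per task is enough for SFT
    return kept  # fine-tune a LoRA adapter on these pairs, then merge it
```

Supervised fine-tuning of a LoRA adapter on the surviving pairs, followed by merging the adapter, yields the bf16 checkpoint this card describes.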
@@ -148,12 +146,12 @@ from transformers import AutoModelForCausalLM, AutoTokenizer
 import torch
 
 model = AutoModelForCausalLM.from_pretrained(
-    "shaneMattner/Qwen3.6-35B-A3B-",
+    "shaneMattner/Qwen3.6-35B-A3B-RFT",
     torch_dtype=torch.bfloat16,
     device_map="auto",
     attn_implementation="eager",
 )
-tokenizer = AutoTokenizer.from_pretrained("shaneMattner/Qwen3.6-35B-A3B-")
+tokenizer = AutoTokenizer.from_pretrained("shaneMattner/Qwen3.6-35B-A3B-RFT")
 
 messages = [
     {"role": "user", "content": "Write a Python function to merge two sorted lists into one sorted list."}
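The transformers hunk above cuts off inside the `messages` list. For orientation, a typical continuation of that example (standard transformers chat-template generation; the README's own continuation sits outside the diff window):

```python
# Typical continuation of the transformers example above; the README's own
# continuation is outside the diff context, so this is an illustrative sketch.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```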
@@ -173,7 +171,7 @@ pip install mlx-lm
 ```python
 from mlx_lm import load, generate
 
-model, tokenizer = load("shaneMattner/Qwen3.6-35B-A3B-")
+model, tokenizer = load("shaneMattner/Qwen3.6-35B-A3B-RFT")
 response = generate(
     model,
     tokenizer,
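The mlx-lm hunk likewise cuts off inside the `generate(...)` call. A plausible completion, with the prompt text and sampling arguments as assumptions (the README's actual values are not visible here):

```python
# Plausible completion of the mlx_lm call above (arguments are assumptions;
# the README's actual values are outside the diff context).
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Write a Python function to merge two sorted lists."}],
    add_generation_prompt=True,
    tokenize=False,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```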
@@ -188,8 +186,8 @@ Or quantize first for faster inference:
 ```bash
 # Convert to 6-bit MLX format
 python -m mlx_lm.convert \
-    --hf-path shaneMattner/Qwen3.6-35B-A3B- \
-    --mlx-path Qwen3.6-35B-A3B- \
+    --hf-path shaneMattner/Qwen3.6-35B-A3B-RFT \
+    --mlx-path Qwen3.6-35B-A3B-RFT-6bit \
     -q --q-bits 6
 ```
 
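Once converted, the 6-bit model loads from the local output directory (the `--mlx-path` above) the same way as the hub checkpoint:

```python
# Load the locally converted 6-bit model (directory from --mlx-path above)
from mlx_lm import load

model, tokenizer = load("Qwen3.6-35B-A3B-RFT-6bit")
```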
@@ -201,10 +199,10 @@ Convert to GGUF for use with llama.cpp, Ollama, or other GGUF-compatible tools:
 
 ```bash
 # Clone llama.cpp and convert
-python convert_hf_to_gguf.py shaneMattner/Qwen3.6-35B-A3B-
+python convert_hf_to_gguf.py shaneMattner/Qwen3.6-35B-A3B-RFT --outtype bf16
 
 # Quantize to desired format
-./llama-quantize Qwen3.6-35B-A3B-
+./llama-quantize Qwen3.6-35B-A3B-RFT-bf16.gguf Qwen3.6-35B-A3B-RFT-Q4_K_M.gguf Q4_K_M
 ```
 
 ## Limitations
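A quick way to smoke-test the Q4_K_M file produced in the hunk above is llama.cpp's CLI:

```bash
# Smoke-test the Q4_K_M quantization with llama.cpp's CLI
# (file name taken from the quantize step above)
./llama-cli -m Qwen3.6-35B-A3B-RFT-Q4_K_M.gguf \
  -p "Write a Python function to merge two sorted lists." -n 256
```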
@@ -229,10 +227,10 @@ If you use this model, please cite:
 
 ```bibtex
 @misc{mattner2026qwen36rft,
-  title={Qwen3.6-35B-A3B-},
+  title={Qwen3.6-35B-A3B-RFT: Rejection Fine-Tuned Qwen3.6 for Coding},
   author={Shane Mattner},
   year={2026},
-  url={https://huggingface.co/shaneMattner/Qwen3.6-35B-A3B-}
+  url={https://huggingface.co/shaneMattner/Qwen3.6-35B-A3B-RFT}
 }
 ```