Qwen3.5-10B-Frankenmerge-Opus-4.6-Distill

Category               Base (Qwen3.5-9B-Base-Q8_0)   Frankenmerge-Opus-4.6-Distill   Δ
Factual Knowledge      85.0%  (B)                    85.0%  (B)                      =
Reasoning              88.0%  (B)                    60.0%  (C)                      ↓ −28.0%
Coding                 56.0%  (D)                    80.0%  (B)                      ↑ +24.0%
Instruction Following  100.0% (A)                    30.0%  (F)                      ↓ −70.0%
Language               100.0% (A)                    70.0%  (C)                      ↓ −30.0%
Safety Calibration     66.7%  (C)                    66.7%  (C)                      =
Overall                82.4%  (B)                    65.6%  (C)                      ↓ −16.8%

Method: Layer surgery on Qwen3.5-9B-Base-Q8_0 followed by fine-tuning.
Benchmarks run at temperature=0, seed=42.
Coding improved significantly (+24%) at the cost of instruction following (−70%) and language tasks (−30%).


This model was converted to GGUF format using Unsloth.

Example usage:

  • For text-only LLMs: llama-cli -hf JackBinary/Qwen-3.5-10B-Frankenmerge-Opus-4.6-Distill-GGUF --jinja
  • For multimodal models: llama-mtmd-cli -hf JackBinary/Qwen-3.5-10B-Frankenmerge-Opus-4.6-Distill-GGUF --jinja

Available Model files:

  • Qwen-3.5-10B-Frankenmerge-Opus-4.6-Distill.Q6_K.gguf
  • Qwen-3.5-10B-Frankenmerge-Opus-4.6-Distill.Q8_0.gguf
  • Qwen-3.5-10B-Frankenmerge-Opus-4.6-Distill.Q4_K_M.gguf

A DIY frankenmerge of Qwen3.5-9B with duplicated reasoning layers, then fine-tuned on high-quality reasoning data. 36 layers instead of 32. ~10B parameters. Text-only, thinking mode supported.

What this is

I took llmfan46/Qwen3.5-9B-ultra-heretic (an abliterated Qwen3.5-9B), duplicated layers 24-27 to give it an extra reasoning block, then trained it sequentially on two datasets to make the new layers earn their keep.

The original 9B has 32 layers arranged as 8 blocks of DeltaNet × 3 + Attention × 1. After surgery, it has 36 layers: 9 complete blocks. The duplicated block starts as an exact copy but diverges during training, giving the model more depth for complex reasoning while leaving the embeddings, output head, and tokenizer untouched.
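The layer arithmetic can be sanity-checked in a few lines. The 3-DeltaNet + 1-Attention block pattern is taken from the description above; the helper name is mine:

```python
def layer_kind(idx: int) -> str:
    # Assumed repeating pattern: each 4-layer block is 3x DeltaNet + 1x Attention.
    return "Attention" if idx % 4 == 3 else "DeltaNet"

pre = [layer_kind(i) for i in range(32)]   # original stack: 8 blocks
post = [layer_kind(i) for i in range(36)]  # duplicating a full block preserves the pattern
print(pre.count("Attention"), post.count("Attention"))  # 8 9
```

Because a complete 4-layer block is duplicated (not an arbitrary slice), the attention layer stays in the same position within every block.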

After the merge, two rounds of SFT with high-rank LoRA (r=128, alpha=256):

  1. Stage 1: Jackrong/Qwen3.5-reasoning-700x (633 examples) at LR 2e-4. Reasoning distillation from Qwen3.5-27B. Gets the frankenmerge coherent and stabilizes the duplicated layers.
  2. Stage 2: nohurry/Opus-4.6-Reasoning-3000x-filtered (2326 examples after filtering) at LR 5e-5. Claude Opus 4.6 reasoning traces. Strengthens the model's actual problem-solving ability.
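A side note on why rank-stabilized LoRA matters at r=128: standard LoRA scales the adapter update by α/r, while RSLoRA uses α/√r, which keeps the update magnitude from vanishing at high rank. A quick sketch with the values used here:

```python
import math

r, alpha = 128, 256
standard_scaling = alpha / r            # classic LoRA: 256/128 = 2.0
rslora_scaling = alpha / math.sqrt(r)   # RSLoRA: 256/sqrt(128), roughly 11x larger
print(standard_scaling, round(rslora_scaling, 1))
```

This is why the same α that would be tame at low rank still produces meaningful updates at rank 128.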

Why frankenmerge + train?

David Noel Ng's RYS work showed you can top the Open LLM Leaderboard by duplicating the middle "reasoning" layers of a model without changing a single weight. The idea: early layers handle input encoding, late layers handle output decoding, and the middle layers do the actual thinking. Give the model more layers to think with, and it thinks better.

RockTalk/Qwen3.5-9B-Franken-L24-27 applied this to Qwen3.5-9B and showed improvements without any post-training. A reddit post on layer surgery explored similar ideas.

Then I saw Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled, which showed that distilling structured reasoning from Claude Opus into Qwen3.5 massively reduces the overthinking/looping problem and makes the model more coherent and autonomous.

So the logic was: frankenmerge for extra capacity, then train the new capacity on high-quality reasoning data. Layer surgery gives you the architecture; SFT teaches the duplicated layers what to do with themselves.

The surgery, specifically

Qwen3.5-9B's 32 layers follow a repeating pattern:

Block 0: layers  0- 3  (DeltaNet, DeltaNet, DeltaNet, Attention)
Block 1: layers  4- 7  (DeltaNet, DeltaNet, DeltaNet, Attention)
...
Block 6: layers 24-27  (DeltaNet, DeltaNet, DeltaNet, Attention)  ← duplicated
Block 7: layers 28-31  (DeltaNet, DeltaNet, DeltaNet, Attention)

After surgery:

Blocks 0-6: layers  0-27  (original, unchanged)
Block 6':  layers 28-31  (deep copy of layers 24-27)
Block 7:   layers 32-35  (original layers 28-31, shifted)

The copy is done with copy.deepcopy in PyTorch from clean bf16 weights. No quantization artifacts, no weight key remapping hacks.
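A minimal sketch of the splice, using a toy stack of linear layers in place of the real decoder blocks (the helper name and toy dimensions are mine; the actual surgery applies the same slicing to the model's decoder layer list):

```python
import copy
import torch
import torch.nn as nn

def duplicate_block(layers: nn.ModuleList, start: int, end: int) -> nn.ModuleList:
    """Deep-copy layers[start..end] and splice the copy in right after the original block."""
    block_copy = [copy.deepcopy(layer) for layer in layers[start : end + 1]]
    return nn.ModuleList(list(layers[: end + 1]) + block_copy + list(layers[end + 1 :]))

# Toy stand-in: 32 linear layers play the role of the 32 decoder layers.
stack = nn.ModuleList(nn.Linear(8, 8) for _ in range(32))
grown = duplicate_block(stack, 24, 27)  # duplicate block 6 (layers 24-27)
print(len(grown))  # 36
```

The copies start with identical weights but are independent modules, so they can diverge freely during the SFT stages.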

Training details

                 Stage 1                  Stage 2
Dataset          Qwen3.5-reasoning-700x   Opus-4.6-Reasoning-3000x-filtered
Examples         633                      2326
Learning rate    2e-4                     5e-5
Schedule         Cosine                   Cosine
Epochs           1                        1
Effective batch  8                        8
LoRA rank        128                      128
LoRA alpha       256                      256
RSLoRA           Yes                      Yes
Precision        bf16                     bf16

Trained on a single GPU using Unsloth. Response-only masking (instruction tokens masked with -100). Sequential training: Stage 1 completes fully before Stage 2 begins, and the LoRA adapters accumulate both stages.
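Response-only masking boils down to setting every label before the assistant response to -100, the index that PyTorch's cross-entropy loss ignores. A simplified illustration (the helper name and toy token IDs are mine):

```python
IGNORE_INDEX = -100  # tokens labeled -100 are skipped by the cross-entropy loss

def mask_prompt(input_ids: list[int], prompt_len: int) -> list[int]:
    """Copy input_ids into labels, masking everything before the response."""
    labels = list(input_ids)
    labels[:prompt_len] = [IGNORE_INDEX] * prompt_len
    return labels

# Toy sequence: 3 instruction tokens followed by 3 response tokens.
print(mask_prompt([11, 12, 13, 7, 8, 9], prompt_len=3))  # [-100, -100, -100, 7, 8, 9]
```

The effect is that gradients flow only from response tokens, so the model learns to answer rather than to reproduce prompts.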

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "YOUR_USERNAME/Qwen3.5-9B-Franken-L24-27-Reasoning",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "YOUR_USERNAME/Qwen3.5-9B-Franken-L24-27-Reasoning",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Prove that the square root of 2 is irrational."}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.7, top_p=0.8, top_k=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Acknowledgments

This model wouldn't exist without the work of:

  • David Noel Ng (dnhkng) for the RYS research proving layer duplication works, and for writing such a clear explanation of the "LLM neuroanatomy" concept
  • RockTalk for demonstrating the frankenmerge on Qwen3.5-9B specifically (even though the weights turned out to be 4-bit under the hood, the idea was sound)
  • Jackrong for both the Opus-distilled model showing how well reasoning distillation works on Qwen3.5, and for the Qwen3.5-reasoning-700x dataset
  • nohurry for the filtered Opus 4.6 reasoning dataset
  • llmfan46 for the ultra-heretic abliteration, which gave me a clean, uncensored base to build on
  • r/LocalLLaMA for the collective insanity that makes all of this happen
  • The Qwen team at Alibaba for the base Qwen3.5 architecture
  • Unsloth for making training on a single GPU actually feasible

License

Apache 2.0, same as the base Qwen3.5 model.
