Qwen3.5-35B-A3B Chimere Distilled -- LoRA Adapter

LoRA adapter from the Chimere distillation (Claude Opus 4.6 into Qwen3.5-35B-A3B).

This is not a standalone model; it must be loaded on top of the base model unsloth/Qwen3.5-35B-A3B.

Why is this adapter 44.7 GB?

This LoRA targets all 256 MoE experts in addition to the standard attention projections. For a typical dense model, an r=64 LoRA weighs in at 1-5 GB. But Qwen3.5-35B-A3B has 256 experts per MoE layer, each with its own gate/up/down projections, so the adapter carries LoRA weights for every expert, which multiplies the size accordingly. This is intentional: distilling Opus reasoning into an MoE architecture means adapting the experts themselves, not just the shared attention layers.
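A rough back-of-envelope calculation shows why per-expert LoRAs dominate the size. The layer count, hidden size, and per-expert intermediate size below are illustrative placeholders, not the actual Qwen3.5-35B-A3B config; only the rank, the expert count, and the three-projections-per-expert structure come from this card.

```python
# ASSUMED dimensions -- placeholders for illustration only.
r = 64            # LoRA rank (from this card)
n_layers = 48     # assumed layer count
n_experts = 256   # experts per MoE layer (from this card)
hidden = 4096     # assumed hidden size
inter = 1536      # assumed per-expert intermediate size

# A LoRA pair on a d_in x d_out linear adds r * (d_in + d_out) parameters.
per_expert = (
    r * (hidden + inter)    # gate_proj
    + r * (hidden + inter)  # up_proj
    + r * (inter + hidden)  # down_proj
)
moe_params = n_layers * n_experts * per_expert
gigabytes = moe_params * 2 / 1024**3  # 2 bytes/param in BF16
print(f"~{gigabytes:.1f} GB from the expert LoRAs alone")
```

Even with these modest placeholder dimensions, the expert LoRAs alone land in the tens of gigabytes, versus well under 1 GB for the same rank on attention projections only.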

When to use this repo

| Goal | Use this? | Alternative |
|---|---|---|
| Continue fine-tuning from the Chimere checkpoint | Yes | Or use the BF16 merged weights |
| Merge into BF16 for custom quantization | Yes | Or download the pre-merged BF16 |
| A-LoRA routing (swap adapters at runtime) | Yes | -- |
| Run inference directly | No | Use the v1 GGUF or v3 GGUF |

Usage

Load with PEFT

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "unsloth/Qwen3.5-35B-A3B"
adapter_id = "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA"

# Load base model (the tokenizer is included in the adapter repo)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Write a Python function to parse JSON."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Merge LoRA into base model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3.5-35B-A3B",
    torch_dtype="bfloat16",
    device_map="cpu",  # Merge on CPU to avoid VRAM issues
)

model = PeftModel.from_pretrained(base_model, "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA")
merged = model.merge_and_unload()
merged.save_pretrained("./chimere-merged-bf16")

# Save the tokenizer alongside the merged weights so the output dir is self-contained
tokenizer = AutoTokenizer.from_pretrained("Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA")
tokenizer.save_pretrained("./chimere-merged-bf16")
```

Continue fine-tuning with Unsloth

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA",
    max_seq_length=32768,
    dtype=None,  # auto-detect; pass torch.bfloat16 to force BF16
    load_in_4bit=True,
)

# The LoRA is already applied -- continue training with your dataset
# ...
```

LoRA Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3.5-35B-A3B |
| PEFT type | LoRA |
| Rank (r) | 64 |
| Alpha | 64 (alpha/r ratio = 1.0) |
| Dropout | 0 |
| Bias | none |
| Task | CAUSAL_LM |
| PEFT version | 0.18.1 |
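With alpha/r = 1.0, the LoRA update is applied unscaled: each adapted linear layer becomes W + (alpha/r)·B·A. A minimal NumPy sketch (toy dimensions, not the model's) of what merging computes per layer:

```python
import numpy as np

r, alpha = 64, 64        # from the table above
d_in, d_out = 128, 96    # toy dimensions for illustration

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))     # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01  # LoRA "down" matrix (trained)
B = np.zeros((d_out, r))                   # LoRA "up" matrix, zero-initialized

delta = (alpha / r) * (B @ A)  # alpha/r = 1.0, so the update is unscaled
W_merged = W + delta           # what merge_and_unload() computes per layer

# At initialization B is zero, so merging changes nothing yet
assert np.allclose(W_merged, W)
```

The rank-64 factors add only r·(d_in + d_out) parameters per layer instead of d_in·d_out, which is the usual LoRA saving; here it is repeated across every expert.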

Target modules

- Attention projections: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- MoE expert projections (all 256 experts): `gate_proj`, `up_proj`, `down_proj`
- MoE fused expert tensors: `mlp.experts.gate_up_proj`, `mlp.experts.down_proj`

This broad targeting (attention plus all 256 MoE experts) is why the adapter is 44.7 GB: it carries LoRA weights for every expert's projections, not just the shared attention layers.
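For orientation, a PEFT `adapter_config.json` expressing this targeting would look roughly like the fragment below. The field names follow PEFT's standard schema; the exact file in this repo may differ, so treat this as an illustrative sketch of the values stated above, not a copy of the shipped config.

```json
{
  "peft_type": "LORA",
  "task_type": "CAUSAL_LM",
  "r": 64,
  "lora_alpha": 64,
  "lora_dropout": 0.0,
  "bias": "none",
  "target_modules": [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "mlp.experts.gate_up_proj", "mlp.experts.down_proj"
  ]
}
```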

Files

| File | Description |
|---|---|
| `adapter_model.safetensors` | LoRA adapter weights (~44.7 GB) |
| `adapter_config.json` | LoRA configuration (rank, alpha, target modules) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer (same as base model) |
| `chat_template.jinja` | Chat template (Qwen3.5 format) |
| `processor_config.json` | Processor config |

Training Details

| Parameter | Value |
|---|---|
| Method | SFT, BF16 LoRA r=64, completion-only loss |
| Dataset | 9,763 samples (37% BFCL v3 ground truth + 59% Opus reasoning traces + 4% gold) |
| Distillation source | Claude Opus 4.6 reasoning traces |
| Epochs | 1 (611 steps, batch size 16) |
| GPU | NVIDIA B200 |
| Cost | ~$5 |
| Framework | Unsloth + PEFT 0.18.1 |
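"Completion-only loss" means prompt tokens are excluded from the training loss so the model is only penalized on what it generates. In HuggingFace-style training this is done by setting prompt positions in the labels to -100, the index that cross-entropy ignores. A minimal sketch with made-up token ids:

```python
IGNORE_INDEX = -100  # label id that HF cross-entropy loss ignores

def completion_only_labels(input_ids, prompt_len):
    """Mask prompt tokens so loss is computed only on the completion."""
    return [IGNORE_INDEX] * prompt_len + input_ids[prompt_len:]

# Toy example: 4 prompt tokens followed by a 3-token completion
ids = [101, 7, 8, 9, 55, 56, 57]
labels = completion_only_labels(ids, prompt_len=4)
print(labels)  # prompt positions masked, completion ids kept
```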

Related

Citation

```bibtex
@misc{chimere-distilled-2026,
  title={Chimere: Claude Opus 4.6 Distillation of Qwen3.5-35B-A3B MoE for Agentic Local Inference},
  author={Kevletesteur},
  year={2026},
  url={https://huggingface.co/Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA}
}
```