# Qwen3.5-35B-A3B Chimere Distilled -- LoRA Adapter

LoRA adapter from the Chimere distillation (Claude Opus 4.6 distilled into Qwen3.5-35B-A3B).

**This is not a standalone model.** It must be loaded on top of the base model `unsloth/Qwen3.5-35B-A3B`.
## Why is this adapter 44.7 GB?
This LoRA targets all 256 MoE experts in addition to the standard attention projections. In a typical dense model, an r=64 LoRA is 1-5 GB. But Qwen3.5-35B-A3B has 256 experts per MoE layer, each with gate/up/down projections -- so the adapter includes LoRA weights for every expert, resulting in the large size. This is intentional: distilling Opus reasoning into a MoE architecture requires adapting the expert routing.
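A back-of-envelope calculation shows how expert count dominates the adapter size. The model dimensions below (hidden size, per-expert intermediate size, layer count) are illustrative assumptions, not the official Qwen3.5-35B-A3B config; the point is the scaling, not the exact total:

```python
def lora_params(d_in, d_out, r):
    """LoRA adds two factors: A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)

r = 64
hidden = 4096       # assumed hidden size
moe_inter = 1024    # assumed per-expert intermediate size
layers = 48         # assumed layer count
experts = 256

# Attention projections (q, k, v, o), roughly hidden x hidden each
attn = 4 * lora_params(hidden, hidden, r) * layers

# Per expert: gate/up (hidden -> inter) and down (inter -> hidden)
per_expert = (2 * lora_params(hidden, moe_inter, r)
              + lora_params(moe_inter, hidden, r))
moe = per_expert * experts * layers

total_bf16_gb = (attn + moe) * 2 / 1e9  # 2 bytes per BF16 weight
print(f"attention-only: {attn * 2 / 1e9:.2f} GB, "
      f"with all experts: {total_bf16_gb:.2f} GB")
```

Even with these rough numbers, the expert projections contribute two orders of magnitude more parameters than the attention projections, which is why this adapter is tens of gigabytes rather than the 1-5 GB typical of a dense-model r=64 LoRA.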
## When to use this repo
| Goal | Use this? | Alternative |
|---|---|---|
| Continue fine-tuning from Chimere checkpoint | Yes | Or use BF16 merged weights |
| Merge into BF16 for custom quantization | Yes | Or download pre-merged BF16 |
| A-LoRA routing (swap adapters at runtime) | Yes | -- |
| Run inference directly | No | Use v1 GGUF or v3 GGUF |
## Usage

### Load with PEFT
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "unsloth/Qwen3.5-35B-A3B"
adapter_id = "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA"

# Load base model (the tokenizer ships with the adapter repo)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="bfloat16",
    device_map="auto",
)

# Apply the LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

messages = [{"role": "user", "content": "Write a Python function to parse JSON."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
inputs = inputs.to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=True,  # required for temperature/top_p/top_k to take effect
    temperature=0.7,
    top_p=0.8,
    top_k=20,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
### Merge LoRA into base model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

adapter_id = "Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA"

base_model = AutoModelForCausalLM.from_pretrained(
    "unsloth/Qwen3.5-35B-A3B",
    torch_dtype="bfloat16",
    device_map="cpu",  # merge on CPU to avoid VRAM issues
)
model = PeftModel.from_pretrained(base_model, adapter_id)

merged = model.merge_and_unload()
merged.save_pretrained("./chimere-merged-bf16")

# Save the tokenizer alongside the merged weights so the output
# directory is self-contained for quantization or inference
AutoTokenizer.from_pretrained(adapter_id).save_pretrained("./chimere-merged-bf16")
```
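For intuition, `merge_and_unload` folds each LoRA pair back into the base weight as `W + (alpha / r) * B @ A`. A minimal NumPy sketch of that computation, with illustrative dimensions (not the real model shapes):

```python
import numpy as np

# LoRA merge: W_eff = W + (alpha / r) * B @ A.
# With alpha = r (ratio 1.0, as in this adapter) the scale is exactly 1,
# so merging reduces to W + B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 4, 4

W = rng.standard_normal((d_out, d_in))  # frozen base weight
A = rng.standard_normal((r, d_in))      # trained down-projection
B = np.zeros((d_out, r))                # up-projection (zero at init)

scale = alpha / r                       # = 1.0 here
W_eff = W + scale * (B @ A)

# At initialization B is zero, so the adapter is a no-op on the base model
assert np.allclose(W_eff, W)
```

After training, `B` is nonzero and `W_eff` differs from `W`; the merged checkpoint simply stores `W_eff` so no adapter loading is needed downstream.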
### Continue fine-tuning with Unsloth
```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA",
    max_seq_length=32768,
    dtype=None,  # auto-detect (bfloat16 on supported GPUs)
    load_in_4bit=True,
)

# The LoRA is already applied -- continue training with your dataset
# ...
```
## LoRA Configuration
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen3.5-35B-A3B |
| PEFT type | LoRA |
| Rank (r) | 64 |
| Alpha | 64 (alpha/r ratio = 1.0) |
| Dropout | 0 |
| Bias | none |
| Task | CAUSAL_LM |
| PEFT version | 0.18.1 |
### Target modules

- Attention projections: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- MoE expert projections (all 256 experts): `gate_proj`, `up_proj`, `down_proj`
- MoE fused expert tensors: `mlp.experts.gate_up_proj`, `mlp.experts.down_proj`
This broad targeting (attention + all MoE experts) is why the adapter is 44.7 GB -- it adapts the routing behavior of every expert.
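The settings above are what `adapter_config.json` in this repo encodes. A sketch of the equivalent configuration as a plain dict (the per-expert fused tensor names are omitted here for brevity; the full list is in the shipped config file), which could be passed to `peft.LoraConfig(**lora_kwargs)` when reproducing the setup:

```python
# Mirror of the "LoRA Configuration" and "Target modules" sections above,
# expressed as keyword arguments for peft.LoraConfig.
lora_kwargs = {
    "r": 64,
    "lora_alpha": 64,      # alpha/r ratio = 1.0
    "lora_dropout": 0.0,
    "bias": "none",
    "task_type": "CAUSAL_LM",
    "target_modules": [
        # attention projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        # MoE expert projections (matched in every one of the 256 experts)
        "gate_proj", "up_proj", "down_proj",
    ],
}
```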
## Files

| File | Description |
|---|---|
| `adapter_model.safetensors` | LoRA adapter weights (~44.7 GB) |
| `adapter_config.json` | LoRA configuration (rank, alpha, target modules) |
| `tokenizer.json`, `tokenizer_config.json` | Tokenizer (same as base model) |
| `chat_template.jinja` | Chat template (Qwen3.5 format) |
| `processor_config.json` | Processor config |
## Training Details
| Parameter | Value |
|---|---|
| Method | SFT BF16 LoRA r64, completion-only loss |
| Dataset | 9,763 samples (37% BFCL v3 ground truth + 59% Opus reasoning traces + 4% gold) |
| Distillation source | Claude Opus 4.6 reasoning traces |
| Epochs | 1 (611 steps, batch 16) |
| GPU | NVIDIA B200 |
| Cost | ~$5 |
| Framework | Unsloth + PEFT 0.18.1 |
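The step count in the table follows directly from the dataset size and batch size:

```python
import math

samples, batch = 9763, 16
steps = math.ceil(samples / batch)  # one epoch over the dataset
print(steps)  # 611
```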
## Related
- Chimere v1 GGUF -- v1 RAMP quantized, ready for inference
- Chimere v3 GGUF -- v3 RAMP quantized, ready for inference
- BF16 merged weights -- Pre-merged BF16 (no adapter loading needed)
- Base model: Qwen3.5-35B-A3B (official)
- Base model: unsloth/Qwen3.5-35B-A3B (Unsloth optimized, used for training)
- GitHub: Chimere
- GitHub: Chimere ODO
## Citation

```bibtex
@misc{chimere-distilled-2026,
  title={Chimere: Claude Opus 4.6 Distillation of Qwen3.5-35B-A3B MoE for Agentic Local Inference},
  author={Kevletesteur},
  year={2026},
  url={https://huggingface.co/Kevletesteur/Qwen3.5-35B-A3B-Chimere-Distilled-LoRA}
}
```