# 🧠 Qwen 3.5 35B-A3B - Cagatay LoRA

A LoRA fine-tune of Qwen/Qwen3.5-35B-A3B, a 35B-parameter Mixture-of-Experts model with only 3B active parameters per token.
## 🎯 What is this?

A LoRA adapter for the Qwen 3.5 35B MoE (Mixture-of-Experts) model. The MoE architecture delivers 35B-scale reasoning while computing only 3B parameters per forward pass, making it efficient for complex robotics task planning.

Fine-tuned using SFT via TRL on HuggingFace Jobs.
## ⚡ Why MoE for Robotics?
| Property | Benefit |
|---|---|
| 35B total params | Deep reasoning capacity for complex multi-step tasks |
| 3B active params | Fast inference: only 3B params compute per token |
| Expert routing | Different experts specialize in different command types |
| Efficient LoRA | Adapter is only ~50 MB on top of the base model |
## 📊 Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B (MoE) |
| Architecture | Mixture-of-Experts (35B total, 3B active) |
| Method | LoRA (PEFT) + SFT (TRL) |
| Rank (r) | 32 |
| Alpha | 64 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Adapter Size | 50 MB |
| Framework | TRL 0.29.1, Transformers 5.3.0, PyTorch 2.10.0, PEFT 0.18.1 |
| Training | HuggingFace Jobs (cloud GPU) |
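The hyperparameters in the table map directly onto a PEFT `LoraConfig`. A minimal sketch, as a plain dict (the `task_type` value is an assumption, since the actual training script is not shown here):

```python
# LoRA hyperparameters from the table above.
# task_type is an assumption; the training script is not published with this card.
lora_hparams = {
    "r": 32,                  # LoRA rank
    "lora_alpha": 64,         # scaling factor (alpha / r = 2.0)
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP / expert projections
    ],
    "task_type": "CAUSAL_LM",
}

# With peft installed, the dict can be passed straight through:
# from peft import LoraConfig
# peft_config = LoraConfig(**lora_hparams)
```

Targeting all seven projection matrices (attention plus MLP) is what keeps the adapter effective across the routed experts while staying around 50 MB on disk.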
## 🚀 Quick Start

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="cagataydev/qwen3.5-35B-A3B-cagatay",
    device_map="auto",
    torch_dtype="auto",
)

# Complex multi-step robotics reasoning
output = generator(
    [{"role": "user", "content": "You're a household robot. The kitchen is messy after cooking. Plan a complete cleanup sequence, considering what needs to be done first and why."}],
    max_new_tokens=512,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
### With PEFT (explicit)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the MoE base model, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-35B-A3B",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "cagataydev/qwen3.5-35B-A3B-cagatay")
tokenizer = AutoTokenizer.from_pretrained("cagataydev/qwen3.5-35B-A3B-cagatay")
```
## 🤖 Use Cases

- Complex task planning: multi-step reasoning with dependency awareness
- Household robotics: full cleanup/cooking/organizing sequences
- Safety-aware planning: considers order of operations and risks
- Neon VLA reasoning engine: the highest-capability model in the Neon stack
## 📦 Model Family
| Model | Base | Total / Active | Best For |
|---|---|---|---|
| qwen2.5-omni-3b | Qwen 2.5 3B | 1.8B / 1.8B | Voice commands |
| qwen3.5-4B | Qwen 3.5 4B | 4B / 4B | Simple task planning |
| qwen3.5-35B-A3B | Qwen 3.5 35B MoE | 35B / 3B | Complex reasoning (this model) |
## 💡 Hardware Requirements
| Setup | Works? | Notes |
|---|---|---|
| A100 80GB | ✅ | bf16, no quantization needed |
| L40S 48GB | ✅ | Needs 8-bit quantization |
| RTX 4090 24GB | ✅ | 4-bit quantization (GPTQ/AWQ) |
| Jetson Orin 32GB | ⚠️ | Needs quantization |
| Consumer 16GB | ❌ | Too large even quantized |
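The table above can be sanity-checked with back-of-the-envelope weight-memory arithmetic. This is a rough sketch that counts only the weights, ignoring activations, KV cache, and runtime overhead (which is why a ~16 GiB 4-bit model still overflows a 16GB consumer card):

```python
def approx_weight_gib(params_billion: float, bits_per_param: float) -> float:
    """Rough GiB needed just to hold the weights (no activations/KV cache)."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

# 35B total parameters at common precisions:
bf16 = approx_weight_gib(35, 16)   # ~65 GiB: fits an A100 80GB, not an L40S
int8 = approx_weight_gib(35, 8)    # ~33 GiB: fits a 48GB card
int4 = approx_weight_gib(35, 4)    # ~16 GiB: why 24GB cards need 4-bit
print(f"bf16 ~{bf16:.1f} GiB, int8 ~{int8:.1f} GiB, int4 ~{int4:.1f} GiB")
```

Note that memory is driven by the 35B *total* parameters (all expert weights must be resident), while per-token compute scales with the 3B *active* parameters.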
Built with DevDuck 🦆 | Trained on HuggingFace Jobs | Part of the Neon VLA ecosystem