# 🧠 Qwen 3.5 35B-A3B - Cagatay LoRA

A LoRA fine-tune of Qwen/Qwen3.5-35B-A3B, a 35B-parameter Mixture-of-Experts model with only 3B active parameters per token.
## 🎯 What is this?

A LoRA adapter for the Qwen 3.5 35B MoE (Mixture-of-Experts) model. The MoE architecture delivers 35B-scale reasoning while computing only 3B parameters per forward pass, making it efficient for complex robotics task planning.

Fine-tuned using SFT via TRL on HuggingFace Jobs.
## ⚡ Why MoE for Robotics?
| Property | Benefit |
|---|---|
| 35B total params | Deep reasoning capacity for complex multi-step tasks |
| 3B active params | Fast inference: only 3B params compute per token |
| Expert routing | Different experts specialize in different command types |
| Efficient LoRA | Adapter is only ~50 MB on top of the base model |
## 📊 Training Details
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-35B-A3B (MoE) |
| Architecture | Mixture-of-Experts (35B total, 3B active) |
| Method | LoRA (PEFT) + SFT (TRL) |
| Rank (r) | 32 |
| Alpha | 64 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Adapter Size | 50 MB |
| Framework | TRL 0.29.1, Transformers 5.3.0, PyTorch 2.10.0, PEFT 0.18.1 |
| Training | HuggingFace Jobs (cloud GPU) |
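The hyperparameters in the table map directly onto a PEFT `LoraConfig`. A minimal sketch, as a plain dict (the `task_type` value is an assumption, since the actual training script is not shown here):

```python
# LoRA hyperparameters from the table above.
# task_type is an assumption; the training script is not published with this card.
lora_hparams = {
    "r": 32,                  # LoRA rank
    "lora_alpha": 64,         # scaling factor (alpha / r = 2.0)
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP / expert projections
    ],
    "task_type": "CAUSAL_LM",
}

# With peft installed, the dict can be passed straight through:
# from peft import LoraConfig
# peft_config = LoraConfig(**lora_hparams)
```

Targeting all seven projection matrices (attention plus MLP) is what keeps the adapter effective across the routed experts while staying around 50 MB on disk.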
## 🚀 Quick Start

```python
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="cagataydev/qwen3.5-35B-A3B-cagatay",
    device_map="auto",
    torch_dtype="auto",
)

# Complex multi-step robotics reasoning
output = generator(
    [{"role": "user", "content": "You're a household robot. The kitchen is messy after cooking. Plan a complete cleanup sequence, considering what needs to be done first and why."}],
    max_new_tokens=512,
    return_full_text=False,
)[0]
print(output["generated_text"])
```
### With PEFT (explicit)

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the MoE base model, then attach the LoRA adapter on top
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3.5-35B-A3B",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "cagataydev/qwen3.5-35B-A3B-cagatay")
tokenizer = AutoTokenizer.from_pretrained("cagataydev/qwen3.5-35B-A3B-cagatay")
```
## 🤖 Use Cases

- Complex task planning: multi-step reasoning with dependency awareness
- Household robotics: full cleanup/cooking/organizing sequences
- Safety-aware planning: considers order of operations and risks
- Neon VLA reasoning engine: the highest-capability model in the Neon stack
## 📦 Model Family
| Model | Base | Total / Active | Best For |
|---|---|---|---|
| qwen2.5-omni-3b | Qwen 2.5 3B | 1.8B / 1.8B | Voice commands |
| qwen3.5-4B | Qwen 3.5 4B | 4B / 4B | Simple task planning |
| qwen3.5-35B-A3B | Qwen 3.5 35B MoE | 35B / 3B | Complex reasoning (this model) |
## 💡 Hardware Requirements
| Setup | Works? | Notes |
|---|---|---|
| A100 80GB | ✅ | bf16, no quantization needed |
| L40S 48GB | ✅ | Needs 8-bit quantization |
| RTX 4090 24GB | ✅ | 4-bit quantization (GPTQ/AWQ) |
| Jetson Orin 32GB | ⚠️ | Needs quantization |
| Consumer 16GB | ❌ | Too large even quantized |
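The table above can be sanity-checked with back-of-the-envelope weight-memory arithmetic. This is a rough sketch that counts only the weights, ignoring activations, KV cache, and runtime overhead (which is why a ~16 GiB 4-bit model still overflows a 16GB consumer card):

```python
def approx_weight_gib(params_billion: float, bits_per_param: float) -> float:
    """Rough GiB needed just to hold the weights (no activations/KV cache)."""
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

# 35B total parameters at common precisions:
bf16 = approx_weight_gib(35, 16)   # ~65 GiB: fits an A100 80GB, not an L40S
int8 = approx_weight_gib(35, 8)    # ~33 GiB: fits a 48GB card
int4 = approx_weight_gib(35, 4)    # ~16 GiB: why 24GB cards need 4-bit
print(f"bf16 ~{bf16:.1f} GiB, int8 ~{int8:.1f} GiB, int4 ~{int4:.1f} GiB")
```

Note that memory is driven by the 35B *total* parameters (all expert weights must be resident), while per-token compute scales with the 3B *active* parameters.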
Built with DevDuck 🦆 | Trained on HuggingFace Jobs | Part of the Neon VLA ecosystem