# 🤖 Q-Tiny MLX: Qwen 3.5 4B Cagatay (4-bit)

A 4-bit quantized MLX model for Apple Silicon, fine-tuned for robotics reasoning and instruction following.

**8.4 GB → 2.4 GB | 4.5 bits/weight | runs on a MacBook Air**
## What is this?

This is the merged and quantized version of `cagataydev/qwen3.5-4B-cagatay` (a LoRA adapter) for native Apple Silicon inference via MLX.

**Pipeline:** `Qwen/Qwen3.5-4B` + LoRA adapter → merged weights → MLX 4-bit quantization (group size 64)
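For reference, the quantization step of a pipeline like this can be reproduced with the `mlx_lm.convert` CLI. This is a sketch, not the exact command used for this repo: the input path is a placeholder for a merged fp16 checkpoint, and flag defaults can vary across `mlx-lm` versions.

```shell
# Quantize a merged fp16 checkpoint to 4-bit MLX weights.
# path/to/merged-qwen3.5-4b is a placeholder, not this repo's actual source path.
mlx_lm.convert \
  --hf-path path/to/merged-qwen3.5-4b \
  --mlx-path Qwen3.5-4B-cagatay-4bit \
  -q --q-bits 4 --q-group-size 64
```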
## 🚀 Use with Strands Agents + MLX

The recommended way to use this model is with `strands-agents` and `strands-mlx`:

```bash
pip install strands-agents strands-agents-mlx
```
```python
from strands import Agent
from strands_mlx import MLXModel

# Load the 4-bit quantized model
model = MLXModel(model_id="cagataydev/Qwen3.5-4B-cagatay-4bit")

# Create an agent
agent = Agent(model=model)

# Use it!
agent("Plan the steps to pick up a red cube and place it on the shelf")
```
### With Custom Tools

```python
from strands import Agent, tool
from strands_mlx import MLXModel

@tool
def get_robot_state() -> dict:
    """Get the current state of the robot."""
    return {"position": [0.5, 0.3, 0.1], "gripper": "open"}

model = MLXModel(
    model_id="cagataydev/Qwen3.5-4B-cagatay-4bit",
    params={"temperature": 0.7, "max_tokens": 1024},
)

agent = Agent(model=model, tools=[get_robot_state])
agent("What is the robot's current position? Then plan a pick-and-place task.")
```
### With DevDuck

```bash
pip install devduck

export MODEL_PROVIDER=mlx
export STRANDS_MODEL_ID=cagataydev/Qwen3.5-4B-cagatay-4bit
devduck
```
## 📦 Use with mlx-lm (standalone)

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hub
model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")

messages = [{"role": "user", "content": "Plan how to pick up a cup from the table"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)
```
### CLI

```bash
mlx_lm generate --model cagataydev/Qwen3.5-4B-cagatay-4bit --prompt "Hello!"
```
## 📋 Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune | cagataydev/qwen3.5-4B-cagatay (LoRA) |
| Architecture | Qwen 3.5 (32 layers, 2560 hidden, 16 heads) |
| Parameters | 4B total |
| Quantization | 4-bit (4.503 bits/weight, group size 64) |
| Model Size | 2.4 GB (down from 8.4 GB fp16) |
| Format | MLX SafeTensors |
| Platform | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |
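The bits-per-weight and size figures in the table can be sanity-checked with a back-of-the-envelope calculation. This sketch assumes MLX-style affine quantization with one fp16 scale and one fp16 bias stored per group of 64 weights (an assumption about the exact storage scheme; the small gap to the reported 4.503 bits/weight plausibly comes from layers kept at higher precision and metadata):

```python
# Rough check of the quantization numbers in the table above.
# Assumption: 4-bit weights + one fp16 scale + one fp16 bias per group of 64.
bits = 4
group_size = 64
overhead_bits = (16 + 16) / group_size        # scale + bias amortized per weight
bits_per_weight = bits + overhead_bits        # 4 + 0.5 = 4.5 bits/weight

params = 4e9
size_gb = params * bits_per_weight / 8 / 1e9  # ~2.25 GB, near the 2.4 GB shipped
print(bits_per_weight, round(size_gb, 2))
```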
## 🏋️ Training Provenance
The LoRA adapter was trained with:
| Parameter | Value |
|---|---|
| Method | LoRA + SFT (TRL) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Infrastructure | HuggingFace Jobs (cloud GPU) |
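The rank and alpha values above correspond to the standard LoRA update `y = W x + (alpha / r) * B A x`, so the effective scaling here is alpha/r = 64/32 = 2.0. A minimal NumPy sketch with toy dimensions (not the model's real projection sizes):

```python
import numpy as np

# Toy LoRA forward pass matching the table: rank r=32, alpha=64 -> scaling 2.0.
d_in, d_out, r, alpha = 128, 128, 32, 64
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (zero-init)

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapted model starts identical to the base model.
assert np.allclose(lora_forward(x), W @ x)
```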
## 🤖 Use Cases

- **Robotics task planning** – break commands down into step-by-step action plans
- **Embodied reasoning** – spatial understanding and action sequencing
- **Edge deployment** – 2.4 GB fits comfortably on any Apple Silicon Mac
- **Strands agent backbone** – local model for Strands Agents on Mac
- **Neon VLA** – part of the Neon VLA vision-language-action stack
## 📦 Q-Model Family

| Model | Base | Size | Quantized | Use Case |
|---|---|---|---|---|
| 🎙️ Q-Omni | Qwen 2.5 Omni 3B | 3B | ❌ | Voice & multimodal |
| 🤖 Q-Tiny (this) | Qwen 3.5 4B | 4B | 2.4 GB 4-bit | Task planning on Mac |
| 🧠 Q-Brain | Qwen 3.5 35B MoE | 35B (3B active) | ❌ | Complex reasoning |
*Built with DevDuck 🦆 and Strands Agents 🧬*