# 🐀 Q-Tiny MLX – Qwen 3.5 4B Cagatay (4-bit)

A 4-bit quantized MLX model for Apple Silicon, fine-tuned for robotics reasoning and instruction following.


**8.4 GB → 2.4 GB | 4.5 bits/weight | Runs on a MacBook Air**

## What is this?

This is the merged + quantized version of cagataydev/qwen3.5-4B-cagatay (LoRA adapter) for native Apple Silicon inference via MLX.

**Pipeline:** `Qwen/Qwen3.5-4B` + LoRA adapter → merged weights → MLX 4-bit quantization (group size 64)
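The quantization step can be reproduced with `mlx-lm`'s converter. This is a sketch: `<path-to-merged-weights>` is a placeholder for the merged fp16 checkpoint, which is not published separately.

```shell
pip install mlx-lm

# Quantize the merged fp16 weights to 4-bit (group size 64) and write
# an MLX-format checkpoint. Replace the placeholder with a real path
# or Hub repo id.
mlx_lm.convert \
  --hf-path <path-to-merged-weights> \
  --mlx-path ./Qwen3.5-4B-cagatay-4bit \
  -q --q-bits 4 --q-group-size 64
```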

## 🚀 Use with Strands Agents + MLX

The recommended way to use this model is with strands-agents and strands-mlx:

```bash
pip install strands-agents strands-agents-mlx
```

```python
from strands import Agent
from strands_mlx import MLXModel

# Load the 4-bit quantized model
model = MLXModel(model_id="cagataydev/Qwen3.5-4B-cagatay-4bit")

# Create an agent
agent = Agent(model=model)

# Use it!
agent("Plan the steps to pick up a red cube and place it on the shelf")
```

### With Custom Tools

```python
from strands import Agent, tool
from strands_mlx import MLXModel

@tool
def get_robot_state() -> dict:
    """Get the current state of the robot."""
    return {"position": [0.5, 0.3, 0.1], "gripper": "open"}

model = MLXModel(
    model_id="cagataydev/Qwen3.5-4B-cagatay-4bit",
    params={"temperature": 0.7, "max_tokens": 1024},
)

agent = Agent(model=model, tools=[get_robot_state])
agent("What is the robot's current position? Then plan a pick-and-place task.")
```

### With DevDuck

```bash
pip install devduck
export MODEL_PROVIDER=mlx
export STRANDS_MODEL_ID=cagataydev/Qwen3.5-4B-cagatay-4bit
devduck
```

## 📦 Use with mlx-lm (standalone)

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")

messages = [{"role": "user", "content": "Plan how to pick up a cup from the table"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)
```

### CLI

```bash
mlx_lm generate --model cagataydev/Qwen3.5-4B-cagatay-4bit --prompt "Hello!"
```

## 📊 Model Details

| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune | cagataydev/qwen3.5-4B-cagatay (LoRA) |
| Architecture | Qwen 3.5 (32 layers, 2560 hidden, 16 heads) |
| Parameters | 4B total |
| Quantization | 4-bit (4.503 bits/weight, group size 64) |
| Model Size | 2.4 GB (down from 8.4 GB fp16) |
| Format | MLX SafeTensors |
| Platform | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |
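The size figures above can be sanity-checked with quick arithmetic. This is a sketch that assumes group-wise affine quantization storing a 16-bit scale and 16-bit bias per group of 64 weights; the table's measured 4.503 bits/weight is slightly higher because some tensors (e.g. embeddings) carry extra overhead.

```python
# Back-of-envelope check of the model card's size figures.
GROUP_SIZE = 64

# 4-bit values plus a 16-bit scale and 16-bit bias per group of 64
bits_per_weight = 4 + (16 + 16) / GROUP_SIZE  # 4.5

# The 8.4 GB fp16 checkpoint at 2 bytes/param implies ~4.2B stored params
params = 8.4e9 / 2

quantized_gb = params * bits_per_weight / 8 / 1e9
print(f"~{quantized_gb:.1f} GB at {bits_per_weight} bits/weight")
```

This lands at roughly 2.4 GB, matching the published checkpoint size.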

πŸ‹οΈ Training Provenance

The LoRA adapter was trained with:

| Parameter | Value |
|---|---|
| Method | LoRA + SFT (TRL) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Infrastructure | HuggingFace Jobs (cloud GPU) |
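For reference, the table above expressed with PEFT-style field names. The dict below is a sketch (the exact training script is not published); the scaling arithmetic is standard LoRA behavior.

```python
# LoRA hyperparameters from the table above, PEFT-style field names.
lora_config = {
    "r": 32,                 # LoRA rank
    "lora_alpha": 64,        # scaling numerator
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention projections
        "gate_proj", "up_proj", "down_proj",     # MLP projections
    ],
}

# Adapter updates are scaled by alpha / r before being added to the
# frozen base weights.
scaling = lora_config["lora_alpha"] / lora_config["r"]
print(scaling)  # 2.0
```

With alpha at twice the rank, adapter updates are amplified by a factor of 2 relative to the default alpha = r setting.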

## 🤖 Use Cases

  • Robotics task planning β€” Break down commands into step-by-step action plans
  • Embodied reasoning β€” Spatial understanding and action sequencing
  • Edge deployment β€” 2.4 GB fits comfortably on any Apple Silicon Mac
  • Strands agent backbone β€” Local model for Strands Agents on Mac
  • Neon VLA β€” Part of the Neon VLA vision-language-action stack

## 📦 Q-Model Family

| Model | Base | Size | Quantized | Use Case |
|---|---|---|---|---|
| 🌐 Q-Omni | Qwen 2.5 Omni 3B | 3B | – | Voice & multimodal |
| 🐀 Q-Tiny (this) | Qwen 3.5 4B | 4B | 2.4 GB 4-bit | Task planning on Mac |
| 🧠 Q-Brain | Qwen 3.5 35B MoE | 35B (3B active) | – | Complex reasoning |

Built with DevDuck 🦆 and Strands Agents 🧬
