# 🤖 Q-Tiny MLX: Qwen 3.5 4B Cagatay (4-bit)

A 4-bit quantized MLX model for Apple Silicon, fine-tuned for robotics reasoning and instruction following.

**8.4 GB → 2.4 GB | 4.5 bits/weight | runs on a MacBook Air**
## What is this?

This is the merged and quantized version of `cagataydev/qwen3.5-4B-cagatay` (a LoRA adapter) for native Apple Silicon inference via MLX.

**Pipeline:** `Qwen/Qwen3.5-4B` + LoRA adapter → merged weights → MLX 4-bit quantization (group size 64)
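For reference, the quantization step of a pipeline like this can be reproduced with the `mlx_lm.convert` CLI. This is a sketch, not the exact command used for this repo: the input path is a placeholder for a merged fp16 checkpoint, and flag defaults can vary across `mlx-lm` versions.

```shell
# Quantize a merged fp16 checkpoint to 4-bit MLX weights.
# path/to/merged-qwen3.5-4b is a placeholder, not this repo's actual source path.
mlx_lm.convert \
  --hf-path path/to/merged-qwen3.5-4b \
  --mlx-path Qwen3.5-4B-cagatay-4bit \
  -q --q-bits 4 --q-group-size 64
```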
## 🚀 Use with Strands Agents + MLX

The recommended way to use this model is with `strands-agents` and `strands-mlx`:

```bash
pip install strands-agents strands-agents-mlx
```
```python
from strands import Agent
from strands_mlx import MLXModel

# Load the 4-bit quantized model
model = MLXModel(model_id="cagataydev/Qwen3.5-4B-cagatay-4bit")

# Create an agent
agent = Agent(model=model)

# Use it!
agent("Plan the steps to pick up a red cube and place it on the shelf")
```
### With Custom Tools

```python
from strands import Agent, tool
from strands_mlx import MLXModel

@tool
def get_robot_state() -> dict:
    """Get the current state of the robot."""
    return {"position": [0.5, 0.3, 0.1], "gripper": "open"}

model = MLXModel(
    model_id="cagataydev/Qwen3.5-4B-cagatay-4bit",
    params={"temperature": 0.7, "max_tokens": 1024},
)

agent = Agent(model=model, tools=[get_robot_state])
agent("What is the robot's current position? Then plan a pick-and-place task.")
```
### With DevDuck

```bash
pip install devduck

export MODEL_PROVIDER=mlx
export STRANDS_MODEL_ID=cagataydev/Qwen3.5-4B-cagatay-4bit
devduck
```
## 📦 Use with mlx-lm (standalone)

```bash
pip install mlx-lm
```

```python
from mlx_lm import load, generate

# Load the quantized model and its tokenizer from the Hub
model, tokenizer = load("cagataydev/Qwen3.5-4B-cagatay-4bit")

messages = [{"role": "user", "content": "Plan how to pick up a cup from the table"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

response = generate(model, tokenizer, prompt=prompt, verbose=True, max_tokens=512)
```
### CLI

```bash
mlx_lm generate --model cagataydev/Qwen3.5-4B-cagatay-4bit --prompt "Hello!"
```
## 📋 Model Details
| Property | Value |
|---|---|
| Base Model | Qwen/Qwen3.5-4B |
| Fine-tune | cagataydev/qwen3.5-4B-cagatay (LoRA) |
| Architecture | Qwen 3.5 (32 layers, 2560 hidden, 16 heads) |
| Parameters | 4B total |
| Quantization | 4-bit (4.503 bits/weight, group size 64) |
| Model Size | 2.4 GB (down from 8.4 GB fp16) |
| Format | MLX SafeTensors |
| Platform | Apple Silicon (M1/M2/M3/M4) |
| License | Apache 2.0 |
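The bits-per-weight and size figures in the table can be sanity-checked with a back-of-the-envelope calculation. This sketch assumes MLX-style affine quantization with one fp16 scale and one fp16 bias stored per group of 64 weights (an assumption about the exact storage scheme; the small gap to the reported 4.503 bits/weight plausibly comes from layers kept at higher precision and metadata):

```python
# Rough check of the quantization numbers in the table above.
# Assumption: 4-bit weights + one fp16 scale + one fp16 bias per group of 64.
bits = 4
group_size = 64
overhead_bits = (16 + 16) / group_size        # scale + bias amortized per weight
bits_per_weight = bits + overhead_bits        # 4 + 0.5 = 4.5 bits/weight

params = 4e9
size_gb = params * bits_per_weight / 8 / 1e9  # ~2.25 GB, near the 2.4 GB shipped
print(bits_per_weight, round(size_gb, 2))
```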
## 🏋️ Training Provenance
The LoRA adapter was trained with:
| Parameter | Value |
|---|---|
| Method | LoRA + SFT (TRL) |
| LoRA Rank | 32 |
| LoRA Alpha | 64 |
| Dropout | 0.05 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Infrastructure | HuggingFace Jobs (cloud GPU) |
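The rank and alpha values above correspond to the standard LoRA update `y = W x + (alpha / r) * B A x`, so the effective scaling here is alpha/r = 64/32 = 2.0. A minimal NumPy sketch with toy dimensions (not the model's real projection sizes):

```python
import numpy as np

# Toy LoRA forward pass matching the table: rank r=32, alpha=64 -> scaling 2.0.
d_in, d_out, r, alpha = 128, 128, 32, 64
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection (zero-init)

def lora_forward(x):
    # y = W x + (alpha / r) * B (A x)
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=(d_in,))
# With B zero-initialized, the adapted model starts identical to the base model.
assert np.allclose(lora_forward(x), W @ x)
```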
## 🤖 Use Cases

- **Robotics task planning** – break commands down into step-by-step action plans
- **Embodied reasoning** – spatial understanding and action sequencing
- **Edge deployment** – 2.4 GB fits comfortably on any Apple Silicon Mac
- **Strands agent backbone** – local model for Strands Agents on Mac
- **Neon VLA** – part of the Neon VLA vision-language-action stack
## 📦 Q-Model Family

| Model | Base | Size | Quantized | Use Case |
|---|---|---|---|---|
| 🎙️ Q-Omni | Qwen 2.5 Omni 3B | 3B | ❌ | Voice & multimodal |
| 🤖 Q-Tiny (this) | Qwen 3.5 4B | 4B | 2.4 GB 4-bit | Task planning on Mac |
| 🧠 Q-Brain | Qwen 3.5 35B MoE | 35B (3B active) | ❌ | Complex reasoning |
*Built with DevDuck 🦆 and Strands Agents 🧬*