nani-qwen-3.5-2B (MLX 4-bit)

MLX 4-bit quantized version of nani-qwen-3.5-2B for Apple Silicon Macs.

A Qwen3.5-2B fine-tuned for crypto wallet tool calling, trained with Unsloth LoRA on 2,029 examples covering 103 blockchain tools from agentek.

Quickstart

pip install mlx-lm

# Run inference
python -m mlx_lm.generate \
  --model NaniDAO/nani-qwen-3.5-2B-mlx-4bit \
  --prompt "resolve vitalik.eth"

# Or load in Python
from mlx_lm import load, generate

model, tokenizer = load("NaniDAO/nani-qwen-3.5-2B-mlx-4bit")
response = generate(model, tokenizer, prompt="resolve vitalik.eth", max_tokens=256)
print(response)

Model Details

Base model     Qwen3.5-2B
Method         LoRA (r=16, alpha=16, dropout=0.05)
Training data  2,029 examples, 103 tools
Epochs         3 (optimal; the 4th showed no improvement)
Hardware       1x T4 GPU (Kaggle)
Quantization   4-bit affine (group_size=64)
Context        4096 tokens
Size           ~1.2GB
License        Same as Qwen3.5-2B

Eval Results

Evaluated on 50 held-out examples. Base = Qwen3.5-2B without fine-tuning.

Metric               Base     Nani     Delta
Tool call accuracy   98.0%    96.0%    -2.0%
Correct function     75.5%    81.6%    +6.1%
Correct params       67.3%    71.4%    +4.1%
Format valid         100.0%   100.0%   0.0%
Has <think> block    28.0%    100.0%   +72.0%
No-tool correct      0.0%     100.0%   +100.0%

The model learned to pick the right tool (+6.1%), use correct parameters (+4.1%), reason before acting (100% thinking), and know when NOT to call a tool (+100%). The small drop in tool call accuracy (-2%) is mostly sampling noise: 3 of the 4 "failures" succeed on rerun.
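The per-example checks behind these metrics can be sketched with simple comparisons. This is a generic illustration, not the actual eval harness; the dict shape for a parsed call is an assumption:

```python
def eval_example(pred, gold):
    """Per-example checks. pred/gold are parsed tool calls like
    {"name": ..., "params": {...}}, or None for no-tool turns.
    (Illustrative shape -- not the project's real eval code.)"""
    both = pred is not None and gold is not None
    return {
        "correct_function": both and pred["name"] == gold["name"],
        "correct_params": both and pred["params"] == gold["params"],
        "no_tool_correct": pred is None and gold is None,
    }
```

Averaging each flag over the 50 held-out examples yields the percentages in the table.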

Tool Call Format

The model uses Qwen3.5's native XML tool calling format:

<think>
The user wants to resolve an ENS name. I'll use resolveENS.
</think>

<tool_call>
<function=resolveENS>
<parameter=name>vitalik.eth</parameter>
</function>
</tool_call>

Tool results are passed back as tool role messages, then the model generates a final response.
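The round trip is just appended chat messages. A minimal sketch, where append_tool_result is a hypothetical helper (not part of mlx_lm) and the roles follow the Qwen chat template:

```python
def append_tool_result(messages, tool_name, result):
    """Feed a tool's output back to the model as a 'tool' role message."""
    messages.append({"role": "tool", "name": tool_name, "content": result})
    return messages

messages = [
    {"role": "user", "content": "resolve vitalik.eth"},
    {"role": "assistant", "content": "<tool_call>...</tool_call>"},  # model's call
]
append_tool_result(messages, "resolveENS", '{"address": "0x..."}')
# Re-run generation on this extended history to get the final response.
```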

System Prompt Format

Tools must be defined in the system prompt as newline-separated JSON inside <tools> tags. This matches the training data format exactly:

# Tools

You have access to the following functions:

<tools>
{"type":"function","function":{"name":"resolveENS","description":"Resolves an ENS name to an Ethereum address","parameters":{"type":"object","properties":{"name":{"type":"string","description":"The ENS name to resolve"}},"required":["name"]}}}
{"type":"function","function":{"name":"getBalance","description":"Get the native token (ETH) balance for an address","parameters":{"type":"object","properties":{"address":{"type":"string","description":"The wallet address (0x...)"},"chainId":{"type":"number","description":"Chain ID (1=Ethereum, 8453=Base)"}},"required":["address"]}}}
</tools>

If you choose to call a function ONLY reply in the following format with NO suffix:

<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- You may provide optional reasoning BEFORE the function call, but NOT after
- If there is no function call available, answer the question like normal
</IMPORTANT>

You are Nani, a crypto wallet assistant.

Keep tool schemas simple: use {"type": "string", "description": "..."} per property. Complex schemas with anyOf, $ref, or nested objects confuse the 2B model.
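A prompt of this shape can be assembled programmatically. build_system_prompt is a hypothetical helper, shown only to make the layout concrete; the format-reminder block from the example above would go between </tools> and the persona line:

```python
import json

def build_system_prompt(tools, persona="You are Nani, a crypto wallet assistant."):
    """Newline-separated JSON tool definitions inside <tools> tags,
    matching the training-data format. (Illustrative helper; the
    <IMPORTANT> reminder block is omitted here for brevity.)"""
    tool_lines = "\n".join(json.dumps(t, separators=(",", ":")) for t in tools)
    return (
        "# Tools\n\n"
        "You have access to the following functions:\n\n"
        f"<tools>\n{tool_lines}\n</tools>\n\n"
        f"{persona}"
    )
```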

Supported Tools

Trained on 103 tools from agentek. Top tools by training examples:

Tool                  Examples  Category
intentSwap            183       DEX trading
intentTransfer        157       Token transfers
resolveENS            71        ENS resolution
getBalance            64        Balance queries
getBalanceOf          59        ERC20 balances
resolveWNS            58        WNS resolution
getCryptoPrice        55        Price data
lookupENS             54        Reverse ENS
getQuote              53        Swap quotes
getFearAndGreedIndex  50        Market sentiment

Full coverage includes: ENS/WNS, ERC20, Uniswap V3, Aave, bridging (Across), security (ScamSniffer), Blockscout explorer, gas estimation, DefiLlama yields, NFTs, and more.

Tool Call Parsing

Regex to extract tool calls from model output:

const regex = /<tool_call>\s*<function=(\w+)>([\s\S]*?)<\/function>\s*(?:<\/tool_call>)?/g;
const paramRegex = /<parameter=(\w+)>([\s\S]*?)<\/parameter>/g;

All parameter values are strings — coerce to numbers/booleans based on the tool's schema before execution.
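The same extraction and coercion can be done in Python. parse_tool_call and the schema argument are illustrative, not part of the model's tooling:

```python
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=(\w+)>([\s\S]*?)</function>\s*(?:</tool_call>)?")
PARAM_RE = re.compile(r"<parameter=(\w+)>([\s\S]*?)</parameter>")

def parse_tool_call(text, schema=None):
    """Extract the first tool call from model output; coerce string
    params using a {param_name: "number"|"boolean"} schema map."""
    m = TOOL_CALL_RE.search(text)
    if not m:
        return None
    name, body = m.group(1), m.group(2)
    params = {k: v.strip() for k, v in PARAM_RE.findall(body)}
    for k, t in (schema or {}).items():
        if k in params and t == "number":
            params[k] = float(params[k]) if "." in params[k] else int(params[k])
        elif k in params and t == "boolean":
            params[k] = params[k].lower() == "true"
    return {"name": name, "params": params}
```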

Training Config

from unsloth import FastLanguageModel
from transformers import TrainingArguments

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tokenize with enable_thinking=True (critical)
text = tokenizer.apply_chat_template(messages, tokenize=False,
    add_generation_prompt=False, enable_thinking=True)

args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    optim="adamw_8bit",
)

enable_thinking=True is critical. With it disabled (v2), tool accuracy regressed by 64% because the tokenizer mangled the <think> blocks present in 97% of the training data.

HuggingFace Repos

Repo                                  Format       Size
NaniDAO/nani-qwen-3.5-2B              Merged bf16  ~5GB
NaniDAO/nani-qwen-3.5-2B-gguf-q4km    GGUF Q4_K_M  ~1.3GB
NaniDAO/nani-qwen-3.5-2B-mlx-4bit     MLX 4-bit    ~1.2GB

Vision Support

Qwen3.5-2B is a vision-language model (VLM), and the base weights include a vision encoder. However, this MLX model only contains the text/language model weights. The vision encoder was not included in the conversion.

Known Limitations

  • Text-only — vision encoder not included
  • 2B model size — sometimes hallucinates tool names or picks wrong parameters. Works best with 5 or fewer tools in the system prompt.
  • Simple schemas only — complex JSON schemas with anyOf, $ref, or nested objects confuse the model. Keep tool definitions flat: {"type": "string", "description": "..."} per property.
  • Training data imbalance — 78 of 103 tools have fewer than 15 examples. Performance on underrepresented tools is weaker.
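Given the 5-tool limit, one way to stay under it is a simple relevance filter over the tool definitions. filter_tools is a hypothetical sketch (keyword overlap), not the regex-based filter used by nani-local:

```python
def filter_tools(query, tools, limit=5):
    """Keep the `limit` tools whose name/description best overlap the
    user query. Hypothetical heuristic; 2B models do best with <=5 tools."""
    words = set(query.lower().split())

    def score(tool):
        fn = tool["function"]
        text = (fn["name"] + " " + fn.get("description", "")).lower()
        return sum(1 for w in words if w in text)

    return sorted(tools, key=score, reverse=True)[:limit]
```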

Conversion

This model was converted using a standalone converter that requires only numpy + safetensors — no mlx or macOS needed. You can run it on any Linux/Windows machine.

Local Testing

See nani-local for a Vite + React test app that connects to Ollama and executes real agentek tools with streaming, tool call visualization, and regex-based tool filtering (5 tools per message).
