nani-qwen-3.5-2B (MLX 4-bit)

MLX 4-bit quantized version of nani-qwen-3.5-2B for Apple Silicon Macs.

A Qwen3.5-2B fine-tuned for crypto wallet tool calling, trained with Unsloth LoRA on 2,029 examples covering 103 blockchain tools from agentek.

Quickstart

pip install mlx-lm

# Run inference
python -m mlx_lm.generate \
  --model NaniDAO/nani-qwen-3.5-2B-mlx-4bit \
  --prompt "resolve vitalik.eth"

# Or load in Python
from mlx_lm import load, generate

model, tokenizer = load("NaniDAO/nani-qwen-3.5-2B-mlx-4bit")
response = generate(model, tokenizer, prompt="resolve vitalik.eth", max_tokens=256)
print(response)

Model Details

Base model     Qwen3.5-2B
Method         LoRA (r=16, alpha=16, dropout=0.05)
Training data  2,029 examples, 103 tools
Epochs         3 (optimal; the 4th showed no improvement)
Hardware       1x T4 GPU (Kaggle)
Quantization   4-bit affine (group_size=64)
Context        4096 tokens
Size           ~1.2GB
License        Same as Qwen3.5-2B

Eval Results

Evaluated on 50 held-out examples. Base = Qwen3.5-2B without fine-tuning.

Metric               Base     Nani     Delta
Tool call accuracy   98.0%    96.0%    -2.0%
Correct function     75.5%    81.6%    +6.1%
Correct params       67.3%    71.4%    +4.1%
Format valid         100.0%   100.0%   0.0%
Has <think> block    28.0%    100.0%   +72.0%
No-tool correct      0.0%     100.0%   +100.0%

The model learned to pick the right tool (+6.1%), use correct parameters (+4.1%), reason before acting (100% thinking), and know when NOT to call a tool (+100%). The small drop in tool call accuracy (-2%) is mostly sampling noise: 3 of the 4 "failures" succeed on rerun.
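The per-example checks behind these metrics can be sketched with simple comparisons. This is a generic illustration, not the actual eval harness; the dict shape for a parsed call is an assumption:

```python
def eval_example(pred, gold):
    """Per-example checks. pred/gold are parsed tool calls like
    {"name": ..., "params": {...}}, or None for no-tool turns.
    (Illustrative shape -- not the project's real eval code.)"""
    both = pred is not None and gold is not None
    return {
        "correct_function": both and pred["name"] == gold["name"],
        "correct_params": both and pred["params"] == gold["params"],
        "no_tool_correct": pred is None and gold is None,
    }
```

Averaging each flag over the 50 held-out examples yields the percentages in the table.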

Tool Call Format

The model uses Qwen3.5's native XML tool calling format:

<think>
The user wants to resolve an ENS name. I'll use resolveENS.
</think>

<tool_call>
<function=resolveENS>
<parameter=name>vitalik.eth</parameter>
</function>
</tool_call>

Tool results are passed back as tool role messages, then the model generates a final response.
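The round trip is just appended chat messages. A minimal sketch, where append_tool_result is a hypothetical helper (not part of mlx_lm) and the roles follow the Qwen chat template:

```python
def append_tool_result(messages, tool_name, result):
    """Feed a tool's output back to the model as a 'tool' role message."""
    messages.append({"role": "tool", "name": tool_name, "content": result})
    return messages

messages = [
    {"role": "user", "content": "resolve vitalik.eth"},
    {"role": "assistant", "content": "<tool_call>...</tool_call>"},  # model's call
]
append_tool_result(messages, "resolveENS", '{"address": "0x..."}')
# Re-run generation on this extended history to get the final response.
```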

System Prompt Format

Tools must be defined in the system prompt as newline-separated JSON inside <tools> tags. This matches the training data format exactly:

# Tools

You have access to the following functions:

<tools>
{"type":"function","function":{"name":"resolveENS","description":"Resolves an ENS name to an Ethereum address","parameters":{"type":"object","properties":{"name":{"type":"string","description":"The ENS name to resolve"}},"required":["name"]}}}
{"type":"function","function":{"name":"getBalance","description":"Get the native token (ETH) balance for an address","parameters":{"type":"object","properties":{"address":{"type":"string","description":"The wallet address (0x...)"},"chainId":{"type":"number","description":"Chain ID (1=Ethereum, 8453=Base)"}},"required":["address"]}}}
</tools>

If you choose to call a function ONLY reply in the following format with NO suffix:

<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- You may provide optional reasoning BEFORE the function call, but NOT after
- If there is no function call available, answer the question like normal
</IMPORTANT>

You are Nani, a crypto wallet assistant.

Keep tool schemas simple: use {"type": "string", "description": "..."} per property. Complex schemas with anyOf, $ref, or nested objects confuse the 2B model.
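A prompt of this shape can be assembled programmatically. build_system_prompt is a hypothetical helper, shown only to make the layout concrete; the format-reminder block from the example above would go between </tools> and the persona line:

```python
import json

def build_system_prompt(tools, persona="You are Nani, a crypto wallet assistant."):
    """Newline-separated JSON tool definitions inside <tools> tags,
    matching the training-data format. (Illustrative helper; the
    <IMPORTANT> reminder block is omitted here for brevity.)"""
    tool_lines = "\n".join(json.dumps(t, separators=(",", ":")) for t in tools)
    return (
        "# Tools\n\n"
        "You have access to the following functions:\n\n"
        f"<tools>\n{tool_lines}\n</tools>\n\n"
        f"{persona}"
    )
```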

Supported Tools

Trained on 103 tools from agentek. Top tools by training examples:

Tool                  Examples  Category
intentSwap            183       DEX trading
intentTransfer        157       Token transfers
resolveENS            71        ENS resolution
getBalance            64        Balance queries
getBalanceOf          59        ERC20 balances
resolveWNS            58        WNS resolution
getCryptoPrice        55        Price data
lookupENS             54        Reverse ENS
getQuote              53        Swap quotes
getFearAndGreedIndex  50        Market sentiment

Full coverage includes: ENS/WNS, ERC20, Uniswap V3, Aave, bridging (Across), security (ScamSniffer), Blockscout explorer, gas estimation, DefiLlama yields, NFTs, and more.

Tool Call Parsing

Regex to extract tool calls from model output:

const regex = /<tool_call>\s*<function=(\w+)>([\s\S]*?)<\/function>\s*(?:<\/tool_call>)?/g;
const paramRegex = /<parameter=(\w+)>([\s\S]*?)<\/parameter>/g;

All parameter values are strings — coerce to numbers/booleans based on the tool's schema before execution.
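The same extraction and coercion can be done in Python. parse_tool_call and the schema argument are illustrative, not part of the model's tooling:

```python
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=(\w+)>([\s\S]*?)</function>\s*(?:</tool_call>)?")
PARAM_RE = re.compile(r"<parameter=(\w+)>([\s\S]*?)</parameter>")

def parse_tool_call(text, schema=None):
    """Extract the first tool call from model output; coerce string
    params using a {param_name: "number"|"boolean"} schema map."""
    m = TOOL_CALL_RE.search(text)
    if not m:
        return None
    name, body = m.group(1), m.group(2)
    params = {k: v.strip() for k, v in PARAM_RE.findall(body)}
    for k, t in (schema or {}).items():
        if k in params and t == "number":
            params[k] = float(params[k]) if "." in params[k] else int(params[k])
        elif k in params and t == "boolean":
            params[k] = params[k].lower() == "true"
    return {"name": name, "params": params}
```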

Training Config

from unsloth import FastLanguageModel
from transformers import TrainingArguments

model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tokenize with enable_thinking=True (critical)
text = tokenizer.apply_chat_template(messages, tokenize=False,
    add_generation_prompt=False, enable_thinking=True)

args = TrainingArguments(
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    optim="adamw_8bit",
)

enable_thinking=True is critical. With it disabled (v2), tool accuracy regressed by 64% because the tokenizer mangled the <think> blocks present in 97% of the training data.

HuggingFace Repos

Repo                                  Format       Size
NaniDAO/nani-qwen-3.5-2B              Merged bf16  ~5GB
NaniDAO/nani-qwen-3.5-2B-gguf-q4km    GGUF Q4_K_M  ~1.3GB
NaniDAO/nani-qwen-3.5-2B-mlx-4bit     MLX 4-bit    ~1.2GB

Vision Support

Qwen3.5-2B is a vision-language model (VLM), and the base weights include a vision encoder. However, this MLX model only contains the text/language model weights. The vision encoder was not included in the conversion.

Known Limitations

  • Text-only — vision encoder not included
  • 2B model size — sometimes hallucinates tool names or picks wrong parameters. Works best with 5 or fewer tools in the system prompt.
  • Simple schemas only — complex JSON schemas with anyOf, $ref, or nested objects confuse the model. Keep tool definitions flat: {"type": "string", "description": "..."} per property.
  • Training data imbalance — 78 of 103 tools have fewer than 15 examples. Performance on underrepresented tools is weaker.
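Given the 5-tool limit, one way to stay under it is a simple relevance filter over the tool definitions. filter_tools is a hypothetical sketch (keyword overlap), not the regex-based filter used by nani-local:

```python
def filter_tools(query, tools, limit=5):
    """Keep the `limit` tools whose name/description best overlap the
    user query. Hypothetical heuristic; 2B models do best with <=5 tools."""
    words = set(query.lower().split())

    def score(tool):
        fn = tool["function"]
        text = (fn["name"] + " " + fn.get("description", "")).lower()
        return sum(1 for w in words if w in text)

    return sorted(tools, key=score, reverse=True)[:limit]
```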

Conversion

This model was converted using a standalone converter that requires only numpy + safetensors — no mlx or macOS needed. You can run it on any Linux/Windows machine.

Local Testing

See nani-local for a Vite + React test app that connects to Ollama and executes real agentek tools with streaming, tool call visualization, and regex-based tool filtering (5 tools per message).
