# nani-qwen-3.5-2B (MLX 4-bit)
MLX 4-bit quantized version of nani-qwen-3.5-2B for Apple Silicon Macs.

A fine-tune of Qwen3.5-2B for crypto wallet tool calling, trained with Unsloth LoRA on 2,029 examples covering 103 blockchain tools from agentek.
## Quickstart

```sh
pip install mlx-lm

# Run inference from the command line
python -m mlx_lm.generate \
  --model NaniDAO/nani-qwen-3.5-2B-mlx-4bit \
  --prompt "resolve vitalik.eth"
```

Or load in Python:

```python
from mlx_lm import load, generate

model, tokenizer = load("NaniDAO/nani-qwen-3.5-2B-mlx-4bit")
response = generate(model, tokenizer, prompt="resolve vitalik.eth", max_tokens=256)
print(response)
```
## Model Details

| Detail | Value |
|---|---|
| Base model | Qwen3.5-2B |
| Method | LoRA (r=16, alpha=16, dropout=0.05) |
| Training data | 2,029 examples, 103 tools |
| Epochs | 3 (optimal — 4th showed no improvement) |
| Hardware | 1x T4 GPU (Kaggle) |
| Quantization | 4-bit affine (group_size=64) |
| Context | 4096 tokens |
| Size | ~1.2GB |
| License | Same as Qwen3.5-2B |
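For intuition, 4-bit affine quantization with group_size=64 stores one scale and offset per group of 64 weights and maps each weight to a 4-bit code. The numpy sketch below illustrates the general scheme, not MLX's actual kernel or storage layout:

```python
import numpy as np

def quantize_affine_4bit(w, group_size=64):
    """Quantize a 1-D weight vector to 4-bit codes, one scale/offset per group."""
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / 15.0  # 4 bits -> 16 levels (codes 0..15)
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Reconstruct approximate weights from codes plus per-group scale/offset."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.normal(size=256).astype(np.float32)
q, scale, lo = quantize_affine_4bit(w)
w_hat = dequantize(q, scale, lo).reshape(-1)

# Rounding error is at most half a quantization step per group.
assert q.max() <= 15
assert np.abs(w - w_hat).max() <= scale.max() / 2 + 1e-6
```

This is why group_size matters: smaller groups track local weight ranges more tightly (better accuracy) at the cost of storing more scales and offsets.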
## Eval Results

Evaluated on 50 held-out examples. Base = Qwen3.5-2B without fine-tuning.

| Metric | Base | Nani | Delta |
|---|---|---|---|
| Tool call accuracy | 98.0% | 96.0% | -2.0% |
| Correct function | 75.5% | 81.6% | +6.1% |
| Correct params | 67.3% | 71.4% | +4.1% |
| Format valid | 100.0% | 100.0% | — |
| Has `<think>` block | 28.0% | 100.0% | +72.0% |
| No-tool correct | 0.0% | 100.0% | +100.0% |
The model learned to pick the right tool (+6.1%), use correct parameters (+4.1%), reason before acting (100% thinking), and know when NOT to call a tool (+100%). The small drop in tool call accuracy (-2%) is mostly sampling noise — 3 of 4 "failures" succeed on rerun.
## Tool Call Format

The model uses Qwen3.5's native XML tool calling format:

```
<think>
The user wants to resolve an ENS name. I'll use resolveENS.
</think>
<tool_call>
<function=resolveENS>
<parameter=name>vitalik.eth</parameter>
</function>
</tool_call>
```
Tool results are passed back as tool role messages, then the model generates a final response.
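One full round trip can be sketched as a plain chat-messages list. This is illustrative: the exact JSON shape of the tool result payload is an assumption, not part of the training format:

```python
# Hypothetical message history for one tool-call round trip.
# The "tool" role carries the executed tool's result back to the model.
messages = [
    {"role": "system", "content": "You are Nani, a crypto wallet assistant."},
    {"role": "user", "content": "resolve vitalik.eth"},
    {
        "role": "assistant",
        "content": (
            "<tool_call>\n<function=resolveENS>\n"
            "<parameter=name>vitalik.eth</parameter>\n"
            "</function>\n</tool_call>"
        ),
    },
    # Result of actually executing resolveENS, passed back as a tool message:
    {"role": "tool", "content": '{"address": "0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045"}'},
]

# The model then generates its final natural-language answer from this
# history (e.g. after tokenizer.apply_chat_template on the full list).
assert messages[-1]["role"] == "tool"
```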
## System Prompt Format

Tools must be defined in the system prompt as newline-separated JSON inside `<tools>` tags. This matches the training data format exactly:

```
# Tools

You have access to the following functions:

<tools>
{"type":"function","function":{"name":"resolveENS","description":"Resolves an ENS name to an Ethereum address","parameters":{"type":"object","properties":{"name":{"type":"string","description":"The ENS name to resolve"}},"required":["name"]}}}
{"type":"function","function":{"name":"getBalance","description":"Get the native token (ETH) balance for an address","parameters":{"type":"object","properties":{"address":{"type":"string","description":"The wallet address (0x...)"},"chainId":{"type":"number","description":"Chain ID (1=Ethereum, 8453=Base)"}},"required":["address"]}}}
</tools>

If you choose to call a function ONLY reply in the following format with NO suffix:

<tool_call>
<function=example_function_name>
<parameter=example_parameter_1>
value_1
</parameter>
</function>
</tool_call>

<IMPORTANT>
Reminder:
- Function calls MUST follow the specified format
- Required parameters MUST be specified
- You may provide optional reasoning BEFORE the function call, but NOT after
- If there is no function call available, answer the question like normal
</IMPORTANT>

You are Nani, a crypto wallet assistant.
```
Keep tool schemas simple — use `{"type": "string", "description": "..."}` per property. Complex schemas with `anyOf`, `$ref`, or nested objects confuse the 2B model.
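Assembling this prompt programmatically can be sketched as below. The helper name is illustrative, and the format-instruction block from the training prompt is elided for brevity:

```python
import json

# Illustrative tool schema in the flat style the model was trained on.
RESOLVE_ENS = {
    "type": "function",
    "function": {
        "name": "resolveENS",
        "description": "Resolves an ENS name to an Ethereum address",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "The ENS name to resolve"}
            },
            "required": ["name"],
        },
    },
}

def build_system_prompt(tools, persona="You are Nani, a crypto wallet assistant."):
    """Serialize tools as newline-separated compact JSON inside <tools> tags."""
    tool_lines = "\n".join(json.dumps(t, separators=(",", ":")) for t in tools)
    return (
        "# Tools\n\n"
        "You have access to the following functions:\n\n"
        f"<tools>\n{tool_lines}\n</tools>\n\n"
        # ...format-instruction block from the training prompt goes here...
        + persona
    )

prompt = build_system_prompt([RESOLVE_ENS])
assert '"name":"resolveENS"' in prompt
```

Using `separators=(",", ":")` matches the compact one-tool-per-line JSON shown in the training format above.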
## Supported Tools

Trained on 103 tools from agentek. Top tools by training examples:
| Tool | Examples | Category |
|---|---|---|
| intentSwap | 183 | DEX trading |
| intentTransfer | 157 | Token transfers |
| resolveENS | 71 | ENS resolution |
| getBalance | 64 | Balance queries |
| getBalanceOf | 59 | ERC20 balances |
| resolveWNS | 58 | WNS resolution |
| getCryptoPrice | 55 | Price data |
| lookupENS | 54 | Reverse ENS |
| getQuote | 53 | Swap quotes |
| getFearAndGreedIndex | 50 | Market sentiment |
Full coverage includes: ENS/WNS, ERC20, Uniswap V3, Aave, bridging (Across), security (ScamSniffer), Blockscout explorer, gas estimation, DeFiLlama yields, NFTs, and more.
## Tool Call Parsing

Regex to extract tool calls from model output:

```js
const regex = /<tool_call>\s*<function=(\w+)>([\s\S]*?)<\/function>\s*(?:<\/tool_call>)?/g;
const paramRegex = /<parameter=(\w+)>([\s\S]*?)<\/parameter>/g;
```
All parameter values are strings — coerce to numbers/booleans based on the tool's schema before execution.
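The same extraction plus schema-based coercion can be sketched in Python; the `PARAM_TYPES` map stands in for types read from each tool's JSON schema:

```python
import re

TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=(\w+)>([\s\S]*?)</function>\s*(?:</tool_call>)?"
)
PARAM_RE = re.compile(r"<parameter=(\w+)>([\s\S]*?)</parameter>")

# Illustrative per-tool parameter types, derived from each tool's JSON schema.
PARAM_TYPES = {"getBalance": {"address": "string", "chainId": "number"}}

def coerce(value, json_type):
    """Coerce a raw string parameter to the type declared in the schema."""
    value = value.strip()
    if json_type == "number":
        return float(value) if "." in value else int(value)
    if json_type == "boolean":
        return value.lower() == "true"
    return value

def parse_tool_calls(output):
    """Extract (tool name, coerced params) pairs from raw model output."""
    calls = []
    for name, body in TOOL_CALL_RE.findall(output):
        types = PARAM_TYPES.get(name, {})
        params = {
            key: coerce(raw, types.get(key, "string"))
            for key, raw in PARAM_RE.findall(body)
        }
        calls.append({"name": name, "params": params})
    return calls

out = (
    "<tool_call>\n<function=getBalance>\n"
    "<parameter=address>0xabc</parameter>\n"
    "<parameter=chainId>8453</parameter>\n"
    "</function>\n</tool_call>"
)
assert parse_tool_calls(out) == [
    {"name": "getBalance", "params": {"address": "0xabc", "chainId": 8453}}
]
```

The trailing `(?:</tool_call>)?` mirrors the JS regex above: it tolerates output where the model stops before emitting the closing tag.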
## Training Config

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Tokenize with enable_thinking=True (critical)
text = tokenizer.apply_chat_template(messages, tokenize=False,
                                     add_generation_prompt=False,
                                     enable_thinking=True)

TrainingArguments(
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=16,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    fp16=True,
    optim="adamw_8bit",
)
```
`enable_thinking=True` is critical. With it disabled (v2), tool accuracy regressed by 64% because the tokenizer mangled the `<think>` blocks present in 97% of the training data.
## HuggingFace Repos

| Repo | Format | Size |
|---|---|---|
| NaniDAO/nani-qwen-3.5-2B | Merged bf16 | ~5GB |
| NaniDAO/nani-qwen-3.5-2B-gguf-q4km | GGUF Q4_K_M | ~1.3GB |
| NaniDAO/nani-qwen-3.5-2B-mlx-4bit | MLX 4-bit | ~1.2GB |
## Vision Support

Qwen3.5-2B is a vision-language model (VLM), and the base weights include a vision encoder. However, this MLX model contains only the text/language model weights; the vision encoder was not included in the conversion.
## Known Limitations

- Text-only — vision encoder not included
- 2B model size — sometimes hallucinates tool names or picks wrong parameters. Works best with 5 or fewer tools in the system prompt.
- Simple schemas only — complex JSON schemas with `anyOf`, `$ref`, or nested objects confuse the model. Keep tool definitions flat: `{"type": "string", "description": "..."}` per property.
- Training data imbalance — 78 of 103 tools have fewer than 15 examples. Performance on underrepresented tools is weaker.
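Given the 5-tool guidance above, one workable approach is a keyword-based filter that caps how many tool definitions are injected per message. This is only a sketch; the keyword map is illustrative and is not the filtering logic any particular client ships:

```python
# Illustrative keyword -> tool routing to keep the system prompt small.
TOOL_KEYWORDS = {
    "resolveENS": ["ens", ".eth", "resolve"],
    "getBalance": ["balance"],
    "intentSwap": ["swap", "trade"],
    "intentTransfer": ["send", "transfer"],
    "getCryptoPrice": ["price"],
}

def select_tools(user_message, max_tools=5):
    """Pick at most max_tools tools whose keywords appear in the message."""
    text = user_message.lower()
    hits = [
        name
        for name, words in TOOL_KEYWORDS.items()
        if any(w in text for w in words)
    ]
    return hits[:max_tools]

assert select_tools("resolve vitalik.eth") == ["resolveENS"]
assert select_tools("swap and send ETH") == ["intentSwap", "intentTransfer"]
```

Fewer, more relevant tools in the prompt plays to the 2B model's strengths: less context to scan and fewer plausible-but-wrong tool names to hallucinate.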
## Conversion

This model was converted using a standalone converter that requires only numpy + safetensors — no mlx or macOS needed. You can run it on any Linux/Windows machine.
## Local Testing

See nani-local for a Vite + React test app that connects to Ollama and executes real agentek tools with streaming, tool call visualization, and regex-based tool filtering (5 tools per message).