Qwen3-0.6B QAT Phone/Edge Deployment

This repository contains a Quantization-Aware Trained (QAT) version of the Qwen3-0.6B model, optimized for mobile deployment. By leveraging QAT during the fine-tuning process, the model maintains high accuracy and reasoning capability even when running in low-precision formats on edge devices.

Model Details

Base Model: Qwen3-0.6B (supports reasoning and non-reasoning modes)
Fine-tuning Technique: Quantization-Aware Training (QAT)
Export Format: .pte (PyTorch Edge) for mobile execution
Primary Use Case: High-performance local inference on iOS, Android, and macOS.
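Qwen3's reasoning mode is toggled through the chat template's `enable_thinking` flag, as documented on the upstream Qwen3 model card. A minimal sketch, assuming the Hugging Face `transformers` library (the prompt text is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
messages = [{"role": "user", "content": "What is 17 * 23?"}]

# Reasoning mode: the model is prompted to emit <think>...</think> traces
# before its final answer.
thinking_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)

# Non-reasoning mode: direct answers, no thinking block.
direct_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```

The same flag applies after QAT fine-tuning, since the chat template is carried over from the base model.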

Datasets

The model was fine-tuned using a balanced mixture of reasoning and general conversational data:

Open Math Reasoning: We sampled 10% of verifiable reasoning traces generated by DeepSeek R1 that achieved >95% accuracy. This preserves the "Reasoning Mode" of Qwen3.
FineTome-100k: Curated by Maxime Labonne, this dataset was converted from ShareGPT to Hugging Face’s standard multi-turn conversation format to enhance general instruction-following.
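The ShareGPT-to-standard-format step above can be sketched in plain Python. The field names (`conversations`, `from`, `value`) follow the common ShareGPT convention and are assumptions about the raw data layout:

```python
# Map ShareGPT speaker tags to Hugging Face chat roles.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_messages(example: dict) -> dict:
    """Convert one ShareGPT record to the standard "messages" format."""
    messages = [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in example["conversations"]
    ]
    return {"messages": messages}

sample = {
    "conversations": [
        {"from": "human", "value": "Hi!"},
        {"from": "gpt", "value": "Hello, how can I help?"},
    ]
}
converted = sharegpt_to_messages(sample)
# converted["messages"] now uses {"role", "content"} pairs with
# roles "user" and "assistant"
```

In practice this mapping would be applied across the whole dataset, e.g. with `datasets.Dataset.map`.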

Training Process

The model was trained using the Unsloth library to maximize memory efficiency and speed. The QAT process ensures that the transition to 4-bit or 8-bit mobile kernels results in minimal perplexity degradation. Detailed training steps, including dataset formatting and QAT configurations, can be found in this Google Colab Notebook.
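The core idea behind QAT is "fake quantization": during training, weights pass through a quantize-then-dequantize round trip so the network learns to tolerate the rounding error of low-precision kernels. A pure-Python illustration with symmetric 4-bit quantization (the actual training uses fused GPU kernels, not this loop):

```python
def fake_quantize(weights, bits=4):
    """Round-trip weights through a symmetric integer grid of 2**bits levels."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for int4
    scale = max(abs(w) for w in weights) / qmax
    quantized = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]  # dequantize back to float

w = [0.81, -0.33, 0.05, -0.92]
w_q = fake_quantize(w, bits=4)
error = max(abs(a - b) for a, b in zip(w, w_q))
# error is bounded by scale/2, the worst-case rounding error
```

Because the forward pass already sees these perturbed weights during fine-tuning, switching to real 4-bit or 8-bit kernels at deployment causes little additional perplexity loss.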

Deployment Guide

This model is specifically prepared for deployment on mobile devices and macOS via the ExecuTorch framework.

Steps to Deploy:

Export to .pte: Follow the conversion scripts in the training notebook to generate the mobile-ready binary.
On-Device Setup: Refer to the official Unsloth Deployment Documentation for instructions on:
Deploying on iOS using the ExecuTorch demo app.
Deploying on Android using the Java/C++ API.
Running locally on macOS with hardware acceleration.

Credits

Training Framework: Unsloth
Data Providers: AI-MO (Open Math Reasoning) and Maxime Labonne (FineTome-100k).

License: MIT
