# Qwen3-0.6B-QAT: Phone/Edge Deployment
This repository contains a Quantization-Aware Trained (QAT) version of the Qwen3-0.6B model, optimized for mobile deployment. Because QAT simulates low-precision arithmetic during fine-tuning, the model retains high accuracy and reasoning capability even when running in low-precision formats on edge devices.

## Model Details
- **Base Model**: Qwen3-0.6B (supports reasoning and non-reasoning modes)
- **Fine-tuning Technique**: Quantization-Aware Training (QAT)
- **Export Format**: `.pte` (PyTorch Edge) for mobile execution
- **Primary Use Case**: High-performance local inference on iOS, Android, and macOS
## Datasets
The model was fine-tuned using a balanced mixture of reasoning and general conversational data:
- **Open Math Reasoning**: A 10% sample of verifiable reasoning traces generated by DeepSeek R1 that achieved >95% accuracy. This preserves Qwen3's "Reasoning Mode".
- **FineTome-100k**: Curated by Maxime Labonne; converted from the ShareGPT format to Hugging Face's standard multi-turn conversation format to enhance general instruction following.
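The ShareGPT-to-Hugging Face conversion mentioned above is essentially a renaming of role keys per turn. A minimal sketch (the `ROLE_MAP` and `sharegpt_to_hf` names are illustrative, not taken from the training notebook, which may use a library helper instead):

```python
# ShareGPT tags speakers as "human"/"gpt"; the HF chat format uses
# "user"/"assistant". (Illustrative sketch, not the notebook's code.)
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}

def sharegpt_to_hf(conversations):
    """Convert [{'from': ..., 'value': ...}] ShareGPT turns into
    [{'role': ..., 'content': ...}] Hugging Face chat turns."""
    return [
        {"role": ROLE_MAP[turn["from"]], "content": turn["value"]}
        for turn in conversations
    ]

example = [
    {"from": "human", "value": "What is 2 + 2?"},
    {"from": "gpt", "value": "4"},
]
print(sharegpt_to_hf(example))
```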
## Training Process
The model was trained using the Unsloth library to maximize memory efficiency and speed. The QAT process ensures that the transition to 4-bit or 8-bit mobile kernels results in minimal perplexity degradation. Detailed training steps, including dataset formatting and QAT configurations, can be found in this Google Colab Notebook.
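QAT works by inserting "fake quantization" into the forward pass: values are rounded to the low-precision integer grid and clamped to its range during training, so the model learns to tolerate the rounding error it will encounter on-device. A minimal scalar sketch of the quantize-dequantize step (illustrative only; in practice this is applied per-tensor or per-channel inside the training graph, with a straight-through estimator for gradients):

```python
def fake_quantize(x, scale, zero_point=0, qmin=-128, qmax=127):
    """Simulate int8 storage: quantize to the integer grid,
    clamp to the representable range, then dequantize to float."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))        # out-of-range values saturate
    return (q - zero_point) * scale

# A value on the grid survives unchanged; an out-of-range value clamps.
print(fake_quantize(0.5, scale=0.1))    # -> 0.5
print(fake_quantize(100.0, scale=0.1))  # clamps to qmax * scale, i.e. ~12.7
```

The training loss therefore "sees" the same rounded weights the mobile kernels will use, which is why the exported model shows minimal perplexity degradation.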
## Deployment Guide
This model is specifically prepared for deployment on mobile devices and macOS via the ExecuTorch framework.
### Steps to Deploy

1. **Export to `.pte`**: Follow the conversion scripts in the training notebook to generate the mobile-ready binary.
2. **On-Device Setup**: Refer to the official Unsloth Deployment Documentation for instructions on:
   - Deploying on iOS using the ExecuTorch demo app.
   - Deploying on Android using the Java/C++ API.
   - Running locally on macOS with hardware acceleration.
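To see why the low-bit export matters on phones, here is a back-of-the-envelope weights-only footprint for the roughly 0.6B parameters (my own arithmetic, not a measured binary size; it ignores activations, the KV cache, and runtime overhead, and uses decimal gigabytes):

```python
PARAMS = 0.6e9  # approximate parameter count of Qwen3-0.6B

def weight_footprint_gb(bits_per_weight):
    """Weights-only memory in decimal gigabytes."""
    return PARAMS * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit: {weight_footprint_gb(bits):.2f} GB")
# 16-bit: 1.20 GB, 8-bit: 0.60 GB, 4-bit: 0.30 GB
```

Dropping from 16-bit to 4-bit weights takes the model from roughly 1.2 GB to roughly 0.3 GB, which is what makes local inference on phones practical.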
## Credits
- **Training Framework**: Unsloth
- **Data Providers**: AI-MO (Open Math Reasoning) and Maxime Labonne (FineTome-100k)
License: MIT