🤖 Model Card for Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF

This repo is packed with multiple quantized versions of leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full in GGUF format. 🚀✨
Built for running efficiently on your everyday hardware - no need for enterprise-level specs to deploy these models. 💻🎯🔥

📋 Model Details

⚡ Quantization Results

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
| --- | --- | --- | --- | --- |
| Q2_K | Tiny 🐭 | Lightning ⚡ | Basic 📉 | Quick prototypes, potato hardware 🧪 |
| Q3_K_S | Mini 🐹 | Super fast 🚀 | Decent 📊 | Mobile devices, quick tests 📱 |
| Q3_K_M | Small 🐰 | Fast 💨 | Good 📈 | Lightweight but better quality |
| Q3_K_L | Small+ 🐱 | Fast ⚡ | Good 📊 | Speed with acceptable quality |
| Q4_0 | Medium 🐺 | Quick ⚡ | Solid 👍 | Daily driver, casual chats 💬 |
| Q4_1 | Medium 🦊 | Quick 🚀 | Solid+ 👌 | Slight upgrade from Q4_0 |
| Q4_K_S | Medium 🐻 | Quick 💨 | Nice ✨ | Well-balanced choice ⚖️ |
| Q4_K_M | Medium 🦁 | Quick ⚡ | Really nice 🌟 | The crowd favorite 🏅 |
| Q5_0 | Chunky 🐘 | Chill 🚶 | Great 💪 | Chatbots that actually make sense 🤖 |
| Q5_1 | Chunky 🦏 | Chill ⏳ | Great+ 🔥 | When you need quality responses 💼 |
| Q5_K_S | Big 🐳 | Chill 🕐 | Great+ ⭐ | For the quality-conscious 🎯 |
| Q5_K_M | Big 🦣 | Chill ⌛ | Excellent 🏆 | High-end performance 💎 |
| Q6_K | Massive 🐋 | Slow 🐌 | Near perfect 👑 | Enthusiasts only |
| Q8_0 | Absolute unit 🦕 | Turtle 🐢 | Basically perfect 💎 | Max settings gang 🖥️ |

📝 Real talk:

  • Lower bit widths = smaller files 📉 and faster inference ⚡, but quality takes a hit 📊 (rough size math in the sketch below)
  • Q4_K_M hits different - it's the sweet spot most people actually want 👥
  • Q6_K/Q8_0 are for perfectionists with beefy hardware 🏆🧙‍♂️
  • Everything here runs on regular consumer hardware 💻 - pick what matches your vibe! 🎯
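
Curious how those size labels translate to gigabytes? A back-of-the-napkin sketch - the bits-per-weight figures are ballpark assumptions for llama.cpp-style quants, not numbers measured from this repo:

```python
# Rough rule of thumb: file size ≈ params × bits-per-weight ÷ 8.
# The bpw values below are ballpark assumptions, not measured figures.
N_PARAMS = 2.4e9  # total parameter count, from the model name

APPROX_BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}

for quant, bpw in APPROX_BPW.items():
    size_gb = N_PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB")
```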

📝 Model Description

  • Quantized by: leeminwaan 👨‍💻
  • Funded by: Solo project, no corporate backing 💰
  • Shared by: leeminwaan 🤝
  • Model type: Decoder-only Mixture-of-Experts transformer (the good stuff) 🧠🤖
  • Language(s) (NLP): Same as the base model, leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
  • License: Apache-2.0 (free to use, modify, distribute) 📄⚖️

🔗 Model Sources

  • Repository: https://huggingface.co/leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF 🤗📦
  • Quantization Tool: AllQuants 🔢⚡
  • Paper: No research paper (this is practical, not academic) 📝❌
  • Demo: Demo coming soon™ 🎮🔜

🚀 How to Get Started with the Model

```python
# 🐍 Quick start - literally just this:
from huggingface_hub import hf_hub_download

# 📥 Grab the model (Q4_K_M is the sweet spot for most people)
model_path = hf_hub_download(
    "leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF",
    "Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-q4_k_m.gguf",
)
print("Downloaded:", model_path)  # 🎊 You're good to go!
```

Available flavors: 🎁📦

  • Q2_K, Q3_K_S, Q3_K_M, Q3_K_L 🏃‍♂️💨 (Speed demons - perfect for testing)
  • Q4_0, Q4_1, Q4_K_S, Q4_K_M ⚖️✨ (The goldilocks zone - just right)
  • Q5_0, Q5_1, Q5_K_S, Q5_K_M 💪🎯 (For when you need that extra quality)
  • Q6_K, Q8_0 🏆👑 (Maxed out settings - if your hardware can handle it)
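
Not sure which flavor files actually exist in the repo? A quick sketch with huggingface_hub's `list_repo_files` lets you enumerate them before committing to a download:

```python
from huggingface_hub import list_repo_files

repo = "leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF"

# Filter to just the GGUF files so you can pick a quant by name.
for f in sorted(list_repo_files(repo)):
    if f.endswith(".gguf"):
        print(f)
```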

🎯 Training Details

📊 Training Data

  • This is a straight quantization - no extra training or fine-tuning involved. ✨

⚙️ Training Procedure

  • Took leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full and compressed it into these GGUF formats. 🔄
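
For reference only: the generic llama.cpp route from an HF checkpoint to GGUF looks roughly like the sketch below. This repo was actually quantized with AllQuants, so treat these commands as an assumption about the typical workflow, not the method used here.

```python
import subprocess

# ⚠️ Illustrative llama.cpp workflow, NOT the exact AllQuants pipeline
# used for this repo. Assumes the base model is already downloaded to a
# local folder and that llama.cpp's tools are built and on your PATH.
LOCAL_MODEL_DIR = "Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full"

# 1) Convert the HF checkpoint to an FP16 GGUF (script ships with llama.cpp).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", LOCAL_MODEL_DIR,
     "--outfile", "model-f16.gguf"],
    check=True,
)

# 2) Re-quantize the FP16 GGUF down to a target type, e.g. Q4_K_M.
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)
```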

🔧 Technical Specifications

💾 Software

  • llama.cpp for the heavy lifting 🦙
  • Python 3.10 + huggingface_hub for the workflow 🐍

📚 Citation

BibTeX: 📖🔬

```bibtex
@misc{Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF,
  title={Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF Quantized Models},
  author={leeminwaan},
  year={2025}, % 🎊 Hot off the press!
  howpublished={\url{https://huggingface.co/leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF}}
}
```

APA: 📝✨

leeminwaan. (2025). Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF Quantized Models [Computer software]. 💻 Hugging Face. https://huggingface.co/leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF 🤗

📖 Glossary

  • Quantization: Making models smaller by reducing number precision - trades some quality for efficiency. 🔢
  • GGUF: The file format that llama.cpp loves - optimized for fast inference. ⚡
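
To make "reducing number precision" concrete, here's a toy round-trip - plain symmetric 4-bit rounding on one block of weights. Real GGUF k-quants are more sophisticated, but the precision trade-off is the same:

```python
import numpy as np

# One block of fake FP32 weights.
weights = np.random.randn(32).astype(np.float32)

# Symmetric 4-bit quantization: map the block onto integers in [-7, 7].
scale = np.abs(weights).max() / 7
q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)

# Dequantize - this approximation is what inference actually sees.
dequant = q.astype(np.float32) * scale
print("max abs error:", float(np.abs(weights - dequant).max()))
```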

ℹ️ More Information

  • This is still a work in progress - expect some rough edges. 🧪
  • More updates and proper benchmarks coming when I get around to it. 📈

👨‍💻 Model Card Authors

  • leeminwaan 🚀👨‍💻✨

📧 Model Card Contact

  • leeminwaan, via the Hugging Face repo 🤗
