🤖 Model Card for Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF

This repo is packed with multiple quantized versions of leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full in GGUF format. 🚀✨
Built for running efficiently on your everyday hardware - no need for enterprise-level specs to deploy these models. 💻🎯🔥

📋 Model Details

⚡ Quantization Results

| Quantization | Size (vs. FP16) | Speed | Quality | Recommended For |
| --- | --- | --- | --- | --- |
| Q2_K | Tiny 🐭 | Lightning ⚡ | Basic 📉 | Quick prototypes, potato hardware 🧪 |
| Q3_K_S | Mini 🐹 | Super fast 🚀 | Decent 📊 | Mobile devices, quick tests 📱 |
| Q3_K_M | Small 🐰 | Fast 💨 | Good 📈 | Lightweight but better quality |
| Q3_K_L | Small+ 🐱 | Fast ⚡ | Good 📊 | Speed with acceptable quality |
| Q4_0 | Medium 🐺 | Quick ⚡ | Solid 👍 | Daily driver, casual chats 💬 |
| Q4_1 | Medium 🦊 | Quick 🚀 | Solid+ 👌 | Slight upgrade from Q4_0 |
| Q4_K_S | Medium 🐻 | Quick 💨 | Nice ✨ | Well-balanced choice ⚖️ |
| Q4_K_M | Medium 🦁 | Quick ⚡ | Really nice 🌟 | The crowd favorite 🏅 |
| Q5_0 | Chunky 🐘 | Chill 🚶 | Great 💪 | Chatbots that actually make sense 🤖 |
| Q5_1 | Chunky 🦏 | Chill ⏳ | Great+ 🔥 | When you need quality responses 💼 |
| Q5_K_S | Big 🐳 | Chill 🕐 | Great+ ⭐ | For the quality-conscious 🎯 |
| Q5_K_M | Big 🦣 | Chill ⌛ | Excellent 🏆 | High-end performance 💎 |
| Q6_K | Massive 🐋 | Slow 🐌 | Near perfect 👑 | Enthusiasts only |
| Q8_0 | Absolute unit 🦕 | Turtle 🐢 | Basically perfect 💎 | Max settings gang 🖥️ |

📝 Real talk:

  • Lower bit widths = smaller files 📉 and faster inference ⚡, but quality takes a hit 📊 (rough size math in the sketch below)
  • Q4_K_M hits different - it's the sweet spot most people actually want 👥
  • Q6_K/Q8_0 are for perfectionists with beefy hardware 🏆🧙‍♂️
  • Everything here runs on regular consumer hardware 💻 - pick what matches your vibe! 🎯
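
Curious how those size labels translate to gigabytes? A back-of-the-napkin sketch - the bits-per-weight figures are ballpark assumptions for llama.cpp-style quants, not numbers measured from this repo:

```python
# Rough rule of thumb: file size ≈ params × bits-per-weight ÷ 8.
# The bpw values below are ballpark assumptions, not measured figures.
N_PARAMS = 2.4e9  # total parameter count, from the model name

APPROX_BPW = {"Q2_K": 2.6, "Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}

for quant, bpw in APPROX_BPW.items():
    size_gb = N_PARAMS * bpw / 8 / 1e9
    print(f"{quant}: ~{size_gb:.1f} GB")
```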

📝 Model Description

  • Quantized by: leeminwaan 👨‍💻
  • Funded by: Solo project, no corporate backing 💰
  • Shared by: leeminwaan 🤝
  • Model type: Decoder-only Mixture-of-Experts transformer (the good stuff) 🧠🤖
  • Language(s) (NLP): Same as the base model, leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full
  • License: Apache-2.0 (free to use, modify, distribute) 📄⚖️

🔗 Model Sources

  • Repository: https://huggingface.co/leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF 🤗📦
  • Quantization Tool: AllQuants 🔢⚡
  • Paper: No research paper (this is practical, not academic) 📝❌
  • Demo: Demo coming soon™ 🎮🔜

🚀 How to Get Started with the Model

```python
# 🐍 Quick start - literally just this:
from huggingface_hub import hf_hub_download

# 📥 Grab the model (Q4_K_M is the sweet spot for most people)
model_path = hf_hub_download(
    "leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF",
    "Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-q4_k_m.gguf",
)
print("Downloaded:", model_path)  # 🎊 You're good to go!
```

Available flavors: 🎁📦

  • Q2_K, Q3_K_S, Q3_K_M, Q3_K_L 🏃‍♂️💨 (Speed demons - perfect for testing)
  • Q4_0, Q4_1, Q4_K_S, Q4_K_M ⚖️✨ (The goldilocks zone - just right)
  • Q5_0, Q5_1, Q5_K_S, Q5_K_M 💪🎯 (For when you need that extra quality)
  • Q6_K, Q8_0 🏆👑 (Maxed out settings - if your hardware can handle it)
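
Not sure which flavor files actually exist in the repo? A quick sketch with huggingface_hub's `list_repo_files` lets you enumerate them before committing to a download:

```python
from huggingface_hub import list_repo_files

repo = "leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF"

# Filter to just the GGUF files so you can pick a quant by name.
for f in sorted(list_repo_files(repo)):
    if f.endswith(".gguf"):
        print(f)
```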

🎯 Training Details

📊 Training Data

  • This is a straight quantization - no extra training or fine-tuning involved. ✨

⚙️ Training Procedure

  • Took leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full and compressed it into these GGUF formats. 🔄
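
For reference only: the generic llama.cpp route from an HF checkpoint to GGUF looks roughly like the sketch below. This repo was actually quantized with AllQuants, so treat these commands as an assumption about the typical workflow, not the method used here.

```python
import subprocess

# ⚠️ Illustrative llama.cpp workflow, NOT the exact AllQuants pipeline
# used for this repo. Assumes the base model is already downloaded to a
# local folder and that llama.cpp's tools are built and on your PATH.
LOCAL_MODEL_DIR = "Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full"

# 1) Convert the HF checkpoint to an FP16 GGUF (script ships with llama.cpp).
subprocess.run(
    ["python", "convert_hf_to_gguf.py", LOCAL_MODEL_DIR,
     "--outfile", "model-f16.gguf"],
    check=True,
)

# 2) Re-quantize the FP16 GGUF down to a target type, e.g. Q4_K_M.
subprocess.run(
    ["llama-quantize", "model-f16.gguf", "model-q4_k_m.gguf", "Q4_K_M"],
    check=True,
)
```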

🔧 Technical Specifications

💾 Software

  • llama.cpp for the heavy lifting 🦙
  • Python 3.10 + huggingface_hub for the workflow 🐍

📚 Citation

BibTeX: 📖🔬

```bibtex
@misc{Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF,
  title={Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF Quantized Models},
  author={leeminwaan},
  year={2025}, % 🎊 Hot off the press!
  howpublished={\url{https://huggingface.co/leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF}}
}
```

APA: 📝✨

leeminwaan. (2025). Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF Quantized Models [Computer software]. 💻 Hugging Face. https://huggingface.co/leeminwaan/Qwen3-MOE-4x0.6B-2.4B-reasoning-v1-full-GGUF 🤗

📖 Glossary

  • Quantization: Making models smaller by reducing number precision - trades some quality for efficiency. 🔢
  • GGUF: The file format that llama.cpp loves - optimized for fast inference. ⚡
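
To make "reducing number precision" concrete, here's a toy round-trip - plain symmetric 4-bit rounding on one block of weights. Real GGUF k-quants are more sophisticated, but the precision trade-off is the same:

```python
import numpy as np

# One block of fake FP32 weights.
weights = np.random.randn(32).astype(np.float32)

# Symmetric 4-bit quantization: map the block onto integers in [-7, 7].
scale = np.abs(weights).max() / 7
q = np.clip(np.round(weights / scale), -7, 7).astype(np.int8)

# Dequantize - this approximation is what inference actually sees.
dequant = q.astype(np.float32) * scale
print("max abs error:", float(np.abs(weights - dequant).max()))
```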

ℹ️ More Information

  • This is still a work in progress - expect some rough edges. 🧪
  • More updates and proper benchmarks coming when I get around to it. 📈

👨‍💻 Model Card Authors

  • leeminwaan 🚀👨‍💻✨

📧 Model Card Contact

  • leeminwaan, via the Hugging Face repo 🤗
