Qwen2.5-7B-Instruct GGUF

Quantized GGUF versions of Qwen/Qwen2.5-7B-Instruct for local inference.

Available Quantizations

| File | Size | Quality | Use Case |
|---|---|---|---|
| qwen2.5-7b-instruct-Q4_K_M.gguf | ~4.4 GB | ⭐⭐⭐⭐ | Best balance, recommended |
| qwen2.5-7b-instruct-Q5_K_M.gguf | ~5.1 GB | ⭐⭐⭐⭐⭐ | Higher quality, needs more RAM |
| qwen2.5-7b-instruct-Q8_0.gguf | ~7.7 GB | ⭐⭐⭐⭐⭐ | Near-lossless, needs 10 GB+ RAM |

Usage

Via Ollama (Easiest)

ollama run qwen2.5:7b

Note: this command pulls Ollama's official Qwen2.5 build from the Ollama registry, not the files in this repo. To run one of these GGUF files, download it, create a Modelfile containing FROM ./qwen2.5-7b-instruct-Q4_K_M.gguf, and register it with ollama create.

Via llama.cpp

./llama-cli -m qwen2.5-7b-instruct-Q4_K_M.gguf -p "Your prompt here" -n 512

Via Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="paijo77/qwen2.5-7b-GGUF",
    filename="qwen2.5-7b-instruct-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1  # use GPU if available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantum computing simply"}]
)
print(response["choices"][0]["message"]["content"])
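For longer answers you can stream the reply token by token instead of waiting for the full completion. This is a minimal sketch using llama-cpp-python's stream=True option, which yields OpenAI-style chunks; the collect_stream helper is our own illustration, not part of the library.

```python
def collect_stream(chunks):
    """Join the text deltas from a streaming chat completion into one string."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0].get("delta", {})
        # The first chunk usually carries only the role; content chunks follow.
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

if __name__ == "__main__":
    from llama_cpp import Llama  # same model setup as the example above

    llm = Llama.from_pretrained(
        repo_id="paijo77/qwen2.5-7b-GGUF",
        filename="qwen2.5-7b-instruct-Q4_K_M.gguf",
        n_ctx=8192,
    )
    stream = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Explain quantum computing simply"}],
        stream=True,
    )
    print(collect_stream(stream))
```

In an interactive UI you would print each delta as it arrives rather than joining them at the end.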

Via Open WebUI

  1. Download the GGUF file
  2. In Open WebUI → Models → Add model
  3. Point to local GGUF file
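For step 1, a direct download link can be built from the repo id and filename. This small helper assumes the standard Hugging Face "resolve" URL scheme for files on the main branch:

```python
def gguf_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Direct download URL for a file in a Hugging Face repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"

# Feed the resulting URL to wget/curl or a browser.
print(gguf_url("paijo77/qwen2.5-7b-GGUF", "qwen2.5-7b-instruct-Q4_K_M.gguf"))
```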

Why Qwen2.5-7B?

  • Multilingual: 29+ languages, including English and Chinese
  • Long context: 128K tokens natively
  • Coding: Excellent code generation
  • Math: Strong mathematical reasoning
  • Instruction following: Clean, structured outputs
  • Size: Runs on 6GB VRAM or 8GB RAM (CPU)

Hardware Requirements

| Quantization | Min RAM | Min VRAM | Speed (CPU) |
|---|---|---|---|
| Q4_K_M | 6 GB | 5 GB | ~15 tok/s |
| Q5_K_M | 8 GB | 6 GB | ~12 tok/s |
| Q8_0 | 10 GB | 8 GB | ~8 tok/s |
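The file sizes above follow from parameter count times bits per weight. This sketch sanity-checks them; the bits-per-weight figures are approximate averages for these llama.cpp quantization schemes (an assumption), and 7.6B is a rough parameter count for Qwen2.5-7B.

```python
# Approximate average bits per weight for each quantization scheme (assumption).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.69, "Q8_0": 8.50}

def est_size_gib(params_billions: float, quant: str) -> float:
    """Estimated GGUF file size in GiB: params * bits-per-weight / 8 bytes."""
    size_bytes = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return size_bytes / 2**30

for quant in BITS_PER_WEIGHT:
    print(f"{quant}: ~{est_size_gib(7.6, quant):.1f} GiB")
```

Actual RAM use will be a bit higher than the file size, since the KV cache and activations also need memory.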

Support This Project

Quantization takes compute and time. If this helps you: 👉 https://www.tip.md/oyi77

License

Apache 2.0, inherited from Qwen2.5 (Apache 2.0)
