# Qwen2.5-7B-Instruct GGUF

Quantized GGUF versions of Qwen/Qwen2.5-7B-Instruct for local inference.
## Available Quantizations

| File | Size | Quality | Use Case |
|---|---|---|---|
| qwen2.5-7b-instruct-Q4_K_M.gguf | ~4.4GB | ⭐⭐⭐⭐ | Best balance – recommended |
| qwen2.5-7b-instruct-Q5_K_M.gguf | ~5.1GB | ⭐⭐⭐⭐⭐ | Higher quality, needs more RAM |
| qwen2.5-7b-instruct-Q8_0.gguf | ~7.7GB | ⭐⭐⭐⭐⭐ | Near-lossless, needs 10GB+ RAM |
## Usage

### Via Ollama (Easiest)

```bash
ollama run qwen2.5:7b
```
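The `qwen2.5:7b` tag pulls Ollama's own build of the model. If you would rather serve one of the GGUF files from this repo, Ollama can import a local GGUF through a Modelfile. A sketch, assuming the Q4_K_M file sits in the current directory (the model name `qwen2.5-local` is arbitrary):

```shell
# write a minimal Modelfile pointing at the downloaded GGUF
printf 'FROM ./qwen2.5-7b-instruct-Q4_K_M.gguf\n' > Modelfile

# register it with Ollama under a local name, then run it
ollama create qwen2.5-local -f Modelfile
ollama run qwen2.5-local
```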
### Via llama.cpp

```bash
./llama-cli -m qwen2.5-7b-instruct-Q4_K_M.gguf -p "Your prompt here" -n 512
```
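For interactive or multi-client use, llama.cpp also ships `llama-server`, which exposes an OpenAI-compatible HTTP endpoint. A sketch (the port is arbitrary; `-c` sets the context window):

```shell
# serve the model over HTTP with an OpenAI-compatible API
./llama-server -m qwen2.5-7b-instruct-Q4_K_M.gguf -c 8192 --port 8080

# then, from another terminal, query the chat endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello"}]}'
```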
### Via Python (llama-cpp-python)

```python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="paijo77/qwen2.5-7b-GGUF",
    filename="qwen2.5-7b-instruct-Q4_K_M.gguf",
    n_ctx=8192,
    n_gpu_layers=-1,  # offload all layers to GPU if available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantum computing simply"}]
)
print(response["choices"][0]["message"]["content"])
```
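`create_chat_completion` applies the model's chat template automatically, so you never build the prompt string yourself. For reference, Qwen2.5 uses a ChatML-style format; a simplified sketch of how messages are serialized (the real template also injects a default system prompt, omitted here):

```python
def chatml_prompt(messages):
    """Render messages in the ChatML-style format Qwen2.5 uses (simplified)."""
    # each turn is wrapped in <|im_start|>{role} ... <|im_end|> tags
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # the model generates its reply after the opening assistant tag
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(chatml_prompt([{"role": "user", "content": "Hi"}]))
```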
### Via Open WebUI

- Download the GGUF file
- In Open WebUI → Models → Add model
- Point to the local GGUF file
## Why Qwen2.5-7B?
- Multilingual: English, Chinese, 29+ languages
- Long context: 128K tokens natively
- Coding: Excellent code generation
- Math: Strong mathematical reasoning
- Instruction following: Clean, structured outputs
- Size: Runs on 6GB VRAM or 8GB RAM (CPU)
## Hardware Requirements
| Quantization | Min RAM | Min VRAM | Speed (CPU) |
|---|---|---|---|
| Q4_K_M | 6GB | 5GB | ~15 tok/s |
| Q5_K_M | 8GB | 6GB | ~12 tok/s |
| Q8_0 | 10GB | 8GB | ~8 tok/s |
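The RAM figures above roughly track file size: a GGUF file is approximately parameter count × average bits per weight ÷ 8 bits per byte. A back-of-envelope check (the ~7.6B parameter count and the bits-per-weight figure are approximations, not values taken from this repo):

```python
def gguf_size_gb(n_params, bits_per_weight):
    # file size in GB ≈ parameters × bits/weight ÷ 8 bits/byte ÷ 1e9
    return n_params * bits_per_weight / 8 / 1e9

# Q4_K_M averages roughly 4.5-4.8 bits/weight; Qwen2.5-7B has ~7.6B parameters
print(round(gguf_size_gb(7.6e9, 4.65), 1))  # → 4.4, in line with the ~4.4GB listed above
```

Loading the file needs a little headroom on top of that for the KV cache and activations, which is why the minimum-RAM column sits above the file size.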
## Support This Project

Quantization takes compute and time. If this helps you: https://www.tip.md/oyi77
## License

Apache 2.0 – based on Qwen2.5 (Apache 2.0)