Malaysian-Qwen2.5-7B-Instruct-Q4_K_M-GGUF

This is a GGUF quantized version of mesolitica/Malaysian-Qwen2.5-7B-Instruct, converted for local inference without GPU requirements.

Model Details

Base Model: mesolitica/Malaysian-Qwen2.5-7B-Instruct
Quantization: Q4_K_M
Format: GGUF (llama.cpp compatible)
File Size: ~4-5GB
Language Focus: Malaysian/Malay language
Use Case: Local inference, experimentation, CPU-only environments

Files

Malaysian-Qwen2.5-7B-Instruct-Q4_K_M.gguf - Quantized model file

Usage with Ollama

Install Ollama: Download from ollama.ai
Create a Modelfile:

FROM ./Malaysian-Qwen2.5-7B-Instruct-Q4_K_M.gguf

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER top_k 40

Create the model:

ollama create malaysian-qwen-7b -f Modelfile

Run the model:

ollama run malaysian-qwen-7b

Usage with llama.cpp

./main -m Malaysian-Qwen2.5-7B-Instruct-Q4_K_M.gguf -p "Your prompt here" -n 512

Quantization Info

Q4_K_M offers an optimal balance between model size and quality, suitable for most use cases on consumer hardware.

Requirements

RAM: ~6-8GB minimum for smooth inference
Disk Space: ~5GB

License

This model inherits the license from the original mesolitica/Malaysian-Qwen2.5-7B-Instruct model. Please refer to the original model card for license details.

Credits

Original Model: Mesolitica
Quantization: Converted using llama.cpp

Citation

If you use this model, please cite the original model:

@misc{malaysian-qwen2.5-7b-instruct,
  author = {Mesolitica},
  title = {Malaysian-Qwen2.5-7B-Instruct},
  year = {2024},
  publisher = {HuggingFace},
  url = {https://huggingface.co/mesolitica/Malaysian-Qwen2.5-7B-Instruct}
}

Downloads last month: 8

GGUF

Model size

8B params

Architecture

qwen2

Hardware compatibility

4-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for fahmifauzi/Malaysian-Qwen2.5-7B-Instruct-Q4_K_M.gguf

Base model

mesolitica/Malaysian-Qwen2.5-7B-Instruct

Quantized

(3)

this model