Qwen3-8B-medical GGUF

This is the GGUF quantized version of towardsinnovationlab/Qwen3-8B-medical for use with llama.cpp and compatible tools.

Model Description

This is a merged and quantized version of Qwen/Qwen3-8B fine-tuned via a LoRA adapter (towardsinnovationlab/Qwen3-8B-medical) using the Unsloth library, then converted to GGUF format for faster inference. The process was: base model (Qwen/Qwen3-8B) + LoRA adapter → merged → converted to GGUF.

The adapter was trained on the OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B dataset (see Training Data below).

Model Details

  • Developed by: Claudio Giorgio Giancaterino
  • LoRA Adapter: towardsinnovationlab/Qwen3-8B-medical
  • Base Model: unsloth/Qwen3-8B
  • Language: English
  • License: Apache 2.0
  • Format: GGUF (Q8_0 quantization)

Available Files

Filename               Quantization  Size   Description
Qwen3-8B-medical-gguf  Q8_0          <9 GB  Best quality, recommended
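
The <9 GB figure is consistent with Q8_0's storage cost of roughly 8.5 bits per weight (llama.cpp's Q8_0 stores blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights). A back-of-envelope check, assuming roughly 8.2B parameters:

```python
# Rough size estimate for a Q8_0 GGUF file (illustrative arithmetic only).
params = 8.2e9          # approximate parameter count of an 8B model
bits_per_weight = 8.5   # Q8_0: 32 int8 weights + 1 fp16 scale per block
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # roughly 8.7 GB, matching the "<9 GB" figure
```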

Uses

Direct Use

This model can be used as support in healthcare applications, medical research, and clinical text generation.

Downstream Use

It can be integrated into educational chatbots for medical reasoning conversations.

Out-of-Scope Use

It is not suitable for high-stakes medical decision-making.

Bias, Risks, and Limitations

  • Conversational quality may degrade with complex or multi-turn inputs.
  • The model may give inaccurate answers; its output should be verified by a medical professional.
  • This is not a substitute for professional medical advice.

How to Use

With llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-8B-medical-gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # -1 for GPU, 0 for CPU only
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are the main symptoms of diabetes?"}
    ],
    max_tokens=512,
    temperature=0.2,
)

print(response['choices'][0]['message']['content'])
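
The example above uses temperature=0.2: a low temperature sharpens the token distribution, which is generally preferable for factual medical answers over creative sampling. A minimal, model-independent illustration of temperature scaling:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at a given sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
print(softmax(logits, 1.0))  # spread across tokens
print(softmax(logits, 0.2))  # mass concentrates on the top token
```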

With llama.cpp CLI

llama-cli -m Qwen3-8B-medical-gguf -p "What are the symptoms of diabetes?" -n 256

With Ollama

# Create a Modelfile
echo 'FROM ./Qwen3-8B-medical-gguf' > Modelfile
ollama create Qwen3-8B-medical -f Modelfile
ollama run Qwen3-8B-medical

Download with huggingface-hub

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="towardsinnovationlab/Qwen3-8B-medical-gguf",
    filename="Qwen3-8B-medical-gguf"
)

Training Details

See the original model card: towardsinnovationlab/Qwen3-8B-medical

Training Data

  • OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B with 200,193 synthetic medical conversations.

Training Procedure

  • Colab with NVIDIA A100 GPU
  • LoRA fine-tuning with Unsloth
  • Loss Value: 1.4693

Conversion Details

This GGUF was created by:

  1. Merging the LoRA adapter with the base model (unsloth/Qwen3-8B)
  2. Converting to GGUF format using llama.cpp's convert_hf_to_gguf.py
  3. Quantizing to Q8_0 for optimal quality/size balance
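
Steps 2 and 3 can be sketched with llama.cpp's standard tooling (paths and output filenames here are illustrative; step 1, the LoRA merge, is typically done beforehand in Python with Unsloth or PEFT):

```shell
# Step 2: convert the merged HF checkpoint to GGUF (fp16 intermediate).
python convert_hf_to_gguf.py ./merged-model --outfile Qwen3-8B-medical-f16.gguf

# Step 3: quantize the fp16 GGUF to Q8_0.
./llama-quantize Qwen3-8B-medical-f16.gguf Qwen3-8B-medical-gguf Q8_0
```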