Qwen3-8B-medical GGUF

This is the GGUF quantized version of towardsinnovationlab/Qwen3-8B-medical for use with llama.cpp and compatible tools.

Model Description

This is a merged and quantized version of Qwen/Qwen3-8B fine-tuned via a LoRA adapter (towardsinnovationlab/Qwen3-8B-medical) using the Unsloth library, then converted to GGUF format for faster inference. The process was: base model (Qwen/Qwen3-8B) + LoRA adapter → merged → converted to GGUF.

The adapter was trained on the OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B dataset (see Training Data below).

Model Details

  • Developed by: Claudio Giorgio Giancaterino
  • LoRA Adapter: towardsinnovationlab/Qwen3-8B-medical
  • Base Model: unsloth/Qwen3-8B
  • Language: English
  • License: Apache 2.0
  • Format: GGUF (Q8_0 quantization)

Available Files

Filename               Quantization  Size   Description
Qwen3-8B-medical-gguf  Q8_0          <9 GB  Best quality, recommended
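
The <9 GB figure is consistent with Q8_0's storage cost of roughly 8.5 bits per weight (llama.cpp's Q8_0 stores blocks of 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights). A back-of-envelope check, assuming roughly 8.2B parameters:

```python
# Rough size estimate for a Q8_0 GGUF file (illustrative arithmetic only).
params = 8.2e9          # approximate parameter count of an 8B model
bits_per_weight = 8.5   # Q8_0: 32 int8 weights + 1 fp16 scale per block
size_gb = params * bits_per_weight / 8 / 1e9
print(f"{size_gb:.1f} GB")  # roughly 8.7 GB, matching the "<9 GB" figure
```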

Uses

Direct Use

This model can be used as support in healthcare applications, medical research, and clinical text generation.

Downstream Use

It can be integrated into educational chatbots for medical reasoning conversations.

Out-of-Scope Use

It is not suitable for high-stakes medical decision-making.

Bias, Risks, and Limitations

  • Conversational quality may degrade with complex or multi-turn inputs.
  • The model may give inaccurate answers; its output should be verified by a medical professional.
  • This is not a substitute for professional medical advice.

How to Use

With llama-cpp-python

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-8B-medical-gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # -1 for GPU, 0 for CPU only
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are the main symptoms of diabetes?"}
    ],
    max_tokens=512,
    temperature=0.2,
)

print(response['choices'][0]['message']['content'])
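
The example above uses temperature=0.2: a low temperature sharpens the token distribution, which is generally preferable for factual medical answers over creative sampling. A minimal, model-independent illustration of temperature scaling:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities at a given sampling temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
print(softmax(logits, 1.0))  # spread across tokens
print(softmax(logits, 0.2))  # mass concentrates on the top token
```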

With llama.cpp CLI

llama-cli -m Qwen3-8B-medical-gguf -p "What are the symptoms of diabetes?" -n 256

With Ollama

# Create a Modelfile
echo 'FROM ./Qwen3-8B-medical-gguf' > Modelfile
ollama create Qwen3-8B-medical -f Modelfile
ollama run Qwen3-8B-medical

Download with huggingface-hub

from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="towardsinnovationlab/Qwen3-8B-medical-gguf",
    filename="Qwen3-8B-medical-gguf"
)

Training Details

See the original model card: towardsinnovationlab/Qwen3-8B-medical

Training Data

  • OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B with 200,193 synthetic medical conversations.

Training Procedure

  • Colab with NVIDIA A100 GPU
  • LoRA fine-tuning with Unsloth
  • Loss Value: 1.4693

Conversion Details

This GGUF was created by:

  1. Merging the LoRA adapter with the base model (unsloth/Qwen3-8B)
  2. Converting to GGUF format using llama.cpp's convert_hf_to_gguf.py
  3. Quantizing to Q8_0 for optimal quality/size balance
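
Steps 2 and 3 can be sketched with llama.cpp's standard tooling (paths and output filenames here are illustrative; step 1, the LoRA merge, is typically done beforehand in Python with Unsloth or PEFT):

```shell
# Step 2: convert the merged HF checkpoint to GGUF (fp16 intermediate).
python convert_hf_to_gguf.py ./merged-model --outfile Qwen3-8B-medical-f16.gguf

# Step 3: quantize the fp16 GGUF to Q8_0.
./llama-quantize Qwen3-8B-medical-f16.gguf Qwen3-8B-medical-gguf Q8_0
```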