# Qwen3-8B-medical GGUF
This is the GGUF quantized version of towardsinnovationlab/Qwen3-8B-medical for use with llama.cpp and compatible tools.
## Model Description

This is a merged and quantized version of Qwen/Qwen3-8B, fine-tuned via a LoRA adapter (towardsinnovationlab/Qwen3-8B-medical) using the Unsloth library and converted to GGUF format for faster inference. The process was: base model (Qwen/Qwen3-8B) + LoRA adapter → merged → converted to GGUF.
The adapter was trained on:
- OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B - for enhancing medical reasoning skills
## Model Details
- Developed by: Claudio Giorgio Giancaterino
- LoRA Adapter: towardsinnovationlab/Qwen3-8B-medical
- Base Model: unsloth/Qwen3-8B
- Language: English
- License: Apache 2.0
- Format: GGUF (Q8_0 quantization)
## Available Files

| Filename | Quantization | Size | Description |
|---|---|---|---|
| Qwen3-8B-medical-gguf | Q8_0 | <9 GB | Best quality, recommended |
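The "<9 GB" figure follows from how Q8_0 stores weights. A back-of-the-envelope sketch (the ~8.2B parameter count is an assumption based on the Qwen3-8B model size, not a figure from this card):

```python
# Q8_0 stores each weight as an 8-bit integer plus one fp16 scale
# shared by every block of 32 weights: 8 + 16/32 = 8.5 bits per weight.
BITS_PER_WEIGHT = 8 + 16 / 32  # 8.5 bits
PARAMS = 8.2e9                 # assumed parameter count for Qwen3-8B

size_bytes = PARAMS * BITS_PER_WEIGHT / 8
size_gb = size_bytes / 1e9
print(f"~{size_gb:.1f} GB")    # a little under 9 GB
```

The few-percent overhead for tokenizer data and metadata in the GGUF file does not change the order of magnitude.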
## Uses

### Direct Use

This model can be used as a supporting tool in healthcare applications, medical research, and clinical text generation.

### Downstream Use

It can be integrated into educational chatbots for medical reasoning conversations.

### Out-of-Scope Use

It is not suitable for high-stakes clinical decision-making.
## Bias, Risks, and Limitations

- Conversational quality may degrade with complex or multi-turn inputs.
- The model may give inaccurate answers; always verify its output with a professional.
- This is not a substitute for professional medical advice.
## How to Use

### With llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen3-8B-medical-gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # -1 offloads all layers to GPU, 0 for CPU only
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful medical assistant."},
        {"role": "user", "content": "What are the main symptoms of diabetes?"},
    ],
    max_tokens=512,
    temperature=0.2,
)
print(response["choices"][0]["message"]["content"])
```
### With llama.cpp CLI

```bash
llama-cli -m Qwen3-8B-medical-gguf -p "What are the symptoms of diabetes?" -n 256
```
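Note that `create_chat_completion` and recent `llama-cli` builds apply the chat template stored in the GGUF metadata automatically, but when passing a raw prompt you may want to format it yourself. A minimal sketch, assuming Qwen3 uses the standard ChatML tags of the Qwen family (the `to_chatml` helper is hypothetical, not part of any library here):

```python
def to_chatml(messages):
    """Render an OpenAI-style message list into a ChatML prompt string."""
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>"
        for m in messages
    ]
    # Trailing assistant tag cues the model to generate its reply.
    parts.append("<|im_start|>assistant")
    return "\n".join(parts)

prompt = to_chatml([
    {"role": "system", "content": "You are a helpful medical assistant."},
    {"role": "user", "content": "What are the symptoms of diabetes?"},
])
print(prompt)
```

The resulting string can be passed to `llama-cli -p` in place of the bare question above.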
### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./Qwen3-8B-medical-gguf' > Modelfile

# Build and run the model
ollama create Qwen3-8B-medical -f Modelfile
ollama run Qwen3-8B-medical
```
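The one-line Modelfile above uses Ollama's defaults. A slightly fuller sketch that bakes in the sampling settings and system prompt from the llama-cpp-python example (the parameter values are illustrative, not recommendations from the model authors):

```
FROM ./Qwen3-8B-medical-gguf
PARAMETER temperature 0.2
PARAMETER num_ctx 2048
SYSTEM """You are a helpful medical assistant."""
```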
### Download with huggingface-hub

```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="towardsinnovationlab/Qwen3-8B-medical-gguf",
    filename="Qwen3-8B-medical-gguf",
)
```
## Training Details

See the original model card: towardsinnovationlab/Qwen3-8B-medical
### Training Data
- OpenMed/Medical-Reasoning-SFT-GPT-OSS-120B with 200,193 synthetic medical conversations.
### Training Procedure
- Colab with NVIDIA A100 GPU
- LoRA fine-tuning with Unsloth
- Loss Value: 1.4693
## Conversion Details

This GGUF was created by:

- Merging the LoRA adapter with the base model (unsloth/Qwen3-8B)
- Converting to GGUF format using llama.cpp's `convert_hf_to_gguf.py`
- Quantizing to Q8_0 for optimal quality/size balance