# Medical Document Understanding — GGUF

GGUF-quantised versions of Sebukpor/medical-document-understanding-v2 for CPU inference via llama.cpp.

Optimised for HuggingFace free-tier CPU spaces (2 vCPU, 16 GB RAM).

## Available Quants

| File | Size | Use case |
|------|------|----------|
| model-Q4_K_M.gguf | ~2.5 GB | ✅ Recommended — best quality/size for deployment |
| model-Q8_0.gguf | ~4.5 GB | Near-lossless, slower |
| model-F16.gguf | ~8.5 GB | Reference; too large for the free tier |
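As a quick sanity check before deploying, you can compare each quant's file size against a RAM budget that leaves room for the vision encoder, KV cache, and runtime. A minimal sketch: the file sizes and the 16 GB free-tier limit come from this card, while the 50% budget fraction is an illustrative assumption, not a measured overhead.

```python
# Rough RAM fit check for a 16 GB free-tier Space.
# File sizes (GB) are taken from the table above.
QUANT_SIZES_GB = {"Q4_K_M": 2.5, "Q8_0": 4.5, "F16": 8.5}

def fits_in_ram(quant: str, ram_gb: float = 16.0, budget_fraction: float = 0.5) -> bool:
    """True if the model file fits within the usable share of RAM.

    Reserving half of RAM for the vision encoder, KV cache, and Python
    runtime is an illustrative assumption, not a measured figure.
    """
    return QUANT_SIZES_GB[quant] <= ram_gb * budget_fraction

for name in QUANT_SIZES_GB:
    print(name, fits_in_ram(name))
# Under these assumptions, Q4_K_M and Q8_0 fit; F16 does not.
```

Adjust `budget_fraction` to match the actual memory footprint you observe on your Space.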

## Quick Start (llama-cpp-python)

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Qwen2VLChatHandler
import base64

chat_handler = Qwen2VLChatHandler(
    clip_model_path="mmproj-model-f16.gguf"   # vision encoder (mmproj)
)

llm = Llama(
    model_path="model-Q4_K_M.gguf",
    chat_handler=chat_handler,
    n_ctx=2048,
    n_threads=2,          # HF free tier has 2 vCPU
    verbose=False,
)

# Encode the input image as base64 for the data URL below
with open("opd_form.jpg", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

response = llm.create_chat_completion(
    messages=[
        {
            "role": "system",
            "content": "You are an expert Medical Transcription AI. Extract all information into structured JSON."
        },
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{img_b64}"}},
                {"type": "text", "text": "Extract all information from this medical OPD form into structured JSON."}
            ]
        }
    ],
    max_tokens=1024,
    temperature=0.0,
)
print(response["choices"][0]["message"]["content"])
```
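Even with `temperature=0.0` and a JSON-only prompt, the raw reply may arrive wrapped in a markdown code fence or surrounded by prose. A small illustrative helper for recovering a Python dict from such a reply (`parse_model_json` is a hypothetical name, not part of this repo):

```python
import json
import re

def parse_model_json(text: str) -> dict:
    """Extract a JSON object from a model reply that may be wrapped in a
    markdown code fence or surrounded by prose (illustrative helper)."""
    # Prefer the contents of a ```json ... ``` fence if one is present
    fenced = re.search(r"`{3}(?:json)?\s*(.*?)`{3}", text, re.DOTALL)
    if fenced:
        text = fenced.group(1)
    # Otherwise fall back to the outermost {...} span
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return json.loads(text)

# Hypothetical reply, for illustration only
reply = 'Sure, here is the JSON: {"patient_name": "A. Kumar", "age": 42}'
print(parse_model_json(reply))  # {'patient_name': 'A. Kumar', 'age': 42}
```

If parsing fails (`json.JSONDecodeError`), retrying with a shorter prompt or a stricter system message is usually cheaper than post-hoc repair.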

## HF Space Deployment

See the `app.py` included in this repo for a ready-to-deploy Gradio app.

## Base Model

Fine-tuned from Qwen/Qwen3.5-4B on handwritten Indian medical OPD forms.
