# Model Card for biogpt-bioqa-8bit-openvino

## Model Description

This model, biogpt-bioqa-8bit-openvino, is an 8-bit quantized version of kirubel1738/biogpt-bioqa-lora-merged, optimized for efficient CPU inference with OpenVINO. It inherits the specialized biomedical question-answering capabilities of the original model while offering significantly faster inference and a reduced memory footprint.

The model builds on the biomedical knowledge of Microsoft's BioGPT-Large, fine-tuned with Low-Rank Adaptation (LoRA) on a comprehensive biomedical QA dataset, then compressed through 8-bit quantization and compiled for the OpenVINO runtime, making it ready for production CPU deployment.
- Developed by: kirubel1738
- Shared by: kirubel1738
- Model type: Causal Language Model (Text Generation / Question Answering)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: microsoft/BioGPT-Large
- Quantized from model: kirubel1738/biogpt-bioqa-lora-merged
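As a rough illustration, an export of this kind can be produced with the `optimum-cli` tool from optimum-intel; the exact command below is an assumption about the workflow, not the recorded pipeline for this repository:

```shell
# Hypothetical export: convert the merged LoRA model to OpenVINO IR
# with 8-bit weight compression (output directory name is illustrative).
optimum-cli export openvino \
  --model kirubel1738/biogpt-bioqa-lora-merged \
  --weight-format int8 \
  biogpt-bioqa-8bit-openvino
```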
## Model Sources
- Repository: Hugging Face Model Repository
- Base Model: microsoft/BioGPT-Large
- Quantized From: kirubel1738/biogpt-bioqa-lora-merged
## Uses

### Direct Use
This model is optimized for direct use in biomedical question-answering applications where CPU inference is preferred or required. It's ideal for:
- Biomedical research assistance and literature review
- Educational tools for biology and medical students
- Clinical decision support systems (with appropriate validation)
- Bioinformatics pipelines requiring efficient text generation
- Resource-constrained environments without GPU access
### Downstream Use
The model can be integrated into:
- Healthcare chatbots and virtual assistants
- Scientific literature summarization systems
- Drug discovery and pharmacological research tools
- Academic research platforms
- Biomedical tutoring systems
### Out-of-Scope Use
- Real-time clinical diagnosis without human oversight
- Generating medical advice for individual patients
- High-stakes decision making without verification
- Non-biomedical domains
- Tasks requiring extremely low latency (<100ms) on very old CPUs
## Bias, Risks, and Limitations

**Limitations:**
- 8-bit quantization may cause minor accuracy degradation compared to the original FP16 model
- Model may generate verbose or overly technical responses for simple questions
- Limited to biomedical domain knowledge; performance on general topics is reduced
**Risks:**
- Potential for generating plausible but incorrect biomedical information
- May reflect biases present in the training data
- Should not be used as a sole source for medical decisions
### Recommendations
Users should:
- Verify critical biomedical information from authoritative sources
- Use appropriate prompts and temperature settings for desired response style
- Consider the quantized nature when evaluating response quality
- Test performance on specific use cases before deployment
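As a starting point for the prompt and temperature tuning mentioned above, here is a hedged sketch of sampling settings one might pass to `model.generate`; the values are illustrative defaults, not tuned recommendations:

```python
# Illustrative generation settings; tune per use case.
gen_kwargs = {
    "max_new_tokens": 150,      # cap answer length
    "do_sample": True,          # sample instead of greedy decoding
    "temperature": 0.7,         # lower = more deterministic, factual tone
    "top_p": 0.9,               # nucleus sampling cutoff
    "repetition_penalty": 1.1,  # discourage verbose repetition
}
# Usage: outputs = model.generate(**inputs, **gen_kwargs)
```

Lower temperatures tend to suit factual biomedical QA; higher values trade precision for variety.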
## How to Get Started with the Model

### Installation

```shell
pip install "optimum[openvino]" transformers
```
```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer
import time

MODEL_ID = "kirubel1738/biogpt-bioqa-8bit-openvino"

# Load with a lean OpenVINO configuration for minimal memory usage
lean_config = {
    "PERFORMANCE_HINT": "LATENCY",
    "ENABLE_MMAP": "YES",
    "CACHE_DIR": "",
}

model = OVModelForCausalLM.from_pretrained(
    MODEL_ID,
    ov_config=lean_config,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Generate a response and time it
question = "What is the function of the p53 gene?"
inputs = tokenizer(question, return_tensors="pt")

start_time = time.time()
outputs = model.generate(**inputs, max_new_tokens=100)
end_time = time.time()

answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Q: {question}")
print(f"A: {answer[len(question):].strip()}")
print(f"⏱️ Time taken: {end_time - start_time:.2f} seconds")
```
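The string slicing used above to remove the echoed question can be brittle when the decoded text does not exactly match the input string. A sketch of a small helper that instead drops the prompt by token count (the function name and structure are our own, not part of this repository):

```python
def answer_question(model, tokenizer, question, max_new_tokens=100):
    """Generate an answer and strip the echoed prompt by token count.

    Illustrative helper; assumes an HF-style causal LM whose
    generate() output begins with the input token ids.
    """
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Slicing by token count avoids mismatches caused by tokenizer detokenization (extra or normalized whitespace) that string-length slicing cannot handle.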