# Model Card for biogpt-bioqa-8bit-openvino

## Model Description

This model, biogpt-bioqa-8bit-openvino, is an 8-bit quantized version of kirubel1738/biogpt-bioqa-lora-merged, optimized for efficient CPU inference with OpenVINO. It inherits the specialized biomedical question-answering capabilities of the original model while offering significantly faster inference and a reduced memory footprint.

The model builds on the biomedical knowledge of Microsoft's BioGPT-Large, fine-tuned with Low-Rank Adaptation (LoRA) on a comprehensive biomedical QA dataset, then compressed through 8-bit quantization and compiled for the OpenVINO runtime, making it ready for production CPU deployment.
- Developed by: kirubel1738
- Shared by: kirubel1738
- Model type: Causal Language Model (Text Generation / Question Answering)
- Language(s) (NLP): English
- License: Apache 2.0
- Finetuned from model: microsoft/BioGPT-Large
- Quantized from model: kirubel1738/biogpt-bioqa-lora-merged
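As a rough illustration, an export of this kind can be produced with the `optimum-cli` tool from optimum-intel; the exact command below is an assumption about the workflow, not the recorded pipeline for this repository:

```shell
# Hypothetical export: convert the merged LoRA model to OpenVINO IR
# with 8-bit weight compression (output directory name is illustrative).
optimum-cli export openvino \
  --model kirubel1738/biogpt-bioqa-lora-merged \
  --weight-format int8 \
  biogpt-bioqa-8bit-openvino
```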
## Model Sources
- Repository: Hugging Face Model Repository
- Base Model: microsoft/BioGPT-Large
- Quantized From: kirubel1738/biogpt-bioqa-lora-merged
## Uses

### Direct Use
This model is optimized for direct use in biomedical question-answering applications where CPU inference is preferred or required. It's ideal for:
- Biomedical research assistance and literature review
- Educational tools for biology and medical students
- Clinical decision support systems (with appropriate validation)
- Bioinformatics pipelines requiring efficient text generation
- Resource-constrained environments without GPU access
### Downstream Use
The model can be integrated into:
- Healthcare chatbots and virtual assistants
- Scientific literature summarization systems
- Drug discovery and pharmacological research tools
- Academic research platforms
- Biomedical tutoring systems
### Out-of-Scope Use
- Real-time clinical diagnosis without human oversight
- Generating medical advice for individual patients
- High-stakes decision making without verification
- Non-biomedical domains
- Tasks requiring extremely low latency (<100ms) on very old CPUs
## Bias, Risks, and Limitations

**Limitations:**
- 8-bit quantization may cause minor accuracy degradation compared to the original FP16 model
- Model may generate verbose or overly technical responses for simple questions
- Limited to biomedical domain knowledge; performance on general topics is reduced
**Risks:**
- Potential for generating plausible but incorrect biomedical information
- May reflect biases present in the training data
- Should not be used as a sole source for medical decisions
### Recommendations
Users should:
- Verify critical biomedical information from authoritative sources
- Use appropriate prompts and temperature settings for desired response style
- Consider the quantized nature when evaluating response quality
- Test performance on specific use cases before deployment
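As a starting point for the prompt and temperature tuning mentioned above, here is a hedged sketch of sampling settings one might pass to `model.generate`; the values are illustrative defaults, not tuned recommendations:

```python
# Illustrative generation settings; tune per use case.
gen_kwargs = {
    "max_new_tokens": 150,      # cap answer length
    "do_sample": True,          # sample instead of greedy decoding
    "temperature": 0.7,         # lower = more deterministic, factual tone
    "top_p": 0.9,               # nucleus sampling cutoff
    "repetition_penalty": 1.1,  # discourage verbose repetition
}
# Usage: outputs = model.generate(**inputs, **gen_kwargs)
```

Lower temperatures tend to suit factual biomedical QA; higher values trade precision for variety.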
## How to Get Started with the Model

### Installation

```shell
pip install "optimum[openvino]" transformers
```
```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer
import time

MODEL_ID = "kirubel1738/biogpt-bioqa-8bit-openvino"

# Load with a lean OpenVINO configuration for minimal memory usage
lean_config = {
    "PERFORMANCE_HINT": "LATENCY",
    "ENABLE_MMAP": "YES",
    "CACHE_DIR": "",
}

model = OVModelForCausalLM.from_pretrained(
    MODEL_ID,
    ov_config=lean_config,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Generate a response and time it
question = "What is the function of the p53 gene?"
inputs = tokenizer(question, return_tensors="pt")

start_time = time.time()
outputs = model.generate(**inputs, max_new_tokens=100)
end_time = time.time()

answer = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(f"Q: {question}")
print(f"A: {answer[len(question):].strip()}")
print(f"⏱️ Time taken: {end_time - start_time:.2f} seconds")
```
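The string slicing used above to remove the echoed question can be brittle when the decoded text does not exactly match the input string. A sketch of a small helper that instead drops the prompt by token count (the function name and structure are our own, not part of this repository):

```python
def answer_question(model, tokenizer, question, max_new_tokens=100):
    """Generate an answer and strip the echoed prompt by token count.

    Illustrative helper; assumes an HF-style causal LM whose
    generate() output begins with the input token ids.
    """
    inputs = tokenizer(question, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, skipping the prompt.
    prompt_len = inputs["input_ids"].shape[-1]
    new_tokens = outputs[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()
```

Slicing by token count avoids mismatches caused by tokenizer detokenization (extra or normalized whitespace) that string-length slicing cannot handle.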