Uploaded model

  • License: apache-2.0
  • Fine-tuned from model: unsloth/meta-llama-3.1-8b-instruct-bnb-4bit

This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.

Biomedical Question Answering Model

Usage

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize vLLM Engine with LoRA support
model_path = "unsloth/meta-llama-3.1-8b-instruct-bnb-4bit"
lora_path_answ = "sag-uniroma2/llama3.1_adapter_biorag_answer_generation"
lora_adapter_id = 2

llm = LLM(
    model=model_path,
    enable_lora=True,
    max_loras=2,  # Support multiple LoRA adapters
    max_lora_rank=64,
    gpu_memory_utilization=0.75,
    trust_remote_code=True,
    disable_custom_all_reduce=True,
    enforce_eager=True
)

# Setup LoRA request for answer generation
lora_request_answ = LoRARequest(
    lora_name=str(lora_adapter_id),
    lora_int_id=lora_adapter_id,
    lora_path=lora_path_answ
)

# Define sampling parameters
sampling_params = SamplingParams(temperature=0.0, max_tokens=256)

# Define instruction
instruction = """You are a biomedical expert. Your task is to generate a concise, well-structured summary answering the given question. 
Base your response on the provided PubMed abstracts, focusing on the text marked with [BS] and [ES].

Rules:
- Use only the information provided with a special focus on the marked information.
- The summary must be ≤200 words.
- Do NOT include personal opinions, speculations, or unrelated information.
- Maintain a neutral and scientific tone."""

def answer_from_snippets(query_text: str, documents: list):
    """
    Generates an answer from biomedical documents with context windowing.
    Processes up to the first 3 documents and surrounds snippets with context.
    
    Args:
        query_text: The biomedical question
        documents: List of documents with extracted snippets [BS] and [ES]
    
    Returns:
        Generated answer (≤200 words)
    """
    # Take only the first 3 documents
    top_3_docs = documents[:3]
    
    doc_blocks = []
    
    for doc in top_3_docs:
        # Combine title and abstract into full text
        title = doc["title"][0] if doc.get("title") and doc["title"] else ""
        abstract = doc["text"][0] if doc.get("text") and doc["text"] else ""
        full_text = f"{title} {abstract}".strip()
        
        # Collect all snippets from this document
        all_snippets = doc.get("snippets_title", []) + doc.get("snippets_abstract", [])
        
        formatted_snippets = []
        
        for snippet in all_snippets:
            # Find snippet position in original text
            start_idx = full_text.find(snippet)
            
            if start_idx != -1:
                end_idx = start_idx + len(snippet)
                
                # Split surrounding text into words
                words_before = full_text[:start_idx].split()
                words_after = full_text[end_idx:].split()
                
                # Get context: max 20 words before, max 10 words after
                context_before = " ".join(words_before[-20:]) if words_before else ""
                context_after = " ".join(words_after[:10]) if words_after else ""
                
                # Format with context window
                block = f"... abstract truncated here... {context_before} [BS] {snippet} [ES] {context_after} ... abstract truncated here..."
                block_clean = " ".join(block.split())
                formatted_snippets.append(block_clean)
        
        if formatted_snippets:
            doc_blocks.append("\n".join(formatted_snippets))

    # Combine all document blocks
    resources_text = "\n".join(doc_blocks)
    
    # Build final prompt
    prompt = f"{instruction}\n\n# Question: {query_text}\n# PubMed resources:\n{resources_text}\n# Answer:"
    
    # Generate answer using LoRA adapter for answer generation
    output = llm.generate(
        [prompt],
        sampling_params,
        lora_request=lora_request_answ,
        use_tqdm=False
    )
    
    # Parse generated answer
    generated_text = output[0].outputs[0].text
    answer = generated_text.split("<|eot_id|>")[0].strip()
    
    return answer

# Example usage
question = "YOUR_BIOMEDICAL_QUESTION_HERE"
documents = [
    {
        "id": "PUBMED_ID_1",
        "title": ["Article Title"],
        "text": ["Article abstract..."],
        "snippets_title": ["relevant snippet from title"],
        "snippets_abstract": ["relevant snippet from abstract"]
    },
    # ... more documents ...
]

final_answer = answer_from_snippets(question, documents)
print(f"Generated Answer: {final_answer}")

Description

The Question Answering Module is a fine-tuned language model, served with the vLLM inference engine, designed to generate accurate, evidence-based answers to biomedical questions. The model applies parameter-efficient LoRA (Low-Rank Adaptation) fine-tuning to the Llama-3.1-8B-Instruct base model and is specialized for generating answers from extracted snippets.

Model Details

  • Base Model: Llama-3.1-8B-Instruct
  • Fine-tuning Method: LoRA (Low-Rank Adaptation)
  • LoRA Rank: 64
  • Inference Engine: vLLM (for optimized generation)

Key Features

  • ✅ Biomedical Domain Expertise: Fine-tuned on BioASQ biomedical QA dataset
  • ✅ Evidence-Based Answers: Generates responses grounded in provided snippets with [BS] and [ES] markers
  • ✅ Context Windowing: Incorporates surrounding context (20 words before, 10 words after snippets)
  • ✅ Multi-Document Fusion: Synthesizes information from up to 3 PubMed documents
  • ✅ High-Performance Inference: vLLM enables fast batch generation
  • ✅ Accurate Summarization: Produces concise, scientifically accurate answers (≤200 words)
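
The context-windowing behavior listed above (20 words before and 10 words after each snippet) can be sketched in isolation. The helper below is a hypothetical standalone version of the logic embedded in answer_from_snippets, not part of the released code:

```python
def frame_snippet(full_text: str, snippet: str, before: int = 20, after: int = 10) -> str:
    """Wrap a snippet in [BS]/[ES] markers, keeping a small word window around it.

    Returns an empty string when the snippet is not found in the text.
    """
    start = full_text.find(snippet)
    if start == -1:
        return ""
    end = start + len(snippet)
    # Keep at most `before` words of left context and `after` words of right context
    ctx_before = " ".join(full_text[:start].split()[-before:])
    ctx_after = " ".join(full_text[end:].split()[:after])
    # Collapse duplicated whitespace into single spaces
    return " ".join(f"{ctx_before} [BS] {snippet} [ES] {ctx_after}".split())
```

This mirrors the windowing done per snippet before the blocks are joined into the prompt.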

Performance

  • Tested on BioASQ 13B Phase A+ and B test set
  • Optimized for factual accuracy in biomedical question answering
  • Produces well-structured, evidence-grounded answers
  • Suitable for clinical decision support and literature-based QA systems

Use Cases

  • Biomedical Question Answering: Answer complex medical and biological questions
  • Clinical Decision Support: Generate evidence-based clinical summaries
  • Literature Summarization: Create concise summaries from multiple literature sources
  • Medical Education: Support educational applications with accurate biomedical information

Input Format

The function accepts:

  1. query_text (str): A biomedical question requiring an answer
  2. documents (list): Documents with extracted snippets in the following format:
    {
      "id": "PUBMED_ID",
      "title": ["Article Title"],
      "text": ["Article abstract"],
      "snippets_title": ["snippet from title"],
      "snippets_abstract": ["snippet from abstract"]
    }
    

Output Format

Returns a single string:

  • answer: Evidence-based answer grounded in provided snippets (≤200 words)
  • Formatted in neutral, scientific tone
  • Directly addresses the query without speculation
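
The ≤200-word limit is enforced by the generation prompt rather than by a hard decoding constraint, so a defensive post-processing step (a hypothetical helper, not part of the released code) can truncate occasional overlong outputs:

```python
def enforce_word_limit(answer: str, limit: int = 200) -> str:
    """Truncate an answer to at most `limit` words, preserving word boundaries."""
    words = answer.split()
    if len(words) <= limit:
        return answer
    return " ".join(words[:limit])
```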

GitHub

For implementation details, training scripts, and integration guides:

GitHub Repository: LocalBioRAG

GitHub Repository: BioASQ2025-UNITOR

Citation

If you use this model, please cite:

@InProceedings{10.1007/978-3-032-21324-2_31,
    author    = {Borazio, Federico and Labbate, Francesco and Croce, Danilo and Basili, Roberto},
    editor    = {Campos, Ricardo and Jatowt, Adam and Lan, Yanyan and Aliannejadi, Mohammad and Bauer, Christine and MacAvaney, Sean and Anand, Avishek and Ren, Zhaochun and Verberne, Suzan and Bai, Nan and Mansoury, Masoud},
    title     = {Integrating AI and IR Paradigms for Sustainable and Trustworthy Accurate Access to Large Scale Biomedical Information},
    booktitle = {Advances in Information Retrieval},
    year      = {2026},
    publisher = {Springer Nature Switzerland},
    address   = {Cham},
    pages     = {398--412},
    isbn      = {978-3-032-21324-2}
}

@inproceedings{unitor,
    title     = {{UniTor at BioASQ 2025: Modular Biomedical QA with Synthetic Snippets and Multiple Task Answer Generation}},
    author    = {Borazio, Federico and Shcherbakov, Andriy and Croce, Danilo and Basili, Roberto},
    year      = {2025},
    booktitle = {CLEF 2025 Working Notes},
    editor    = {Faggioli, Guglielmo and Ferro, Nicola and Rosso, Paolo and Spina, Damiano}
}

Disclaimer

This model is fine-tuned for biomedical question answering on PubMed literature. While it performs well on BioASQ data, results may vary on other biomedical datasets or clinical settings. The model generates context-aware answers based on provided evidence snippets; quality depends on upstream snippet extraction quality. Always validate generated answers for critical applications in clinical or research settings. For production use, consider the computational requirements: vLLM inference requires adequate GPU memory (recommended ≥24GB for batch processing). The model is designed for informational purposes and should not be used as a substitute for professional medical advice.
