Model Card for Klara-Llama3-8B-v1

Note: This model is part of a private graduation project (PFE). Access to weights and the Inference API is restricted to authorized users only.

Model Details

Model Description

Klara-Llama3-8B-v1 is a medical assistant model fine-tuned from Meta's Llama 3 8B Instruct. It serves as the natural-language interface for the Klara health monitoring ecosystem, interpreting physiological sensor data and explaining it to the user.

Developed by: Koussay Chaanbi
Project Name: Klara
Model type: Fine-tuned Causal Language Model
Fine-tuning Technique: QLoRA (4-bit Quantized LoRA)
Base Model: Meta-Llama-3-8B-Instruct
Persona: A professional and precise medical assistant specialized in real-time health data analysis.
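
As a rough illustration of why QLoRA keeps fine-tuning tractable, the sketch below counts the trainable parameters a LoRA adapter would add to the attention projections of Llama 3 8B. The rank (16) and the choice of target modules are assumptions made for illustration; this card does not state the values used for Klara. The layer shapes follow the public Llama 3 8B configuration.

```python
# Attention projection shapes (in_features, out_features) for Llama 3 8B:
# hidden size 4096, 32 query heads and 8 key/value heads of dimension 128.
ATTN_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32


def lora_param_count(rank, target_modules, shapes=ATTN_SHAPES, num_layers=NUM_LAYERS):
    """Count trainable LoRA parameters.

    LoRA adds two low-rank matrices per target: A (rank x in) and B (out x rank),
    so each target contributes rank * (in + out) parameters.
    """
    per_layer = sum(rank * (shapes[name][0] + shapes[name][1]) for name in target_modules)
    return per_layer * num_layers


# Assumed settings, not taken from the Klara training run.
ASSUMED_RANK = 16
ASSUMED_TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]

trainable = lora_param_count(ASSUMED_RANK, ASSUMED_TARGETS)
print(f"Trainable LoRA parameters: {trainable:,}")
```

Under these assumptions the adapter has about 13.6 million trainable parameters, well under 1% of the base model's roughly 8 billion.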

Model Sources

Repository: Klara-Project on Hugging Face
Deployment Target: Desktop/Edge environments using 4-bit quantization.

Uses

Direct Use

Crisis Interpretation: Explaining the severity and nature of medical crises detected by companion sensor models.
Contextual Health Advice: Providing preventative advice based on physiological trends.
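
A minimal sketch of how a detected event might be turned into a prompt for the model. The `SensorAlert` structure, its fields, and the example reading are hypothetical; the card does not describe the actual format produced by the companion sensor models. The system prompt is the one used in this card's getting-started example.

```python
from dataclasses import dataclass


@dataclass
class SensorAlert:
    """Hypothetical alert emitted by a companion sensor model."""
    metric: str       # e.g. "SpO2"
    value: float      # latest reading
    unit: str         # e.g. "%"
    description: str  # short summary of what was detected


SYSTEM_PROMPT = "You are Klara, a professional medical assistant created by Koussay Chaanbi."


def build_messages(alert, question):
    """Turn a sensor alert and a user question into chat messages."""
    user_content = (
        f"The system detected {alert.description} "
        f"({alert.metric}: {alert.value}{alert.unit}). {question}"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_content},
    ]


# Illustrative values only.
alert = SensorAlert(metric="SpO2", value=88.0, unit="%", description="a sudden drop in SpO2")
for message in build_messages(alert, "What are the immediate steps?"):
    print(f"{message['role']}: {message['content']}")
```

The resulting list can be passed to the tokenizer's chat template in the same way as a hand-written `messages` list.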

Out-of-Scope Use

This model is not a substitute for professional clinical diagnostics or emergency medical services. It is intended for research and demonstration within the Klara project framework.

Bias, Risks, and Limitations

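Compute Requirements: The base model's weights occupy roughly 16 GB in 16-bit precision, so efficient inference on desktop or edge hardware requires 4-bit quantization (for example bitsandbytes, GGUF, or EXL2).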
Medical Accuracy: Users must verify all outputs; the model may hallucinate specific clinical values.
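
To make the compute requirement concrete, the following back-of-the-envelope sketch estimates the memory needed for the weights alone at different precisions. The parameter count of about 8.03 billion is the published figure for Llama 3 8B. The estimate ignores the KV cache, activations, and quantization overhead such as scaling factors, so real usage will be higher.

```python
PARAM_COUNT = 8.03e9  # approximate parameter count of Llama 3 8B


def weight_memory_gib(num_params, bits_per_param):
    """Return the size of the weights in GiB for a given precision."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


for label, bits in [("bf16", 16), ("int8", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(PARAM_COUNT, bits):.1f} GiB")
```

By this estimate the weights take roughly 15 GiB in bf16 and under 4 GiB at 4 bits, which is what makes desktop and edge deployment plausible.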

How to Get Started with the Model

Note: Access must be requested and approved via the "Gated Access" system.

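# Requires approved access to the gated repositories and a prior login
# (for example via `huggingface-cli login`).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"

# 4-bit loading is configured through BitsAndBytesConfig (needs bitsandbytes).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Attach the Klara LoRA adapter to the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"}
]

# add_generation_prompt=True appends the assistant header so the model answers
# instead of continuing the user turn.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))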
