---
language:
- en
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- medical-assistant
- lora
- qlora
- health-tech
- medical-llm
library_name: peft
pipeline_tag: text-generation
inference: false # This line was added to disable the API and prevent token consumption
---

# Model Card for Klara-Llama3-8B-v1

**Note:** This model is part of a private graduation project (PFE). Access to weights and the Inference API is restricted to authorized users only.

## Model Details

### Model Description

Klara-Llama3-8B-v1 is a medical assistant model fine-tuned from Meta's Llama 3 8B Instruct. It serves as the language interface for the Klara health monitoring ecosystem, interpreting physiological sensor data.

- **Developed by:** Koussay Chaanbi
- **Project Name:** Klara
- **Model type:** Fine-tuned Causal Language Model
- **Fine-tuning Technique:** QLoRA (4-bit Quantized LoRA)
- **Base Model:** Meta-Llama-3-8B-Instruct
- **Persona:** A professional and precise medical assistant specialized in real-time health data analysis.

### Model Sources

- **Repository:** [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-Llama3-8B-v1)
- **Deployment Target:** Desktop/Edge environments using 4-bit quantization.

## Uses

### Direct Use

- **Crisis Interpretation:** Explaining the severity and nature of medical crises detected by companion sensor models.
- **Contextual Health Advice:** Providing preventative advice based on physiological trends.
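
As a rough illustration of crisis interpretation, the sketch below turns one detected event into a chat-style message list that a chat template can consume. The `SensorEvent` fields and the `build_messages` helper are hypothetical names chosen for this example; the actual output format of the companion sensor models is not documented here.

```python
from dataclasses import dataclass


# Hypothetical event schema: the field names are illustrative only and are
# not the Klara project's actual sensor-model output format.
@dataclass
class SensorEvent:
    signal: str   # name of the monitored signal, e.g. "SpO2"
    value: float  # latest reading
    unit: str     # unit of the reading, e.g. "%"
    trend: str    # short description of the detected change


SYSTEM_PROMPT = "You are Klara, a professional medical assistant created by Koussay Chaanbi."


def build_messages(event: SensorEvent) -> list[dict[str, str]]:
    """Turn one detected event into a system + user message pair."""
    user_text = (
        f"The system detected a {event.trend} in {event.signal} "
        f"(current reading: {event.value:g} {event.unit}). "
        "What are the immediate steps?"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


messages = build_messages(SensorEvent("SpO2", 88.0, "%", "sudden drop"))
print(messages[1]["content"])
```

Keeping the prompt construction in one small function makes it easier to review exactly which sensor values reach the model.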

### Out-of-Scope Use

This model is **not** a substitute for professional clinical diagnostics or emergency medical services. It is intended for research and demonstration within the Klara project framework.

## Bias, Risks, and Limitations

- **Compute Requirements:** Requires significant VRAM or 4-bit quantization (GGUF/EXL2) for efficient inference.
- **Medical Accuracy:** Users must verify all outputs; the model may hallucinate specific clinical values.
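
To give a sense of scale for the compute requirement, the following back-of-the-envelope sketch estimates the memory needed for the weights alone. It assumes "8B" means roughly 8 billion parameters and ignores the KV cache, activations, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8e9  # assumption: "8B" rounded to 8 billion parameters

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB for weights")
```

Under these assumptions, 4-bit quantization cuts the weight footprint to about a quarter of the 16-bit figure.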

## How to Get Started with the Model

*Note: Access must be requested and approved via the "Gated Access" system.*

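```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"

# 4-bit loading goes through a quantization config (requires bitsandbytes).
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quant_config,
    device_map="auto",
)

# Attach the Klara LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"}
]

# add_generation_prompt=True appends the assistant header so the model
# answers instead of continuing the user turn.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```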