---
language:
- en
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- medical-assistant
- lora
- qlora
- health-tech
- medical-llm
library_name: peft
pipeline_tag: text-generation
inference: false # Added to disable the hosted Inference API and prevent token consumption
---
# Model Card for Klara-Llama3-8B-v1
**Note:** This model is part of a private graduation project (PFE, *Projet de Fin d'Études*). Access to the weights and the Inference API is restricted to authorized users only.
## Model Details
### Model Description
Klara-Llama3-8B-v1 is a medical assistant model fine-tuned from Meta's Llama 3 8B Instruct. It serves as the conversational interface of the Klara health-monitoring ecosystem, interpreting physiological sensor data for the user.
- **Developed by:** Koussay Chaanbi
- **Project Name:** Klara
- **Model type:** Fine-tuned Causal Language Model
- **Fine-tuning Technique:** QLoRA (4-bit Quantized LoRA)
- **Base Model:** Meta-Llama-3-8B-Instruct
- **Persona:** A professional and precise medical assistant specialized in real-time health data analysis.
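QLoRA keeps the base model's weights frozen and trains only a small low-rank update on top of them. In the notation of the original LoRA formulation (shown here as an illustration of the general technique; the rank $r$, scaling $\alpha$, and adapted layers used for Klara are not documented in this card), a frozen weight matrix $W_0 \in \mathbb{R}^{d \times k}$ produces:

$$
h = W_0 x + \frac{\alpha}{r} B A x, \qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad r \ll \min(d, k)
$$

Only $A$ and $B$ are trained. In QLoRA, $W_0$ is additionally stored in 4-bit NF4 and dequantized to the compute dtype (typically bfloat16) for each forward pass, which is what makes fine-tuning an 8B model feasible on a single GPU.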
### Model Sources
- **Repository:** [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-Llama3-8B-v1)
- **Deployment Target:** Desktop/Edge environments using 4-bit quantization.
## Uses
### Direct Use
- **Crisis Interpretation:** Explaining the severity and nature of medical crises detected by companion sensor models.
- **Contextual Health Advice:** Providing preventative advice based on physiological trends.
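Crisis interpretation implies that events from the companion sensor models reach Klara as text. This card does not document that hand-off, so the sketch below is purely illustrative: the `SensorEvent` fields, the message wording, and the `build_crisis_messages` helper are hypothetical. It only shows one way to turn a detected event into the chat-message list format used by the quick-start code.

```python
from dataclasses import dataclass
from typing import Dict, List


# Hypothetical event shape; the real Klara sensor payload is not documented here.
@dataclass
class SensorEvent:
    signal: str       # e.g. "SpO2"
    value: float      # latest reading
    baseline: float   # recent typical reading for this user
    unit: str         # e.g. "%"


SYSTEM_PROMPT = "You are Klara, a professional medical assistant created by Koussay Chaanbi."


def build_crisis_messages(event: SensorEvent) -> List[Dict[str, str]]:
    """Hypothetical helper: describe a detected event as a user turn for the model."""
    change = event.value - event.baseline
    user_text = (
        f"The system detected a change in {event.signal}: "
        f"{event.value:g}{event.unit} (baseline {event.baseline:g}{event.unit}, "
        f"change {change:+g}{event.unit}). "
        "Explain the severity and what the immediate steps are."
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_text},
    ]


msgs = build_crisis_messages(SensorEvent("SpO2", 88, 97, "%"))
print(msgs[1]["content"])
```

Keeping the numeric reading, the baseline, and the signed change explicit in the prompt gives the model concrete values to reason about instead of asking it to infer them.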
### Out-of-Scope Use
This model is **not** a substitute for professional clinical diagnostics or emergency medical services. It is intended for research and demonstration within the Klara project framework.
## Bias, Risks, and Limitations
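- **Compute Requirements:** In 16-bit precision, the 8B base model needs about 16 GB of GPU memory for its weights alone; 4-bit quantization (bitsandbytes at load time, or GGUF/EXL2 exports) is recommended for efficient inference.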
- **Medical Accuracy:** Users must verify all outputs; the model may hallucinate specific clinical values.
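The weight-memory figures behind the compute requirement follow from simple arithmetic: parameter count × bits per parameter ÷ 8. The sketch below assumes roughly 8.0 billion parameters for Llama 3 8B and counts weights only; the KV cache, activations, the CUDA context, and quantization metadata all add to the real footprint, so treat the results as lower bounds.

```python
# Rough weight-only memory estimate; ignores KV cache, activations,
# CUDA context and quantization metadata (e.g. per-block scales).
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed to hold n_params weights at the given precision, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 8.0e9  # approximate parameter count of Llama 3 8B

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

This prints about 16.0 GB for bf16, 8.0 GB for 8-bit, and 4.0 GB for 4-bit, which is why 4-bit loading brings the model within reach of consumer GPUs and edge devices.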
## How to Get Started with the Model
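*Note: Access to Klara's weights must be requested and approved via the "Gated Access" system. The base model `meta-llama/Meta-Llama-3-8B-Instruct` is gated as well and requires accepting Meta's license.*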
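The quick-start code below formats the conversation with the tokenizer's chat template. To show what that produces, this sketch renders the same messages by hand in the Llama 3 Instruct prompt layout (special tokens `<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`). The helper name `render_llama3_prompt` is illustrative; in practice, rely on `tokenizer.apply_chat_template`, which reads the template shipped with the tokenizer.

```python
from typing import Dict, List


def render_llama3_prompt(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Illustrative hand-written renderer for the Llama 3 Instruct chat layout."""
    parts = ["<|begin_of_text|>"]
    for msg in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an empty assistant turn so the model writes the reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"},
]
print(render_llama3_prompt(messages))
```

The trailing assistant header is what `add_generation_prompt=True` adds; without it, the prompt ends after the user turn and the model is not cued to answer as the assistant.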
```python
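import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Both repositories are gated: authenticate first (`huggingface-cli login` or the HF_TOKEN env var).
model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"

# Load the base model in 4-bit NF4 (the QLoRA data type) with bfloat16 compute.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
# Attach the Klara LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

messages = [
    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"},
]
# add_generation_prompt=True appends the assistant header so the model starts its reply.
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))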