	---
	language:
	- en
	license: llama3
	base_model: meta-llama/Meta-Llama-3-8B-Instruct
	tags:
	- medical-assistant
	- lora
	- qlora
	- health-tech
	- medical-llm
	library_name: peft
	pipeline_tag: text-generation
	inference: false # This line was added to disable the Inference API and avoid token consumption
	---

	# Model Card for Klara-Llama3-8B-v1

	Note: This model is part of a private graduation project (PFE). Access to weights and the Inference API is restricted to authorized users only.

	## Model Details

	### Model Description

	Klara-Llama3-8B-v1 is a medical assistant model fine-tuned from Meta's Llama 3 8B Instruct. It serves as the natural-language interface for the Klara health monitoring ecosystem, interpreting physiological sensor data and explaining it to the user.

	- Developed by: Koussay Chaanbi
	- Project Name: Klara
	- Model type: Fine-tuned Causal Language Model
	- Fine-tuning Technique: QLoRA (4-bit Quantized LoRA)
	- Base Model: Meta-Llama-3-8B-Instruct
	- Persona: A professional and precise medical assistant specialized in real-time health data analysis.
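
To give a feel for why QLoRA keeps fine-tuning cheap, the sketch below counts the trainable parameters a LoRA adapter adds to a single square projection matrix. The hidden size 4096 is that of Llama 3 8B; the rank `RANK = 16` is an illustrative assumption, since this card does not state the rank used for Klara.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Trainable parameters in the low-rank update B @ A,
    with B of shape (d_out, r) and A of shape (r, d_in)."""
    return r * (d_in + d_out)


HIDDEN = 4096  # hidden size of Llama 3 8B
RANK = 16      # assumption: illustrative rank, not taken from this card

dense = full_params(HIDDEN, HIDDEN)
adapter = lora_params(HIDDEN, HIDDEN, RANK)
print(f"dense: {dense:,}  adapter: {adapter:,}  ratio: {adapter / dense:.2%}")
```

Under these assumptions the adapter trains well under 1% of the parameters of each adapted matrix, while the frozen base weights stay in 4-bit form.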

	### Model Sources

	- Repository: [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-Llama3-8B-v1)
	- Deployment Target: Desktop/Edge environments using 4-bit quantization.
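
As a rough guide to why 4-bit quantization matters for desktop and edge targets, the following back-of-the-envelope sketch estimates the memory taken by the weights alone at different precisions. The parameter count is the nominal 8 billion; the figures exclude the KV cache, activations, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gib(n_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights only, in GiB."""
    return n_params * bits_per_param / 8 / 1024**3


N_PARAMS = 8.0e9  # nominal parameter count of an 8B model

for label, bits in [("bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label:>5}: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```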

	## Uses

	### Direct Use

	- Crisis Interpretation: Explaining the severity and nature of medical crises detected by companion sensor models.
	- Contextual Health Advice: Providing preventative advice based on physiological trends.
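
One possible way to hand a sensor alert to the model is to turn it into chat messages, as sketched below. The `SensorEvent` fields and the `build_messages` helper are hypothetical and not part of the Klara codebase; the system prompt is the one used in this card's getting-started snippet.

```python
from dataclasses import dataclass


@dataclass
class SensorEvent:
    """Hypothetical alert emitted by a companion sensor model."""
    metric: str      # e.g. "SpO2"
    value: float     # current reading
    unit: str        # e.g. "%"
    baseline: float  # the user's usual reading


SYSTEM_PROMPT = (
    "You are Klara, a professional medical assistant created by Koussay Chaanbi."
)


def build_messages(event: SensorEvent) -> list:
    """Format a sensor alert as system + user chat messages."""
    change = event.value - event.baseline
    user = (
        f"The system detected {event.metric} at {event.value:g} {event.unit} "
        f"(baseline {event.baseline:g} {event.unit}, change {change:+g}). "
        "What are the immediate steps?"
    )
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user},
    ]


messages = build_messages(SensorEvent("SpO2", 88, "%", 97))
print(messages[1]["content"])
```

The resulting list has the same shape as the `messages` passed to the tokenizer's chat template.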

	### Out-of-Scope Use

	This model is not a substitute for professional clinical diagnostics or emergency medical services. It is intended for research and demonstration within the Klara project framework.

	## Bias, Risks, and Limitations

	- Compute Requirements: Requires significant VRAM or 4-bit quantization (GGUF/EXL2) for efficient inference.
	- Medical Accuracy: Users must verify all outputs; the model may hallucinate specific clinical values.
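
Because the model may invent clinical values, one simple safeguard is to flag every number in a response that does not match a value the sensors actually reported. The helper below is a minimal sketch of that idea, not a feature of Klara; `unverified_numbers` is a hypothetical name, and flagged values still need human review.

```python
import re

# Match standalone numbers; the lookbehind skips digits embedded in
# identifiers such as "SpO2".
_NUMBER = re.compile(r"(?<![A-Za-z\d.])-?\d+(?:\.\d+)?")


def unverified_numbers(model_output: str, known_values: list) -> list:
    """Return numbers in the output that are absent from the sensor data."""
    known = {round(v, 2) for v in known_values}
    found = [float(m) for m in _NUMBER.findall(model_output)]
    return [x for x in found if round(x, 2) not in known]


reply = "SpO2 of 88% is below the typical 95% threshold; heart rate 132 bpm."
print(unverified_numbers(reply, [88, 132]))
```

Here 88 and 132 come from the sensors, so only the threshold quoted by the model is flagged for verification.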

	## How to Get Started with the Model

	Note: Access must be requested and approved via the "Gated Access" system.

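	```python
	import torch
	from peft import PeftModel
	from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

	model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
	adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"

	# Load the base model in 4-bit via bitsandbytes (requires a CUDA GPU).
	bnb_config = BitsAndBytesConfig(
	    load_in_4bit=True,
	    bnb_4bit_compute_dtype=torch.bfloat16,
	)

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    quantization_config=bnb_config,
	    device_map="auto",
	)

	# Attach the Klara LoRA adapter on top of the quantized base model.
	model = PeftModel.from_pretrained(model, adapter_id)

	messages = [
	    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
	    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"},
	]

	# add_generation_prompt=True appends the assistant header so the model
	# answers instead of continuing the user turn.
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_dict=True,
	    return_tensors="pt",
	).to(model.device)

	outputs = model.generate(**inputs, max_new_tokens=256)

	# Decode only the newly generated tokens, not the prompt.
	prompt_len = inputs["input_ids"].shape[-1]
	print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
	```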