---
language:
- en
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- medical-assistant
- lora
- qlora
- health-tech
- medical-llm
library_name: peft
pipeline_tag: text-generation
inference: false # This line was added to disable the API and prevent token consumption
---

# Model Card for Klara-Llama3-8B-v1

**Note:** This model is part of a private graduation project (PFE). Access to weights and the Inference API is restricted to authorized users only.

## Model Details

### Model Description

Klara-Llama3-8B-v1 is a medical assistant model fine-tuned from Meta's Llama 3 8B Instruct. It serves as the conversational interface for the Klara health monitoring ecosystem, interpreting physiological sensor data for the user.

- **Developed by:** Koussay Chaanbi
- **Project Name:** Klara
- **Model type:** Fine-tuned Causal Language Model
- **Fine-tuning Technique:** QLoRA (4-bit Quantized LoRA)
- **Base Model:** Meta-Llama-3-8B-Instruct
- **Persona:** A professional and precise medical assistant specialized in real-time health data analysis.

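
To give a sense of scale for the QLoRA setup listed above, the sketch below counts the trainable parameters a LoRA adapter would add to Llama 3 8B. The rank (`r=16`) and the target modules (`q_proj`, `v_proj`) are assumptions chosen for illustration; this card does not state the values used to train Klara. The projection shapes are those of the public Llama 3 8B architecture (hidden size 4096, 32 layers, 1024-wide key/value projections because of grouped-query attention).

```python
# Attention projection shapes in Llama 3 8B as (in_features, out_features).
LLAMA3_8B_PROJ_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
NUM_LAYERS = 32


def lora_trainable_params(rank, target_modules):
    """Count LoRA weights: each target layer gets A (rank x in) and B (out x rank)."""
    per_layer = sum(
        rank * (LLAMA3_8B_PROJ_SHAPES[name][0] + LLAMA3_8B_PROJ_SHAPES[name][1])
        for name in target_modules
    )
    return per_layer * NUM_LAYERS


# Assumed (hypothetical) configuration, not taken from the Klara training run.
n = lora_trainable_params(rank=16, target_modules=["q_proj", "v_proj"])
print(f"{n:,} trainable parameters")  # prints "6,815,744 trainable parameters"
```

Under these assumptions the adapter holds under 0.1% of the base model's roughly 8 billion weights, which is why only the adapter, not the full model, needs to be trained and distributed.
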
### Model Sources

- **Repository:** [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-Llama3-8B-v1)
- **Deployment Target:** Desktop/Edge environments using 4-bit quantization.

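
The choice of 4-bit quantization for desktop and edge targets can be motivated with a rough back-of-the-envelope estimate. The sketch below computes the memory needed for the weights alone; the parameter count of about 8.03 billion is an approximation for Llama 3 8B, and the estimate ignores quantization constants, the KV cache, and activations, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


LLAMA3_8B_PARAMS = 8.03e9  # approximate parameter count (assumption)

for label, bits in [("bf16", 16), ("4-bit", 4)]:
    print(f"{label}: {weight_memory_gb(LLAMA3_8B_PARAMS, bits):.1f} GB")
```

This works out to roughly 16 GB in bfloat16 versus roughly 4 GB at 4 bits per weight, which is the difference between needing a workstation-class GPU and fitting on a typical consumer card.
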
## Uses

### Direct Use

- **Crisis Interpretation:** Explaining the severity and nature of medical crises detected by companion sensor models.
- **Contextual Health Advice:** Providing preventative advice based on physiological trends.

### Out-of-Scope Use

This model is **not** a substitute for professional clinical diagnostics or emergency medical services. It is intended for research and demonstration within the Klara project framework.

## Bias, Risks, and Limitations

- **Compute Requirements:** Running the 8B base model in 16-bit precision needs roughly 16 GB of VRAM for the weights alone; a 4-bit quantized build (e.g. GGUF or EXL2) is recommended for efficient inference on smaller GPUs.
- **Medical Accuracy:** Users must verify all outputs; the model may hallucinate specific clinical values.

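
As one small illustration of what verifying outputs could look like in practice, the sketch below scans generated text for SpO2 percentages that are physically impossible (outside 0 to 100). The regular expression, the function name `find_impossible_spo2`, and the bounds are all hypothetical and are not part of the Klara project; a check like this catches only the crudest errors and does not replace review by a clinician.

```python
import re

# Hypothetical pattern: "SpO2", up to 20 non-digit characters, then a percentage.
SPO2_PATTERN = re.compile(r"SpO2\D{0,20}?(\d{1,3}(?:\.\d+)?)\s*%", re.IGNORECASE)


def find_impossible_spo2(text):
    """Return SpO2 percentages mentioned in `text` that fall outside 0-100."""
    values = [float(match) for match in SPO2_PATTERN.findall(text)]
    return [value for value in values if not 0.0 <= value <= 100.0]


print(find_impossible_spo2("Your SpO2 dropped to 88%."))    # prints []
print(find_impossible_spo2("Your SpO2 is 140%, stay calm."))  # prints [140.0]
```
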
## How to Get Started with the Model

*Note: Access must be requested and approved via the "Gated Access" system.*

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the base model in 4-bit via bitsandbytes, computing in bfloat16.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
# Attach the Klara LoRA adapter on top of the quantized base model.
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"},
]

# add_generation_prompt=True appends the assistant header so the model replies
# instead of continuing the user turn; return_dict=True also yields the attention mask.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the echoed prompt.
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```