---
language:
- en
license: llama3
base_model: meta-llama/Meta-Llama-3-8B-Instruct
tags:
- medical-assistant
- lora
- qlora
- health-tech
- medical-llm
library_name: peft
pipeline_tag: text-generation
inference: false  # This line was added to disable the Inference API and prevent token consumption
---

# Model Card for Klara-Llama3-8B-v1

**Note:** This model is part of a private graduation project (PFE). Access to weights and the Inference API is restricted to authorized users only.

## Model Details

### Model Description

Klara-Llama3-8B-v1 is a medical assistant model fine-tuned from Meta's Llama 3 8B Instruct. It serves as the conversational interface for the Klara health monitoring ecosystem, interpreting physiological sensor data for the user.

- **Developed by:** Koussay Chaanbi
- **Project Name:** Klara
- **Model type:** Fine-tuned Causal Language Model
- **Fine-tuning Technique:** QLoRA (4-bit Quantized LoRA)
- **Base Model:** Meta-Llama-3-8B-Instruct
- **Persona:** A professional and precise medical assistant specialized in real-time health data analysis.

### Model Sources

- **Repository:** [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-Llama3-8B-v1)
- **Deployment Target:** Desktop/Edge environments using 4-bit quantization.

## Uses

### Direct Use

- **Crisis Interpretation:** Explaining the severity and nature of medical crises detected by companion sensor models.
- **Contextual Health Advice:** Providing preventative advice based on physiological trends.

### Out-of-Scope Use

This model is **not** a substitute for professional clinical diagnostics or emergency medical services. It is intended for research and demonstration within the Klara project framework.

## Bias, Risks, and Limitations

- **Compute Requirements:** Running the model in 16-bit precision requires significant VRAM; 4-bit quantization (GGUF/EXL2) is recommended for efficient inference.
- **Medical Accuracy:** Users must verify all outputs; the model may hallucinate specific clinical values.
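To make the compute requirement above more concrete, the following is a back-of-the-envelope sketch of weight memory at different precisions. The parameter count (`NUM_PARAMS`, about 8.03 billion) and the helper name `weight_memory_gib` are assumptions introduced for illustration. The estimate covers weights only and ignores the KV cache, activations, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1024**3


# Assumed parameter count for an 8B-class Llama 3 model (illustrative value).
NUM_PARAMS = 8.03e9

if __name__ == "__main__":
    for label, bits in [("fp16/bf16", 16), ("int8", 8), ("4-bit", 4)]:
        gib = weight_memory_gib(NUM_PARAMS, bits)
        print(f"{label:>9}: {gib:.1f} GiB (weights only)")
```

Under these assumptions the weights take roughly 15 GiB in 16-bit precision and under 4 GiB at 4 bits, which is why 4-bit quantization is the practical choice for the desktop/edge deployment target.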
## How to Get Started with the Model

*Note: Access must be requested and approved via the "Gated Access" system.*

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Load the base model in 4-bit through a bitsandbytes quantization config
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
# Attach the Klara LoRA adapter on top of the quantized base model
model = PeftModel.from_pretrained(model, adapter_id)

messages = [
    {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
    {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"},
]

# add_generation_prompt=True appends the assistant header so the model replies
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```