Kossayart committed on
Commit
31b7248
·
verified ·
1 Parent(s): 0975c5d

Update README.md

  1. README.md +47 -27
README.md CHANGED
@@ -1,65 +1,85 @@
1
  ---
2
  language:
3
  - en
4
- license: apache-2.0
5
- base_model: tinyllama/tinyllama-1.1b-chat-v1.0 # Or smollm/smollm2-1.7b
6
  tags:
7
  - medical-assistant
8
  - lora
9
- - quantized
10
- - edge-ai
11
  - health-tech
 
12
  library_name: peft
13
  pipeline_tag: text-generation
14
  ---
15
 
16
- # Model Card for Klara-LLM-v1
17
 
18
- Klara-LLM-v1 is a lightweight, fine-tuned Large Language Model designed to act as the cognitive core for a smart health monitoring system. It translates physiological sensor data and crisis predictions into actionable, human-readable medical advice.
19
 
20
  ## Model Details
21
 
22
  ### Model Description
23
 
24
- This model is a specialized version of a small-parameter LLM (e.g., TinyLlama/SmolLM2), fine-tuned using **LoRA (Low-Rank Adaptation)**. It is specifically optimized for local deployment on edge hardware to ensure user data privacy and low-latency responses.
25
 
26
  - **Developed by:** Koussay Chaanbi
27
- - **Project Name:** Klara (formerly Lyna.ai)
28
  - **Model type:** Fine-tuned Causal Language Model
29
- - **Fine-tuning Technique:** LoRA / QLoRA
30
- - **Base Model:** TinyLlama-1.1B or SmolLM2-1.7B
31
- - **Persona:** A professional, empathetic medical assistant programmed to assist users in interpreting health metrics and managing medical crises.
32
 
33
  ### Model Sources
34
 
35
- - **Repository:** [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-LLM-v1)
36
- - **Deployment Platform:** Raspberry Pi 4/5 via Ollama or Llama.cpp
37
 
38
  ## Uses
39
 
40
  ### Direct Use
41
 
42
- The model is designed to:
43
- 1. Provide context-aware medical explanations based on sensor inputs (HR, SpO2).
44
- 2. Offer immediate guidance when a medical crisis is detected by the companion CNN-BiLSTM model.
45
- 3. Answer user queries regarding physiological health trends.
46
 
47
  ### Out-of-Scope Use
48
 
49
- This model is not a licensed medical professional. It must not be used for life-critical decisions without human verification. It is intended for supportive health monitoring and research purposes within the Klara ecosystem.
50
 
51
  ## Bias, Risks, and Limitations
52
 
53
- - **Medical Accuracy:** While fine-tuned on medical logic, the model may occasionally hallucinate or provide generalized advice.
54
- - **Hardware Constraints:** Being a small-parameter model, its reasoning depth is more limited compared to larger models like Llama-3 or GPT-4.
55
- - **Privacy:** Designed for local inference to mitigate the risks associated with transmitting sensitive health data to the cloud.
56
 
57
  ## How to Get Started with the Model
58
 
59
- The model is typically served via **Ollama** or **Gemma.cpp**. You can interact with it using a structured system prompt:
60
 
61
- ```text
62
- System Prompt:
63
- "You are Klara, a professional medical assistant created by Koussay Chaanbi.
64
- Your goal is to monitor the user's health using sensor data and provide
65
- clear, supportive advice during medical crises."
 
1
  ---
2
  language:
3
  - en
4
+ license: llama3
5
+ base_model: meta-llama/Meta-Llama-3-8B-Instruct
6
  tags:
7
  - medical-assistant
8
  - lora
9
+ - qlora
 
10
  - health-tech
11
+ - medical-llm
12
  library_name: peft
13
  pipeline_tag: text-generation
14
  ---
15
 
16
+ # Model Card for Klara-Llama3-8B-v1
17
 
18
+ Klara-Llama3-8B-v1 is a sophisticated medical assistant model fine-tuned from Meta's Llama 3 8B. It serves as the intelligent interface for the Klara health monitoring ecosystem, providing expert-level interpretation of physiological sensor data.
19
 
20
  ## Model Details
21
 
22
  ### Model Description
23
 
24
+ By leveraging the advanced reasoning capabilities of Llama 3 8B, this model provides high-fidelity medical guidance. It is fine-tuned using **QLoRA** to maintain a balance between high-performance medical logic and the ability to run on consumer-grade or high-end edge hardware.
25
 
26
  - **Developed by:** Koussay Chaanbi
27
+ - **Project Name:** Klara
28
  - **Model type:** Fine-tuned Causal Language Model
29
+ - **Fine-tuning Technique:** QLoRA (4-bit Quantized LoRA)
30
+ - **Base Model:** Meta-Llama-3-8B-Instruct
31
+ - **Persona:** A professional and precise medical assistant created by Koussay Chaanbi, specialized in real-time health data analysis.
32
 
33
  ### Model Sources
34
 
35
+ - **Repository:** [Klara-Project on Hugging Face](https://huggingface.co/Koussay/Klara-Llama3-8B-v1)
36
+ - **Deployment Target:** Desktop/Edge environments using 4-bit quantization (GGUF/EXL2).
37
 
38
  ## Uses
39
 
40
  ### Direct Use
41
 
42
+ 1. **Crisis Interpretation:** Explaining the severity and nature of medical crises detected by the CNN-BiLSTM sensor model.
43
+ 2. **Contextual Health Advice:** Providing preventative advice based on trends in Heart Rate and SpO2.
44
+ 3. **Structured Reporting:** Summarizing health metrics for university or medical review.
 
45
 
46
  ### Out-of-Scope Use
47
 
48
+ This model is intended for research and supportive monitoring. It is not a substitute for clinical diagnostics or emergency medical services.
49
 
50
  ## Bias, Risks, and Limitations
51
 
52
+ - **Compute Requirements:** Unlike 1B models, Llama 3 8B needs 4-bit quantization to fit on edge devices such as the Raspberry Pi (8 GB RAM recommended); the quantized weights alone occupy roughly 4 to 5 GB.
53
+ - **Domain Specificity:** Although the base model reasons well, it may still hallucinate specific clinical dosages unless it is grounded in a retrieval-augmented generation (RAG) system.
 
54
 
55
  ## How to Get Started with the Model
56
 
57
+ You can run this model using the `transformers` library with 4-bit quantization:
58
 
59
+ ```python
60
+ from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
61
+ import torch
62
+
63
+ model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
64
+ adapter_id = "Koussay/Klara-Llama3-8B-v1-LoRA"
65
+
66
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
67
+ model = AutoModelForCausalLM.from_pretrained(
68
+     model_id,
69
+     quantization_config=BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),  # needs bitsandbytes
70
+     device_map="auto",
71
+     torch_dtype=torch.bfloat16
72
+ )
73
+
74
+ # Apply the fine-tuned Klara adapters
75
+ from peft import PeftModel
76
+ model = PeftModel.from_pretrained(model, adapter_id)
77
+
78
+ messages = [
79
+ {"role": "system", "content": "You are Klara, a professional medical assistant created by Koussay Chaanbi."},
80
+ {"role": "user", "content": "The system detected a sudden drop in SpO2. What are the immediate steps?"}
81
+ ]
82
+
83
+ inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_dict=True, return_tensors="pt").to(model.device)
84
+ outputs = model.generate(**inputs, max_new_tokens=256)
85
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
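
The example in the README hard-codes the user message. Since the card says the model interprets Heart Rate and SpO2 readings and crises flagged by the CNN-BiLSTM model, the prompt would presumably be assembled from that data. The following is only a sketch of one way to do so: `SensorReading`, its field names, `build_messages`, and the wording of the context string are all hypothetical and do not come from the Klara project.

```python
from dataclasses import dataclass
from typing import Optional

# System prompt copied from the README example.
SYSTEM_PROMPT = "You are Klara, a professional medical assistant created by Koussay Chaanbi."


@dataclass
class SensorReading:
    """Hypothetical container for one reading; the field names are assumptions."""
    heart_rate_bpm: float
    spo2_percent: float
    crisis_label: Optional[str] = None  # e.g. a label from the CNN-BiLSTM model, if any


def build_messages(reading: SensorReading, question: str) -> list[dict[str, str]]:
    """Return a chat-format message list that embeds the sensor context."""
    context = (
        f"Heart rate: {reading.heart_rate_bpm:.0f} bpm. "
        f"SpO2: {reading.spo2_percent:.0f}%."
    )
    if reading.crisis_label:
        context += f" Detected event: {reading.crisis_label}."
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"{context}\n{question}"},
    ]


messages = build_messages(
    SensorReading(heart_rate_bpm=112, spo2_percent=89, crisis_label="sudden SpO2 drop"),
    "What are the immediate steps?",
)
```

The resulting `messages` list has the same shape as the one in the README snippet, so it could be passed to `tokenizer.apply_chat_template` unchanged.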