---
language:
- en
---
# Fine-tuned Vision-Language Model for Radiology Report Generation

This repository contains a fine-tuned vision-language model for generating radiology reports. It is built with the [Unsloth](https://github.com/unslothai/unsloth) library and uses Llama-3.2-11B-Vision-Instruct as its base model.

## Model Description

This model is fine-tuned on a sampled version of the ROCO radiography dataset ([Radiology_mini](https://huggingface.co/datasets/unsloth/Radiology_mini)). It is designed to assist medical professionals by generating accurate descriptions of medical images such as X-rays, CT scans, and ultrasounds.

Fine-tuning uses Low-Rank Adaptation (LoRA) to train the model efficiently, updating only the language layers while keeping the vision layers frozen. This approach greatly reduces the compute and memory needed for fine-tuning while still achieving significant performance improvements.
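The low-rank idea behind LoRA can be illustrated in a few lines of NumPy. This is a toy sketch, not the actual Unsloth implementation; the matrix sizes and rank below are arbitrary:

```python
import numpy as np

# Toy numeric illustration of Low-Rank Adaptation (LoRA): rather than updating
# the full weight matrix W, train two small factors A and B and add B @ A as a
# low-rank correction. Only A and B are trainable; W stays frozen.
rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 16

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # trainable down-projection factor
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

W_adapted = W + B @ A                   # effective weight seen by the forward pass

full_params = W.size                    # 262,144 parameters
lora_params = A.size + B.size           # 16,384 parameters (~6% of full)
print(f"LoRA trains {lora_params} of {full_params} parameters")
```

Because `B` starts at zero, the adapted weight initially equals the pretrained weight, and training moves it only through the small factors — this is why LoRA needs so much less memory than full fine-tuning.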
## Usage

To use this model, you'll need the Unsloth library:

```bash
pip install unsloth
```

Then load the model and tokenizer:

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained("awaliuddin/unsloth_finetune", load_in_4bit=True)
FastVisionModel.for_inference(model)  # switch the model into inference mode
```

Run inference on an image:

```python
from PIL import Image
from transformers import TextStreamer

image = Image.open("path/to/your/image.jpg")  # replace with your image path
instruction = "You are an expert radiographer. Describe accurately what you see in this image."
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": instruction}]}]

# Build the prompt from the chat template, then tokenize text and image together
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Stream the generated report token by token
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)
```
## Training Details

* **Base Model:** Llama-3.2-11B-Vision-Instruct
* **Dataset:** Radiology_mini (sampled from the ROCO radiography dataset)
* **Fine-tuning Method:** LoRA (language layers only)
* **Optimizer:** AdamW 8-bit
* **Learning Rate:** 2e-4
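The LoRA setup described above can be sketched with Unsloth's PEFT helper. This is a configuration sketch, not the exact training script; the rank and alpha values below are illustrative assumptions, not the hyperparameters used for this model:

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit (requires a CUDA GPU)
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct", load_in_4bit=True
)

# Attach LoRA adapters to the language layers only; the vision layers stay frozen.
# r and lora_alpha below are illustrative assumptions, not the values used here.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=False,   # keep the vision encoder frozen
    finetune_language_layers=True,  # adapt the language layers
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)
```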
## Limitations

* This model is trained on a limited dataset and may not generalize well to all types of medical images.
* Generated reports should be reviewed by qualified medical professionals before being used for diagnostic purposes.

## Acknowledgements

* The Unsloth library for efficient fine-tuning of vision-language models.
* The Hugging Face team for providing the platform and tools for model sharing.
* The authors of the ROCO radiography dataset.

## License

Apache-2.0
# Uploaded finetuned model

- **Developed by:** Awaliuddin