---
language:
- en
---
# Fine-tuned Vision-Language Model for Radiology Report Generation

This repository contains a fine-tuned vision-language model for generating radiology reports. It is built with the [Unsloth](https://github.com/unslothai/unsloth) library and uses Llama-3.2-11B-Vision-Instruct as its base model.

## Model Description

This model is fine-tuned on a sampled version of the ROCO radiography dataset ([Radiology_mini](https://huggingface.co/datasets/unsloth/Radiology_mini)). It is designed to assist medical professionals by generating accurate descriptions of medical images such as X-rays, CT scans, and ultrasounds.

Fine-tuning uses Low-Rank Adaptation (LoRA) to train the model efficiently, updating only the language layers while keeping the vision layers frozen. This approach greatly reduces the compute and memory needed for fine-tuning while still achieving significant performance improvements.
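The low-rank idea behind LoRA can be illustrated in a few lines of NumPy. This is a toy sketch, not the actual Unsloth implementation; the matrix sizes and rank below are arbitrary:

```python
import numpy as np

# Toy numeric illustration of Low-Rank Adaptation (LoRA): rather than updating
# the full weight matrix W, train two small factors A and B and add B @ A as a
# low-rank correction. Only A and B are trainable; W stays frozen.
rng = np.random.default_rng(0)
d_out, d_in, r = 512, 512, 16

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # trainable down-projection factor
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

W_adapted = W + B @ A                   # effective weight seen by the forward pass

full_params = W.size                    # 262,144 parameters
lora_params = A.size + B.size           # 16,384 parameters (~6% of full)
print(f"LoRA trains {lora_params} of {full_params} parameters")
```

Because `B` starts at zero, the adapted weight initially equals the pretrained weight, and training moves it only through the small factors — this is why LoRA needs so much less memory than full fine-tuning.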
## Usage

To use this model, you'll need the Unsloth library:

```bash
pip install unsloth
```

Then load the model and tokenizer:

```python
from unsloth import FastVisionModel

model, tokenizer = FastVisionModel.from_pretrained("awaliuddin/unsloth_finetune", load_in_4bit=True)
FastVisionModel.for_inference(model)  # switch the model into inference mode
```

Run inference on an image:

```python
from PIL import Image
from transformers import TextStreamer

image = Image.open("path/to/your/image.jpg")  # replace with your image path
instruction = "You are an expert radiographer. Describe accurately what you see in this image."
messages = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": instruction}]}]

# Build the prompt from the chat template, then tokenize text and image together
input_text = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
inputs = tokenizer(image, input_text, add_special_tokens=False, return_tensors="pt").to("cuda")

# Stream the generated report token by token
text_streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(**inputs, streamer=text_streamer, max_new_tokens=128, use_cache=True, temperature=1.5, min_p=0.1)
```
## Training Details

* **Base Model:** Llama-3.2-11B-Vision-Instruct
* **Dataset:** Radiology_mini (sampled from the ROCO radiography dataset)
* **Fine-tuning Method:** LoRA (language layers only)
* **Optimizer:** AdamW 8-bit
* **Learning Rate:** 2e-4
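The LoRA setup described above can be sketched with Unsloth's PEFT helper. This is a configuration sketch, not the exact training script; the rank and alpha values below are illustrative assumptions, not the hyperparameters used for this model:

```python
from unsloth import FastVisionModel

# Load the base model in 4-bit (requires a CUDA GPU)
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct", load_in_4bit=True
)

# Attach LoRA adapters to the language layers only; the vision layers stay frozen.
# r and lora_alpha below are illustrative assumptions, not the values used here.
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers=False,   # keep the vision encoder frozen
    finetune_language_layers=True,  # adapt the language layers
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
)
```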
## Limitations

* This model is trained on a limited dataset and may not generalize well to all types of medical images.
* Generated reports should be reviewed by qualified medical professionals before being used for diagnostic purposes.

## Acknowledgements

* The Unsloth library for efficient fine-tuning of vision-language models.
* The Hugging Face team for providing the platform and tools for model sharing.
* The authors of the ROCO radiography dataset.

## License

Apache-2.0
# Uploaded finetuned model

- **Developed by:** Awaliuddin