Vision Language Models as Explainable Classifiers for Skin Lesions
This is the project I submitted to the Terra North Jersey STEM Fair in 2026. I also gave a short talk about it at the Bridgewater-Raritan High School AI/ML club. It uses the Tinker platform to fine-tune Qwen3-VL-30B-A3B-Instruct with reinforcement learning so that the model classifies skin lesions as benign or malignant and explains its rationale. The model is also specifically trained to respond to human pushback by either defending or revising its answer, to follow specific formatting instructions that a user might give it, and to use a zoom tool to take a closer look at parts of an image. The technique is flexible: it can be adapted to other tasks by swapping datasets and modifying the rubric used for grading explanations.
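To make the idea of a rubric-graded reward more concrete, here is a minimal sketch of how a classification signal and a rubric score for the explanation could be combined into one scalar reward. It is an illustration only, not the reward used in this project: the rubric items, their weights, the `label_weight` parameter, and the function names are all hypothetical, and the code does not call the Tinker API. The actual training code is in the linked repository.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence

LABELS = ("benign", "malignant")


@dataclass(frozen=True)
class RubricItem:
    name: str      # hypothetical criterion name
    weight: float  # relative importance of the criterion


# Hypothetical rubric; adapting to another task would mean editing this list.
RUBRIC = (
    RubricItem("cites_visible_features", 0.5),
    RubricItem("consistent_with_label", 0.3),
    RubricItem("follows_requested_format", 0.2),
)


def rubric_score(item_scores: Mapping[str, float],
                 rubric: Sequence[RubricItem] = RUBRIC) -> float:
    """Weighted average of per-item scores, each clamped to [0, 1].

    Items missing from item_scores count as 0.
    """
    total_weight = sum(item.weight for item in rubric)
    weighted = sum(
        item.weight * min(max(item_scores.get(item.name, 0.0), 0.0), 1.0)
        for item in rubric
    )
    return weighted / total_weight


def reward(predicted: str, truth: str,
           item_scores: Mapping[str, float],
           label_weight: float = 0.6) -> float:
    """Combine label correctness with the explanation's rubric score.

    An answer that is not one of the allowed labels earns no reward,
    so the policy cannot collect rubric credit without committing to a class.
    """
    if predicted not in LABELS:
        return 0.0
    correct = 1.0 if predicted == truth else 0.0
    return label_weight * correct + (1.0 - label_weight) * rubric_score(item_scores)
```

In this sketch the per-item scores would come from whatever grader evaluates the explanation, for example a separate model or a human reviewer. Swapping in a different dataset changes `LABELS` and the ground truth, while swapping the rubric changes only `RUBRIC`.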
Poster (presented to judges): https://github.com/sr5434/VLMSkinLesionClassifier/blob/main/Poster.pdf
Slideshow (not presented to judges; more detailed): https://github.com/sr5434/VLMSkinLesionClassifier/blob/main/Slides.pdf
The training pipeline can be found here: https://github.com/sr5434/VLMSkinLesionClassifier/tree/main
Acknowledgements
I would like to thank Thinking Machines for funding this research. Without their support, this work would not have been possible.
Citation
If you found this work useful, please cite it.
@software{Rangwalla_2026,
title={Vision Language Models as Explainable Classifiers for Skin Lesions},
url={http://github.com/sr5434},
author={Rangwalla, Samir},
year={2026},
month={Mar}
}
Model tree for sr5434/skin-cancer-classifier
Base model
Qwen/Qwen3-VL-30B-A3B-Instruct