Vision Language Models as Explainable Classifiers for Skin Lesions

This is the project I submitted to the Terra North Jersey STEM Fair in 2026. I also gave a short talk about it at the Bridgewater-Raritan High School AI/ML club. The project uses the Tinker platform to finetune Qwen3-VL-30B-A3B with reinforcement learning so that it classifies skin lesions as benign or malignant and explains its rationale. The model is also trained to respond to human pushback by either defending or revising its answer, to follow formatting instructions given by the user, and to use a zoom tool to take a closer look at parts of an image. The technique is flexible: it can be adapted to other tasks by swapping the dataset and modifying the rubric used to grade explanations.
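As a rough illustration of how a reward for this kind of training might combine label correctness with a rubric-based grade for the explanation, here is a minimal Python sketch. The rubric items, weights, and function names are hypothetical and are not taken from the project's pipeline; the per-item scores are assumed to come from some external grader. See the repository for the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    name: str
    weight: float


# Hypothetical rubric: item names and weights are illustrative only.
RUBRIC = [
    RubricItem("cites_visible_features", 0.5),
    RubricItem("consistent_with_label", 0.3),
    RubricItem("follows_requested_format", 0.2),
]


def explanation_score(item_scores: dict[str, float]) -> float:
    """Weighted average of per-item scores, each clamped to [0, 1].

    Items missing from item_scores count as 0.
    """
    total_weight = sum(item.weight for item in RUBRIC)
    weighted = sum(
        item.weight * min(max(item_scores.get(item.name, 0.0), 0.0), 1.0)
        for item in RUBRIC
    )
    return weighted / total_weight


def reward(predicted: str, truth: str, item_scores: dict[str, float],
           label_weight: float = 0.6) -> float:
    """Blend classification correctness with the explanation grade.

    label_weight is an assumed hyperparameter, not a value from the project.
    """
    correct = 1.0 if predicted.strip().lower() == truth.strip().lower() else 0.0
    return label_weight * correct + (1.0 - label_weight) * explanation_score(item_scores)
```

Under this sketch, adapting to another task would mean changing the label set and the entries in `RUBRIC`, while the blending logic stays the same.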

Poster (presented to judges): https://github.com/sr5434/VLMSkinLesionClassifier/blob/main/Poster.pdf

Slideshow (not presented to judges; more detailed): https://github.com/sr5434/VLMSkinLesionClassifier/blob/main/Slides.pdf

The training pipeline can be found here: https://github.com/sr5434/VLMSkinLesionClassifier/tree/main

Acknowledgements

I would like to thank Thinking Machines for funding this research. Without their support, this work would not have been possible.

Citation

If you found this work useful, please cite it.

@software{Rangwalla_2026,
    title={Vision Language Models as Explainable Classifiers for Skin Lesions},
    url={http://github.com/sr5434},
    author={Rangwalla, Samir},
    year={2026},
    month={Mar}
}

Model: sr5434/skin-cancer-classifier

Base model: Qwen/Qwen3-VL-30B-A3B-Instruct