A tiny vision language model
Send video and text for explanation or action
Restore and enhance faces in photos