Improve VST-7B-SFT model card with metadata, paper link, and usage clarity
#1
by nielsr HF Staff - opened
This PR enhances the model card by adding key metadata and improving its clarity and discoverability:
- `pipeline_tag: image-text-to-text`: This tag accurately reflects the model's functionality of processing visual (image/video) and text inputs to generate text. It will help users find this model when searching for multimodal models.
- `library_name: transformers`: The inclusion of `transformers` as the `library_name` ensures that users are provided with an automated, functional code snippet on the model page, facilitating easier adoption. Evidence for compatibility is found in `config.json`, `tokenizer_config.json`, and the provided code snippet.
- Hugging Face Paper Link: Added a direct link to the Hugging Face paper page, complementing the existing arXiv link and improving the discoverability of the research on the platform.
- Improved Title: The model card title has been updated to `# VST-7B-SFT: Visual Spatial Tuning` for better clarity.
- "Sample Usage" Section: The "Quickstart" section has been renamed to "Sample Usage" and includes a clearer introduction for installing dependencies. The provided code snippet has been retained as it correctly targets `VST-7B-SFT`.
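For reference, the two metadata fields named in this PR would appear in the YAML front matter at the top of the model card's `README.md` roughly as sketched below. Only the two keys and values stated in this description are shown; any other fields the card may carry (license, base model, and so on) are not listed here and are deliberately omitted rather than guessed.

```yaml
---
# Metadata added by this PR (other existing fields, if any, are omitted from this sketch)
pipeline_tag: image-text-to-text   # task filter used when searching the Hub
library_name: transformers         # library used for the model page's auto-generated code snippet
---
```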
These updates will make the model more accessible and easier to use for the community.
rayruiyang changed pull request status to merged