Improve VST-7B-SFT model card with metadata, paper link, and usage clarity

#1
by nielsr HF Staff - opened

This PR enhances the model card by adding key metadata and improving its clarity and discoverability:

  • pipeline_tag: image-text-to-text: This tag accurately reflects the model's functionality of processing visual (image/video) and text inputs to generate text. It will help users find this model when searching for multimodal models.
  • library_name: transformers: Setting library_name to transformers makes the model page show an auto-generated, working code snippet, which eases adoption. Compatibility is supported by config.json, tokenizer_config.json, and the code snippet already in the model card.
  • Hugging Face Paper Link: Added a direct link to the Hugging Face paper page, complementing the existing arXiv link and improving the discoverability of the research on the platform.
  • Improved Title: The model card title has been updated to "# VST-7B-SFT: Visual Spatial Tuning" for clarity.
  • "Sample Usage" Section: The "Quickstart" section has been renamed to "Sample Usage" and includes a clearer introduction for installing dependencies. The provided code snippet has been retained as it correctly targets VST-7B-SFT.
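
For reference, the two metadata keys above live in the YAML front matter at the top of the model card's README.md. The snippet below is a minimal sketch of what that block might look like and how to read it back, not the exact file contents from this PR: the CARD_TEXT string and the read_front_matter helper are hypothetical, and only the two key/value pairs and the title are taken from the description above. It handles flat "key: value" pairs only; a real card can contain nested YAML, for which a proper YAML parser would be the safer choice.

```python
# Hypothetical README.md content. Only the two metadata keys and the title
# come from the PR description; everything else is illustrative.
CARD_TEXT = """---
pipeline_tag: image-text-to-text
library_name: transformers
---

# VST-7B-SFT: Visual Spatial Tuning
"""


def read_front_matter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs from a leading `---` delimited block.

    This is an assumption-laden helper for illustration: it does not
    support nested mappings, lists, or quoted values.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            # Closing delimiter reached: the block is complete.
            return metadata
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    # No closing delimiter found: treat the card as having no front matter.
    return {}


metadata = read_front_matter(CARD_TEXT)
print(metadata)
```

A check like this can be handy when reviewing a metadata-only PR, since a missing closing delimiter or a misspelled key would otherwise go unnoticed.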

These updates will make the model more accessible and easier to use for the community.

rayruiyang changed pull request status to merged
