Improve model card for llava-extract-qwen3-1.7B: Add metadata, links, and usage

#1 by nielsr HF Staff - opened

This PR improves the model card for markendo/llava-extract-qwen3-1.7B by adding Hub metadata and comprehensive usage instructions.

Key improvements include:

  • Adding pipeline_tag: image-text-to-text to accurately reflect its multimodal capabilities (image + text input, text output) and improve discoverability.
  • Specifying library_name: transformers based on config.json and usage patterns, enabling the automated "how to use" widget on the Hub.
  • Including descriptive tags (e.g., multimodal, vision-language-model, reasoning, small-language-model) and base_model information for better context and searchability.
  • Linking directly to the associated paper, project page, and GitHub repository for easy access to more information and code.
  • Adding a comprehensive "Usage" section with setup instructions and code snippets directly from the GitHub README, demonstrating how to use the Extract+Think framework for visual extraction and reasoning.

These additions will make the model more discoverable and user-friendly on the Hugging Face Hub.
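As a rough illustration of the kind of usage snippet the "Usage" section adds, here is a minimal sketch using the transformers `image-text-to-text` pipeline. The prompt structure and image URL are placeholders, not the canonical Extract+Think invocation; consult the project's GitHub README for the exact workflow:

```python
def build_messages(image_url: str, question: str) -> list:
    """Assemble a chat-style multimodal turn (one image plus one text query)."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "url": image_url},
                {"type": "text", "text": question},
            ],
        }
    ]


if __name__ == "__main__":
    # Model download happens only when run directly.
    from transformers import pipeline

    pipe = pipeline(
        "image-text-to-text",
        model="markendo/llava-extract-qwen3-1.7B",
    )
    messages = build_messages(
        "https://example.com/chart.png",  # placeholder image
        "What information can be extracted from this image?",
    )
    out = pipe(text=messages, max_new_tokens=256)
    print(out[0]["generated_text"])
```

The `build_messages` helper is a hypothetical convenience wrapper; the underlying chat-message format follows the transformers multimodal chat-template convention.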

markendo changed pull request status to merged
