# Qwen2.5-Omni Inference Endpoint
This repository contains code for deploying the Qwen2.5-Omni-0.5B model to Hugging Face Inference Endpoints for use with the Indoor Scenes dataset.
## Overview
This implementation, built on LLaVA-OneVision with Qwen2.5-Omni, provides multimodal capabilities for:
- Image captioning
- Audio recognition
- Video understanding
- Test-time scaling
## Deployment Instructions
Set up your Hugging Face account:
- Ensure you have a Hugging Face account with a valid API token
- Use `huggingface-cli login` to authenticate
Create and push to a Hugging Face repository:
```shell
huggingface-cli repo create YOUR_USERNAME/my-qwen-omni-endpoint --type model
git init
git add .
git commit -m "Initial commit"
git remote add origin https://huggingface.co/YOUR_USERNAME/my-qwen-omni-endpoint
git push -u origin main
```

Deploy to Inference Endpoints:
- Go to your repository on Hugging Face
- Navigate to "Settings" > "Inference Endpoints"
- Create a new endpoint
- Select appropriate hardware (a GPU with at least 16 GB of memory is recommended)
- Deploy!
## Using the Endpoint
Text-only example:

```json
{
  "conversation": [
    {"role": "user", "content": "Tell me about yourself."}
  ]
}
```
Image example:

```json
{
  "conversation": [
    {
      "role": "user",
      "content": "What do you see in this image?",
      "images": ["https://example.com/image.jpg"]
    }
  ]
}
```
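The payloads above can be sent to the deployed endpoint as ordinary JSON POST requests. The sketch below uses only the Python standard library; the endpoint URL, token placeholder, and the `build_payload`/`query` helper names are assumptions for illustration, not part of the repository's code.

```python
import json
import os
import urllib.request

# Hypothetical values; substitute the URL shown on your endpoint's page
# and a real Hugging Face API token.
ENDPOINT_URL = os.environ.get("HF_ENDPOINT_URL", "https://YOUR-ENDPOINT.endpoints.huggingface.cloud")
HF_TOKEN = os.environ.get("HF_TOKEN", "hf_xxx")

def build_payload(text, images=None):
    """Build a request body in the conversation format shown above."""
    message = {"role": "user", "content": text}
    if images:
        message["images"] = list(images)
    return {"conversation": [message]}

def query(payload):
    """POST the payload to the endpoint and return the decoded JSON response."""
    request = urllib.request.Request(
        ENDPOINT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Example (requires a live endpoint):
# query(build_payload("What do you see in this image?", ["https://example.com/image.jpg"]))
```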
## For MIT Indoor Scenes Dataset
This endpoint is specifically designed to work with the MIT Indoor Scenes dataset (CVPR 2009). The model can generate captions for indoor scene images so that captioning performance can be evaluated.
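For batch captioning, one request body per image can be generated in the conversation format shown earlier. This is a minimal sketch: the prompt text, the image URLs, and the `caption_requests` helper are hypothetical, and it assumes the dataset images are reachable by URL.

```python
# Hypothetical batch-captioning sketch for indoor scene images.
CAPTION_PROMPT = "Describe this indoor scene in one sentence."

def caption_requests(image_urls, prompt=CAPTION_PROMPT):
    """Yield one request body per image, in the endpoint's conversation format."""
    for url in image_urls:
        yield {
            "conversation": [
                {"role": "user", "content": prompt, "images": [url]}
            ]
        }

# Example with two placeholder URLs standing in for dataset images:
urls = [
    "https://example.com/indoor/kitchen_001.jpg",
    "https://example.com/indoor/bedroom_042.jpg",
]
payloads = list(caption_requests(urls))
```

Each payload can then be POSTed to the endpoint and the returned captions collected for evaluation.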
## Testing Test-Time Scaling
The implementation supports test-time scaling through the standard inference interface, allowing for:
- Budget scaling/forcing
- Beam search integration
- Various performance metrics
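One plausible way to exercise these knobs is to attach generation parameters to the request. The `parameters` field and its keys below (`max_new_tokens`, `num_beams`) are assumptions modeled on common Hugging Face generation arguments; check the repository's handler code for the names it actually accepts.

```python
# Hypothetical request builder showing how test-time scaling knobs might be
# passed alongside the conversation. Field names are assumptions, not a
# documented contract of this endpoint.
def scaled_request(text, token_budget, num_beams=1):
    """Attach a generation budget and beam width to a conversation request."""
    return {
        "conversation": [{"role": "user", "content": text}],
        "parameters": {
            "max_new_tokens": token_budget,  # budget scaling/forcing: cap output length
            "num_beams": num_beams,          # beam search integration
        },
    }

req = scaled_request("Summarize this scene.", token_budget=128, num_beams=4)
```

Sweeping `token_budget` (and comparing the resulting captions) is one way to measure how quality scales with the inference-time compute budget.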