Qwen3 Omni 30B ASR Fine-Tuned Model
Model Description
This model is a fine-tuned version of Qwen3-Omni-30B-A3B-Instruct designed for Automatic Speech Recognition (ASR) tasks.
The model was fine-tuned on multilingual speech data containing English and Hindi audio transcripts to improve transcription accuracy.
It can transcribe spoken audio into text and is suitable for:
- Voice assistants
- Call transcription
- Conversational AI systems
- Speech analytics pipelines
Base Model
Base model used:
Qwen/Qwen3-Omni-30B-A3B-Instruct
This model supports multimodal inputs including:
- Text
- Audio
Fine-Tuning Details
| Property | Value |
|---|---|
| Base Model | Qwen3 Omni 30B |
| Task | Automatic Speech Recognition |
| Languages | English, Hindi |
| Training Method | Fine-tuning |
| Format | Instruction-based training |
| Dataset Format | JSONL |
Dataset
The model was trained on a dataset containing:
- Audio recordings
- Corresponding transcripts
- Language labels
- Duration metadata
Languages included:
- English
- Hindi
Dataset repository:
Shanmugapriyan/qwen_omni_ft_data
Usage
You can use this model for Automatic Speech Recognition.
Intended Use
This model is designed for:
- Speech-to-text transcription
- Conversational AI pipelines
- Voice assistant systems
- Customer call analytics
Limitations
- Requires high GPU memory (30B model)
- Accuracy depends on audio quality
- Performance may vary on unseen languages
Hardware Requirements
Recommended hardware:
- GPU: A100 / H100
- VRAM: 40GB+
Author
Shanmugapriyan T
AI Engineer | Voice AI Systems | LLM Fine-tuning
License
Apache 2.0
- Downloads last month
- 17
Model tree for Shanmugapriyan/Qwen3-Omni-30B-A3B-Instruct-merged-ft
Base model
Qwen/Qwen3-Omni-30B-A3B-Instruct