Qwen3 Omni 30B ASR Fine-Tuned Model

Model Description

This model is a fine-tuned version of Qwen3-Omni-30B-A3B-Instruct designed for Automatic Speech Recognition (ASR) tasks.

The model was fine-tuned on multilingual speech data containing English and Hindi audio transcripts to improve transcription accuracy.

It can transcribe spoken audio into text and is suitable for:

  • Voice assistants
  • Call transcription
  • Conversational AI systems
  • Speech analytics pipelines

Base Model

Base model used:

Qwen/Qwen3-Omni-30B-A3B-Instruct

This model supports multimodal inputs including:

  • Text
  • Audio

Fine-Tuning Details

Property Value
Base Model Qwen3 Omni 30B
Task Automatic Speech Recognition
Languages English, Hindi
Training Method Fine-tuning
Format Instruction-based training
Dataset Format JSONL

Dataset

The model was trained on a dataset containing:

  • Audio recordings
  • Corresponding transcripts
  • Language labels
  • Duration metadata

Languages included:

  • English
  • Hindi

Dataset repository:

Shanmugapriyan/qwen_omni_ft_data


Usage

You can use this model for Automatic Speech Recognition.


Intended Use

This model is designed for:

  • Speech-to-text transcription
  • Conversational AI pipelines
  • Voice assistant systems
  • Customer call analytics

Limitations

  • Requires high GPU memory (30B model)
  • Accuracy depends on audio quality
  • Performance may vary on unseen languages

Hardware Requirements

Recommended hardware:

  • GPU: A100 / H100
  • VRAM: 40GB+

Author

Shanmugapriyan T

AI Engineer | Voice AI Systems | LLM Fine-tuning


License

Apache 2.0

Downloads last month
17
Safetensors
Model size
32B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Shanmugapriyan/Qwen3-Omni-30B-A3B-Instruct-merged-ft

Finetuned
(16)
this model

Space using Shanmugapriyan/Qwen3-Omni-30B-A3B-Instruct-merged-ft 1