# Nepali Automatic Speech Recognition (ASR)

## Overview

Fine-tuning and inference for Nepali speech recognition using Wav2Vec2 and Whisper models.

## Model Details

| Property | Value |
|----------|-------|
| **Model ID** | `Saugat212/ASR_MODEL` |
| **Base Model** | facebook/wav2vec2-base |
| **Architecture** | wav2vec2 |
| **Parameters** | 0.3B |
| **Language** | Nepali |

## Purpose

- Convert Nepali speech audio to text
- Fine-tune Wav2Vec2 on Nepali datasets
- Evaluate ASR performance using the word error rate (WER) metric

## Contents

| File | Description |
|------|-------------|
| `whisper_transcription.ipynb` | Whisper model for Nepali speech-to-text transcription |
| `wav2vec2_finetuning.ipynb` | Wav2Vec2 fine-tuning recipe for Nepali ASR |
| `wav2vec2_finetune.py` | Python script for Wav2Vec2 fine-tuning |
| `finetune.py` | ASR fine-tuning script |
| `Dataset/` | Training datasets (CSV files with audio paths and transcriptions) |
| `Phase 1/Finetuning/` | Phase 1 training data, checkpoints, and inference notebooks |

## Usage

### Load Model

```python
from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

model_name = "Saugat212/ASR_MODEL"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = Wav2Vec2ForCTC.from_pretrained(model_name)
```

### Inference

```python
import torch
import torchaudio

# Load audio; torchaudio returns a (channels, samples) tensor
waveform, sample_rate = torchaudio.load("audio.wav")

# Downmix to mono and resample to the rate the processor expects
# (16 kHz for Wav2Vec2 checkpoints)
waveform = waveform.mean(dim=0)
target_rate = processor.feature_extractor.sampling_rate
if sample_rate != target_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, target_rate)

# Process
input_values = processor(
    waveform.numpy(), sampling_rate=target_rate, return_tensors="pt"
).input_values

# Infer
with torch.no_grad():
    logits = model(input_values).logits
predicted_ids = torch.argmax(logits, dim=-1)

# Decode
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```

## Models Available

- **Wav2Vec2**: `Saugat212/ASR_MODEL`, the fine-tuned Nepali ASR model
- **Whisper**: OpenAI Whisper, used as an alternative for transcription

## Dataset

- Located in `Dataset/`
- Contains `final_transcriptions.csv` with audio paths and transcriptions
- Cleaned data in `cleaned_data.csv`

## Requirements

- torch
- transformers
- torchaudio
- datasets
- evaluate
- jiwer

## Fine-tuning

See `wav2vec2_finetuning.ipynb` for the complete fine-tuning pipeline.
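
## Evaluation

The Purpose section names WER as the evaluation metric. As a rough illustration of what that number measures, here is a minimal pure-Python sketch: the word-level edit distance between a reference and a hypothesis, divided by the number of reference words. The function name `word_error_rate` is hypothetical and is not part of this repository; the notebooks presumably rely on `evaluate` or `jiwer` (both listed in the requirements), which may also normalize text in ways this sketch does not.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: (substitutions + deletions + insertions) / reference word count."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)


# One substitution and one deletion against a four-word reference
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.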