Urdu-Punjabi Language Learning Model V2

This is a fine-tuned XLM-RoBERTa model for:

  • Answer Scoring: Evaluating user responses in Urdu and Pakistani Punjabi
  • Grammar Checking: Validating sentence structure
  • Translation Validation: Checking translation accuracy

Model Details

  • Base Model: xlm-roberta-base
  • Languages: Urdu (Nastaliq), Pakistani Punjabi (Shahmukhi), English
  • Task: Regression (score 0.0 to 1.0)
  • Fine-tuned on: Custom vocabulary dataset (1000 words)

Dataset

The model was trained on a custom dataset containing:

  • 20 Chapters (10 Urdu + 10 Punjabi)
  • 50 words per chapter = 1000 vocabulary items
  • 100 Quiz MCQ Questions (5 per chapter)
  • 100 User Input Questions (5 per chapter)
  • 1000+ Grammar Examples (sentence-translation pairs)
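As a quick sanity check, the totals in the list above follow directly from the per-chapter counts. The sketch below only restates that arithmetic; the variable names are illustrative and not part of the dataset itself.

```python
# Per-chapter counts as stated in the dataset description.
chapters_per_language = {"Urdu": 10, "Punjabi": 10}
words_per_chapter = 50
mcq_per_chapter = 5
user_input_per_chapter = 5

total_chapters = sum(chapters_per_language.values())
vocabulary_items = total_chapters * words_per_chapter
mcq_questions = total_chapters * mcq_per_chapter
user_input_questions = total_chapters * user_input_per_chapter

print(total_chapters)        # 20
print(vocabulary_items)      # 1000
print(mcq_questions)         # 100
print(user_input_questions)  # 100
```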

Chapter Topics:

  1. Greetings & Polite Expressions (سلام و آداب)
  2. Family & Relationships (خاندان اور رشتے)
  3. Food & Dining (کھانا اور خوراک)
  4. Numbers & Counting (گنتی اور اعداد)
  5. Places & Locations (جگہیں اور مقامات)
  6. Shopping & Money (خریداری اور پیسے)
  7. Emotions & Feelings (جذبات اور احساسات)
  8. Weather & Nature (موسم اور فطرت)
  9. Body Parts & Health (جسم کے اعضاء اور صحت)
  10. Education & Learning (تعلیم اور سیکھنا)

Usage

Python

from transformers import XLMRobertaTokenizer, XLMRobertaForSequenceClassification
import torch

# Load model
model_name = "RAFAY-484/Urdu-Punjabi-V2"
tokenizer = XLMRobertaTokenizer.from_pretrained(model_name)
model = XLMRobertaForSequenceClassification.from_pretrained(model_name)

# Score an answer
expected = "خوش"  # Happy in Urdu/Punjabi
user_input = "خوش"

inputs = tokenizer(expected, user_input, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
    score = torch.sigmoid(outputs.logits).item()

print(f"Score: {score:.3f}")  # An exact match is expected to score about 0.95 or higher

API Usage (Flutter/Mobile App)

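import 'dart:convert';

import 'package:http/http.dart' as http;

/// Sends the expected answer and the user's answer to the hosted model
/// and returns the score from the first result.
Future<double> scoreAnswer(String expected, String userInput) async {
  final response = await http.post(
    Uri.parse('https://api-inference.huggingface.co/models/RAFAY-484/Urdu-Punjabi-V2'),
    headers: {
      'Authorization': 'Bearer YOUR_HF_TOKEN',
      'Content-Type': 'application/json',
    },
    // Note: the Python example encodes the two strings as a pair; here they
    // are joined into one string with a literal "[SEP]".
    body: jsonEncode({
      'inputs': '$expected [SEP] $userInput',
    }),
  );

  if (response.statusCode != 200) {
    throw Exception('Scoring request failed: ${response.statusCode}');
  }

  final result = jsonDecode(response.body);
  // JSON numbers decode as num, so convert explicitly to double.
  return (result[0]['score'] as num).toDouble();
}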
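For callers working in Python rather than Dart, the request body and response handling can be sketched with the standard library alone. This is a minimal sketch under assumptions: the helper names `build_payload` and `extract_score` are hypothetical, and the response shapes handled here (a flat list, as the Dart snippet assumes, or a nested list) are guesses about what the endpoint returns, not confirmed behavior.

```python
import json

# Endpoint URL as given in the Dart example above.
API_URL = "https://api-inference.huggingface.co/models/RAFAY-484/Urdu-Punjabi-V2"


def build_payload(expected: str, user_input: str) -> bytes:
    """Encode the request body the same way the Dart snippet does."""
    body = {"inputs": f"{expected} [SEP] {user_input}"}
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def extract_score(result) -> float:
    """Return the first 'score' from a decoded JSON response.

    Assumed shapes: [{"score": ...}] or [[{"score": ...}]].
    A dict is treated as an error payload (also an assumption).
    """
    if isinstance(result, dict):
        raise RuntimeError(result.get("error", "unexpected response"))
    first = result[0]
    if isinstance(first, list):
        first = first[0]
    return float(first["score"])
```

The payload bytes can be sent with any HTTP client, using the same `Authorization` and `Content-Type` headers as the Dart version.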

Scoring Guide

Score Range | Meaning       | Example
0.9 - 1.0   | Exact match   | Expected: خوش, User: خوش
0.7 - 0.9   | Close match   | Expected: السلام علیکم, User: السلام
0.4 - 0.7   | Partial match | Expected: شکریہ, User: شکر
0.2 - 0.4   | Related word  | Expected: کتاب, User: book
0.0 - 0.2   | Incorrect     | Expected: پیار, User: نفرت
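An app will usually want to turn the numeric score into one of the labels above. The helper below is a minimal sketch of that lookup; the function name `describe_score` is hypothetical. Because adjacent ranges in the guide share their boundary values, the choice to assign a boundary score to the higher band is an assumption.

```python
# Lower bounds and labels copied from the Scoring Guide, highest band first.
# Assumption: a score exactly on a boundary belongs to the higher band.
BANDS = [
    (0.9, "Exact match"),
    (0.7, "Close match"),
    (0.4, "Partial match"),
    (0.2, "Related word"),
    (0.0, "Incorrect"),
]


def describe_score(score: float) -> str:
    """Map a score in [0.0, 1.0] to its Scoring Guide label."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0.0 and 1.0, got {score}")
    for lower_bound, label in BANDS:
        if score >= lower_bound:
            return label
    raise AssertionError("unreachable: every valid score is >= 0.0")


print(describe_score(0.95))  # Exact match
print(describe_score(0.5))   # Partial match
```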

Vocabulary Samples

Urdu

Word      | Translation | Pronunciation
شکریہ     | Thank you   | shukriya
خوش آمدید | Welcome     | khush aamdeed
براہ کرم  | Please      | barah-e-karam

Pakistani Punjabi (Shahmukhi)

Word        | Translation | Pronunciation
جی آیاں نوں | Welcome     | ji aayan nu
ودھیا       | Very good   | wadiya
سوہنا       | Beautiful   | sohna

Important Note on Punjabi

⚠️ This model covers Pakistani Punjabi written in the Shahmukhi script (Perso-Arabic). It does NOT include Punjabi vocabulary written in the Gurmukhi script.

All Punjabi vocabulary is authentic Pakistani Punjabi as spoken in Punjab, Pakistan.

Training Configuration

  • Epochs: 2
  • Batch Size: 32
  • Learning Rate: 2e-05
  • Max Length: 128
  • Training Samples: 3386
  • Validation Samples: 598
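The configuration above implies a fairly short training run. The sketch below works out the number of optimizer steps and the validation share, assuming a single device, no gradient accumulation, and that the final partial batch is kept; none of these details are stated in the configuration.

```python
import math

# Values copied from the Training Configuration list.
epochs = 2
batch_size = 32
train_samples = 3386
val_samples = 598

# Assumption: the last, smaller batch of each epoch is still used.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs
val_fraction = val_samples / (train_samples + val_samples)

print(steps_per_epoch)        # 106
print(total_steps)            # 212
print(f"{val_fraction:.1%}")  # 15.0%
```

Under these assumptions the split is roughly 85% training to 15% validation.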

Use Cases

  1. Language Learning Apps: Score user responses in vocabulary exercises
  2. Quiz Systems: Validate answers in MCQ and fill-in-the-blank questions
  3. Grammar Checking: Evaluate sentence correctness
  4. Translation Apps: Verify translation accuracy

Citation

@misc{urdu-punjabi-v2-2024,
  author = {RAFAY-484},
  title = {Urdu-Punjabi Language Learning Model V2},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/RAFAY-484/Urdu-Punjabi-V2}}
}

License

MIT License - Free for educational and commercial use.
