Hugging Face: Banglish to Bangla Translation

This repository demonstrates how to use a Hugging Face model to translate Banglish (Romanized Bangla) text into Bangla script using the MBart50 tokenizer and model. The model, Mdkaif2782/banglish-to-bangla, is a pre-trained MBart model fine-tuned for this task.

Setup in Google Colab

Follow these steps to use the model in Google Colab:

1. Install Dependencies

Make sure the transformers and torch libraries are installed. Run the following command in your Colab notebook:

!pip install transformers torch
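
After the install finishes, a quick check like the one below can confirm that both packages are visible to the notebook. This is a minimal sketch: the helper name installed_versions is made up for illustration, and it relies only on the standard library's importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "torch")):
    """Return a mapping of package name to installed version (None if missing)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment
            found[name] = None
    return found


print(installed_versions())
```

A value of None for either package means the pip command above still needs to be run in this runtime.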

2. Load and Use the Model

Copy the code below into a cell in your Colab notebook to start translating Banglish to Bangla:

from transformers import MBartForConditionalGeneration, MBart50TokenizerFast
import torch

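# Load the fine-tuned model and tokenizer directly from the Hugging Face Hub
model_name = "Mdkaif2782/banglish-to-bangla"
tokenizer = MBart50TokenizerFast.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)

# Move the model to the GPU once, if one is available, and switch to inference mode
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
model.eval()

def translate_banglish_to_bangla(model, tokenizer, banglish_input):
    # Tokenize the input, truncating anything longer than 128 tokens
    inputs = tokenizer(banglish_input, return_tensors="pt", padding=True, truncation=True, max_length=128)
    # Place the input tensors on the same device as the model
    inputs = {key: value.to(model.device) for key, value in inputs.items()}

    # Start decoding from the Bangla language code (bn_IN); no gradients are needed for inference
    with torch.no_grad():
        translated_tokens = model.generate(**inputs, decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"])
    translated_text = tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)[0]

    return translated_text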

# Take custom input
print("Enter your Banglish text (type 'exit' to quit):")
while True:
    banglish_text = input("Banglish: ")
    # Ignore surrounding whitespace and letter case when checking for the exit command
    if banglish_text.strip().lower() == "exit":
        break

    # Translate Banglish to Bangla
    translated_text = translate_banglish_to_bangla(model, tokenizer, banglish_text)
    print(f"Translated Bangla: {translated_text}\n")

3. Run the Notebook

  1. Paste the above code into a cell.
  2. Run the cell.
  3. Enter your Banglish text in the input prompt to get the translated Bangla text. Type exit to quit.

Example Usage

Input:

Banglish: amar valo lagche onek

Output:

Translated Bangla: আমার ভালো লাগছে অনেক
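
To translate several sentences without the interactive prompt, the same calls can be wrapped in a batch helper. The sketch below is an illustration, not part of the original code: the name translate_batch is hypothetical, it assumes model and tokenizer are already loaded as in step 2, and it reuses the same tokenizer settings and decoder start token as the function above.

```python
from typing import List


def translate_batch(model, tokenizer, sentences: List[str], max_length: int = 128) -> List[str]:
    """Translate several Banglish sentences with a single generate() call."""
    if not sentences:
        return []

    # Tokenize all sentences together; padding aligns them to the longest one.
    # The returned batch is moved to the same device as the model.
    inputs = tokenizer(
        sentences,
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    ).to(model.device)

    # Same decoder start token (bn_IN) as the single-sentence function
    translated_tokens = model.generate(
        **inputs,
        decoder_start_token_id=tokenizer.lang_code_to_id["bn_IN"],
    )

    # batch_decode returns one string per input sentence, in input order
    return tokenizer.batch_decode(translated_tokens, skip_special_tokens=True)


# Usage, assuming `model` and `tokenizer` exist (the second sentence is a made-up sample):
# for bangla in translate_batch(model, tokenizer, ["amar valo lagche onek", "tumi kemon acho"]):
#     print(bangla)
```

Batching usually makes better use of a GPU than translating one sentence at a time, at the cost of extra padding when sentence lengths differ widely.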

Notes

  • For faster processing, use a GPU runtime in Google Colab: go to Runtime > Change runtime type and select GPU. The code falls back to the CPU when no GPU is available.
  • The model Mdkaif2782/banglish-to-bangla can be fine-tuned further if required.
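
To confirm that the GPU runtime from the first note is actually active, a short check such as the following can be run in its own cell. It is a sketch only: the function name detect_device is made up for illustration, and the import is guarded so the cell also runs before torch has been installed.

```python
import importlib.util


def detect_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    # Guard the import so the check still runs when torch is not installed yet
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Running on: {detect_device()}")
```

If this prints cpu after a GPU runtime was selected, reconnecting the runtime and rerunning the install cell is a reasonable first step.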

License

This project uses the Hugging Face transformers library. Refer to the Hugging Face documentation for more details.
