# EN-VI Parallel Sense Tagger

A token classification model for predicting English-Vietnamese word senses, developed as a student NLP coursework project to explore cross-lingual sense tagging.
## Project Overview

**Objective:**

The goal of this project is to build a model that accurately identifies the sense of each word in English and Vietnamese sentences using token-level classification. This is useful for machine translation, semantic understanding, and other multilingual NLP applications.
**Dataset:**

- A custom dataset combining English and Vietnamese texts.
- Labels include:
  - `PAD`: padding tokens
  - `O`: tokens not associated with a sense
  - Sense labels: specific word senses in English and Vietnamese
- Number of English labels: 6,946
- Number of Vietnamese labels: 7,029
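The combined label space can be represented as a flat label-to-id mapping over the special labels and both sense inventories. A minimal sketch (the `EN_SENSE_*`/`VI_SENSE_*` names are illustrative placeholders, not the model's actual sense inventory):

```python
# Build a label-to-id mapping for token classification.
# "PAD" and "O" are the special labels described above; the sense label
# names below are placeholders standing in for the real inventories.
special_labels = ["PAD", "O"]
en_sense_labels = [f"EN_SENSE_{i}" for i in range(6946)]  # 6,946 English senses
vi_sense_labels = [f"VI_SENSE_{i}" for i in range(7029)]  # 7,029 Vietnamese senses

all_labels = special_labels + en_sense_labels + vi_sense_labels
label2id = {label: i for i, label in enumerate(all_labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(all_labels))  # total number of classes: 13977
```

The classifier head's output dimension must match this total class count.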
**Model Architecture:**

- Base model: XLM-RoBERTa (`FacebookAI/xlm-roberta-base`), a cross-lingual transformer
- Task: token classification
- Hidden size: 768
- Number of layers: 12
- Attention heads: 12
- Tokenizer: XLM-RoBERTa tokenizer
- Special tokens: `PAD`, `BOS`, `EOS`
## Training

- The model is trained with PyTorch and the Hugging Face Transformers library.
- Optimized for cross-lingual word sense tagging.
- Supports batching, GPU acceleration, and token-level predictions.
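For token-level training, word-level sense labels have to be aligned to the subword tokens that the XLM-RoBERTa tokenizer produces; a common convention is to label only the first subword of each word and mask the rest with `-100` so the loss ignores them. A minimal sketch of that alignment step, assuming the `word_ids()` output of a Transformers fast tokenizer (the helper name is ours, not part of this repo):

```python
def align_labels_to_tokens(word_labels, word_ids, ignore_index=-100):
    """Map word-level label ids onto subword tokens.

    word_labels: one label id per original word.
    word_ids: tokenizer word_ids() output - the word index for each
              subword token, or None for special/padding tokens.
    Only the first subword of each word keeps its label; continuation
    subwords and special tokens get ignore_index.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            aligned.append(ignore_index)          # special or padding token
        elif word_id != previous:
            aligned.append(word_labels[word_id])  # first subword of a word
        else:
            aligned.append(ignore_index)          # continuation subword
        previous = word_id
    return aligned

# Example: 3 words, the second split into two subwords,
# with BOS/EOS special tokens at either end.
print(align_labels_to_tokens([5, 7, 9], [None, 0, 1, 1, 2, None]))
# → [-100, 5, 7, -100, 9, -100]
```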
## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("kytrungchauwork/en-vi-parallel-sense-tagger")
model = AutoModelForTokenClassification.from_pretrained("kytrungchauwork/en-vi-parallel-sense-tagger")

# Example sentence
text = "Your input sentence here"
inputs = tokenizer(text, return_tensors="pt")

# Model inference (no gradients needed)
with torch.no_grad():
    outputs = model(**inputs)
predictions = outputs.logits.argmax(dim=-1)
```
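The raw prediction ids can then be mapped back to label strings via the model's `id2label` config entry. A small post-processing sketch, shown here with a made-up three-class label map so it runs without downloading the model (the helper and the `SENSE_bank.n.01` label are illustrative, not from this repo):

```python
def decode_predictions(pred_ids, tokens, id2label, skip_labels=("PAD",)):
    """Pair each token with its predicted label string,
    dropping any labels listed in skip_labels (e.g. padding)."""
    result = []
    for token, pred in zip(tokens, pred_ids):
        label = id2label[pred]
        if label not in skip_labels:
            result.append((token, label))
    return result

# Toy example; with the real model, use model.config.id2label,
# tokenizer.convert_ids_to_tokens(...), and predictions[0].tolist().
id2label = {0: "PAD", 1: "O", 2: "SENSE_bank.n.01"}
tokens = ["<s>", "bank", "</s>"]
print(decode_predictions([0, 2, 1], tokens, id2label))
# → [('bank', 'SENSE_bank.n.01'), ('</s>', 'O')]
```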