NamSyntax/xlmr-large-viquad

Model Description

NamSyntax/xlmr-large-viquad is a fine-tuned version of XLM-RoBERTa Large for Vietnamese extractive question answering (QA). It was fine-tuned on the UIT-ViQuAD 2.0 dataset to deliver precise, context-aware answers.

This model is part of the Vietnamese-QA end-to-end machine learning pipeline project. If you find this repo helpful, feel free to give it a star ⭐—it really means a lot!

Intended Uses

  • Primary Use: Extractive Question Answering in Vietnamese. Given a context paragraph and a question, the model extracts the substring representing the answer from the context.
  • Ecosystem: It can be integrated directly into web services. The associated project provides out-of-the-box support for a Gradio web UI and a FastAPI REST backend.
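"Extractive" means the answer is never generated freely: it is a span of the given context, identified by start and end positions. A minimal illustration of that contract in plain Python, with the offsets hard-coded purely for demonstration (in real use the model predicts them):

```python
# Toy context; the real model works on full paragraphs.
context = "Hà Nội là thủ đô của Việt Nam."  # "Hanoi is the capital of Vietnam."

# The model predicts character offsets into the context; these values
# are hard-coded here for illustration only.
start, end = 0, 6

# The answer is always a substring of the context.
answer = context[start:end]
print(answer)  # Hà Nội
```

This is also why the pipeline's output includes start and end fields alongside the answer string.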

Training Data

The model was fine-tuned on UIT-ViQuAD 2.0, a high-quality machine reading comprehension dataset curated specifically for the Vietnamese language.

Training Procedure

This model was trained using PyTorch and the Hugging Face Trainer API. The training pipeline implements:

  • A modular architecture for loading the dataset with Hugging Face Datasets.
  • Preprocessing steps emphasizing text tokenization and chunking logic to handle long documents.
  • Hyperparameter management via config.yaml.
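The chunking step matters because XLM-R has a fixed maximum sequence length, so contexts longer than the limit must be split into overlapping windows (in Transformers this is typically done through the tokenizer's stride and return_overflowing_tokens options). A plain-Python sketch of the windowing idea, with illustrative max_len and stride values rather than the project's actual settings:

```python
def chunk_tokens(token_ids, max_len=384, stride=128):
    """Split a long token sequence into overlapping windows.

    Consecutive windows overlap by `stride` tokens, so an answer span
    that straddles a window boundary is still fully contained in at
    least one window.
    """
    step = max_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks

# A 1000-token document becomes 4 overlapping windows.
windows = chunk_tokens(list(range(1000)))
print([len(w) for w in windows])  # [384, 384, 384, 232]
```

At inference time the model scores candidate spans in each window, and the best-scoring span across all windows is returned as the answer.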

How to use

You can use this model directly with the pipeline API from Hugging Face Transformers:

from transformers import pipeline

# Load the fine-tuned checkpoint into a question-answering pipeline.
qa_pipeline = pipeline("question-answering", model="NamSyntax/xlmr-large-viquad")

# "Hanoi is the capital of the Socialist Republic of Vietnam. The city
# lies to the northwest of the center of the Red River Delta."
context = "Hà Nội là thủ đô của nước Cộng hòa Xã hội chủ nghĩa Việt Nam. Thành phố nằm ở phía tây bắc trung tâm vùng đồng bằng châu thổ sông Hồng."
question = "Thủ đô của Việt Nam là gì?"  # "What is the capital of Vietnam?"

# Returns a dict with keys: score, start, end, answer.
result = qa_pipeline(question=question, context=context)
print(result)

Alternatively, to experience this model with an interactive UI or to deploy it as a production-grade backend service (FastAPI and Docker), please refer to the official repository: NamSyntax/Vietnamese-QA.

Model size: 0.6B parameters, stored in Safetensors format with F32 tensors.