NamSyntax/xlmr-large-viquad

Model Description

NamSyntax/xlmr-large-viquad is a fine-tuned version of XLM-RoBERTa Large for Vietnamese extractive question answering (QA). It was fine-tuned on the UIT-ViQuAD 2.0 dataset to deliver precise, context-aware answers.

This model is part of the Vietnamese-QA end-to-end machine learning pipeline project. If you find this repo helpful, feel free to give it a star ⭐—it really means a lot!

Intended Uses

  • Primary Use: Extractive Question Answering in Vietnamese. Given a context paragraph and a question, the model extracts the substring representing the answer from the context.
  • Ecosystem: It can be integrated directly into web services. The associated project provides out-of-the-box support for a Gradio web UI and a FastAPI REST backend.
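"Extractive" means the answer is never generated freely: it is a span of the given context, identified by start and end positions. A minimal illustration of that contract in plain Python, with the offsets hard-coded purely for demonstration (in real use the model predicts them):

```python
# Toy context; the real model works on full paragraphs.
context = "Hà Nội là thủ đô của Việt Nam."  # "Hanoi is the capital of Vietnam."

# The model predicts character offsets into the context; these values
# are hard-coded here for illustration only.
start, end = 0, 6

# The answer is always a substring of the context.
answer = context[start:end]
print(answer)  # Hà Nội
```

This is also why the pipeline's output includes start and end fields alongside the answer string.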

Training Data

The model was fine-tuned on UIT-ViQuAD 2.0, a high-quality machine reading comprehension dataset curated specifically for the Vietnamese language.

Training Procedure

This model was trained using PyTorch and the Hugging Face Trainer API. The training pipeline implements:

  • A modular architecture for loading the dataset with Hugging Face Datasets.
  • Preprocessing steps emphasizing text tokenization and chunking logic to handle long documents.
  • Hyperparameter management via config.yaml.
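The chunking step matters because XLM-R has a fixed maximum sequence length, so contexts longer than the limit must be split into overlapping windows (in Transformers this is typically done through the tokenizer's stride and return_overflowing_tokens options). A plain-Python sketch of the windowing idea, with illustrative max_len and stride values rather than the project's actual settings:

```python
def chunk_tokens(token_ids, max_len=384, stride=128):
    """Split a long token sequence into overlapping windows.

    Consecutive windows overlap by `stride` tokens, so an answer span
    that straddles a window boundary is still fully contained in at
    least one window.
    """
    step = max_len - stride
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_len])
        if start + max_len >= len(token_ids):
            break
    return chunks

# A 1000-token document becomes 4 overlapping windows.
windows = chunk_tokens(list(range(1000)))
print([len(w) for w in windows])  # [384, 384, 384, 232]
```

At inference time the model scores candidate spans in each window, and the best-scoring span across all windows is returned as the answer.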

How to use

You can use this model directly with the pipeline API from Hugging Face Transformers:

from transformers import pipeline

# Load the fine-tuned checkpoint into a question-answering pipeline.
qa_pipeline = pipeline("question-answering", model="NamSyntax/xlmr-large-viquad")

# "Hanoi is the capital of the Socialist Republic of Vietnam. The city
# lies to the northwest of the center of the Red River Delta."
context = "Hà Nội là thủ đô của nước Cộng hòa Xã hội chủ nghĩa Việt Nam. Thành phố nằm ở phía tây bắc trung tâm vùng đồng bằng châu thổ sông Hồng."
question = "Thủ đô của Việt Nam là gì?"  # "What is the capital of Vietnam?"

# Returns a dict with keys: score, start, end, answer.
result = qa_pipeline(question=question, context=context)
print(result)

Alternatively, to experience this model with an interactive UI or to deploy it as a production-grade backend service (FastAPI and Docker), please refer to the official repository: NamSyntax/Vietnamese-QA.

Model size: 0.6B parameters, stored in Safetensors format with F32 tensors.