arXiv:2003.00744

PhoBERT: Pre-trained language models for Vietnamese

Published on Mar 2, 2020

Abstract

We present PhoBERT with two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R (Conneau et al., 2020) and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference. We release PhoBERT to facilitate future research and downstream applications for Vietnamese NLP. Our PhoBERT models are available at https://github.com/VinAIResearch/PhoBERT

AI-generated summary

PhoBERT, a large-scale monolingual language model for Vietnamese, outperforms XLM-R and sets new benchmarks in various Vietnamese-specific NLP tasks.
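The abstract points to the released checkpoints on GitHub; in practice they can also be loaded through the Hugging Face transformers API. A minimal sketch, assuming a transformers install and the `vinai/phobert-base` checkpoint name (neither is stated in the abstract itself):

```python
# Minimal sketch of loading PhoBERT with Hugging Face transformers.
# Assumptions not stated in the abstract: the checkpoint name
# "vinai/phobert-base", and that input text is already word-segmented
# (PhoBERT's BPE vocabulary was learned over word-segmented Vietnamese,
# with multi-syllable words joined by underscores).
from transformers import AutoModel, AutoTokenizer


def load_phobert(checkpoint: str = "vinai/phobert-base"):
    """Load (or download to cache) the PhoBERT tokenizer and encoder."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_phobert()
    # "sinh_viên" is one word-segmented token ("student").
    inputs = tokenizer("Tôi là sinh_viên", return_tensors="pt")
    outputs = model(**inputs)
    print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```

Note that segmenting raw Vietnamese text into words before tokenization (e.g. with the RDRSegmenter from VnCoreNLP, as described in the PhoBERT repository) is expected for best results.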



Models citing this paper: 8
Datasets citing this paper: 0
Spaces citing this paper: 74
Collections including this paper: 0