Abstract
AI-generated summary: A large-scale Danish text corpus, the Danish Gigaword Corpus, is presented to address the limitations that insufficient data availability has placed on Danish language technology.
Danish language technology has been hindered by a lack of broad-coverage corpora at the scale modern NLP prefers. This paper describes the Danish Gigaword Corpus, the result of a focused effort to provide a diverse and freely available one-billion-word corpus of Danish text. The Danish Gigaword Corpus covers a wide array of time periods, domains, speakers' socio-economic status, and Danish dialects.
arXiv: 2005.03521