Overview
This repository contains three sets of yearly word embedding models covering 1900 to 1999, all trained on the Japanese National Diet Library (NDL) Ngram dataset: a book-only model, a magazine-only model, and a combined model trained on books and magazines together. The combined model was used for the paper’s main analysis because it had better overall embedding quality than the single-source models.
Combined model
The combined model is a series of yearly skip-gram with negative sampling (SGNS) word embeddings trained on the merged NDL Ngram corpus of books and magazines for 1900–1999. It is the model used in the paper’s main analysis, as it showed the most reliable embedding quality across years.
Book-only model
The book-only model contains yearly SGNS word embeddings trained only on the book portion of the NDL Ngram dataset.
Magazine-only model
The magazine-only model contains yearly SGNS word embeddings trained only on the magazine portion of the NDL Ngram dataset.
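Usage sketch
The following is a minimal sketch of how one yearly model might be located and queried, not the repository's documented interface. It rests on several assumptions that this README does not confirm: that each year is stored as a separate file in word2vec text format (a "vocab_size dim" header line followed by one "word v1 ... vd" line per word), and that files are laid out as <root>/<variant>/<year>.vec. The directory names, the .vec extension, and the helper names model_path, load_text_vectors, and cosine are all hypothetical. Because this README does not say whether the yearly vector spaces are aligned with one another, the sketch only compares words within a single year.

```python
import math
from pathlib import Path

# Hypothetical directory names for the three model sets described above.
VARIANTS = ("combined", "book", "magazine")
YEARS = range(1900, 2000)  # 1900 through 1999 inclusive


def model_path(root, variant, year):
    """Build a hypothetical path such as <root>/combined/1950.vec."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    if year not in YEARS:
        raise ValueError(f"year out of range: {year}")
    return Path(root) / variant / f"{year}.vec"


def load_text_vectors(lines):
    """Parse word2vec text format (assumed, not confirmed by this README)."""
    it = iter(lines)
    _, dim = map(int, next(it).split())  # header: vocab_size dim
    vectors = {}
    for line in it:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Tiny in-memory stand-in for one year's file; real use would pass an open file:
#   with open(model_path("models", "combined", 1950), encoding="utf-8") as f:
#       vecs = load_text_vectors(f)
sample = ["2 3", "本 1 0 0", "雑誌 0 1 0"]
vecs = load_text_vectors(sample)
print(cosine(vecs["本"], vecs["雑誌"]))
```

If the models turn out to be stored in a different format, only load_text_vectors would need to change; the path helper and the similarity function are independent of the file format.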