venue: ACL 2019
The paper proposes a GCN-based method for producing word embeddings for out-of-vocabulary (OOV) words.
1. Graph Construction
To construct the knowledge graph, a vocabulary is built from the English Wikipedia dataset (3B tokens). Note that this vocabulary includes OOV words, i.e. words that are not in the vocabulary of pre-trained embeddings such as GloVe.
For each node (word) $w_i$, they define a document $d_i$ as the concatenation of the word's Wikipedia page summary and its Wiktionary definition. An edge between two words $w_i$ and $w_j$ is added if the Jaccard coefficient between $d_i$ and $d_j$ is larger than 0.5 (a threshold chosen empirically). The Jaccard coefficient is also used as the edge weight. In addition, the mean of the pre-trained embeddings of the words in $d_i$ is calculated and used as the feature vector of $w_i$.
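The graph construction step can be sketched roughly as follows. This is a minimal illustration, not the authors' code: it assumes the Jaccard coefficient is taken over lowercased, whitespace-split token sets of each word's summary-plus-definition text, and the names `jaccard`, `build_graph`, `node_feature`, `docs` and `pretrained` are hypothetical.

```python
from itertools import combinations

import numpy as np


def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def build_graph(docs, threshold=0.5):
    """Return {(word_i, word_j): weight} for pairs whose texts overlap enough.

    Assumption: tokens are obtained by lowercasing and splitting on whitespace.
    """
    token_sets = {w: set(text.lower().split()) for w, text in docs.items()}
    edges = {}
    for u, v in combinations(sorted(token_sets), 2):
        score = jaccard(token_sets[u], token_sets[v])
        if score > threshold:          # keep the edge only above the threshold
            edges[(u, v)] = score      # the Jaccard score doubles as edge weight
    return edges


def node_feature(text, pretrained, dim):
    """Mean of the pre-trained vectors of the tokens found in `pretrained`."""
    vecs = [pretrained[t] for t in text.lower().split() if t in pretrained]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


# Toy data (invented for illustration only).
docs = {
    "cat": "small domesticated feline animal",
    "kitten": "small young domesticated feline animal",
    "car": "road vehicle with four wheels",
}
pretrained = {"small": np.array([1.0, 0.0]), "animal": np.array([0.0, 1.0])}

edges = build_graph(docs)
feature = node_feature(docs["cat"], pretrained, dim=2)
```

In this toy example only "cat" and "kitten" are linked (4 shared tokens out of 5 distinct ones), while "car" stays isolated. The all-pairs loop is quadratic in the vocabulary size, so a real vocabulary would need some form of candidate pruning, which the summary does not describe.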
2. Embedding Learning
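Given the constructed graph and the extracted node feature vectors, they use a GCN (Kipf and Welling, 2016) to learn node embeddings:

$$h_i^{(l+1)} = \mathrm{ReLU}\Big(\sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}} W^{(l)} h_j^{(l)}\Big)$$

where $\mathcal{N}(i)$ is the set of neighbors of node $i$ and $c_{ij}$ is the normalization constant (in Kipf and Welling's symmetric normalization, $c_{ij} = \sqrt{d_i d_j}$ for node degrees $d_i$ and $d_j$). Node embeddings are initialized with the feature vectors. The final-layer node embeddings are computed without ReLU. The parameters are optimized by minimizing the mean squared error between the node embeddings and the pre-trained embeddings (e.g. GloVe). At inference time, OOV words are also assigned embeddings, since they are nodes in the same graph.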
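The forward pass and the loss described above might look like the following NumPy sketch. It is illustrative only and rests on several assumptions not stated in the summary: self-loops are added as in Kipf and Welling, the Jaccard weights are used directly as adjacency entries, the loss is averaged only over nodes that have a pre-trained embedding, and the network has two layers. The function names and toy values are hypothetical, and no training loop is shown.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} (Kipf and Welling)."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops (assumption)
    deg = a_hat.sum(axis=1)                 # weighted degree, always >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_forward(adj_norm, features, weights):
    """Apply the GCN layers; ReLU on every layer except the last."""
    h = features                            # embeddings start as node features
    for k, w in enumerate(weights):
        h = adj_norm @ h @ w
        if k < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h


def mse_loss(pred, target, known_mask):
    """Mean squared error over nodes that have a pre-trained embedding."""
    diff = pred[known_mask] - target[known_mask]
    return float(np.mean(diff ** 2))


# Toy graph: nodes 0 and 1 are in-vocabulary, node 2 is an OOV word.
adj = np.array([[0.0, 0.8, 0.0],
                [0.8, 0.0, 0.6],
                [0.0, 0.6, 0.0]])
features = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
targets = np.array([[1.0, 0.0], [0.4, 0.6], [0.0, 0.0]])  # row 2 is unused
known = np.array([True, True, False])

rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 2)), rng.normal(size=(2, 2))]

adj_norm = normalize_adjacency(adj)
embeddings = gcn_forward(adj_norm, features, weights)
loss = mse_loss(embeddings, targets, known)
```

Even though node 2 contributes nothing to the loss, `embeddings[2]` is still produced by the same forward pass, which is how an OOV word would receive an embedding at inference time.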