# Paper Reading: Out-of-Vocabulary Embedding Imputation with Grounded Language Information by Graph Convolutional Networks

venue: ACL 2019

The paper proposes a GCN-based method to produce word embeddings for out-of-vocabulary (OOV) words.

## 1. Graph Construction

To construct the knowledge graph, a vocabulary is built from the English Wikipedia dataset (3B tokens). Notably, this vocabulary includes OOV words that are absent from the vocabulary of pre-trained embeddings such as GloVe.

For each node/word $w_v$, they define $D_v$ as the concatenation of the Wikipedia page summary and the Wiktionary definition of $w_v$. An edge between two words $w_v$ and $w_u$ is added if the Jaccard coefficient between $D_v$ and $D_u$ is larger than 0.5 (an empirically chosen threshold); the Jaccard coefficient also serves as the edge weight $s_{vu}$. In addition, the mean of the pre-trained embeddings of the words in $D_v$ is computed as the feature vector of $w_v$.
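The graph-construction step above can be sketched in a few lines. This is a minimal illustration, not the paper's code: the function names, the whitespace tokenizer, and the toy dictionaries are all assumptions for the example.

```python
import numpy as np

def jaccard(doc_a, doc_b):
    """Jaccard coefficient between the token sets of two descriptions D_v, D_u."""
    a, b = set(doc_a.split()), set(doc_b.split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def build_edges(descriptions, threshold=0.5):
    """Connect two words if their descriptions' Jaccard coefficient exceeds
    the threshold; the coefficient itself is kept as the edge weight s_vu."""
    words = list(descriptions)
    edges = {}
    for i, v in enumerate(words):
        for u in words[i + 1:]:
            s = jaccard(descriptions[v], descriptions[u])
            if s > threshold:
                edges[(v, u)] = s
    return edges

def feature_vector(description, pretrained):
    """Feature of w_v: mean of pre-trained embeddings of in-vocabulary tokens in D_v."""
    vecs = [pretrained[t] for t in description.split() if t in pretrained]
    if not vecs:  # no token of D_v is in the pre-trained vocabulary
        return np.zeros_like(next(iter(pretrained.values())))
    return np.mean(vecs, axis=0)
```

Note that OOV words still receive a feature vector this way, because their descriptions are largely composed of in-vocabulary words.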

## 2. Embedding Learning

Given the constructed graph and the extracted node feature vectors, they use a GCN (Kipf and Welling, 2016) for node embedding learning:

$$h_v^{(l+1)} = \mathrm{ReLU}\left(\frac{1}{C} \sum_{u \in S(v)} s_{vu} \, W^{(l)} h_u^{(l)}\right),$$

where $S(v) = N(v) \cup \{v\}$ (with self-weight $s_{vv} = 1$) and the normalization constant $C = 1 + \sum_{u \in N(v)} s_{vu}$. Node embeddings are initialized with the feature vectors, and the final node embeddings are computed without the ReLU. The parameters are optimized by minimizing the mean squared error between the output node embeddings and the pre-trained embeddings (e.g. GloVe). At inference time, OOV words are also assigned embeddings.
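A minimal NumPy sketch of one such propagation step, assuming the weighted-mean update implied by the definitions of $S(v)$ and $C$ above (self-weight 1, neighbor weights $s_{vu}$); the function name and edge representation are illustrative, not from the paper.

```python
import numpy as np

def gcn_layer(H, edges, W, relu=True):
    """One GCN step: node v averages its own transformed feature (weight 1)
    with its neighbours' transformed features (weights s_vu), divided by
    C = 1 + sum_u s_vu. H: (n, d) features; edges: {(u, v): s_uv}; W: (d, d')."""
    HW = H @ W  # transform all node features once
    out = np.zeros_like(HW)
    for v in range(H.shape[0]):
        acc = HW[v].copy()  # self term, weight s_vv = 1
        C = 1.0
        for (a, b), s in edges.items():  # undirected edges stored once
            if a == v:
                acc += s * HW[b]; C += s
            elif b == v:
                acc += s * HW[a]; C += s
        out[v] = acc / C
    # the final layer of the model omits the ReLU
    return np.maximum(out, 0.0) if relu else out
```

Stacking a few such layers and regressing the output onto GloVe vectors with an MSE loss recovers the training setup described above; OOV nodes get embeddings from the same forward pass.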