Paper Reading: Out-of-Vocabulary Embedding Imputation with Grounded Language Information by Graph Convolutional Networks

venue: ACL 2019

The paper proposes a GCN-based method to produce word embeddings for out-of-vocabulary (OOV) words.

1. Graph Construction

To construct the knowledge graph, a vocabulary is first built from the English Wikipedia dataset (3B tokens). Note that this vocabulary includes OOV words that are not in the vocabulary of pre-trained embeddings such as GloVe.

For each node/word w_v, they define D_v as the concatenation of the Wikipedia page summary and the Wiktionary definition of w_v. An edge between two words w_v and w_u is created if the Jaccard coefficient between D_v and D_u is larger than 0.5 (empirically chosen), and the Jaccard coefficient is also used as the edge weight s_{vu}. Besides, the mean of the pre-trained embeddings of the words in D_v is calculated as the feature vector of w_v.
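To make this step concrete, here is a minimal Python sketch of the graph construction. The names (build_graph, jaccard, docs, emb) and the dense pairwise loop are my illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def jaccard(tokens_u, tokens_v):
    """Jaccard coefficient between two token sets."""
    u, v = set(tokens_u), set(tokens_v)
    return len(u & v) / len(u | v) if u | v else 0.0

def build_graph(docs, emb, dim, threshold=0.5):
    """docs: word -> tokenized D_v (Wikipedia summary + Wiktionary definition).
    emb: pre-trained embedding lookup (e.g. GloVe). dim: embedding size."""
    words = list(docs)
    edges = {}  # (i, j) -> edge weight s_{vu}
    for i in range(len(words)):
        for j in range(i + 1, len(words)):
            s = jaccard(docs[words[i]], docs[words[j]])
            if s > threshold:  # empirically chosen cutoff from the paper
                edges[(i, j)] = s
    # node feature: mean of pre-trained embeddings of the words in D_v
    feats = np.stack([
        np.mean([emb[t] for t in docs[w] if t in emb] or [np.zeros(dim)], axis=0)
        for w in words
    ])
    return words, edges, feats
```

The quadratic pairwise comparison is only for clarity; at Wikipedia scale one would need blocking or approximate set-similarity search.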

2. Embedding Learning

Given the constructed graph and the extracted node feature vectors, they use a GCN (Kipf and Welling, 2016) for node embedding learning:

h_v^{(l+1)} = ReLU( (1/C) \sum_{u \in S(v)} s_{vu} W^{(l)} h_u^{(l)} )

where S(v) = N(v) \cup \{v\} with s_{vv} = 1, and the normalization constant C = 1 + \sum_{u \in N(v)} s_{vu}. Node embeddings are initialized with the feature vectors, and the final layer omits the ReLU. The mean squared error between the node embeddings and the pre-trained embeddings (e.g. GloVe) is minimized to optimize the parameters. During inference, OOV words are also assigned embeddings.
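Below is a PyTorch sketch of such a weighted GCN under these definitions, assuming a dense adjacency matrix with adj[v, u] = s_{vu} and self-loops of weight 1. The class names, the two-layer depth, and the in-vocabulary mask (restricting the loss to words that actually have pre-trained targets, which the setup implies since OOV words lack them) are my assumptions:

```python
import torch
import torch.nn as nn

class WeightedGCNLayer(nn.Module):
    """One GCN layer with Jaccard edge weights s_{vu} and self-loops (s_{vv} = 1)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, h, adj):
        # adj: dense (N, N) tensor, adj[v, u] = s_{vu}, adj[v, v] = 1
        c = adj.sum(dim=1, keepdim=True)  # C = 1 + sum_{u in N(v)} s_{vu}
        return (adj @ self.linear(h)) / c

class GCNImputer(nn.Module):
    """Two layers: ReLU on the hidden layer, final layer left linear."""
    def __init__(self, dim, hidden):
        super().__init__()
        self.l1 = WeightedGCNLayer(dim, hidden)
        self.l2 = WeightedGCNLayer(hidden, dim)

    def forward(self, h, adj):
        return self.l2(torch.relu(self.l1(h, adj)), adj)

def train_step(model, opt, feats, adj, targets, in_vocab_mask):
    """MSE against pre-trained embeddings on in-vocabulary nodes only;
    the same forward pass yields embeddings for OOV nodes at inference."""
    opt.zero_grad()
    out = model(feats, adj)
    loss = nn.functional.mse_loss(out[in_vocab_mask], targets[in_vocab_mask])
    loss.backward()
    opt.step()
    return loss.item()
```

A sparse adjacency would be the realistic choice for a Wikipedia-sized graph; the dense form above just keeps the normalization by C easy to read.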

3. Experiments
