This paper claims to be the first study of how to use click-graph features in neural models for retrieval. The graph embedding techniques it proposes can be plugged into other scenarios where graph-structure information is available (ensemble).
First we describe the basic IR model for product search without the proposed graph embedding techniques (baseline). As shown in Figure 1, a CNN is used to extract semantic query features from queries and semantic product features from product descriptions. The product id embedding is concatenated with these two feature vectors. Finally, an MLP is used to predict the relevance score. The structure is similar to C-DSSM.
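To make the data flow of the baseline concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the ReLU activations, max-pooling over positions, all dimensions, the random weights, and every name (`conv1d_maxpool`, `relevance_score`, `params`) are assumptions made for illustration.

```python
import random

def conv1d_maxpool(token_vectors, filters, width):
    """Apply each filter to every window of `width` tokens, then max-pool over positions."""
    windows = [
        [x for vec in token_vectors[i:i + width] for x in vec]
        for i in range(len(token_vectors) - width + 1)
    ]
    # ReLU followed by max over window positions (both are assumed choices).
    return [
        max(max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows)
        for f in filters
    ]

def dense(x, weights, bias):
    """One fully connected layer without activation."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def relevance_score(query_tokens, product_tokens, product_id_vec, params):
    q = conv1d_maxpool(query_tokens, params["query_filters"], params["width"])
    p = conv1d_maxpool(product_tokens, params["product_filters"], params["width"])
    features = q + p + product_id_vec          # concatenation of the three vectors
    hidden = [max(0.0, v) for v in dense(features, params["w1"], params["b1"])]
    return dense(hidden, params["w2"], params["b2"])[0]

# Hypothetical sizes: 2-d token vectors, window width 2, 2 filters per encoder.
rng = random.Random(0)
def rand_matrix(rows, cols):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

params = {
    "width": 2,
    "query_filters": rand_matrix(2, 4),
    "product_filters": rand_matrix(2, 4),
    "w1": rand_matrix(3, 6), "b1": [0.0] * 3,   # 6 = 2 query + 2 product + 2 id dims
    "w2": rand_matrix(1, 3), "b2": [0.0],
}
query_tokens = [[0.1, 0.3], [0.5, -0.2], [0.0, 0.4]]
product_tokens = [[0.2, 0.2], [-0.1, 0.6], [0.3, 0.3], [0.7, 0.0]]
product_id_vec = [0.05, -0.05]
score = relevance_score(query_tokens, product_tokens, product_id_vec, params)
```

The sketch assumes each text has at least `width` tokens; a real model would pad shorter inputs and learn the weights by training.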
2. Graph Embedding-based Ranking Model
Assume we have a query graph and a product graph. There are different ways to construct such graphs (e.g. from click-through data). For example, queries that lead to clicks on the same product can be linked, and products appearing in the same order can be linked, as shown in Figure 2. These graphs can provide additional information for retrieval tasks.
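As an illustration of the first construction rule (queries that click the same product are linked), here is a small sketch. The click log, the function name `build_query_graph`, and the unweighted adjacency-set representation are assumptions for the example, not details from the paper.

```python
from collections import defaultdict
from itertools import combinations

def build_query_graph(click_log):
    """Link two queries whenever both led to a click on the same product."""
    queries_by_product = defaultdict(set)
    for query, product in click_log:
        queries_by_product[product].add(query)

    graph = defaultdict(set)  # query -> set of neighbouring queries
    for queries in queries_by_product.values():
        for a, b in combinations(sorted(queries), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)

# Hypothetical click log of (query, clicked product id) pairs.
click_log = [
    ("running shoes", "p1"),
    ("trail sneakers", "p1"),
    ("trail sneakers", "p2"),
    ("hiking boots", "p2"),
    ("usb cable", "p3"),
]
query_graph = build_query_graph(click_log)
```

A query whose clicked products are shared with no other query (here "usb cable") gets no edges and is left out of the graph. The product graph could be built the same way by grouping products per order instead of queries per product.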
The paper proposes to replace product id embeddings with product graph embeddings, as shown in Figure 3. For the graph embeddings, they concatenate embeddings from DeepWalk with first-order embeddings from LINE.
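The sketch below shows two pieces of that pipeline under simplifying assumptions: generating the truncated random walks that DeepWalk feeds to a skip-gram model (the skip-gram training itself is omitted), and concatenating two per-node vectors. The toy graph, the placeholder vectors, and the function names are hypothetical.

```python
import random

def random_walks(graph, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks starting from every node (DeepWalk's first stage)."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in sorted(graph):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = sorted(graph.get(walk[-1], ()))
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks

def concat_embeddings(deepwalk_vecs, line_vecs, node):
    """Concatenate the DeepWalk vector and the LINE vector of one node."""
    return deepwalk_vecs[node] + line_vecs[node]

# Hypothetical product graph (adjacency sets) and placeholder embedding values.
product_graph = {"p1": {"p2"}, "p2": {"p1", "p3"}, "p3": {"p2"}}
walks = random_walks(product_graph, walk_length=4, walks_per_node=2)

deepwalk_vecs = {"p1": [0.1, 0.2]}
line_vecs = {"p1": [0.3]}
product_vec = concat_embeddings(deepwalk_vecs, line_vecs, "p1")
```

In practice the two embedding tables would come from trained DeepWalk and LINE models; only the concatenation step is taken directly from the summary above.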
Using query graph embeddings is more challenging, because many new queries may not appear in the query graph available at training time. One way is to train a regression model f (e.g. a CNN or RNN) that takes query terms as input and “mimics” the query graph embeddings, and then use f to recreate query embeddings. However, this method can easily overfit. Instead, the paper proposes to use an MLP to transform the semantic query features into the same vector space as the query graph embeddings and to minimize the reconstruction error between the two, as shown in Figure 3.
This approach can jointly optimize the reconstruction error with the final ranking loss.
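A minimal sketch of the reconstruction term, assuming a squared Euclidean error and a small tanh MLP; the summary does not state the exact error function or layer sizes, so both are assumptions, as are the names `mlp` and `reconstruction_error` and all the weights.

```python
import math

def mlp(x, layers):
    """Apply a small MLP: tanh after every layer except the last."""
    for i, (weights, bias) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
        if i < len(layers) - 1:
            x = [math.tanh(v) for v in x]
    return x

def reconstruction_error(query_features, graph_embedding, layers):
    """Squared distance between the projected query features and the graph embedding."""
    projected = mlp(query_features, layers)
    return sum((p - g) ** 2 for p, g in zip(projected, graph_embedding))

# Hypothetical 3-d semantic query features projected into a 2-d graph-embedding space.
layers = [
    ([[0.5, -0.5, 0.0], [0.0, 0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),
]
query_features = [1.0, 2.0, 3.0]
graph_embedding = [0.2, 0.7]
error = reconstruction_error(query_features, graph_embedding, layers)
```

For a query that is missing from the graph there is no target embedding, so only the projected vector `mlp(query_features, layers)` would be available to stand in for it.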
Given a query $q$ along with a positive product $p^+$ and a negative product $p^-$, the loss is defined as $h(score(q, p^+) - score(q, p^-))$, where $score$ is the relevance score from the model described in the previous section and $h$ is the smoothed hinge loss. The reconstruction error is minimized at the same time, so the overall loss function combines this ranking loss with the reconstruction error.
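The pairwise objective can be sketched as follows. Three things here are assumptions rather than details from the summary: that h is applied to the score margin between the positive and the negative product, that h takes one common smoothed-hinge form (the Rennie and Srebro variant), and that the two terms are combined with a hypothetical weight `alpha`.

```python
def smoothed_hinge(z):
    """One common smoothed hinge: zero for z >= 1, quadratic on (0, 1), linear for z <= 0."""
    if z >= 1.0:
        return 0.0
    if z <= 0.0:
        return 0.5 - z
    return 0.5 * (1.0 - z) ** 2

def ranking_loss(score_pos, score_neg):
    """Penalize the model when the positive product does not outscore the negative one."""
    return smoothed_hinge(score_pos - score_neg)

def overall_loss(score_pos, score_neg, reconstruction_error, alpha=1.0):
    """Ranking loss plus weighted reconstruction error (alpha is a hypothetical weight)."""
    return ranking_loss(score_pos, score_neg) + alpha * reconstruction_error

loss = overall_loss(score_pos=2.0, score_neg=0.5, reconstruction_error=0.25, alpha=0.1)
```

With a margin of at least 1 the ranking term vanishes, so in that regime only the reconstruction term contributes to the gradient.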
Here I summarize some interesting results. First, they find that the performance of the baseline starts to decline for long-tail queries while the proposed method remains robust. Second, they show that the baseline tends to rank popular products higher than the proposed method does.
They also show that the proposed model better captures transitivity and discriminates among decreasingly relevant query-product pairs (3-hop, 5-hop and so on, relative to totally random pairs).
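To illustrate what an n-hop query-product pair could look like, the sketch below computes hop distance by breadth-first search over a bipartite click graph. Counting hops as edges in that bipartite graph is an assumption about the paper's definition, and the click log and function name are hypothetical.

```python
from collections import defaultdict, deque

def hop_distance(click_log, query, product):
    """Return the number of click edges on the shortest path from query to product, or None."""
    neighbours = defaultdict(set)
    for q, p in click_log:
        # Tag nodes so a query and a product with the same string cannot collide.
        neighbours[("q", q)].add(("p", p))
        neighbours[("p", p)].add(("q", q))

    start, goal = ("q", query), ("p", product)
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == goal:
            return dist
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # not connected: comparable to a random pair

# Hypothetical click log of (query, clicked product) pairs.
clicks = [("q1", "p1"), ("q2", "p1"), ("q2", "p2"), ("q3", "p2"), ("q3", "p3")]
direct = hop_distance(clicks, "q1", "p1")
three_hop = hop_distance(clicks, "q1", "p2")
five_hop = hop_distance(clicks, "q1", "p3")
```

Because the graph is bipartite, every query-product distance is odd, which matches the 3-hop and 5-hop buckets mentioned above.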