venue: EMNLP 2020 (link)

Previous work that uses a pretrained language model (PLM) such as BERT for information retrieval takes the [CLS] embedding of the concatenation of query and document as features for discriminative learning. In other words, the relevance label for a given (query, document) pair is modeled as:

$$p(r \mid q, d) = f(h_{[\text{CLS}]})$$

where $h_{[\text{CLS}]}$ is the [CLS] embedding from the last layer of BERT and $f$ is usually a classification layer.
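For concreteness, here is a minimal sketch of this discriminative setup using the HuggingFace `transformers` library; the model name and the single-logit relevance head are my own illustrative choices, not necessarily the exact configuration used in prior work.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
bert = BertModel.from_pretrained("bert-base-uncased")
relevance_head = torch.nn.Linear(bert.config.hidden_size, 1)  # the classification layer f

def relevance_score(query: str, document: str) -> torch.Tensor:
    # BERT sees the pair as "[CLS] query [SEP] document [SEP]"
    inputs = tokenizer(query, document, return_tensors="pt", truncation=True)
    outputs = bert(**inputs)
    h_cls = outputs.last_hidden_state[:, 0]  # [CLS] embedding from the last layer
    return relevance_head(h_cls).squeeze(-1)  # unnormalized relevance logit
```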

Language models are also used in traditional IR methods (link) in a generative way, where the conditional likelihood $p(q \mid d)$ is used as the relevance score. This paper experiments with modern PLMs to model $p(q \mid d)$.

To finetune the PLM, the input is formatted as:

`<bos> document <boq> query <eoq>`

At training time, a loss based on $-\log p(q \mid d)$ is minimized (see the loss functions below), while at inference time the conditional log-likelihood $\log p(q \mid d)$ is calculated for every candidate document.
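As a rough illustration, the generative score can be computed with a causal PLM such as GPT-2 by summing the log-probabilities of the query tokens conditioned on the document. The template handling below (reusing GPT-2's default tokens instead of dedicated `<bos>`/`<boq>`/`<eoq>` markers) and the truncation length are assumptions for the sketch, not the paper's exact preprocessing.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def query_log_likelihood(query: str, document: str) -> float:
    doc_ids = tokenizer.encode(document, truncation=True, max_length=400)
    query_ids = tokenizer.encode(query)
    input_ids = torch.tensor([doc_ids + query_ids])
    logits = model(input_ids).logits  # (1, seq_len, vocab_size)
    # Log-probability of each token given all preceding tokens.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Sum only over the positions that predict query tokens.
    return token_ll[len(doc_ids) - 1:].sum().item()
```

Ranking then amounts to sorting candidate documents by this score for the given query.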

In practice, different loss functions can be used for finetuning. The following loss functions are tested:

**LUL**: loss function for likelihood and unlikelihood estimation. This is an extension of the regular cross-entropy loss where the **unlikelihood training objective** is added as a second term:

$$\mathcal{L}_{LUL} = -\sum_{t} \log p(q_t \mid q_{<t}, d^+) - \sum_{t} \log\big(1 - p(q_t \mid q_{<t}, d^-)\big)$$

The second term can be considered a regularizer that keeps the model from becoming overconfident in its query likelihoods.

**RLL**: pairwise ranking loss on the log-likelihoods of positive and negative examples

**MLE**: maximum likelihood estimation (only positive examples are used)
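Below is a minimal sketch of the three objectives, written against per-token and sequence-level query log-probabilities such as those produced by the scoring function above. The hinge formulation and margin value in `rll_loss` are common choices I am assuming here; the paper's exact parameterization may differ.

```python
import torch

def mle_loss(pos_token_log_probs: torch.Tensor) -> torch.Tensor:
    # MLE: maximize log p(q | d+), using positive pairs only.
    return -pos_token_log_probs.sum()

def lul_loss(pos_token_log_probs: torch.Tensor,
             neg_token_log_probs: torch.Tensor) -> torch.Tensor:
    # LUL: likelihood term on the positive document plus an unlikelihood
    # term that pushes down p(q | d-) for the negative document.
    likelihood = -pos_token_log_probs.sum()
    probs = neg_token_log_probs.exp().clamp(max=1 - 1e-6)  # avoid log(0)
    unlikelihood = -torch.log1p(-probs).sum()  # -sum log(1 - p)
    return likelihood + unlikelihood

def rll_loss(pos_log_likelihood: torch.Tensor,
             neg_log_likelihood: torch.Tensor,
             margin: float = 1.0) -> torch.Tensor:
    # RLL: pairwise hinge ranking loss on sequence-level log-likelihoods,
    # requiring log p(q | d+) to beat log p(q | d-) by a margin.
    return torch.clamp(margin - pos_log_likelihood + neg_log_likelihood, min=0)
```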

Results with different PLMs and on different datasets are shown in the following table:

Take-away conclusions/tricks:

- the unlikelihood term in the LUL loss is effective, as seen by comparing MLE with LUL (rows 6, 7);
- Among all generative methods, BART-large (RLL) performs the best.
- The RLL loss seems to be the most effective for the IR task; pairwise ranking losses were widely used in pre-BERT IR models. Though the paper emphasizes the effectiveness of the new generative approach, I am more attracted by the experiments on different loss functions (especially RLL) 🙂
- When finetuning with RLL, 15 negative passages are sampled for each question, but only the one with the highest score is used to update the model (see the sketch below).
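A sketch of that negative-sampling trick, reusing the scoring idea above (here `sequence_log_likelihood` is a hypothetical differentiable variant of `query_log_likelihood`, i.e., without `@torch.no_grad()`):

```python
import random
import torch

def rll_training_step(model, question, positive_passage, passage_pool,
                      num_negatives: int = 15, margin: float = 1.0):
    negatives = random.sample(passage_pool, num_negatives)
    pos_ll = sequence_log_likelihood(model, question, positive_passage)
    neg_lls = torch.stack([sequence_log_likelihood(model, question, d)
                           for d in negatives])
    # Gradient flows only through the highest-scoring (hardest) negative.
    hardest_neg_ll = neg_lls.max()
    loss = torch.clamp(margin - pos_ll + hardest_neg_ll, min=0)
    loss.backward()
    return loss.item()
```

Selecting only the hardest negative is a standard hard-negative mining heuristic: it concentrates each update on the passage the model currently finds most confusable with the positive.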
