Paper Reading: To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

Venue: ACL 2020

This paper presents empirical results on how the performance gap between a pretrained model (RoBERTa) and a vanilla LSTM changes as the number of training samples grows, for text classification tasks.

They experimented on 3 text classification datasets with 3 models: RoBERTa, an LSTM, and an LSTM initialized with pretrained RoBERTa embeddings. They used different portions of the training samples (1%, 10%, 30%, 50%, 70%, 90%) to mimic the “low-resource”, “medium-resource” and “high-resource” regimes. The results are shown in Figure 1.
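To make the data-size sweep concrete, here is a minimal sketch of how such training subsets could be built. It assumes uniform random sampling with a fixed seed, which this summary does not confirm as the paper's procedure; the names `FRACTIONS`, `subsample` and `make_subsets` are hypothetical, and the toy examples are placeholders rather than real data.

```python
import random

# Portions of the training set listed in the post.
FRACTIONS = [0.01, 0.10, 0.30, 0.50, 0.70, 0.90]


def subsample(examples, fraction, seed=0):
    """Return a random subset holding roughly `fraction` of `examples`.

    Assumption: uniform sampling without replacement; the paper may
    have sampled differently (for example, stratified by label).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed keeps the subset reproducible
    k = max(1, round(len(examples) * fraction))
    return rng.sample(examples, k)


def make_subsets(examples, fractions=FRACTIONS, seed=0):
    """Build one training subset per fraction, keyed by the fraction."""
    return {f: subsample(examples, f, seed) for f in fractions}


if __name__ == "__main__":
    # Placeholder (text, label) pairs standing in for a real dataset.
    toy = [(f"text {i}", i % 2) for i in range(1000)]
    for fraction, subset in make_subsets(toy).items():
        print(f"{fraction:.0%}: {len(subset)} examples")
```

Each model would then be trained once per subset and evaluated on the same held-out test set, so that the accuracy curves are comparable across data sizes.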

There are 2 important conclusions:

  1. When increasing the size of training data, the accuracy gap between LSTM and RoBERTa decreases.
  2. Initializing LSTM with RoBERTa embeddings can improve the performance of LSTM.
