Venue: ACL 2020
This paper presents empirical results on how the performance gap between a pretrained model (RoBERTa) and a vanilla LSTM changes with the number of training samples on text classification tasks.
They experimented on 3 text classification datasets with 3 models: RoBERTa, an LSTM, and an LSTM initialized with pretrained RoBERTa embeddings. They used different portions of the training samples (1%, 10%, 30%, 50%, 70%, 90%) to mimic the "low-resource", "medium-resource", and "high-resource" regimes. The results are shown in Figure 1.
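The summary does not describe exactly how the subsets are drawn; a minimal sketch (assuming simple random subsampling with a fixed seed, and a hypothetical `train_examples` list of (text, label) pairs) could look like:

```python
import random

def subsample(examples, fraction, seed=0):
    """Randomly draw `fraction` of the training examples (fixed seed for reproducibility)."""
    rng = random.Random(seed)
    k = max(1, int(len(examples) * fraction))
    return rng.sample(examples, k)

# Hypothetical training set of (text, label) pairs.
train_examples = [("example text %d" % i, i % 2) for i in range(10_000)]

# Fractions used in the paper to mimic low-, medium-, and high-resource regimes.
for frac in (0.01, 0.10, 0.30, 0.50, 0.70, 0.90):
    subset = subsample(train_examples, frac)
    print(f"{frac:.0%}: {len(subset)} examples")
    # ...train RoBERTa, the LSTM, and the LSTM with RoBERTa embeddings on `subset`, then evaluate...
```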

There are two important conclusions:
- As the amount of training data increases, the accuracy gap between the LSTM and RoBERTa decreases.
- Initializing the LSTM with pretrained RoBERTa embeddings improves the LSTM's performance (see the sketch after this list).
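
The paper's exact transfer procedure is not spelled out here; a minimal PyTorch sketch, assuming the RoBERTa token-embedding matrix is simply copied into the LSTM's embedding layer (class and parameter names below are hypothetical), could look like:

```python
import torch
import torch.nn as nn
from transformers import RobertaModel, RobertaTokenizer

class LstmClassifier(nn.Module):
    """Bi-LSTM text classifier whose embedding layer is copied from RoBERTa (assumed setup)."""
    def __init__(self, roberta_name="roberta-base", hidden_size=256, num_labels=2, freeze_embeddings=True):
        super().__init__()
        roberta = RobertaModel.from_pretrained(roberta_name)
        # Copy RoBERTa's subword embedding matrix into a plain nn.Embedding.
        emb_weight = roberta.embeddings.word_embeddings.weight.detach().clone()
        self.embedding = nn.Embedding.from_pretrained(emb_weight, freeze=freeze_embeddings)
        self.lstm = nn.LSTM(emb_weight.size(1), hidden_size, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, num_labels)

    def forward(self, input_ids):
        x = self.embedding(input_ids)      # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(x)         # final hidden states of both directions
        h = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.classifier(h)

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = LstmClassifier()
batch = tokenizer(["a short example sentence"], return_tensors="pt")
logits = model(batch["input_ids"])
print(logits.shape)  # (1, num_labels)
```

The same tokenizer is reused so that the input ids index into the copied embedding matrix; whether the embeddings are frozen or fine-tuned with the LSTM is a design choice not specified in the summary.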