Paper Reading: To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks
Venue: ACL 2020

This paper presents empirical results on how the performance gap between a pretrained model (RoBERTa) and a vanilla LSTM changes with the amount of training data on text classification tasks. The authors experiment on three text classification datasets with three models: RoBERTa, an LSTM trained from scratch, and an LSTM initialized with pretrained RoBERTa embeddings. They train each model on different portions of the training data (1%, 10%, 30%, 50%, 70%, 90%) and compare the resulting performance.
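
The core of the experiment is a simple learning-curve protocol: subsample the training set at each fraction, train each model on the subset, and evaluate it on the full test set so the gap between models can be tracked as data grows. Below is a minimal sketch of that loop, not the authors' actual code; the dataset and the `train_fn`/`eval_fn` callables are hypothetical placeholders.

```python
import random

# Fractions of the training set used in the paper's experiments.
FRACTIONS = [0.01, 0.10, 0.30, 0.50, 0.70, 0.90]

def subsample(examples, fraction, seed=0):
    """Return a random subset containing `fraction` of the examples."""
    rng = random.Random(seed)
    k = max(1, int(len(examples) * fraction))
    return rng.sample(examples, k)

def learning_curve(train_examples, test_examples, train_fn, eval_fn):
    """Train on each fraction and collect (fraction, test score) pairs."""
    results = []
    for frac in FRACTIONS:
        subset = subsample(train_examples, frac)
        model = train_fn(subset)  # e.g. fine-tune RoBERTa or train an LSTM
        results.append((frac, eval_fn(model, test_examples)))
    return results
```

Running this once per model (RoBERTa, LSTM, LSTM with RoBERTa embeddings) yields the per-fraction scores needed to plot how the gap narrows as more training data becomes available.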