Paper Reading: What Does BERT Look At? An Analysis of BERT’s Attention
Code of this paper: link

High-level Summary

This paper studies the attention maps of the pre-trained BERT-base model. More specifically, it:

- explores how BERT's attention heads behave in general, finding, e.g., that:
  - some heads attend to fixed positional offsets;
  - some heads attend broadly over the whole sentence;
  - a large amount of attention attends to [SEP];
  - attention heads in the same layer often behave similarly.
- probes each attention head for specific linguistic phenomena (a minimal sketch of extracting these attention maps follows the list).
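To make the analysis concrete, here is a minimal sketch (not from the paper's own codebase) of how one could inspect BERT's attention maps using the HuggingFace transformers library; the example sentence and the per-head attention-to-[SEP] statistic are illustrative assumptions, not the paper's exact procedure.

```python
import torch
from transformers import BertModel, BertTokenizer

# Load pre-trained BERT-base and ask it to return attention weights.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

# Hypothetical example sentence; any input works.
inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer (12 for BERT-base),
# each shaped (batch, num_heads, seq_len, seq_len).
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
sep_pos = tokens.index("[SEP]")

for layer_idx, att in enumerate(outputs.attentions):
    # Average over query positions: how much each head attends to [SEP].
    to_sep = att[0, :, :, sep_pos].mean(dim=-1)  # shape: (num_heads,)
    print(f"layer {layer_idx:2d} attention to [SEP] per head:",
          [round(x, 2) for x in to_sep.tolist()])
```

Running this reproduces the kind of per-head statistics discussed above; heads whose attention mass concentrates on [SEP] show up as values close to 1.0 in the printed rows.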