Table 6: The effect of replacing BERT's original masking scheme (Subword Tokens) with different masking schemes. Results are F1 scores for QA tasks and accuracy for MNLI and QNLI on the development sets. All the models are based on bi-sequence training with NSP.
Masking Scheme    SQuAD 2.0  NewsQA  TriviaQA  Coreference  MNLI-m  QNLI  GLUE (Avg)
Subword Tokens    83.8       72.0    76.3      77.7         86.7    92.5  83.2
Whole Words       84.3       72.8    77.1      76.6         86.3    92.8  82.9
Named Entities    84.8       72.7    78.7      75.6         86.0    93.1  83.2
Noun Phrases      85.0       73.0    77.7      76.7         86.5    93.2  83.5
Geometric Spans   85.4       73.0    78.8      76.4         87.0    93.3  83.4
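The "Geometric Spans" scheme masks contiguous spans whose lengths are drawn from a clipped geometric distribution rather than masking individual subwords. The sketch below illustrates the idea; the specific parameters (Geo(p = 0.2), span length clipped at 10, 15% masking budget) are assumptions taken from the SpanBERT setup, not stated in this table.

```python
import random

def geometric_span_mask(tokens, mask_ratio=0.15, p=0.2, max_len=10, seed=0):
    """Return sorted token indices to mask, chosen as contiguous spans.

    Span lengths are sampled from a geometric distribution with success
    probability p, clipped at max_len, until roughly mask_ratio of the
    tokens are covered. Parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    budget = max(1, int(round(mask_ratio * len(tokens))))
    masked = set()
    while len(masked) < budget:
        # Sample a span length from Geo(p), clipped at max_len.
        length = 1
        while rng.random() > p and length < max_len:
            length += 1
        # Pick a random start and mask the span (truncated at sequence end).
        start = rng.randrange(len(tokens))
        for i in range(start, min(start + length, len(tokens))):
            masked.add(i)
            if len(masked) >= budget:
                break
    return sorted(masked)

if __name__ == "__main__":
    toks = ["tok%d" % i for i in range(100)]
    print(geometric_span_mask(toks))  # 15 indices, grouped into spans
```

In contrast, the "Subword Tokens" baseline would draw each masked position independently; the table suggests span-level masking helps most on span-selection tasks such as SQuAD 2.0 and TriviaQA.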