Table 6

Performance of different LMs on the MC-test dataset. “Original” indicates the original language model, and “+ UnifiedQA” indicates fine-tuning following the recipe of UnifiedQA.

Method        BART            GPT-2 large
              ACC     ECE     ACC     ECE
Original      0.295   0.225   0.272   0.244
+ UnifiedQA   0.662   0.166   0.414   0.243
+ softmax     0.658   0.097   0.434   0.177
+ margin      0.632   0.090   0.450   0.123
+ Temp.       0.632   0.064   0.450   0.067
+ XGB         0.624   0.090   0.440   0.080
+ Para.       0.624   0.084   0.436   0.104
+ Aug.        0.600   0.089   0.441   0.126
+ Combo       0.591   0.065   0.429   0.069
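
The ECE columns report expected calibration error, where lower values mean the model's confidence tracks its accuracy more closely. As a rough illustration of how such a number is typically computed, the sketch below bins predictions by confidence and takes the size-weighted average gap between accuracy and mean confidence in each bin. The function name, the choice of 10 equal-width bins, and the toy inputs are assumptions made for illustration; the table does not state the binning scheme actually used.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE (illustrative sketch; bin count is an assumption).

    confidences: predicted probabilities in [0, 1] for the chosen answers.
    correct:     booleans, True where the chosen answer was right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("need at least one prediction")

    # Assign each prediction to a bin; confidence 1.0 goes in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Sum |accuracy - mean confidence| per bin, weighted by bin size.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


# Toy example: four predictions at 0.9 confidence, three of them correct.
toy_ece = expected_calibration_error([0.9, 0.9, 0.9, 0.9],
                                     [True, True, True, False])
```

Under this reading, a row with high ACC but also high ECE describes a model that is often right yet over- or under-confident about it, which is why the table reports both columns for each method.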