Table 1: CiC-Language Generalization. NES on real-world language from the Chairs-in-Context (CiC) dataset. *SG architectures from Achlioptas et al. (2019) are the previously reported state-of-the-art method. NES+ grounds sub-events on the feature grid input. -SN indicates ShapeNet pre-trained features.

Method        Input       Listener Acc.
Majority      N/A         0.333
*SG-NoAttn    VGG16-SN    0.812 ± 0.008
*SG-Attn      VGG16-SN    0.817 ± 0.008
LSTM-Attn     VGG16-SN    0.731 ± 0.012
PoE           VGG16-SN    0.752 ± 0.009
NMN           VGG16-SN    0.763 ± 0.023
MAC           VGG16-SN    0.818 ± 0.013
NES           VGG16       0.842 ± 0.005
NES           VGG16-SN    0.856 ± 0.005
NES           Res101      0.853 ± 0.011
NES+          Res101      0.870 ± 0.009