Table 2: DM policies using the BCE loss. Development Set (top three rows), De-duplicated Test Set (middle three rows), Original Test Set (bottom three rows). * denotes a statistically significant difference (p < 0.05) from the baseline. Bold figures indicate the best-performing method(s) for each of the two metrics in each row (experimental setting).
Datasets   BASE (B)        D-SAMPLE        DATA DUPL       MFS             MFS + B         ORACLE + B
Metrics    H-F1    S-F1    H-F1    S-F1    H-F1    S-F1    H-F1    S-F1    H-F1    S-F1    H-F1    S-F1

Development Set
M2M (R)    66.9    93.3    61.9*   85.6*   65.4*   92.5    65.0*   94.9    67.1    95.6*   68.8*   94.4
M2M (M)    65.4    90.7    65.4    91.5    64.9    91.4    65.4    95.2*   66.5*   93.7*   68.3*   93.1*
M-WOZ      46.6    67.9    45.6*   66.7*   46.6    67.9    46.4    73.9*   46.6    69.0*   46.6    67.5

De-duplicated Test Set
M2M (R)    61.7    91.8    59.4*   86.6*   60.8    91.0    60.0*   95.8*   61.7    95.0*   63.8*   92.5
M2M (M)    69.8    91.0    69.9    90.8    69.7    91.1    69.4    94.7*   69.9    93.1*   72.2*   92.3*
M-WOZ      45.9    66.5    45.3    65.6    45.8    66.4    45.5    72.6*   45.9    67.9    46.0    66.4

Original Test Set
M2M (R)    65.8    93.4    60.1*   85.8*   64.7    92.6    64.8    95.9*   66.3    96.0*   67.8*   94.1
M2M (M)    71.5    92.0    71.4    92.2    71.4    92.5    71.2    95.6*   71.7    94.1*   73.8*   93.6*
M-WOZ      46.5    67.1    45.4*   65.8*   46.4    67.2    46.3    73.7*   46.6    68.6*   46.7    67.1