Table 3: DM policies using the SBCE loss. Development Set (top three rows), De-duplicated Test Set (middle three rows), Original Test Set (bottom three rows). * denotes a statistically significant difference (p < 0.05) from the baseline. Bold figures indicate the best-performing method(s) for each of the two metrics and for each row (experimental setting).
Datasets      BASE (B)      D-SAMPLE      DATA DUPL     MFS           MFS + B       ORACLE + B
Metrics       H-F1  S-F1    H-F1  S-F1    H-F1  S-F1    H-F1  S-F1    H-F1  S-F1    H-F1  S-F1

Development Set
M2M (R)       54.5  98.0    55.2  97.8    55.2  97.3    54.3  97.0    54.7  98.0    55.5  98.6
M2M (M)       56.8  97.1    57.8  96.4    56.1  96.4    59.8* 97.1    56.9  96.2    56.2  96.6
M-WOZ         43.7  77.7    43.6  77.8    43.3  77.4    47.4* 78.2    43.2  77.5    43.6  78.2

De-duplicated Test Set
M2M (R)       48.8  96.7    49.5  96.9    48.9  96.1    48.1  96.6    49.0  96.8    49.7  97.9*
M2M (M)       61.1  96.4    61.3  95.7    60.6  95.7    61.4  96.6    60.9  95.7    60.5  95.8
M-WOZ         44.1  76.4    44.1  76.8    44.2  76.6    47.2* 77.0    43.2  76.0    43.9  77.0

Original Test Set
M2M (R)       53.6  97.8    54.0  97.9    53.7  97.3    53.2  97.3    53.4  97.9    54.2  98.7*
M2M (M)       60.0  97.0    61.0  96.4    59.5  96.3    62.2* 97.1    60.0  96.3    59.5  96.5
M-WOZ         43.7  77.4    43.6  77.8    43.6  77.6    47.5* 77.8    42.9  77.0    43.4  77.8
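The caption's "best-performing method(s) per row and metric" can be recomputed directly from the Table 3 scores. The sketch below does this under a few stated assumptions. The names METHODS, SPLITS, TABLE and best_methods are hypothetical and are not taken from the paper. The significance markers (*) are dropped because they do not affect the ranking. Ties are reported together, because the caption allows more than one bold entry per row.

```python
# Scores transcribed from Table 3 (significance markers dropped).
# METHODS, SPLITS, TABLE and best_methods are illustrative names, not from the paper.
METHODS = ["BASE (B)", "D-SAMPLE", "DATA DUPL", "MFS", "MFS + B", "ORACLE + B"]
SPLITS = ["Development", "De-dup test", "Original test"]

# (split, dataset) -> one (H-F1, S-F1) pair per method, in METHODS order.
TABLE = {
    ("Development", "M2M (R)"): [(54.5, 98.0), (55.2, 97.8), (55.2, 97.3), (54.3, 97.0), (54.7, 98.0), (55.5, 98.6)],
    ("Development", "M2M (M)"): [(56.8, 97.1), (57.8, 96.4), (56.1, 96.4), (59.8, 97.1), (56.9, 96.2), (56.2, 96.6)],
    ("Development", "M-WOZ"): [(43.7, 77.7), (43.6, 77.8), (43.3, 77.4), (47.4, 78.2), (43.2, 77.5), (43.6, 78.2)],
    ("De-dup test", "M2M (R)"): [(48.8, 96.7), (49.5, 96.9), (48.9, 96.1), (48.1, 96.6), (49.0, 96.8), (49.7, 97.9)],
    ("De-dup test", "M2M (M)"): [(61.1, 96.4), (61.3, 95.7), (60.6, 95.7), (61.4, 96.6), (60.9, 95.7), (60.5, 95.8)],
    ("De-dup test", "M-WOZ"): [(44.1, 76.4), (44.1, 76.8), (44.2, 76.6), (47.2, 77.0), (43.2, 76.0), (43.9, 77.0)],
    ("Original test", "M2M (R)"): [(53.6, 97.8), (54.0, 97.9), (53.7, 97.3), (53.2, 97.3), (53.4, 97.9), (54.2, 98.7)],
    ("Original test", "M2M (M)"): [(60.0, 97.0), (61.0, 96.4), (59.5, 96.3), (62.2, 97.1), (60.0, 96.3), (59.5, 96.5)],
    ("Original test", "M-WOZ"): [(43.7, 77.4), (43.6, 77.8), (43.6, 77.6), (47.5, 77.8), (42.9, 77.0), (43.4, 77.8)],
}


def best_methods(pairs, metric_index):
    """Return every method tied for the row maximum (0 = H-F1, 1 = S-F1)."""
    scores = [pair[metric_index] for pair in pairs]
    top = max(scores)
    return [method for method, score in zip(METHODS, scores) if score == top]


# Print the best method(s) for each row and each metric.
for (split, dataset), pairs in TABLE.items():
    h_best = ", ".join(best_methods(pairs, 0))
    s_best = ", ".join(best_methods(pairs, 1))
    print(f"{split:14} {dataset:8} H-F1: {h_best:12} S-F1: {s_best}")
```

The output shows that MFS has the highest H-F1 on M-WOZ in all three settings. ORACLE + B has the highest H-F1 and S-F1 on M2M (R) in all three settings.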