Within both groups there are algorithms with significantly different test errors (without info: p-value: 0.0103, with info: p-value: , Friedman's rank sum test with Davenport correction). Pairwise comparison of results without info shows that GPC is better than GP (p-value: 0.011) and AML (p-value: 0.043) using Friedman's rank sum test with Bergman correction. For the problem instances with noise, AML produced the best result for only 1 out of 194 instances. For the instances without noise, the results of AML are similar to results of GP, GPC, and ITEA. Pairwise comparison of results with info shows that SCPR is better than the other algorithms and no statistically significant difference was found between GP, GPC and FIIT. The p-values for all pairwise tests are shown in Table 5.
Table 5: p-values for pairwise comparison of algorithms in both groups (Friedman's rank sum test with Bergman correction).
| without info | GP | GPC | ITEA | AML | with info | GP | GPC | FIIT | SCPR |
|---|---|---|---|---|---|---|---|---|---|
| GP | n/a | 0.011 | 0.325 | 0.499 | GP | n/a | 0.151 | 0.353 | 0.002 |
| GPC | 0.011 | n/a | 0.325 | 0.043 | GPC | 0.151 | n/a | 0.054 | 0.000 |
| ITEA | 0.325 | 0.325 | n/a | 0.353 | FIIT | 0.353 | 0.054 | n/a | 0.028 |
| AML | 0.499 | 0.043 | 0.353 | n/a | SCPR | 0.002 | 0.000 | 0.028 | n/a |
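
To illustrate the omnibus test used for the group-wise comparison, the following is a minimal Python sketch of Friedman's rank sum test, assuming that the "Davenport correction" refers to the Iman-Davenport F statistic. The function name `friedman_davenport`, the matrix layout (one row per problem instance, one column per algorithm), and the synthetic error values are assumptions made for this example and are not the data or code behind the reported p-values. Ties receive average ranks without a tie correction, and the pairwise post-hoc correction is not covered.

```python
import numpy as np
from scipy import stats


def friedman_davenport(errors):
    """Friedman rank sum test with the Iman-Davenport correction (sketch).

    errors: array-like of shape (n_instances, n_algorithms); lower is better.
    Returns (chi2, f_stat, p_value).
    """
    errors = np.asarray(errors, dtype=float)
    n, k = errors.shape
    # Rank the algorithms within each problem instance (1 = lowest error).
    ranks = np.apply_along_axis(stats.rankdata, 1, errors)
    mean_ranks = ranks.mean(axis=0)
    # Friedman chi-square statistic (no tie correction in this sketch).
    chi2 = 12.0 * n / (k * (k + 1)) * np.sum((mean_ranks - (k + 1) / 2.0) ** 2)
    # Iman-Davenport statistic, compared against an F distribution
    # with (k - 1) and (k - 1)(n - 1) degrees of freedom.
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)
    p_value = stats.f.sf(f_stat, k - 1, (k - 1) * (n - 1))
    return chi2, f_stat, p_value


# Hypothetical usage with synthetic test errors (20 instances, 4 algorithms).
rng = np.random.default_rng(0)
synthetic_errors = rng.random((20, 4))
chi2, f_stat, p_value = friedman_davenport(synthetic_errors)
print(f"chi2={chi2:.3f}, F={f_stat:.3f}, p={p_value:.4f}")
```

A small p-value from this test only indicates that at least one algorithm differs in rank. Identifying which pairs differ requires a post-hoc procedure such as the one reported in Table 5.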