
Nevertheless, computing 33 features repeatedly during the instance generation process described in Section 3 could be time-consuming. Therefore, we employed feature selection to reduce this set to a more manageable one. Using the data from the 6480 COCO instances, we define a dissimilarity matrix with entries $1 - \rho_{\lambda_i,\lambda_j}$, where $\rho_{\lambda_i,\lambda_j}$ is the Pearson correlation between features $\lambda_i$ and $\lambda_j$. Then, we use this dissimilarity matrix as input to a k-means clustering algorithm, such that similar features are clustered together. To determine the number of clusters, $k=8$, we use silhouette analysis. The results are shown in Table 2. We leverage our knowledge of the features to select the most suitable one from each cluster. For example, CN and $R^2_Q$ can be computed from the same model, while $E_{L25}$ is required to calculate $L_{Q25}$. Moreover, $H(Y)$ has proven to be an effective predictor of ill-conditioning (Muñoz, Kirley et al., 2015). The resulting feature vector used to summarize each instance is $\lambda = [R^2_Q \;\; CN \;\; H(Y) \;\; \xi^{(1)} \;\; \gamma(Y) \;\; E_{L25} \;\; L_{Q25} \;\; P_{KS}]^{\top}$.

Table 2:
Average silhouette value for each feature cluster obtained using correlation as the dissimilarity measure for k-means clustering.

Silhouette   Cluster members
1.000        CN
0.907        H(Y), βmin, βmax, εmax
0.534        RQ2, RL2, RLI2, RQI2, EQ10, EQ25, EL50, EQ50, FDC
0.525        LQ25, LQ10, DISP1%, Hmax, M0
0.514        γ(Y), κ(Y)
0.385        ξ(1), ξ(2), ξ(N)
0.149        EL25, EL10, LQ50
0.146        PKS, σ(1), σ(2), ET10, ET25, ET50
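The clustering step above can be sketched in a few lines. This is a minimal, stdlib-only illustration, not the authors' code: synthetic columns stand in for the 6480 COCO feature values, the feature count and $k$ are reduced for brevity, and the k-means routine is a simple Lloyd's-algorithm implementation run on the rows of the dissimilarity matrix, as the text describes.

```python
# Sketch of correlation-based feature clustering: build a dissimilarity
# matrix D with entries 1 - rho(lambda_i, lambda_j), then cluster its rows
# with k-means so that highly correlated features land in the same cluster.
# All data below is synthetic and illustrative.
import math
import random

random.seed(0)

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((v - mx) ** 2 for v in x))
    sy = math.sqrt(sum((v - my) ** 2 for v in y))
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (sx * sy)

# Synthetic "features": 0-1 are near-duplicates, 2-3 are near-duplicates,
# 4 is independent noise.
n_obs = 200
base1 = [random.gauss(0, 1) for _ in range(n_obs)]
base2 = [random.gauss(0, 1) for _ in range(n_obs)]
features = [
    base1,
    [v + random.gauss(0, 0.1) for v in base1],
    base2,
    [v + random.gauss(0, 0.1) for v in base2],
    [random.gauss(0, 1) for _ in range(n_obs)],
]

m = len(features)
# Dissimilarity matrix: d_ij = 1 - rho(lambda_i, lambda_j).
D = [[1 - pearson(features[i], features[j]) for j in range(m)] for i in range(m)]

def kmeans(rows, k, iters=50):
    """Plain Lloyd's k-means on a list of equal-length vectors."""
    centers = [rows[i][:] for i in random.sample(range(len(rows)), k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        for i, r in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])),
            )
        # Update step: recompute each center as the mean of its members.
        for c in range(k):
            members = [rows[i] for i in range(len(rows)) if labels[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Cluster the rows of the dissimilarity matrix (k=3 for this toy setup).
labels = kmeans(D, k=3)
```

In the paper's setting the same procedure is repeated over candidate values of $k$, and the $k$ with the best average silhouette value ($k=8$ in Table 2) is retained; one representative feature is then hand-picked from each cluster.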
