Cluster statistics for 12 variants of four-stage clustering. All results include Stages 1 and 4, but some pipelines do not use Stages 2 or 3; all clusterings are kmp-valid. Top: results for k = 5; bottom: results for k = 10. Stage 2 is performed using either Iterative Graclus (IG) or Recursive Graclus (RG), each run with either 0 or 2,000 local search iterations. Node coverage refers to the percentage of network nodes contained in nonsingleton clusters, and singletons refers to the number of nodes in singleton clusters. All other statistics refer to nonsingleton clusters, with the last three columns referring to the sizes of nonsingleton clusters. Specific noteworthy trends include: (a) all clusterings that use Stage 3 have node coverage above 18%; (b) all clusterings have at least one very large cluster; (c) the choice of Stage 2 method impacts maximum cluster size; and (d) setting k = 5 produces more clusters with a smaller median cluster size than setting k = 10.
Stage 2 | Stage 3 | Node coverage | Clusters (number) | Singletons (number) | Min. | Median | Max. |
---|---|---|---|---|---|---|---|
k = 5 | |||||||
No | No | 7.38% | 276 | 13,611,485 | 8 | 28.0 | 345,139 |
No | Yes | 36.63% | 276 | 9,312,583 | 15 | 265.5 | 856,623 |
IG(0) | Yes | 20.01% | 13,709 | 11,752,582 | 6 | 118.0 | 19,934 |
IG(2000) | Yes | 20.84% | 9,698 | 11,632,959 | 6 | 106.0 | 32,487 |
RG(0) | Yes | 26.27% | 2,261 | 10,835,596 | 10 | 145.0 | 579,720 |
RG(2000) | Yes | 26.09% | 3,417 | 10,861,670 | 11 | 105.0 | 578,265 |
k = 10 | |||||||
No | No | 4.22% | 119 | 14,075,787 | 12 | 85.0 | 213,670 |
No | Yes | 33.69% | 119 | 9,744,368 | 62 | 1638.0 | 964,503 |
IG(0) | Yes | 18.51% | 4,185 | 11,975,796 | 33 | 427.0 | 27,007 |
IG(2000) | Yes | 20.00% | 3,044 | 11,757,196 | 31 | 488.5 | 60,189 |
RG(0) | Yes | 27.26% | 359 | 10,689,730 | 67 | 1014.0 | 679,922 |
RG(2000) | Yes | 27.61% | 473 | 10,637,388 | 55 | 761.0 | 620,491 |
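
The column definitions in the caption can be illustrated with a minimal sketch. It assumes the clustering is available as a list of cluster sizes together with the total number of network nodes; the function name `cluster_stats` and the toy input are hypothetical and are not taken from the study.

```python
from statistics import median


def cluster_stats(cluster_sizes, total_nodes):
    """Summarize a clustering from its cluster sizes (hypothetical helper).

    Assumes at least one nonsingleton cluster. Any node that is not in a
    nonsingleton cluster is counted as a singleton, whether or not it is
    listed in cluster_sizes as a cluster of size 1.
    """
    nonsingleton = [size for size in cluster_sizes if size > 1]
    covered = sum(nonsingleton)  # nodes that sit in nonsingleton clusters
    return {
        "node_coverage_pct": 100.0 * covered / total_nodes,
        "clusters": len(nonsingleton),
        "singletons": total_nodes - covered,
        "min": min(nonsingleton),
        "median": median(nonsingleton),
        "max": max(nonsingleton),
    }


# Toy input (not data from the table): three clusters in a 100-node network.
stats = cluster_stats([8, 12, 30, 1, 1], total_nodes=100)
print(stats)
```

In this sketch, node coverage and the singleton count are complementary: together they account for every node in the network, which is why rows with higher coverage in the table report fewer singletons.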