Skip to Main Content
Table 1. 
Performance of document similarity methods in a set of 205,037 PubMed articles
Feature representation and reduction methodsMedian rank (IQR)Recall@1Recall@50
Threshold parameters 
Binary 238.5 (1–9154) 0.25 0.42 
TF 427.5 (5–10075.25) 0.19 0.37 
TF-IDF 41 (1–799.25) 0.26 0.52 
T-SVD (100 components) 
Binary 8858 (1198–34252.25) 0.05 0.10 
TF 38491.5 (4968.75–104229.25) 0.05 0.08 
TF-IDF* 2768 (203.5–24884.5) 0.07 0.17 
T-SVD (200 components) 
Binary 5522.5 (495–27377.5) 0.07 0.14 
TF 36429 (3924.75–99717) 0.05 0.09 
TF-IDF* 1513 (84.75–15572.25) 0.10 0.22 
T-SVD (400 components) 
Binary 3211.5 (188–21040.25) 0.09 0.18 
TF 31220 (2967.25–96203.5) 0.07 0.10 
TF-IDF* 720 (36–9674.25) 0.13 0.28 
T-SVD (800 components) 
Binary 1606 (41.75–15311.75) 0.13 0.26 
TF 29421 (2245.25–92871.5) 0.07 0.12 
TF-IDF* 385.5 (13–6211.25) 0.15 0.34 
T-SVD (1600 components) 
Binary 824.5 (9–12704.5) 0.17 0.33 
TF 29519.5 (1597.5–93890) 0.08 0.13 
TF-IDF* 219 (6–4145.75) 0.17 0.37 
Feature representation and reduction methodsMedian rank (IQR)Recall@1Recall@50
Threshold parameters 
Binary 238.5 (1–9154) 0.25 0.42 
TF 427.5 (5–10075.25) 0.19 0.37 
TF-IDF 41 (1–799.25) 0.26 0.52 
T-SVD (100 components) 
Binary 8858 (1198–34252.25) 0.05 0.10 
TF 38491.5 (4968.75–104229.25) 0.05 0.08 
TF-IDF* 2768 (203.5–24884.5) 0.07 0.17 
T-SVD (200 components) 
Binary 5522.5 (495–27377.5) 0.07 0.14 
TF 36429 (3924.75–99717) 0.05 0.09 
TF-IDF* 1513 (84.75–15572.25) 0.10 0.22 
T-SVD (400 components) 
Binary 3211.5 (188–21040.25) 0.09 0.18 
TF 31220 (2967.25–96203.5) 0.07 0.10 
TF-IDF* 720 (36–9674.25) 0.13 0.28 
T-SVD (800 components) 
Binary 1606 (41.75–15311.75) 0.13 0.26 
TF 29421 (2245.25–92871.5) 0.07 0.12 
TF-IDF* 385.5 (13–6211.25) 0.15 0.34 
T-SVD (1600 components) 
Binary 824.5 (9–12704.5) 0.17 0.33 
TF 29519.5 (1597.5–93890) 0.08 0.13 
TF-IDF* 219 (6–4145.75) 0.17 0.37 
*

Experiments for which results have also been included in Table 2.

IQR: interquartile range; TF: term frequency; TF-IDF: term frequency-inverse document frequency; T-SVD: truncated singular value decomposition.

Close Modal

or Create an Account

Close Modal
Close Modal