Correlation r and uncertainty-quality metrics (Cal, NLPD, and Shp) on three STS datasets (STS-B, EBMSASS, and MedSTS) and an SA rating dataset (Yelp), using SBERT and BERT sentence embeddings with various task-specific layers: Cosine similarity = cosine similarity between the vectors representing S1 and S2; LR = single-layer linear regression; Bayesian LR = Bayesian linear regression; Sparse GP Regression = sparse Gaussian process regression. Arrows mark whether higher (↑) or lower (↓) values are better. n/a indicates that the method does not produce an uncertainty estimate to which the given metric can be applied.
Model | STS-B test r ↑ | STS-B test Cal ↓ | STS-B test NLPD ↓ | STS-B test Shp ↓ | EBMSASS test r ↑ | EBMSASS test Cal ↓ | EBMSASS test NLPD ↓ | EBMSASS test Shp ↓ | MedSTS test r ↑ | MedSTS test Cal ↓ | MedSTS test NLPD ↓ | MedSTS test Shp ↓ | Yelp test r ↑ | Yelp test Cal ↓ | Yelp test NLPD ↓ | Yelp test Shp ↓ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SBERT Cosine similarity | 0.842 | n/a | n/a | n/a | 0.773 | n/a | n/a | n/a | 0.784 | n/a | n/a | n/a | — | n/a | n/a | n/a |
SBERT LR | 0.835 | n/a | n/a | n/a | 0.743 | n/a | n/a | n/a | 0.776 | n/a | n/a | n/a | 0.666 | n/a | n/a | n/a |
SBERT Bayesian LR | 0.810 | 0.046 | 0.648 | 1.632 | 0.688 | 0.443 | 1.095 | 2.156 | 0.740 | 0.101 | 0.801 | 2.092 | 0.671 | 0.019 | 0.447 | 0.753 |
SBERT Sparse GP Regression | 0.847 | 0.065 | 0.614 | 1.621 | 0.788 | 0.195 | 0.541 | 1.627 | 0.781 | 0.073 | 0.499 | 1.453 | 0.689 | 0.049 | 0.573 | 1.507 |
BERT LR | 0.868 | n/a | n/a | n/a | 0.914 | n/a | n/a | n/a | 0.858 | n/a | n/a | n/a | 0.826 | n/a | n/a | n/a |
BERT ConvLR | 0.855 | n/a | n/a | n/a | 0.922 | n/a | n/a | n/a | 0.846 | n/a | n/a | n/a | 0.822 | n/a | n/a | n/a |
BERT Bayesian LR (BBB) | 0.848 | 0.521 | | 0.005 | 0.914 | 0.669 | 1177.2 | 0.005 | 0.848 | 0.514 | 6594.3 | 0.006 | 0.827 | 0.531 | 3908.6 | 0.083 |
BERT Bayesian ConvLR (BBB) | 0.849 | 0.495 | 2061.0 | 0.015 | 0.898 | 0.618 | 327.3 | 0.010 | 0.835 | 0.506 | 1037.2 | 0.017 | 0.797 | 1.513 | 119.2 | 0.089 |
BERT LR MC dropout | 0.868 | 0.181 | 4.659 | 0.215 | 0.921 | 0.054 | 0.036 | 0.140 | 0.859 | 0.163 | 4.118 | 0.168 | 0.827 | 0.267 | 7.285 | 0.153 |
BERT ConvLR MC dropout | 0.855 | 0.202 | 5.830 | 0.209 | 0.922 | 0.093 | 2.137 | 0.085 | 0.852 | 0.219 | 6.402 | 0.146 | 0.823 | 0.291 | 8.214 | 0.150 |
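
To make the columns concrete, the following is a minimal sketch of how the cosine-similarity score, the correlation r, and NLPD could be computed from gold scores and per-example predictions. It assumes that r is Pearson's correlation and that each uncertainty-aware model outputs a Gaussian predictive mean and variance; neither assumption is confirmed by the table itself. The function names are illustrative, and Cal and Shp are omitted because their exact definitions are not given here.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two sentence-embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Pearson correlation between gold scores and predicted scores."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def gaussian_nlpd(
    y_true: Sequence[float], mean: Sequence[float], var: Sequence[float]
) -> float:
    """Average negative log predictive density.

    Assumption: the predictive distribution for example i is N(mean[i], var[i]).
    """
    total = 0.0
    for y, m, v in zip(y_true, mean, var):
        total += 0.5 * math.log(2.0 * math.pi * v) + (y - m) ** 2 / (2.0 * v)
    return total / len(y_true)
```

Under this Gaussian assumption, a model that predicts a very small variance while still making a nonzero error is penalized heavily: an error of 1.0 with a variance of 1e-4 contributes roughly 5000 to the NLPD. This is one plausible reading of rows that combine very low Shp with very high NLPD.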