Spearman rank correlation ρ between the cosine similarity of sentence representations and the gold labels on various Semantic Textual Similarity (STS) tasks under the unsupervised setting. *-NLI denotes a model additionally trained on NLI datasets. ♯ indicates results reproduced by us; § indicates results taken from Reimers and Gurevych (2019); Surrogate denotes our proposed method.
| Model | STS12 | STS13 | STS14 | STS15 | STS16 | STSb | SICK-R | Avg. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| *fully unsupervised (no human labels)* | | | | | | | | |
| Avg. GloVe embeddings§ | 55.14 | 70.66 | 59.73 | 68.25 | 63.66 | 58.02 | 53.76 | 61.32 |
| Avg. Skip-Thought embeddings§ | 57.11 | 71.98 | 61.30 | 70.13 | 65.21 | 59.42 | 55.50 | 62.95 |
| InferSent-GloVe♯ | 52.86 | 66.75 | 62.15 | 72.77 | 66.87 | 68.03 | 65.65 | 65.01 |
| Avg. BERT embeddings§ | 38.78 | 57.98 | 57.98 | 63.15 | 61.06 | 46.35 | 58.40 | 54.81 |
| BERT [CLS]♯ | 20.16 | 30.01 | 20.09 | 36.88 | 38.08 | 16.50 | 42.63 | 29.19 |
| BERTScore♯ | 54.60 | 50.11 | 57.74 | 70.79 | 64.58 | 57.58 | 51.37 | 58.11 |
| DPR♯ | 53.98 | 56.00 | 57.83 | 66.68 | 67.43 | 58.53 | 61.85 | 60.33 |
| BLEURT♯ | 70.16 | 64.97 | 57.41 | 72.91 | 70.01 | 69.81 | 58.46 | 66.25 |
| Universal Sent Encoder♯ | 64.49 | 67.80 | 64.61 | 76.83 | 73.18 | 74.92 | 76.69 | 71.22 |
| Origin | 72.41 | 74.30 | 75.45 | 78.45 | 79.93 | 78.47 | 79.49 | 76.93 |
| Surrogate (base) | 70.62 | 72.14 | 72.72 | 76.34 | 75.24 | 74.19 | 77.20 | 74.06 |
| Surrogate (large) | 71.93 | 73.74 | 73.95 | 77.01 | 76.64 | 75.32 | 77.84 | 75.20 |
| *partially supervised (human labels, but not from the same domain)* | | | | | | | | |
| InferSent-NLI♯ | 50.48 | 67.75 | 62.15 | 72.77 | 66.87 | 68.03 | 65.65 | 64.81 |
| BERT [CLS]-NLI♯ | 60.35 | 54.97 | 64.92 | 71.49 | 70.49 | 73.25 | 70.79 | 66.61 |
| BERTScore-NLI♯ | 60.89 | 54.64 | 63.96 | 74.35 | 66.67 | 65.65 | 66.01 | 64.60 |
| DPR-NLI♯ | 61.36 | 56.71 | 65.49 | 71.80 | 71.03 | 74.08 | 70.86 | 67.33 |
| BLEURT-NLI♯ | 66.40 | 68.15 | 71.98 | 79.69 | 77.86 | 77.98 | 70.92 | 73.28 |
| Universal Sent Encoder-NLI♯ | 65.55 | 67.95 | 71.47 | 80.81 | 78.70 | 78.41 | 69.31 | 73.17 |
| BERT-NLI (base)♯ | 71.07 | 76.81 | 73.29 | 79.56 | 74.58 | 77.10 | 72.65 | 75.01 |
| SBERT-NLI (base)§ | 70.97 | 76.53 | 73.19 | 79.09 | 74.30 | 77.03 | 72.91 | 74.86 |
| SRoBERTa-NLI (base)§ | 71.54 | 72.49 | 70.80 | 78.74 | 73.69 | 77.77 | 74.46 | 74.21 |
| Surrogate-NLI (base) | 74.15 | 76.50 | 72.23 | 81.24 | 78.75 | 79.32 | 78.56 | 77.25 |
| BERT-NLI (large)♯ | 71.62 | 77.40 | 72.69 | 78.61 | 75.28 | 77.83 | 72.64 | 75.15 |
| SBERT-NLI (large)§ | 72.27 | 78.46 | 74.90 | 80.99 | 76.25 | 79.23 | 73.75 | 76.55 |
| SRoBERTa-NLI (large)§ | 74.53 | 77.00 | 73.18 | 81.85 | 76.82 | 79.10 | 74.29 | 76.68 |
| Surrogate-NLI (large) | 76.98 | 79.83 | 75.15 | 83.54 | 79.32 | 80.82 | 79.64 | 79.33 |
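As a reference point, the sketch below illustrates the evaluation protocol stated in the caption: cosine similarity between the two sentence representations of each pair, scored against the gold labels with Spearman's ρ. It assumes embeddings are already available as NumPy arrays; `encode`, `sentences_a`, `sentences_b`, and `gold_scores` are hypothetical names used only for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of sentence embeddings."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return (a_norm * b_norm).sum(axis=1)

def evaluate_sts(emb1: np.ndarray, emb2: np.ndarray, gold: np.ndarray) -> float:
    """Spearman rank correlation between predicted cosine similarities
    and gold similarity labels for one STS task."""
    pred = cosine_similarity(emb1, emb2)
    rho, _ = spearmanr(pred, gold)
    return rho * 100  # the table reports rho scaled by 100

# Hypothetical usage; `encode` stands for any sentence encoder in the table.
# emb1 = encode(sentences_a); emb2 = encode(sentences_b)
# print(f"STSb Spearman rho: {evaluate_sts(emb1, emb2, gold_scores):.2f}")
```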