Table 1: STS, SA rating, and QE-DA datasets. Size = number of text pairs in the train, test, and dev splits; Range = label range. In practice, QE-DA scores are normalised by z-score.

Dataset            Size (train, test, dev)   Range     Domain
STS-B (2017)       5749, 1379, 1500          [0,5]     general
MedSTS (2018)      750, 318, —               [0,5]     clinical
N2C2-STS (2019)    1642, 412, —              [0,5]     clinical
BIOSSES (2017)     100, —, —                 [0,4]     biomedical
EBMSASS (2019)     700, 300, —               [1,5]     biomedical

Yelp (2018)        7000, 1500, 1500          [1,5]     product
PeerRead (2018)    713, 290, —               [1,5]     paper

WMT en-zh (2020)   7000, 1000, 1000          [0,100]   high-resource
WMT ru-en (2020)   7000, 1000, 1000          [0,100]   medium-resource
WMT si-en (2020)   7000, 1000, 1000          [0,100]   low-resource
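
The caption notes that QE-DA labels are z-score normalised in practice. A minimal sketch of that transformation follows, under stated assumptions: the function name `z_normalise`, the sample scores, and the use of the population standard deviation are illustrative and not taken from the source, and the caption does not say over which group of scores (for example, per annotator or per dataset) the statistics are computed.

```python
from statistics import mean, pstdev

def z_normalise(scores):
    """Map raw scores to z-scores: (x - mean) / standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (an assumption)
    if sigma == 0:
        # All scores identical: there is no spread to normalise against.
        return [0.0 for _ in scores]
    return [(x - mu) / sigma for x in scores]

# Hypothetical direct-assessment scores in the [0,100] range.
raw_da = [40, 55, 70, 85, 100]
z_da = z_normalise(raw_da)
```

After normalisation the scores have zero mean and unit standard deviation, so the nominal [0,100] range no longer applies to the values a model is trained on.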