Table 3: Results for topic segmentation and single-label classification on four WikiSection data sets. n = 718 / 464 / 3,907 / 2,507 documents. Numbers are given as Pk at sentence level, and as micro-averaged F1 and MAP at segment level. For methods without segmentation, we used newlines as segment boundaries (NL) and merged sections of the same class after prediction. Models marked with * are based on pre-trained distributional embeddings.
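WikiSection-topics, single-label classification: en_disease (27 topics), de_disease (25 topics), en_city (30 topics), de_city (27 topics)

| model configuration | segm. | en_disease Pk | en_disease F1 | en_disease MAP | de_disease Pk | de_disease F1 | de_disease MAP | en_city Pk | en_city F1 | en_city MAP | de_city Pk | de_city F1 | de_city MAP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Classification with newline prior segmentation | | | | | | | | | | | | | |
| PV>T* | NL | 35.6 | 31.7 | 47.2 | 36.0 | 29.6 | 44.5 | 22.5 | 52.9 | 63.9 | 27.2 | 42.9 | 55.5 |
| CNN>T* | NL | 31.5 | 40.4 | 55.6 | 31.6 | 38.1 | 53.7 | 13.2 | 66.3 | 76.1 | 13.7 | 63.4 | 75.0 |
| SEC>T+bow | NL | 25.8 | 54.7 | 68.4 | 25.0 | 52.7 | 66.9 | 21.0 | 43.7 | 55.3 | 20.2 | 40.5 | 52.2 |
| SEC>T+bloom | NL | 22.7 | 59.3 | 71.9 | 27.9 | 50.2 | 65.5 | 9.8 | 74.9 | 82.6 | 11.7 | 73.1 | 81.5 |
| SEC>T+emb* | NL | 22.5 | 58.7 | 71.4 | 23.6 | 50.9 | 66.8 | 10.7 | 74.1 | 82.2 | 10.7 | 74.0 | 83.0 |
| Classification and segmentation on plain text | | | | | | | | | | | | | |
| C99 | | 37.4 | n/a | n/a | 42.7 | n/a | n/a | 36.8 | n/a | n/a | 38.3 | n/a | n/a |
| TopicTiling | | 43.4 | n/a | n/a | 45.4 | n/a | n/a | 30.5 | n/a | n/a | 41.3 | n/a | n/a |
| TextSeg | | 24.3 | n/a | n/a | 35.7 | n/a | n/a | 19.3 | n/a | n/a | 27.5 | n/a | n/a |
| PV>T* | max | 43.6 | 20.4 | 36.5 | 44.3 | 19.3 | 34.6 | 31.1 | 28.1 | 43.1 | 36.4 | 20.2 | 35.5 |
| PV>T* | emd | 39.2 | 32.9 | 49.3 | 37.4 | 32.9 | 48.7 | 24.9 | 53.1 | 65.1 | 32.9 | 40.6 | 55.0 |
| CNN>T* | max | 40.1 | 26.9 | 45.0 | 40.7 | 25.2 | 43.8 | 21.9 | 42.1 | 58.7 | 21.4 | 42.1 | 59.5 |
| SEC>T+bow | max | 30.1 | 40.9 | 58.5 | 32.1 | 38.9 | 56.8 | 24.5 | 28.4 | 43.5 | 28.0 | 26.8 | 42.6 |
| SEC>T+bloom | max | 27.9 | 49.6 | 64.7 | 35.3 | 39.5 | 57.3 | 12.7 | 63.3 | 74.3 | 26.2 | 58.9 | 71.6 |
| SEC>T+bloom | emd | 29.7 | 52.8 | 67.5 | 35.3 | 44.8 | 61.6 | 16.4 | 65.8 | 77.3 | 26.0 | 65.5 | 76.7 |
| SEC>T+bloom | bemd | 26.8 | 56.6 | 70.1 | 31.7 | 47.8 | 63.7 | 14.4 | 71.6 | 80.9 | 16.8 | 70.8 | 80.1 |
| SEC>T+bloom+rank* | bemd | 26.8 | 56.7 | 68.8 | 33.1 | 44.0 | 58.5 | 15.7 | 71.1 | 79.1 | 18.0 | 66.8 | 76.1 |
| SEC>T+emb* | bemd | 26.3 | 55.8 | 69.4 | 27.5 | 48.9 | 65.1 | 15.5 | 71.6 | 81.0 | 16.2 | 71.0 | 81.1 |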
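
For readers unfamiliar with the Pk segmentation metric reported in the table, the following is a minimal sketch of how a sentence-level Pk score can be computed (lower is better). It slides a window of k sentences over the document and counts how often the reference and the predicted segmentation disagree on whether both window ends fall in the same segment. The function names `segment_ids` and `pk`, and the choice of k as half the mean reference segment length (using integer division), are assumptions made for illustration and are not taken from the source; the table reports values scaled to percent.

```python
from typing import List, Optional


def segment_ids(lengths: List[int]) -> List[int]:
    """Expand segment lengths (in sentences) into one segment index per sentence.

    Hypothetical helper: [3, 2] -> [0, 0, 0, 1, 1].
    """
    ids: List[int] = []
    for index, length in enumerate(lengths):
        ids.extend([index] * length)
    return ids


def pk(reference: List[int], hypothesis: List[int], k: Optional[int] = None) -> float:
    """Windowed segmentation error between two per-sentence segment-index lists."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must cover the same sentences")
    n = len(reference)
    if k is None:
        # Assumed convention: half the mean reference segment length.
        num_segments = len(set(reference))
        k = max(1, n // (2 * num_segments))
    if n <= k:
        raise ValueError("document is too short for the chosen window size")

    errors = 0
    for i in range(n - k):
        same_in_reference = reference[i] == reference[i + k]
        same_in_hypothesis = hypothesis[i] == hypothesis[i + k]
        if same_in_reference != same_in_hypothesis:
            errors += 1
    return errors / (n - k)


if __name__ == "__main__":
    gold = segment_ids([4, 4])        # two segments of four sentences each
    predicted = segment_ids([8])      # prediction misses the boundary
    print(round(100 * pk(gold, predicted), 1))
```

In this toy example the window size is k = 2, and two of the six windows straddle the missed boundary. A prediction identical to the reference scores 0.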