Table 1: Data set characteristics for disease (German: Krankheit) and city (German: Stadt). "Headings" denotes the number of distinct section and subsection headings among the documents. "Topics" is the number of topic labels after synset clustering. "Coverage" denotes the proportion of headings covered by topics; the remaining headings are labeled as "other".
Data set             disease           city
language             en      de        en      de
total docs           3.6k    2.3k      19.5k   12.5k
avg sents per doc    58.5    45.7      56.5    39.9
avg sects per doc    7.5     7.2       8.3     7.6
headings             8.5k    6.1k      23.0k   12.2k
topics               27      25        30      27
coverage             94.6%   89.5%     96.6%   96.1%