Table 1:
Data set characteristics for disease (German: Krankheit) and city (German: Stadt). Headings denotes the number of distinct section and subsection headings among the documents. Topics stands for the number of topic labels after synset clustering. Coverage denotes the proportion of headings covered by topics; the remaining headings are labeled as other.
Data set            disease           city
language            en      de        en      de
total docs          3.6k    2.3k      19.5k   12.5k
avg sents per doc   58.5    45.7      56.5    39.9
avg sects per doc   7.5     7.2       8.3     7.6
topics              27      25        30      27
coverage            94.6%   89.5%     96.6%   96.1%