| Data set | Tokens | Vocab. | Vocabulary population: Zipf's law, $f(r) \propto r^{-\alpha}$ | Vocabulary population: Heaps' law, $v(n) \propto n^{\beta}$ | Long memory: Ebeling's method, $m(l) \propto l^{\eta}$ | Long memory: Taylor's law, $\sigma \propto \mu^{\zeta}$ | Long memory: long-range correlation, $c(s) \propto s^{-\xi}$ |
|---|---|---|---|---|---|---|---|
| Wikitext-2 (English, Wikipedia articles) | | | | | | | |
| Preprocessed data set | 2,088,628 | 33,278 | Yes | 0.75 (0.13) | 1.33 (0.10) | 0.62 (0.15) | 0.33 (0.04) |
| Original data set | 2,088,628 | 76,617 | Yes | 0.78 (0.09) | 1.33 (0.10) | 0.65 (0.11) | 0.32 (0.03) |
| Penn Treebank (English, Wall Street Journal news articles) | | | | | | | |
| Preprocessed data set | 887,521 | 10,000 | Yes | 0.70 (0.16) | 1.23 (0.06) | 0.56 (0.14) | 0.81 (0.24) |
| Original data set | 892,008 | 89,317 | Yes | 0.83 (0.07) | 1.20 (0.05) | 0.57 (0.06) | 0.60 (0.16) |
| Shakespeare (Early Modern English, collected literary works) | | | | | | | |
| Original text | 740,706 | 83,105 | Yes | 0.79 (0.07) | 1.24 (0.09) | 0.59 (0.05) | 0.13 (0.02) |
| Hong Lou Meng (Chinese, literary work) | | | | | | | |
| Original text | 703,034 | 18,312 | Yes | 0.74 (0.14) | 1.31 (0.07) | 0.58 (0.07) | 0.39 (0.04) |
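Power-law exponents like those in the table are commonly estimated by a least-squares fit on log-log axes. The sketch below illustrates that idea for two columns, Heaps' $\beta$ and Taylor's $\zeta$, in plain Python. It is a hypothetical illustration, not the procedure that produced these numbers: the function names are invented, the fit is unweighted ordinary least squares, and the log-spaced grid of $n$ (for Heaps' law) and the segment length `window` (for Taylor's law) are free choices assumed here. It does not compute the parenthesized values. As a reference point, an i.i.d. (Poisson-like) word sequence has variance roughly equal to the mean, so $\zeta \approx 0.5$; larger values indicate that words cluster.

```python
import math
from collections import Counter

# Hypothetical helpers for illustration; not the source's estimation code.


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) regressed on log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((a - mx) ** 2 for a in lx)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx


def heaps_exponent(tokens, n_points=20):
    """Estimate beta in v(n) ∝ n^beta, where v(n) is the vocabulary size of
    the first n tokens, sampled on a log-spaced grid of n (assumed grid)."""
    seen = set()
    vocab_sizes = []  # vocab_sizes[n - 1] == v(n)
    for tok in tokens:
        seen.add(tok)
        vocab_sizes.append(len(seen))
    total = len(tokens)
    ns = sorted({max(1, round(total ** (k / n_points)))
                 for k in range(1, n_points + 1)})
    return loglog_slope(ns, [vocab_sizes[n - 1] for n in ns])


def taylor_exponent(tokens, window):
    """Estimate zeta in sigma ∝ mu^zeta. The text is cut into consecutive,
    non-overlapping segments of `window` tokens; for each word type, mu and
    sigma are the mean and population standard deviation of its count
    across segments. Words with sigma == 0 are skipped (log undefined)."""
    segments = [tokens[i:i + window]
                for i in range(0, len(tokens) - window + 1, window)]
    counts = [Counter(seg) for seg in segments]
    mus, sigmas = [], []
    for word in set(tokens):
        xs = [c[word] for c in counts]  # Counter returns 0 for absent words
        mu = sum(xs) / len(xs)
        var = sum((x - mu) ** 2 for x in xs) / len(xs)
        if var > 0:  # var > 0 implies mu > 0, so log(mu) is defined
            mus.append(mu)
            sigmas.append(math.sqrt(var))
    return loglog_slope(mus, sigmas)
```

For a whitespace-tokenized corpus, `heaps_exponent(text.split())` returns an estimate of $\beta$ and `taylor_exponent(text.split(), window=...)` returns an estimate of $\zeta$. Because $\zeta$ depends on the chosen segment length, that length should be reported alongside it.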