This sample corresponds to the original 3LB corpus (Civit and Martí 2004), which can be considered a preliminary version of AnCora-ES. The set of 817 nominalizations comprises all the nominalization occurrences in this subcorpus. Despite coming from different sources, the 100Kw subcorpus and the full 500Kw corpus are comparable, as shown in Table 1.
Table 1. Descriptive statistics of AnCora-ES and its 100Kw subset. Each cell gives the value for the subset followed by the value for the whole corpus (subset/whole).
| Measure | Min | Max | Mean | Standard deviation |
|---|---|---|---|---|
| Senses per lemma | 1/1 | 13/13 | 2.19/1.86 | 1.54/1.31 |
| Examples per lemma | 1/1 | 239/255 | 19.99/14.15 | 30.76/26.44 |
| Sentence length | 4/4 | 149/149 | 39.14/39.51 | 10.69/12.08 |
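The per-lemma rows of Table 1 are summaries of two distributions: the number of distinct senses observed for each lemma, and the number of annotated examples (occurrences) of each lemma. The following is a minimal sketch of how such rows could be computed. It assumes a hypothetical input format (one `(lemma, sense)` pair per annotated occurrence) and uses the population standard deviation, since the text does not say which estimator was used. The function names and the toy data are illustrative only and do not come from AnCora-ES.

```python
from collections import defaultdict
from statistics import mean, pstdev


def summarize(values):
    """Return Min, Max, Mean and standard deviation of a list of numbers."""
    # pstdev (population SD) is an assumption: the source does not state
    # whether a population or a sample estimator was used.
    return {
        "min": min(values),
        "max": max(values),
        "mean": round(mean(values), 2),
        "sd": round(pstdev(values), 2),
    }


def per_lemma_stats(annotations):
    """Compute the senses-per-lemma and examples-per-lemma rows.

    `annotations` is a hypothetical format: one (lemma, sense) pair per
    annotated nominalization occurrence, i.e. one pair per example.
    """
    senses = defaultdict(set)    # lemma -> set of distinct senses observed
    examples = defaultdict(int)  # lemma -> number of occurrences (examples)
    for lemma, sense in annotations:
        senses[lemma].add(sense)
        examples[lemma] += 1
    return {
        "sense/lemma": summarize([len(s) for s in senses.values()]),
        "examples/lemma": summarize(list(examples.values())),
    }


def paired_cells(subset_stats, whole_stats):
    """Format each statistic as 'subset/whole', as in the cells of Table 1."""
    return {
        row: {key: f"{subset_stats[row][key]}/{whole_stats[row][key]}"
              for key in subset_stats[row]}
        for row in subset_stats
    }


if __name__ == "__main__":
    # Invented toy data, not taken from the corpus.
    toy_subset = [("venta", "1"), ("venta", "2"), ("construcción", "1")]
    toy_whole = toy_subset + [("venta", "1"), ("destrucción", "1")]
    print(paired_cells(per_lemma_stats(toy_subset), per_lemma_stats(toy_whole)))
```

The same `summarize` helper would apply unchanged to a list of sentence lengths to produce the third row; how sentence length is counted (tokens or words) is not specified in the text.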