This sample corresponds to the original 3LB corpus (Civit and Martí 2004), which can be considered a preliminary version of AnCora-ES. The set of 817 nominalizations consists of all nominalization occurrences in this subcorpus. Although they come from different sources, the 100kw subset and the full 500kw corpus are comparable, as shown in Table 1.

Table 1

Descriptive statistics of AnCora-ES and its 100kw subset. Each cell gives the value for the subset followed by the value for the whole corpus (subset/whole).

                   Min     Max       Mean          Standard Deviation
sense/lemma        1/1     13/13     2.19/1.86     1.54/1.31
examples/lemma     1/1     239/255   19.99/14.15   30.76/26.44
sentence length    4/4     149/149   39.14/39.51   10.69/12.08
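To make the figures in Table 1 concrete, the following Python sketch shows one way such per-lemma statistics could be computed. It is not the authors' procedure: it assumes a hypothetical record format of one (lemma, sense, sentence_length) tuple per annotated occurrence, counts only senses attested in the corpus, measures sentence length per occurrence, and uses the population standard deviation. The source states none of these conventions. The hypothetical `cell` helper reproduces the subset/whole cell format of the table.

```python
from collections import defaultdict
from statistics import mean, pstdev


def describe(values):
    """Return (min, max, mean, population standard deviation) of a list of numbers."""
    return min(values), max(values), mean(values), pstdev(values)


def corpus_profile(occurrences):
    """Compute Table 1-style rows from hypothetical (lemma, sense, sentence_length) records.

    Each record is assumed to be one annotated occurrence (example) of a nominalization.
    """
    senses = defaultdict(set)    # lemma -> distinct senses attested in the corpus
    examples = defaultdict(int)  # lemma -> number of occurrences (examples)
    lengths = []                 # sentence length of every occurrence (assumed unit: tokens)
    for lemma, sense, length in occurrences:
        senses[lemma].add(sense)
        examples[lemma] += 1
        lengths.append(length)
    return {
        "sense/lemma": describe([len(s) for s in senses.values()]),
        "examples/lemma": describe(list(examples.values())),
        "sentence length": describe(lengths),
    }


def cell(subset_stat, whole_stat):
    """Format one statistic as a Table 1 cell: subset value, then whole-corpus value."""
    def fmt(x):
        # Integers print without decimals; other values are rounded to two places.
        return f"{x:g}" if float(x).is_integer() else f"{x:.2f}"
    return f"{fmt(subset_stat)}/{fmt(whole_stat)}"
```

Applying `corpus_profile` separately to the subset and to the whole corpus, then pairing each statistic with `cell`, yields rows in the layout of Table 1. Loading the actual AnCora-ES annotation into this record format is not shown and would depend on the corpus distribution.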
