Table 5. 
Descriptive statistics for the classification variables in the synthetic training and evaluation data, separated into true-negative and true-positive matches
Variable                    Same  Median      Mean   Min   Max
spell_research                              0.6368
spell_research                              0.0657
spell_hospital                              0.3745
spell_hospital                              0.1008
prop_educ                                   0.9507
prop_educ                                   0.3238
age_sub                               31   32.5199    20    91
age_sub                               36   37.8844    20   102
right_age                                   0.8996
right_age                                   0.4546
same_ror_y5                                 0.7297
same_ror_y5                                 0.0156
first_spell_before                    −6   −6.9672   −40    37
first_spell_before                   −11  −11.4541   −45    39
right_first_spell_before                    0.7112
right_first_spell_before                    0.4242

Note: Descriptive statistics on the distribution of the features used to classify true-positive matches between entries in the IEB and DNB data, computed on the synthetic training and evaluation data set. The data are split into two samples: true-positive matches based on unique name-surname combinations, and true-negative matches based on entries with the same name but a different surname. True-positive matches are indicated by “Same” = 1.
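As a minimal sketch, per-group summaries like those in the table could be computed as below, assuming each candidate match is a record carrying a 0/1 "Same" flag and one numeric value per feature. The function describe_by_group, the record layout, and the toy values are hypothetical illustrations, not the authors' code or data. For indicator and share variables the sketch reports only the mean, on the assumption that the single value shown for those rows is a mean (for a 0/1 variable, the share of ones).

```python
from statistics import mean, median


def describe_by_group(records, variable, mean_only=False):
    """Summarise `variable` separately for Same == 1 and Same == 0.

    Hypothetical layout: `records` is a list of dicts, each holding a 0/1
    "Same" flag and a numeric value under the key `variable`.
    With mean_only=True only the mean is returned; for a 0/1 indicator
    this mean equals the share of ones in the group.
    """
    summary = {}
    for same in (1, 0):
        values = [r[variable] for r in records if r["Same"] == same]
        if not values:
            continue  # skip a group with no records
        if mean_only:
            summary[same] = {"mean": mean(values)}
        else:
            summary[same] = {
                "median": median(values),
                "mean": mean(values),
                "min": min(values),
                "max": max(values),
            }
    return summary


# Hypothetical toy records, not data from the study.
toy = [
    {"Same": 1, "age_sub": 30, "right_age": 1},
    {"Same": 1, "age_sub": 34, "right_age": 1},
    {"Same": 0, "age_sub": 41, "right_age": 0},
    {"Same": 0, "age_sub": 29, "right_age": 1},
]

print(describe_by_group(toy, "age_sub"))
print(describe_by_group(toy, "right_age", mean_only=True))
```

Calling the function once per variable yields one summary per variable and group, which matches the two-rows-per-variable layout of the table.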
