Distribution of entity types across the TAB corpus, along with their corresponding identifier type (direct identifier, quasi-identifier, or no need to mask) and confidential status. The values in parentheses in the # mentions column give the proportion of all entity mentions that belong to that type, in percent. The values in parentheses in the three other columns give the percentage of mentions of that entity type that are labeled as a direct identifier, as a quasi-identifier, or as having a confidential status, respectively.
Entity type | # mentions | # direct | # quasi | # confidential |
---|---|---|---|---|
DATETIME | 53,668 (34.6) | 23 (0.04) | 48,086 (89.6) | 530 (0.99) |
ORG | 40,695 (26.3) | 20 (0.05) | 12,880 (31.6) | 866 (2.1) |
PERSON | 24,322 (15.7) | 4,182 (17.2) | 15,839 (65.1) | 413 (1.7) |
LOC | 9,982 (6.4) | 1 (0.01) | 6,908 (69.2) | 19 (0.2) |
DEM | 8,683 (5.6) | 1 (0.01) | 4,166 (48.0) | 2,278 (26.2) |
MISC | 7,044 (4.5) | 28 (0.4) | 3,437 (48.8) | 1,125 (16.0) |
CODE | 6,471 (4.2) | 2,484 (38.4) | 3,558 (55.0) | 18 (0.3) |
QUANTITY | 4,141 (2.7) | 0 (0.0) | 3,370 (81.4) | 87 (2.1) |
Total | 155,006 (100.0) | 6,739 (4.4) | 98,244 (63.4) | 5,336 (3.4) |
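
The percentages in parentheses can be recomputed from the raw counts. The following is a minimal sketch of that arithmetic, using two rows copied from the table; the names `rows`, `pct`, and `row_percentages` are illustrative and do not come from the TAB corpus tooling.

```python
# Raw counts copied from the PERSON and CODE rows of the table.
rows = {
    "PERSON": {"mentions": 24322, "direct": 4182, "quasi": 15839, "confidential": 413},
    "CODE": {"mentions": 6471, "direct": 2484, "quasi": 3558, "confidential": 18},
}

# Total number of entity mentions in the corpus (the "Total" row).
TOTAL_MENTIONS = 155006


def pct(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def row_percentages(entity_type):
    """Recompute the parenthesized values for one entity type.

    The share of the corpus is relative to all mentions; the other three
    values are relative to the mentions of this entity type only.
    """
    r = rows[entity_type]
    return {
        "share_of_corpus": pct(r["mentions"], TOTAL_MENTIONS),
        "direct": pct(r["direct"], r["mentions"]),
        "quasi": pct(r["quasi"], r["mentions"]),
        "confidential": pct(r["confidential"], r["mentions"]),
    }


if __name__ == "__main__":
    for name in rows:
        print(name, row_percentages(name))
```

Note the two different denominators: the first percentage divides by the corpus-wide total, while the other three divide by the row's own mention count, which is why the values in a row do not sum to 100.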