Property statistics, probing accuracies, and the influence of the amnesic intervention on the model’s distribution over words. dep: dependency edge identity; f-pos and c-pos: fine-grained and coarse POS tags; phrase start and phrase end: beginning and end of phrases. Rand refers to replacing our INLP-based projection with removal of an equal number of random directions from the representation. The number of iterations per task can be computed as N. dir / N. classes.
| | | dep | f-pos | c-pos | ner | phrase start | phrase end |
|---|---|---|---|---|---|---|---|
| Properties | N. dir | 738 | 585 | 264 | 133 | 36 | 22 |
| | N. classes | 41 | 45 | 12 | 19 | 2 | 2 |
| | Majority | 11.44 | 13.22 | 31.76 | 86.09 | 59.25 | 58.51 |
| Probing | Vanilla | 76.00 | 89.50 | 92.34 | 93.53 | 85.12 | 83.09 |
| LM-Acc | Vanilla | 94.12 | 94.12 | 94.12 | 94.00 | 94.00 | 94.00 |
| | Rand | 12.31 | 56.47 | 89.65 | 92.56 | 93.75 | 93.86 |
| | Selectivity | 73.78 | 92.68 | 97.26 | 96.06 | 96.96 | 96.93 |
| | Amnesic | 7.05 | 12.31 | 61.92 | 83.14 | 94.21 | 94.32 |
| LM-DKL | Rand | 8.11 | 4.61 | 0.36 | 0.08 | 0.01 | 0.01 |
| | Amnesic | 8.53 | 7.63 | 3.21 | 1.24 | 0.01 | 0.01 |
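
The caption notes that the number of iterations per task follows from N. dir / N. classes. As a minimal sketch of that arithmetic, the snippet below divides the two rows of the table for each property. The dictionary names (`n_dir`, `n_classes`, `iterations`) are illustrative assumptions, not identifiers from the original work; the values are copied from the table.

```python
# Values copied from the "N. dir" and "N. classes" rows of the table.
# Variable names are hypothetical and chosen only for this illustration.
n_dir = {
    "dep": 738,
    "f-pos": 585,
    "c-pos": 264,
    "ner": 133,
    "phrase start": 36,
    "phrase end": 22,
}
n_classes = {
    "dep": 41,
    "f-pos": 45,
    "c-pos": 12,
    "ner": 19,
    "phrase start": 2,
    "phrase end": 2,
}

iterations = {}
for task, directions in n_dir.items():
    # The caption gives iterations = N. dir / N. classes.
    quotient, remainder = divmod(directions, n_classes[task])
    # For the values in this table the division is exact.
    assert remainder == 0, f"{task}: N. dir is not a multiple of N. classes"
    iterations[task] = quotient

for task, count in iterations.items():
    print(f"{task}: {count} iterations")
```

For the values in the table this gives 18 iterations for dep, 13 for f-pos, 22 for c-pos, 7 for ner, 18 for phrase start, and 11 for phrase end.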