TACRED evaluation of the plausibility of explanations, measured as the overlap between machine explanations and human annotations. For each method, we report the higher of the two F1 scores obtained against the two human annotators.
Approach | Precision | Recall | F1 |
---|---|---|---|
Attention | 41.39 | 20.60 | 26.50 |
Saliency Mapping | 18.73 | 35.58 | 23.41 |
LIME | 14.31 | 26.03 | 18.09 |
Unsupervised Rationale | 4.73 | 69.66 | 8.30 |
SHAP | 13.86 | 22.85 | 16.79 |
CXPlain | 28.84 | 55.06 | 36.48 |
Greedy Adding | 31.59 | 33.52 | 30.16 |
Our Approach | 74.72 | 61.20 | 62.05 |
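
The overlap measure described in the caption can be sketched as follows. This is a minimal illustration and not the authors' evaluation script. It assumes that explanations and annotations are represented as sets of token indices and that scores are averaged per example before the better annotator is chosen; the caption does not state the matching granularity or the averaging scheme. The function names `overlap_prf`, `mean_prf`, and `method_score` are hypothetical.

```python
from typing import List, Sequence, Set, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, F1)


def overlap_prf(predicted: Set[int], gold: Set[int]) -> Scores:
    """Precision, recall, and F1 of one machine explanation against one human annotation."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def mean_prf(predictions: Sequence[Set[int]], golds: Sequence[Set[int]]) -> Scores:
    """Average the per-example scores (assumption: macro-averaging over examples)."""
    if not predictions or len(predictions) != len(golds):
        raise ValueError("predictions and golds must be non-empty and equally long")
    scores: List[Scores] = [overlap_prf(p, g) for p, g in zip(predictions, golds)]
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


def method_score(
    predictions: Sequence[Set[int]],
    annotator_a: Sequence[Set[int]],
    annotator_b: Sequence[Set[int]],
) -> Scores:
    """Score a method against each annotator and keep the result with the higher F1."""
    return max(
        mean_prf(predictions, annotator_a),
        mean_prf(predictions, annotator_b),
        key=lambda scores: scores[2],
    )


# Toy data: two examples, token indices chosen only for illustration.
toy_predictions = [{0, 1, 2}, {3}]
toy_annotator_a = [{1, 2}, {3, 4}]
toy_annotator_b = [{0, 1, 2}, {5}]
toy_precision, toy_recall, toy_f1 = method_score(
    toy_predictions, toy_annotator_a, toy_annotator_b
)
```

Under per-example averaging, the reported F1 need not equal the harmonic mean of the reported precision and recall, because each example contributes its own F1 before the mean is taken.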