Table 4: Evaluation results for constrained debiasing on Dial, Pan16, and Biographies. For Dial and Pan16, we evaluate the approaches on two different configurations of target and protected variables and report performance in each setting. FaRM outperforms AdS (Basu Roy Chowdhury et al., 2021) on the DP metric in all setups while achieving comparable target-task performance.

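Dial, configuration 1: target Sentiment (y), protected Race (g)

| Method | Sentiment (y) F1↑ | Sentiment (y) MDL↓ | Race (g) ΔF1↓ | Race (g) MDL↑ | Fairness DP↓ | Fairness Gap^RMS_g |
|---|---|---|---|---|---|---|
| BERTbase (pre-trained) | 63.9 | 300.7 | 10.9 | 242.6 | 0.41 | 0.20 |
| BERTbase (fine-tuned) | 76.9 | 99.0 | 18.4 | 176.2 | 0.30 | 0.14 |
| AdS | 72.9 | 56.9 | 5.2 | 290.6 | 0.43 | 0.21 |
| FaRM | 73.2 | 17.9 | 0.2 | 296.5 | 0.26 | 0.14 |

Dial, configuration 2: target Mention (y), protected Race (g)

| Method | Mention (y) F1↑ | Mention (y) MDL↓ | Race (g) ΔF1↓ | Race (g) MDL↑ | Fairness DP↓ | Fairness Gap^RMS_g |
|---|---|---|---|---|---|---|
| BERTbase (pre-trained) | 66.1 | 290.1 | 24.6 | 258.8 | 0.20 | 0.10 |
| BERTbase (fine-tuned) | 81.7 | 49.1 | 28.7 | 199.2 | 0.06 | 0.03 |
| AdS | 81.1 | 7.6 | 21.7 | 270.3 | 0.06 | 0.03 |
| FaRM | 78.8 | 3.1 | 0.3 | 324.8 | 0.06 | 0.03 |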
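The caption compares methods on the DP metric, and the table also reports Gap^RMS_g. A minimal sketch of both fairness columns follows, assuming the definitions commonly used for a binary protected attribute: DP as the sum over target classes of the absolute difference in prediction rates between the two groups, and Gap^RMS_g as the root mean square, over target classes, of the between-group difference in true positive rate. The function names, the toy data, and the requirement that every (group, class) pair occurs in the data are hypothetical choices for illustration; the paper's exact normalization may differ.

```python
import math


def demographic_parity(y_pred, groups, classes):
    """DP: sum over classes c of |P(y_hat=c | g=a) - P(y_hat=c | g=b)|.

    Assumes a binary protected attribute (exactly two group values).
    """
    a, b = sorted(set(groups))  # raises ValueError unless there are exactly two groups

    def pred_rate(g, c):
        # Fraction of group-g examples whose prediction is class c.
        preds = [p for p, gi in zip(y_pred, groups) if gi == g]
        return sum(p == c for p in preds) / len(preds)

    return sum(abs(pred_rate(a, c) - pred_rate(b, c)) for c in classes)


def gap_rms(y_true, y_pred, groups, classes):
    """Gap^RMS_g: sqrt(mean over classes c of (TPR_{a,c} - TPR_{b,c})^2).

    Assumes every (group, class) pair has at least one gold example.
    """
    a, b = sorted(set(groups))

    def tpr(g, c):
        # Recall for class c, restricted to group-g examples whose gold label is c.
        preds = [p for t, p, gi in zip(y_true, y_pred, groups) if gi == g and t == c]
        return sum(p == c for p in preds) / len(preds)

    squared_gaps = [(tpr(a, c) - tpr(b, c)) ** 2 for c in classes]
    return math.sqrt(sum(squared_gaps) / len(squared_gaps))


# Toy example with hypothetical predictions (not data from the paper).
y_true = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(demographic_parity(y_pred, groups, classes=[0, 1]))  # 1.0
print(gap_rms(y_true, y_pred, groups, classes=[0, 1]))     # 0.5
```

Both quantities are 0 when the classifier treats the two groups identically, so under this sketch lower values indicate a fairer classifier.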