Table 2: Attribute probabilities for GPT2-XL and its self-debiased variant (+SD), both with regular attribute descriptions and with keywords (kw), on the challenging subset of RealToxicityPrompts. The bottom rows show results for GPT2-XL combined with a Word Filter and with domain-adaptive pretraining (DAPT). The penultimate column shows the average probability over all attributes; the rightmost column shows perplexity (PPL) on Wikitext-2. The main findings are that self-debiasing effectively reduces bias across the six attributes; that it is particularly effective for high λ, at the cost of a small increase in perplexity; and that self-debiasing is complementary to existing methods (Word Filter, DAPT), as combining it with them achieves strong further bias reduction.
