Learning Fair Representations via Rate-Distortion Maximization

Abstract

Text representations learned by machine learning models often encode undesirable demographic information of the user. Predictive models based on these representations can rely on such information, resulting in biased decisions. We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes protected information by making representations of instances belonging to the same protected attribute class uncorrelated, using the rate-distortion function. FaRM is able to debias representations with or without a target task at hand. FaRM can also be adapted to remove information about multiple protected attributes simultaneously. Empirical evaluations show that FaRM achieves state-of-the-art performance on several datasets, and the learned representations leak significantly less protected attribute information against an attack by a non-linear probing network.


Introduction
The democratization of machine learning has led to the deployment of predictive models for critical applications like credit approval (Ghailan et al., 2016) and college application reviewing (Basu et al., 2019). It is therefore important to ensure that decisions made by these models are fair towards different demographic groups (Mehrabi et al., 2021). Fairness can be achieved by ensuring that demographic information does not get encoded in the representations used by these models (Blodgett et al., 2016; Elazar and Goldberg, 2018; Elazar et al., 2021).
However, controlling the demographic information encoded in a model's representations is a challenging task for textual data. This is because natural language text is highly indicative of an author's demographic attributes even when they are not explicitly mentioned (Koppel et al., 2002; Burger et al., 2011; Nguyen et al., 2013; Verhoeven and Daelemans, 2014; Weren et al., 2014; Rangel et al., 2016; Verhoeven et al., 2016; Blodgett et al., 2016).

Figure 1: Illustration of unconstrained debiasing using FaRM. Representations are color-coded (in blue, red and green) according to their protected attribute class. Before debiasing (left), representations within each class are similar to each other (intra-class information content is low). Debiasing enforces the within-class representations to be uncorrelated by increasing their information content.
In this work, we debias information about a protected attribute (e.g., gender, race) from textual data representations. Previous debiasing methods (Bolukbasi et al., 2016; Ravfogel et al., 2020) project representations into a subspace that does not reveal protected attribute information. These methods are only able to guard protected attributes against an attack by a linear function (Ravfogel et al., 2020). Other methods (Xie et al., 2017; Basu Roy Chowdhury et al., 2021) adversarially remove protected information while retaining information about a target attribute. However, they are difficult to train (Elazar and Goldberg, 2018) and require a target task at hand.
We present a novel debiasing technique, Fairness-aware Rate Maximization (FaRM), that removes demographic information by controlling the rate-distortion function of the learned representations. Intuitively, in order to remove information about a protected attribute from a set of representations, we want the representations from the same protected attribute class to be uncorrelated with each other. We achieve this by maximizing the number of bits (the rate-distortion) required to encode representations with the same protected attribute. Figure 1 illustrates the process. The representations are shown as points in a two-dimensional feature space, color-coded according to their protected attribute class. FaRM learns a function φ(x) such that representations of the same protected class become uncorrelated and similar to other representations, thereby making it difficult to extract information about the protected attribute from the learned representations.
We perform rate-distortion maximization based debiasing in the following setups: (a) unconstrained debiasing, where we remove information about a protected attribute g while retaining the remaining information as much as possible (e.g., debiasing gender information from word embeddings), and (b) constrained debiasing, where we retain information about a target attribute y while removing information pertaining to g (e.g., removing racial information from representations during text classification). In the unconstrained setup, the debiased representations can be used for different downstream tasks, whereas in constrained debiasing the user is interested only in the target task. For unconstrained debiasing, we evaluate FaRM on removing gender information from word embeddings and demographic information from text representations that can then be used for a downstream NLP task (we show their utility for biography and sentiment classification in our experiments). Our empirical evaluations show that representations learned using FaRM in the unconstrained setup leak significantly less protected attribute information than prior approaches against an attack by a non-linear probing network.
For constrained debiasing, FaRM achieves state-of-the-art debiasing performance on 3 datasets, and its representations guard protected attribute information significantly better than previous approaches. We also perform experiments showing that FaRM is able to remove multiple protected attributes simultaneously while guarding against intersectional group biases (Subramanian et al., 2021). To summarize, our main contributions are:

• We present Fairness-aware Rate Maximization (FaRM) for debiasing textual data representations in unconstrained and constrained setups, by controlling their rate-distortion functions.
• We empirically show that representations learned by FaRM leak significantly less protected information against a non-linear probing attack, outperforming prior approaches.
• We present two variations of FaRM for debiasing multiple protected attributes simultaneously, which are also effective against attacks probing intersectional group biases.

Related Work
Removing sensitive attributes from data representations for fair classification was initially introduced as an optimization task (Zemel et al., 2013). Subsequent works have used adversarial frameworks (Goodfellow et al., 2014) for this task (Zhang et al., 2018; Li et al., 2018; Xie et al., 2017; Elazar and Goldberg, 2018; Basu Roy Chowdhury et al., 2021). However, adversarial networks are difficult to train (Elazar and Goldberg, 2018) and cannot function without a target task at hand. Unconstrained debiasing frameworks focus on removing a protected attribute from representations without relying on a target task. Bolukbasi et al. (2016) demonstrated that GloVe embeddings encode gender information, and proposed an unconstrained debiasing framework that identifies a gender direction and neutralizes vectors along that direction. Building on this approach, Ravfogel et al. (2020) proposed INLP, a robust framework that debiases representations by iteratively identifying protected attribute subspaces and projecting representations onto the corresponding nullspaces. However, these approaches fail to guard protected information against an attack by a non-linear probing network. Dev et al. (2021) showed that nullspace projection approaches can be extended for debiasing in a constrained setup as well.
In contrast to prior works, we present a novel debiasing framework based on the principle of rate-distortion maximization. Coding rate maximization was introduced as an objective function by Ma et al. (2007) for image segmentation. It has also been used to explain feature selection by deep networks (Macdonald et al., 2019). Recently, Yu et al. (2020) proposed the maximal coding rate criterion (MCR²) based on rate-distortion theory, a representation-level objective function that can serve as an alternative to empirical risk minimization. Our work is similar to MCR² in that we learn representations using a rate-distortion framework, but instead of tuning representations for classification we remove protected attribute information from them.
Background

Our framework performs debiasing by making representations of the same protected attribute class uncorrelated. To achieve this, we leverage a principled objective function, the rate-distortion function, which measures the compactness of a set of representations. In this section, we introduce the fundamentals of rate-distortion theory.

Rate Distortion. In lossy data compression (Cover, 1999), the compactness of a random distribution is measured by the minimal number of binary bits required to encode it. A lossy coding scheme encodes a finite set of vectors Z = {z_1, ..., z_n} ∈ R^{n×d} from a distribution P(Z), such that the decoded vectors {ẑ_i} can be recovered up to a precision ε. The rate-distortion function R(Z, ε) measures the minimal number of bits per vector required to encode the sequence Z.
In case the vectors {z_i} are i.i.d. samples from a zero-mean multi-dimensional Gaussian distribution N(0, Σ), the optimal rate-distortion function is given as:

R(Z, ε) = (1/2) log det( I + (d / (n ε²)) ZᵀZ )    (1)

where (1/n) ZᵀZ = Σ̂ is the estimate of the covariance matrix Σ of the Gaussian distribution. As the eigenvalues of the matrices ZZᵀ and ZᵀZ are equal, the rate-distortion function R(Z, ε) is the same for both (Ma et al., 2007). In most setups d ≪ n, so we use ZᵀZ to compute R(Z, ε) efficiently.
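Equation 1 can be evaluated directly with numpy; the sketch below assumes vectors as rows of an n × d matrix, and the value of ε is illustrative:

```python
import numpy as np

def rate(Z, eps=0.5):
    """Rate-distortion R(Z, eps) of an n x d matrix Z (rows are vectors).

    R(Z, eps) = 1/2 * logdet(I + d / (n * eps^2) * Z^T Z).
    Uses the d x d Gram matrix Z^T Z, which is cheaper than the
    n x n matrix Z Z^T when d << n (both have the same eigenvalues).
    """
    n, d = Z.shape
    gram = (d / (n * eps ** 2)) * (Z.T @ Z)
    # slogdet is numerically safer than log(det(...))
    _, logdet = np.linalg.slogdet(np.eye(d) + gram)
    return 0.5 * logdet
```

As a sanity check, a set of near-duplicate vectors (compact, low information content) yields a much smaller rate than i.i.d. Gaussian vectors of the same shape.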
In rate-distortion theory, we need nR(Z, ε) bits to encode the n vectors of Z. The optimal codebook also depends on the data dimension d and requires dR(Z, ε) bits to encode. Therefore, a total of (n + d)R(Z, ε) bits is required to encode the sequence Z. Ma et al. (2007) showed that this provides a tight bound even in cases where the underlying distribution P(Z) is degenerate. This enables the use of this loss function for real-world data, where the underlying distribution may not be well defined.
In general, a set of compact vectors (low information content) requires a small number of bits to encode, which corresponds to a small value of R(Z, ε), and vice versa.

Rate Distortion for a mixed distribution. In general, the set of vectors Z can come from a mixture distribution (e.g., feature representations for multi-class data). The rate-distortion function can then be computed by splitting the data into multiple subsets Z = Z_1 ∪ Z_2 ∪ ... ∪ Z_k based on their distribution, and computing R(Z_i, ε) for each subset (Equation 1). To facilitate the computation, we define a membership matrix Π = {Π_j}_{j=1}^{k} as a set of k diagonal matrices that encode the membership information of each subset Z_j:

Π_j = diag(π_{1j}, π_{2j}, ..., π_{nj})    (2)

where π_{ij} ∈ [0, 1] denotes the probability of a vector z_i belonging to the j-th subset and n is the number of vectors in the sequence Z. The matrices satisfy the constraints Σ_{j=1}^{k} Π_j = I_{n×n} and π_{ij} ≥ 0. The expected number of vectors in the j-th subset Z_j is tr(Π_j), and the corresponding covariance matrix is (1/tr(Π_j)) Zᵀ Π_j Z. The overall rate-distortion function is given as:

R_c(Z, ε | Π) = Σ_{j=1}^{k} [ tr(Π_j) / (2n) ] log det( I + (d / (tr(Π_j) ε²)) Zᵀ Π_j Z )

For multi-class data, where a vector z_i can only be a member of a single class, we restrict π_{ij} ∈ {0, 1}, and the covariance matrix for the j-th subset becomes Z_jᵀ Z_j. In general, if the representations within each subset Z_j are similar to each other, they have low intra-class variance, which corresponds to a small R_c(Z, ε | Π), and vice versa.
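For the multi-class case with hard memberships, R_c reduces to a size-weighted sum of per-class rates; a minimal numpy sketch (ε is illustrative):

```python
import numpy as np

def rate_per_class(Z, labels, eps=0.5):
    """Segmented rate R_c(Z, eps | Pi) for hard class assignments.

    Each class j contributes tr(Pi_j)/(2n) * logdet(I + d/(tr(Pi_j) eps^2) Z^T Pi_j Z),
    where Pi_j is the diagonal 0/1 membership matrix of class j; for hard
    memberships, Z^T Pi_j Z equals Zj.T @ Zj over the rows of class j.
    """
    n, d = Z.shape
    total = 0.0
    for j in np.unique(labels):
        Zj = Z[labels == j]          # rows belonging to class j
        n_j = Zj.shape[0]            # tr(Pi_j) for hard memberships
        gram = (d / (n_j * eps ** 2)) * (Zj.T @ Zj)
        _, logdet = np.linalg.slogdet(np.eye(d) + gram)
        total += (n_j / (2.0 * n)) * logdet
    return total
```

When each class is a tight cluster (low intra-class variance), R_c is small; spreading the classes out increases it, which is exactly what FaRM's debiasing objective exploits.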

Fairness-Aware Rate Maximization
In this section, we describe how FaRM debiases representations in the unconstrained and constrained setups.

Unconstrained Debiasing using FaRM
In this setup, we aim to remove information about a protected attribute g from data representations X while retaining the remaining information. To achieve this, the debiased representations Z should have the following properties: (a) Intra-class Incoherence: Representations belonging to the same protected attribute class should be highly uncorrelated. This would make it difficult for a classifier to extract any information about g from the representations. (b) Maximal Informativeness: Representations should be maximally informative about the remaining information.

Algorithm 1: Unconstrained debiasing using FaRM
1: Initialize the debiasing network φ
2: for each training iteration do
3:   Z = φ(X)                          ▷ obtain representations
4:   Π_g = ConstructMatrix(G)          ▷ retrieve membership matrix using G
5:   Update φ using gradients ∇_φ J_u(Z, Π_g)
6: end for
7: Z_debiased = φ(X)                   ▷ debiased representations
8: return φ                            ▷ debiasing network
Assuming there are k protected attribute classes, we can write Z = Z_1 ∪ ... ∪ Z_k. To achieve (a), we need to ensure that the representations in a subset Z_j belonging to the same protected class are dissimilar and have a large intra-class variance. An increased intra-class variance corresponds to an increase in the number of bits needed to encode samples within each class, so the rate-distortion function R_c(Z, ε | Π_g) would be large. For (b), we want the representations Z to retain as much information from the input X as possible. Increasing the information content of Z requires a larger number of bits to encode it, which means that the rate-distortion R(Z, ε) should also be large.
FaRM achieves (a) and (b) simultaneously by maximizing the following objective function:

J_u(Z, Π_g) = R(Z, ε) + R_c(Z, ε | Π_g)    (3)

where the membership matrix Π_g is constructed using the protected attribute g (see Equation 2).
The unconstrained debiasing routine is described in Algorithm 1. We use a deep neural network φ as our feature map to obtain debiased representations z = φ(x). The objective function J_u is sensitive to the scale of the representations. Therefore, we normalize the Frobenius norm of the representations to ensure that individual input samples have an equal impact on the loss. We use layer normalization (Ba et al., 2016) to ensure that all representations have the same magnitude and lie on a sphere S^{d−1}(r) of radius r. The feature encoder φ is updated using gradients from the objective function J_u. The debiased representations are retrieved by feeding the input data X through the trained network φ. An illustration of the debiasing process in the unconstrained setup is shown in Figure 1.
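The routine above can be sketched in PyTorch. This is a minimal sketch, not the paper's exact implementation: the layer sizes (300-d inputs, 128-d outputs) and optimizer settings are illustrative, and plain L2 normalization onto a radius-r sphere stands in for the layer normalization the paper uses:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def rate(Z, eps=0.5):
    # R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z^T Z), with Z of shape n x d
    n, d = Z.shape
    return 0.5 * torch.logdet(torch.eye(d) + (d / (n * eps ** 2)) * (Z.T @ Z))

def rate_per_class(Z, g, eps=0.5):
    # R_c(Z, eps | Pi_g): size-weighted sum of per-class rates
    n, d = Z.shape
    total = Z.new_zeros(())
    for j in torch.unique(g):
        Zj = Z[g == j]
        nj = Zj.shape[0]
        total = total + (nj / (2.0 * n)) * torch.logdet(
            torch.eye(d) + (d / (nj * eps ** 2)) * (Zj.T @ Zj))
    return total

# Hypothetical encoder: 300-d inputs mapped to 128-d debiased representations.
phi = nn.Sequential(nn.Linear(300, 256), nn.ReLU(), nn.Linear(256, 128))
opt = torch.optim.SGD(phi.parameters(), lr=1e-3, momentum=0.9)

def debias_step(x, g, r=1.0):
    Z = phi(x)
    Z = r * F.normalize(Z, dim=1)          # constrain each z_i to the sphere of radius r
    J_u = rate(Z) + rate_per_class(Z, g)   # unconstrained objective J_u = R + R_c
    opt.zero_grad()
    (-J_u).backward()                      # gradient *ascent* on J_u
    opt.step()
    return J_u.item()
```

Each call to `debias_step` performs one iteration of the loop in Algorithm 1; after training, the debiased representations are obtained by a final forward pass through `phi`.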

Constrained Debiasing using FaRM
In this setup, we aim to remove information about a protected attribute g from data representations X while retaining information about a specific target attribute y. The learned representations should have the following properties: (a) Target-Class Informativeness: Representations should be maximally informative about the target task attribute y. (b) Inter-class Coherence: Representations from different protected attribute classes should be similar to each other. This would make it difficult to extract information about g from Z.
Our constrained debiasing setup is shown in Figure 3, where representations are retrieved from a feature map φ followed by a target task classifier f. In this setup, we achieve (a) by training f to predict the target class ŷ = f(z) and minimizing the cross-entropy loss CE(ŷ, y), where y is the ground-truth target label. For (b), we need to ensure that representations from different protected classes are similar and overlap in the representation space. This is achieved by maximizing the rate R_c(Z, ε | Π_g) while minimizing R(Z, ε). Maximizing R_c(Z, ε | Π_g) ensures samples in the same protected class are dissimilar and have large intra-class variance. However, simply increasing intra-class variance does not guarantee the overlap of different protected class representations, as the overall feature space can expand and the representations can still be discriminative w.r.t. g. Therefore, we also minimize R(Z, ε), ensuring that fewer bits are required to encode all representations Z, thereby making the representation space compact. This process is illustrated visually in Figure 2. The blue and red circles correspond to representations from two protected classes. The gray arrows are induced by the term R_c(Z, ε | Π_g), which encourages the representations to be dissimilar to samples in the same protected class. The green arrows, induced by R(Z, ε), try to make the representation space more compact.

Figure 3: Constrained debiasing setup using FaRM. Representation z retrieved from the feature map φ is used to predict the target label and control the rate-distortion objective function.

To achieve this objective, FaRM adds a rate-distortion based regularization constraint to the target classification loss. Overall, FaRM achieves (a) and (b) simultaneously by maximizing the following objective function:

J_c(Z, Y, Π_g) = −CE(ŷ, y) + λ ( R_c(Z, ε | Π_g) − R(Z, ε) )    (4)

where ŷ is the target prediction label, y is the ground-truth label, and λ is a hyperparameter. We select the hyperparameters using grid search and discuss the hyperparameter sensitivity of FaRM in Section 8. We follow a similar routine to obtain debiased representations in the constrained setup as shown in Algorithm 1.
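The maximized quantity, −CE(ŷ, y) + λ(R_c(Z, ε | Π_g) − R(Z, ε)), can be evaluated with numpy; this is a minimal sketch with hard protected-class memberships, and the values of ε and λ are illustrative:

```python
import numpy as np

def _rate(Z, eps=0.5):
    # R(Z, eps) = 1/2 logdet(I + d/(n eps^2) Z^T Z)
    n, d = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + (d / (n * eps ** 2)) * (Z.T @ Z))[1]

def _rate_per_class(Z, g, eps=0.5):
    # R_c(Z, eps | Pi_g) with hard memberships
    n, d = Z.shape
    total = 0.0
    for j in np.unique(g):
        Zj = Z[g == j]
        nj = Zj.shape[0]
        total += (nj / (2.0 * n)) * np.linalg.slogdet(
            np.eye(d) + (d / (nj * eps ** 2)) * (Zj.T @ Zj))[1]
    return total

def constrained_objective(Z, y_prob, y, g, lam=0.01, eps=0.5):
    """J_c = -CE(y_hat, y) + lam * (R_c(Z, eps | Pi_g) - R(Z, eps)).

    Z: n x d representations; y_prob: n x C predicted target probabilities;
    y: gold target labels; g: protected attribute labels.
    """
    n = Z.shape[0]
    ce = -np.mean(np.log(y_prob[np.arange(n), y] + 1e-12))
    return -ce + lam * (_rate_per_class(Z, g, eps) - _rate(Z, eps))
```

For fixed representations, better target predictions (lower cross-entropy) yield a strictly higher objective, while the λ-weighted term trades that off against the rate-distortion regularizer.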

Experimental Setup
In this section, we discuss the datasets, experimental setup, and metrics used for evaluating FaRM.

Datasets
We evaluate FaRM using several datasets. Among these, the DIAL and Biographies datasets are used for evaluating both constrained and unconstrained debiasing. PAN16 and GloVe embeddings are used only for constrained and unconstrained debiasing, respectively. We use the same train-test splits as prior works for all datasets.
(a) DIAL (Blodgett et al., 2016) is a Twitter-based sentiment classification dataset. Each tweet is associated with sentiment and mention labels (treated as the target attribute in constrained evaluation) and "race" information (protected attribute) of the author. The sentiment labels are "happy" or "sad" and the race categories are "African-American English" (AAE) or "Standard American English" (SAE). For GloVe embeddings, we follow the setup of Ravfogel et al. (2020) to debias the most common 150,000 GloVe word embeddings (Zhao et al., 2018). For training, we use the 7,500 most male-biased, female-biased, and neutral words (determined by the magnitude of the word vector's projection onto the gender direction, which is the largest principal component of the space of vectors formed using the difference of gendered word vector pairs).

Implementation details
We use a multi-layer neural network with ReLU non-linearity as our feature map φ in the unconstrained setup, optimized using stochastic gradient descent with a learning rate of 0.001 and momentum of 0.9. For constrained debiasing, we use BERT-base as φ and a 2-layer neural network as f, optimized using the AdamW optimizer (Loshchilov and Hutter, 2019) with a learning rate of 2×10⁻⁵. We set λ = 0.01 for all experiments. Hyperparameters were tuned on the development set of the respective datasets. Our models were trained on a single Nvidia Quadro RTX 5000 GPU.

Probing Metrics
Following Elazar and Goldberg (2018), Ravfogel et al. (2020), and Basu Roy Chowdhury et al. (2021), we evaluate the quality of our debiasing by probing the learned representations for the protected attribute g and the target attribute y. In our experiments, we probe all representations using a non-linear classifier, the MLPClassifier from the scikit-learn library (Pedregosa et al., 2011). We report the Accuracy and the Minimum Description Length (MDL) (Voita and Titov, 2020) of predicting g and y. A large MDL signifies that more effort is needed by a probing network to achieve a certain performance. Hence, we expect debiased representations to have a large MDL for the protected attribute g and a small MDL for the target attribute y. Likewise, we expect high accuracy for y and low accuracy for g.

Group Fairness Metrics
TPR-GAP. Based on the notion of equalized odds, De-Arteaga et al. (2019) introduced TPR-GAP, which measures the true positive rate (TPR) difference of a classifier between two protected groups. The TPR-GAP for a target attribute label y is:

GAP_{g,y}^{TPR} = TPR_{g,y} − TPR_{ḡ,y},   where TPR_{g,y} = P(ŷ = y | g, y)

where y is the target attribute, g is a binary protected attribute with possible values g and ḡ, and ŷ denotes the predicted target attribute. Romanov et al. (2019) proposed a single bias score for the classifier, Gap_g^{RMS}, defined as:

Gap_g^{RMS} = sqrt( (1/|Y|) Σ_{y ∈ Y} (GAP_{g,y}^{TPR})² )

where Y is the set of target attribute labels.

Demographic Parity (DP). DP measures the difference in predictions w.r.t. the protected attribute g:

DP = Σ_{y ∈ Y} | P(ŷ = y | g) − P(ŷ = y | ḡ) |
where g, ḡ are possible values of the binary protected attribute g and Y is the set of possible target attribute labels. Bickel et al. (1975) illustrated that the notions of demographic parity and equalized odds can strongly differ in real-world scenarios. For representation learning, Zhao and Gordon (2019) demonstrated an inherent tradeoff between the utility and fairness of representations. The TPR-GAP described above is not a good indicator of fairness if y and g are correlated, as debiasing would then lead to a drop in target task performance as well. For our experiments, we compare models using both metrics for completeness. However, like prior works, in some cases we observe conflicting results due to this tradeoff.

Results: Unconstrained Debiasing

We evaluate FaRM for unconstrained debiasing in three different setups: word embedding debiasing, and debiasing text representations for biography and sentiment classification. For the classification tasks, we retrieve text representations from a pretrained encoder, debias them using FaRM (without taking the target task into account), and evaluate the debiased representations by probing for y and g. In all settings, we train the feature encoder φ and evaluate the retrieved representations Z_debiased = φ(X). All tables mark the expected trend of a metric with ↑ (higher) or ↓ (lower).
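For reference, the group fairness metrics defined above can be computed directly from label arrays; a minimal numpy sketch assuming a binary protected attribute (array names are illustrative):

```python
import numpy as np

def tpr_gap(y_true, y_pred, g, y_label, g_val):
    """TPR difference for target label y_label between group g_val and its complement."""
    def tpr(mask):
        sel = mask & (y_true == y_label)
        return np.mean(y_pred[sel] == y_label) if sel.any() else 0.0
    return tpr(g == g_val) - tpr(g != g_val)

def gap_rms(y_true, y_pred, g, g_val):
    """Root-mean-square of the TPR gaps over all target labels."""
    labels = np.unique(y_true)
    gaps = [tpr_gap(y_true, y_pred, g, y, g_val) for y in labels]
    return np.sqrt(np.mean(np.square(gaps)))

def demographic_parity(y_pred, g, g_val):
    """Sum over predicted labels of |P(y_hat = y | g) - P(y_hat = y | g_bar)|."""
    return sum(abs(np.mean(y_pred[g == g_val] == y) - np.mean(y_pred[g != g_val] == y))
               for y in np.unique(y_pred))
```

For example, if one group's TPR on a label is 1.0 and the other group's is 0.5, the gap for that label is 0.5, and Gap^RMS averages such gaps over all labels before taking the square root.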

Word Embedding Debiasing
We revisit the problem of debiasing gender information from word embeddings introduced by Bolukbasi et al. (2016).
Setup. We debias gender information from GloVe embeddings using a 4-layer neural network with ReLU non-linearity as the feature map φ(x). We discuss the choice of the feature map φ in Section 8.
Results. Table 1 presents the results of debiasing word embeddings for the baseline INLP (Ravfogel et al., 2020) and FaRM. Compared with INLP, FaRM reduces the accuracy of the probing network by an absolute margin of 32.4% and achieves a steep increase in MDL. FaRM is able to guard the protected attribute against an attack by a non-linear probing network (near-random accuracy). We also report the rank of the resulting word embedding matrix. The information content of a matrix is captured by its rank (the maximal number of linearly independent columns). The increase in the rank of the resultant embedding matrix indicates that FaRM retains more information in the representations, in general, compared to INLP.
Visualization. We visualize the t-SNE (Van der Maaten and Hinton, 2008) projections of GloVe embeddings before and after debiasing in Figures 4a and 4b, respectively. Female- and male-biased word vectors are represented by red and blue dots, respectively. The figures clearly demonstrate that the gendered vectors are not separable after debiasing.
In order to quantify the improvement, we perform k-means clustering with K = 2 (one cluster per gender label) and compute the V-measure (Rosenberg and Hirschberg, 2007), a measure that quantifies the overlap between clusters. The V-measure drops from 99.9% in the original space to 0.006% using FaRM (compared to 0.31% using INLP). This indicates that debiased representations from FaRM are more difficult to disentangle. We further analyze the quality of the debiased word embeddings in Section 8.
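The V-measure used above is the harmonic mean of homogeneity and completeness, computed from the contingency table of gold classes against cluster assignments; a small numpy implementation of the standard definition:

```python
import numpy as np

def v_measure(classes, clusters):
    """V-measure (Rosenberg and Hirschberg, 2007) for hard cluster assignments."""
    c_vals, c_inv = np.unique(classes, return_inverse=True)
    k_vals, k_inv = np.unique(clusters, return_inverse=True)
    n = len(classes)
    cont = np.zeros((len(c_vals), len(k_vals)))
    np.add.at(cont, (c_inv, k_inv), 1)       # contingency table of counts
    p = cont / n                             # joint distribution p(c, k)
    pc, pk = p.sum(axis=1), p.sum(axis=0)    # marginals
    def H(q):
        q = q[q > 0]
        return -np.sum(q * np.log(q))
    h_c, h_k = H(pc), H(pk)
    # conditional entropies H(C|K) and H(K|C)
    h_ck = -np.sum(p[p > 0] * np.log((p / pk[None, :])[p > 0]))
    h_kc = -np.sum(p[p > 0] * np.log((p / pc[:, None])[p > 0]))
    hom = 1.0 if h_c == 0 else 1 - h_ck / h_c
    com = 1.0 if h_k == 0 else 1 - h_kc / h_k
    return 0.0 if hom + com == 0 else 2 * hom * com / (hom + com)
```

A perfect clustering (each cluster contains exactly one class) gives a V-measure of 1, while clusters independent of the classes give 0, so lower values indicate that the gender labels are harder to recover, as reported above.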

Biography Classification
Next, we evaluate FaRM by debiasing text representations in an unconstrained setup and using the representations for fair biography classification.
Setup. We obtain the text representations using two methods: FastText (Joulin et al., 2017) and BERT (Devlin et al., 2019). For FastText, we sum the individual token representations in each biography. For BERT, we retrieve the final-layer hidden representation above the [CLS] token from a pretrained BERT-base model. We choose the feature map φ(x) to be a 4-layer neural network with ReLU non-linearity.
Results. With the debiased embeddings, we observe a drop in the accuracy of identifying professions (the target attribute). This is possibly because, in this dataset, gender is highly correlated with profession, and removing gender information results in a loss of profession information. Zhao and Gordon (2019) identified this phenomenon, noting the tradeoff between learning fair representations and performing well on the target task when the protected and target attributes are correlated. The results in this setup (Table 2) demonstrate this phenomenon. In unconstrained debiasing, we remove information about the protected attribute from the representations without taking the target task into account; as a result, target task performance suffers from more debiasing. This calls for constrained debiasing on such datasets. In Section 7, we show that FaRM is able to retain target performance while debiasing this dataset in the constrained setup.

Controlled Sentiment Classification
Lastly, for the DIAL dataset, we perform unconstrained debiasing in a controlled setting.
Setup. Following the setup of Barrett et al. (2019), we control the proportion of protected attribute classes within each sentiment class (e.g., 80% AAE / 80% SAE; AAE and SAE are the protected class labels mentioned in Section 5.1). We train DeepMoji (Felbo et al., 2017) followed by a 1-layer MLP for sentiment classification. We retrieve representations from the DeepMoji encoder and debias them using FaRM. For debiasing, we choose the feature map φ(x) to be a 7-layer neural network with ReLU non-linearity. After debiasing, we train a non-linear MLP to investigate the quality of debiasing. We evaluate the debiasing performance of FaRM at various degrees of label imbalance.
Results. The results of this experiment are reported in Table 3. FaRM achieves the best fairness scores: an improvement in Gap_g^{RMS} (≥ 12.5%) and DP (≥ 21%) across all setups. Considering the accuracy of identifying the protected attribute (race), FaRM significantly reduces the leakage of race information, by an absolute margin of 11%-17% across the different target class splits. FaRM also achieves performance similar to INLP on sentiment (target attribute) classification. We observe that the fairness score of FaRM deteriorates with an increasing correlation between the protected and target attributes. In cases where the two are highly correlated (split = 70% and 80%), we observe a low sentiment classification accuracy (for both INLP and FaRM) compared to the original classifier. This mirrors the observation made on the Biographies dataset and shows that it is difficult to remove information about the protected attribute while retaining information about the target task when the two are highly correlated. In the constrained setup, FaRM is able to retain target performance (Section 7).
Results: Constrained Debiasing

In this section, we present the results of constrained debiasing using FaRM. For all experiments, we use a BERT-base model as φ and a 2-layer neural network with ReLU non-linearity as f (Figure 3).

Single Attribute Debiasing
In this setup, we focus on debiasing a single protected attribute g while retaining information about the target attribute y.
Setup. We conduct experiments on 3 datasets: DIAL (Blodgett et al., 2016), PAN16 (Rangel et al., 2016), and Biographies (De-Arteaga et al., 2019). We experiment with different target and protected attribute configurations in DIAL (y: Sentiment/Mention, g: Race) and PAN16 (y: Mention, g: Gender/Age). For Biographies, we use the same setup as described in Section 6.2. For the protected attribute g, we report ∆F1, the difference between the F1-score and the majority baseline. We also report the fairness metrics Gap_g^{RMS} and Demographic Parity (DP) of the learned classifier. We compare FaRM with the state-of-the-art AdS (Basu Roy Chowdhury et al., 2021), a BERT-base sequence classifier, and pre-trained BERT-base representations.

Results. Table 4 presents the results of this experiment. We observe that, in general, FaRM achieves good fairness performance while maintaining target performance. In particular, it achieves the best DP scores across all setups. On PAN16, FaRM achieves perfect fairness in terms of protected attribute probing accuracy (∆F1 = 0) with performance comparable to AdS in terms of the MDL of g. On the Biographies dataset, the task accuracy of FaRM is the same as AdS, but FaRM outperforms AdS on the fairness metrics. We also observe that for this dataset, some baselines performed very well on one (but not both) of the two fairness metrics, which can be attributed to the inherent trade-off between them (see Section 5.4). FaRM, however, achieves a good balance between the two. Overall, this shows that FaRM is able to robustly remove sensitive information about the protected attribute while achieving good target task performance.

Table 4: Evaluation results for constrained debiasing. We evaluate the approaches for two different configurations of target and protected variables and report the performance in each setting. FaRM outperforms AdS (Basu Roy Chowdhury et al., 2021) on the DP metric in all setups, while achieving comparable target task performance.

Multiple Attribute Debiasing
In this setup, we focus on debiasing multiple protected attributes g_i simultaneously, while retaining information about the target attribute y. We evaluate FaRM on the PAN16 dataset with y as Mention, g_1 as Gender, and g_2 as Age. Subramanian et al. (2021) showed that debiasing a categorical attribute can still reveal information about intersectional groups (e.g., if age (young/old) and gender (male/female) are two categorical protected attributes, then (age=old, gender=male) is an intersectional group). We report the ∆F1/MDL scores for probing intersectional groups.
Approach. We present two variations of FaRM to remove multiple attributes simultaneously in a constrained setup. Assuming there are N protected attributes, the variations are discussed below:

(a) N-partition: In this variation, we compute a membership matrix Π_{g_i} for each protected attribute g_i. We modify Equation 4 as follows:

J_c(Z, Y, {Π_{g_i}}) = −CE(ŷ, y) + λ Σ_{i=1}^{N} ( R_c(Z, ε | Π_{g_i}) − R(Z, ε) )

(b) 1-partition: Unlike the previous setup, we can consider each protected attribute g_i as an independent variable and combine them to form a single protected attribute G. For each input instance, we represent the i-th protected attribute as a one-hot vector g_i ∈ R^{|g_i|} (where |g_i| is the dimension of protected attribute g_i). The combined vector G ∈ R^{|g_1| + ... + |g_N|} is then obtained by concatenating the individual vectors g_i. Since G is a concatenation of multiple one-hot vectors, we normalize G so that all of its elements sum to 1; each element of G is therefore either 0 or 1/N. We use G to construct the partition function Π_G, which captures information about the N attributes simultaneously. The components of Π_G satisfy Σ_{j} Π_{G_j} = I_{n×n} and π_{ij} ∈ {0, 1/N}. The resultant objective function takes the same form as Equation 4, with the modified partition function: J_c(Z, Y, Π_G).
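The 1-partition construction can be sketched as follows; the function name and array layout are illustrative, not the paper's implementation:

```python
import numpy as np

def combined_membership(attr_lists, n_classes):
    """Build the normalized combined vector G and the diagonal matrices Pi_G.

    attr_lists: list of N arrays, each giving an integer protected label per
    instance; n_classes: list of |g_i|. Each instance's row of G concatenates
    N one-hot vectors scaled by 1/N, so its entries are 0 or 1/N and sum to 1.
    """
    N = len(attr_lists)
    n = len(attr_lists[0])
    G = np.zeros((n, sum(n_classes)))
    offset = 0
    for a, c in zip(attr_lists, n_classes):
        G[np.arange(n), offset + np.asarray(a)] = 1.0 / N   # normalized one-hot block
        offset += c
    # Pi_G_j = diag(G[:, j]); together they satisfy sum_j Pi_G_j = I
    Pi = [np.diag(G[:, j]) for j in range(G.shape[1])]
    return G, Pi
```

The returned matrices plug into the mixed-distribution rate R_c in place of the single-attribute membership matrices, so one partition covers all N attributes at once.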
Results. We present the results of debiasing multiple attributes in Table 5. We observe that FaRM improves upon AdS's ∆F1-scores for age and gender, with the N-partition and 1-partition setups performing equally well. The performance on the target task is comparable with AdS, although there is a slight rise in MDL. It is important to note that even though AdS performs decently well in preventing leakage about g_1 and g_2, it still leaks a significant amount of information about the intersectional groups. In both of its configurations, FaRM is able to prevent leakage of intersectional biases while considering the protected attributes independently. This shows that robustly removing information about multiple attributes helps prevent leakage about intersectional groups as well.

Table 5: Evaluation results for debiasing multiple protected attributes using FaRM. Both configurations of FaRM outperform AdS (Basu Roy Chowdhury et al., 2021) in guarding protected attribute and intersectional group biases.

Model Analysis
In this section, we present several analysis experiments to evaluate the functioning of FaRM.
Robustness to label corruption. We evaluate the robustness of FaRM by randomly sub-sampling instances from the dataset and modifying their protected attribute labels. In Figure 5a, we report the protected attribute leakage (∆F1 score) from the debiased word embeddings at varying fractions of training-set label corruption. We observe that FaRM's performance degrades as label corruption increases. This is expected: at high corruption ratios, most of the protected attribute labels are wrong, resulting in poor performance.
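The corruption procedure can be sketched with a small helper; this is a hypothetical implementation of "flip a given fraction of protected labels", and the paper's exact sub-sampling details may differ:

```python
import numpy as np

def corrupt_labels(g, frac, n_classes, seed=0):
    """Replace a random `frac` of protected labels with a different class.

    Shifting by a value in [1, n_classes) guarantees every selected label
    actually changes, so exactly int(frac * len(g)) labels are corrupted.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(g).copy()
    idx = rng.choice(len(g), size=int(frac * len(g)), replace=False)
    shift = rng.integers(1, n_classes, size=len(idx))   # never maps a label to itself
    g[idx] = (g[idx] + shift) % n_classes
    return g
```

Sweeping `frac` over a grid and re-running debiasing with the corrupted labels reproduces the kind of robustness curve reported in Figure 5a.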
In the constrained setup (Figure 5b), we observe that FaRM is able to debias protected attribute information even at high corruption ratios (note that the y-axis scales in Figures 5a and 5b are different). We believe this enhanced performance (compared to the unconstrained setup) is due to the additional supervision in the form of the target loss, which enables FaRM to learn robust representations even with corrupted protected attribute labels.

Sensitivity to λ. We measure the sensitivity of FaRM's performance w.r.t. λ (Equation 4) in the constrained setup. In Figure 6, we show the MDL of the target attribute y (in blue) and the protected attribute g (in red) for DIAL and PAN16 for different values of λ. We observe that when 10^-4 ≤ λ ≤ 1, the performance of FaRM does not change much. For λ = 10, the MDL for y is quite large, showing that the model does not converge on the target task. This is expected, as the regularization term (Equation 4) is much larger than the CE(ŷ, y) term, and boosting it further with λ = 10 makes it difficult for the target task loss to converge. Similarly, when λ ≤ 10^-5, the regularization term is much smaller than CE(ŷ, y), and there is a substantial drop in the MDL for g. However, FaRM achieves good performance over a broad spectrum of λ. Therefore, reproducing the desired results does not require extensive hyperparameter tuning.

Probing Word Embeddings. A limitation of using FaRM for debiasing word embeddings is that distances in the original embedding space are not preserved. The Mazur-Ulam theorem (Fleming and Jamison, 2003) states that a mapping φ : V → W is an isometry only if the function φ is affine. FaRM uses a non-linear feature map φ(x); therefore, distances cannot be preserved. A linear map φ(x) is also not ideal because it does not guard protected attributes against an attack by a non-linear probing network. We investigate the utility of the debiased embeddings with the following experiments: (a) Word Similarity Evaluation: We evaluate the debiased embeddings on the following datasets: SimLex-999 (Hill et al., 2015), WordSim-353 (Agirre et al., 2009a), and MTurk-771 (Halawi et al., 2012). In Table 6, we report the Spearman correlation between the gold similarity scores of word pairs and the cosine similarity scores obtained before (top row) and after (bottom row) debiasing GloVe embeddings. We observe a significant drop in correlation with the gold scores, which is expected since debiasing removes some information from the embeddings. In spite of the drop, there remains a reasonable correlation with the gold scores, indicating that FaRM retains a significant degree of semantic information. (b) Part-of-speech tagging: We evaluate the debiased embeddings on detecting POS tags in a sentence using the Universal tagset (Petrov et al., 2012). GloVe embeddings achieve an F1-score of 95.2% and FaRM achieves an F1-score of 93.0% on this task. This shows that FaRM's debiased embeddings still possess a significant amount of morphological information about the language. (c) Sentiment Classification: We perform sentiment classification using word embeddings on the IMDb movies dataset (Maas et al., 2011). GloVe embeddings achieve an accuracy of 80.9%, while the debiased embeddings achieve an accuracy of 74.6%. The drop on this task is slightly larger than for POS tagging, but FaRM still achieves reasonable performance. These experiments show that even though exact distances are not preserved by FaRM, the debiased embeddings retain relevant information useful in downstream tasks.

Evolution of loss components. We evaluate how FaRM's loss components evolve during training. In the unconstrained setup for GloVe debiasing, we track the evolution of the components R(Z, ε) (in red) and R_c(Z, ε | Π^g) (in black). In Figure 7a, we observe that both loss terms start increasing simultaneously, with their difference remaining constant in the final iterations. Next, in the constrained setup, the evolution of the target loss CE(ŷ, y) and the bias loss R(Z, ε) − R_c(Z, ε | Π^g) for the DIAL dataset is shown in Figure 7b. We observe that the bias term converges first, followed by the target loss. This is expected, as the magnitude of the rate-distortion loss is larger than the target loss, which forces the model to minimize it first.
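The two loss components tracked above are instances of the coding-rate function of Ma et al. (2007). A minimal numpy sketch of R(Z, ε) and the class-conditional R_c(Z, ε | Π^g) is given below; the distortion ε, feature dimensions, and random data are our own illustrative choices:

```python
import numpy as np

def rate(Z, eps=0.5):
    """Coding rate R(Z, eps) of features Z (d x n), per Ma et al. (2007)."""
    d, n = Z.shape
    return 0.5 * np.linalg.slogdet(np.eye(d) + d / (n * eps**2) * Z @ Z.T)[1]

def rate_per_class(Z, Pi, eps=0.5):
    """Class-conditional rate R_c(Z, eps | Pi).
    Pi is a list of diagonal membership matrices, one per protected class."""
    d, n = Z.shape
    total = 0.0
    for P in Pi:
        m = np.trace(P)  # number of instances in this class
        total += (m / (2 * n)) * np.linalg.slogdet(
            np.eye(d) + d / (m * eps**2) * Z @ P @ Z.T)[1]
    return total

rng = np.random.default_rng(0)
Z = rng.standard_normal((8, 100))
g = rng.integers(0, 2, 100)  # binary protected attribute
Pi = [np.diag((g == c).astype(float)) for c in (0, 1)]
# By concavity of logdet, the overall rate upper-bounds the
# class-conditional rate, so the bias loss R - R_c is non-negative.
assert rate(Z) >= rate_per_class(Z, Pi)
```

Maximizing R while minimizing R_c spreads representations overall but makes them uncorrelated within each protected class, which matches the behavior of the two curves in Figure 7a.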
Limitations. A limitation of FaRM is that we lack a principled approach for selecting the feature map φ(x). In the unconstrained setup, we relied on empirical observations and found that a 4-layer ReLU network sufficed for GloVe and Biographies, while a 7-layer network was required for DIAL. For the constrained setup, BERT-base proved expressive enough to perform debiasing in all setups. Future work can explore white-box network architectures (Chan et al., 2022) for debiasing.
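For concreteness, a feature map of the kind mentioned above (a 4-layer ReLU network mapping GloVe vectors to debiased representations) can be sketched as a plain numpy forward pass. The hidden widths and initialization are our assumptions; the paper only specifies the depth:

```python
import numpy as np

def relu_feature_map(x, weights):
    """Forward pass of a ReLU MLP feature map phi(x).
    `weights` is a list of (W, b) pairs; ReLU is applied after every
    layer except the last."""
    for i, (W, b) in enumerate(weights):
        x = x @ W + b
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
dims = [300, 512, 512, 512, 300]  # 4 layers from/to 300-d GloVe space (widths assumed)
weights = [(rng.standard_normal((a, b)) * np.sqrt(2 / a), np.zeros(b))
           for a, b in zip(dims[:-1], dims[1:])]
z = relu_feature_map(rng.standard_normal((5, 300)), weights)
assert z.shape == (5, 300)
```

Because the map is non-linear, it cannot be an isometry (per Mazur-Ulam), which is why distances in the original embedding space are not preserved.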

Conclusion
We proposed Fairness-aware Rate Maximization (FaRM), a novel debiasing technique based on the principle of rate-distortion maximization. FaRM is effective in removing protected information from representations in both unconstrained and constrained debiasing setups. Empirical evaluations show that FaRM outperforms prior work in debiasing representations by a large margin on several datasets. Extensive analyses show that FaRM is sample efficient and robust to label corruption and minor hyperparameter changes. Future work can focus on leveraging FaRM to achieve fairness in complex tasks such as language generation.

Ethical Considerations
In our experiments, the datasets annotate gender as a binary variable.

A Appendix
A.1 Ablation: Unconstrained Debiasing

In this section, we present an ablation of FaRM in which we utilize the protected information loss of the unconstrained setup to mimic constrained debiasing, modifying the unconstrained objective function accordingly. We evaluate this objective on the word embedding debiasing task. We obtain a gender prediction accuracy of 49.8% from the debiased embeddings, which is slightly better than the results reported in Table 1. However, the rank of the debiased embedding matrix is 2 (original dimension: 300), which shows that most of the information (including gender) has been destroyed during debiasing.
This shows the importance of maximizing the overall rate-distortion term R(Z, ε), which helps in retaining diverse information from the original embedding space.
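The rank-collapse diagnostic used above is straightforward to reproduce; the following sketch contrasts a full-rank embedding matrix with one collapsed to rank 2 (the matrix sizes and tolerance are our illustrative choices):

```python
import numpy as np

def effective_rank(E, tol=1e-6):
    """Numerical rank of an embedding matrix E (n_words x dim)."""
    return int(np.linalg.matrix_rank(E, tol=tol))

rng = np.random.default_rng(0)
full = rng.standard_normal((1000, 300))  # healthy embeddings: full rank
# Rank-2 matrix, mimicking the collapsed ablation result (rank 2, dim 300).
collapsed = rng.standard_normal((1000, 2)) @ rng.standard_normal((2, 300))
assert effective_rank(full) == 300
assert effective_rank(collapsed) == 2
```

A near-zero rank relative to the embedding dimension indicates that debiasing has destroyed most of the information, even when a probing classifier's accuracy looks favorable.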

A.2 Word Similarity Scores
We evaluate the debiased word embeddings produced by FaRM on a larger set of word similarity datasets. In Table 7, we report the similarity scores compared to GloVe embeddings. On all datasets, we observe a significant drop in correlation with the gold similarity scores, which is expected, as debiasing using FaRM does not retain the subspace structure of the representations. Nevertheless, the debiased word embeddings from FaRM still retain a significant degree of semantic information and can be useful in downstream tasks, as demonstrated in Section 8.
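The evaluation protocol behind Tables 6 and 7 (Spearman correlation between gold similarity scores and cosine similarities of embedding pairs) can be sketched with numpy alone. This toy version uses random vectors in place of real embeddings and ignores tie handling, which does not matter for continuous scores:

```python
import numpy as np

def spearman(a, b):
    """Spearman correlation: Pearson correlation of rank-transformed scores.
    (No tie correction; fine for continuous similarity scores.)"""
    ra = np.argsort(np.argsort(a)).astype(float)
    rb = np.argsort(np.argsort(b)).astype(float)
    return np.corrcoef(ra, rb)[0, 1]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
# Toy word pairs: (hypothetical) embedding vectors for each pair.
pairs = [(rng.standard_normal(50), rng.standard_normal(50)) for _ in range(20)]
gold = np.array([cosine(u, v) for u, v in pairs])
assert np.isclose(spearman(gold, gold), 1.0)  # identical rankings agree perfectly
assert np.isclose(spearman(np.array([1., 2., 3.]), np.array([3., 2., 1.])), -1.0)
```

With real data, `gold` would hold annotator similarity scores and the second argument the cosine similarities of the (debiased) embeddings.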

Figure 2 :
Figure 2: Visualization of the regularization loss in J_c for constrained debiasing. The red and blue circles represent 2D representations from two different protected classes. The gray arrows are induced by the R_c(Z, ε | Π^g) term and the green ones by the R(Z, ε) term.

Figure 4 :
Figure 4: Projections of GloVe embeddings before (left) and after (right) debiasing. Initial female- and male-biased representations are shown in red and blue, respectively.

Figure 5 :
Figure 5: Performance of FaRM with varying fractions of corrupted training-set labels in the (a) unconstrained and (b) constrained debiasing setups.

Figure 7 :
Figure 7: Loss evolution in the unconstrained setup (left), where both terms, R(Z, ε) (red) and R_c(Z, ε | Π^g) (black), start increasing simultaneously. In the constrained setup (right) with λ = 0.01, the bias loss (black) starts converging earlier than the target loss (red).

Table 4 :
Evaluation results for constrained debiasing on DIAL, PAN16 and Biographies. For DIAL and PAN16,

Table 6 :
Word similarity scores before and after debiasing GloVe embeddings using FaRM.
This may not be ideal while removing protected information about gender, which can extend beyond binary categories. Currently, we lack datasets with fine-grained gender annotation. It is important to collect such data and develop techniques that would benefit everyone in our community.

the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 427-431, Valencia, Spain. Association for Computational Linguistics.

Timothy Baldwin, and Trevor Cohn. 2018. Towards robust and privacy-preserving text representations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 25-30, Melbourne, Australia. Association for Computational Linguistics.

Ilya Loshchilov and Frank Hutter. 2019. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net.

Minh-Thang Luong, Richard Socher, and Christopher D. Manning. 2013. Better word representations with recursive neural networks for morphology. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 104-113.

Yi Ma, Harm Derksen, Wei Hong, and John Wright. 2007. Segmentation of multivariate mixed data via lossy data coding and compression. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(9):1546-1562.

Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 142-150, Portland, Oregon, USA. Association for Computational Linguistics.