On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns. For all six languages, we find that there is a statistically significant relationship. We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. We defer a deeper investigation of these relationships for future work.


Introduction
In many languages, nouns possess grammatical genders. When a noun refers to an animate object, its grammatical gender typically reflects the biological sex or gender identity of that object (Zubin and Köpcke, 1986; Corbett, 1991; Kramer, 2014). For example, in German, the word for a boss is grammatically feminine when it refers to a woman, but grammatically masculine when it refers to a man: Chefin and Chef, respectively. But inanimate nouns (i.e., nouns that refer to inanimate objects) also possess grammatical genders. Any German speaker will tell you that the word for a bridge, Brücke, is grammatically feminine, even though bridges have neither biological sexes nor gender identities. Historically, the grammatical genders of inanimate nouns have been considered more idiosyncratic and less meaningful than the grammatical genders of animate nouns (Brugmann, 1889; Bloomfield, 1933; Fox, 1990; Aikhenvald, 2000). However, some cognitive scientists have reopened this discussion by using laboratory experiments to test whether speakers of gendered languages reveal gender stereotypes (Sera et al., 1994), for example, and most famously, when choosing adjectives to describe inanimate nouns .
Although laboratory experiments are highly informative, they typically involve small sample sizes. In this paper, we therefore use large-scale corpora and tools from NLP and information theory to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns. Specifically, using large-scale corpora in six different gendered languages (German, Italian, Polish, Portuguese, Russian, and Spanish), we calculate the mutual information (MI) between the grammatical genders of inanimate nouns and the adjectives that describe them (i.e., adjectives that share a dependency arc labeled AMOD with them). MI is a measure of the mutual statistical dependence between two random variables. For all six languages, we find that the MI is statistically significant, meaning that there is a relationship.
We also test whether there are relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. For all six languages, we find that there are statistically significant relationships for the verbs that take those nouns as direct objects and as subjects. For five of the six languages, we also find that there is a statistically significant relationship for the verbs that take those nouns as indirect objects, but because of the small number of noun-verb pairs involved, we caution against reading too much into this finding.
To contextualize our findings, we test whether there are statistically significant relationships between the grammatical genders of inanimate nouns and the cases and numbers of these nouns. A priori, we do not expect to find statistically significant relationships, so these tests can be viewed as a baseline of sorts. As expected, for each of the six languages, there are no statistically significant relationships.
To provide further context, we also repeat all tests for animate nouns (a "skyline" of sorts), finding that for all six languages there is a statistically significant relationship between the grammatical genders of animate nouns and the adjectives used to describe those nouns. We also find that there are statistically significant relationships between the grammatical genders of animate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. With one exception (adjectives in Polish), these relationships have effect sizes (operationalized as normalized MI values) that are larger than the corresponding effect sizes for inanimate nouns.
We emphasize that the practical significance and implications of our findings require a deeper investigation. Most importantly, we do not investigate the characteristics of the relationships that we find. This means that we do not know whether these relationships are characterized by gender stereotypes, as argued by some cognitive scientists. We also do not engage with the ways that historical and sociopolitical factors affect the grammatical genders possessed by either animate or inanimate nouns (Fodor, 1959;Ibrahim, 2014).

Grammatical Gender
Languages lie along a continuum with respect to whether nouns possess grammatical genders. Languages with no grammatical genders, like Turkish, lie on one end of this continuum, while languages with tens of gender-like classes, like Swahili (Corbett, 1991), lie on the other. In this paper, we focus on six different gendered languages for which large-scale corpora are readily available: German, Italian, Polish, Portuguese, Russian, and Spanish, all languages of Indo-European descent. Three of these languages (Italian, Portuguese, and Spanish) have two grammatical genders, masculine and feminine, while the other three (German, Polish, and Russian) have three grammatical genders: masculine, feminine, and neuter.
All six languages exhibit gender agreement, meaning that words are marked with morphological suffixes that reflect the grammatical genders of their surrounding nouns (Corbett, 2006). For example, consider translations of an English sentence containing the phrase "the fork". Because the German word for a fork, Gabel, is grammatically feminine, the German translation uses the feminine determiner, die. Had Gabel been masculine, the German translation would have used the masculine determiner, der. Similarly, because the Spanish word for a fork, tenedor, is grammatically masculine, the Spanish translation uses the masculine determiner, el, instead of the feminine determiner, la. As we explain in Section 3, we lemmatize each corpus to ensure that our tests do not simply reflect the presence of gender agreement.

Grammatical Gender & Meaning
Although some scholars have described the grammatical genders possessed by inanimate nouns as "creative" and meaningful (Grimm, 1890; Wheeler, 1899), many scholars have considered them to be idiosyncratic (Brugmann, 1889; Bloomfield, 1933) or arbitrary (Maratsos, 1979, p. 317). In an overview of this work, Dye et al. (2017) wrote, "As often as not, the languages of the world assign [inanimate] objects into seemingly arbitrary [classes]. . . William of Ockham considered gender to be a meaningless, unnecessary aspect of language." Bloomfield (1933) shared this viewpoint, stating that "[t]here seems to be no practical criterion by which the gender of a noun in German, French, or Latin [can] be determined." Indeed, adult language learners often have particular difficulty mastering the grammatical genders of inanimate nouns (Franceschina, 2005, Ch. 4; DeKeyser, 2005; Montrul et al., 2008), which suggests that their meanings are not straightforward.
In particular, the grammatical genders possessed by inanimate nouns might affect the ways that speakers of gendered languages conceptualize the objects referred to by those nouns (Jakobson, 1959; Clarke et al., 1981; Ervin-Tripp, 1962; Konishi, 1993; Sera et al., 1994, 2002; Vigliocco et al., 2005; Bassetti, 2007), although we note that this viewpoint is somewhat contentious (Hofstätter, 1963; Bender et al., 2011; McWhorter, 2014). Neo-Whorfian cognitive scientists hold a particularly strong variant of this viewpoint, arguing that the grammatical genders possessed by inanimate nouns prompt speakers of gendered languages to rely on gender stereotypes when choosing adjectives to describe those nouns (Boroditsky and Schmidt, 2000; Boroditsky et al., 2002; Boroditsky, 2003; Semenuks et al., 2017). Most famously,  claim to have conducted a laboratory experiment showing that speakers of German choose stereotypically feminine adjectives to describe, for example, bridges, while speakers of Spanish choose stereotypically masculine adjectives, reflecting the fact that in German, the word for a bridge, Brücke, is grammatically feminine, while in Spanish, the word for a bridge, puente, is grammatically masculine.  took these findings to be a relatively strong confirmation of the existence of a stereotype effect, i.e., that speakers of gendered languages reveal gender stereotypes when choosing adjectives to describe inanimate nouns. That said, the experiment has not gone unchallenged. Indeed, Mickan et al. (2014) reported two unsuccessful replication attempts.

Laboratory Experiments vs. Corpora
Traditionally, studies of grammatical gender and meaning have relied on laboratory experiments. This is for two reasons: 1) laboratory experiments can be tightly controlled, and 2) they enable scholars to measure speakers' immediate, real-time speech production. However, they also typically involve small sample sizes and, in many cases, somewhat artificial settings. In contrast, large-scale corpora of written text enable scholars to measure even relatively weak correlations via writers' text production in natural, albeit less tightly controlled, settings. They also facilitate the discovery of correlations that hold across languages with disparate histories, cultural contexts, and even gender systems. As a result, large-scale corpora have proven useful for studying a wide variety of language-related phenomena (e.g., Featherston and Sternefeld, 2007; Kennedy, 2014).
In this paper, we assume that a writer's choice of words in written text is as informative as a speaker's choice of words in a laboratory experiment, despite the obvious differences between these settings. Consequently, we use large-scale corpora and tools from NLP and information theory, enabling us to test for the presence of even relatively weak relationships involving the grammatical genders of inanimate nouns across multiple different gendered languages. We therefore argue that our findings complement, rather than supersede, laboratory experiments.

Related Work
Our paper is not the first to use large-scale corpora and tools from NLP to investigate gender and language. Many scholars have studied the ways that societal norms and stereotypes, including gender norms and stereotypes, can be reflected in representations of distributional semantics derived from large-scale corpora, such as word embeddings (Bolukbasi et al., 2016; Caliskan et al., 2017; Garg et al., 2018; Zhao et al., 2018). More recently,  found that the grammatical genders of inanimate nouns in eighteen different languages were correlated with their lexical semantics. Dye et al. (2017) used information theory to reject the idea that the grammatical genders of nouns separate those nouns into coherent categories, arguing instead that grammatical genders are only meaningful in that they systematically facilitate communication efficiency by reducing nominal entropy. Also relevant to our paper is the work of Kann (2019), who proposed a computational approach to testing whether there is a relationship between the grammatical genders of inanimate nouns and the words that co-occur with those nouns, operationalized via word embeddings. However, in contrast to our findings, they found no evidence for the presence of such a relationship.

Data Preparation
We use the May 2018 dump of Wikipedia to create a corpus for each of the six different gendered languages (i.e., German, Italian, Polish, Portuguese, Russian, and Spanish). Although Wikipedia is not the most representative data source, this choice yields language-specific corpora that are roughly parallel: they refer to the same objects, but are not direct translations of each other (which could lead to artificial word choices). We use UDPipe 1.0 (Straka et al., 2016; http://ufal.mff.cuni.cz/udpipe) to tokenize each corpus. We dependency parse the corpus for each language using a language-specific dependency parser (Andor et al., 2016; Alberti et al., 2017), trained using Universal Dependencies treebanks (Nivre et al., 2017). An example dependency tree is shown in Figure 1. We then extract all noun-adjective pairs (dependency arcs labeled AMOD) and noun-verb pairs from each of the six corpora; for verbs, we extract three types of pairs, reflecting the fact that nouns can be direct objects (dependency arcs labeled DOBJ), indirect objects (dependency arcs labeled IOBJ), or subjects (dependency arcs labeled NSUBJ) of verbs. We discard all pairs that contain a noun that is not present in WordNet (Princeton University, 2010). We label the remaining nouns as "animate" or "inanimate" according to WordNet.
Next, we lemmatize all words (i.e., nouns, adjectives, and verbs). Each word is factored into a lemma, or canonical morphological form, and a bundle of three morphological features corresponding to the grammatical gender, number, and case of that word. For example, a given occurrence of the German word for a fork, Gabel, might be analyzed as grammatically feminine, singular, and genitive. For nouns, we discard the lemmata themselves and retain only the morphological features; for adjectives and verbs, we retain the lemmata and discard the morphological features.
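The extraction and factoring steps above can be sketched as follows. This is a minimal illustration, assuming that the parser and lemmatizer output is available in CoNLL-U format with Universal POS tags and the UD v1 relation names used in the text (amod, dobj, iobj, nsubj); the parsers and lemmatizer cited above are not reproduced, the WordNet filtering and animacy labeling are omitted, and the names `RELATIONS`, `parse_feats`, and `extract_pairs` are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical mapping from a UD v1 dependency label to the
# (dependent UPOS, head UPOS) combination that we keep.
RELATIONS = {
    "amod": ("ADJ", "NOUN"),   # adjective describing a noun
    "dobj": ("NOUN", "VERB"),  # noun as direct object of a verb
    "iobj": ("NOUN", "VERB"),  # noun as indirect object of a verb
    "nsubj": ("NOUN", "VERB"), # noun as subject of a verb
}

NounFeatures = Tuple[Optional[str], Optional[str], Optional[str]]


def parse_feats(feats: str) -> Dict[str, str]:
    """Turn a CoNLL-U FEATS string such as 'Case=Gen|Gender=Fem' into a dict."""
    if feats == "_":
        return {}
    return dict(item.split("=", 1) for item in feats.split("|"))


def extract_pairs(conllu: str) -> Iterator[Tuple[str, NounFeatures, str]]:
    """Yield (relation, (gender, number, case) of the noun, lemma of the other word)."""
    for sentence in conllu.strip().split("\n\n"):
        rows = [line.split("\t") for line in sentence.splitlines()
                if line and not line.startswith("#")]
        # Keep ordinary tokens only; skip multiword ranges ("1-2") and empty nodes ("1.1").
        tokens = {row[0]: row for row in rows if row[0].isdigit()}
        for tok in tokens.values():
            rel, head_id = tok[7], tok[6]
            if rel not in RELATIONS or head_id not in tokens:
                continue
            head = tokens[head_id]
            dep_pos, head_pos = RELATIONS[rel]
            if tok[3] != dep_pos or head[3] != head_pos:
                continue
            noun, other = (head, tok) if rel == "amod" else (tok, head)
            feats = parse_feats(noun[5])
            # Nouns keep only gender/number/case; adjectives and verbs keep only their lemma.
            yield rel, (feats.get("Gender"), feats.get("Number"), feats.get("Case")), other[2]


# A toy German sentence, "Die scharfe Gabel glänzt" ("The sharp fork gleams").
SAMPLE = "\n".join("\t".join(cols) for cols in [
    ["1", "Die", "der", "DET", "_", "Case=Nom|Gender=Fem|Number=Sing", "3", "det", "_", "_"],
    ["2", "scharfe", "scharf", "ADJ", "_", "Case=Nom|Gender=Fem|Number=Sing", "3", "amod", "_", "_"],
    ["3", "Gabel", "Gabel", "NOUN", "_", "Case=Nom|Gender=Fem|Number=Sing", "4", "nsubj", "_", "_"],
    ["4", "glänzt", "glänzen", "VERB", "_", "_", "0", "root", "_", "_"],
])

for pair in extract_pairs(SAMPLE):
    print(pair)
```

For the toy sentence, the sketch yields one noun-adjective pair (the noun's features with the lemma scharf) and one noun-verb pair (the same features with the lemma glänzen); the noun lemma Gabel itself is not kept.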
For adjectives and verbs, lemmatizing is especially important because it ensures that our tests do not simply reflect the presence of gender agreement, as we describe in Section 2.1. However, this means that if the lemmatizer fails, then our tests may simply reflect gender agreement despite our best efforts. To guard against this, we use a state-of-the-art lemmatizer (Müller et al., 2015), trained for each language using Universal Dependencies treebanks (Nivre et al., 2017). We expect that when the lemmatizer fails, the resulting lemmata will be low-frequency, so we try to exclude lemmatization failures from our calculations by discarding low-frequency lemmata. For each language, we rank the adjective lemmata by their token counts and retain the smallest set of highest-ranked lemmata that together account for 90% of the adjective tokens; we then discard all noun-adjective pairs that do not contain one of these lemmata. We repeat the same process for verbs.
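The 90% coverage cutoff described above can be sketched as follows. This is a minimal illustration under our reading of the rule (keep the shortest prefix of the frequency ranking whose cumulative token count reaches 90%); ties in the ranking are broken by first occurrence, which is an assumption, and `top_lemmata` and `filter_pairs` are hypothetical names. The same functions apply unchanged to verb lemmata.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Set, Tuple


def top_lemmata(lemmata: Iterable[str], coverage: float = 0.9) -> Set[str]:
    """Smallest set of highest-ranked lemmata whose token counts reach `coverage`."""
    counts = Counter(lemmata)
    target = coverage * sum(counts.values())
    kept: Set[str] = set()
    covered = 0
    for lemma, count in counts.most_common():  # ranked by token count, descending
        if covered >= target:
            break
        kept.add(lemma)
        covered += count
    return kept


def filter_pairs(pairs: List[Tuple[Hashable, str]], coverage: float = 0.9) -> List[Tuple[Hashable, str]]:
    """Discard (noun features, lemma) pairs whose lemma falls below the cutoff."""
    kept = top_lemmata((lemma for _, lemma in pairs), coverage)
    return [pair for pair in pairs if pair[1] in kept]


# Hypothetical toy data: ten adjective tokens; "d" is a rare (possibly mis-lemmatized) form.
pairs = [("FEM", "a")] * 5 + [("MSC", "b")] * 3 + [("FEM", "c"), ("MSC", "d")]
print(sorted(top_lemmata(lemma for _, lemma in pairs)))  # ['a', 'b', 'c']
print(len(filter_pairs(pairs)))  # 9
```

Here the lemmata a, b, and c already cover 9 of the 10 tokens, so the singleton d is discarded along with its pair.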
Finally, to ensure that our tests reflect the most salient relationships, we also discard low-frequency inanimate nouns and, separately, low-frequency animate nouns using the same process. We provide counts of the remaining noun-adjective and noun-verb pairs in Table 3 (for inanimate nouns) and Table 4 (for animate nouns).

Methodology
For each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$, we define $\mathcal{V}_{\text{ADJ}}$ to be the set of adjective lemmata represented in the noun-adjective pairs retained for that language, as described above. We similarly define $\mathcal{V}_{\text{VERB}}$ to be the set of verb lemmata represented in the noun-verb pairs retained for that language. We then define $\mathcal{V}_{\text{VERB-DOBJ}} \subseteq \mathcal{V}_{\text{VERB}}$, $\mathcal{V}_{\text{VERB-IOBJ}} \subseteq \mathcal{V}_{\text{VERB}}$, and $\mathcal{V}_{\text{VERB-SUBJ}} \subseteq \mathcal{V}_{\text{VERB}}$ to be the sets of verbs that take the nouns as direct objects, as indirect objects, and as subjects, respectively. We also define $\mathcal{G}_\ell$ to be the set of grammatical genders for that language (e.g., $\mathcal{G}_{\text{es}} = \{\text{MSC}, \text{FEM}\}$), $\mathcal{C}_\ell$ to be the set of cases (e.g., $\mathcal{C}_{\text{de}} = \{\text{NOM}, \text{ACC}, \text{GEN}, \text{DAT}\}$), and $\mathcal{N}_\ell$ to be the set of numbers (e.g., $\mathcal{N}_{\text{pt}} = \{\text{PL}, \text{SG}\}$). Finally, we define fourteen random variables: $A_i$ and $A_a$ are $\mathcal{V}_{\text{ADJ}}$-valued; $D_i$ and $D_a$ are $\mathcal{V}_{\text{VERB-DOBJ}}$-valued; $I_i$ and $I_a$ are $\mathcal{V}_{\text{VERB-IOBJ}}$-valued; $S_i$ and $S_a$ are $\mathcal{V}_{\text{VERB-SUBJ}}$-valued; $G_i$ and $G_a$ are $\mathcal{G}_\ell$-valued; $C_i$ and $C_a$ are $\mathcal{C}_\ell$-valued; and $N_i$ and $N_a$ are $\mathcal{N}_\ell$-valued. The subscripts "i" and "a" denote inanimate and animate nouns, respectively.
To test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns for language $\ell$, we calculate the mutual information (MI), a measure of the mutual statistical dependence between two random variables, between $G_i$ and $A_i$:
$$\mathrm{MI}(G_i; A_i) = \sum_{g \in \mathcal{G}_\ell} \sum_{a \in \mathcal{V}_{\text{ADJ}}} p(g, a) \log \frac{p(g, a)}{p(g)\, p(a)} \qquad (1)$$
where all probabilities are calculated with respect to inanimate nouns only. If $G_i$ and $A_i$ are independent (i.e., there is no relationship between them), then $\mathrm{MI}(G_i; A_i) = 0$; if $G_i$ and $A_i$ are maximally dependent, then $\mathrm{MI}(G_i; A_i) = \min\{H(G_i), H(A_i)\}$, where $H(G_i)$ is the entropy of $G_i$ and $H(A_i)$ is the entropy of $A_i$. For simplicity, we use plug-in (i.e., empirical) estimates for all probabilities, deferring the use of more sophisticated estimators to future work. We note that $\mathrm{MI}(G_i; A_i)$ can be calculated in $O(|\mathcal{G}_\ell| \cdot |\mathcal{V}_{\text{ADJ}}|)$ time; however, $|\mathcal{G}_\ell|$ is negligible (i.e., two or three), so the main cost is $|\mathcal{V}_{\text{ADJ}}|$.
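A plug-in estimate of the MI can be computed directly from counts of observed (gender, lemma) pairs. The sketch below is illustrative only: it assumes base-2 logarithms (the text does not specify a base), and the names `entropy` and `mutual_information` and the toy data are hypothetical. Only observed cells are visited, since terms with zero joint probability contribute nothing, which is why the cost is bounded by the number of gender-lemma combinations.

```python
import math
from collections import Counter
from typing import Hashable, Sequence, Tuple


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in (empirical) entropy, in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def mutual_information(pairs: Sequence[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in estimate of MI(G; A), in bits, from observed (gender, lemma) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    gender_counts = Counter(g for g, _ in pairs)
    lemma_counts = Counter(a for _, a in pairs)
    # Only observed (g, a) cells contribute: terms with p(g, a) = 0 are zero.
    # p(g, a) / (p(g) p(a)) simplifies to c * n / (c_g * c_a) in terms of counts.
    return sum(
        c / n * math.log2(c * n / (gender_counts[g] * lemma_counts[a]))
        for (g, a), c in joint.items()
    )


dependent = [("FEM", "lang"), ("FEM", "lang"), ("MSC", "kurz"), ("MSC", "kurz")]
independent = [("FEM", "lang"), ("FEM", "kurz"), ("MSC", "lang"), ("MSC", "kurz")]
print(mutual_information(dependent))    # 1.0, equal to H(G), the maximum here
print(mutual_information(independent))  # 0.0
```

The two toy inputs hit the two extremes mentioned above: when gender determines the lemma, the MI equals $\min\{H(G_i), H(A_i)\}$ (one bit); when they are independent, it is zero.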
To test for statistical significance, we perform a permutation test. Specifically, we permute the grammatical genders of the inanimate nouns 10,000 times and, for each permutation, recalculate the MI between $G_i$ and $A_i$ using the permuted genders. We obtain a p-value by calculating the proportion of permutations that yield a higher MI than the MI obtained using the non-permuted genders; if the p-value is less than 0.05, then we treat the relationship between $G_i$ and $A_i$ as statistically significant.
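The permutation test can be sketched as follows. This is a minimal illustration, assuming that genders are shuffled across the retained pairs (consistent with noun lemmata being discarded during preprocessing) and that the p-value is the proportion of permutations with a strictly higher MI, as described above; the function names, the seed, and the tolerance are hypothetical choices.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence, Tuple


def mutual_information(genders: Sequence[Hashable], lemmata: Sequence[Hashable]) -> float:
    """Plug-in MI (in bits) between two aligned label sequences."""
    n = len(genders)
    joint = Counter(zip(genders, lemmata))
    g_counts, a_counts = Counter(genders), Counter(lemmata)
    return sum(c / n * math.log2(c * n / (g_counts[g] * a_counts[a]))
               for (g, a), c in joint.items())


def permutation_test(genders: Sequence[Hashable], lemmata: Sequence[Hashable],
                     n_permutations: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Return (observed MI, p-value) by shuffling gender labels across pairs."""
    observed = mutual_information(genders, lemmata)
    rng = random.Random(seed)
    shuffled = list(genders)
    n_higher = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # breaks any gender-lemma association, keeps both marginals
        # Small tolerance so floating-point ties are not counted as "higher".
        if mutual_information(shuffled, lemmata) > observed + 1e-12:
            n_higher += 1
    return observed, n_higher / n_permutations


# Hypothetical toy data: gender perfectly predicts the adjective lemma.
genders = ["FEM"] * 50 + ["MSC"] * 50
lemmata = ["lang"] * 50 + ["kurz"] * 50
mi, p = permutation_test(genders, lemmata, n_permutations=1_000)
print(mi, p)  # p < 0.05, so this relationship would count as significant
```

A common, slightly more conservative variant counts permutations whose MI is at least as high as the observed value and adds one to both the numerator and the denominator; the sketch keeps the strict comparison described in the text.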
Because the maximum possible MI between any pair of random variables depends on the entropies of those variables, MI values are not comparable across pairs of random variables. We therefore also calculate the normalized MI (NMI) between $G_i$ and $A_i$ by normalizing $\mathrm{MI}(G_i; A_i)$ to lie between zero and one. The most obvious choice of normalizer is the maximum possible MI, i.e., $\min\{H(G_i), H(A_i)\}$; however, various other normalizers have been proposed, each with different advantages and disadvantages (Gates et al., 2019). We therefore calculate six different variants of $\mathrm{NMI}(G_i; A_i)$ using the normalizers in Eq. (2) through Eq. (7), where $M_i$ is the number of non-unique (inanimate) noun-adjective pairs retained for that language.
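To show how the choice of normalizer changes the NMI, the sketch below divides the plug-in MI by five normalizers that are common in the clustering-comparison literature: the minimum, geometric mean, arithmetic mean, and maximum of the two entropies, and the joint entropy. Whether these coincide with the normalizers in Eq. (2) through Eq. (7) is an assumption; the normalizer involving $M_i$ is not included. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in entropy, in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def nmi_variants(genders: Sequence[Hashable], lemmata: Sequence[Hashable]) -> Dict[str, float]:
    """MI(G; A) divided by several common normalizers (an assumed, partial list)."""
    h_g, h_a = entropy(genders), entropy(lemmata)
    h_joint = entropy(list(zip(genders, lemmata)))
    mi = h_g + h_a - h_joint  # identity: MI(G; A) = H(G) + H(A) - H(G, A)
    normalizers = {
        "min": min(h_g, h_a),  # the maximum possible MI
        "geometric": math.sqrt(h_g * h_a),
        "arithmetic": (h_g + h_a) / 2,
        "max": max(h_g, h_a),
        "joint": h_joint,
    }
    # Skip degenerate normalizers (e.g., when a variable takes only one value).
    return {name: mi / z for name, z in normalizers.items() if z > 0}


genders = ["FEM", "FEM", "MSC", "MSC"]
lemmata = ["a", "b", "c", "c"]
for name, value in nmi_variants(genders, lemmata).items():
    print(f"{name:>10}: {value:.3f}")
```

Because every normalizer is at least as large as the MI, each variant lies between zero and one, but the same underlying dependence can receive noticeably different scores; this is why the text reports results across all normalizers rather than a single one.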
To test whether there are relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects, we calculate $\mathrm{MI}(G_i; D_i)$, $\mathrm{MI}(G_i; I_i)$, and $\mathrm{MI}(G_i; S_i)$. Again, all probabilities are calculated with respect to inanimate nouns only, and we perform permutation tests to test for statistical significance. We also calculate six NMI variants for each of the three pairs of random variables, using normalizers that are analogous to those in Eq. (2) through Eq. (7).
As a baseline, we test whether there are relationships between the grammatical genders of inanimate nouns and the cases and numbers of those nouns, i.e., we calculate $\mathrm{MI}(G_i; C_i)$ and $\mathrm{MI}(G_i; N_i)$ using probabilities that are calculated with respect to inanimate nouns only. Again, we perform permutation tests (although we do not expect to find statistically significant relationships), and we calculate six NMI variants for each pair of random variables using normalizers that are analogous to those in Eq. (2) through Eq. (7).
Finally, we calculate $\mathrm{MI}(G_a; A_a)$, $\mathrm{MI}(G_a; D_a)$, $\mathrm{MI}(G_a; I_a)$, $\mathrm{MI}(G_a; S_a)$, $\mathrm{MI}(G_a; C_a)$, and $\mathrm{MI}(G_a; N_a)$ using probabilities calculated with respect to animate nouns only. The first four of these are intended to serve as a "skyline," while the last two are intended to serve as a sanity check (i.e., we expect them to be close to zero, as with inanimate nouns). Again, we perform permutation tests to test for statistical significance, and we calculate six NMI variants for each pair of random variables.

Results
In the first row of Table 1, we provide the MI between $G_i$ and $A_i$ for each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. For all six languages, $\mathrm{MI}(G_i; A_i)$ is statistically significant (i.e., p < 0.05), meaning that there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns. Rows 2-4 of Table 1 contain $\mathrm{MI}(G_i; D_i)$, $\mathrm{MI}(G_i; I_i)$, and $\mathrm{MI}(G_i; S_i)$ for each language. For all six languages, $\mathrm{MI}(G_i; D_i)$ and $\mathrm{MI}(G_i; S_i)$ are statistically significant (i.e., p < 0.05). For five of the six languages, $\mathrm{MI}(G_i; I_i)$ is statistically significant, but because of the small number of noun-verb pairs involved, we caution against reading too much into this finding. We note that direct objects are closest to verbs in analyses of constituent structures, followed by subjects and then indirect objects (Chomsky, 1957; Adger, 2003). Finally, the last two rows of Table 1 contain $\mathrm{MI}(G_i; C_i)$ and $\mathrm{MI}(G_i; N_i)$, respectively, for each language. We do not find any statistically significant relationships for either case or number.
To facilitate comparisons, each subplot in Figure 2 contains six variants of $\mathrm{NMI}(G_i; A_i)$, $\mathrm{NMI}(G_i; D_i)$, and $\mathrm{NMI}(G_i; S_i)$, calculated using normalizers that are analogous to those in Eq. (2) through Eq. (7), for a single language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. (We omit $\mathrm{NMI}(G_i; I_i)$ from each plot because of the small number of noun-verb pairs involved.) For $\ell \in \{\text{it}, \text{pl}, \text{pt}, \text{es}\}$, $\mathrm{NMI}(G_i; A_i)$ is larger than $\mathrm{NMI}(G_i; D_i)$ and $\mathrm{NMI}(G_i; S_i)$, regardless of the normalizer. For $\ell \in \{\text{it}, \text{pl}\}$, $\mathrm{NMI}(G_i; S_i)$ is larger than $\mathrm{NMI}(G_i; D_i)$; $\mathrm{NMI}(G_i^{\text{pt}}; D_i^{\text{pt}})$ is larger than $\mathrm{NMI}(G_i^{\text{pt}}; S_i^{\text{pt}})$; and $\mathrm{NMI}(G_i^{\text{es}}; D_i^{\text{es}})$ and $\mathrm{NMI}(G_i^{\text{es}}; S_i^{\text{es}})$ are roughly comparable, again, all regardless of the normalizer. Meanwhile, are all roughly comparable for the other five normalizers. Finally, $\mathrm{NMI}(G_i^{\text{ru}}; A_i^{\text{ru}})$ and $\mathrm{NMI}(G_i^{\text{ru}}; D_i^{\text{ru}})$ are roughly comparable and larger than $\mathrm{NMI}(G_i^{\text{ru}}; S_i^{\text{ru}})$, regardless of the normalizer. In other words, the relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns is generally stronger than, but sometimes roughly comparable to, the relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects and as subjects. However, the relative strengths of the relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects and as subjects vary depending on the language.
In Table 2, we provide $\mathrm{MI}(G_a; A_a)$, $\mathrm{MI}(G_a; D_a)$, $\mathrm{MI}(G_a; I_a)$, $\mathrm{MI}(G_a; S_a)$, $\mathrm{MI}(G_a; C_a)$, and $\mathrm{MI}(G_a; N_a)$ for each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. As with inanimate nouns, we find that there is a statistically significant relationship between the grammatical genders of animate nouns and the adjectives used to describe those nouns. We also find that there are statistically significant relationships between the grammatical genders of animate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. Again, the relationship for the verbs that take those nouns as indirect objects involves a small number of noun-verb pairs. As expected, we do not find any statistically significant relationships for either case or number.
Each subplot in Figure 3 contains six variants of $\mathrm{NMI}(G_a; A_a)$, $\mathrm{NMI}(G_a; D_a)$, and $\mathrm{NMI}(G_a; S_a)$, calculated using normalizers that are analogous to those in Eq. (2) through Eq. (7), for a single language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. (As with inanimate nouns, we omit $\mathrm{NMI}(G_a; I_a)$ from each plot because of the small number of noun-verb pairs involved.) For $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{es}\}$, $\mathrm{NMI}(G_a; A_a)$ is larger than $\mathrm{NMI}(G_a; D_a)$ and $\mathrm{NMI}(G_a; S_a)$, regardless of the normalizer. For $\ell \in \{\text{it}, \text{pl}\}$, $\mathrm{NMI}(G_a; S_a)$ is larger than $\mathrm{NMI}(G_a; D_a)$; for $\ell \in \{\text{de}, \text{pt}\}$, $\mathrm{NMI}(G_a; D_a)$ is larger than $\mathrm{NMI}(G_a; S_a)$; and $\mathrm{NMI}(G_a^{\text{es}}; D_a^{\text{es}})$ and $\mathrm{NMI}(G_a^{\text{es}}; S_a^{\text{es}})$ are roughly comparable, again, all regardless of the normalizer. Meanwhile, $\mathrm{NMI}(G_a^{\text{ru}}; A_a^{\text{ru}})$ is larger than $\mathrm{NMI}(G_a^{\text{ru}}; D_a^{\text{ru}})$, which is larger than $\mathrm{NMI}(G_a^{\text{ru}}; S_a^{\text{ru}})$, for the normalizers in Eq. (2) and Eq. (3), while $\mathrm{NMI}(G_a^{\text{ru}}; A_a^{\text{ru}})$ and $\mathrm{NMI}(G_a^{\text{ru}}; D_a^{\text{ru}})$ are roughly comparable and larger than $\mathrm{NMI}(G_a^{\text{ru}}; S_a^{\text{ru}})$ for the other five normalizers.
Figure 2: The normalized mutual information (NMI) between the grammatical genders of inanimate nouns and a) the adjectives used to describe those nouns and b) the verbs that take those nouns as direct objects and as subjects, for six different gendered languages. Each subplot contains six variants of $\mathrm{NMI}(G_i; A_i)$, $\mathrm{NMI}(G_i; D_i)$, and $\mathrm{NMI}(G_i; S_i)$, one per normalizer, for a single language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
Figure 3: The normalized mutual information (NMI) between the grammatical genders of animate nouns and a) the adjectives used to describe those nouns and b) the verbs that take those nouns as direct objects and as subjects, for six different gendered languages. Each subplot contains six variants of $\mathrm{NMI}(G_a; A_a)$, $\mathrm{NMI}(G_a; D_a)$, and $\mathrm{NMI}(G_a; S_a)$, one per normalizer, for a single language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
Table 1: The mutual information (MI) between the grammatical genders of inanimate nouns and a) the adjectives used to describe those nouns (top row), b) the verbs that take those nouns as direct objects, as indirect objects, and as subjects (rows 2-4, respectively), and c) the cases and numbers of those nouns (rows 5 and 6, respectively), for six different gendered languages. Statistical significance (i.e., a p-value less than 0.05) is indicated using bold. MI values are not comparable across pairs of random variables.
Table 2: The mutual information (MI) between the grammatical genders of animate nouns and a) the adjectives used to describe those nouns (top row), b) the verbs that take those nouns as direct objects, as indirect objects, and as subjects (rows 2-4, respectively), and c) the cases and numbers of those nouns (rows 5 and 6, respectively), for six different gendered languages. Statistical significance (i.e., a p-value less than 0.05) is indicated using bold. MI values are not comparable across pairs of random variables.
Finally, each subplot in Figure 4 contains $\mathrm{NMI}(G_i; A_i)$ and $\mathrm{NMI}(G_a; A_a)$, calculated using a single normalizer, for each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$. Each subplot in Figure 5 analogously contains $\mathrm{NMI}(G_i; D_i)$ and $\mathrm{NMI}(G_a; D_a)$, while each subplot in Figure 6 contains $\mathrm{NMI}(G_i; S_i)$ and $\mathrm{NMI}(G_a; S_a)$. The NMI values for animate nouns are generally larger than the NMI values for inanimate nouns. The only exception is Polish, where $\mathrm{NMI}(G_i^{\text{pl}}; A_i^{\text{pl}})$ is larger than $\mathrm{NMI}(G_a^{\text{pl}}; A_a^{\text{pl}})$, regardless of the normalizer.

Discussion
We find evidence for the presence of a statistically significant relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns for six different gendered languages (specifically, German, Italian, Polish, Portuguese, Russian, and Spanish). We also find evidence for the presence of statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. However, we caution against reading too much into the relationship for the verbs that take those nouns as indirect objects because of the small number of noun-verb pairs involved. With one exception (adjectives in Polish), the effect sizes (operationalized as NMI values) for these relationships are smaller than the corresponding effect sizes for animate nouns. As expected, we do not find any statistically significant relationships for either case or number. We emphasize that our findings complement, rather than supersede, laboratory experiments, such as that of . We use large-scale corpora and tools from NLP and information theory to test for the presence of even relatively weak relationships across multiple different gendered languages, and, indeed, the relationships that we find have small effect sizes (operationalized as NMI values). In contrast, laboratory experiments typically focus on much stronger relationships by tightly controlling experimental conditions and measuring speakers' immediate, real-time speech production. Moreover, although we find statistically significant relationships, we do not investigate the characteristics of these relationships. This means that we do not know whether they are characterized by gender stereotypes, as argued by some cognitive scientists, including . We also do not know whether the relationships that we find are causal in nature. MI measures statistical dependence rather than causation, and it is symmetric, so our findings say nothing about whether the grammatical genders of inanimate nouns cause writers to choose particular adjectives or verbs. We defer a deeper investigation of both of these avenues to future work.
Figure 4: The normalized mutual information (NMI) between the grammatical genders of a) inanimate and b) animate nouns and the adjectives used to describe those nouns. Each subplot contains $\mathrm{NMI}(G_i; A_i)$ and $\mathrm{NMI}(G_a; A_a)$, calculated using a single normalizer, for each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
Figure 5: The normalized mutual information (NMI) between the grammatical genders of a) inanimate and b) animate nouns and the verbs that take those nouns as direct objects. Each subplot contains $\mathrm{NMI}(G_i; D_i)$ and $\mathrm{NMI}(G_a; D_a)$, calculated using a single normalizer, for each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
Figure 6: The normalized mutual information (NMI) between the grammatical genders of a) inanimate and b) animate nouns and the verbs that take those nouns as subjects. Each subplot contains $\mathrm{NMI}(G_i; S_i)$ and $\mathrm{NMI}(G_a; S_a)$, calculated using a single normalizer, for each language $\ell \in \{\text{de}, \text{it}, \text{pl}, \text{pt}, \text{ru}, \text{es}\}$.
Finally, we note that each of our tests can be viewed as a comparison of the similarity of two clusterings of a set of items: specifically, a "clustering" of nouns into grammatical genders and a "clustering" of the same nouns into, e.g., adjective lemmata. Although (normalized) MI is a standard measure for comparing clusterings, it is not without limitations (see, e.g., Newman et al. (2020) for an overview). For future work, we therefore recommend replicating our tests using other information-theoretic measures for comparing clusterings.
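As one concrete example of another information-theoretic measure for comparing clusterings, the sketch below computes the variation of information (VI), which Meilă proposed as a metric on partitions: $\mathrm{VI}(X, Y) = H(X) + H(Y) - 2\,\mathrm{MI}(X; Y)$. Choosing VI here is our illustration, not necessarily a measure the text has in mind; the function names are hypothetical, and base-2 logarithms are assumed. Unlike NMI, VI is zero for identical partitions and grows as they diverge, and it is not bounded by one.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in entropy, in bits."""
    n = len(xs)
    return -sum(c / n * math.log2(c / n) for c in Counter(xs).values())


def variation_of_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """VI(X, Y) = H(X) + H(Y) - 2 MI(X; Y) = 2 H(X, Y) - H(X) - H(Y), in bits."""
    h_joint = entropy(list(zip(x, y)))
    return 2 * h_joint - entropy(x) - entropy(y)


genders = ["FEM", "FEM", "MSC", "MSC"]
print(variation_of_information(genders, ["a", "a", "b", "b"]))  # 0.0 (same partition)
print(variation_of_information(genders, ["a", "b", "a", "b"]))  # 2.0 (independent partitions)
```

Because VI depends only on how items are grouped, relabeling the clusters (e.g., renaming lemma a to x) leaves it unchanged, which matches the clustering view of the tests described above.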