13 Equity, diversity, and inclusion in research
Guest author: Poppy Riddle
13.1 Introduction
Latonia Harris provided these definitions for a 2021 workshop on diversity, equity, inclusion, and anti-racism in STEMM organizations (Scherer 2021):
Diversity: the numerical representation of groups of individuals based on their primary and secondary characteristics and identities.
Equity: the treatment of individuals in terms of access, opportunity, and advancement.
Inclusion: the ability to meaningfully participate and contribute, both for the benefit of the individual and the organization.
Racism: the devaluation and the denial of rights, dignity, and value of individuals due to their race or geographical origin.
In this chapter, we examine the ways in which diversity, equity, and inclusion are lacking, how racism is or may be present in research, and how we can address those issues. We first revisit the social stratification of science by looking at sociological theories as a means of understanding how citations are valued outside of the monetary compensation of labour. We then turn to citational justice and epistemic justice as frameworks for understanding how our citational behaviours create imbalances. Next, we review some of the biases that have been investigated in bibliometrics and suggest what other biases may be present. Finally, we look at a few data sources to understand the limitations and advantages of such tools, as an example of the critical perspective needed to understand the complexity of the citation reward system.
13.3 Measuring bias/disparities in research
From a sociological perspective, which acknowledges and seeks to understand cultural and structural context, biases can manifest in implicit or explicit ways that have long-term ramifications on perceptions of “confidence, capability, trustworthiness” among others (Scherer 2021). On an individual level, biases are learned behaviours and associations that happen quickly, and over time, unconsciously. Biases are also complex with two levels or layers, with the first based upon observable qualities or traits and the second including associations or connections with behaviours that are then compared with those of the observer. Contextual conditions, such as beliefs, can add validation and reinforce biased associations and permit assumptions. The goal of diversity science is to bring awareness of those biases to challenge assumptions so that acknowledgement of the “disparities in resources and opportunities across groups” can be addressed (Scherer 2021).
Studies examining gender bias in scholarly communication use algorithms to assign names to gender categories (typically binary) based on geographic and cultural inferences. NamSor, genderize.io, GenderAPI, and Wiki-Gendersort are the main ones found in bibliometric studies investigating gender bias. The algorithms work by harvesting names from openly available databases, along with other data such as the country of origin, the family name, and the language, which serve as cultural context identifiers. Each name is assigned a gender with a certain probability calculated by the algorithm. These algorithms have some benefits: they are cheap, effective, and can be applied retrospectively to datasets. But they also have limitations: they rely on name-gender databases that may not include self-identification, and gender probabilities based on names and locations are imperfect and may attribute the wrong gender in some cases. That said, their accuracy remains acceptable at the aggregate level. These algorithms still present data along binary categories of gender (not to mention the common conflation of sex and gender as identity). However, they are often used to address and dismantle historical and current oppression, so rejecting their use would deprive us of valuable knowledge about the gender biases and disparities that exist at a large scale. Here are several kinds of gender biases or disparities that have been observed in bibliometric studies.
A citation disparity is observed by simply comparing citation indicators between groups (Traag and Waltman 2022). Studies have shown that works by women tend to be cited less than works by men (Larivière et al. 2013) and that women represent only 14% of the highly cited researchers (the group of researchers who publish highly cited publications) in the Web of Science (Meho 2022).
A citation bias is observed when there is a causal relationship between a variable (e.g., gender) and the act of citing a paper (Traag and Waltman 2022). Causation is however difficult to demonstrate in bibliometrics because experiments are extremely rare in the field, and most studies are correlational. Because correlation does not imply causation, it is very difficult to demonstrate a bias (defined as a causal relationship) using bibliometric methods.
Citation homophily is observed when members of one group tend to cite members of the same group more than researchers from other groups. Ghiasi et al. (2018) found that citation homophily occurs in all fields of science but that it is stronger in the Social Sciences and Humanities.
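The difference between a citation disparity and citation homophily can be made concrete with a small computation. The following Python sketch is illustrative only: the records, the group labels "A" and "B", and the function names are invented for this example, and it does not reproduce the methods of Traag and Waltman (2022) or Ghiasi et al. (2018). It compares mean citation counts between two groups (a disparity) and computes the share of each group's outgoing citations that go to papers from the same group (a crude homophily signal).

```python
from collections import Counter
from statistics import mean

# Toy records invented for illustration: (paper_id, author_group, citation_count).
# The counts stand in for values from a citation database and are NOT
# derived from the small sample of links below.
papers = [
    ("p1", "A", 10),
    ("p2", "A", 6),
    ("p3", "B", 4),
    ("p4", "B", 8),
    ("p5", "A", 8),
    ("p6", "B", 3),
]

# Toy citation links, also invented: (citing_paper_id, cited_paper_id).
citations = [
    ("p1", "p2"), ("p1", "p5"), ("p2", "p5"), ("p2", "p3"),
    ("p3", "p4"), ("p3", "p1"), ("p4", "p6"), ("p5", "p1"),
]


def mean_citations_by_group(papers):
    """Citation disparity: compare a citation indicator between groups."""
    counts_by_group = {}
    for _, group, count in papers:
        counts_by_group.setdefault(group, []).append(count)
    return {group: mean(counts) for group, counts in counts_by_group.items()}


def same_group_share(papers, citations):
    """Share of each group's outgoing citations that stay within the group."""
    group_of = {paper_id: group for paper_id, group, _ in papers}
    total = Counter()
    same = Counter()
    for citing, cited in citations:
        citing_group = group_of[citing]
        total[citing_group] += 1
        if group_of[cited] == citing_group:
            same[citing_group] += 1
    return {group: same[group] / total[group] for group in total}


means = mean_citations_by_group(papers)
shares = same_group_share(papers, citations)
print("Mean citations by group:", means)
print("Same-group citation share:", shares)
```

With this toy data, group A averages 8 citations per paper and group B averages 5, and 4 of group A's 5 outgoing citations go to other group A papers. A same-group share only suggests homophily when it exceeds what the group's share of citable papers would predict (here, each group authors half of the papers), and neither number says anything about causation.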
While the examples above refer to citation disparities, biases, and homophily, the same situations or mechanisms can be observed for other indicators such as research outputs, collaboration, funding, awards, hiring, promotions, etc. Understanding how biases and discriminatory practices exist in academia is important for closing the gender gap. Furthermore, disparities, biases, and homophily can be observed for variables other than gender:
Biases based on ethnicity or race impose disadvantages on persons based on their perceived identity. Ethnic biases can include race, ethnicity, and nationality. Secondary associations with race, ethnicity, nationality, and their intersections can further create or maintain harmful stereotypes when authors’ works are perceived as less than those of another group.
Biases in the perceived value of works from certain countries or regions. Examples seen previously include ignoring or devaluing works that are from other countries or regions, the assumption that issues in X country are not applicable to one’s own situation, or assumptions about research quality or rigour if the author has institutional affiliations outside of the perceived ‘norm’. The Global North produces far more publications and receives more citations than the Global South, which also produces more local and geographically contextualized work than other geographic regions (Mongeon et al. 2022). Reinforcing this geographic citation bias is the evaluation of works for quality: Global North/Western authors possess the privilege of not citing authors from other regions without any deleterious effect on the perceived quality of their work, whereas non-Global North/Western authors must cite references from the Global North as evidence of their research quality (Chakrabarty 2007). Other studies grouped countries by income level and found that research from low- to middle-income countries tends to be evaluated less favourably than research from high-income countries (Harris et al. 2017).
A devaluation or dismissal of work written in languages other than English exists in citation and peer review (Lee et al. 2013). There are differences in acceptance rates between manuscripts from authors in English-speaking countries and those from authors in non-English-speaking countries, and sometimes language and writing style are given as reasons for rejection when there is no other problem with the manuscript. The Scopus and Web of Science databases have a disproportionate coverage of English articles, affecting fields such as the social sciences and humanities, where there are more books in languages other than English due to their subject matter and regional specificity (Mongeon and Paul-Hus 2015). Compounding this, the US and other English-speaking countries dominate web development, particularly academic web development, contributing to even more bias against non-English sources. As such, all indicators, including those that are web-based, are inherently biased toward English documents, whether they are drawn from databases, social media outlets, or search tools (Mas-Bleda and Thelwall 2016).
In their study of peer-review biases, Lee et al. (2013) point to other forms of bias, including affiliation bias (evaluating more favourably work from prestigious institutions), content bias (favouring specific topics or methodologies), confirmation bias (favouring work that supports one’s views), or publication bias (favouring positive results). Double-blind peer review has been found to be an effective way to mitigate these biases. However, manuscripts contain many identifiable characteristics that can provide a reviewer with enough information to correctly identify an author (Baggs et al. 2008), and this may be even easier in highly specialized fields such as bibliometrics.
13.4 How do we do better?
Ray et al. (2022) propose citation diversity statements as a reflexive tool to reinforce the commitment to your community of researchers. The following is an example citation diversity statement from Ray et al. (2022):
We are committed to promoting intellectual and social diversity in science and academic scholarship and took this commitment into consideration while researching and writing this article. We actively worked to promote diversity in our reference list while ensuring all the references cited were relevant and appropriate. We have included some references to enhance diversity but have not omitted any references for this purpose. To assess the diversity of our references, we obtained the predicted gender of the first and last author of each reference by using a database that stores the probability of a first name being carried by a woman (gender-api.com). Using this measure and removing self-citations, our references contain 30% woman(first)/woman(last), 11% man/woman, 15% woman/man, and 44% man/man. This method is limited in that a) names, pronouns, and social media profiles used to construct the database may not, in every case, be indicative of gender identity and b) it cannot account for intersex, non-binary, or transgender people. We look forward to future work that could help us to better understand how to support equitable practices in science.
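The percentages reported in a statement like this one come from a simple tabulation. The Python sketch below shows one way such a breakdown could be computed. It is an assumption-laden illustration, not the procedure used by Ray et al. (2022): the reference records and their probabilities are invented, the 0.5 threshold and the field and function names are hypothetical, and in practice the probabilities would come from a name-based service such as the one named in the statement.

```python
from collections import Counter

# Hypothetical reference list; all values are invented for illustration.
# p_first / p_last: estimated probability that the first / last author's
# first name is carried by a woman (None when no estimate is available).
references = [
    {"id": "r1", "p_first": 0.95, "p_last": 0.90, "self_citation": False},
    {"id": "r2", "p_first": 0.10, "p_last": 0.85, "self_citation": False},
    {"id": "r3", "p_first": 0.92, "p_last": 0.05, "self_citation": False},
    {"id": "r4", "p_first": 0.03, "p_last": 0.08, "self_citation": False},
    {"id": "r5", "p_first": 0.02, "p_last": 0.20, "self_citation": False},
    {"id": "r6", "p_first": 0.97, "p_last": 0.99, "self_citation": True},
    {"id": "r7", "p_first": None, "p_last": 0.40, "self_citation": False},
]


def label(probability, threshold=0.5):
    """Turn a probability into a binary label (assumed threshold: 0.5)."""
    if probability is None:
        return None
    return "woman" if probability >= threshold else "man"


def diversity_breakdown(references, threshold=0.5):
    """Percentage of references per first/last author category.

    Self-citations are removed, as in the example statement. References
    with a missing estimate are reported as "unknown" rather than dropped.
    """
    counts = Counter()
    for ref in references:
        if ref["self_citation"]:
            continue
        first = label(ref["p_first"], threshold)
        last = label(ref["p_last"], threshold)
        if first is None or last is None:
            counts["unknown"] += 1
        else:
            counts[f"{first}/{last}"] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


for category, percent in diversity_breakdown(references).items():
    print(f"{category}: {percent}%")
```

Reporting an "unknown" category is a design choice made here so that names the tool cannot classify remain visible. The limitations acknowledged in the statement apply equally to this sketch: the labels are binary and inferred from names, not self-identified.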
Because it is easy to imagine how citation diversity statements could lead to tokenism (diversifying citations artificially for the sole purpose of “looking good”), Ray et al. (2022) insist on the ethical importance of citing works that provide information relevant to a paper, and not simply because of some box on a manuscript submission form that needs to be checked. That said, unconscious biases in citing behaviours may not support the best interests of researchers and their research community. Investment in thoughtful, purposeful citations of works one is engaged with will not only strengthen communities but, when done with an awareness of having diverse voices as a strengthening practice, will also improve the overall quality of scholarly works.
Given that disparities exist historically, basing decisions upon results such as these, combined with stereotypes about the quality of all articles from the Global South (also problematic), contributes further to the disparity. From an emancipatory perspective, the path to fixing this is making time to explore, engage with, and understand scholarly production from geographic locations beyond the norm. This not only enriches one’s own writing through a more balanced view but also respects and recognizes advances by researchers.
Is citation technology compatible with “social equity, freedom, and cultural pluralism” or does its existence require centralized control through ownership, market forces, and power concentrations (Winner 1980)? On the one hand, there is the rather functional view of the phenomenon of social capital, in which we see centers of power within scholarly communication and citations as part of the reward system of science, and in which, by citing, we associate our work with these centralized actors. On the other hand, there is an emancipatory view in which we see citations as a technology that enables us to redistribute credit, to acknowledge those we have engaged with, and to recognize and proliferate ideas that are meaningful to us and to our part within a community.
13.4.1 Citational justice
Kumar and Karusala (2021) introduce Iris Marion Young’s faces of oppression as a framework for understanding and addressing citational (in)justice. They define justice as “a relational value of the actions, structures, and institutions in which persons stand to each other as social and political subjects, be they structures of the production and distribution of material goods or of the exercise of political power”, and view citations as “anti-racist, feminist technologies” (Kumar and Karusala 2021) with the potential to correct the imbalances that have occurred. The authors present some examples of ways in which injustices have shown up in their own work and reviews, which may provide an opportunity for self-reflection upon your own citation practices.
Exploitation – occurs when the balance of work and compensation is leveraged, creating inequality and power dynamics. This supports the rich-get-richer aspect of Merton’s Matthew effect by leveraging power. The authors identify several types of citation behaviours found in their own work.
The Cite-Me Cite can occur when authors submit papers to journals and the editors pressure them to cite the editors’ own work in return for an acceptance. This is a particular concern with, and a signal of, predatory journal practices.
The Name-Agnostic Cite occurs when hard-to-recognize/pronounce/read names are othered, as in “other authors have investigated…” whereas Western names are clearly cited.
The In-the-Global-South and Unrelated-to-the-North Cite falls along similar lines as othering or even making certain work irrelevant. See Linxen et al. (2021) for a study exploring this issue.
The Throwaway Cite occurs when citations are lumped together without individual attention or recognition, as in “studies in LIS have examined the effect of unicorns (Name, 1986; Name, 1993; Name, 2000; Name et al., 2002; Name et al., 2013).” While this practice may reflect an intent to be exhaustive yet concise, who is this benefiting and for what purpose?
The No Cite occurs when a reference is omitted, whether through a conscious or an unconscious decision. While addressing this type of non-cite requires greater rigour, not doing so is a privilege that is being assumed.
Marginalization – when a category of persons is excluded and thereby deprived, not only at the individual level but also at the collective level. This is evident in conferences that privilege some populations, such as conferences that have never been held in the Global South. Some universities dominate some disciplines, which can lead to a misperception of enhanced value, affecting acceptance and possibly citation. Women, scholars of colour, and gender-diverse scholars also experience the effect of biases upon their communities, as evidenced by citation gaps.
Powerlessness – affects those in the community who “lack significant power”, a voice, or the opportunity to contribute to decision making. This occurs when assumptions are made, creating or reinforcing norms that we expect to be accepted. These assumptions, without critical inquiry, can shape not only our readers but ourselves. They can include the assumptions that work from certain groups lacks rigour, that works published at certain venues or in certain journals are not of high quality, that Wikipedia is not a valid source of knowledge, that papers written in other languages are not relevant, etc.
Cultural imperialism – an interpretation of normal within a society that reflects the dominant society’s cultural values, at once othering other groups within society and reinforcing stereotypes that maintain this power imbalance. Examples include the assumption that work in the Global South focuses on the poorest communities, a focus on novelty or differences within other cultures, the presumed universality of Western ethical standards, and the expectation that English written by those outside the Western North is of low quality. Cultural imperialism also occurs when the research of marginalized communities is interpreted within the frameworks of the dominant society for its own use.
Violence – here, it is important to bring in Young’s words from Kumar and Karusala (2021):
“While the frequency of physical attack on members of these and other racially or sexually marked groups is very disturbing, I also include in this category less severe incidents of harassment, intimidation, or ridicule simply for the purpose of degrading, humiliating, or stigmatizing group members.”
The authors go on to say that it is less the violent act itself than the social conditions that continue to permit it to happen. They illustrate this face of oppression in scholarly communication with evidence from reviews in which questions are raised about the relevance of a health topic if it only affects a small population, the criticism of research aims of Black scholars as outside of typical scholarship, how disabled persons are singled out for research, or the long-term bias against low citation counts and the assumptions of quality or relevancy.
These five faces of oppression attributed to Young by Kumar and Karusala (2021) provide a framework for understanding and deconstructing biases in our choices and assumptions affecting citational justice in scholarly communication. This is not the only framework, and these are not the only examples, that can be found. Unethical citation practices have been around for quite some time, so there are more resources for understanding how they exist.
13.4.1.1 Chicken and Egg
Kwon (2022) argues that citation-based evaluation of individual researchers in any context (e.g., funding, publishing, promotion, awards) needs to change or be reduced in importance and replaced by engaging, recognizing and valorizing ideas from diverse sources. Mott and Cockayne (2017) also recognize citations as a problematic technology but also suggest that citations can act as a feminist and anti-racist “technology of resistance” to correct the imbalance. There appears to be a tension between the need to reduce the influence of citations while at the same time exploiting that influence for the purpose of correcting the injustices perpetrated by them.
13.5 Conclusion
This chapter has examined the imbalances that occur in scholarly communication through Merton, Bourdieu, and Rossiter’s respective sociological theories and how our citation behaviours as authors can represent citational injustice or even epistemic injustice through our conscious and unconscious choices. We critically examined a few of the data sources and tools used for analysis within bibliometrics, such as gender-determining algorithms and global income categorization, for their limitations and advantages, as part of the ongoing attempts by the scientific community to address inequities within our scholarly communication system. I closed with thoughts on thinking about our citation system objectively as functionalists or as emancipatory activists.
In writing such a chapter, it must be acknowledged that it was written by a privileged white person who has settled in Canada. Some of the sources I draw from are by self-identified persons of colour, and these sources should be read fully so that my filtered version does not take away from the significance of their words and experiences. This filtration is typical in Western society and represents the endemic cultural imperialism in our education system. I appreciate the opportunity, as a queer, transgender woman, to provide my perspective. Still, I recognize it is a very narrow lens as a participant within a global community of knowledge producers.