Big Data, Big Deception


In recent times, one tech leader after another has made headlines at a fairly steady rhythm – one day bringing news of a marathon congressional testimony, another bringing news of a net worth that seems to know no bounds. Coverage of the diversity of big tech talent, however, pales in comparison, despite the clear repercussions of a monolithic tech coalition.

Timnit Gebru, a Black female AI ethics researcher, recently announced that she was “abruptly fired from Google for sending an email criticizing the company’s treatment of minority employees”, according to The Washington Post. As co-leader of Google’s Ethical Artificial Intelligence Team, Gebru held a position that was imperative to ensuring diverse perspectives had a place at a conglomerate like Google; she spearheaded research that cemented the company’s status “as a leader in assessing the technology’s fairness and risks.” It is emphatically clear that Gebru’s firing is just another addition to a nefarious pattern of Google silencing workplace diversity advocates who dare to challenge the company’s shoddy record.

Gebru’s story is not new to the conversation about diversity in the tech industry. April Curley, a former diversity recruiter at Google, says she was fired on a similar pretext: her vocal advocacy for racial justice ultimately led to her dismissal from the company. The discrimination far preceded her firing: she was reportedly told by her white skip-level manager that her Baltimore-accented speech was a “disability that [she] should disclose when meeting with folks internally.” She was also denied promotions and leadership opportunities, received compensation cuts, and was placed on performance improvement plans. Echoing Gebru’s accounts of the company’s extensive anti-Black rhetoric and actions, Curley spoke of her first-hand experience as a diversity recruiter, during which Google executed tactics to “keep black and brown students out of the pipeline” even as she worked to increase the number of hires from Historically Black Colleges and Universities (HBCUs). She further attested that candidates from these schools were given degrading feedback and then rejected at the hiring committee stage, and that her advocacy for minority students resulted in “active abuse and retaliation from several managers who harassed [her] – and many other black women.”

While the issue of representation is not confined to the tech field, the interrogation of equity in the data sector speaks to an additional struggle – the suppression and subsequent marginalization of racial minorities, particularly Black communities, through big data. How algorithms are constructed remains largely opaque to non-tech workers, yet their ubiquitous presence demands a broader understanding of how they function. It is critical to understand how demographic hierarchies in this industry shape the technologies distributed to the general public, because those technologies create social implications reaching far past the limits of our smart devices.

A profile of Yeshimabeit Milner in Forbes Magazine illuminates the process of datafication, which Milner describes as technology influencing “our lives in ways we had no control over.” According to Milner, datafication manifests in many facets of our lives, from whether your Equifax and FICO credit scores qualify you for housing to the newsfeed you see on Facebook. Milner saw value in using this same technology to analyze restrictive systemic structures; her organization, Data for Black Lives, arose as a reaction to the trajectory of a tech sector that has notably excluded Black and Latinx workers from ascending to higher positions.

Milner describes the data scientists she spoke with as “concerned about the direction the country was headed, and really concerned that their literal everyday jobs were being weaponized against vulnerable communities, in particular Black and brown communities.” Her organization, which spans academics, tech talent, and everyday people, arose organically to provide a place for those who felt the tech industry was not forging a path for them. The very tenets of Data for Black Lives ultimately pair movement and community building with the strategic processes of the tech sector to instigate positive, inclusive change in the industry.

What areas have become most notorious for rampant discrimination? According to Data for Progress, one of the most prominent avenues for data weaponization has been policing, where the lack of transparency and accountability is pervasive. It is imperative that data from cases of clear misconduct – spanning arrests, prosecution, law enforcement discipline, and incarceration – be made public. Nearly 66% of people surveyed either “support” or “somewhat support” public disclosure of investigations that “uncover evidence of wrongdoing.”

A Stanford Business article further corroborated discriminatory practices within credit-market algorithms, citing that, historically, “minorities have disproportionately been denied loans, mortgages, and credit cards, or charged higher rates than other customers.” Reporting on the findings of Stanford Professor Jann Spiess and Harvard University doctoral student Talia Gillis, the article states that Spiess found current scrutiny of credit-market algorithms to be “too focused on the role of humans in the process,” questioning whether excluding characteristics such as race and gender from the algorithms’ equations truly satisfies anti-discrimination regulations.

In theory, shielding borrowers from being judged strictly on these variables in the eyes of lenders is favorable, but Spiess posits that this framework still harbors ample prejudice: it fails to factor in the long history of certain demographics being denied basic access to credit, which calls into question whether their credit scores accurately reflect their creditworthiness.

Perhaps one of the most egregious instances of big data being used to target Black communities came during the 2016 presidential election, when the Trump campaign targeted the historically disenfranchised Interstate 95 corridor of Miami-Dade County. According to an article in the Miami Herald, the campaign employed a “computer algorithm that analyzed huge sums of potential voters’ personal data — things they’d said and done on Facebook, credit card purchases, charities they supported, and even personality traits (…).” Labeling this strategy “deterrence,” the campaign used advertisements, disinformation, and misleading claims to convince likely voters not to show up to the polls; more than half of Black voters living in Miami-Dade were flagged for deterrence, reportedly at “almost twice the rate of deterrence for non-Black voters”. While most political campaigns, regardless of party, rely on big data to run advertisements, internal data from Cambridge Analytica suggests that the Trump campaign’s data was ultimately maneuvered to suppress voters – in effect, a digital platform for disenfranchisement.

In an effort known as “Project Alamo,” the Trump campaign worked closely with Cambridge Analytica, compiling voter data gathered by the RNC, personal information purchased from commercial providers, and political donor lists to sequence advertisements. The issue is that some of these advertisements were purposefully constructed around falsehoods, such as a misrepresentation of First Lady Michelle Obama’s remarks, with a pro-Trump super PAC writing that the ad was “very effective in persuading women in our principal audience not to vote for Hillary Clinton.” Shockingly, officials from both the Trump campaign and Cambridge Analytica have admitted to this leverage of power, with the campaign’s chief data scientist Matthew Oczkowski describing deterrence voters as “folks that we hope don’t show up to vote.”

In light of these violations, several organizations have committed to prioritizing transparency and diversity in data in order to restrain big data companies from wielding disproportionate control over Black communities. Milner’s Data for Black Lives comprises a bevy of activists, organizers, and mathematicians who apply crucial tools such as statistical modeling, data utilization, and crowd-sourcing to build progressive movements that promote civic engagement and dismantle bias. Recognizing that some of the most exclusionary racial practices – redlining, predictive policing, risk-based sentencing, and predatory lending – have been executed or amplified through technology, Data for Black Lives hosts conferences and events to share research and take action toward equitable data-driven programs.

However, many of these activist groups, such as Gebru’s organization *Black in AI*, face sharp criticism from moguls and titans in the industry who are resistant to such change. Gebru’s criticism of large language models was not received favorably by Google, since the company may one day “seek to capitalize on such systems in consumer-facing products that could generate convincing passages of text that are difficult to distinguish from human writing.” If we are to truly effect change and stymie the corruption of unethical data collection, then we must safeguard the platforms of diverse (primarily Black) voices within the industry.