The great technological progress of the last twenty years has allowed us to collect and generate huge amounts of data at an unprecedented rate, marking the beginning of the so-called Big Data era. Nowadays we rely more and more on algorithms and Big Data analytics to make decisions and to evaluate services and employees.
But can we genuinely trust Big Data?
Since they are built on mathematical models and run by machines, algorithms are thought to be objective and unbiased. Yet even though they potentially could be, the majority of them are currently far from being neutral ways to evaluate reality.
To build an algorithm you need two things: data and a definition of success. However, the choices of which data to collect and which outcomes to count as successes largely determine the integrity and neutrality of the resulting models. Since these models are based on decisions made by fallible, biased human beings, many of them encode human prejudices and misunderstandings, effectively turning algorithms into opinions embedded in code.
We often find cognitive biases injected into models. When we search for data to prove a hypothesis, for example, we tend to ask ourselves "Must I believe it?" when the evidence contradicts our beliefs, but "Can I believe it?" when it aligns with our view. This same mental pattern makes data scientists more inclined to concentrate on the data that confirm their prior beliefs than on the data that prove them wrong when they create new algorithms. Another common problem arises in the choice of the sample of data to analyze. In 2015 Google deployed an automatic photo-tagging service in its photo app that labeled African Americans as gorillas. The likely sources of this kind of error include the choice of an unrepresentative sample of people, known as selection bias, and availability bias, which refers to the way people make decisions based on whatever data is readily at hand.
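A minimal sketch of how selection bias distorts what a model sees. The numbers below are hypothetical, not drawn from the Google incident: a population is split evenly between two groups, but the data-collection process is only half as likely to capture members of group B, so the training sample under-represents them.

```python
import random

random.seed(0)

# Hypothetical population: two groups, split 50/50.
population = ["A"] * 5000 + ["B"] * 5000

# Biased collection: a member of group B is only half as likely
# as a member of group A to end up in the training sample.
sample = [g for g in population
          if random.random() < (1.0 if g == "A" else 0.5)]

share_b_pop = population.count("B") / len(population)
share_b_sample = sample.count("B") / len(sample)

print(f"B share in population: {share_b_pop:.2f}")  # 0.50
print(f"B share in sample:     {share_b_sample:.2f}")  # roughly one third
```

Any model trained on `sample` will see far fewer examples of group B than the real world contains, and its error rate on that group will rise accordingly.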
In the worst-case scenario algorithms can become weapons of math destruction (WMDs), as the author Cathy O’Neil argues in her latest book. She defines Weapons of Math Destruction as mathematical models with three specific characteristics: scale, damage and opacity. Starting from poisonous assumptions camouflaged by math, these models go largely unquestioned and tend to punish the poorer and more fragile segments of society, creating pernicious feedback loops. I will show how these toxic evaluation systems operate in three relevant areas of civic life: education, justice and politics.
In 2007 the mayor of Washington D.C. implemented a strict evaluation system known as IMPACT to get rid of low-performing teachers in the city’s public schools. The score was supposed to measure teaching effectiveness through value-added modeling, which included classroom observations and improvement targets that students had to meet on standardized tests. Some of the variables in this model were not under the teachers’ control, and some even incentivized bad behavior, such as letting students cheat on tests to obtain higher scores. The algorithm eventually caused a great number of good teachers to be fired, with no possibility of appealing or of knowing which indicators they were being evaluated on.
Another example of a WMD is the LSI-R (Level of Service Inventory–Revised) model, used all across the US to predict recidivism rates among prisoners. The model includes a questionnaire in which inmates are asked questions ranging from the relevant to the socially loaded, such as “the first time you were involved with the police” and “do you have friends or relatives with criminal records”. These questions penalize convicts who grew up in poor neighborhoods, where police stops are more frequent and the crime rate is likely to be higher. By increasing the chances that such inmates receive longer sentences, they also create a detrimental feedback loop. Convicts who receive long sentences are locked away for longer periods with other criminals, and are then released into the same poor neighborhood with a criminal record that makes it harder for them to find a job. In the end, it is the model itself that sustains its toxic cycle and validates its own predictions by increasing the probability that its victims commit further crimes.
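The feedback loop above can be sketched in a few lines. This is an illustrative toy with made-up coefficients, not the actual LSI-R scoring: a risk score that mixes a personal factor with the neighborhood arrest rate feeds back into that same arrest rate, because higher scores mean longer sentences and worse prospects on release.

```python
def risk_score(prior_convictions, neighborhood_arrest_rate):
    # Hypothetical weights. Mixing a personal factor with an
    # environmental one the inmate cannot control is exactly
    # what makes this kind of score unfair.
    return 0.5 * prior_convictions + 10.0 * neighborhood_arrest_rate

arrest_rate = 0.10  # hypothetical starting arrest rate in a poor neighborhood
for year in range(5):
    score = risk_score(prior_convictions=2, neighborhood_arrest_rate=arrest_rate)
    # Higher score -> longer sentence -> harder reentry -> more
    # recidivism -> higher neighborhood arrest rate next round.
    arrest_rate *= 1.0 + 0.05 * score
    print(f"year {year}: score={score:.2f}, arrest_rate={arrest_rate:.3f}")
```

Run it and the arrest rate climbs every iteration even though nothing about the individual changed: the model's own output is driving its future inputs, which is the defining trait of a pernicious feedback loop.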
Although they cannot yet be defined as Weapons of Math Destruction, social networks’ algorithms are potentially able to affect what we learn and whom we vote for. In the 2010 and 2012 American elections, Facebook conducted an experiment: the platform encouraged people to spread the word that they had voted, stoking peer pressure and studying the impact of friends’ behavior on our own. The results were amazing, with researchers calculating that turnout increased by more than 300,000 people, yet also alarming. The platform has all the characteristics required to become a WMD: it is massive, powerful and opaque. Could Facebook game the political system by twisting its algorithm and molding the news we see?
Nevertheless, there is still hope. Algorithms can be regulated and adapted to fight biases instead of reinforcing them. In the first place, the modelers themselves should become more aware of the limits mathematical models have in describing the complexity of reality, and of the huge impact their algorithms have on society and the economy.
Algorithms must be audited and questioned, data integrity has to be checked and hidden costs have to be measured. Human values have to be imposed on models, favoring fairness over efficiency. Furthermore, it is necessary to rethink the definition of success from society’s perspective: we have to stop putting a target on the backs of poor and vulnerable people, and stop measuring success only in terms of profit and effectiveness.
Predictive models are the tools on which modern societies will rely to run institutions, deploy resources and manage lives. Since these tools are constructed from the choices we make about which data to consider, we need to make them not just about profit and logistics but, above all, about morality.
Sources and references:
- Article: “Four cognitive biases that affect Big Data analysis”
- TED Talk: Cathy O’Neil, “The era of blind faith in big data must end”
- Book: Cathy O’Neil, *Weapons of Math Destruction*