
Bad Data Makes a Joke of AI

Mainak Mazumdar is a data and research expert who gave a TED talk in October 2020. Here are six quotes from that presentation that should make us totally reassess how we are using AI and make data science our absolute first priority.

  • "AI could add 16 trillion dollars to the global economy in the next 10 years."
  • "However, when it comes to fair and equitable policy decision making, it has not lived up to its promise."
  • "AI is only reinforcing and accelerating our biases at speed and scale, with societal implications."
  • "As a data scientist, I'm here to tell you it's not the algorithm, but the biased data."
  • "We're spending time and money to scale AI at the expense of designing and collecting high quality and contextual data."
  • "We need to stop the data or the bias that we already have and focus on three things: data infrastructure, data quality and data literacy."

In computer science, "garbage in, garbage out" is the concept that flawed or nonsense input data produces nonsense output, or "garbage". It even has its own abbreviation: GIGO.

TechTerms defines GIGO (pronounced "guy-go") as a computer science acronym that implies bad input will result in bad output.

The principle also applies to all analysis and logical thought processes. Arguments are unsound if their premises are flawed.  

Several references say that the first use of the phrase has been dated to a November 10, 1957, syndicated newspaper article about US Army mathematicians and their work with early computers. 

Now that we are progressing with artificial intelligence, the range of applications and the breadth of factors that can be considered make the conclusions seem more convincing, even when they are tangential to what we might have expected from the analysis.

The general mystery of supercomputing and how it works, and the fact that you can do billions of calculations a second, add magical or supernatural credibility to the output.  

But if even the most advanced assessment processes are based on poor data and/or wrong assumptions, the conclusions can be enormously misleading.

Our unjustified faith that a process can iron out the problems of bad data surfaced at the beginning of the computer age. Charles Babbage, the person usually credited with creating the first computer, was flummoxed on at least two occasions when asked:

"Pray, Mr Babbage, if you put into the machine wrong figures, will the right answers come out?"

Supercomputers and AI have the potential to amplify this belief.

The first step must always be to commit adequate resources to three things: collection procedures that gather good data, checks on its veracity, and confirmation that it is representative of the breadth of what we think it covers.
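As a rough illustration of the last two checks, the sketch below counts records with missing values and compares the make-up of a sample against known population shares. Everything in it is an assumption made for illustration: the function names, the `age_band` and `trips` fields, the survey records, the population shares and the 5% tolerance are all hypothetical, and a real programme would need checks designed for its own data.

```python
from collections import Counter


def validity_report(records, required_fields):
    """Return (number of records with a missing required field, total records)."""
    bad = [
        r for r in records
        if any(r.get(field) in (None, "") for field in required_fields)
    ]
    return len(bad), len(records)


def representation_gaps(records, field, population_shares, tolerance=0.05):
    """Return groups whose share of the sample differs from the population
    share by more than `tolerance`, as {group: (observed, expected)}."""
    values = [r[field] for r in records if r.get(field) not in (None, "")]
    counts = Counter(values)
    total = len(values)
    gaps = {}
    for group, expected in population_shares.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            gaps[group] = (round(observed, 3), expected)
    return gaps


# Hypothetical survey records and population shares, for illustration only.
survey = [
    {"age_band": "18-34", "trips": 12},
    {"age_band": "18-34", "trips": 9},
    {"age_band": "18-34", "trips": 15},
    {"age_band": "35-64", "trips": 7},
    {"age_band": "65+", "trips": None},
]
population = {"18-34": 0.30, "35-64": 0.50, "65+": 0.20}

missing, total = validity_report(survey, ["age_band", "trips"])
gaps = representation_gaps(survey, "age_band", population)

print(f"{missing} of {total} records have missing values")
for group, (observed, expected) in gaps.items():
    print(f"{group}: sample share {observed}, population share {expected}")
```

In this made-up sample the youngest group is over-represented and the middle group under-represented, which is exactly the kind of gap that stays invisible unless someone goes looking for it.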

But even “good” data will reflect a historical perspective, and AI can reinforce biases rather than determine a more equitable solution, one that reflects modern standards rather than what has resulted in the past.

One example was when AI was used to cull applicants for a senior position. There were no questions about a person’s gender. But the system saw a clear trend in the historical data: people were passed over if they had studied at an educational establishment that had “Women” or “Ladies” in its title, e.g. “Ladies College of Advanced Education”.
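One way to look for this kind of proxy before a system is trained is to compare historical outcomes for records that do and do not carry the suspect feature. The sketch below is a minimal illustration of that idea and not a description of the system in the example: the applicant records, institution names and function names are all hypothetical, and only the terms “Women” and “Ladies” come from the case above.

```python
PROXY_TERMS = ("women", "ladies")  # terms taken from the example above


def has_proxy(institution):
    """True if the institution name contains one of the proxy terms."""
    name = institution.lower()
    return any(term in name for term in PROXY_TERMS)


def selection_rates(applicants):
    """Return (shortlist rate with proxy, shortlist rate without proxy)."""
    groups = {True: [0, 0], False: [0, 0]}  # flag -> [shortlisted, total]
    for applicant in applicants:
        flag = has_proxy(applicant["institution"])
        groups[flag][0] += applicant["shortlisted"]
        groups[flag][1] += 1
    return tuple(
        shortlisted / count if count else None
        for shortlisted, count in (groups[True], groups[False])
    )


# Hypothetical historical hiring records, for illustration only.
history = [
    {"institution": "Ladies College of Advanced Education", "shortlisted": 0},
    {"institution": "Ladies College of Advanced Education", "shortlisted": 0},
    {"institution": "Northside Women's University", "shortlisted": 1},
    {"institution": "State Institute of Technology", "shortlisted": 1},
    {"institution": "State Institute of Technology", "shortlisted": 1},
    {"institution": "City University", "shortlisted": 0},
    {"institution": "City University", "shortlisted": 1},
]

with_proxy, without_proxy = selection_rates(history)
print(f"Shortlisted with proxy term: {with_proxy:.0%}")
print(f"Shortlisted without proxy term: {without_proxy:.0%}")
```

A large gap between the two rates does not prove discrimination on its own, but it signals that a model trained on this history could learn the institution name as a stand-in for gender.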

The judicial system has tried to use AI to determine equitable sentences, and obviously it does not ask a person’s race. But questions such as the area you live in and your family’s criminal history result in increasingly severe penalties based on broadly disadvantaged circumstances.

Quality of data has often been overlooked as governments and private industry look to minimise costs. Contextual data is an equally important concept, and both must push us towards a huge increase in commitment to data science.

Author

John Reid

Managing Director, Austraffic

From the beginning of his career in local government, and then when he established Austraffic in 1983, John realised that data collection is not just about numbers but about understanding people and the activities that serve the community's needs. Poor or even bad data is counter-productive. Even if results fit our preconceived ideas, that doesn’t mean they are accurate. John has seen how good data expands our perceptions and thinking and can be surprising in its results. Connect with John on LinkedIn.
