Open Data Needed To Build Artificial Intelligence We Can Trust

By Prathima Appaji


Much of the public discourse around Artificial Intelligence (AI) focuses on one of two scenarios: the threat it poses to humanity, or the promise of scientific and technological leaps that these machines could help achieve. At the same time, privacy concerns are high, and data protection laws – such as the UK's Data Protection Act – are being strengthened around the world.

However, our reality already involves intelligent machines influencing our economies, societies, and politics, and we haven't seen any Terminators so far. In fact, AI has the potential to be a positive force for change.

In a recent report, Accenture stated that AI has the potential to double the growth rates of 12 developed countries by 2035. The report further claimed that AI will not merely increase productivity but change how we formulate economic growth. With governments such as the UK's investing £75 million in developing AI and regulating the challenges around it, it becomes important to ask what AI is, what goes into making it, and whether we need open data for it.

There are four components to AI: computational power, human expertise, domain focus, and 'a sea of data'. For AI to function efficiently, its algorithms require data for machine learning. Currently, AIs work as 'black boxes': how a machine learning system absorbs data and recognises patterns and correlations cannot be explained from the outside. With a large enough dataset, there is a possibility of adequately training machine learning algorithms so that AIs get the information they need to function in the real world.

However, at present, AIs are largely fed insufficient and incomplete data, which deprives them of the examples and patterns they need. AI algorithms are not inherently designed to account for incomplete data. Incomplete data will mirror the biases of the humans who produced it and of its creators. This is likely to undermine the potential of machine learning and lead to discriminatory outcomes.
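How incomplete data produces discriminatory outcomes can be seen in a deliberately simplified sketch. The groups, numbers, and the frequency-based 'model' below are all hypothetical illustrations, not any real system: one group is badly under-sampled, and the model dutifully learns the sampling artefact as if it were a real pattern.

```python
from collections import Counter, defaultdict

# Toy training data: (group, approved) pairs. Group "B" is badly
# under-sampled, and the few examples we have are all denials --
# a sampling artefact, not a real property of the group.
training = [("A", True)] * 60 + [("A", False)] * 40 + [("B", False)] * 3

def train(data):
    # "Learn" the majority outcome per group -- a stand-in for any
    # model that picks up whatever pattern the data contains.
    outcomes = defaultdict(Counter)
    for group, approved in data:
        outcomes[group][approved] += 1
    return {g: c.most_common(1)[0][0] for g, c in outcomes.items()}

model = train(training)
print(model)  # {'A': True, 'B': False} -- group B is always denied
```

The model is not malicious; it simply cannot distinguish a genuine pattern from a gap in its training data, which is why larger and more representative datasets matter.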

Although tech giants such as IBM, Google, and Facebook have opened their source code, they continue to hoard their datasets as proprietary assets. Data is what is required to train AIs, and with their large datasets these giants have an immense advantage over smaller companies trying to break into the industry.

Google's ad system, for example, was found to show lower-paying job ads to women than to men. Similarly, Amazon's AI for its same-day delivery service completely bypassed black neighbourhoods.

According to Nigel Shadbolt, Principal of Jesus College, Oxford, and Professor of AI, we need open data to tackle underlying systemic inequalities. Uber has taken a lead on this with Uber Movement: the company provides the urban traffic data it collects to city planners to help improve traffic conditions and city planning. Another example is President Obama's Open Data Mandate, which encourages governments, businesses, and institutions to adopt open data initiatives in public-domain fields such as health, safety, energy, and transportation.

The issue being debated in this blog is not the morality of AIs but the need for open data in machine learning for the AIs already in use. AI is increasingly used to provide customer service, to raise production and socio-economic value, and even in governance. Open data is the way to enable innovative strides, avoid systemic biases, level the competition for small start-ups, and ensure that the AIs that exist today are safe and ones we can trust.

