By Stefaan Verhulst, Co-Founder and Chief Research and Development Officer of the Governance Laboratory @NYU (GovLab).
The “AI for Social Good” conference that recently took place at the Qatar Computing Research Institute examined the potential of Artificial Intelligence (AI) for good. It was widely agreed that the potential is real, and that AI could help jumpstart economic development and support humanitarian causes when used responsibly.
Yet equally, it was evident to all that increasing the adoption of AI faces certain challenges and constraints. In particular, AI (and the associated methods of machine learning, deep learning, data science, etc.) relies on access to vast amounts of data that can help train and develop new systems. Not only is this data often unavailable in emerging economies, but the relevant stakeholders may also lack the capacity (technical and otherwise) to make use of it.
This is where data collaboratives come in. Data collaboratives can help mitigate some of these challenges by providing stakeholders with additional and non-traditional sources of data. At the GovLab, we have done substantial research on the potential offered by data collaboratives toward solving complex and seemingly intractable public problems. (See http://datacollaboratives.org/.)
We have been strong advocates for using privately held data for the public good, and our research makes clear that there is an exponential return to the cross-organizational alignment of goals and pooling of resources that result from greater collaboration and partnerships. Collaboration also increases trust and ethics in the way data is handled and, importantly, the perceived legitimacy of such efforts. In other words, data collaboration is the key to unlocking the potential of data, data science and artificial intelligence while limiting its risks and potential harms.
But setting up data collaboratives comes with its own challenges and difficulties, especially given how new the concept is and how fledgling the field. In this post, we outline six steps for any organization considering establishing systematic, sustainable and responsible data collaboratives.
First, though, it may be worth addressing the question: What is a Data Collaborative?
The term “data collaborative,” introduced by the GovLab in 2015, refers to an emergent form of public-private partnership in which actors from different sectors exchange and analyze data (and/or provide data science insights and expertise) to create new public value and generate new insights. As evidenced by the 150+ case studies included in our Data Collaboratives Explorer, data collaboratives have been used with increasing frequency, across sectors ranging from agriculture to telecoms to government, in a growing number of countries around the world.
Data collaboratives also come in many different forms. They include data pools, which agglomerate data from various sources and sectors; challenges and prizes, in which corporations and other data holders make data available to third parties who compete to develop new apps or discover innovative uses for the data; APIs, which allow developers and others to directly access private sector data; intelligence products, where shared data is used to build a tool, dashboard, report or other platform that can help support another organization’s objective; and a trusted intermediaries model, where corporations share data on a limited basis with known partners, typically for the purposes of data analysis and modeling. Data collaboratives can also take the form of more traditional research or business partnerships that allow organizations to share information and expertise.
The value of data collaboratives stems from the fact that the supply of and demand for data are generally widely dispersed—spread across government, the private sector, and civil society—and often poorly matched. This failure (a form of “market failure”) results in tremendous inefficiencies and lost potential. Much data that is released is never used. And much data that is actually needed is never made accessible to those who could productively put it to use.
Data collaboratives, when designed responsibly, are the key to addressing this shortcoming. They draw together otherwise siloed data and a dispersed range of expertise, helping match supply and demand, and ensuring that the correct institutions and individuals are using and analyzing data in ways that maximize the possibility of new, innovative social solutions.
Roadmap for Data Collaboratives
Despite their clear potential, the evidence base for data collaboratives is thin. There’s an absence of a systemic, structured framework that can be replicated across projects and geographies, and there’s a lack of clear understanding about what works, what doesn’t, and how best to maximize the potential of data collaboratives.
At the GovLab, we’ve been working to address these information shortcomings. For emerging economies considering the use of data collaboratives, whether in pursuit of Artificial Intelligence or other solutions, we present six steps that can be considered in order to create data collaboratives that are more systematic, sustainable, and responsible.
1) Increase Evidence and Awareness
As mentioned, the fledgling and ill-defined nature of the field poses challenges to the adoption of data collaboratives. Simply put, if organizations and individuals don’t know about the potential of data and collaboration, then their actual use and impact will be limited. In order to spur greater adoption of data and the data collaboration model, we may need to create and document evidence regarding the value of collaboration and raise awareness among key target communities.
2) Increase Readiness and Capacity
For all the evident benefits of data, many organizations continue to display a certain reluctance or reticence. This is true when it comes to using data in general, and especially true when it comes to data collaboration, where mistrust and misaligned incentives or priorities continue to impede progress. A lack of technical capacity is also a major obstacle to greater uptake of data, especially for smaller organizations that lack the type of specialized knowledge often necessary to store and analyze data.
For all these reasons, there is an urgent need to develop new capacities and a new readiness within and across organizations. This can be done with more emphasis on training and skill-building, as well as with greater cross-sectoral collaboration (so that data specialists in the private sector, for instance, may contribute some of their skills to civil society organizations). It’s also worth mentioning that some of these goals (though by no means all) may be aided by the awareness building discussed above.
3) Address Data Supply and Demand Inefficiencies and Uncertainties
There are two sides to the data collaboration equation: supply and demand. To amplify the benefits of collaboration we need to address inefficiencies and “market failures” on both sides. This involves proactively reaching out to potential supply-side organizations, especially in the private sector, and working with them to minimize concerns over competitiveness or reputation, and to define data responsibility approaches that take into account both the value proposition and the potential risks of collaboration.
Equally, understanding the demand for data is a vital part of establishing a responsible and impactful data collaboration process. Yet here too, as on the supply side, there are a number of inefficiencies limiting impact. Many of these are related to readiness, capacity and awareness—issues which we have addressed above.
4) Establish a New “Data Stewards” Function
We also believe that certain institutional changes are required to maximize the potential of data collaboration. In particular, the GovLab has proposed the establishment of a new role that would be embedded in any organization dealing or considering dealing with data: data stewards.
Data stewards would be the individual (or individuals) within an organization responsible for setting data policy and for steering and encouraging collaborative approaches. Data stewards would also play a central role in ensuring that data is shared and handled responsibly, to address some of the inherent risks of open data and data collaboration. We have written extensively about data stewards elsewhere. Essentially, they would function as the linchpin of a new, more systemic framework for responsible data collaboration.
5) Develop and Strengthen Policies and Governance Practices for Data Collaboration
Establishing the new role of data stewards is but one aspect of a broader task: developing a clearer and better-defined governance framework for data collaboration in the social sector. There is ample evidence across sectors that clearly articulated governance mechanisms, along with auditing and accountability mechanisms, enhance trust and the usage of data. While some policies and mechanisms are universal and can be transposed, others may require sectoral adaptation and changes. The overall goal is to develop a well-defined set of guidelines and principles that cover the entire data lifecycle.
6) Strengthen the Ecosystem
Few successful initiatives (technical or otherwise) emerge in isolation. They are always supported, incubated and nurtured by a well-developed ecology that includes thought-leaders, funding, institutional and regulatory support, as well as a variety of other components. One of the challenges confronting data collaboratives is that the current ecosystem is weak, perhaps especially in emerging economies. There is a need to build capacities and networks of association where knowledge can be shared and funding sources identified. Likewise, there is a need for a robust body of evidence (case studies, examples, etc.) that can provide lessons and best practices, and that can serve as the foundation for new projects and initiatives.
Put together, these six steps amount to a roadmap for establishing successful data collaboratives, whether the ultimate goal is fostering Artificial Intelligence solutions or some other desired outcome. The GovLab is convinced that data collaboratives, if made systematic, sustainable and responsible, offer a unique opportunity to harness the potential of data and AI to help solve important and complex public problems.