On the State of Open Data: does it face an identity crisis?
What is the current state of open data around the world? Is open data facing an identity crisis? These are some of the questions that a recent book and its launch event tried to answer.
Six months ago, a book contemplating the state of open data around the world was released by the Open Data for Development (OD4D) initiative. The OD4D is
a global partnership that supports southern leadership and locally-led data ecosystems around the world as a way to spur positive social change and sustainable development – OD4D’s website
The program is hosted by the International Development Research Centre (IDRC) of Canada. The IDRC has also published the book, in partnership with African Minds, a non-profit, open access book publisher. As an open access book, State of Open Data: Histories and Horizons is available for free reading online.
To accompany the launch of the book, the World Bank hosted an event, both in person and online, where people could ask questions about the book and the topics it covers to Tim Davies, one of the editors of the book, and Anat Lewin, a Senior ICT Policy Specialist at the World Bank. Named “Let’s Talk Data: Does Open Data Have an Identity Crisis?”, it took place on May 20th, 2019, and I had the opportunity to watch, take notes and ask some questions.
Privacy and personal data challenges in an open data context
Tim went through each chapter of the book in reverse order. He highlighted the need to face the challenge of dealing with privacy and personal data without reinforcing the old practices of holding on to data under a power and fear rationale.
This challenge is not new, but it is aggravated as governments and companies alike collect more and more personal information on citizens around the clock. While some cases are pretty clear-cut, at other times it can indeed be difficult to tip the scale in favor of either the personal privacy of people who have some kind of relationship with the state, such as receiving public funds in some way, or the need for disclosure and transparency so that the public can hold them accountable. Different societies will sometimes weigh those needs differently and arrive at different outcomes.
A recent example of one of these differences is the decision of an EU court to keep secret the €4,416 a month in public money that MEPs spend on expenses. On the other hand, similar expenses from the Brazilian Chamber of Deputies are released as open data, as are those of its Senate counterpart. In fact, for a few years there has been a crowdfunded, open source project to track these expenses and find possible discrepancies by using a network of dedicated activists and artificial intelligence tools. Project Serenata de Amor has received accolades from the press and has even received attention from the World Bank, in the form of an article written by two of the project’s founders, Yasodara Córdova and Eduardo Cuducos, and featured as an example of good governance for development.
The reasonable middle ground seems to be to disclose as open data the information related to expenditures of public money, such as the wages of public officials, or the list of beneficiaries of the Bolsa Família social program and the stipends received, but not other personal information about them that is completely unrelated to the expenditure of public funds, such as their home addresses or medical records. That has been the stance that has prevailed in Brazil, while so many other countries have been lagging behind in the transparency of public expenditure.
Speaking of fiscal transparency, Tim Davies mentioned his work on the Fiscal Data Package, which is a data standard for improved reusability and interoperability for budget and spending data that has been increasingly adopted by countries.
As for the excuses for not opening data under power and fear rationales, most of them were captured pretty early on in the Open Data Excuses Bingo, along with arguments on why each excuse is unfounded or how it could be addressed. This OD Bingo has since been translated into Italian and Portuguese; it was brought to my attention by Fernanda Campagnucci on episode 17 of the Pizza de Dados podcast.
Still on the topic of privacy, Anat Lewin made a very pertinent remark: aggregated data, which you would not normally think of as raising privacy issues, can do so depending on the sample size of the dataset, since having fewer samples in an aggregated cross-section of a dataset can more easily lead to the re-identification of individuals.
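To make the point about small cross-sections concrete, here is a minimal sketch of one common mitigation, small-cell suppression: counts below a threshold are hidden before the aggregated table is published. The records, the field names and the threshold of 5 are made up for illustration; they were not discussed at the event, and a real publisher would pick a threshold to suit its own context.

```python
from collections import Counter


def aggregate_with_suppression(records, keys, min_cell_size=5):
    """Count records per combination of `keys`, hiding small groups.

    Groups with fewer than `min_cell_size` records get None instead of
    a count, because a very small group is easier to re-identify.
    The default threshold of 5 is an illustrative assumption.
    """
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }


# Hypothetical beneficiary records, invented for this example.
records = (
    [{"city": "A", "age_band": "30-39"}] * 12
    + [{"city": "A", "age_band": "80+"}] * 2
    + [{"city": "B", "age_band": "30-39"}] * 7
)

table = aggregate_with_suppression(records, ["city", "age_band"])
for group, n in sorted(table.items()):
    print(group, "suppressed" if n is None else n)
```

Here the two people aged 80+ in city A would be hidden, while the larger groups are published as they are. Suppression on its own is not a complete answer, since a hidden cell can sometimes be recovered from published totals, but it shows why the size of each cross-section matters.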
Organizational and governance challenges
Some of the topics Anat Lewin commented on at the event were related to the organizational and governance challenges governments face when implementing open data:
- open data policy needs to be a whole-of-government approach, and not isolated silos
- people need to work with open data as a part of their expected job, not an additional thing on top of the regular job
- capacity building for both the supply side and demand side of data
- data literacy needs to be part of the mainline education program
I agree with all of those points, so let’s unpack them one at a time. First, experience has shown us that isolated open data initiatives in public administration do not last long and die when the people sponsoring them leave the administration. So yes, it has to be a whole-of-government directive, sponsored by top-level stakeholders, in order to be given enough priority and to put in place the procedures, norms and responsibilities that will make the initiative persist over time. But, as we learned during a series of international workshops on open data planning organized by the Public Administration Division of the UN Department of Economic and Social Affairs in 2016 and 2017, those procedures, norms and responsibilities need to reach out to the leaves of the organizational tree, especially in large countries, so that they cover the relevant data subjects that are under the responsibility of different government institutions. The lessons learned are summarized in a Guide to Open Data Planning for Sustainable Development.
The need for clearly defined responsibilities and roles in the data opening pipeline is also very much in line with Anat’s second observation, noting that people need to produce open data as part of their expected job.
Also, none of the advances that we achieved in Brazil in those years would have been possible without the extensive capacity building program we implemented, training over 700 public servants, in person, on how to build open data plans, and almost two thousand people through the online course. What we did on the supply side of open data was complemented by the local chapter of the W3C in the form of guides and online courses on using open data.
In recent years, in the wake of the perceived need for more data scientists, data engineers and related data professionals in almost every field, I believe that many online courses and other resources have, in a way, catered to the need for more training in data literacy. However, data usage skills are surely not yet widespread enough to consider that open data is something usable by everybody. But will that ever be the case? That is something to think about.
Eight years ago, Tom Steinberg, founder of mySociety, wrote an article on his personal blog, commented on by The Guardian, stating clearly that open data isn’t, at least directly, for everybody, but for people with the required skills. It seems obvious, though, that the general public can nevertheless benefit from open data, even if they can’t use it directly. That is a topic worth revisiting in a future discussion. Now that an increasing number of people are acquiring data skills, will open data ever be accessible to a wide audience?
On the topic of data use by society for civic purposes, someone from the audience asked, in the context of government accountability, whether the theory of citizen “armchair auditors” would ever be viable. Tim’s answer was that the “armchair auditor” is possible in a long tail form. Whenever people get frustrated enough with government, they will dive into the data to find information to hold it accountable.
Outlier leaders, Latin America and the challenges of measuring open data impact
The book offers a broad overview of the state of open data in the world, while also going pretty in-depth on several key issues in the open data landscape. It covers how open data fares in specific sectors, such as government finances, agriculture, education, national statistics, etc. It also deals with current issues related to open data such as data literacy, data infrastructure, algorithms, artificial intelligence and privacy. Additionally, it examines the roles of key stakeholders in the open data ecosystem, such as governments, civil society, private sector, academia and journalism. There is also an overview of each region of the world in Section 4.
Here we take a look at Latin America and the Caribbean. The authors of this chapter are Silvana Fumega, Research and Policy Director of the Latin America Initiative for Open Data (ILDA), and Maurice McNaughton, Director at the Mona School of Business & Management, at the University of the West Indies.
There, the authors highlight the growth of the open data movement and strong civil society networks. Regional events, such as Desarrollando America Latina and Condatos – Abrelatam, have been instrumental in shaping open data progress in Latin America. Condatos congregates participants from governments in the region, while its sister event, Abrelatam, focuses on discussions on open data by the region’s civil society organizations. On the other hand, countries in the region that do not share a language with their neighbors, such as Brazil, have been more regularly engaged with the international open data community instead, taking part in global events such as the International Open Data Conference, in lieu of being more integrated with their regional peers.
Another key finding in the region has been that, while participation and engagement by civil society organizations in the region has been strong, the private sector has been slow in adopting the use of open data for its own benefit.
Although there is a recognition that private sector companies using open data need to be included in regional discussions and activities, substantial efforts to make this happen have not yet materialised. There are several businesses working with open data and civic technology in the region, but only a small number work with civil society and governmental actors in the open data community, with firms such as Properati, Junar, and Dymaxion Labs acting as the exception rather than the rule. – State of Open Data, Chapter 4
In part this could be attributed to a lack of awareness by companies of the opportunities to use open government data as an asset in their own business analytics and machine learning activities to gain better insight, find more prospective clients for their products and services, cut operational costs by increasing efficiency in production and logistics, and also innovate in new data-driven business. It is well known that companies have been struggling to find enough qualified professionals with data science, data engineering and related backgrounds.
Perhaps this is the real “identity crisis” of open data around the world: the need to clearly acknowledge, and raise awareness of the fact, that in order to enable many emerging activities today such as data science, AI modelling and training, and smart human cities we need not just data, but open data. – Augusto Herrmann
Even in those circles, people often struggle to find the data they need, even as they take the pragmatic (and legally dangerous) stance of just finding some data to use without caring about the licensing details and their implications. They focus instead on the immediate need to complete some analysis or integration with said data that will bring results to the core business, without regard for the legal consequences of, e.g., unauthorized use. Sometimes people do care whether open data is easy to find and to use, but are not aware of the decade-long struggle and activism for open data, and do not call open data by its name, naming it just “data”. That is something I often see in message groups related to the data science community in Brazil. Perhaps this is the real “identity crisis” of open data around the world: the need to clearly acknowledge, and raise awareness of the fact, that in order to enable many emerging activities today such as data science, AI modelling and training, and smart human cities we need not just data, but open data.
On the other hand, another factor could be contributing to the lack of documented cases of open data impact around the world. There are so few obstacles to using open government data, as it is freely available to everyone, that sometimes companies do use the data but refrain from telling the outside world that they are using it. The rationale is that, if they were to disclose their use of open data, they would alert the competition to make the same use of the data, and then lose a competitive edge. I have been arguing about this possibility for a while now in my talks about the open data ecosystem. Judging from a comment Tim Davies made during his talk, he seems to agree with that point, emphasising how low the barrier to entry for using open data is.
Another piece of evidence that companies using open data do not want to attract attention to themselves for doing so is that efforts to openly map such use cases, either through web-based surveys or detailed case studies, often do not get a lot of respondents. For example, the Open Data’s Impact Report, an in-depth analysis of the impact of open data around the world through the evaluation of case studies, led by the GovLab of the New York University Tandon School of Engineering, features only 4 cases of open data impact in Latin America.
We can see a similar picture when we look at another, broader survey that is open for anyone to submit use cases. The Open Data Impact Map is also funded by the IDRC and the World Bank, but also has dozens of regional supporters. In the regional report section, we can see that, excluding Mexico, there were only 72 cases of open data impact in Latin America. The reason not to count Mexico is that it is a clear outlier in Latin America, accounting alone for 95 impact cases, which is more than all other countries in the region combined.
Actually, this is a trend that can be observed in other regions of the world as well: an outlier has a disproportionately high number of open data impact cases while every other country in the region has far fewer documented cases. Besides Mexico in Latin America & Caribbean, there is the UK in Europe and Central Asia, the United States in North America and India in South Asia (those are some of the regional partitions of the world used in that study).
Final thoughts
All things considered, the State of Open Data book is worth a read for anyone concerned with open data around the world, while it’s still up to date. Make sure to check it out.
As for the supposed identity crisis of open data, do open data policies need a rebranding? Will open data ever reach a broader awareness in society? When even people who need and use data every day are often not familiar with the basic concepts of openness, these are some of the emerging questions we are left to ponder.