Open Data and Personal Information: A Smart Disclosure Approach Based on OAuth 2.0
Giuseppe Ciaccio, Antonio Pastorino and Marina Ribaudo
DIBRIS, Università di Genova, Italy
giuseppe.ciaccio@unige.it
antonio.pastorino@gmail.com
marina.ribaudo@unige.it
Abstract: Currently, public administration is undergoing significant transformations, driven by a greater demand for transparency and efficiency in a participative framework involving nonprofit organizations, enterprises, and citizens, with the modern network infrastructure as a common medium. The Open Data movement is considered one of the keys to this change. To the best of our knowledge, the current generation of Open Data has to date provided only static datasets in which no data concerning specific individuals could be included, due to obvious privacy issues. Public administrations hold a great deal of data of a personal kind, as do many private entities. Consider, for instance, the huge amount of personal data contributed to the various online social networks, or the electricity consumption data collected and stored by energy providers, or the telephone and internet data collected by telecommunications companies. The lack of such personal data in the Open Data realm, and the static nature of the released datasets, are weaknesses of the current generation of Open Data. Without personal data and without timeliness, it is impossible to build useful services tailored to the actual needs of a given individual at a given time. We argue that, by segregating or “protecting” our personal data, those public and private entities become the “owners” of our data. This means they hold a monopoly on services, while we, the legitimate owners of the data, must abide by their terms and conditions concerning how our data are treated and used. By unleashing personal data “into the wild”, such a monopoly would collapse and a new ecosystem of personal services based on these data could flourish. Of course, nobody wants personal data to enter the public domain without any control.
We argue that an appropriate policy for online disclosure of personal data is one where the individuals are restored to their role of “data owners” and are allowed to exert online control over data accesses being performed by third parties. This idea of “smart disclosure” of personal data is expected to be one of the forthcoming evolutions of Open Data. Based on the above arguments, we propose a possible implementation of “smart disclosure” that takes advantage of the OAuth 2.0 authorization framework. If properly implemented, OAuth 2.0 guarantees access to selected personal data upon authorization by the individual data owner. An implementation is presented together with possible use cases.
Keywords: open data, smart disclosure, OAuth
1. Introduction and motivation
The Memorandum on Transparency and Open Government signed by US President Barack Obama (Obama 2009) has fostered a new era for the public sector in which transparency, participation, collaboration and, ultimately, Open Government should become central in the democratic decision process. Administrations should become more transparent and promote the use of new technologies to ensure that the data they routinely produce and manage are made available online so that they can be leveraged by any party: enterprises, private citizens, public entities, and other branches of the public administration. This document marked the official onset of the so-called Open Data movement.
Shortly afterwards, several public administrations in the USA and UK started releasing massive amounts of Open Data in the form of aggregated datasets made available on their websites. The first catalog of Open Data was published in May 2009 by the US Government (http://data.gov), followed by the UK (http://data.gov.uk). The British Government is now at the forefront in Europe, engaging in an unparalleled effort towards widespread adoption of the Open Data paradigm.
Quoting (UK Government Cabinet Office 2012), “Data is the 21st century's new raw material”: by means of handheld devices, social networks, cash dispensers, and credit cards, people are directly or indirectly generating an unprecedented volume of data that is expected to transform our very lives (Hoffman 2012).
The current wave of Open Data released by public administrations largely consists of formatted datasets of a static nature, i.e., they do not reflect changes occurring after the release date. Diverse fields are involved: politics, traffic, local transportation, tourism and culture, the environment, healthcare and welfare, cartography, and many others. Such datasets are roughly of two kinds, namely: aggregated and anonymized data (e.g., number of children in each school of the region); and identification data of public entities (e.g.