Martin De Saulles
4. Evaluating open government data initiatives
On the surface, it is relatively easy to measure and describe the structure and contents of the data.gov.uk portal. The site itself provides statistics on the number of datasets it contains, who the publishers are and how many visits the site has had. Some of these numbers have been presented above. However, evaluating the datasets on the basis of their function and format is less straightforward. The importance of machine‐readable formats has already been described, so whether a dataset is presented as a PDF document, HTML page or XML file will make a difference to what third parties can realistically do with it. The nature of the content of the dataset is also important; a database of crime statistics which includes times, dates and map coordinates of the incidents may be put to very different uses than a document outlining the strategic objectives of a regional health authority. This is not to argue that one dataset has a higher inherent social value than another but that its potential uses may be different. Yu and Robinson (2012) point out that the language used to describe initiatives such as data.gov.uk and similar programmes around the world can mean different things to the technicians who build the systems and the policy makers and politicians who promote them:
“The popular term ‘open government data’ is, therefore, deeply ambiguous – it might mean either of two very different things. If ‘open government’ is a phrase that modifies the noun ‘data’, we are talking about politically important disclosures, whether or not they are delivered by computer. On the other hand, if the words ‘open’ and ‘government’ are separate adjectives modifying ‘data’, we are talking about data that is both easily accessed and government related, but that might not be politically important.” (Yu and Robinson, 2012, pp. 181–182)
If this ambiguity is to be removed and a more concrete understanding of what ‘open government’ actually means is to be gained, the authors believe it is important to separate the characteristics of the data from the reasons for which it is being disclosed and ultimately used. To achieve this they propose a stylised framework which describes government data across two dimensions. The first dimension describes the structure of the data and how it is published, and runs from ‘adaptable’ to ‘inert’: adaptable data is that which can be easily manipulated and repurposed, while inert data is presented in a format which makes further changes by third parties difficult or even impossible. The second dimension describes the data on a spectrum running between ‘service delivery’ and ‘accountability’. Yu and Robinson give the example of machine‐readable bus timetable data, which may provide convenience to individuals, aid commerce and generally help provide a higher quality of life, as being at the service delivery end of the spectrum. Data that discloses details of political funding or expenditure by public bodies would be at the accountability end of the spectrum, as it could be seen to be increasing transparency. They acknowledge that their definitions may be rather binary and that some data will not fall neatly at one end or the other of these dimensions, but as a framework within which to consider initiatives such as data.gov.uk it provides a useful starting point for evaluating public sector information.
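Yu and Robinson's two-dimensional classification can be made concrete with a small sketch. The class and attribute names below are illustrative assumptions for this chapter, not taken from the authors' paper, which describes the dimensions qualitatively rather than as code.

```python
from dataclasses import dataclass
from enum import Enum

class Structure(Enum):
    """First dimension: how the data is published."""
    ADAPTABLE = "adaptable"   # easily manipulated and repurposed
    INERT = "inert"           # hard for third parties to change

class Purpose(Enum):
    """Second dimension: the apparent purpose of disclosure."""
    SERVICE_DELIVERY = "service delivery"   # e.g. bus timetable data
    ACCOUNTABILITY = "accountability"       # e.g. political funding data

@dataclass
class DatasetClassification:
    """One dataset placed on both dimensions of the framework."""
    name: str
    structure: Structure
    purpose: Purpose

# Yu and Robinson's own bus-timetable example, encoded in this sketch
bus_times = DatasetClassification(
    "machine-readable bus timetables", Structure.ADAPTABLE,
    Purpose.SERVICE_DELIVERY)
print(bus_times.structure.value)  # adaptable
```

As the authors note, real datasets will not always sit neatly at one end of each dimension, so a binary encoding like this is a simplification.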
The following section describes an analysis of a subset of data.gov.uk which uses Yu and Robinson’s framework in an attempt to determine how useful it is as an evaluative tool.
5. Research methodology
The primary objective of this research was to consider the value of Yu and Robinson’s framework as a way to measure the characteristics of datasets in initiatives such as data.gov.uk. When the research was carried out there were more than 8,000 datasets within data.gov.uk, and an analysis of all of them was not feasible in the time available. Therefore, a sample of 100 was analysed, with the selection made by simple random sampling using a random number generator. While 100 datasets is a relatively small proportion of the total, it was considered sufficient to apply Yu and Robinson’s framework and provide the basis for a discussion of its utility.
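The sampling step described above can be sketched as follows. This is a minimal illustration only: the study does not publish its sampling code, and the catalogue size of 8,000 and the seed value are assumptions used here to make the example reproducible.

```python
import random

TOTAL_DATASETS = 8000   # approximate catalogue size at the time of the study
SAMPLE_SIZE = 100

random.seed(42)  # hypothetical seed, fixed only so the sketch is repeatable

# Simple random sampling without replacement over dataset identifiers 1..8000
sample_ids = random.sample(range(1, TOTAL_DATASETS + 1), SAMPLE_SIZE)

print(len(sample_ids))       # 100
print(len(set(sample_ids)))  # 100 – no dataset is selected twice
```

`random.sample` draws without replacement, so each dataset can appear at most once in the sample, which matches the simple random sampling design described in the text.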
Once the 100 datasets had been identified, they were evaluated according to the extent to which they were adaptable or inert and whether their primary purpose appeared to be to improve service delivery or public accountability. A dataset was considered inert if it was presented in a static format such as PDF, HTML or Microsoft Word, and adaptable if it was presented in a more malleable format such as Microsoft Excel, CSV or XML. It was also considered inert if restrictions beyond those contained in the Open Government Licence governed its re‐use. Judging whether a dataset was designed for service delivery or public accountability was a little more complex and required a more value‐based judgement. A relatively large number of datasets detailed the expenditure of public bodies above a specific threshold and others contained salary levels of public employees
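The format-based part of the coding rule is mechanical enough to express as code. The sketch below is an assumed reconstruction: the helper name `classify_format` and the exact set of file extensions are illustrative, not taken from the study's actual coding instrument.

```python
# Static formats coded "inert"; malleable formats coded "adaptable",
# as described in the methodology above.
INERT_FORMATS = {"pdf", "html", "doc", "docx"}
ADAPTABLE_FORMATS = {"xls", "xlsx", "csv", "xml"}

def classify_format(file_format: str, extra_restrictions: bool = False) -> str:
    """Return 'inert' or 'adaptable' for one dataset.

    A dataset is also coded inert when licence restrictions beyond the
    Open Government Licence govern its re-use, regardless of format.
    """
    fmt = file_format.lower().lstrip(".")
    if extra_restrictions:
        return "inert"
    if fmt in ADAPTABLE_FORMATS:
        return "adaptable"
    if fmt in INERT_FORMATS:
        return "inert"
    return "unclassified"  # formats outside the rule need manual review

print(classify_format("CSV"))                            # adaptable
print(classify_format("PDF"))                            # inert
print(classify_format("csv", extra_restrictions=True))   # inert
```

The service-delivery versus accountability dimension, by contrast, required the value-based human judgement described above and is not reducible to a lookup of this kind.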