Sébastien Martin et al.
While most efforts currently focus on public data, Deloitte's analysis (Deloitte, 2012) suggests that the main value of Open Data will result from the combination of public data, business data, and personal data. This will likely lead to even more complex situations regarding rights and licenses over datasets.
Table 5: Summary of legal risks
Identified risk | Contingency actions
Licence is not open enough | Release data complying with the definition of openness; collect the concerns of re-users and modify licenses if the barriers are too constraining
Heterogeneous licences across datasets | Sensitization of stakeholders; strengthen the role of the agency that organizes Open Data
Stacking of rights | Governance choices
Privacy | Data anonymisation
6. Risks related to data
The data themselves also carry risks, in particular regarding their reliability, quality, and format.
6.1 Data accuracy and bias
The dependence of data producers on public funding can raise suspicions about the accuracy of the data. Some data can be sensitive to political pressure (e.g., unemployment figures), and the context in which they were created may raise concerns regarding potential manipulation by the State.
6.2 Data quality as a result of a high quality production process
The discontinued funding of certain activities represents a significant risk for the quality of the data. The case of the Netherlands cadaster shows how sensitive datasets are to financial aspects. The cadaster depended entirely on funds provided by the State; these were cut repeatedly over the 1990s, which led to a sharp deterioration in its quality (Uhlir, 2009).
Open Data advocates dismiss these risks by pointing to the opportunities of involving users in the process of data improvement. By identifying errors and alerting the data curators, re-users as well as any citizen can help maintain high-quality datasets through crowdsourcing mechanisms.
6.3 Data available in heterogeneous formats
To access datasets efficiently, users must identify the appropriate software to read and work with the data, then choose the format that best fits their needs. Some formats are proprietary, and combining Open Data published in mutually incompatible proprietary formats already raises conversion difficulties. This represents an entry barrier for re-users who wish to access the data but cannot acquire the required software.
Data are made available in a variety of formats. The Klessmann report (2012) indicates that approximately 90% of the datasets in Germany are in PDF format, which presents the greatest problems for reuse, but a large part (up to 56% depending on the organization) contains structured information that could be made reusable by converting it, for instance, to the CSV format.
Access and reuse can therefore be facilitated if the data are produced by software whose code is open (open source software) and published in an open, well-documented format. To ensure that the format in which data are made available is not an obstacle, Rennes and Berlin make many datasets available in multiple formats (Table 6).
Table 6: Formats by dataset in Rennes and Berlin
City | Datasets | Minimum | Maximum | Average
Rennes | 137 | 1 | 8 | 5.4
Berlin | 61 | 1 | 9 | 2.3
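
To illustrate what making a dataset available in multiple open formats can involve, the following is a minimal sketch, assuming flat tabular records, that serializes the same data to both CSV and JSON using only the Python standard library. The function names (to_csv, to_json, export_all) and the field names are hypothetical and chosen for illustration; they do not describe how the Rennes or Berlin portals actually operate. The sample values are those reported in Table 6.

```python
import csv
import io
import json

# Sample records: the figures come from Table 6; the field names are
# illustrative assumptions, not names used by either portal.
RECORDS = [
    {"city": "Rennes", "datasets": 137, "min_formats": 1,
     "max_formats": 8, "avg_formats": 5.4},
    {"city": "Berlin", "datasets": 61, "min_formats": 1,
     "max_formats": 9, "avg_formats": 2.3},
]


def to_csv(records):
    """Serialize a non-empty list of flat dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(records[0].keys()),
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize the same records to human-readable JSON text."""
    return json.dumps(records, ensure_ascii=False, indent=2)


def export_all(records):
    """Return every supported serialization, keyed by format name."""
    return {"csv": to_csv(records), "json": to_json(records)}


if __name__ == "__main__":
    for fmt, text in export_all(RECORDS).items():
        print(f"--- {fmt} ---")
        print(text)
```

Both outputs are plain-text, openly documented formats that can be read without proprietary software, which is the property the paragraph above identifies as lowering the entry barrier for re-users.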