Sébastien Martin et al.
This policy is, however, not systematic and depends on data creators. Tim Berners‐Lee 15 proposes evaluating Open Data against criteria that rank each dataset according to its openness and its potential for reuse. One limitation of current Open Data is that most datasets obtain at most three out of five stars, which, in Tim Berners‐Lee's framework, limits the success of the releases and restrains the value of the data.
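The five‐star scheme can be read as a cumulative checklist, where each star presupposes the previous ones. The following is a minimal sketch of that reading, not an official implementation: the `Dataset` class and its field names are hypothetical, and the five criteria paraphrase those listed at 5stardata.info.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are assumptions)."""
    open_license: bool  # 1 star: available on the web under an open licence
    structured: bool    # 2 stars: machine-readable structured data (e.g. a spreadsheet)
    open_format: bool   # 3 stars: non-proprietary format (e.g. CSV instead of Excel)
    uses_uris: bool     # 4 stars: things are identified with URIs
    linked: bool        # 5 stars: data links to other data to provide context


def star_rating(ds: Dataset) -> int:
    """Count consecutive criteria met; a star only counts if all earlier ones do."""
    criteria = [ds.open_license, ds.structured, ds.open_format, ds.uses_uris, ds.linked]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file under an open licence, without URIs or links, stops at three stars.
csv_dump = Dataset(open_license=True, structured=True, open_format=True,
                   uses_uris=False, linked=False)
print(star_rating(csv_dump))  # prints 3
```

Under this reading, a dataset published as CSV under an open licence cannot exceed three stars, which matches the ceiling observed for most current releases.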
Table 7: Summary of data issues
Identified risk | Contingency actions
Data accuracy and bias | Clarify the context of the data creation process
Data quality as a result of a high quality production process | Stabilize funding for the creation of data and promote crowdsourcing
Data available in heterogeneous formats | Publish datasets in various formats

7. Risks related to metadata
Metadata are assigned to describe datasets. They are very important for the retrieval and reuse of datasets.
7.1 Lack of single standard to describe datasets
Metadata describing datasets are most often formatted according to the Dublin Core 16 and DCAT 17 vocabularies. However, there is no single standard for describing Open Datasets, so re‐users have to deal with multiple vocabularies. Coordination efforts are therefore necessary to overcome the difficulties raised by the heterogeneity of the metadata models used to describe Open Datasets. In France, an initiative has been launched to harmonize metadata on the basis of practices identified at the local level 18.
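To illustrate what dealing with multiple vocabularies means for a re‐user, the sketch below renames properties from Dublin Core and DCAT records to one common set of fields. The property names on the left of the mapping exist in those vocabularies; the common field names, the `harmonize` function and the sample records are assumptions made for this example and do not reflect the French initiative's actual model.

```python
# Mapping from vocabulary-specific properties to common field names.
# Left side: real Dublin Core (dc:, dct:) and DCAT (dcat:) properties.
# Right side: hypothetical common field names chosen for this sketch.
COMMON_FIELDS = {
    "dc:title": "title",
    "dct:title": "title",
    "dc:description": "description",
    "dct:description": "description",
    "dc:subject": "keywords",
    "dcat:keyword": "keywords",
    "dc:date": "issued",
    "dct:issued": "issued",
    "dc:publisher": "publisher",
    "dct:publisher": "publisher",
}


def harmonize(record: dict) -> dict:
    """Rename known properties to common fields; keep unknown ones under 'extra'."""
    result = {"extra": {}}
    for prop, value in record.items():
        field = COMMON_FIELDS.get(prop)
        if field is None:
            result["extra"][prop] = value  # nothing is silently dropped
        else:
            result[field] = value
    return result


# Two catalogues describing comparable datasets with different vocabularies
# (sample values are invented).
dublin_core_record = {"dc:title": "Bus stops", "dc:subject": ["mobility"]}
dcat_record = {"dct:title": "Bike stations", "dcat:keyword": ["mobility"],
               "dcat:theme": "transport"}

print(harmonize(dublin_core_record))
print(harmonize(dcat_record))
```

Keeping unmapped properties under `extra` is a design choice of this sketch: harmonization then never loses information that one vocabulary expresses and the other does not.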
7.2 Incomplete metadata
Missing metadata, the absence of mechanisms to ensure metadata quality, and the lack of information on the objectives and means that led to the creation or aggregation of the data also represent risks for the efficient reuse of Open Datasets. For example, there is often no information on how the data were used in the first instance. More generally, documenting data provenance 19 and context, which allows the data to be interpreted, is critical. In Berlin, Both (2012) demonstrates the key role of metadata for the future of Open Data and even suggests tracing reuses through metadata.
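One simple mechanism for monitoring metadata quality is a completeness check against a list of expected fields. The sketch below is only an illustration: the list of required fields, including a `provenance` entry, is an assumption and is not prescribed by Dublin Core, DCAT or the sources cited here.

```python
# Fields a catalogue might require for each dataset (assumed list for this sketch).
REQUIRED_FIELDS = ("title", "description", "publisher", "issued", "provenance")


def missing_fields(metadata: dict, required=REQUIRED_FIELDS) -> list:
    """Return the required fields that are absent or empty in the metadata."""
    return [field for field in required if not metadata.get(field)]


def completeness(metadata: dict, required=REQUIRED_FIELDS) -> float:
    """Share of required fields that are filled in, between 0.0 and 1.0."""
    filled = len(required) - len(missing_fields(metadata, required))
    return filled / len(required)


# Invented record with no description, date or provenance information.
record = {"title": "Bus stops", "publisher": "City transport department"}
print(missing_fields(record))  # ['description', 'issued', 'provenance']
print(completeness(record))    # 0.4
```

A report of this kind tells a data provider which descriptions to complete first, but it says nothing about whether the values that are present are accurate.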
Table 8: Summary of metadata related risks
Identified risk | Contingency actions
Lack of single standard to describe datasets | Participate in the harmonization of metadata between Open Data catalogues
Incomplete metadata | Gather metadata needs from re‐users; implement mechanisms to trace the provenance and use of datasets.

8. Risks related to access
Open Data should be accessible to both humans (end‐users) and machines (through re‐users). When setting up an access interface, some platforms require users to register and log in to access the data. Such tedious procedures can discourage potential re‐users. Conversely, if the platform does not require any identification, it becomes very difficult to know who is accessing and reusing which data.
Increasingly, platforms enable access through APIs (e.g., data.gov in the United States) for re‐users, who can then automatically access and update the datasets. APIs relieve service creators of the task of updating data. By ensuring that the data used by service creators is up to date, data providers increase the quality of services. Nevertheless, the proportion of data accessible through APIs is still low: only five datasets in Rennes. These datasets are also among the most used by applications created from public data. Although it is unclear whether this is due to the presence of an API (they also happen to belong to the domain of mobility, which is highly popular among data re‐users), it suggests that APIs can indeed support the reusability of data.
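The way an API relieves a service creator of manual updates can be sketched as a small refresh routine. Everything specific here is hypothetical: the endpoint URL is a placeholder, and the response is assumed to be a JSON object carrying a `modified` value alongside the records. A real platform's API will differ.

```python
import json
from urllib.request import urlopen

# Placeholder endpoint; not the address of any real Open Data platform.
API_URL = "https://data.example.org/api/datasets/bus-stops"


def fetch_json(url: str) -> dict:
    """Download and decode a JSON document from the given URL."""
    with urlopen(url) as response:
        return json.load(response)


def refresh(cache: dict, fetch=fetch_json, url: str = API_URL) -> bool:
    """Update the local copy when the provider reports a different 'modified' value.

    Returns True when the cache was replaced, False when it was already current.
    The 'modified' key is an assumed part of the response format.
    """
    latest = fetch(url)
    if latest.get("modified") == cache.get("modified"):
        return False
    cache.clear()
    cache.update(latest)
    return True


# Usage with a stand-in for the network call, so the sketch runs offline.
def fake_fetch(url: str) -> dict:
    return {"modified": "2013-05-02", "records": [{"stop": "Republique"}]}


local_copy = {"modified": "2013-04-01", "records": []}
print(refresh(local_copy, fetch=fake_fetch))  # True: the local copy was outdated
print(refresh(local_copy, fetch=fake_fetch))  # False: nothing changed since
```

Passing the fetch function as a parameter keeps the update logic independent of any particular platform and makes it testable without network access.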
15 http://5stardata.info
16 http://dublincore.org/documents/dces/
17 http://www.w3.org/TR/vocab-dcat/
18 http://opendata.montpelliernumerique.fr/Vers-une-harmonisation-des
19 http://www.w3.org/2011/prov/wiki/Main_Page