Deliverable 2.1 now available
Subject: State-of-the-art Report and Evaluation of Existing Open Data Platforms
Document produced by: Insight Centre for Data Analytics, National University of Ireland Galway (NUIG)
Edobor Osagie, firstname.lastname@example.org, Waqar Mohammad, email@example.com, Arkadiusz Stasiewicz, firstname.lastname@example.org, Islam Ahmed Hassan, email@example.com, Lukasz Porwol, firstname.lastname@example.org, Adegboyega Ojo, email@example.com
Due date: 31 May 2015
Opening up government data to the public has been recognized to have a significant impact on enhancing the transparency and openness of public sector entities while promoting new forms of accountability and improving citizens’ trust in governments. In response to the European Public Sector Information (PSI) directive, many European Union (EU) member states have launched their Open Data initiatives, with over 8,000 datasets available on the EU Open Data Portal. However, the high expectations have not been satisfied, due to persistent barriers such as: 1) limited access to and use of Open Data by citizens and third parties, 2) limited budget and resources on the part of government agencies to publish new datasets of high value and 3) a weak legislative framework to enable ethical reuse of available datasets. New approaches are therefore needed to improve the accessibility and understandability of Open Data. One potentially successful method involves explicit support for social interaction over published datasets as a means to increase data and government transparency. Such a next-generation platform could be realized through the integration of Web 2.0 or Social Media technologies with traditional Open Data platforms. However, efforts in this direction are just beginning to gather interest and momentum in the Open Data community. The Route-To-PA Project is a specific instantiation of this kind of methodology.
The Route-To-PA project (Raising Open and User-friendly Transparency-Enabling Technologies for Public Administration), which provides the context for this study, aims to enable the transition into the next-generation Open Data platform paradigm by creating tools that will enable citizens to engage socially and directly over Open Data resources. The project also aims to provide specific explanatory tools that could be incorporated into existing Open Data platforms to deliver greater transparency, quality and understandability of the datasets. Building such tools and technologies is contingent on the investigation and evaluation of state-of-the-art Open Data platforms to determine their current set of capabilities and possible expandability. While there are a few existing studies on Open Data platforms, none of these studies specifically addresses the affordances of these platforms with respect to the quality and transparency of the Open Data published on them. In addition, none of these reports looks at the social aspects of Open Data. This report attempts to bridge this knowledge gap.
This report aims to provide better understanding and evaluation of Open Data platforms with respect to:
- The degree of availability of features that enable Public Authorities and other Open Government Data providers to publish high quality datasets with respect to transparency attributes such as accessibility, usability, understandability, informativeness and auditability, as well as social interaction and collaboration on datasets;
- The shortcomings based on the perceptions of different categories of stakeholders, such as data publishers, data consumers and mediators;
- The platform features desired by Open Data stakeholders with respect to dataset transparency, social interaction and collaboration on datasets; and
- The degree to which these platforms provide mechanisms to allow modification of their behaviour and to facilitate the development of additional capabilities on the platform.
The study relies on the analysis of information gathered from a review of literature, a survey of eleven state-of-the-art Open Data platforms, stakeholder interviews, and stakeholder workshops in Dublin (Ireland) and Prato (Italy). The platforms reviewed and evaluated are: CKAN, DKAN, Socrata, PublishMyData, Information Workbench, Enigma, Junar, DataTank, OpenDataSoft, Callimachus and Semantic MediaWiki. The first and fourth objectives were addressed through data from the platform evaluation (described in Section 3 of the deliverable), while the second and third objectives were based on data gathered from interviews and workshops on the perceptions of stakeholders (described in Section 4 of the deliverable).
To address the first objective, the platforms were evaluated based on a set of criteria that enable direct and indirect support for dataset transparency and socialisation on datasets. These criteria include the availability of: 1) Metadata, Data and File Format Standards and Schemas, 2) Flexible search facility for datasets, 3) Social Media, Collaboration and Social Sharing tools, 4) Dataset Publishing workshop, 5) Harvesting, Federation and Cataloguing, 6) Data Analysis tools, 7) Visualisation tools, 8) Personalisation tools, 9) Customisation tools, 10) Dataset licensing service, 11) Accessibility and 12) Extensibility mechanisms. These criteria are defined in Section 3. The fourth objective is addressed by considering additional information on whether the platform: 1) is open source, 2) provides concrete extension mechanisms for end-users and developers, 3) provides a guide to support extension activities and 4) allows publishers to customise metadata schemas. Objective 2 is addressed by analysing the barriers contributed by stakeholders that are related to data transparency, social and collaboration activities on datasets. Objective 3 is addressed by evaluating the features and solutions to identified barriers and shortcomings of Open Data platforms suggested by stakeholders during interviews and workshop sessions.
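As an illustration only, a criteria-based evaluation of this kind could be tallied as in the minimal Python sketch below. The short criterion keys, the platform names `PlatformA` and `PlatformB`, their support sets and the `minimum` threshold are all hypothetical placeholders introduced for this example; they are not taken from the evaluation data in the deliverable.

```python
# Hypothetical sketch: tallying how many evaluation criteria each platform fully supports.
# The criterion keys are shorthand labels and the support sets are invented placeholders,
# NOT the findings of the deliverable.

CRITERIA = [
    "metadata_standards", "search", "social_sharing", "publishing",
    "harvesting", "analysis", "visualisation", "personalisation",
    "customisation", "licensing", "accessibility", "extensibility",
]


def coverage(support: dict[str, set[str]]) -> dict[str, int]:
    """Count, per platform, the supported features that are known criteria."""
    known = set(CRITERIA)
    return {name: len(features & known) for name, features in support.items()}


def shortlist(support: dict[str, set[str]], minimum: int) -> list[str]:
    """Return, in alphabetical order, platforms supporting at least `minimum` criteria."""
    counts = coverage(support)
    return sorted(name for name, n in counts.items() if n >= minimum)


# Placeholder input: PlatformA supports the first ten criteria,
# PlatformB supports two criteria plus one unrecognised feature (ignored).
example_support = {
    "PlatformA": set(CRITERIA[:10]),
    "PlatformB": {"search", "licensing", "unknown_feature"},
}

print(coverage(example_support))
print(shortlist(example_support, minimum=9))
```

Keeping the criteria in a single list makes it straightforward to ignore unrecognised feature names and to change the shortlisting threshold without touching the tallying logic.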
The findings from the results are as follows:
Availability of Features to Support Transparency of Datasets and Social Interaction
Socrata, CKAN, DKAN and Semantic MediaWiki stand out from the other platforms by providing fully-fledged features that support at least 9 of the 12 criteria used in the evaluation (see Table 1). The other platforms provide between 1 and 7 fully-fledged features. Overall, while support for Social Media channels, customisation and personalisation is common in state-of-the-art platforms, support for metadata schema adaptation, options for visualising datasets and accessibility of datasets (including at a granular level) are limited. However, it must be noted that, in terms of Social Media integration, these platforms simply allow a link to specific Social Media accounts. Personalisation in the context of this evaluation is limited to the end-user’s ability to change the behaviour of the platform based on preferences, and does not extend to aspects such as recommending datasets to end-users based on their preferences or their relationships with other users.
Shortcomings of State-of-the-art Open Data Platforms Based on Perceptions of Stakeholders
Our analysis showed that the most common barrier to the use of Open Data platforms, and of Open Data itself, is the perceived poor quality of the datasets available on the platforms. According to stakeholders, poor data quality is associated with poor metadata, failure to use the right format for different audiences and difficulty in locating data of interest. Other barriers identified relate to the non-relevancy of available datasets, the usability of the platforms and of the data available on them, and the lack of good examples of prior use of available datasets.
The figure below presents the associated transparency issues that are related to the above barriers:
Features Desired by Stakeholders for Open Data Platforms
The desired features contributed by stakeholders for the next generation of Open Data platforms were captured under two categories: 1) Social and Collaboration, and 2) Understandability, Usability and Decision-making needs. Dataset rating and feedback on datasets, wall-style feedback, collaborative curation of datasets, prioritization of and voting on dataset requests, reward systems and gamification are some of the features expressed under the social and collaborative needs. To enable better understandability, usability and decision making with next-generation platforms, users requested customisable dashboards, data mining tools and custom visualization tools, support for Linked Data and map-based search, as well as question-answering features. The tag cloud below in Figure 3 was generated from the solutions and features contributed in response to the identified stakeholder needs and barriers.
Extensibility of Open Data Platforms
Based on the four detailed criteria for extensibility of platforms, CKAN, DKAN and Semantic MediaWiki are the most extensible, providing free and open source code, a rich set of extension mechanisms and an open architecture, guides to support developers in building such extensions, and support for additional fields in the metadata schema. However, Callimachus and DataTank, being open source, could also be modified as desired, albeit at a much higher cost than the platforms above, which provide explicit extension mechanisms. The detailed table of extension features is presented in Table 2 below.
Conclusion and Recommendations
Guided by the findings we conclude as follows:
- That a few state-of-the-art Open Data platforms, such as CKAN, Socrata, DKAN and Semantic MediaWiki, provide well-developed features to support good data transparency and quality when publishing datasets. Because three of these platforms are open source and provide extension mechanisms, they arguably stand out as the base platforms of choice for building next-generation Open Data platforms. CKAN, DKAN and Semantic MediaWiki in particular have very vibrant developer communities that could provide the necessary support in any further development of these platforms.
- Despite the features provided by some of these platforms as highlighted above, from the end-user perspective there are still significant challenges that must be tackled for these platforms to be adopted and used as desired by public administrations and other stakeholders. One of the significant barriers is the perceived poor quality of datasets published on these platforms. Consequently, platform developers would have to directly address aspects of Open Data quality such as poor context and provenance of published datasets and non-viable data feeds. A feature to explicitly rate datasets along different data quality dimensions could be useful in this regard.
- From the stakeholders’ perspectives, social features such as dataset rating, voting and wall-style feedback on datasets, and advanced analytics tools such as customisable dashboards and custom visualisation tools, should be considered in future enhancements of Open Data portals. This is congruent with the findings from the technical evaluation of state-of-the-art platform features.
- Open and extensible base technology platforms are available for innovation relating to the development of next-generation Open Data platforms with the features described above. In particular, CKAN, DKAN and Semantic MediaWiki are candidate base platforms for such innovation activities.