Abstract
The steady adoption of Linked Data in recent years has led to a significant increase in the volume of RDF datasets. The potential of this Big Semantic Data is under-exploited when data management relies on traditional, human-readable RDF representations, which add unnecessary overhead when storing, exchanging and consuming RDF in the context of a large-scale, machine-understandable Semantic Web. HDT tackles this issue by proposing a binary representation for RDF data, and can be seen as a compressed, self-contained triple store for RDF. On the one hand, HDT represents RDF with compact data structures that enable the storage, parsing and loading of Big Semantic Data in compressed space. On the other hand, “the HDT data are the index”, so HDT can also serve as a graph data store with competitive query performance. In this tutorial we will focus on providing hands-on experience with HDT. We will also welcome external presentations on related topics and a discussion of next steps for the interested community.
Motivation
Although the amount of RDF data has grown impressively over the last decade, traditional RDF representations remain dominated by a document-centric, human-readable view. As a result, they suffer from scalability problems: the huge space they require, the powerful resources needed to manage them, and the long times needed for data retrieval on the Web.
This scenario calls for efficient and functional representation formats for RDF as an essential tool for RDF preservation, sharing and management. HDT fills this gap: it is a compact data structure and binary serialization format that keeps big datasets compressed, saving space while still supporting search and browse operations without prior decompression. This makes it an ideal format for storing and sharing RDF datasets on the Web.
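As a minimal illustration of this point, the sketch below opens an HDT file and resolves a triple pattern directly over the compressed representation, using the hdt-java library from https://github.com/rdfhdt (the file name dataset.hdt and the query pattern are placeholders):

```java
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdt.triples.IteratorTripleString;

public class HdtSearch {
    public static void main(String[] args) throws Exception {
        // Map the compressed file into memory; hdt-java builds (or reuses)
        // an additional index so that all triple patterns can be resolved.
        try (HDT hdt = HDTManager.mapIndexedHDT("dataset.hdt", null)) {
            // Empty strings act as wildcards: here, ?s rdf:type ?o.
            IteratorTripleString it = hdt.search(
                "", "http://www.w3.org/1999/02/22-rdf-syntax-ns#type", "");
            while (it.hasNext()) {
                System.out.println(it.next()); // prints each matching triple
            }
        }
    }
}
```

Because the compact data structures double as the index, no decompressed copy of the dataset is materialised at any point.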
HDT has been adopted by the Semantic Web community because of its simplicity and its performance for data retrieval operations. It is successfully deployed in projects such as Linked Data Fragments, which provides a uniform and lightweight interface for accessing RDF on the Web; in indexing/reasoning systems such as HDT-FoQ and WaterFowl; in recommender systems and mobile applications; and as the main store behind the LOD Laundromat project, which serves a crawl of a very large subset of the Linked Open Data Cloud. We therefore expect this tutorial to be of particular relevance to ISWC, as it raises awareness of a practical technology for managing and serving Big Semantic Data.
Detailed Description
We propose a full-day tutorial, held in an innovative format intended to elicit open questions about the HDT technology and the Big Semantic Data community. The program will therefore be split into two parts: a knowledge-sharing session in the morning (tutorial), and participant presentations with open discussion in the afternoon (workshop).
The knowledge-sharing session will consist of practical, hands-on lessons covering the following skills:
- Foundations of HDT: representation, data management and data retrieval operations.
- Development libraries, practical tools and use of HDT within Apache Jena (see the sketch after this list).
- Use of HDT through the Triple Pattern Fragments API.
- Use of HDT within LOD Laundromat and the LOD Lab.
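To make the Jena and Triple Pattern Fragments items more concrete, here is a hedged sketch of the Jena integration, assuming the hdt-jena module shipped with hdt-java (https://github.com/rdfhdt/hdt-java); class and package names follow that codebase, and dataset.hdt is a placeholder:

```java
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.rdfhdt.hdt.hdt.HDT;
import org.rdfhdt.hdt.hdt.HDTManager;
import org.rdfhdt.hdtjena.HDTGraph;

public class HdtJenaSparql {
    public static void main(String[] args) throws Exception {
        // Expose the compressed HDT file as a regular (read-only) Jena graph.
        try (HDT hdt = HDTManager.mapIndexedHDT("dataset.hdt", null)) {
            Model model = ModelFactory.createModelForGraph(new HDTGraph(hdt));

            String q = "SELECT ?s WHERE { ?s a <http://xmlns.com/foaf/0.1/Person> } LIMIT 10";
            // Jena evaluates the SPARQL query; HDT answers the underlying
            // triple-pattern lookups without decompressing the dataset.
            try (QueryExecution qe = QueryExecutionFactory.create(q, model)) {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    System.out.println(results.next().get("s"));
                }
            }
        }
    }
}
```

The Triple Pattern Fragments API builds on the same triple-pattern primitive: a TPF server backed by an HDT file answers simple HTTP requests whose subject, predicate and object parameters select one pattern, returning paged results that a lightweight client combines to evaluate full SPARQL queries.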
The interactive workshop in the afternoon will consist of short presentations and demos from invited speakers, together with contributions received in response to our call for papers and call to action. The last session will be dedicated to establishing collaborations and defining concrete next steps towards better organizing our community and materialising actions to increase the impact on Linked Data management.
| Time | Activity |
|---|---|
| 9:00 - 9:10 | Welcome and introduction by organizers |
| 9:15 - 9:30 | Participant presentations: Getting acquainted session |
| 9:30 - 10:30 | HDT foundations |
| 10:30 - 11:00 | Coffee Break |
| 11:00 - 12:30 | Practical uses: Linked Data Fragments and LOD Laundromat |
| 12:30 - 14:00 | Lunch |
| 14:00 - 15:30 | Presentations - Different perspectives on the topic |
| 15:30 - 16:00 | Coffee Break |
| 16:00 - 16:50 | Discussion: concrete next steps |
| 16:50 - 17:00 | Closing Remarks |
Tutorial Material
The material will comprise slides, code snippets, datasets and the existing libraries at https://github.com/rdfhdt. All tutorial material will be openly available on GitHub and on the project website, http://rdfhdt.org.
Audience
We expect to attract Linked Data researchers and practitioners, in particular data publishers and consumers. Attendees will benefit from learning how to scale up the management and retrieval of large semantic data, and from discussing their expectations, requirements and experiences with current RDF representations and large-scale triple stores. We aim for an audience of at least 20 people.
Requirements
The tutorial has only standard technical requirements: a projector and an Internet connection.
Presenters
Wouter Beek
VU University Amsterdam, The Netherlands
Wouter Beek received his Master’s degree in Logic from the Institute for Logic, Language and Computation (ILLC). He is currently a PhD researcher at VU University Amsterdam (VUA), working in the Knowledge Representation & Reasoning (KR&R) group. His research focuses on the development, deployment and analysis of large-scale heterogeneous knowledge bases and the way in which they enable unanticipated and innovative reuse. Wouter is the principal developer of the LOD Laundromat and LOD Lab. He has taught over ten courses in Artificial Intelligence and Philosophy.
Javier D. Fernández (primary contact)
Vienna University of Economics and Business, Austria
https://www.wu.ac.at/en/infobiz/team/fernandez/
Javier D. Fernández holds a PhD in Computer Science from the University of Valladolid (Spain) and the University of Chile. His thesis addressed the efficient management of Big Semantic Data, proposing HDT, a binary RDF representation for scalable publishing, exchange and consumption on the Web of Data. He is currently a post-doctoral research fellow funded by an FWF (Austrian Science Fund) Lise Meitner grant. His current research focuses on the efficient management of Big Semantic Data, RDF streaming and archiving, and the querying of dynamic Linked Data. He has published more than 40 articles in international conferences and workshops, and was an editor of the HDT W3C Member Submission.
Ruben Verborgh
Ghent University – imec, Belgium
Ruben Verborgh is a researcher in semantic hypermedia at Ghent University – imec, Belgium, and a postdoctoral fellow of the Research Foundation Flanders. He explores the connection between Semantic Web technologies and the Web’s architectural properties, with the ultimate goal of building more intelligent clients. Along the way, he became fascinated by Linked Data, REST/hypermedia, Web APIs, and related technologies. He is a co-author of two books on Linked Data and has contributed to more than 200 publications in international conferences and journals on Web-related topics.
Program Committee
The following people are potential PC members for the call for papers and would help with the dissemination of the tutorial.
- Bryon Jacob, data.world
- Miguel A. Martínez-Prieto, University of Valladolid
- Axel Polleres, Vienna University of Economics and Business
- Juan Sequeda, Capsenta
- Miel Vander Sande, Ghent University – imec
- Herbert Van de Sompel, Los Alamos National Laboratory
- Ruben Taelman, Ghent University – imec
- Claudio Gutiérrez, Universidad de Chile
- Oscar Corcho, Universidad Politécnica de Madrid
- Laurens Rietveld, Triply
- Nieves Brisaboa, University of A Coruña
- Antonio Fariña, University of A Coruña
- Mario Arias, Mario Arias Software
- Stefan Schlobach, VU University Amsterdam