In recent years, an increasing number of organizations and individuals have contributed to the Semantic Web by publishing data according to the Linked Data principles. In addition, a significant body of Semantic Web research exists that studies various aspects of knowledge representation and automated reasoning over collections of such data. However, a challenge that is crucial for achieving the vision of a Semantic Web – but that has not yet been studied to a comparable extent – is to enable automated software agents to operate directly on decentralized Linked Data that is distributed over the WWW. In particular, fundamental questions related to querying this data on the WWW have received very limited research attention.
This book contributes towards filling this gap by studying the foundations of declarative queries over Linked Data on the WWW. Our particular focus in this book are approaches to use the SPARQL query language and execute queries by traversing Linked Data live during the query execution process. More specifically, we first provide formal foundations to adapt SPARQL to the given context. Thereafter, we use an abstract machine model to formally show computational feasibility and related properties of the resulting types of SPARQL queries. Additionally, we investigate fundamental properties of applying the traversal-based approach to query execution that is tailored to the use case of querying Linked Data directly on the WWW.
The author, Olaf Hartig, has won the Semantic Web Science Association (SWSA) Distinguished Dissertation Award for this publication.
During recent years a set of best practices for publishing and connecting structured data on the World Wide Web (WWW) has emerged. These best practices are referred to as the Linked Data principles and the resulting form of Web data is called Linked Data. The increasing adoption of these principles has lead to the creation of a globally distributed space of Linked Data that covers various domains such as government, libraries, life sciences, and media. Approaches that conceive this data space as a huge distributed database and enable an execution of declarative queries over this database hold an enormous potential; they allow users to benefit from a virtually unbounded set of up-to-date data. As a consequence, several research groups have started to study such approaches. However, the main focus of existing work is to address practical challenges that arise in this context. Research on the foundations of such approaches is largely missing. This dissertation closes this gap.
This dissertation first establishes a well-defined framework for defining and studying queries over Linked Data on the WWW. In particular, we introduce a data model that enables us to formally conceive Linked Data on the WWW as a (distributed) database and a computation model that captures the capabilities of a query execution system for this database. Based on these models, we adapt the declarative query language SPARQL to the given scenario. More precisely, we define a full-Web query semantics and a family of reachability-based query semantics such that each of these query semantics presents a well-defined basis for using SPARQL to query Linked Data on the WWW. Thereafter, we show theoretical properties of queries under these query semantics. Perhaps the most important result of this study formally verifies the common assumption that a computation of query results that are complete w.r.t. all Linked Data on the WWW, is not feasible. However, we also identify classes of queries for which the computational feasibility is less limited.
After analyzing queries over Linked Data independent of specific approaches for executing such queries, this dissertation focuses on a particular execution approach and studies fundamental aspects thereof. The studied approach presents a general strategy for executing queries by integrating traversal-based data retrieval into the result construction process. To analyze this notion of traversal-based query execution formally, we define it in the form of an abstract query execution model. In addition, we discuss a concrete implementation approach for our execution model; this approach is based on the well-known concept of iterators. Our analysis of both the execution model and the iterator-based implementation shows that (i) for one of our reachability-based query semantics, the given notion of traversal-based query execution, in general, is sound and complete, whereas (ii) for the same query semantics, the specific, iterator-based implementation approach cannot guarantee completeness of query results. Based on an experimental analysis we verify that the latter limitation has a significant impact in practice.
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
Tel.: +1 703 830 6300
Fax: +1 703 830 2300 firstname.lastname@example.org
(Corporate matters and books only) IOS Press c/o Accucoms US, Inc.
For North America Sales and Customer Service
West Point Commons
Lansdale PA 19446
Tel.: +1 866 855 8967
Fax: +1 215 660 5042 email@example.com