Ebook: Emerging Topics in Semantic Technologies
This book includes a selection of thoroughly refereed papers accepted at the Satellite Events of the 17th International Semantic Web Conference, ISWC 2018, held in Monterey, CA in October 2018. The key areas addressed by these events include the core Semantic Web technologies such as knowledge graphs and scalable knowledge base systems, ontology design and modelling, semantic deep learning and statistics. Furthermore, several novel applications of semantic technologies to the topics of Internet of Things (IoT), healthcare, social media and social good are discussed. Finally, important topics at the interface of the Semantic Web technologies and their human users are addressed, including visualization and interaction paradigms for Web Data as well as crowdsourcing applications.
The 17th International Semantic Web Conference (ISWC 2018) hosted several workshops on a range of emerging and established topics related to the general theme of the conference such as theoretical, analytical and empirical aspects of the Semantic Web.
ISWC workshops are the primary venue for the exploration of emerging ideas, as well as for the discussion of novel aspects of established research topics in the area of Semantic Web. These workshops provide a setting for focused, intensive scientific exchange among researchers and practitioners interested in a specific topic. The ISWC 2018 Workshop and Tutorial chairs carefully selected a total of 16 workshop proposals, each workshop focusing on a specific area and organized by internationally renowned experts in their respective fields.
The key areas covered by the ISWC 2018 workshop programme include the core Semantic Web technologies such as knowledge graphs and scalable knowledge base systems, ontology design and modelling, semantic deep learning and statistics, as well as novel applications of semantic technologies to IoT, healthcare, social media and social good topics. Furthermore, several events addressed topics at the interface of Semantic Web technologies and humans, including visualization and interaction paradigms for Web Data as well as crowdsourcing applications.
This book includes a selection of the best papers from 13 of the workshops co-located with the ISWC 2018 conference, namely:
• Visualization and Interaction for Ontologies and Linked Data (VOILA 2018)
• International Workshop on Semantic Web Technologies for Health Data Management (SWH 2018)
• The 13th International Workshop on Ontology Matching (OM-2018)
• Decentralizing the Semantic Web (DeSemWeb2018)
• Augmenting Intelligence with Humans-in-the-Loop (HumL@ISWC2018)
• The 2nd Workshop on Enabling Open Semantic Science (SemSci 2018)
• The 9th Workshop on Ontology Design and Patterns (WOP 2018)
• The Fourth International Workshop on Natural Language Interfaces for the Web of Data (NLIWOD) and the 9th Question Answering over Linked Data (QALD) challenge
• The 9th Workshop on Semantic Sensor Networks (SSN2018)
• The 4th Workshop on Semantic Deep Learning (SemDeep-4)
• Semantic Web for Social Good (SWSG)
• The 12th International Workshop on Scalable Semantic Web Knowledge Base Systems (SSWS2018)
• The 6th International Workshop on Semantic Statistics (SemStats 2018)
Each workshop's organizing committee evaluated the papers accepted for their workshop to propose those to be included in this volume. The authors of the selected papers improved their original submissions, taking into account the review comments. As a result, 18 papers were selected for this volume.
We would like to take this opportunity to thank the workshop organizers and authors for their invaluable and inspiring contributions to the ISWC 2018 workshop programme.
Some of the papers included in this volume also appear as part of the corresponding volumes in CEUR Workshop Proceedings. In these cases, a reference to the corresponding CEUR volume is provided. We would like to thank Manfred Jeusfeld for his cooperation and support.
We would also like to thank the Organizing Committee of ISWC 2018 and especially the local organizers for supporting the day-to-day operation and execution of the workshops.
October 2018
Elena Demidova
Amrapali Zaveri
Elena Simperl
The emergence of several ontology modeling tools is motivated by the growing attention ontologies receive in scientific and industrial contexts. The available tools implement different ontology modeling paradigms, including text-based editors, graphical user interfaces with hierarchical trees and form widgets, and visual modeling approaches based on node-link diagrams. In this paper, we present an empirical user study comparing a visual ontology modeling approach, based on node-link diagrams, with a modeling paradigm that uses hierarchical trees and form widgets. In particular, the user study compares the two ontology modeling tools: Protégé and WebVOWL Editor, each implementing one modeling paradigm. The involved participants were given the tasks of ontology modeling and also answered reflective questions for the individual tools. We recorded completion times of the modeling tasks, the errors made, and users' understanding of the conceptual spaces. The study indicates that visual ontology modeling, based on node-link diagrams, is comparatively easy to learn and is recommended especially for users with little experience with ontology modeling and its formalizations. For more experienced users, no clear performance differences are found between the two modeling paradigms; both seem to have their pros and cons depending on the type of ontology and modeling context.
The use of virtual reality games, known as “exergaming”, is gaining more and more interest as a mobilization tool and as a key piece in the delivery of quality health care, especially for elderly people. Mobility tracking of elderly people facilitates the extraction of useful spatiotemporal characteristics regarding their activities and behavior at home. Currently, the analysis of human mobility is based on expensive technologies. In this paper, we propose a semantic interoperability agent which exploits mobility tracking and spatiotemporal characteristics to extract human profiles and give incentives for mobilization at home. The agent exploits an extended ontology which facilitates the collation of evidence for the effects of exergaming on the movement control of older adults. In order to provide personalized monitoring services, a number of rules are individually defined to generate incentives. To evaluate the proposed semantic interoperability agent, human mobility data are collected and analyzed based on daily activities, their duration and mobility patterns. We show that the proposed agent is robust enough for activity classification, and that the recommendations for mobilization are accurate. We further demonstrate the agent's potential for inferring useful knowledge about personalized home care for elderly people.
Top-level ontologies play an important role in the construction and integration of domain ontologies, providing a well-founded reference model that can be shared across domains. While most efforts in ontology matching have been dedicated to domain ontologies, the problem of matching domain and top-level ontologies has been addressed to a lesser extent. This is a challenging task in the field, especially due to the different levels of abstraction of these ontologies. This paper addresses this problem by proposing an approach that relies on existing alignments between WordNet and top-level ontologies. Our approach explores word sense disambiguation and word embedding models. We evaluate our approach in the task of matching the DOLCE and SUMO top-level ontologies to ontologies from three different domains.
Since the proposal of RDF as a standard for representing statements about entities, diverse interfaces to publish and strategies to query RDF data have been proposed. Although some recent proposals are aware of the advantages and disadvantages of state-of-the-art approaches, no work has yet tried to integrate them into a hybrid system that exploits their, in many cases, complementary strengths to process queries more efficiently than each of these approaches could do individually. In this paper, we present HYBRIDSE, an approach that exploits the diverse characteristics of queryable RDF interfaces to efficiently process SPARQL queries. We present a brief study of the characteristics of some of the most popular RDF interfaces (brTPF and SPARQL endpoints), a method to estimate the impact of using a particular interface on query evaluation, and a method to use multiple interfaces to efficiently process a query. Our experiments, using a well-known benchmark dataset and a large number of queries, with result sizes varying from 1 up to 1 million, show that HYBRIDSE processes queries up to three orders of magnitude faster and transfers up to four orders of magnitude less data.
Decentralizing the Web means that users gain the ability to store their data wherever they want, including in their own web browsers. Web browsers allow decentralized applications to be deployed in one click and enable interaction with end-users. However, querying data stored in a large network of web browsers remains challenging. Such a network gathers a very large number of browsers, each hosting a small amount of data. The network is also highly dynamic, and many web browsers now run on mobile devices with limited resources, which raises energy consumption issues. In this paper, we propose Snob, a query execution model for SPARQL queries over RDF data hosted in a network of browsers. The execution of a query in a browser is similar to the execution of a federated SPARQL query over remote data sources. In Snob, direct neighbours in the network are the data sources, and results received from neighbours are stored locally as intermediate results. As data sources are renewed at every shuffling, the query continues to progress and can produce new results after each shuffling without congesting the network. To speed up query execution, browsers processing similar queries are connected through a semantic overlay network. Experimentation shows that the number of answers produced by queries grows with the number of executed queries in the network, while the number of exchanged messages per user remains bounded.
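The idea of a query that progresses as neighbours are renewed can be illustrated with a toy simulation. This is only a sketch under strong assumptions, not the Snob implementation: the function name `run_rounds`, the single triple pattern, and the use of uniform random sampling as a stand-in for the shuffling protocol are all hypothetical, and the semantic overlay is left out.

```python
import random


def run_rounds(peers, origin, pattern, fanout, rounds, seed=0):
    """Return how many results `origin` knows after each shuffling round.

    peers   -- dict mapping a peer id to the set of triples it hosts
    pattern -- a triple where None acts as a wildcard (hypothetical format)
    fanout  -- number of neighbours contacted per round (bounds the messages)
    """
    rng = random.Random(seed)

    def matches(triple):
        return all(p is None or p == v for p, v in zip(pattern, triple))

    # Start from the data hosted locally by the querying browser.
    results = {t for t in peers[origin] if matches(t)}
    others = [p for p in peers if p != origin]
    history = []
    for _ in range(rounds):
        # Assumption: shuffling is modelled as drawing a fresh random view.
        neighbours = rng.sample(others, fanout)
        for n in neighbours:
            # Results from neighbours are kept locally as intermediate results.
            results |= {t for t in peers[n] if matches(t)}
        history.append(len(results))
    return history


# Twenty peers, each hosting a single triple.
peers = {i: {(f"s{i}", "knows", f"s{(i + 1) % 20}")} for i in range(20)}
history = run_rounds(peers, origin=0, pattern=(None, "knows", None),
                     fanout=3, rounds=10)
print(history)
```

The result count never decreases from one round to the next, while each round contacts only `fanout` peers, which mirrors the bounded message cost described above.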
In this paper we present RaScAL, an active learning approach to predicting real-valued scores for items given access to an oracle and knowledge of the overall item-ranking. In an experiment on six different datasets, we find that RaScAL consistently outperforms the state-of-the-art. The RaScAL algorithm represents one step within a proposed overall system of preference elicitations of scores via pairwise comparisons.
In the past year we have seen the release of a huge volume of open bibliographic citation data, thanks primarily to the efforts of the Initiative for Open Citations (I4OC, https://i4oc.org). However, the incomplete coverage of these data is one of the most important issues that the open scholarship community is currently facing. In this paper we present an approach for creating open citation data while supporting journal editors in their task of curating the reference lists contained within articles submitted to them for publication. Our contributions are twofold: (a) a basic workflow that supports editors in the management and correction of bibliographic references, and (b) a tool, called BCite, that creates open citation data compliant with an existing RDF-based citation repository: the OpenCitations Corpus.
Scientific figures, captions and accompanying text provide a valuable resource that comprises the evidence generated by a published scientific study. Extracting information pertaining to that evidence requires a pipeline made up of several intermediate steps. We describe machine reading analysis applied to papers that had been curated into the European Bioinformatics Institute's INTACT database describing molecular interactions. We unpack multiple steps in an extraction pipeline that ultimately attempts to identify the type of experiments being performed automatically. We apply machine vision and natural language processing to classify figures and their associated text based on the type of methods used in the experiment to a level of accuracy that can likely support future biocuration tasks.
As the Linked Data available on the Web continue to grow, understanding their structure and assessing their quality remain challenging tasks and a bottleneck for their reuse. ABSTAT is an online semantic profiling tool that helps data consumers better understand the data by extracting data-driven ontology patterns and statistics about the data. The Shapes Constraint Language (SHACL) helps users capture quality issues in the data by means of constraints. In this paper we propose a methodology to improve the quality of different versions of the data by means of SHACL constraints learned from the semantic profiles produced by ABSTAT.
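As a rough illustration of how a profile pattern could be turned into a SHACL constraint, the sketch below renders one pattern as a node shape in Turtle. The `Pattern` class, its fields, and the `ex:` names are hypothetical and do not reflect ABSTAT's actual output format; only the `sh:` terms come from the SHACL vocabulary, and prefix declarations are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Pattern:
    """Hypothetical profile entry: (subject type, property, object type)
    plus the largest number of objects observed for a single subject."""
    subject_type: str
    prop: str
    object_type: str
    max_objects_per_subject: int


def pattern_to_shape(p: Pattern, shape_name: str) -> str:
    """Render one pattern as a SHACL node shape in Turtle syntax."""
    lines = [
        f"{shape_name} a sh:NodeShape ;",
        f"    sh:targetClass {p.subject_type} ;",
        "    sh:property [",
        f"        sh:path {p.prop} ;",
        f"        sh:class {p.object_type} ;",
        # Assumption: the observed maximum is used directly as the constraint.
        f"        sh:maxCount {p.max_objects_per_subject} ;",
        "    ] .",
    ]
    return "\n".join(lines)


pattern = Pattern("ex:Person", "ex:birthPlace", "ex:City", 1)
shape = pattern_to_shape(pattern, "ex:PersonBirthPlaceShape")
print(shape)
```

A shape learned from one version of a dataset could then be validated against a later version, so that a person with two birth places would be reported as a violation.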
While the formal modeling of part-whole relationships has been of interest, and studied, in many fields including ontology modeling, as of yet there has been no dedicated ontology design pattern which goes beyond the modeling of an absolute minimum. We correct this by providing two patterns based on Winston's landmark paper, “A Taxonomy of Part-Whole Relations.”
Question-Answering systems enable users to retrieve answers to factual questions from various kinds of knowledge sources, but do not address how to respond cooperatively. We present INK, an initial inquiry system for RDF knowledge graphs that aims to return relevant responses, even when an answer cannot be found. It assembles knowledge relevant to the entities mentioned in the question without translating the input question into a query language. A user study indicates responses are found to be intelligible and relevant. Evaluation of questions with known answers gives high recall of 0.70 averaged on three QA datasets.
This paper addresses the problem of discovering Web of Things (WoT) agents, also known as servients, that can interact in a mediated or peer-to-peer fashion to form compound systems. We develop a framework that relies on the W3C Thing Description (TD) and Semantic Sensor Network (SSN) ontologies, which provide semantics for the physical world entities WoT servients are associated with.
We formulate the problem of WoT discovery as an abductive reasoning problem over knowledge bases expressed in terms of TD and SSN concepts, where new semantic relationships between existing systems lead to the creation of other, larger systems. We then address the specific case of
We illustrate the feasibility of our approach on an experimental industrial workstation, equipped with micro-controllers capable of storing and exchanging RDF data in binary format (μRDF store with EXI4JSON serialization).
The joint OGC and W3C standard SOSA/SSN ontology describes sensors, observations, sampling, and actuation. The W3C Thing Description ontology under development in the W3C WoT working group describes things and their interaction patterns. In this paper we are interested in combining these two ontologies for modeling Smart-Sensors. Along with basic sensors, a Smart-Sensor contains a micro-controller that can run different algorithms adapted to the context and a communicating system that exposes the Smart-Sensor on some network. For example, a smart accelerometer can be used to measure cycling cadence, step numbers or a variety of other things. The SOSA/SSN ontology can only partially model the adaptation capabilities of Smart-Sensors to different contexts. Thus, we design a SOSA/SSN extension, called the Semantic Smart Sensor Network (S3N) ontology. S3N answers several competency questions such as how to adapt the Smart-Sensor to the current context of use, that is to say, selecting the algorithms that provide the right sensor outputs and the micro-controller capabilities.
We propose a method for creating a semantic profile of a user's interests emerging from pictures by applying neural networks trained for object recognition. We use BabelNet, an online encyclopaedic dictionary, to generalise object names into categories of interests. Our method is evaluated with ground-truth data based on a social tagging mechanism. Experiments are conducted entirely on original data containing 60,000 images crawled from Flickr, evenly distributed among 300 users. Results show that object recognition methods combined with object category generalisation can be effectively used to predict a user's interests. The accuracy of the presented method seems to change with the neural network used for object recognition (five networks were tested in total), therefore it has a strong potential for further development. ResNet-50 turned out to be the most accurate network in our experiment.
In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply this method to the climate science domain and evaluate it there. The result improves the topics generated and supports faster research understanding, discovery of social networks among researchers, and automatic ontology generation.
Querying the Web of Data relies largely on federation approaches, mainly SPARQL query federation when the data is available through endpoints. Different benchmarks have been proposed to exploit the full potential of SPARQL query federation approaches in real-world scenarios, each with limitations in size and complexity. Previously, we introduced LargeRDFBench, a billion-triple benchmark for SPARQL query federation. In this work, we pinpoint some of the limitations of LargeRDFBench and propose an extension with 8 additional queries. Our evaluation of state-of-the-art federation engines on these additional queries revealed interesting insights.
As the Linked Open Data Cloud is constantly evolving, both at schema and instance level, there is a need for systems that efficiently support storing and querying of such data. However, there is a limited number of such systems and even fewer benchmarks that test their performance. In this paper, we describe in detail the Semantic Publishing Versioning Benchmark (SPVB) that aims to test the ability of versioning systems to efficiently manage versioned Linked Data datasets and queries evaluated on top of these datasets. We discuss the benchmark data and SPARQL query generation process, as well as the evaluation methodology we followed for assessing the performance of a benchmarked system. Finally, we describe a set of experiments conducted with the R43ples and Virtuoso systems using SPVB.
Statistical data is often published as tabular data by statistics offices and governmental agencies. In recent years, many of these institutions have sought an interoperable way of releasing their data by means of semantic technologies. Existing approaches normally employ ad-hoc techniques to transform this tabular data into a Statistics Knowledge Graph (SKG) and materialize it. This approach requires periodic maintenance to keep the dataset and the transformation result synchronized. Using R2RML, the W3C mapping language recommendation, a virtual SKG can be generated thanks to the capabilities of R2RML processors. However, as the size of an R2RML mapping document depends on the number of columns in the tabular data and the number of dimensions to be generated, it may be prohibitively large, hindering its maintenance. In this paper we propose an approach to reduce the size of the mapping document by extending RMLC, a mapping language for tabular data. We provide a mapping translator from RMLC to R2RML and a comparative analysis over two different real statistics datasets.
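To see why mapping size tracks the width of the table, consider a simplified sketch that emits one R2RML predicate-object map per column. The function name, table name, column names, and `http://example.org/` template are hypothetical, prefix declarations are omitted, and real statistical mappings (with dimensions and observations modelled separately) would be larger still; the RMLC extension itself is not shown.

```python
def r2rml_mapping(table: str, key: str, columns: list[str]) -> str:
    """Build one R2RML triples map with a predicate-object map per column."""
    parts = [
        "<#TriplesMap> a rr:TriplesMap ;",
        f'    rr:logicalTable [ rr:tableName "{table}" ] ;',
        # The triple braces yield a literal {key} placeholder in the template.
        f'    rr:subjectMap [ rr:template "http://example.org/obs/{{{key}}}" ]',
    ]
    for col in columns:
        parts[-1] += " ;"  # continue the statement before adding a new map
        parts.append(
            f"    rr:predicateObjectMap [ rr:predicate ex:{col} ; "
            f'rr:objectMap [ rr:column "{col}" ] ]'
        )
    parts[-1] += " ."
    return "\n".join(parts)


# A narrow table versus a wide one with a column per year (hypothetical data).
small = r2rml_mapping("population", "id", ["y2016", "y2017"])
wide = r2rml_mapping("population", "id", [f"y{y}" for y in range(2000, 2018)])
print(len(small.splitlines()), len(wide.splitlines()))
```

In this sketch the mapping grows by one entry per column, which is the kind of repetition a more compact mapping language aims to factor out.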