Ebook: Multi-modal Data Fusion based on Embeddings

Series

Studies on the Semantic Web

Volume

Published

2019

Authors

Steffen Thoma

ISBN

978-1-64368-028-6 (print) | 978-1-64368-029-3 (online)

Subject(s)

Semantic Web

Description

Many web pages include structured data in the form of semantic markup, which can be transferred to the Resource Description Framework (RDF), or provide an interface to retrieve RDF data directly. This RDF data enables machines to process and use the data automatically. When applications need data from more than one source, the data has to be integrated, and automating this integration can be challenging. Usually, vocabularies are used to describe the data concisely, but because of the decentralized nature of the web, multiple data sources can provide similar information with different vocabularies, making integration more difficult.

This book, Multi-modal Data Fusion based on Embeddings, describes how similar statements about entities can be identified across sources, independent of the vocabulary and data modeling choices. Previous approaches have relied on clean and extensively modeled ontologies for the alignment of statements, but the often noisy data in a web context does not necessarily adhere to these prerequisites. To tackle this problem, the book proposes using the RDF label information of entities. In combination with embeddings, label information allows for a better integration of noisy data, a result that the experiments confirm. The book presents two main scientific contributions: a vocabulary- and modeling-agnostic fusion approach based purely on textual label information, and the combination of three different modalities into one multi-modal embedding space for a more human-like notion of similarity.

The book will be of interest to all those faced with the problem of processing data from multiple web-based sources.

Contents

Front Matter and Contents

Pages

i - xxii

Category

Front Matter

Preface

Many web pages include structured data in the form of semantic markup, which can be transferred to RDF, or provide an interface to retrieve RDF data directly. This RDF data enables machines to process and use the data automatically. When applications need data from more than one source, typically because the data in one source is incomplete or the sources cover different aspects, the data has to be integrated. Vocabularies are used to describe the data concisely. But because of the decentralized nature of the web, multiple data sources can provide similar information with different vocabularies. The use of different vocabularies and modeling choices on the data provider side makes integration difficult. In the approach of this thesis, similar statements about entities are identified across sources, independent of the vocabulary and data modeling choices.

Previous approaches rely on clean and extensively modeled ontologies for aligning statements. But in a web context, data is usually noisy and does not necessarily adhere to these prerequisites. To tackle this problem, the use of RDF label information of entities is proposed, which allows a better integration of noisy data. The experiments presented in this thesis confirm this. Traditional alignment approaches rely on string similarity measures on a purely syntactic level. They can neither handle synonyms nor detect semantic relationships between words. To incorporate a measure of semantic similarity, the use of textual embeddings is investigated, which yields superior results.
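The contrast between syntactic and semantic similarity on labels can be illustrated with a minimal sketch. It is not the method of the thesis: the three-dimensional vectors in `TOY_EMBEDDINGS` are hand-picked for illustration rather than learned, and the function names are hypothetical. The character-based measure comes from Python's standard `difflib` module.

```python
from difflib import SequenceMatcher
from math import sqrt


def string_similarity(a: str, b: str) -> float:
    """Purely syntactic similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cosine_similarity(u, v) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


# ASSUMPTION: toy label embeddings, hand-picked for illustration only.
# Real textual embeddings would be learned from a large text corpus.
TOY_EMBEDDINGS = {
    "car": [0.90, 0.10, 0.00],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.00, 0.20, 0.90],
}


def embedding_similarity(a: str, b: str) -> float:
    """Semantic similarity of two labels via their (toy) embedding vectors."""
    return cosine_similarity(TOY_EMBEDDINGS[a], TOY_EMBEDDINGS[b])


if __name__ == "__main__":
    for left, right in [("car", "automobile"), ("car", "banana")]:
        print(
            left,
            right,
            round(string_similarity(left, right), 3),
            round(embedding_similarity(left, right), 3),
        )
```

With these toy vectors, the synonyms "car" and "automobile" share almost no characters and score low on the string measure, while their embedding vectors point in nearly the same direction.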

However, textual embeddings are restricted to the information reported in text and thereby neglect facts that are self-evident to humans and therefore usually not captured in text. To mitigate this reporting bias, we investigate the incorporation of information from other modalities: we explore the potential of complementing the textual knowledge by learning a shared latent representation that integrates information across three modalities: images, text, and knowledge graphs. In doing so, we leverage the results from years of research in different domains: Computer Vision, Computational Linguistics, and Semantic Web. In Computer Vision, visual object features are learned from large image collections; in Computational Linguistics, word embeddings that capture distributional semantics are extracted from huge text corpora; and in the Semantic Web, embeddings of knowledge graphs effectively capture explicit relational knowledge about entities. This thesis investigates whether a more holistic representation can be attained by fusing the single-modal representations into a multi-modal one. To this end, the problem of aligning and combining modalities is investigated. The holistic representation is demonstrated to identify similarities better, as it contains the different aspects of an entity covered by the different modalities; for example, visual attributes of entities cover shape and color information that is not easily covered in other modalities.
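As a rough illustration of what fusing single-modal representations can look like, the following sketch normalizes each modality's vector to unit length and concatenates the results. This is one simple fusion strategy assumed here for illustration; it is not necessarily the strategy used in the thesis, and the names `fuse`, `l2_normalize`, and `MODALITY_ORDER` are hypothetical.

```python
from math import sqrt

# ASSUMPTION: a fixed modality order so that fused vectors are comparable.
MODALITY_ORDER = ("image", "text", "graph")


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse(representations):
    """Concatenate the L2-normalized vectors of all modalities for one entity.

    `representations` maps each modality name to that entity's vector.
    Normalizing first keeps a modality with large raw values from
    dominating similarity computations on the fused vector.
    """
    fused = []
    for modality in MODALITY_ORDER:
        fused.extend(l2_normalize(representations[modality]))
    return fused


if __name__ == "__main__":
    # Made-up vectors for a single entity, for demonstration only.
    entity = {
        "image": [3.0, 4.0],
        "text": [1.0, 0.0, 0.0],
        "graph": [0.0, 2.0],
    }
    print(fuse(entity))
```

Concatenation requires that the same entity is available in all three modalities, which is why such a fusion depends on cross-modal alignments.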

While the benefits of multi-modal embeddings have become clear, they are limited to a small number of concepts: The fusion is restricted to concepts with cross-modal alignments which are only available for a few concepts. Since alignments over different modalities are rare and expensive to create, an extrapolation approach to translate entity representations outside of the training corpus to the shared representation space is developed as the final contribution of this thesis.

Multi-modal Data Fusion based on Embeddings

Authors

Steffen Thoma

Pages

1 - 150

DOI

10.3233/SSW190008-mono
