
Ebook: Efficient Source Selection for SPARQL Endpoint Query Federation

Efficient source selection is one of the most important optimization steps in federated SPARQL query processing. Overestimating the set of relevant sources increases network traffic, produces irrelevant intermediate results, and can significantly increase the overall query processing time. Previous work has focused on generating optimized query execution plans for fast result retrieval. However, join-aware source selection approaches have not received much attention. Similarly, little attention has been paid to the effect of duplicated data on federated querying. This book presents solutions for join-aware source selection as well as duplicate-aware federated querying over the Web of Data.
Benchmarking is indispensable when assessing technologies with respect to their suitability for given tasks. While several benchmarks have been developed to evaluate federated SPARQL engines and triple stores, they mostly provide a one-size-fits-all solution to the benchmarking problem. This approach to benchmarking is, however, unsuitable for evaluating the performance of a triple store for a given application with particular requirements. We address these drawbacks by presenting an automatic approach for generating benchmarks from real query logs.
The book will be of interest to all those working on these two key areas of federated SPARQL query processing. The tools presented in this book are open source.
First of all, I would like to thank my supervisors, Dr. Axel-Cyrille Ngonga Ngomo and Prof. Klaus-Peter Fähnrich, without whom I could not have started my Ph.D. at Leipzig University.
Special thanks to my direct supervisor, Dr. Axel-Cyrille Ngonga Ngomo, with whom I started work on my Ph.D. proposal, submitted to the Deutscher Akademischer Austauschdienst (DAAD), in order to pursue my Ph.D. at the Agile Knowledge Engineering and Semantic Web (AKSW) group. He continuously supported me throughout my Ph.D. work, giving advice and recommendations for further research steps. His comments and notes were very helpful to me, particularly during the writing of the papers we published together. I would also like to thank him for proofreading this thesis and for his helpful feedback, which improved the quality of this thesis. Special thanks also to Prof. Sören Auer, whom I first contacted to ask about a vacancy to conduct my Ph.D. research in his research group. I am thankful to Dr. Jens Lehmann for valuable discussions during my Ph.D. work.
I would also like to thank Prof. Klaus-Peter Fähnrich for the regular follow-up meetings he held to evaluate the progress of all Ph.D. students. During these meetings, he proposed several directions for me and for other Ph.D. students on how to deepen and extend our research. I would like to thank all of my colleagues in the Semantic Abstraction (SIMBA) group for providing me with their useful comments and guidelines, especially during the initial phase of my Ph.D.
I would like to dedicate this work to the souls of my parents, without whom I could not have done anything in my life. With their help and support, I could take my first steps in my scientific career. Special thanks go to all my family members and my friend Ahsan Rasheed.