Ebook: Information Modelling and Knowledge Bases XXXIII
The technology of information modelling and knowledge bases addresses the complexities of modelling in digital transformation and digital innovation, reaching beyond the traditional borders of information systems and academic research in computer science.
This book presents 21 papers from the 31st International Conference on Information Modelling and Knowledge Bases (EJC 2021), hosted by the Department Informatik of the University of Applied Sciences Hamburg, Germany, and held as a virtual event from 7 to 9 September 2021 due to restrictions caused by the coronavirus pandemic. The conference provides a research forum for academics and practitioners dealing with information and knowledge to exchange scientific results and experiences. EJC 2021 covered a wide range of themes, extending knowledge discovery through conceptual modelling, knowledge and information modelling and discovery, linguistic modelling, cross-cultural communication and social computing, environmental modelling and engineering, and multimedia data modelling and systems. As always, the conference was open to new topics related to its main themes, so the content emphasis of the EJC conferences can adapt to the changes taking place in the research field. The 21 papers included here, after rigorous review, selection, and upgrading, are the result of presentations, comments, and discussions during the conference.
Providing an up-to-the-minute overview of the technology of information modelling and knowledge bases, the book will be of interest to all those working in the field.
Information Modelling and Knowledge Bases has become an important technology contributor to academic and industry research in the 21st century. It addresses the complexities of modelling in digital transformation and digital innovation, reaching beyond the traditional borders of information systems and computer science academic research.
The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing. The diversity of data sources ranges from traditional legacy sources to stream-based unstructured data requiring backwards modelling. Conceptual modelling is one of the sub-areas of information modelling. The aim of this conference is to bring together experts from different areas of computer science and other disciplines who have a common interest in understanding and solving problems of information modelling and knowledge bases, as well as in applying the results of research to practice. We also aim to recognize and study new areas of modelling and knowledge bases to which more attention should be paid. Therefore, philosophy and logic, cognitive science, knowledge management, linguistics, and management science, as well as machine learning and AI, are relevant areas too.
The international conference on information modelling and knowledge bases originated from the co-operation between Japan and Finland in 1982 as the European-Japanese Conference (EJC). Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola from Finland (Nordic countries) did the pioneering work for this long tradition of academic collaboration. Over the years, the conference gradually expanded to include European and Asian countries, and spread through networks of previous participants to other countries. In 2014, with this expanded geographical scope, the European-Japanese part of the title was replaced by "International". Characteristic of the conference is an opening keynote session followed by presentation sessions with enough time for discussion; the limited number of participants is also typical for this conference.
The 31st International Conference on Information Modelling and Knowledge Bases (EJC 2021), held in Hamburg, Germany, constitutes a research forum for the exchange of scientific results and experiences, attracting academics and practitioners dealing with information and knowledge. The main topics of EJC 2021 cover a wide range of themes, extending knowledge discovery through Conceptual Modelling, Knowledge and Information Modelling and Discovery, Linguistic Modelling, Cross-Cultural Communication and Social Computing, Environmental Modelling and Engineering, and Multimedia Data Modelling and Systems. The conference has also been open to new topics related to its main themes. In this way, the content emphases of the conferences have been able to adapt to the changes taking place in the research field.
The conference was hosted by the Department Informatik of the University of Applied Sciences Hamburg, Germany. Due to restrictions caused by the coronavirus pandemic, the conference was transformed into a virtual event this year, held on September 7–9, 2021. The conference had three categories of presentations: full papers, short papers, and position papers. These proceedings feature twenty-one reviewed, selected, and upgraded contributions that are the result of presentations, comments, and discussions during the conference. We thank all colleagues for their support in making this conference a success, in particular the program committee, the organization committee, and the program coordination team, especially Professor Naofumi Yoshida, who maintained the paper submission and reviewing systems and compiled the files for this book.
Editors
Marina Tropmann-Frick
Hannu Jaakkola
Bernhard Thalheim
Yasushi Kiyoki
Naofumi Yoshida
A powerful new complement to traditional synchronous teaching is emerging: intelligent tutoring systems. The narrative: A learner interacts with a digital agent. The agent reviews, selects and proposes individually tailored educational resources and processes – i.e. a meaningful succession of instructions, tests or groupwork. The aim is to make personally tutored learning the new norm in higher education – especially in groups with heterogeneous educational backgrounds. The challenge: Today, there are no suitable data that allow computer agents to learn how to make reasonable decisions. Available educational resources cannot be addressed by a computer logic because, up to now, they have either not been tagged with machine-readable information at all or such tags have not been provided uniformly. And what’s worse: there are no agreed conceptual and structured models of what we understand by “learning”, of how this model-to-be could be implemented in a computer algorithm, or of what explicit decisions a tutoring system could take. So, a prerequisite for any future digital agent is a structured, computer-accessible model of “knowledge”. This model is required to qualify and quantify individual learning, to allow the association of resources as learning objects, and to provide a basis for operationalizing learning for AI-based agents. We suggest a conceptual model of “knowledge” based on a variant of Bloom’s taxonomy, transfer this concept of cognitive learning objectives into an ontology, and describe an implementation in a web-based database application. The approach has been employed to model the basics of abstract knowledge in engineering mechanics at university level. This paper addresses interdisciplinary aspects ranging from a teaching methodology and the taxonomy of knowledge in cognitive science, through a database application for ontologies, to an implementation of this model in a Grails service. We aim to deliver this web-based ontology, its user interfaces and APIs into a research network that qualifies AI-based agents for competence-based tutoring.
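To make the idea of a computer-accessible model of knowledge concrete, the following is a minimal, hypothetical sketch in Python (not the authors' ontology or their Grails implementation): a learning objective is represented as a concept paired with a Bloom's-taxonomy level and prerequisite links, which is the kind of machine-readable structure a tutoring agent could reason over.

```python
# Hypothetical illustration only (not the authors' ontology or Grails service):
# a learning objective as a concept with a Bloom's-taxonomy level and prerequisites.
from dataclasses import dataclass, field

BLOOM_LEVELS = ["remember", "understand", "apply", "analyze", "evaluate", "create"]

@dataclass
class LearningObjective:
    concept: str                      # e.g. an engineering-mechanics concept
    level: str                        # one of BLOOM_LEVELS
    prerequisites: list = field(default_factory=list)

equilibrium = LearningObjective("static equilibrium", "understand")
free_body = LearningObjective("free-body diagram", "apply", [equilibrium])

def ready_to_learn(objective, mastered_concepts):
    """An objective is ready when all its prerequisite concepts are already mastered."""
    return all(p.concept in mastered_concepts for p in objective.prerequisites)

print(ready_to_learn(free_body, {"static equilibrium"}))  # True
```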
Computers were originally developed for executing complex calculations fast and effectively. The intelligence of computers was based on arithmetic capabilities, and this has been the mainstream in the development of computers until now. In the mid-1950s a new application area, Artificial Intelligence (AI), was introduced by researchers who were interested in using computers to solve problems in the way intelligent beings do. The architecture, which supported calculations, was harnessed to perform tasks associated with intelligent beings, to execute inference operations, and to simulate human senses. Artificial intelligence has had several reincarnation cycles; it has reappeared in different manifestations since this research area became interesting for researchers. All along, there has been much discussion about the intelligence of these systems – are AI-based systems and robots intelligent, what is the difference between human and machine intelligence, and so on. Abilities related to intelligence cover the ability to acquire and apply knowledge and skills, as well as the ability to learn. AI provides different manifestations of the term “intelligence”: human intelligence covers a wide variety of different types of intelligence, and the meaning of artificial intelligence has varied over time. In our paper we examine this term, especially to provide means for comparing human and artificial intelligence, and look at the learning capability related to it.
We study possibilities and ways to increase the automation, efficiency, and digitization of industrial processes by integrating knowledge gained from UAV (unmanned aerial vehicle) images with systems supporting managerial decision-making. Here we present our results in the secondary wood processing industry. First, we present a deployed solution for repeated area and volume estimation of wood stock areas from our UAV images in the customer’s warehouse. Processing with the commercial software we use is time-consuming and requires annotation by humans (each time aerial images are processed). Second, we present a partial solution in which, for computing the areas of woodpiles, the only human activity is annotating training images for the supervised learning of deep neural networks (only once in a while). Third, we discuss a multicriterial evaluation of possible improvements concerning precision, frequency, and processing time. The method uses UAVs to take images of woodpiles, deep neural networks for semantic segmentation (i.e., image classification at the pixel level), and an algorithm to improve the results. Our experiments compare several architectures, backbones, and hyperparameters on real-world data. The feasibility of our approach, including the calculation of volumes, and the verification that it will function as envisioned are demonstrated by a proof of concept. The exchange of knowledge with industrial processes is mediated by ontological comparison and translation of OWL into UML. Furthermore, it shows the possibility of establishing communication between knowledge extractors from images taken by UAVs and managerial decision systems.
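As an illustration of how a pixel-level segmentation can feed an area estimate, here is a minimal sketch (an assumption for illustration, not the deployed pipeline): the ground area covered by woodpiles is approximated from a binary segmentation mask and the image's ground sampling distance.

```python
# Minimal sketch (illustrative assumption, not the authors' pipeline): estimate the
# ground area covered by woodpiles from a binary segmentation mask, given the
# ground sampling distance (GSD) of the orthophoto.
import numpy as np

def woodpile_area_m2(mask: np.ndarray, gsd_m_per_px: float) -> float:
    """Area in square metres covered by pixels labelled as 'woodpile' (value 1)."""
    woodpile_pixels = int((mask == 1).sum())      # count positively labelled pixels
    return woodpile_pixels * gsd_m_per_px ** 2    # each pixel covers gsd^2 square metres

# Usage: a 4-pixel region at 5 cm/px resolution covers 4 * 0.05^2 = 0.01 m^2.
mask = np.array([[0, 1, 1, 0],
                 [0, 1, 1, 0]])
print(woodpile_area_m2(mask, gsd_m_per_px=0.05))  # -> 0.01
```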
Computer programming has been popularized in 21st-century education as a means of fostering intensive logical thinking in students. The fields of artificial intelligence and robotics are considered the most attractive for programming today. However, first-time learners and novice programmers may encounter difficulty in understanding a text-based programming language, with its special syntax, semantics, libraries, and program structure. In this work, we propose a visual programming environment for artificial intelligence and robotics applications using Google Blockly. The development framework is a web application that uses Google Blockly to create a program and translates the result of the visual programming style into conventional text-based programming. This allows almost instant programming capability for learners of programming in such a complex system.
In this study, we examine how the speech sounds generated in one-to-one or one-to-n human communication are not only simple exchanges of intentions and opinions, but are also clearly divided into linguistic expression and linguistic understanding. In the course of this, we discuss the problems of interaction between interlocutors and the complex interplay of phenomena that occur in individual interlocutors. We propose a method to determine the semantic frame of a dialogue contextually by integrating and calculating the features of speech and the features of the meaning of words.
Models are used everywhere: in daily life, the sciences, engineering, and thought. They represent, support, and enable our thinking, acting, reflecting, communication, and understanding. They are universal instruments. Reasoning through and by models is, however, different from the reasoning we use in ‘exact’ sciences and is far less understood. The notion of model is nowadays becoming well accepted. Reasoning through models, by contrast, is far less understood and remains a long-standing lacuna. This keynote aims at closing this gap.
Machine learning models are highlighted for implementing functions based on data training without program coding. Artificial neural networks are among the efficient machine learning models. Different from other machine learning models such as artificial neural networks, we have presented semantic computing models which represent the “meaning” of machine learning results. In our model, semantic spaces are created based on training data sets. Data calculations are performed in these spaces. Data are mapped to semantic spaces and represented as points in them. The mapped positions of data represent the “meaning” of the data. In this paper, we first present our new discovery in the formation of semantic spaces. We use the word “matter” to represent features of semantic spaces which are related to non-temporal data. At the same time, we use the word “dark-matter” to represent features of semantic spaces which change temporally. We use the word “energy” to represent the matrices which are used in the semantic computations to generate output data. We reveal that the “dark-matter” is the spatiotemporal matrix and present a mechanism of “memory” for implementing the semantic computation. The most important contribution of this paper is that we have developed a new mechanism for implementing machine learning with “knowledge” in the “memory.” In the paper, we use case studies to illustrate the concepts and the mechanism. At the beginning, we present an example of creating a semantic space from a “chaotic state” to an “ordered state.” After that, we use examples to illustrate the mechanism of the “memory” and the semantic computation. Space expansion and space division are also illustrated by examples.
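The following minimal sketch (an illustrative assumption, not the authors' formulation) shows the general idea: raw feature vectors are mapped by a matrix into a semantic space, and distances between the mapped points serve as semantic relations between the data.

```python
# Illustrative sketch (an assumption, not the authors' formulation): data items are
# mapped into a "semantic space" by a matrix, and the positions of the mapped
# points are taken as their "meaning".
import numpy as np

features = np.array([[1.0, 0.0, 2.0],    # item A, raw feature vector
                     [0.9, 0.1, 1.8],    # item B, similar to A
                     [0.0, 3.0, 0.2]])   # item C, different from both

projection = np.array([[0.7, 0.1],       # matrix used in the semantic computation
                       [0.0, 0.9],       # to map feature axes onto semantic axes
                       [0.3, 0.0]])

points = features @ projection            # positions in the semantic space

def semantic_distance(p, q):
    """Euclidean distance between two positions in the semantic space."""
    return float(np.linalg.norm(p - q))

print(semantic_distance(points[0], points[1]))  # small: A and B are close in meaning
print(semantic_distance(points[0], points[2]))  # large: A and C differ in meaning
```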
Global warming and climate change affect not only all living things but also many non-living things. Furthermore, these phenomena have caused extreme disasters that have become impossible to ignore. Coral bleaching is a phenomenon that indicates ocean warming due to climate change. This paper presents the analysis and visualization of a coral health levels database using the 5D World Map System. Coral health levels are analyzed using a coral-knowledge image that includes coral together with a coral health chart. We use image processing and color semantic distance to interpret coral health levels. We have implemented an actual-space integration system to access environmental information resources with coral health levels and image analysis, and the results are shown on the 5D World Map System. In the experimental study, the study areas for coral health level analysis are located in the ocean close to Thailand’s islands, such as Ko Ha (Five Islands), Ko Bon, Ko Hin Ngam, Ko Tarutao, Ko Thalu, and Ko Samaesarn.
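A minimal sketch of the color-distance idea follows (the reference chart values below are hypothetical, not the paper's calibration): a coral region is assigned the health level whose chart color lies closest to the region's mean color.

```python
# Minimal sketch (assumption, not the authors' exact method): classify the health
# level of a coral image region by the color distance to reference colors taken
# from a coral health chart. The reference RGB values here are hypothetical.
import numpy as np

health_chart = {                 # hypothetical chart colors: level -> mean RGB
    "healthy (D6)": np.array([101, 67, 33]),    # dark brown
    "pale (D3)": np.array([181, 141, 96]),
    "bleached (D1)": np.array([236, 229, 220]), # near white
}

def health_level(region_rgb: np.ndarray) -> str:
    """Return the chart level whose color is closest (Euclidean) to the region's mean color."""
    mean_rgb = region_rgb.reshape(-1, 3).mean(axis=0)
    return min(health_chart, key=lambda lvl: np.linalg.norm(mean_rgb - health_chart[lvl]))

# Usage: a very light region is classified as bleached.
print(health_level(np.array([[[230, 228, 221], [240, 233, 219]]])))  # -> "bleached (D1)"
```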
Semantic space creation and computing are essential for realizing semantic interpretations of situations and symptoms in human health. We have presented a semantic space creation and computing method for domain-specific research areas. This method realizes semantic space creation with domain-oriented knowledge and databases. This paper presents a semantic space creation and computing method for a “Human-Health Database”, with the implementation process for “Human-Health-Analytical Semantic Computing”. The paper also presents a new knowledge base creation method for personal health data for preventive care and potential-risk inspection, with global and geographical mapping and visualization in the 5-Dimensional World Map System. This method focuses on the analysis of personal health and potential-risk inspection and realizes a set of semantic computing functions for semantic interpretations of situations and symptoms in human health. The method is applied to “Human-Health-Analytical Semantic Computing” to realize worldwide evaluation of (1) multi-parameterized personal health data, such as various biomarkers, clinical physical parameters, lifestyle parameters, and other clinical/physiological or human health factors, for health monitoring, and (2) time-series multi-parameterized health data at the national/regional level for global analysis of potential causes of disease. This Human-Health-Analytical Semantic Computing method realizes a new multidimensional data analysis and knowledge sharing for global-level health monitoring and disease analysis. The computational results can be visualized as the time-series differences of the values in each place, the differences between the values of multiple places in a focused area, and the time-series differences between the values of multiple places, in order to detect and predict the potential risk of diseases.
Modern district heating (DH) systems are complex engineering structures that play an essential role in large city infrastructures. DH networks have many sensors, nodes, and methods for monitoring the status of the network. Sensing, Processing, and Analytical actuation (SPA) of incoming information, handled by the SPA Semantic Computing method, can be applied to such problems. The SPA Semantic Computing method searches for correlations between sets of incoming data to identify the correct scenario for responding to events. This article explores the integration of SPA functions to analyze multivariate sensing data, including data from multivariable sensors and infrared images, for creating a monitoring system for DH networks. The focus is on assessing whether the SPA approach is a suitable candidate for monitoring emergency events in the DH network. Specific target data for the assessment are (1) multi-parameter DH network sensor data, such as water temperature, flow rate, energy delivered, etc., and (2) infrared image data from a camera mounted on an unmanned aerial vehicle (UAV) for locating underground DH network leaks. A multivariate computational model, a mathematical model of meaning (MMM), and a spatial image filtering method are proposed for integrating SPA semantic computing for emergency leak detection in DH networks.
Modern information technology makes it possible to redesign the ways people work. In the future, machines will be able to carry out intelligence-requiring tasks which were previously done by people. It is thus worthwhile to develop methodologies for designing intelligent systems. An example of such methods is cognitive mimetics, i.e. imitating human information processing. Today, machines cannot by themselves navigate in archipelagos. However, the fact that people can take care of ship steering and navigation means that there is an information process which makes it possible to navigate ships. This information process takes place inside the minds of navigating people. If we are able to explicate the information processing in the navigator’s mind, this knowledge can be used in designing intelligent machines. Replicating physical objects and industrial processes by means of digital computers is called digital twinning. Digital twins (DTs), which are digital replicas of physical systems and processes, have recently become tools for working with complex industrial processes. A crucial question for DTs is whether human actions should be added to them. As the answer is positive, such models of human information processing can be called human digital twins (HDTs). The knowledge of human tacit and explicit information processes can be represented by human digital twins. The models can be used in the search for a deeper understanding of human intelligent information processes. Human digital twins can thus be used as methodological tools in cognitive mimetics. In our present study, we modeled paper machine operators’ thinking. Specifically, we developed an ideal-exception-correction (IEC) model for paper operators’ control logic. The model illustrates how research and HDT modeling can be used to explicate the subconscious or tacit information processing of people for the design of intelligent systems. In this article, a model for design processes using cognitive modelling is suggested. The concepts of cognitive mimetics and human digital twins enable us to outline a model for using the long tradition of simulating human thinking as a tool in designing intelligent systems.
The paper introduces the TILUS tool for the retrieval of appropriate textual information sources and natural language processing. Up to now, the TILUS tool has presupposed that all data are formalized in TIL-Script, the computational variant of Transparent Intensional Logic (TIL). We outline a general proposal for utilizing the Stanford typed dependencies representation for the semi-automated conversion of natural language into TIL-Script. In order to solve this problem correctly, we also introduce our universal conceptualization, which is able to cover the thematic variations of the processed texts.
Privacy is a fundamental human right and is widely and extensively protected in the western industrialized world. Recent advances in technology, especially in the use of applications developed and designed for mobile devices, have led to the rise of its abuse on the one hand and to a higher awareness of the importance of privacy on the other. Legal texts protecting privacy have attempted to rectify some of the problems, but the ecosystem giants and mobile app developers have adapted. In this paper, we analyze which data mobile app developers are collecting. We have focused on a sample of apps in the medical and health field. The research was done using collocation analysis: a relationship between a base word and its collocative partners was sought. The initial visual results led us to more detailed studies that unveiled some worrying patterns. Namely, applications collect data not only about users and their activities, but also about their family members, medical diagnoses, treatments, and the like, going well beyond the “need to function” / functionality threshold.
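For orientation, a minimal collocation-analysis sketch using NLTK follows (an assumption for illustration; the paper's corpus, base words, and settings are not shown here): collocative partners of a base word such as "data" are ranked by pointwise mutual information.

```python
# Minimal sketch (illustrative assumption, not the authors' exact setup): find the
# collocative partners of a base word ("data") in app-privacy text using NLTK's
# bigram collocation finder and pointwise mutual information (PMI).
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

text = ("the app may collect personal data about you and your family members "
        "including medical data diagnoses and treatment data shared with partners")
tokens = text.split()                      # simple whitespace tokenization for the sketch

measures = BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(tokens, window_size=3)
finder.apply_ngram_filter(lambda w1, w2: "data" not in (w1, w2))  # keep pairs with the base word

# Top collocative partners of "data", ranked by PMI.
print(finder.nbest(measures.pmi, 5))
```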
This paper deals with the optimization of methods for recommending relevant text sources. We summarize methods based on the theory of Association Rules and Formal Concept Analysis (FCA), which are computationally demanding. Therefore, we apply ‘Iceberg Concepts’, which significantly prune the output data space and thus accelerate the whole calculation. Association Rules and Relevant Ordering, which is an FCA-based method, are applied to data obtained from explications of an atomic concept. The explications are procured from natural language sentences formalized into TIL constructions and processed by a machine learning algorithm. TIL constructions are utilized only as a specification language and are described in numerous publications, so we do not deal with TIL in this paper.
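The pruning idea behind 'Iceberg Concepts' can be illustrated with a minimal sketch (not the authors' implementation): only itemsets whose support reaches a minimum threshold are kept, which shrinks the space over which association rules are then computed.

```python
# Illustrative sketch (not the authors' implementation): "iceberg"-style pruning keeps
# only itemsets whose support meets a minimum threshold, reducing the search space
# before association rules are derived.
from itertools import combinations

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
min_support = 0.6  # hypothetical threshold

def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
iceberg = [set(c) for size in (1, 2)                  # candidate itemsets of size 1 and 2
           for c in combinations(items, size)
           if support(set(c)) >= min_support]          # keep only the frequent ones

print(iceberg)  # for this toy data, all size-1 and size-2 itemsets pass the threshold
```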
The ongoing COVID-19 pandemic brings new challenges and risks to various areas of our lives. The lack of viable treatments is one of the issues in coping with the pandemic. Developing a new drug usually takes 10–15 years, which is a problem since treatments for COVID-19 are required now. As an alternative to developing new drugs, the repurposing of existing drugs has been proposed. One of the scientific methods that can be used for drug repurposing is literature-based discovery (LBD). LBD uncovers hidden knowledge in the scientific literature and has already been used successfully for drug repurposing in the past. We provide an overview of existing LBD methods that can be utilized to search for new COVID-19 treatments. Furthermore, we compare three LBD systems, Arrowsmith, BITOLA, and SemBT, concerning their suitability for this task. Our research shows that semantic models appear to be the most suitable for drug repurposing. Nevertheless, Arrowsmith currently yields the best results, despite using a co-occurrence model instead of a semantic model. However, it achieves these good results only because BITOLA and SemBT currently do not allow COVID-19-related searches. Once this limitation is removed, SemBT, which uses a semantic model, will be the better choice for the task.
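Many LBD systems build on Swanson's ABC co-occurrence model; the following minimal sketch illustrates that general idea on hypothetical data (it is not the pipeline of Arrowsmith, BITOLA, or SemBT): candidate terms C are linked to a start term A through intermediate terms B that co-occur with both in the literature.

```python
# Illustrative sketch of the classic ABC co-occurrence idea behind many LBD systems
# (hypothetical data for illustration, not the compared systems' pipelines).
co_occurrences = {                       # term -> terms it co-occurs with in the literature
    "COVID-19": {"cytokine storm", "ACE2"},
    "drug_X": {"cytokine storm"},            # shares one B term with COVID-19
    "drug_Y": {"ACE2", "cytokine storm"},    # shares two B terms
    "drug_Z": {"blood pressure"},            # shares no B term
}

def abc_candidates(start_term, documents):
    """Rank candidate C terms by the number of intermediate B terms shared with A."""
    b_terms = documents[start_term]
    shared = {c: len(documents[c] & b_terms) for c in documents if c != start_term}
    return sorted((c for c, n in shared.items() if n > 0), key=lambda c: -shared[c])

print(abc_candidates("COVID-19", co_occurrences))  # -> ['drug_Y', 'drug_X']
```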
The Maximum Entropy Model (MEM) [1][4] estimates probability distribution functions by which the current state of knowledge is described in the context of prior data. Here we examine the Generalized Iterative Scaling (GIS) [1] algorithm to determine optimum feature weights with feature selection during learning. The Maximum Entropy principle [1] makes use of all the characteristics of the data given in advance, and we can expect a distribution that is robust against outliers. However, it takes a long time to converge because the computation depends heavily on the number of classes. We introduce a novel approach, random sampling based on the Monte Carlo method, into GIS for improved computation.
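For orientation, the GIS weight update in its common textbook form is shown below; the paper's Monte Carlo variant approximates the model expectation in this update by random sampling rather than full summation. The exact formulation used in the paper may differ.

```latex
% Common textbook form of the GIS update (shown for orientation; the paper's variant may differ).
% Model: p_\lambda(y \mid x) \propto \exp\Big(\sum_i \lambda_i f_i(x, y)\Big),
% with the GIS requirement \sum_i f_i(x, y) = C for all (x, y).
\lambda_i^{(t+1)} \;=\; \lambda_i^{(t)} \;+\; \frac{1}{C}\,
  \log \frac{\tilde{E}[f_i]}{E_{p^{(t)}}[f_i]}
```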
Today, vast amounts of data are collected from the internet, and the general public generates most of this data using social networks. There is a need for a comprehensive approach to characterize the quality of such user-generated data collections from the internet. The data quality characteristics accepted among the database and computer science communities have definitions that are not domain-specific. Therefore, there is no clear understanding of the data quality characteristics specific to user-generated content. This research examines different user-generated content platforms against the general data quality characteristics to determine which quality characteristics are essential for user-generated content. The research contributes a list of definitions of those data quality characteristics specific to user-generated content. These definitions help identify quality characteristics useful for user-generated content platforms and their implementations. The quality of the content of the Atlas of Living Australia, Twitter, YouTube, Wikipedia, and WalkingPaths is evaluated to assess the essence of the quality characteristics defined in this research.
In this paper, we deal with the question of how the variety of trip opportunities can be modeled in – possibly complex – recreational trail networks (such as hiking paths or cycling ways). In order to quantify the variety of possible loop trips starting from specific trailheads (starting nodes accessible from outside the network) and the variety of connecting trips between specific origin-destination pairs, two novel measures, the Loop Trip Variety Index (LTVI) and the Connecting Trip Variety Index (CTVI), were proposed preliminarily and informally in [12] in the frame of assessing the impacts of some recent trail network developments. This paper establishes the formal definitions of improved variants of these measures, shows their well-definedness, presents the algorithms for their computation, investigates their properties and benefits, and gives reasons as to how and to what extent they can be treated as models of trip variety. Possible uses, application areas, and future improvements are sketched, especially for visitor management planning and profile-based trip recommendation systems.
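As a rough illustration of what trip variety means in a trail network graph (this is only a naive proxy, not the formal LTVI or CTVI definitions established in the paper), one can count the distinct simple routes between an origin and a destination.

```python
# Illustrative sketch only (a naive proxy, not the paper's LTVI/CTVI): count the
# distinct simple routes between an origin and a destination in a trail network graph.
import networkx as nx

trails = nx.Graph()                      # hypothetical trail network
trails.add_edges_from([("trailhead", "A"), ("A", "B"), ("B", "summit"),
                       ("trailhead", "C"), ("C", "B"), ("A", "C")])

routes = list(nx.all_simple_paths(trails, source="trailhead", target="summit"))
print(len(routes))                       # number of distinct simple connecting trips
for route in routes:
    print(" -> ".join(route))
```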
The authors have developed the Geo CPS platform, which incorporates the advantages of cyber-physical systems, geographic information systems, and tangible user interfaces, and provides a platform connecting the three components for interactive sensing, processing, and actuation in smart city development. It can also be applied to education in environmental and disaster management, as a tool for technical training, or as a testbed for business solutions.
Dynamic routing with combinations of mobility and activity is expected to be a new methodology for supporting sensitivity to various contexts of traveling. It is important to realize this dynamism by integrating “mobility and activity” in physical and cyber spaces. This paper presents a mobility and activity integration system for making routing plans from an origin point to a destination with a scenario providing “sensitivity to context” on the route. The “sensitivity to context” expresses reactions to the intentions and situations of a moving user. The system applies semantic computing to find the appropriate mobility and activity, dynamically calculating semantic associations between the user’s intentions and mobility services. The system makes a moving plan reflecting “sensitivity to context” created by query creation operators for synthesizing and expressing “everyday intention” and “mobility situation.” The system has a distance calculation function for “feature value vectors” expressing the means of mobility and the features of facility spots, and outputs several expected means of moving towards the destination with activities on the route.
Increasingly, educational institutions are migrating to mobile platforms and mobile app technology to communicate, advertise, and disseminate education-related information to their stakeholders using virtual campus journey mobile apps. Campus journey mobile apps generally provide standardized generic customization to their user base, incorporating a list of favorite touchpoints based on users’ frequent behaviors. In the literature, the definitions of personalization and customization are mixed up and interchanged, with no proper separation of these two concepts. In this research, the personalized virtual journey is examined according to the user’s preferences. This includes creating and updating the personalized virtual campus journey path as a user activity and making it an integral part of the personalized virtual campus journey application. This research presents the concept structure, design, implementation, and evaluation results of a personalized virtual campus journey mobile app developed according to the user’s preferences, with the user given the ability to control his or her own virtual journey experience.
For dynamically integrating the professional knowledge of curators, a multidatabase system architecture for an art museum, Artizon Cloud, is proposed. Location-based data provision for visitors is defined in the architecture. A system and applications are implemented and provided in an actual museum, and heterogeneous archives that were independently implemented as databases with Web UIs are dynamically extracted, integrated, and staged on visitors’ devices.