
Ebook: Information Modelling and Knowledge Bases XXXVI

Information modelling and knowledge bases have become increasingly important for academic communities working with information systems, computer science, and artificial intelligence. The volume and complexity of information, its levels of abstraction, and the size of databases and knowledge bases continue to grow in parallel with the rising complexity of computational processes.
This book presents the proceedings of EJC 2024, the 34th international conference on Information Modelling and Knowledge Bases, held in Tokyo, Japan, from 10 to 14 June 2024. The EJC conference series aims to explore the progress of research communities with a common interest in understanding and solving problems in information modelling and knowledge bases, and in applying the results of research to practice, by means of sharing scientific results and experiences achieved using innovative methods and systems in computer science and other disciplines. The selected papers published here cover many areas of information modelling and knowledge bases, including the theory of concepts, semantic computing, data mining, machine learning, context-based information retrieval, ontological technology, image databases, temporal and spatial databases, natural language processing, software engineering, cross-cultural computing, environmental analysis, social computing, and many others. This latest edition of the conference also addressed the question of global & environmental AI for nature and society and asked whether System 2 can do everything that AI cannot yet do.
Offering a comprehensive overview of current developments, the book will be of interest to all those working in the field.
Information modelling and knowledge bases have become important topics in academic communities working with information systems, computer science, and artificial intelligence. The volume and complexity of information, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing in parallel with the rising complexity of computational processes.
The aim of the Information Modelling and Knowledge Bases EJC conference series is to explore the progress in research communities with a common interest in understanding and solving problems on information modelling and knowledge bases, as well as applying the results of research to practice, by means of scientific results and experiences achieved using innovative methods and systems in computer science and other disciplines. The research topics relevant to the conference series are mainly concentrated on a variety of themes in prominent domains, including conceptual modelling, design and specification of information systems, multimedia information modelling, multimedia systems, ontology, software engineering, knowledge and process management, knowledge bases, cross-cultural communication, context modelling, human language perception, and thought processes. Attention is also paid to theoretical disciplines, including cognitive science, artificial intelligence, logic, linguistics, and analytical philosophy.
The selected papers published here cover many areas of information modelling and knowledge bases, namely the theory of concepts, semantic computing, data mining, machine learning, context-based information retrieval, ontological technology, image databases, temporal and spatial databases, natural language processing, software engineering, cross-cultural computing, environmental analysis, social computing, and many others. This latest edition of the conference also addresses the question of global & environmental AI for nature and society and asks whether System 2 can do everything that AI cannot yet do.
We believe that this edition of Information Modelling and Knowledge Bases will be productive, valuable and fruitful in advancing research and applications in related academic areas.
EJC 2024 was hosted by the Faculty of Data Science of Musashino University, Tokyo, Japan, from 10 to 14 June 2024. We thank all colleagues for their support in making this conference successful, especially the programme committee, the organization committee, and the programme coordination team, in particular Professor Naofumi Yoshida, who maintains the paper submission and reviewing systems and who compiled the files for this book.
Editors
Yasushi Kiyoki
Virach Sornlertlamvanich
Marina Tropmann-Frick
Hannu Jaakkola
Naofumi Yoshida
This paper introduces Global Contextualized Representations (GCoRe), an extension of existing transformer-based language models. GCoRe addresses limitations in capturing global context and long-range dependencies by utilizing Graph Neural Networks for graph inference on a context graph constructed from the input text. Global contextualized features, derived from the context graph, are added to the token representations from the base language model. Experimental results show that GCoRe improves the performance of the baseline model (DeBERTa v3) by 0.57% on the HotpotQA dataset and by 0.15% on the SQuAD v2 dataset. In addition, GCoRe is able to answer questions that require logical reasoning and multi-hop inference, while the baseline model fails to provide correct answers.
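As a rough illustration of the fusion step described above, the following minimal NumPy sketch propagates token features over a small context graph with one graph-convolution-style step and adds the resulting global features to the base token representations. The graph, dimensions, and additive fusion are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal sketch: fuse global context features from a graph over tokens
# with per-token representations from a base language model.
# All dimensions and the simple additive fusion are illustrative assumptions.
rng = np.random.default_rng(0)
num_tokens, hidden = 6, 8

# Token representations as they might come from the base encoder.
token_reprs = rng.normal(size=(num_tokens, hidden))

# Context graph over the tokens (an arbitrary symmetric adjacency for this example).
A = np.zeros((num_tokens, num_tokens))
for i, j in [(0, 3), (1, 4), (2, 5), (0, 5)]:
    A[i, j] = A[j, i] = 1.0

# One graph-convolution-style propagation step: D^-1/2 (A + I) D^-1/2 X W.
A_hat = A + np.eye(num_tokens)
d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
W = rng.normal(scale=0.1, size=(hidden, hidden))
global_features = d_inv_sqrt @ A_hat @ d_inv_sqrt @ token_reprs @ W

# Additive fusion of global contextualized features with the token representations.
fused = token_reprs + global_features
print(fused.shape)  # (6, 8)
```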
Challenged by the limitations of data-driven AI in reasoning and knowledge depth, this work presents a novel approach to enhanced conversational understanding. We leverage advanced text analysis to extract key information from FAQs, then utilize AI-generated questions and robust semantic similarity metrics to improve the precision of user query matching. Through the integration of important-sentence extraction in knowledge preparation, coupled with question generation and the application of semantic textual similarity measures, our model achieves a substantial improvement in user query matching precision. We propose a dual-system architecture that augments System 1 with additional knowledge akin to System 2 in human cognition. The methodology is exemplified through chatbot correction using FAQs, demonstrating the potential for human-like mind processing. Results show improved semantic understanding and reasoning, offering a promising path for advancing AI capabilities in conversational contexts.
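The query-matching idea can be sketched as follows: FAQ questions are augmented with stand-ins for generated paraphrases, and a user query is matched to the most similar question. For a self-contained example, TF-IDF cosine similarity replaces the paper's semantic textual similarity model, and all FAQ entries are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical FAQ entries plus hand-written stand-ins for AI-generated
# paraphrase questions that point to the same answers.
faq = [
    ("How do I reset my password?", "Use the 'Forgot password' link on the login page."),
    ("What are your opening hours?", "We are open 9:00-17:00 on weekdays."),
]
generated = [
    ("I forgot my password, what should I do?", faq[0][1]),
    ("When is the office open?", faq[1][1]),
]
questions = [q for q, _ in faq + generated]
answers = [a for _, a in faq + generated]

vectorizer = TfidfVectorizer().fit(questions)

def answer(user_query: str) -> str:
    # Match the query against all FAQ questions and paraphrases, return the best answer.
    sims = cosine_similarity(vectorizer.transform([user_query]),
                             vectorizer.transform(questions))[0]
    return answers[int(sims.argmax())]

print(answer("can't remember my password"))
```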
One of the essential computations in ocean environmental study is semantic computing for interpreting and analyzing changes in various situations (coral areas, ocean water, habitats of marine life, sea level, etc.). It is important to realize a global ocean-environmental computing methodology for analyzing the difference and diversity of nature and living things in a context-dependent way from the viewpoint of the global environment.
It is also significant to memorize those situations and compute ocean-environment change in various aspects and contexts, in order to discover what is happening in the ocean. Ocean-environmental changes on our planet have various aspects and contexts, and it is essential to realize a new analyzer for computing differences between those situations to find the actual aspects and contexts existing in the oceans. We have proposed a new method for Differential Computing in our Multi-dimensional World Map in [3,4]. We utilize a multi-dimensional semantic computing model, and a multi-dimensional space with an adaptive-axes selection mechanism to implement semantic computing [1,2]. By computing ocean-environmental changes in multiple aspects and contexts using semantic computing, important factors that change the natural ocean environment are highlighted.
Semantic computing is an important and promising approach to semantic analysis of various environmental phenomena and changes in the real world. This paper presents a new concept of a “Coral-Health-Level Analysis Semantic-Space for Ocean-environment” for realizing global ocean-environmental analysis. This space and computing method are based on environmental-database creation with coral-health-level-analysis sensors for analyzing and interpreting environmental phenomena and changes occurring in the oceans. This paper focuses on coral health level in the South-East Asian ocean area as an experimental study for creating the “Coral-Health-Level Analysis Semantic-Space for Ocean-environment”.
We have created a semantic space for coral-health-level analysis in the South-East Asian ocean area with 24-dimensional axes (coral-health-level parameters). As a first step, this space is applied to the South-East Asian ocean area, and it is expandable to multiple spots in ocean areas to analyze and compare their coral-health-level situations in a global scope for ocean-environmental analysis.
In this paper, we also apply the 5-Dimensional World Map System for semantic computing to analysis of the South-East Asian ocean area, as an international collaborative platform for ocean-environment analysis with spatio-temporal and semantic dimensions.
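A minimal sketch of context-dependent differential computing over such a semantic space is shown below: situation vectors are compared on the axes selected and weighted by an analysis context. The axis names, weights, and observation values are hypothetical placeholders, not the 24 actual coral-health-level parameters.

```python
import numpy as np

# Hypothetical coral-health-level parameters (the actual 24 axes are not listed here).
axes = ["sea_surface_temp", "salinity", "turbidity", "chlorophyll", "ph", "bleaching_index"]

# Observations of the same spot at two times (illustrative values only).
obs_2020 = np.array([29.1, 34.5, 0.8, 0.30, 8.05, 0.10])
obs_2023 = np.array([30.2, 34.2, 1.1, 0.42, 7.98, 0.25])

# Context-dependent axis selection: each analysis context weights a subset of axes.
contexts = {
    "thermal_stress": {"sea_surface_temp": 1.0, "bleaching_index": 1.0},
    "water_quality": {"turbidity": 1.0, "chlorophyll": 1.0, "ph": 0.5},
}

def differential(v1, v2, context):
    """Weighted difference of two situation vectors on the axes selected by a context."""
    w = np.array([contexts[context].get(a, 0.0) for a in axes])
    return float(np.linalg.norm(w * (v2 - v1)))

for c in contexts:
    print(c, round(differential(obs_2020, obs_2023, c), 3))
```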
The article focuses on one aspect of artificial intelligence – Generative AI (GenAI) – which is expected to offer significant opportunities in different areas of business, industry, and society. GenAI is the current state of the decades-long development of artificial intelligence (AI), and many companies are currently looking for ways to benefit from this market-changing technology. The use of GenAI in business practices is topical among companies and organizations, and decision-makers across the globe are considering the future potential of GenAI and large language models (LLM) for organizations and businesses. The aim of this paper is to examine the utilization of artificial intelligence in business operations and industry, emphasizing both the opportunities offered by GenAI as well as the challenges related to its usage. Additionally, the paper strives to determine whether the phenomenon is real or merely hype, as well as addressing its so-called revolutionary status. The topic is approached through a light literature review and discussion on the findings of two studies carried out in Finnish companies related to GenAI utilization.
Geert Hofstede famously labelled culture as the “software of the mind”, affecting how people cognitively process the world and how organisations, communities, and societies are structured. This helps to explain how culture influences the ways people perceive, use, and experience technology design, and how, within user experience design, cultural logic should be applied to develop user interfaces (UI). This study draws on Hofstede’s cultural dimension of ‘uncertainty avoidance’ (UA), the ways in which people within certain cultures cope with uncertainty, the unknown, and change, to examine the influence of culture on self-service technology (STT) UI design. The authors evaluate a sample of ten UIs from various STTs in Japan (N = 5), a country of higher UA, and Finland (N = 5), a country of lower UA. The results show that in higher UA cultures the design of STT UIs often relies on multimodal interaction, bright colours, and clear progress guidance via illustrations. However, we also find some contradictions in design solutions within the same cultures. It seems that instead of designers’ cultural identities playing a role, designers’ expertise in usability, company brand, and contextual requirements affect how UI components are constructed. We discuss the theoretical implications of these manifestations of UI design and how they relate to accessibility and usability. As an implication for practice, we propose a UI design assumption that embraces ‘Zero Uncertainty’, combining clear flow guidance, text, and illustrations with multimodal guidance and feedback.
The rise of Artificial Intelligence (AI) streamlines social media content generation, with ChatGPT sparking a surge in generative AI models. This impact is crucial for businesses enhancing digital marketing and for researchers studying AI’s role in this realm. Marketing agencies and business leaders are actively engaging in the AI race with tailored models. Amid the multitude of applications, discerning, understanding, and effectively integrating appropriate tools into business practices are increasingly complex tasks. This paper addresses two questions: first, the potential of ChatGPT’s current version to transform social media content creation for micro-businesses, and second, the initial observations and changes in user experience over an extended period of use. To answer these questions, we used a case study approach within the framework of Experiential Learning Theory (ELT), coupled with contextual inquiry research to reduce bias. Data from 15 specific ChatGPT interactions highlight its capabilities in fostering creativity for resource-strained micro-enterprises. This tool proves valuable for businesses without a designated social media marketing team, allowing them to consistently produce high-quality content, aim higher, and alleviate the pressure of finding perfect ideas for scaling in a competitive marketplace. ChatGPT serves as an ally, enhancing human capabilities and offering a transformative solution for micro-enterprises in content creation and marketing; however, there are limitations and concerns to be addressed.
The early and automatic detection of faulty behavior is essential for maintaining the reliability of a cyber-physical system. In this paper, we describe a fault localization approach for such a highly complex distributed system, the optical synchronization system of the European X-ray free-electron laser. Using a dependency graph, we model the relationships between the components and the influences of environmental effects. We first resolve linear long-term dependencies between dependent components with a correlation analysis, and then use an unsupervised fault detection pipeline consisting of statistical feature extraction and unsupervised anomaly detection to accurately identify anomalies and localize their origins in the system.
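A simplified stand-in for such a pipeline is sketched below: sliding-window statistical features are extracted from a synthetic component signal and fed to an off-the-shelf unsupervised detector (Isolation Forest). The signal, window size, and detector choice are assumptions for illustration only, not the paper's actual pipeline.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic per-component signal with an injected drift fault near the end.
rng = np.random.default_rng(1)
signal = rng.normal(0.0, 1.0, size=5000)
signal[4500:] += np.linspace(0, 4, 500)  # fault: slow drift

# Statistical feature extraction over non-overlapping sliding windows.
win = 100
windows = signal[: len(signal) // win * win].reshape(-1, win)
features = np.column_stack([
    windows.mean(axis=1),
    windows.std(axis=1),
    windows.max(axis=1) - windows.min(axis=1),
])

# Unsupervised anomaly detection on the window features.
detector = IsolationForest(contamination=0.05, random_state=0).fit(features)
labels = detector.predict(features)  # -1 marks anomalous windows
print("anomalous windows:", np.where(labels == -1)[0])
```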
The “Act on the Promotion of Understanding of Sexual Orientation and Gender Identity Diversity” was passed in Japan on June 16, 2023, and it has become particularly important to deepen the understanding of the similarities and differences in attitudes toward LGBT (Lesbian, Gay, Bisexual, Transgender) people in an international context to promote global understanding. This study proposed a method to analyze news stories related to LGBT in each linguistic area and derived the respective differences in attitudes using approximate inverse model explanations (AIME). We aggregated news articles on LGBT from January 1, 2021, to December 12, 2023, in several languages, such as German, Spanish, French, Italian, Dutch, Russian, Swedish, and Chinese, obtained through NewsAPI and translated using DeepL. The characteristics (words) of the news in each language were then extracted using AIME. Using this method, we aimed to clarify the different focus areas on LGBT and the issues involved in elucidating differences in perceptions of LGBT across language areas.
Energy storage systems (ESS) are essential for improving the performance and reliability of power systems, especially as renewable energy sources become more prevalent. This study focuses on determining the minimum amount of ESS needed to reduce the power loss associated with integrating renewable energy. Through detailed simulations and case studies using the IEEE 33-bus and IEEE 69-bus test distribution systems, this paper demonstrates the importance of setting this minimum requirement. This helps optimize system efficiency, cut costs, and strengthen overall resilience in the face of challenges posed by renewable energy integration.
This study investigates the complex dynamics of team sports, with a focus on the role of team members’ impressions of each other. We propose a novel team building method that incorporates dynamic elements to promote team formation and development, based on these impressions. A web-based system was developed to manage these impressions and collect data. This data was then analyzed to evaluate the impact of these impressions on team performance. Two team sports with distinct characteristics, American football and basketball, were examined. The results revealed that mutual impressions among team members significantly influenced team performance. This study underscores the importance of the interplay of team members’ impressions in team performance.
We describe a graph-analytical framework for supporting data-driven decisions on whether activities in registering companies point towards fraudulent behavior. The decision support is enabled by the design of a property graph model into which public government data is transformed and loaded. Fraud patterns are encoded as Cypher queries which return all instances of the patterns present in the data. All insights report on anonymized, real cases.
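To illustrate the idea of encoding a fraud pattern as a Cypher query and retrieving all of its instances, the sketch below runs one hypothetical pattern (one person registering many companies at the same address) against a Neo4j property graph via the Python driver. The schema, threshold, and connection details are assumptions, not the patterns or data used in the paper.

```python
from neo4j import GraphDatabase

# Illustrative fraud pattern encoded as a Cypher query. The labels,
# relationship types, and threshold are assumed for illustration.
PATTERN = """
MATCH (p:Person)-[:REGISTERED]->(c:Company)-[:LOCATED_AT]->(a:Address)
WITH p, a, count(c) AS companies
WHERE companies >= 10
RETURN p.name AS person, a.street AS address, companies
ORDER BY companies DESC
"""

def find_pattern_instances(uri="bolt://localhost:7687", user="neo4j", password="secret"):
    # Connection details are placeholders for a local Neo4j instance.
    driver = GraphDatabase.driver(uri, auth=(user, password))
    with driver.session() as session:
        return [record.data() for record in session.run(PATTERN)]

if __name__ == "__main__":
    for row in find_pattern_instances():
        print(row)
```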
In this paper, we propose an implementation method for a system based on the ‘tri-knowledge base with personal context vectors model’ proposed in a previous study. The system maintains not only a single set of contexts and knowledge base, but also creates snapshots and stores them as memories of changes in a time series. Organisational context vectors were added to the system. Just as the personal context vectors absorb personal preferences, the organisational context vectors can absorb organisational preferences without changing the knowledge base. Experiments were conducted to verify the functionality of memorising changes in a time series and the effects of the organisational context vector.
This paper proposes a spatio-temporal and categorical data mining method for commercial transaction data. This method searches for elements from a set using three search methods: specific, concept, and pattern, which represent the level of abstraction of the search conditions, and spatio-temporal information and category information, which correspond to domain knowledge. This method aims to obtain knowledge of combinations of events with substantial physical, temporal, and categorical correlations between two data sets based on the amount of correlation and frequent occurrence patterns. In other words, it realizes in the database the mechanism by which humans try to obtain knowledge through memory recall about the location, trends, timing, and frequency of events. This method uses aggregation functions to extract spatio-temporal and categorical features of elements in two different sets contained in a single set. Furthermore, this method performs Numerization, which converts the features from linguistic to numerical format, and Linguization, which converts the features from numerical to linguistic format. The set elements are represented as vector data consisting of spatio-temporal and categorical features in numerical and linguistic formats. Numerical and linguistic formats are used for specific and conceptual searches, while linguistic formats are used for pattern searches. This method uses a dynamic vector creation function at search time to dynamically map only those set elements that satisfy the search conditions into a semantic orthogonal space with time, space, and category dimensions. This method performs correlation computation by calculating the distance between elements for each feature, then normalizing and integrating their scores. Additionally, this method extracts as correlation rules combinations of events with substantial physical, temporal, and categorical correlations by calculating support and confidence levels based on the frequency of occurrence of the elements; namely, the Apriori algorithm is included in the calculation of correlation rules. In this paper, we present the details of the proposed method and its implementation as an application in business commerce.
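The support/confidence part of the rule extraction can be illustrated with a small self-contained sketch: event pairs are counted over toy transactions, and pairs exceeding minimum support and confidence thresholds are reported as candidate correlation rules. The events, thresholds, and single-pass pair counting are simplifications of the full Apriori-based procedure described above.

```python
from itertools import combinations
from collections import Counter

# Toy transactions: each row is a set of events observed together
# (e.g. purchased categories at a given place and time).
transactions = [
    {"coffee", "sandwich"},
    {"coffee", "sandwich", "newspaper"},
    {"coffee", "newspaper"},
    {"sandwich"},
    {"coffee", "sandwich"},
]
n = len(transactions)

item_counts = Counter(i for t in transactions for i in t)
pair_counts = Counter(frozenset(p) for t in transactions for p in combinations(sorted(t), 2))

min_support, min_confidence = 0.4, 0.6
for pair, count in pair_counts.items():
    a, b = tuple(pair)
    support = count / n
    if support < min_support:
        continue
    for x, y in [(a, b), (b, a)]:
        confidence = count / item_counts[x]
        if confidence >= min_confidence:
            print(f"{x} -> {y}: support={support:.2f}, confidence={confidence:.2f}")
```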
This paper proposes an innovative method for knowledge construction and application based on the concept of space mapping. Space mapping is a mathematical operation that maps a vector to a space, expressing the relationship between input and output information. By representing data as vectors and matrices, the method establishes a mathematical relationship between input and output using a “knowledge vector”. The “dark-matter matrix” transforms the input vector into the knowledge vector, mapping the input data onto the output space. The approach is extended to “parallel spaces”, allowing for independent knowledge vectors in each space.
The paper defines concepts related to space mapping, such as space-time mapping, time-space mapping, chain mapping, and parallel spaces. It presents schemes and methods for knowledge construction and application based on space mapping processing, including vector construction techniques and knowledge modeling methods like chain mapping and parallel spaces. These techniques enable handling complex data and implementing efficient, scalable knowledge construction and application schemes.
The method is illustrated with two application examples: robot navigation in a maze and image recognition using chain mapping. The results demonstrate the method’s ability to handle temporal, high-dimensional, and heterogeneous data aspects, create non-redundant knowledge vectors, and generate different output information based on application requirements.
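One plausible minimal reading of the space-mapping scheme, shown below as a NumPy sketch, represents knowledge as a matrix that maps input vectors onto an output space, constructs it from observed input/output pairs, and keeps independent knowledge matrices for two parallel spaces. The least-squares construction and the dimensions are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Knowledge construction and application as linear space mapping (illustrative).
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 4))          # input vectors (rows)

# Two parallel output spaces, each generated by its own hidden mapping.
true_K1 = rng.normal(size=(4, 3))
true_K2 = rng.normal(size=(4, 2))
Y1, Y2 = X @ true_K1, X @ true_K2

# Knowledge construction: recover each mapping from observed input/output pairs.
K1, *_ = np.linalg.lstsq(X, Y1, rcond=None)
K2, *_ = np.linalg.lstsq(X, Y2, rcond=None)

# Knowledge application: map a new input vector into each parallel space.
x_new = rng.normal(size=4)
print(x_new @ K1)   # output in parallel space 1
print(x_new @ K2)   # output in parallel space 2
```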
In this paper, a dynamic curation method that uses pretrained models of art interpretation is proposed. The proposed method leverages exhibition catalogs (primary classifications and images) as a source of curatorial knowledge to construct a machine learning model, and then uses this model to evaluate, position, and visualize other artwork. This method enables the dynamic curation of artwork in a vast integrated art collection archive. A prototype system for the proposed method was implemented and applied to two exhibition catalogs from the Artizon Museum. In the experiment, artworks from the archives of the Metropolitan Museum of Art (MET) in New York City and Paris Musées matching the purpose and art style of the modeled exhibition were evaluated using the constructed model. The experiment demonstrated that the proposed method successfully classifies and enables the visualization of artwork and images available in the open data from the MET and Paris Musées in alignment with the intended theme of the exhibition for which the model was constructed. Feedback from curators and art professionals indicates that the proposed method can be used to compare museums with art collections of the same genre and to organize new exhibitions based on the curatorial model.
We propose a multidimensional corporate analysis system architecture, Dynamic Corporate Inspector (DCI), for stakeholders. This system architecture provides multifaceted analysis of companies by changing text datasets and keyword sets. In this paper, we implemented the system using 4225 Japanese annual securities reports as text data and 173 SDG-related keywords as the keyword set. The analysis visualizes each company’s interest in and activities related to the SDGs.
The increasing elderly population necessitates increased geriatric care. However, a shortage of caregivers leads to a risk of falls and bedsores in the elderly, both of which result in severe injuries. Wearable devices and vision sensors have been adopted for monitoring; however, these sensors come with limitations, impacting comfort and privacy for the elderly. To address these challenges, non-intrusive sensing devices integrated into the environment offer promising value for continuous elderly activity monitoring. This study uses a panel embedded with four sensors, consisting of two piezoelectric sensors and two pressure sensors, placed beneath the mattress. The position classification encompasses five distinct positions: off-bed, sitting, lying in the center, lying on the left side, and lying on the right side. To find the best placement for the panel, the positions of the panel and combinations of panel positions are evaluated for five-position classification. As a result, the best position for a sensor panel was in the middle of the bed (position No. 3), with an accuracy of 97.12%. This suggests the panel sensor should be placed at 123.5 cm, measured from the top of the bed. Moreover, in the case of two panel sensors, the most effective arrangement places one panel sensor at the top of the bed (position No. 1) and the other in the middle of the bed (position No. 3), yielding an accuracy of 99.93%.
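The five-position classification step could look roughly like the following sketch, where a generic classifier is trained on synthetic stand-ins for features from the panel's four channels. The feature generation, classifier, and data split are assumptions; the study's actual signals and model are not specified in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Synthetic stand-in for features from the two piezoelectric and two pressure
# sensors of one panel (4 channels), one cluster per bed position.
POSITIONS = ["off-bed", "sitting", "lying-center", "lying-left", "lying-right"]
rng = np.random.default_rng(3)
X = np.vstack([rng.normal(loc=i, scale=0.5, size=(200, 4)) for i in range(len(POSITIONS))])
y = np.repeat(np.arange(len(POSITIONS)), 200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```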
Achieving the 2030 and 2050 Paris Agreement targets to improve the global environment and address climate change is critical despite the costs and time required to implement possible measures. However, current measures are primarily undertaken from a global macro perspective and lack established means to examine them from micro, realistic, and practical perspectives. CO2 direct air capture (DAC) is an innovative negative-CO2-emission technology in its early commercial stages that can help control and mitigate climate change in the long term. Despite technological advances in the past decade, there are still misconceptions about the current and long-term costs of DAC and its energy, water, and area demands. This paper presents a knowledge-based indication method with a prototype system for a DAC location and cost simulator to support early-stage decisions from a micro-local perspective and promote the use of this applicable and scalable DAC technology. Additionally, this study presents a method for determining optimal locations, estimating project costs, and prioritizing projects for early-stage feasibility assessments, policy, and business decisions on DAC investments to accelerate its deployment. The main feature of this approach is to provide a data model and a unified cost index ($/CO2 ton) to calculate the optimal location and cost projection of DAC implementation based on location characteristics and a quantified industry knowledge base, considering various cost and carbon-intensity constraints (such as reservoirs, infrastructure, low-carbon electricity, heat, transportation, and atmospheric conditions) that affect the suitability of specific locations.
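The unified cost index can be illustrated with a toy calculation that combines energy, transport, and storage costs per captured ton and discounts the net capture by the carbon intensity of the energy supply. All site values, demand factors, and the formula itself are hypothetical placeholders for the paper's knowledge-base-driven model.

```python
# Illustrative unified cost index ($ per ton of CO2) for candidate DAC sites.
sites = {
    "site_A": {"electricity_usd_per_mwh": 35, "heat_usd_per_mwh": 20,
               "transport_usd_per_t": 8, "storage_usd_per_t": 12,
               "grid_carbon_intensity_t_per_mwh": 0.05},
    "site_B": {"electricity_usd_per_mwh": 60, "heat_usd_per_mwh": 30,
               "transport_usd_per_t": 3, "storage_usd_per_t": 10,
               "grid_carbon_intensity_t_per_mwh": 0.40},
}

ELECTRICITY_MWH_PER_T = 1.5   # assumed electricity demand per captured ton
HEAT_MWH_PER_T = 1.2          # assumed heat demand per captured ton

def cost_index(s):
    gross_cost = (ELECTRICITY_MWH_PER_T * s["electricity_usd_per_mwh"]
                  + HEAT_MWH_PER_T * s["heat_usd_per_mwh"]
                  + s["transport_usd_per_t"] + s["storage_usd_per_t"])
    # Net captured CO2 shrinks when the electricity supply itself emits CO2.
    net_capture = 1.0 - ELECTRICITY_MWH_PER_T * s["grid_carbon_intensity_t_per_mwh"]
    return gross_cost / net_capture

for name, s in sorted(sites.items(), key=lambda kv: cost_index(kv[1])):
    print(f"{name}: {cost_index(s):.1f} $/tCO2")
```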
Knowledge about the real world is often recorded in plain text, such as posts on social networks, descriptions in various guides, etc. These messages include spatial information that can be extracted using natural language processing methods. The extracted information can then be represented as a planar graph, which can be further transformed into a topological map using additional information describing the area. This paper outlines an algorithm that takes a given planar graph as input and uses a multi-agent system to place individual points in 2D space, creating a topological map respecting all edge directions given in the narratives.
This paper outlines the application of AI in standardizing place names within urban narratives, addressing discrepancies caused by the diverse terminologies of different agents. By leveraging AI chatbots for named entity recognition, coreference resolution, and entity linking, the study proposes an interactive methodology for homogenizing place names across different accounts. This innovative approach aims to enhance the accuracy of information extraction from narratives, demonstrating the potential of AI models over traditional linguistic methods in resolving place name inconsistencies.
Electric vehicles (EVs) are gaining popularity in Thailand as an increasingly competitive, eco-friendly transportation option. However, the insufficient placement of battery charging stations has been a barrier to the widespread adoption of electric vehicles. In this study, we propose a recommendation system for EV charging locations to promote sustainability in smart cities. We utilize a distance function and the k-nearest neighbours (k-NN) algorithm to identify optimal locations that offer efficient coverage. To validate the effectiveness of our model, we conduct simulations for station placements in Chonburi, Thailand. Our proposed coverage model offers a novel approach to determining station locations, emphasizing proximity to existing gas stations and maximizing coverage. The ultimate objective is to identify ideal locations for smart city sustainability development and establish a theoretical foundation for deploying these new charging stations.
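A simplified version of the coverage idea is sketched below: candidate sites (e.g., existing gas stations) are scored by how many demand points they cover within a radius, and stations are chosen greedily. The coordinates, radius, and greedy selection are illustrative assumptions alongside the paper's distance-function and k-NN approach.

```python
import numpy as np

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (*a, *b))
    h = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(h))

# Hypothetical demand points (EV users) and candidate sites near Chonburi.
rng = np.random.default_rng(4)
demand = rng.uniform([13.2, 100.8], [13.5, 101.2], size=(200, 2))
candidates = rng.uniform([13.2, 100.8], [13.5, 101.2], size=(15, 2))

RADIUS_KM, N_STATIONS = 5.0, 3
remaining = list(range(len(candidates)))
uncovered = set(range(len(demand)))
chosen = []
for _ in range(N_STATIONS):
    # Pick the candidate that covers the most still-uncovered demand points.
    best = max(remaining,
               key=lambda c: sum(haversine_km(candidates[c], demand[d]) <= RADIUS_KM
                                 for d in uncovered))
    remaining.remove(best)
    chosen.append(best)
    uncovered -= {d for d in uncovered
                  if haversine_km(candidates[best], demand[d]) <= RADIUS_KM}

print("chosen candidate indices:", chosen, "| uncovered demand points:", len(uncovered))
```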
This paper presents a speed testing platform designed to evaluate Thailand’s broadband internet network experience. The platform measures internet speed at the ONT (optical network terminal), which serves as the signal source from the operator (or ISP), and at the connected internet device at the user’s accommodation (fixed broadband internet). Scenario tests were conducted involving four major broadband network operators in Thailand, with our application collecting data from 30 device-based operators and 200 internet users, resulting in a dataset of 25,000 records. The primary focus of our study was on assessing download and upload speeds, as well as latency. The proposed system was deployed in 8 provinces in the east of Thailand over two months. Results indicate that the Download Percentage Average (DPA) was 64.30%, while the Upload Percentage Average (UPA) was 71.84%. The average ping and jitter were 11.82 ms and 16.53 ms, respectively. Analysis revealed variations in average speed and latency values across different areas and time periods, underscoring the efficacy of our system. Our proposed platform could serve NBTC as a central platform for evaluating future broadband internet services in Thailand.
In the modern era of image-based applications, efficient and accurate image search algorithms have become essential. This study presents an innovative method to improve scene picture search by applying metric learning neural networks within a Neural Network Selection (NNS) architecture. The fundamental goal is to retrieve relevant images that capture semantic similarities between scenes, resulting in improved retrieval performance with an implementation suitable for compact PCs such as a Raspberry Pi. Metric learning is applied to determine image similarity metrics based on the features extracted by a neural network. Based mainly on the quality of the extracted features, the proposed NNS architecture selects the best neural model using an evaluation criterion to calculate a selection score. This formula considers accuracy, precision, recall, device, power consumption, and response time, evaluating the model’s computational efficiency. In the experiment, an autoencoder within the neural network selection architecture is tested using an ECG dataset with 5,000 data points. The Mean Squared Error (MSE) is used to quantify the autoencoder’s performance. A suitable MSE threshold is established: data points with errors over the threshold are considered anomalies, while those below the threshold are considered normal. This anomaly detection achieves an accuracy of 0.942, a precision of 0.994, and a recall of 0.901. Neural network selection can also use these values, in addition to the MSE, to calculate the selection score for selecting the best neural model.
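A possible form of the selection score is sketched below: reported quality metrics are combined with normalized power and response-time costs through weighted terms. The abstract lists the inputs but not the formula, so the weights, normalization bounds, and the second candidate model are hypothetical; only the first candidate's accuracy, precision, and recall values come from the experiment described above.

```python
# Hypothetical selection-score sketch for the NNS architecture.
candidates = {
    "autoencoder_small": {"accuracy": 0.942, "precision": 0.994, "recall": 0.901,
                          "power_w": 3.5, "response_ms": 40},     # metrics from the abstract, device costs assumed
    "autoencoder_large": {"accuracy": 0.955, "precision": 0.990, "recall": 0.930,
                          "power_w": 9.0, "response_ms": 160},    # entirely hypothetical candidate
}

WEIGHTS = {"accuracy": 0.4, "precision": 0.2, "recall": 0.2, "power_w": -0.1, "response_ms": -0.1}
MAX_POWER_W, MAX_RESPONSE_MS = 10.0, 200.0   # normalization bounds for the device budget

def selection_score(m):
    quality = (WEIGHTS["accuracy"] * m["accuracy"]
               + WEIGHTS["precision"] * m["precision"]
               + WEIGHTS["recall"] * m["recall"])
    cost = (WEIGHTS["power_w"] * m["power_w"] / MAX_POWER_W
            + WEIGHTS["response_ms"] * m["response_ms"] / MAX_RESPONSE_MS)
    return quality + cost

for name, m in candidates.items():
    print(f"{name}: score={selection_score(m):.3f}")
print("selected:", max(candidates, key=lambda name: selection_score(candidates[name])))
```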
Nurses play a crucial role in healthcare, directly influencing the quality of patient care. Facing a global nursing shortage, there is an urgent need for strategies to enhance nursing efficiency and care quality. This foundational study explores an NLP-based approach to determine NANDA nursing diagnoses, leveraging both subjective and objective patient data recorded by nurses. Employing text data similarity analysis and a prototype predictive model, our research aims to refine the nursing assessment process and pave the way for the potential automation of nursing diagnoses. This work highlights the potential of AI to support nursing practices and provides a platform for future research to fully realize AI’s benefits in addressing the challenges posed by the nursing shortage.