Ebook: Information Modelling and Knowledge Bases XXVIII
Information modelling and knowledge bases are now essential, not only to academics working in computer science, but also wherever information technology is applied.
This book presents papers from the 26th International Conference on Information Modelling and Knowledge Bases (formerly the European Japanese Conference – EJC), which took place in Tampere, Finland, in June 2016. The conference provides a platform to bring together researchers and practitioners working with information modelling and knowledge bases, and the 33 accepted papers cover topics including: conceptual modelling; knowledge and information modelling and discovery; linguistic modelling; cross-cultural communication and social computing; environmental modelling and engineering; and multimedia data modelling and systems. All papers were improved and resubmitted for publication after the conference.
Covering state-of-the-art research and practice, the book will be of interest to all those whose work involves information modelling and knowledge bases.
In the last three decades information modelling and knowledge bases have become essential and important subjects, not only in academic communities related to information systems and computer science, but also in the business area where information technology is applied.
The series of International Conferences on Information Modelling and Knowledge Bases (EJC) originally started as a co-operation initiative between Japan and Finland in 1982. The practical operations were then organized by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded, first to cover Europe and more recently other countries as well. Because of this, “European Japanese” was replaced by “International” in the title of the conference in 2014. Workshop characteristics – discussion, ample time for presentations and a limited number of participants – are typical of the conference.
The 26th International Conference on Information Modelling and Knowledge Bases (EJC 2016) constitutes a world-wide research forum for the exchange of scientific results and experiences. In this way, a platform has been established which brings together researchers as well as practitioners in information modelling and knowledge bases. The main topics of EJC conferences cover a variety of themes: Conceptual modelling; Knowledge and information modelling and discovery; Linguistic modelling; Cross-cultural communication and social computing; Environmental modelling and engineering; Multimedia data modelling and systems.
The Program Committee accepted 33 papers for publication in this book. The papers were evaluated by an international panel of reviewers (the Program Committee). In the conference program, they were organized into the sessions shown in Fig. 1. In accordance with the conference principles, all papers were presented at the conference, and both short and full papers were accepted for publication in this volume. All papers had to be improved and resubmitted after the conference, so this book contains the final set of selected conference papers in their improved form.
We thank all colleagues for their support in the conference arrangements, especially the reviewers, the program committee, the organizing committee, and the program coordination team. Ultimately, it is the participants who make the conference – our thanks to all of them. We would also like to offer our special thanks to the Federation of Finnish Learned Societies for their financial support towards the organizing costs of the conference, and to Huawei Technologies Oy (Finland) Co. Ltd. for covering the printing costs of this book.
In conclusion, we would like to add a few words about the conference venue. The conference was held in Tampere on the premises of Holiday Club Tampereen Kylpylä. After an extensive renovation and modernization, the premises – which have a long industrial history – were opened as a spa hotel at the lakeside near the center of the City of Tampere. Tampereen Puuvillatehdas started its operation – a spinning mill and textile factory – in 1899, building its first factory on the Lapinniemi cape of Lake Näsijärvi. In 1934 it was bought by and merged with Tampella Ltd., which moved all of its textile industry to Lapinniemi in 1977. The factories were closed in the mid-1980s, in the era when the textile industry became globalized and left Finland.
We had 49 participants at the conference. In addition to the papers accepted in the review process, there were four invited lectures – three by speakers from Finland (Professors Marja-Leena Linne, Moncef Gabbouj and Hannu Kangassalo) and one from Japan (Professor Hideyuki Tokuda). Our special thanks go to them for their contribution to the successful program.
The Editors
Hannu Jaakkola
Bernhard Thalheim
Yasushi Kiyoki
Naofumi Yoshida
Software development projects have increasingly been adopting new practices, such as continuous delivery and deployment, to enable rapid delivery of new features to end users. Tools that are commonly used with these practices generate a vast amount of data concerning various development events. Analysis of this data provides a lightweight, data-driven view of the software process. We present an efficient way of visualizing software process data to give a good overall view of the features and potential problems of the process. We use the visualization in a case project that has become more agile by applying continuous integration and delivery together with development and infrastructure automation. We compare the data visualizations with information gathered from the development team and describe how the evolution can be understood through our visualizations. The case project is a good example of how a change from traditional long-cycle development to a rapid-cycle DevOps culture can actually be made in a few years. However, the results show that the team has to focus on process improvement continuously in order to sustain continuous delivery. As the main contribution, we present a lightweight approach to software process visualization. Moreover, we discuss how such a heuristic can be used to track the characteristics of the target process.
Each day new applications are developed and extra features are added to existing ones. This means creating new program code or introducing modifications to existing code. To ease new modifications, the software design should be constantly improved to cope with the many changes added. The quality of the software architecture plays an important role: it determines the time to market of new features, the maintainability and extensibility of applications, and the readability of the source code. Sustaining the overall quality of the architecture requires refactoring phases so that the design can cope with new changes, which also restores source code readability and extensibility. More experienced developers conduct code reviews in different phases of application development to pinpoint places that need improvement. Several code refactoring steps can be applied to source code to recover its maintainability and extensibility. Refactoring plays an important role not only in software development, but also in other fields. In this article the authors give an overview of what can be achieved by refactoring and point out some success stories. Several aspects of the refactoring process are studied, and a refactoring strategy is proposed.
The use of Ontology Design Patterns (ODPs) has become useful for developing and reengineering ontologies. ODPs represent encodings of best practices that support ontology construction by facilitating the reuse of proven solution principles. In this paper, we focus on Content ODPs (CDPs), which represent small ontology fragments that encode general use cases (e.g. participation in an event, role playing, parts of an object). Content ODPs are used as building blocks during ontology development. In such cases they can be specialized, extended and integrated by the user to obtain a new composite CDP, which allows a more expressive representation of a domain concept in the ontology being developed. However, this may demand additional skills from the user. Therefore, this paper considers the automated selection of a CDP combination and the subsequent synthesis of a new composite CDP.
Breast cancer is widespread and takes patients to an early grave. Early detection and the ability to predict the effectiveness of treatments are among the means used to fight various malignant cancers. The federation of data from multiple data sources is needed for data analytics. Granularity, internality, structure and other characteristics often differ in federated data. We discuss alternative approaches to data federation and their theoretical basis, especially their stance regarding the ontology of data. We also develop an artifact to federate cancer data at a university hospital. Our approach and the artifact improved data interoperability, i.e., data federation. We suggest that the same be repeated in other data federation contexts.
Multi-dimensional analysis is a promising approach to a new interpretation of environments on the basis of the value-information and language-information on intellectual activities, covering the various meanings environments have for society. This paper presents a new analysis system with semantic computing for environments in water-quality areas, integrating the fundamentally important water-quality parameters to create new meaning for society. Multi-water-parameter analysis in a multi-dimensional space is an important current research issue in several water-quality research fields; it is based on the values and meanings of each parameter and yields meaningful words in the categories of agriculture, aquatic life, fish, drinking, industrial and irrigation use. The multi-dimensional semantic space is utilized for various interpretations related to water quality.
The concept of context-aware personalization enables the implicit detection of user context to achieve personalization of services. Coupled with pervasive technologies such as augmented reality, this concept offers a view into the future of computing. This research argues that it is essential to understand the needs of the human being using the technology. Hence, the goal of this research is to describe the relationship between human needs and context information, and its role in the personalization of applications. The paper answers the research question: “How can personalization be enhanced using context awareness in applications for a better pervasive experience?” The research work resulted in the development of a user-centric model that embodies the concepts of context awareness and needs prediction. The proposed model is applicable in pervasive technology aiming to provide information or services with minimum attention from users in order to attain a high satisfaction level. The pragmatic value of this work lies in the fields of ambient assisted living, healthcare, entertainment, and advertising.
Artificial neural networks are a common method that has been used in many studies. This paper presents a comparison of prediction methods for the alum dosage used in the water supply treatment process. In this research, we compared the results from M5P, M5Rules and REPTree with the results from a multilayer perceptron, one type of artificial neural network. Six input variables relating to the coagulation reaction were used: turbidity, alkalinity, pH, conductivity, color and suspended solids. The data in this research were collected from the Bangkhen Branch Office of the Metropolitan Waterworks Authority, Thailand, from 1 January 2006 to 31 July 2015. Our experimental results showed that the M5Rules method yielded the highest accuracy in predicting alum dosage compared with the other methods run in this study. For M5P and M5Rules, building the model with the smoothing procedure and without pruning appears to give the most accurate model.
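As an illustration of the kind of comparison described above, the following sketch trains a regression tree and a multilayer perceptron on six synthetic input variables and compares them by cross-validated RMSE. M5P, M5Rules and REPTree are Weka algorithms, so the scikit-learn models here are only rough stand-ins, and the data is invented rather than the Metropolitan Waterworks Authority dataset.

```python
# Illustrative sketch only: scikit-learn stand-ins on synthetic data, not the paper's setup.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Six input variables: turbidity, alkalinity, pH, conductivity, color, suspended solids.
X = rng.random((300, 6))
y = 20 * X[:, 0] + 5 * X[:, 5] + rng.normal(0, 1, 300)   # synthetic alum dosage

models = {
    "regression tree": DecisionTreeRegressor(max_depth=8, random_state=0),
    "multilayer perceptron": make_pipeline(StandardScaler(),
                                           MLPRegressor(hidden_layer_sizes=(16,),
                                                        max_iter=3000, random_state=0)),
}
for name, model in models.items():
    rmse = -cross_val_score(model, X, y, cv=5,
                            scoring="neg_root_mean_squared_error").mean()
    print(f"{name}: RMSE = {rmse:.3f}")
```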
Dengue fever is a communicable disease that has affected more than 120 countries in the world over the last 50 years. It therefore makes sense to say that collaboration among countries, especially neighbouring countries, is one important key to combating dengue. Currently, apart from serological collaboration, collaboration on dengue is sporadic and temporary. This paper addresses an initiative to build a collaborative vector-control strategy among Surabaya (Indonesia), Kuala Lumpur (Malaysia), and Bangkok (Thailand). Deriving the global policy from the World Health Organization (WHO), we build a system that (1) extracts global features from the local features, (2) selects the significant features, determining the ranking of feature importance by weighting, and (3) matches the pattern of the data to a suitable strategy by measuring similarity. We built the system from real data for Surabaya, Kuala Lumpur and Bangkok in 2012. We verified the reliability of the system by comparing the data with the actual action in January 2012. The results show that the system is feasible to implement; however, we still need more preparation to implement it.
In the design of multimedia data mining systems, one of the most important issues is how to search and analyze media data according to contexts. We have introduced a semantic associative search method based on our “Mathematical Model of Meaning (MMM)” [1, 2, 3]. This model is applied to compute semantic correlations between keywords, images, music and documents dynamically in a context-dependent way.
We have constructed “A Multimedia Data Mining System for International and Collaborative Research in Global Environmental Analysis,” as a new platform of a multimedia data mining environment between our research team and international organizations. This environment is constructed by creating the following subsystems: (1) Multimedia Data Mining System with semantic associative-search functions and (2) 5D Space Sharing and Collaboration System for cooperative creation and manipulation of multimedia objects.
It is very important to memorize those situations and to compute environmental change in various aspects and contexts in order to discover what is happening in the nature of our planet. Environmental changes on our planet have various (almost infinite) aspects and contexts, and it is essential to realize a new analyzer that computes the differences between those situations in order to discover the actual aspects and contexts existing in nature. We propose a new method for Differential Computing in our Multi-dimensional World Map [4, 5, 6]. We utilize a multi-dimensional computing model, the Mathematical Model of Meaning (MMM), and a multi-dimensional space filtering method with an adaptive axis adjustment mechanism to implement differential computing. By computing environmental changes in multiple aspects and contexts with differential computing, important factors that change the natural environment are highlighted. We also present a method to visualize the highlighted factors using our Multi-dimensional World Map.
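The following minimal sketch illustrates only the general intuition behind differential computing: two environmental situations are expressed as vectors in a shared feature space, a context weights the axes, and the weighted per-axis differences highlight the factors that changed most. The feature names, values and weights are invented and do not reproduce the MMM or the Multi-dimensional World Map implementation.

```python
# Minimal sketch of context-weighted differences between two situations (invented data).
import numpy as np

axes = ["temperature", "precipitation", "turbidity", "vegetation"]
situation_2010 = np.array([0.62, 0.40, 0.15, 0.71])
situation_2015 = np.array([0.68, 0.31, 0.34, 0.55])

# Context = weights emphasising the axes relevant to the current analysis viewpoint.
context_water_quality = np.array([0.2, 0.8, 1.0, 0.4])

diff = context_water_quality * (situation_2015 - situation_2010)
for name, d in sorted(zip(axes, diff), key=lambda p: -abs(p[1])):
    print(f"{name:15s} weighted change = {d:+.3f}")
```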
Semantic computing is an important and promising approach to the semantic analysis of various environmental phenomena and changes in the real world. This paper presents a new semantic computing method with multi-spectral images for analyzing and interpreting environmental phenomena and changes occurring in the physical world.
We have presented the concept of a “Semantic Computing System” for realizing global environmental analysis. This paper presents a new semantic computing method that realizes semantic associative search for multi-spectral images in a multi-dimensional semantic space, the “multi-spectral semantic-image space”, consisting of (a) an infra-red filtered axis, (b) a red axis, (c) a green filtered axis, (d) a blue filtered axis, (e) an NDVI axis, and (f) an NDWI axis, together with semantic projection functions. This space is created for dynamically computing semantic equivalence, similarity and difference between multi-spectral images and environmental situations.
The most essential and significant point of our “multi-spectral semantic computing method” is that it realizes “the interpretation of substances (materials)” appearing and reflected in multi-spectral images by using the “6-dimensional multi-spectral semantic-image space” and “semantic projection functions”. That is, this method interprets the substances appearing in an image into “the names of substances” by using “knowledge of substances” expressed in this semantic-image space. This corresponds to the human-level interpretation performed when we look at an image and recognize the substances appearing in it. This method realizes this human-level interpretation with the “multi-spectral semantic-image space” and “semantic projection functions”.
We apply this system to global environmental analysis as a new platform of environmental computing. We have already presented the 5D World Map System as an international research environment with spatio-temporal and semantic analysers. We also present several new approaches to global environmental analysis for multi-spectral images in the “multi-spectral semantic-image space.”
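As a rough illustration of the six-axis space named above, the following sketch builds a vector of (infra-red, red, green, blue, NDVI, NDWI) summaries from aligned band arrays and compares two scenes by cosine similarity. The actual semantic projection functions of the system are not reproduced, and the band data here is random.

```python
# Minimal sketch, assuming each scene is available as aligned NumPy arrays per band.
import numpy as np

def spectral_vector(ir, r, g, b):
    ndvi = (ir - r) / (ir + r + 1e-9)    # vegetation index per pixel
    ndwi = (g - ir) / (g + ir + 1e-9)    # water index per pixel
    return np.array([ir.mean(), r.mean(), g.mean(), b.mean(), ndvi.mean(), ndwi.mean()])

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
scene_a = [rng.random((64, 64)) for _ in range(4)]   # placeholder IR, R, G, B bands
scene_b = [rng.random((64, 64)) for _ in range(4)]
print("semantic similarity:", cosine(spectral_vector(*scene_a), spectral_vector(*scene_b)))
```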
This article aims to indicate the correlation between climatic changes and atmospheric moisture and precipitation using GPS precipitable water vapor (PWV) values and meteorological data in Khon Kaen, Thailand. PWV, average temperature, and precipitation data from 2001 to 2014 were analyzed to determine the changes over the time period. The estimation showed that the average, maximum and minimum values of PWV in Khon Kaen were 48.42, 69.88, and 11.23 mm, respectively, with a standard deviation of 13.42 mm. Additionally, there was an increasing trend of PWV changes following temperature changes, which could be because a warm atmosphere can hold vapor better than a dry one. Again, at high temperatures, water in the environment vaporizes more easily than at low temperatures. However, precipitation tends to decrease, which could be due to the topographical conditions of Khon Kaen, which lies on a high plain surrounded by mountains. As a result, the monsoon wind is not able to bring moisture into the area. Therefore, the slightly increasing moisture cannot be a major cause of precipitation such as storms.
This paper aims to present improved techniques for classifying users' feedback on hotel service quality. The data were mainly collected from online feedback sources by a PHP program. The training set was manually tagged as NEGATIVE, POSITIVE, or NEUTRAL. In total, 2969 Vietnamese-language terms were successfully collected. In the first part, common machine learning techniques such as the K-Nearest Neighbor algorithm (KNN), Decision Tree, Naive Bayes (NB) and Support Vector Machines (SVM) were applied for classification. In the second part, we enhanced the efficiency of the text categorization by applying a feature selection technique, χ2 (CHI). At the end of the paper, we conclude that the overall performance of the general machine learning techniques was significantly improved by applying feature selection.
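A minimal sketch of the pipeline described above is given below, assuming scikit-learn: bag-of-words features, χ2 feature selection, then a classifier (an SVM is shown; KNN, a decision tree or naive Bayes can be substituted). The example texts and labels are placeholders, not the Vietnamese hotel-feedback corpus.

```python
# Minimal sketch: chi-square feature selection feeding an SVM classifier (placeholder data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts  = ["great room and friendly staff", "dirty bathroom, terrible service",
          "average stay, nothing special"]
labels = ["POSITIVE", "NEGATIVE", "NEUTRAL"]

model = make_pipeline(
    CountVectorizer(),
    SelectKBest(chi2, k=5),   # keep only the most class-discriminative terms
    LinearSVC(),
)
model.fit(texts, labels)
print(model.predict(["the staff was friendly"]))
```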
In this investigation, we propose how to accelerate Q-learning, one of the most successful reinforcement learning methods, using mirror images for hunting problems. Mirror images exploit symmetric differences of views, and they allow us to accelerate Q-learning dramatically. We show that Q-learning runs 2 times faster (one mirror) or 3 times faster (two mirrors), although the capturing ability decreases slightly when intervention happens. Moreover, we prove that the new approach converges whenever the approach with no mirror does.
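A minimal sketch of the acceleration idea, under the assumption of a grid-like hunting task with a known left-right symmetry, is shown below: every real transition also yields a Q-update for its mirrored counterpart, so each experience trains two state-action pairs. The state encoding and mirror mapping are placeholders, not the authors' setup.

```python
# Minimal sketch: one real transition produces two Q-learning updates via a mirror image.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95
Q = defaultdict(float)

def mirror(state, action):
    """Reflect a state and an action across the vertical axis of the grid (placeholder)."""
    (x, y), (dx, dy) = state, action
    return (-x, y), (-dx, dy)

def q_update(s, a, r, s_next, actions):
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def learn_step(s, a, r, s_next, actions):
    q_update(s, a, r, s_next, actions)        # ordinary Q-learning update
    ms, ma = mirror(s, a)
    ms_next, _ = mirror(s_next, a)
    q_update(ms, ma, r, ms_next, actions)     # extra update from the mirror image

# Example: one hunter step to the right that captures the prey (reward 1).
actions = [(1, 0), (-1, 0), (0, 1), (0, -1)]
learn_step(s=(1, 2), a=(1, 0), r=1.0, s_next=(2, 2), actions=actions)
```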
The importance of information in our daily life is increasing rapidly. Simultaneously, the availability of information from different sources has grown exponentially. The data-oriented context has progressed from the traditional approach based on queries to databases to the beneficial use of a wide variety of openly available, often non-structured data sources. The complexity of data needs has also increased, and solutions are based on combined multi-query results. The development has also taken us towards a global context, in which data is used across geographical and cultural borders. Information search is a communication-oriented task in which the cultures of the users meet the culture-related aspects of the data repositories. A mismatch between users' expectations, based on their national culture, and the behavior of global information services (information systems and their user interfaces) is the source of a variety of problems. In our paper we analyse the characteristics of information search and the cultural aspects guiding the behavior of the users of information systems. These two approaches are merged in the form of query-answer profiles. The purpose of the paper is to find guidelines that can be generalized and applied by developers and users for survival in the global information system context.
A sociotechnical system is a complex inter-relationship of people and technology, including hardware, software, data, physical and virtual surroundings, people, procedures, laws and regulations. An e-Education environment is a particularly complex example of a sociotechnical system that requires equal support for user needs and technological innovations. The challenge for e-Education environment development is that in addition to the producers, users, domain experts and software developers, pedagogical experts are also key stakeholders. In our paper, we discuss different meta-aspects and components of modelling e-Education ecosystems in multicultural contexts.
Provenance tackles problems such as: providing information to a user about his data that has been given to or passed on by him; informing a user how a foreign user has used his data; allowing rollback mechanisms on wrong data; observing the currency of data for closed contracts; and creating a history of data for a user. We show that these problems can be solved with sophisticated metadata support.
Data warehousing is a process of integrating multiple data sources into one for, e.g., reporting purposes. An emerging modeling technique for this is the data vault method. The use of the data vault creates many structurally similar data transformations in the transform phase of ETL work. Is it possible to automate the creation of these transformations? Based on our study, the answer is mostly affirmative. Data vault modeling imposes certain constraints on data warehouse entities. These model constraints and the principles for populating data vault tables can be used to generate transformation code. From the original relational database model and data flow metadata we can gather the populating principles, which can then be used to create general templates for each entity. Nevertheless, we note that the use of data flow metadata can be only partially automated and constitutes the only manual work phases in the process. In the end we can generate the actual transformation code automatically. In this paper, we describe the creation of the automation procedure in detail and analyze the practical problems based on our experiences with a PL/SQL proof-of-concept implementation. To the best of our knowledge, a similar approach has not yet been described in the scientific literature.
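The following sketch is not the paper's PL/SQL implementation; it only illustrates the underlying idea that data vault loads follow fixed patterns, so a transformation statement can be generated from entity metadata. The hub, staging table and column names are invented.

```python
# Minimal sketch: generate a hub-load INSERT statement from a small metadata dictionary.
HUB_TEMPLATE = """
INSERT INTO {hub} ({hub_key}, {business_key}, load_dts, record_source)
SELECT DISTINCT
       MD5(stg.{business_key}),
       stg.{business_key},
       CURRENT_TIMESTAMP,
       '{source}'
FROM   {staging} stg
WHERE  NOT EXISTS (SELECT 1 FROM {hub} h
                   WHERE h.{business_key} = stg.{business_key});
"""

def generate_hub_load(meta: dict) -> str:
    """Fill the fixed data vault hub pattern with entity-specific metadata."""
    return HUB_TEMPLATE.format(**meta)

print(generate_hub_load({
    "hub": "hub_customer", "hub_key": "customer_hk",
    "business_key": "customer_id", "staging": "stg_customer",
    "source": "CRM",
}))
```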
Effort overruns are a common problem in software development. Our main intention is to support estimation with a method for the classification of use cases. The goal of this paper is to evaluate the use of a feed-forward neural network for use case classification. Experimental results show that a feed-forward neural network classifier using the softmax activation function in the output layer and the hyperbolic tangent activation function in the hidden layer offers the best classification performance.
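As a hedged illustration of the configuration named above, the sketch below uses scikit-learn's MLPClassifier with a hyperbolic tangent hidden layer; for multi-class problems its output layer applies softmax. The use case features and complexity labels are placeholders, not the authors' data.

```python
# Minimal sketch: feed-forward classifier with a tanh hidden layer and softmax output.
import numpy as np
from sklearn.neural_network import MLPClassifier

# Hypothetical use case features: [number of actors, number of transactions, reuse flag]
X = np.array([[1, 3, 0], [2, 7, 0], [4, 12, 1], [1, 2, 1], [3, 9, 0], [5, 15, 1]])
y = np.array(["simple", "average", "complex", "simple", "average", "complex"])

clf = MLPClassifier(hidden_layer_sizes=(8,), activation="tanh",
                    max_iter=5000, random_state=0)
clf.fit(X, y)
print(clf.predict([[2, 6, 0]]))
```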
Google Earth, with its high-resolution imagery, generally takes months to process new images before they are updated online. This is considered a time-consuming and slow process, especially for post-disaster applications. In this study, we aim to develop a fast and accurate method of updating maps by detecting local differences that have occurred over different time series, where only regions with differences are updated. In our system, aerial imagery from the Massachusetts buildings open dataset is used as the training dataset, while the Saitama district dataset is used as input images. Semantic segmentation is then applied to the input images to obtain predicted map patches of buildings. Semantic segmentation is a pixel-wise classification of images implemented with a convolutional neural network. The convolutional neural network technique is used because it is not only efficient in learning highly discriminative image features such as buildings, but also partially robust to incomplete and poorly registered target maps. Next, in order to understand the overall changes that have occurred in an area, both semantically segmented images of the same scene undergo a change detection method. Lastly, a difference extraction method is applied to specify the category of building change. The results reveal that our proposed method is able to overcome the current time-consuming map-updating problem. Hence, map updating will be cheaper, faster and more effective, especially for post-disaster applications, by leaving unchanged regions as they are and updating only the changed regions.
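A minimal sketch of the change detection and difference extraction steps, assuming two binary building masks (0 = background, 1 = building) produced by segmenting the same scene at two points in time, might look as follows; the masks are toy arrays.

```python
# Minimal sketch: categorize per-pixel building changes between two segmentation masks.
import numpy as np

old_mask = np.array([[0, 1, 1],
                     [0, 1, 0],
                     [0, 0, 0]])
new_mask = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [1, 1, 0]])

new_buildings     = (old_mask == 0) & (new_mask == 1)   # appeared since the old map
removed_buildings = (old_mask == 1) & (new_mask == 0)   # demolished or destroyed
changed           = new_buildings | removed_buildings   # only these regions need updating

print("pixels to update:", int(changed.sum()), "of", changed.size)
```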
Human-microbiome-relation extraction is important for analyzing the effects of differences in human attributes, such as country, sex and age, on the human gut microbiome. The human gut microbiome, a set of bacteria, has various pathological and biological impacts on the hosting human body. This paper presents a new analytical method for data resources that are difficult to understand, such as the human gut microbiome, by extracting unknown relations with adjunct metadata (e.g. human attribute data) using context-dependent clustering and semantic analysis. The method identifies the significant bacterial components for categorizing human attributes. The most important feature of our method is that it analyzes unknown human-microbiome relations with or without a correlation between a human attribute and bacteria established by related studies in bacteriology. With this method, an analyst is able to grasp an overview of bacteria data clustered by several clustering algorithms (k-means clustering / hierarchical clustering), using bacteria data selected by human attributes as a set of contexts. In addition, even without an association between a human attribute and bacteria as heuristic knowledge, an analyst is able to extract human-microbiome relations by focusing on a number of bacteria selected from all bacteria combinations by one-way analysis of variance (ANOVA) and our original criterion, the “degree of separation” of a clustering. This paper also presents an experimental study on human-microbiome-relation extraction, and the experimental results show the feasibility and effectiveness of the method.
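A minimal sketch of the two analysis steps mentioned above, one-way ANOVA to select bacteria whose abundance differs across groups of a human attribute followed by k-means clustering on the selected bacteria, is shown below. The abundance matrix and attribute labels are randomly generated placeholders, and the paper's “degree of separation” criterion is not reproduced.

```python
# Minimal sketch: ANOVA-based selection of bacterial taxa, then k-means clustering.
import numpy as np
from scipy.stats import f_oneway
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
abundance = rng.random((60, 20))                  # 60 subjects x 20 bacterial taxa
country   = rng.choice(["JP", "FI", "TH"], 60)    # hypothetical human attribute
abundance[country == "JP", 0] += 0.5              # inject one genuine group difference

# Keep taxa whose abundance differs significantly between attribute groups.
selected = []
for taxon in range(abundance.shape[1]):
    groups = [abundance[country == c, taxon] for c in np.unique(country)]
    if f_oneway(*groups).pvalue < 0.05:
        selected.append(taxon)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(abundance[:, selected])
print("selected taxa:", selected, "cluster sizes:", np.bincount(labels))
```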
The power law predominantly describes the ways humans learn, especially in psychophysics, skill acquisition, and retention. Yet a few researchers claim that this law is applicable only at the aggregate level and that an exponential law should be considered when describing a single learning process. The question of which law should be used at the aggregate or the single-learner level has not yet been answered in the artificial learning community. This work sheds some light on the answers. We conducted an experiment with three artificial learners using 109 training cases. The statistical tests have shown that the power law and the exponential law describe the learning curves equally well. However, in quite many cases neither of the laws is applicable. Additionally, there are significant differences among the artificial learners.
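For concreteness, the two candidate laws can be written as error = a·x^(−b) (power) and error = a·e^(−bx) (exponential). The sketch below fits both to one synthetic learning curve with non-linear least squares and compares residuals; it does not use the 109 training cases from the experiment.

```python
# Minimal sketch: fit the power and exponential laws to a synthetic learning curve.
import numpy as np
from scipy.optimize import curve_fit

power_law       = lambda x, a, b: a * x ** (-b)
exponential_law = lambda x, a, b: a * np.exp(-b * x)

trials = np.arange(1, 51)
errors = 0.8 * trials ** -0.4 + np.random.default_rng(1).normal(0, 0.02, trials.size)

for name, law in [("power", power_law), ("exponential", exponential_law)]:
    params, _ = curve_fit(law, trials, errors, p0=(1.0, 0.1))
    sse = float(np.sum((errors - law(trials, *params)) ** 2))
    print(f"{name:12s} a={params[0]:.3f} b={params[1]:.3f} SSE={sse:.4f}")
```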
Music plays an important role in human life. It is not only a set of sounds: music evokes emotions subjectively perceived by listeners. The growing amount of audio data creates a need for content-based searching. Traditionally, tune information has been retrieved based on reference information, for example the title of a tune, the name of an artist, the genre and so on. When users would like to find music pieces in a specific mood, such standard reference information about the tunes is not sufficiently effective. We need new methods and approaches to realize emotion-based search and tune content analysis. This paper proposes a new music-tune analysis approach to realize automatic emotion recognition by means of essential musical features. The innovativeness of this research is that it uses new musical features for tune analysis, which are based on human perception of music. The most important distinction of the proposed approach is that it covers a broader range of tune genres, which is very significant for a music emotion recognition system. Describing emotion on a continuous plane instead of in categories supports more adjectives for emotion description, which is also a great advantage.
Music players and cloud solutions for music recommendation and automatic playlist creation are becoming increasingly popular, as they intend to overcome the difficulty users face in finding fitting music based on context, mood and impression. Much research on the topic has been conducted, recommending different approaches to overcome this problem. This paper suggests a system that uses a multi-dimensional vector space, based on the music's key elements as well as the mood expressed through them and the song lyrics, which allows difference and similarity finding in order to automatically generate a contextually meaningful playlist.
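A minimal sketch of the playlist idea, assuming every song has already been mapped to a vector over musical and mood dimensions, is to rank songs by cosine similarity to a seed vector; the dimensions and values below are invented for illustration.

```python
# Minimal sketch: rank songs in a mood/feature vector space by similarity to a seed song.
import numpy as np

# axes: [tempo, energy, valence (positivity), lyric darkness]
songs = {
    "calm_piano": np.array([0.2, 0.1, 0.6, 0.1]),
    "sad_ballad": np.array([0.3, 0.2, 0.2, 0.8]),
    "upbeat_pop": np.array([0.8, 0.9, 0.9, 0.1]),
    "dark_rock":  np.array([0.7, 0.8, 0.3, 0.9]),
}
seed = songs["calm_piano"]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

playlist = sorted(songs, key=lambda name: cosine(songs[name], seed), reverse=True)
print(playlist)
```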
Multispectral images are becoming widely used for environmental analysis to detect objects or phenomena that human eyes cannot capture. They are one of the main types of images acquired by remote sensing, for example by satellite or aircraft, for earth observation. This paper presents a multispectral analysis of aerial images captured by dual cameras (a visible and an infrared camera) mounted on an unmanned autonomous vehicle (UAV), or drone. In our experiments, four spectral bands (three visible and one infrared band) were imaged, processed and analyzed to detect agricultural areas and measure the health of vegetation. To interpret environmental phenomena and realize an environmental analysis, this study applies semantic analysis by creating a multispectral semantic-image space, combined with three numerical indicators (the normalized difference vegetation index (NDVI), the normalized difference water index (NDWI) and the soil-adjusted vegetation index (SAVI)) that can be used to analyze plant health and photosynthetic activity and to detect environmental objects in order to determine an agricultural area. This paper also proposes the concept of a multi-spectral semantic-image space for agricultural monitoring by defining the correlated meaning of the multi-dimensional parameters related to agricultural analysis to realize and explain agricultural conditions. This paper presents an experimental study on a rice field, a cornfield, a salt farm and a coconut farm in Thailand.
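For reference, the three indicators can be computed per pixel from aligned near-infrared, red and green bands as in the sketch below; L = 0.5 is the commonly used soil-brightness correction for SAVI, and the final threshold is only an illustrative way to flag vegetated pixels, not the paper's semantic-space method.

```python
# Minimal sketch: per-pixel NDVI, NDWI and SAVI from aligned band arrays (placeholder data).
import numpy as np

def indices(nir, red, green, L=0.5, eps=1e-9):
    ndvi = (nir - red) / (nir + red + eps)
    ndwi = (green - nir) / (green + nir + eps)
    savi = (nir - red) / (nir + red + L + eps) * (1.0 + L)
    return ndvi, ndwi, savi

rng = np.random.default_rng(2)
nir, red, green = (rng.random((128, 128)) for _ in range(3))   # placeholder UAV bands
ndvi, ndwi, savi = indices(nir, red, green)

vegetated = ndvi > 0.3   # illustrative threshold only
print(f"vegetated share: {vegetated.mean():.1%}, mean SAVI: {savi.mean():.3f}")
```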
Services and products are provided to the community through crowdsourcing platforms whose crowds have different backgrounds, nationalities, languages, religions, and education levels. These differences affect the results of services and products that are developed using crowdsourcing platforms. This study presents a way of integrating cultural factors into the design features of crowdsourced product design. The research presents the activities of crowdsourced product design and the cultural factors, in order to derive the theoretical foundation for formulating the cultural factors for crowdsourced product design. The study also derives methods for integrating the cultural factors into crowdsourcing product design platforms, and illustrates the design of the user interface of such platforms taking cultural factors into consideration. The research is validated by prototyping and conducting test cases. Finally, the study presents a discussion of the research results and explains the effect of cultural factors on crowdsourced product design, thus confirming the necessity of designing platform activities that integrate the cultural factors in order to satisfy the needs of crowdsourced product design users.