Ebook: Information Modelling and Knowledge Bases XXVI
Within the last three decades, information modelling and knowledge bases have become essential subjects, not only for academic communities related to information systems and computer science, but also for businesses where information technology is applied.
This book presents the proceedings of EJC 2014, the 24th International Conference on Information Modelling and Knowledge Bases, held in Kiel, Germany, in June 2014. The main themes of the conference were: conceptual modelling, including modelling and specification languages, domain specific conceptual modelling, and validating and communicating conceptual models; knowledge and information modelling and discovery, including knowledge representation and knowledge management, advanced data mining and analysis methods, as well as information recognition and information modelling; linguistics modelling; cross-cultural communication and social computing; environmental modelling; and multimedia data modelling and systems, which includes modelling multimedia information and knowledge, content-based multimedia data management, content-based multimedia retrieval as well as privacy and context enhancing technologies.
This book will be of interest to all those who wish to keep abreast of new developments in the field of information modelling and knowledge bases.
In the last three decades, information modelling and knowledge bases have become essential subjects, not only in academic communities related to information systems and computer science, but also in business areas where information technology is applied.
The series of International Conferences on Information Modelling and Knowledge Bases (EJC) originally started in 1982 as a co-operation initiative between Japan and Finland. The practical operations were then organized by Professor Ohsuga in Japan and Professors Hannu Kangassalo and Hannu Jaakkola in Finland (Nordic countries). The geographical scope has since expanded to cover Europe and, more recently, other countries as well. For this reason, “European Japanese” in the title of the conference was replaced by “International” in 2014. The conference retains its workshop character: open discussion, ample time for presentations, and a limited number of participants (50) and papers (30).
The EJC conferences target a variety of themes. The first theme is conceptual modelling, including modelling and specification languages; domain-specific conceptual modelling; concepts, concept theories and ontologies; conceptual modelling of large and heterogeneous systems; conceptual modelling of spatial, temporal and biological data; and methods for developing, validating and communicating conceptual models.
Another important theme is knowledge and information modelling and discovery, covering knowledge discovery; knowledge representation and knowledge management; advanced data mining and analysis methods; conceptions of knowledge and information; modelling of information requirements; intelligent information systems; and information recognition and information modelling.
A further main theme is linguistic modelling, with models of human-computer interaction (HCI), information delivery to users, intelligent informal querying, linguistic foundations of information and knowledge, fuzzy linguistic models, and philosophical and linguistic foundations of conceptual models.
Much attention is also paid to cross-cultural communication and social computing, including cross-cultural support systems; integration, evolution and migration of systems; collaborative societies; multicultural web-based software systems; intercultural collaboration and support systems; and social computing, behavioural modelling and prediction.
One of the newest topics is environmental modelling and engineering, covering environmental information systems and their architectures; spatial, temporal and observational information systems; large-scale environmental systems; collaborative knowledge base systems; agent concepts and conceptualisation; and hazard prediction, prevention and steering systems.
A further current theme is multimedia data modelling and systems, with modelling of multimedia information and knowledge, content-based multimedia data management, content-based multimedia retrieval, privacy and context enhancing technologies, semantics and pragmatics of multimedia data, and metadata for multimedia information systems.
Overall, we received 56 submissions. After careful evaluation, 16 papers were selected as long papers, 17 as short papers, 5 as position papers, and 3 for the presentation of perspective challenges. The long and short papers are included in this book. One additional paper reports the results of a panel session of the conference.
We thank all colleagues for their support of this issue of the EJC conference, especially the program committee, the organizing committee, and the program coordination team. The long and short papers presented at the conference have been revised after the conference and are published in the series “Frontiers in Artificial Intelligence and Applications” by IOS Press (Amsterdam). The “Information Modelling and Knowledge Bases” books are edited by the Editing Committee of the conference.
We believe that the conference will be productive and fruitful in advancing the research and application of information modelling and knowledge bases.
The Editors
Bernhard Thalheim
Hannu Jaakkola
Yasushi Kiyoki
Naofumi Yoshida
Database research and practice have built up a large body of knowledge and experience. This experience and knowledge is based on solutions for database structures that occur and recur in many applications in a similar form. We may distinguish two classes of such solutions: reference models, which can be used as a blueprint for a fully fledged schema, and stereotypes, which are general solutions.
Pattern research considers structures at various levels of detail and is often limited to small schemata. Moreover, the abstraction level varies. We thus need a systematisation. This paper introduces stereotypes as general solutions to problems in a certain context, patterns as classes of refinements of such stereotypes, and templates as technology-dependent solutions to problems in the given context.
We develop a general methodology and a number of techniques for stereotypes, patterns and templates. The paper concentrates on structuring, which is typically the starting point for the development of database systems.
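As a rough illustration of this three-level distinction (the content below is invented and not taken from the paper), the following Python sketch models a generic stereotype, a pattern that refines it, and a technology-dependent SQL template generated from the pattern:

from dataclasses import dataclass

@dataclass
class Stereotype:                  # general solution in a certain context
    name: str
    roles: list

@dataclass
class Pattern:                     # a class of refinements of a stereotype
    stereotype: Stereotype
    attributes: dict               # role name -> list of attributes

@dataclass
class SqlTemplate:                 # technology-dependent solution
    pattern: Pattern

    def render(self) -> str:
        tables = []
        for role, attrs in self.pattern.attributes.items():
            cols = ",\n  ".join(f"{a} VARCHAR(100)" for a in attrs)
            tables.append(f"CREATE TABLE {role} (\n  {cols}\n);")
        return "\n".join(tables)

party = Stereotype("Party", roles=["Person", "Organisation"])
contact = Pattern(party, {"Person": ["name", "email"],
                          "Organisation": ["name", "address"]})
print(SqlTemplate(contact).render())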
Domain-specific web applications often need to integrate information from schematically heterogeneous sources that share some semantic similarities. These applications often include application widgets—where each widget may address a (potentially small) subset of the local schema. We seek to provide flexible integration where each widget may use its own “global” schema and use its own mapping to each local schema. It is possible for each such “global” schema to be mapped multiple times, in different ways, to a given local schema. Traditional information integration is too rigid to meet these requirements. Here, we define a new integration model that introduces a metamodel of small domain-specific schema fragments—called domain structures—that can be mapped to local schemas. We show how generic, polymorphic widgets can be created by writing queries against domain structures using an extended relational algebra that includes a local type operator to propagate local type names to the domain structures. By bringing the local semantics to the global level we create an integration system with local dominance where important, distinct local schema semantics are available globally.
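To make the idea of local dominance more concrete, here is a minimal, hypothetical Python sketch using pandas (it is not the paper's integration model): two schematically different local relations are mapped onto one small domain structure, and a local-type column carries each tuple's local relation name to the global level.

import pandas as pd

# Local source 1: a "customers" relation (names and schema are invented)
customers = pd.DataFrame(
    {"cust_name": ["Ann", "Bo"], "cust_city": ["Kiel", "Pori"]})

# Local source 2: an "employees" relation with a different schema
employees = pd.DataFrame(
    {"emp": ["Cho"], "location": ["Tokyo"], "dept": ["R&D"]})

def to_domain_structure(df, mapping, local_type):
    """Project a local relation onto the domain structure Person(name, city)
    and tag each tuple with its local type name (the 'local type operator')."""
    out = df.rename(columns=mapping)[list(mapping.values())].copy()
    out["local_type"] = local_type
    return out

person = pd.concat([
    to_domain_structure(customers,
                        {"cust_name": "name", "cust_city": "city"},
                        "customers"),
    to_domain_structure(employees,
                        {"emp": "name", "location": "city"},
                        "employees"),
], ignore_index=True)

# A widget query written against the domain structure; the local type stays
# available for widgets that need source-specific behaviour.
print(person[person["city"] == "Kiel"])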
Background: A general, restriction-free theory of the performance of arbitrary artificial learners has not been developed yet. Empirically, not much research has been performed on the question of an appropriate description of an artificial learner's performance.
Objective: The objective of this paper is to find out which mathematical description best fits the learning curves produced by a neural network classification algorithm.
Methods: A Weka-based multilayer perceptron (MLP) neural network classification algorithm was applied to a set of datasets (n=109) from publicly available repositories (UCI) in stepwise k-fold cross-validation, and the error rate was measured in each step. First, four different functions, i.e. power, linear, logarithmic and exponential, were fitted to the measured error rates. Where the fit was statistically significant (n=69), we measured the average mean squared error for each function and its rank. The dependent-samples t-test was performed to test whether the mean squared errors differ significantly from each other, and Wilcoxon's signed rank test was used to test whether the differences between ranks are significant.
Results: The error rates induced by the neural network were best modeled by an exponential function. Out of the 69 datasets, the exponential function was the best descriptor of the error-rate curve in 60 cases, the power function in 8, the logarithmic function in 1, and the linear function in none. The average mean squared error across all datasets was 0.000365 for the exponential function, and it differed significantly from the power function at P=0.002, from the linear function at P=0.000, and from the logarithmic function at P=0.001. The exponential function's rank is, using Wilcoxon's test, significantly different at any reasonable threshold (P=0.000) from the rank of any other model.
Conclusion: In the area of human cognitive performance, the exponential function was found to be the best fit for describing an individual learner. In the area of artificial learners, specifically the multilayer perceptron, our findings are consistent with this. Our work can be used to forecast and model the future performance of an MLP neural network when not all data have been used, or when there is a need to obtain more data for better accuracy.
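As a hedged illustration of the fitting procedure described in the Methods section (this is not the authors' Weka pipeline; the learning-curve data below are synthetic), the following Python sketch fits the four candidate functions to an error-rate curve and compares their mean squared errors:

import numpy as np
from scipy.optimize import curve_fit

n = np.arange(1, 21, dtype=float)                 # training steps
err = 0.30 * np.exp(-0.25 * n) + 0.05             # synthetic error-rate curve
err += np.random.default_rng(0).normal(0, 0.005, n.size)   # small noise

models = {
    "power":       (lambda x, a, b: a * x ** b,               (0.3, -0.5)),
    "linear":      (lambda x, a, b: a * x + b,                (-0.01, 0.3)),
    "logarithmic": (lambda x, a, b: a * np.log(x) + b,        (-0.05, 0.3)),
    "exponential": (lambda x, a, b, c: a * np.exp(b * x) + c, (0.3, -0.2, 0.05)),
}

for name, (f, p0) in models.items():
    params, _ = curve_fit(f, n, err, p0=p0, maxfev=10000)
    mse = np.mean((f(n, *params) - err) ** 2)     # mean squared error of the fit
    print(f"{name:12s} MSE = {mse:.6f}")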
The MilUNI knowledge portal, based on a knowledge base developed in the ATOM software, has been created at the authors' workplace with the aim of forming a collaborative society of military universities. An analysis of the collaborative society concept is presented, and a description of the MilUNI project is included. Some areas for university cooperation are proposed, as well as measures facilitating the formation and development of the collaborative society.
Cloud computing is changing the ways computers are used in both enterprise and personal settings. Application development techniques and methodologies suitable for cloud environments are new and challenging research topics. As the demand for business applications is increasing rapidly and commercial profitability depends on decreasing application development costs, it is essential to provide methods that allow applications to be developed at low cost and in a short development time. New methods are therefore required that facilitate cloud application development based on easy-to-accomplish, end-user composition. In this work, we compiled seven requirements of typical business application development: data structures, database schemas, page transition control, authorization, session management, programming, and input and output interface design. Furthermore, we observe that none of the current cloud development environments supports the majority of these features. We therefore present our own cloud application development platform, called FOCAPLAS, which meets all of these requirements. A case study of a cloud application development process is presented to demonstrate how FOCAPLAS is used. We believe that our requirements may serve as a valuable guide for cloud data modeling and that FOCAPLAS will be a useful platform for cloud application development.
Background: Measuring the performance of a classifier is crucial when trying to find the best machine-learning algorithm with optimal parameters. Multiple methods are available in this regard, the most common being k-fold cross-validation. Other similar methods are the bootstrap method and k-fold repeated cross-validation.
Objective: This paper compares three such methods, namely k-fold cross-validation, k-fold repeated cross-validation, and the bootstrap method. In our experimental set-up, the latter two require a 20-fold increase in computational effort. The objective of this paper was to experimentally find the best cross-validation method with regard to both accuracy and computational effort.
Methods: Four classification algorithms were selected and applied to multiple datasets within the field of Life Sciences (n=35) using all three selected cross-validation methods. We used the pairwise dependent Student's t-test with the standard 95% confidence interval for statistical comparisons.
Results: The results of the statistical comparisons between the cross-validation methods were as follows. Despite requiring 20 times less computational effort, the k-fold cross-validation method was statistically equivalent to k-fold repeated cross-validation. The third method, the bootstrap method, was found to be too pessimistic and therefore inferior to the other two methods.
Conclusion: The k-fold cross-validation method proved to be the best choice among the selected cross-validation methods with regard to both accuracy and computational effort.
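The following Python sketch, using scikit-learn on a placeholder dataset and classifier, illustrates the three estimation strategies compared in the paper together with a paired t-test; it is a minimal, assumed set-up rather than the authors' experimental protocol.

import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import KFold, RepeatedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

X, y = load_breast_cancer(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Plain 10-fold cross-validation
kfold_scores = cross_val_score(clf, X, y, cv=KFold(10, shuffle=True, random_state=0))

# Repeated 10-fold cross-validation (20 repetitions -> roughly 20x the effort)
rep_scores = cross_val_score(
    clf, X, y, cv=RepeatedKFold(n_splits=10, n_repeats=20, random_state=0))

# Simple bootstrap: train on a resample, evaluate on the out-of-bag rows
boot_scores = []
for i in range(20):
    idx = resample(np.arange(len(y)), random_state=i)    # sample with replacement
    oob = np.setdiff1d(np.arange(len(y)), idx)           # out-of-bag indices
    clf.fit(X[idx], y[idx])
    boot_scores.append(clf.score(X[oob], y[oob]))

print("k-fold    mean accuracy:", kfold_scores.mean())
print("repeated  mean accuracy:", rep_scores.mean())
print("bootstrap mean accuracy:", np.mean(boot_scores))

# Paired t-test on two equal-length score samples (the pairing here is
# arbitrary and for illustration only)
print(ttest_rel(kfold_scores, rep_scores[:10]))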
We design and implement a music-tune analysis system to realize automatic emotion identification and prediction based on acoustic signal data. To compute the physical elements of music pieces, we define three significant tune parameters: repeated parts (repetitions) inside a tune, the thumbnail of a music piece, and the homogeneity pattern of a tune. They are significant because they are related to how people perceive music pieces. By means of these three parameters we can express the essential features of the emotional aspects of each piece. Our system consists of a music-tune feature database and a computational mechanism for comparison between different tunes. Based on Hevner's groups of emotion adjectives, we created a new way of presenting emotions on an emotion plane with two axes: activity and happiness. This makes it possible to determine the emotions perceived when listening to a tune and to calculate adjacent emotions on the plane. Finally, we performed a set of experiments on Western classical and popular music pieces, which showed that our proposed approach reaches a 72% precision ratio and that the system's efficiency trends upwards as the database size increases.
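As a rough illustration of the emotion-plane idea (the coordinates below are invented, not the paper's calibration), a tune's estimated position on the activity/happiness plane can be mapped to its adjacent emotion adjectives by a simple nearest-neighbour computation:

import math

# Hypothetical (activity, happiness) coordinates for a few Hevner-style
# adjective groups; real placements would come from the analysis system.
emotions = {
    "serene":   (-0.6,  0.4),
    "happy":    ( 0.5,  0.8),
    "exciting": ( 0.9,  0.3),
    "sad":      (-0.5, -0.7),
    "dramatic": ( 0.6, -0.4),
}

def adjacent_emotions(activity, happiness, k=2):
    """Return the k emotion labels closest to a tune's estimated point."""
    dist = lambda p: math.hypot(p[0] - activity, p[1] - happiness)
    return sorted(emotions, key=lambda name: dist(emotions[name]))[:k]

# A tune estimated as fairly active and moderately happy
print(adjacent_emotions(0.7, 0.5))   # -> ['exciting', 'happy']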
After the Great East Japan Earthquake of 11 March 2011, a large number of messages related to the earthquake were posted to social media web sites, not only in Japan but all over the world. In Asian countries in particular, many topics concerning the earthquake were observed. Exploring topics related to the earthquake on social media provides rich insights into the social context. The goal of this research is to analyze Asian people's reactions to the Great East Japan Earthquake on social media. As a first target, this paper selected the Thai language and analyzed how people reacted to the earthquake by comparing their reactions with those in Japan. The analysis also reveals some characteristics of Thai society and culture.
We have put the flipped classroom into practice as a way of utilizing e-learning content in our computer-programming course since 2013. The main feature of this practice is to use most of the time usually dedicated to lecturing for practical exercises, by assigning the students the e-learning materials as preparation before class. We gave the students homework to learn the vocabulary and grammar of the C programming language. This decreased the time the teacher spent lecturing, and the students were assigned applied problems requiring them to build practical software, in addition to the conventional basic training problems. Our goal is to maintain the students' motivation to learn computer programming through the sense of accomplishment that each student obtains by finishing the practical assignments. We confirmed the effectiveness of this approach by comparing examination scores between last year and this year, and by giving questionnaires to the students. Additionally, we analyzed the learning situation of the students who were weak in programming. The results are presented in this paper.
There are many studies of time, its interpretations, and its different features and structures, from several scientific points of view. In our paper, we propose a multidimensional framework of time. The main idea of the paper is to present a synthesis of the different dimensions of time. We discuss some parts of the framework to illustrate and highlight the multidimensional features of time. We also demonstrate an early-stage implementation of the framework as a “Time on Wall” course in the eEducation/Optima environment. By means of “Time on Wall”, we are able to teach different dimensions of time across disciplines and faculties and to illustrate different time scales.
Developments in biology, economics, information technology, social studies, etc. have introduced and acknowledged the understanding that the whole world consists of Information Processing Systems (IPS). We understand rather well the computations in man-made devices, especially computers, and how these computations change the (logical) state of the world (pre- and post-conditions). But we do not have a general model of the computations which occur in natural IPS (living organisms and their subsystems, businesses, languages, social systems, etc.), nor of how these computations change the environment in which they proceed.
Here, a unified view of the computations occurring in different Information Processing Systems is proposed and the notions of Data and Information are clarified; the view is based on entropy. While IPS (e.g. living systems) are capable of reducing their inner entropy during their limited existence (none of them is immortal), the final result of their existence is an increase of entropy; life is (possibly) the greatest entropy-increasing factor on Earth.
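The abstract's entropy-based view is far broader than any single formula, but as a point of reference the following Python sketch computes the standard Shannon entropy of a symbol sequence, one common formalisation of the notion of entropy:

from collections import Counter
from math import log2

def shannon_entropy(message: str) -> float:
    """H = -sum(p_i * log2(p_i)) over the symbol distribution of a message."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * log2(c / total) for c in counts.values())

print(shannon_entropy("aaaa"))        # -0.0, i.e. 0 bits: a fully ordered message
print(shannon_entropy("abab"))        # 1.0 bit
print(shannon_entropy("abcdefgh"))    # 3.0 bits: maximally mixed over 8 symbols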
This paper presents an explorative cultural-image analyzer and its application to comparative analyses of cultural arts and crafts. The goal of this system is to provide a new image-exploration environment that reflects the diversity of the human sense of color and the breadth of cultural knowledge by detecting and visualizing characteristic historical color trends within cultural-image data sets. The primary components of this system are two explorative analysis methods with feature estimation and evaluation of culture-dependent colors: (a) image-group exploration and (b) color exploration. The image-group exploration method visualizes the distinct differences among image groups aggregated by attributes such as author, era, and region, and provides notable images for users. In addition, the color-exploration method visualizes the subtle differences of colors in images and provides a key for analyzing cultural art works, with a zooming function for color distributions and cultural color-name estimation. By utilizing the existing annotations and attributes that are available for most images, the system analyzes the differences in colors among image groups defined by statistical analysis and visualizes representations of each image group on an overview map. This system enables a user to analyze the characteristics of a collection of cultural art works by browsing the representative images of each image group, exploring specified culture-dependent colors with high accuracy, and observing at a glance the subtle differences in colors among image groups according to culture-dependent color names.
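As a minimal, assumed sketch of how the colour distributions of image groups might be compared (this is not the paper's analyzer; the "images" here are synthetic arrays), the following Python code quantises RGB values into coarse bins and reports the bin most characteristic of one group:

import numpy as np

def colour_histogram(images, bins=4):
    """Quantise the RGB pixels of all images in a group into bins^3 buckets."""
    hist = np.zeros((bins, bins, bins))
    for img in images:                        # img: H x W x 3 uint8 array
        q = (img // (256 // bins)).reshape(-1, 3)
        for r, g, b in q:
            hist[r, g, b] += 1
    return hist / hist.sum()

rng = np.random.default_rng(0)
group_a = [rng.integers(0, 128, (8, 8, 3), dtype=np.uint8) for _ in range(3)]    # darker group
group_b = [rng.integers(128, 256, (8, 8, 3), dtype=np.uint8) for _ in range(3)]  # brighter group

ha, hb = colour_histogram(group_a), colour_histogram(group_b)
diff = ha - hb                                # bins over-represented in group A
print("most characteristic colour bin of group A:",
      np.unravel_index(diff.argmax(), diff.shape))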
The paper develops an approach to information system adaptation that supports the diversity and heterogeneity of users. The user perspective is handled from a cultural point of view. Cultures are layered, and this structure can be used to start with national cultures and derive stereotypes. Culture is recognised as a multidimensional structure in which national culture provides a basis for the behavioural variations of individuals at the personal level. For information system development, this layering covers three different perspectives: cultural stereotypes and user models, organisational models, and technology models. We may thus develop a coherent information system perspective that supports adaptivity in multicultural information system applications.
When we introduce an icon-based language into the context of requirements engineering, we must take into account that what users perceive as recognizable and usable depends on their background. In this paper, we argue that it is not possible to provide a single set of visual notations that appeals to all stakeholders. Instead, we suggest an adaptable preference framework which generates personalized notations that correspond to personal background. We present and evaluate an icon-based language, a new kind of approach to requirements engineering work, to explore its feasibility and usability. In an initial evaluation with students residing in Finland, the results reveal that users are able to recognize a group of icons fairly well. Our findings show that an icon-based language could be a positive means of improving awareness of requirements engineering, as it takes advantage of icons, which are intuitively understandable, to represent traditional textual requirements.
In recent decades we have become used to software applications and services being everywhere and working for us. However, they sometimes fail to work as desired. The situation we experience nowadays is often characterized as the second software crisis, and it has many root causes. One main root problem is an old one: the insufficient specification of requirements and of the processes to be executed in software development. Even though computer science offers many specification methods, standards, generic software processes, best practices, and languages, the problem is still with us. The research presented in this work proposes the utilization of formal methods and knowledge bases as a key solution for software process capture and modeling; the goal of the research is to describe particular sub-processes of the software development process and their optimization with formalisms such as the Web Ontology Language (OWL). We propose and develop a comprehensive software process modeling methodology that combines semi-formal and formal approaches with forward and reverse process engineering (process mining) and that can be used as easily as well-known semi-formal approaches.
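To illustrate the kind of OWL-based description the abstract refers to, here is a minimal, hypothetical sketch in Python using rdflib (the namespace, classes and property are placeholders, not the paper's ontology):

from rdflib import Graph, Namespace, Literal
from rdflib.namespace import OWL, RDF, RDFS

SP = Namespace("http://example.org/software-process#")   # hypothetical namespace
g = Graph()
g.bind("sp", SP)

# Classes for software process steps
for cls in (SP.Process, SP.RequirementsAnalysis, SP.Implementation):
    g.add((cls, RDF.type, OWL.Class))
g.add((SP.RequirementsAnalysis, RDFS.subClassOf, SP.Process))
g.add((SP.Implementation, RDFS.subClassOf, SP.Process))

# An object property expressing process ordering
g.add((SP.precedes, RDF.type, OWL.ObjectProperty))
g.add((SP.precedes, RDFS.domain, SP.Process))
g.add((SP.precedes, RDFS.range, SP.Process))

# A concrete (e.g. mined) process instance
g.add((SP.reqPhase1, RDF.type, SP.RequirementsAnalysis))
g.add((SP.implPhase1, RDF.type, SP.Implementation))
g.add((SP.reqPhase1, SP.precedes, SP.implPhase1))
g.add((SP.reqPhase1, RDFS.label, Literal("Requirements analysis, iteration 1")))

print(g.serialize(format="turtle"))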
Data quality has become an important issue for scientific databases, yet it is currently managed insufficiently. General solutions for managing data quality at the object level are still underrepresented. Such applications need, however, a management system that supports quality treatment over the entire lifespan of the data. We develop an approach to quality management that can be integrated into a metadata management system. This approach is based on a separation of concerns through database components, extended views and Information Containers.
A theoretical approach to the innovative evaluation of model quality is proposed. The approach is based on two different mappings that design the same conceptual database model. The former is a vertical mapping composed of bottom-up steps: it starts from the specification of a database application supported by a formal model and arrives at a resulting model based on semantic data models. The latter is a horizontal mapping composed of successive model extensions: it starts from a graph of conceptual classes and arrives at the resulting model of the previous mapping. Formulas for the quantitative/numerical evaluation of the models are introduced during the vertical mapping, whereas “formulas” for the qualitative/conceptual evaluation of the models are introduced during the horizontal mapping. The quantitative evaluations express the cost of what has been specified/proved in terms of variable and constant cardinality. The qualitative/conceptual evaluations express the saving obtained from what has been implicitly specified/proved. The quality measure of the resulting model is given in terms of hidden classes, which provide an indication of different aspects of model quality.
An algorithm is introduced in concept theory to design concept structures related to the object classes/categories supported by computer systems. Although concept theory has a formal background, such algorithms are not yet available. The approach is supported by a methodology which starts from algorithms for object decomposition proposed in computer science and arrives at an algorithm for the construction of concepts related to classes/categories of objects.
In this work, an ontology visualization technology is offered, focused first of all on simplifying the extraction of knowledge from ontologies by an expert. For this purpose, it is proposed to form special structures for the concepts of an ontology: cognitive frames. Each cognitive frame includes a fragment of the ontology, built in a special way, and the visual image corresponding to it. It is expected that showing cognitive frames for a concept during visualization, instead of simply showing all terms linked with it, will present the concept's meaning more effectively. In this paper, we consider only the forming of the content of cognitive frames, based on common relationships from upper-level ontologies such as “taxonomy”, “partonomy” and “dependence”. We also provide an experiment evaluating the cognitive qualities of frames created for the concepts of an application ontology.
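A minimal, invented Python sketch of the underlying idea (not the paper's algorithm) groups a concept's directly related concepts by relation kind to form the content of a cognitive frame:

# (subject, relation, object) triples of a toy ontology
ontology = [
    ("Car", "is-a", "Vehicle"),
    ("Car", "has-part", "Engine"),
    ("Car", "has-part", "Wheel"),
    ("Car", "depends-on", "Fuel"),
    ("Truck", "is-a", "Vehicle"),
]

RELATION_KIND = {"is-a": "taxonomy", "has-part": "partonomy",
                 "depends-on": "dependence"}

def cognitive_frame(concept):
    """Group the concept's directly related concepts by relation kind."""
    frame = {"taxonomy": [], "partonomy": [], "dependence": []}
    for s, rel, o in ontology:
        if s == concept:
            frame[RELATION_KIND[rel]].append(o)
        elif o == concept and RELATION_KIND[rel] == "taxonomy":
            frame["taxonomy"].append(s)       # include sub-concepts as well
    return frame

print(cognitive_frame("Car"))
# {'taxonomy': ['Vehicle'], 'partonomy': ['Engine', 'Wheel'], 'dependence': ['Fuel']}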
In this paper we introduce the Learning Management System (LMS) eLogika, which has been developed in our department for teaching mathematical logic. There were many reasons that led us to the decision to develop such a system, including, inter alia, the great number of students enrolled in the courses on logic. Yet the most important reason was the specific character of logic education. As a result, the eLogika system is a web application that provides didactic material for courses on mathematical logic. Its main goal is automatic test generation and computer-aided test evaluation based on a large database of logic tasks. The system makes it possible to adjust the level of particular tests according to the students' knowledge level. To this end, we developed a feedback module that makes use of statistics and data mining methods. The system can generate a large number of training as well as exam test variants for each common thematic topic. At the same time, it provides effective semi-automatic methods of test rating and evaluation. In the paper we describe the particular modules of eLogika, with a focus on the data mining and statistics modules.
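The following Python sketch is a hypothetical, much-simplified illustration of difficulty-adjusted test generation (it is not the eLogika implementation; the task database and levels are invented):

import random

tasks = [
    {"id": 1, "topic": "propositional logic", "difficulty": 1},
    {"id": 2, "topic": "propositional logic", "difficulty": 2},
    {"id": 3, "topic": "propositional logic", "difficulty": 3},
    {"id": 4, "topic": "predicate logic",     "difficulty": 2},
    {"id": 5, "topic": "predicate logic",     "difficulty": 3},
]

def generate_test(topic, student_level, n_tasks=2, seed=None):
    """Pick n_tasks tasks of the given topic whose difficulty is within
    one level of the student's estimated knowledge level."""
    pool = [t for t in tasks
            if t["topic"] == topic and abs(t["difficulty"] - student_level) <= 1]
    rng = random.Random(seed)
    return rng.sample(pool, min(n_tasks, len(pool)))

print(generate_test("propositional logic", student_level=2, seed=0))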
In this paper, we design a new model for Big Data analytics: a data-driven axes creation model. In a Big Data environment, one of the important technologies is correlation measurement. A fixed measurement protocol cannot be defined in the Big Data era because of the great variety of data. Moreover, most current data analytics and data mining methods cannot be applied to a Big Data environment, because such an environment is based on an open assumption, and new methods have to be considered for this open assumption. That is, we have to design a new data-driven axes creation model for the correlation measurement method. Our proposed model creates axes for correlation measurement in Big Data analytics. Specifically, the model performs inference in a Bayesian network and measures correlation along the created coordinate axes; it thus maps the Bayesian network onto axes in which correlations can be measured mutually. This model contributes to a paradigm shift in Big Data analytics.
Multimedia retrieval tasks are faced with increasingly large datasets and with user preferences that vary with every query. We observe that the high-dimensional representation of physical data, which previously challenged search algorithms, now brings opportunities to cope with dynamic contexts. In this paper, we introduce a method for building a large-scale video frame retrieval environment with a fast search algorithm that handles the user's dynamic contexts of querying by imagination and of controlling the response time. The search algorithm quickly finds an initial candidate with the highest match possibility, and then iteratively traverses the feature indexes to find other neighbouring candidates until the given time bound has elapsed. Experimental studies based on the video frame retrieval system show the feasibility and effectiveness of our proposed search algorithm, which can return results in a fraction of a second with a high success rate and a small deviation from the expected results. Moreover, its potential to scale to large datasets while preserving its search performance is clear.
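The following Python sketch illustrates the general "anytime" search idea described above, refining the best candidate found until a caller-supplied time bound elapses; the data and the traversal order are simplified placeholders rather than the paper's feature indexes:

import time
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((10_000, 64))             # feature vectors of video frames

def time_bounded_search(query, time_bound_s=0.05, batch=500):
    """Return the best-matching frame index found before the deadline."""
    deadline = time.monotonic() + time_bound_s
    best_idx, best_dist = None, np.inf
    order = rng.permutation(len(frames))      # stand-in for index traversal
    for start in range(0, len(order), batch):
        if time.monotonic() >= deadline:
            break                             # respect the response-time bound
        idx = order[start:start + batch]
        dist = np.linalg.norm(frames[idx] - query, axis=1)
        i = dist.argmin()
        if dist[i] < best_dist:
            best_idx, best_dist = idx[i], dist[i]
    return best_idx, best_dist

query = rng.random(64)
print(time_bounded_search(query, time_bound_s=0.02))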
Finding good users to follow in social network services such as Twitter is one way to efficiently communicate and share information about a topic of interest with other users. However, it is not easy to find such users because of the massive number of users. In this paper, we classify Twitter users according to tweet frequency, tweet content, communication style, and follow relations, and then present a method for finding good users to follow based on this user classification. The method is incorporated into our previous work on Twitter user search, and we show that the search performance is improved by the method.
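As a loose, hypothetical illustration of classification-based ranking (not the paper's method; all users, features and weights are invented), candidate users could be scored by combining a few such signals:

candidates = [
    {"name": "@news_bot", "topic_ratio": 0.9, "tweets_per_day": 200, "reply_ratio": 0.01},
    {"name": "@expert_a", "topic_ratio": 0.7, "tweets_per_day": 5,   "reply_ratio": 0.40},
    {"name": "@casual_b", "topic_ratio": 0.2, "tweets_per_day": 30,  "reply_ratio": 0.60},
]

def follow_score(u):
    """Favour on-topic, conversational users; penalise extreme tweet volume."""
    volume_penalty = min(u["tweets_per_day"] / 100.0, 1.0)
    return 0.6 * u["topic_ratio"] + 0.3 * u["reply_ratio"] - 0.1 * volume_penalty

for u in sorted(candidates, key=follow_score, reverse=True):
    print(f'{u["name"]:12s} score = {follow_score(u):.2f}')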
In this paper, an axiomatic approach to relational concept theory, denoted RKC, is presented. The axiomatic approach is based on the intensional containment relation between the under-relations of a given relational concept. It is also proposed that an algebraic model for RKC is a complete semilattice in which every relational concept, through the principal ideal it generates, defines a Boolean algebra.