Ebook: Information Modelling and Knowledge Bases XXIII
Information modelling and knowledge bases have become hot topics, not only in academic communities concerned with information systems and computer science, but also wherever information technology is applied in the world of business. This book presents the proceedings of the 21st European-Japanese Conference on Information Modelling and Knowledge Bases (EJC 2011), held in Tallinn, Estonia, in June 2011. The EJC conferences provide a worldwide forum for researchers and practitioners in the field to exchange results and experiences achieved in computer science and related disciplines such as conceptual analysis, design and specification of information systems, multimedia information modelling, multimedia systems, software engineering, knowledge and process management, cross-cultural communication and context modelling. Attention is also paid to theoretical disciplines including cognitive science, artificial intelligence, logic, linguistics and analytical philosophy. The selected papers (16 full papers, 9 short papers, 2 papers based on panel sessions and 2 on invited presentations) cover a wide range of topics, including database semantics, knowledge representation, software engineering, WWW information management, context-based information retrieval, ontology, image databases, temporal and spatial databases, document data management, process management, cultural modelling and many others. Covering many aspects of system modelling and optimization, this book will be of interest to all those working in the field of information modelling and knowledge bases.
Information modelling and knowledge bases have become hot topics, not only in academic communities concerned with information systems and computer science, but also in business, wherever information technology is applied.
The 21st European-Japanese Conference on Information Modelling and Knowledge Bases (EJC 2011) continues a series of events that started as a cooperation initiative between Japan and Finland in the second half of the 1980s. In 1991 the geographical scope of these conferences expanded to cover the whole of Europe and other countries as well.
The EJC conferences constitute a world-wide research forum for the exchange of scientific results and experiences achieved in computer science and related disciplines using innovative methods and approaches. A platform has been established that draws together researchers and practitioners dealing with information modelling and knowledge bases. The main topics of the EJC conferences span a variety of themes: information modelling, conceptual analysis, design and specification of information systems, multimedia information modelling, multimedia systems, ontology, software engineering, knowledge and process management, knowledge bases, cross-cultural communication and context modelling. We also aim at applying new progressive theories. To this end, much attention is also paid to theoretical disciplines including cognitive science, artificial intelligence, logic, linguistics and analytical philosophy.
In order to achieve the EJC targets, an international program committee selected 16 full papers and 9 short papers from 36 submissions in a rigorous reviewing process; a further 2 papers are based on panel sessions at the conference and 2 on invited presentations. The selected papers cover many areas of information modelling, including theory of concepts, database semantics, knowledge representation, software engineering, WWW information management, context-based information retrieval, ontology, image databases, temporal and spatial databases, document data management, process management, cultural modelling and many others.
The conference could not have been a success without the considerable effort of many people and organizations.
In the Program Committee, 29 reputable researchers devoted a great deal of effort to the review process, selecting the best papers and creating the EJC 2011 program. We are very grateful to them. Professor Yasushi Kiyoki and Professor Takehiro Tokuda acted as co-chairs of the program committee. Dr. Naofumi Yoshida of the Program Coordination Team managed the review process and the conference program. Dr. Jaak Henno and his colleagues managed the conference venue and the local arrangements. Professor Hannu Jaakkola took care of the general organizational matters necessary for running the conference series annually and, moreover, of arranging the conference proceedings in the form of a book printed by IOS Press, Amsterdam. We gratefully acknowledge the efforts of all who supported the conference.
We believe that the conference will prove productive and fruitful in advancing research on and applications of information modelling and knowledge bases.
The Editors
Jaak Henno
Yasushi Kiyoki
Takehiro Tokuda
Hannu Jaakkola
Naofumi Yoshida
Designing a tool for data extraction from semi-structured and unstructured text, we are confronted with a problem that has largely been neglected by scholars so far: What if we need to find matches for several different patterns in a document and there are no keywords to support the search? What if the same section matches several different patterns, or if matches partly overlap? How can we decide which one to pick? We suggest that this is an important problem in data extraction and propose a solution based on a token classification system and weighted finite-state automata.
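As a rough illustration of the overlap problem described above, the following sketch (not the authors' implementation) keeps the non-overlapping subset of candidate matches with the greatest total weight; the pattern names and weights are invented placeholders.

```python
# Hypothetical sketch: picking a non-overlapping set of weighted pattern
# matches with maximum total weight (weighted interval scheduling).
from bisect import bisect_right

def best_matches(candidates):
    """candidates: list of (start, end, weight, pattern_id) tuples, end exclusive.
    Returns the maximum-weight subset in which no two matches overlap."""
    cands = sorted(candidates, key=lambda c: c[1])           # sort by end position
    ends = [c[1] for c in cands]
    best = [(0.0, [])]                                       # best[i]: using first i candidates
    for i, (start, end, weight, pid) in enumerate(cands):
        j = bisect_right(ends, start, 0, i)                  # last candidate compatible with this one
        take_w, take = best[j][0] + weight, best[j][1] + [(start, end, pid)]
        skip_w, skip = best[i]
        best.append((take_w, take) if take_w >= skip_w else (skip_w, skip))
    return best[-1]

# Example: two patterns match overlapping spans; the heavier match wins.
print(best_matches([(0, 5, 1.0, "date"), (3, 8, 2.5, "address"), (8, 12, 1.2, "name")]))
```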
This paper proposes a cross-cultural computing system for multilingual analysis. The system focuses on comparing cultural aspects on the basis of basic linguistic elements. Its most important task is to realize cross-cultural computation, in the framework of correlation computation, using vectorized numeric data that express cultural aspects of concepts and objects with regard to speech sounds.
The key technology of the system is cross-cultural semantic distance computation in phonological-semantic metadata spaces that involve the phonological aspects of sound, syllabic and lexical composition features. The phonological-semantic metadata of multiple languages are extracted based on two main aspects of language: form and meaning. Form refers to speech sound, and meaning refers to the semantics of language.
We compare language units (or terms) with the same meaning from different cultures, focusing on the speech sound characteristics of the terms. The speech sound metadata are extracted from a term and separated based on the phonological aspects of sound, syllabic and lexical composition features. These metadata are converted into vectorized numeric data to create phonological-semantic vector spaces. Using these spaces, we conduct similarity and weighting computations to perform a comparative analysis of language-related metadata.
Our research goal is to perform a language similarity analysis through a term-based distance calculation in phone (sound) and meaning spaces, and to reconstruct an inheritance relationship among languages via agglomerative hierarchical clustering based on an inter-term distance calculation.
Our system clusters the phonological-semantic vector space and produces a 2D visualization of cultural differentiation to further analyze the interconnectedness across languages. In this paper, we apply our proposed cross-cultural computing system experimentally to linguistic data from 32 different Asian-Oceanic languages.
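A minimal sketch of the distance and clustering step, assuming NumPy and SciPy; the feature vectors and language names below are invented placeholders standing in for the phonological-semantic metadata described above, not data from the 32-language experiment.

```python
# Illustrative sketch only: made-up vectors stand in for phonological-semantic metadata.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# One vector per language for a term with the same meaning; dimensions would
# encode sound, syllabic and lexical composition features.
languages = ["lang_A", "lang_B", "lang_C", "lang_D"]
vectors = np.array([
    [0.9, 0.1, 0.3, 0.0],
    [0.8, 0.2, 0.4, 0.1],
    [0.1, 0.9, 0.2, 0.7],
    [0.2, 0.8, 0.1, 0.6],
])

# Inter-term distances and agglomerative hierarchical clustering.
distances = pdist(vectors, metric="cosine")
tree = linkage(distances, method="average")
clusters = fcluster(tree, t=2, criterion="maxclust")
print(dict(zip(languages, clusters)))   # e.g. {'lang_A': 1, 'lang_B': 1, 'lang_C': 2, 'lang_D': 2}
```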
UML is a general-purpose visual modeling language that can be adapted by creating profiles based on its built-in extension mechanism. There exist UML profiles for modeling the logical design of SQL databases. The profiles allow developers to create model elements that correspond to SQL language elements. For many integrity constraints, however, developers have to manually specify the implementation of the constraint in a model. This complicates the creation of the models and the use of the models to generate final code that needs no further elaboration. In this paper, we propose to extend UML SQL database design profiles with stereotypes that correspond to domain-independent database design patterns for integrity constraints, in order to raise the abstraction level of models and facilitate the generation of code that implements integrity constraints. We also present the results of an experiment on extending a UML SQL database design profile.
This paper points out that achievements in the field of multimedia analysis and retrieval represent an important opportunity for improving recommender system mechanisms. Online shopping systems use various recommender systems; however, a study of different approaches has shown that they do not exploit the potential of the information carried by multimedia product data for product recommendations. We demonstrate how this can be accomplished by a personalized recommender system framework that is based on a method of analyzing the colour features of entity images. Colour features are based on image colour histograms, psychological properties of colours and a learning mechanism. We have developed a service-oriented framework for a personalized recommender system that incorporates this method into a highly interactive business process model. The framework is designed in a generic way and can be applied to an arbitrary domain. It is based on a service-oriented architecture in order to promote flexibility and reuse, which is important when applying it to existing recommender system environments. An experimental study was performed for the travel agency domain. The framework provides several important advantages, such as automatic creation of entity image metadata based on colour-based image analysis and extraction of their semantic properties, user-interaction-based learning, dynamic selection and presentation ordering of entity images, and feedback for the creation of base image entity sets.
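The colour-feature idea can be sketched roughly as follows, assuming Pillow and NumPy; the file names, histogram resolution and intersection measure are illustrative assumptions rather than the framework's actual method.

```python
# A minimal sketch: rank catalogue images by the similarity of their colour
# histograms to images the user has responded to positively.
import numpy as np
from PIL import Image

def colour_histogram(path, bins=8):
    """Normalised 3-D RGB histogram flattened into a feature vector."""
    rgb = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    hist, _ = np.histogramdd(rgb, bins=(bins, bins, bins), range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def rank_by_preference(liked_paths, candidate_paths):
    profile = np.mean([colour_histogram(p) for p in liked_paths], axis=0)
    scores = {c: float(np.minimum(profile, colour_histogram(c)).sum())  # histogram intersection
              for c in candidate_paths}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical usage with invented file names:
# ranked = rank_by_preference(["liked1.jpg"], ["offer1.jpg", "offer2.jpg"])
```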
Pictorial symbols capture our imagination. A visual vocabulary that anyone from any culture, any country, and in any context of life can understand is a very interesting research challenge. Collections of visual symbols are usually context-specific and include, for example, symbols and signs at airports, in hotels, in traffic and on maps. Our ongoing research work is related to pictorial symbols in our living environments. These symbols symbolize our living environments and guide us in different situations. Here we introduce our idea of intelligent icons and their functions, and we give some examples of their applications. We review spatial relations as an interesting approach to (a) icon recognition and (b) illustrating the situation-specific relation between the user and the icon in question. The design principles and the basic model of the system architecture of our I-Icon system are described. We also present an example of an intelligent icon implementation.
An implementation-oriented model for representing uncertain temporal information in databases is proposed. Temporal information is presented to the user as anchored time intervals with optional beginning and end dates. The model accounts for both instants and intervals, and can be applied to uncertain dates by leaving days and months optional, or by using symbolic constraints that represent additional time granularities. The model is defined using conventional relational database structures to support ease of deployment and integration with legacy systems with efficient query capabilities. The model is based on experiences with an existing museum database and highlights challenges related to the temporal representation of cultural-historical data in practice. The model is compared with the temporal representations used in other museum information systems and collections management standards. Possible opportunities for extending the model in future research include defining a formal algebraic presentation or utilizing an explicit time ontology.
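A minimal sketch of the interval representation with optional beginning and end dates; the class and field names are illustrative and do not reproduce the museum database schema or its symbolic constraints.

```python
# Illustrative sketch: anchored time intervals whose begin and/or end may be unknown.
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class UncertainInterval:
    begin: Optional[date] = None        # None = beginning unknown
    end: Optional[date] = None          # None = end unknown

    def may_overlap(self, other: "UncertainInterval") -> bool:
        """True unless the known bounds rule an overlap out."""
        if self.end and other.begin and self.end < other.begin:
            return False
        if other.end and self.begin and other.end < self.begin:
            return False
        return True

# An object "made from the 1920s onwards, end of use unknown" queried against
# an exhibition period with known bounds:
artefact_period = UncertainInterval(begin=date(1920, 1, 1), end=None)
exhibition = UncertainInterval(begin=date(1935, 5, 1), end=date(1935, 9, 30))
print(artefact_period.may_overlap(exhibition))   # True: the known bounds do not exclude it
```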
In this paper we define the concept of a service model and introduce hierarchical service models suitable for large service parks. We describe a method for handling higher-order workflows that we use for the automatic composition of services. We give a description of a specification language suitable for representing hierarchical service models. We introduce two kinds of ontologies, service ontologies and user ontologies, and demonstrate their usage in our experiments on the Estonian e-government information system.
We propose a structural approach to service composition using a relational model. We focus on the structural aspects of service composition and apply the relational model. Services in our model are defined and organized by relations, and service manipulations, such as service invocation and service composition, are achieved by relational operations on the corresponding relations. Since conventional relational algebra is insufficient for service management, we extend relational algebra and introduce a new θ-operator to deal with services.
We also explain how to realize this model using a relational database management system. Users of this system can access services with SQL, which makes it easier for them to combine services with data stored in relational databases.
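A minimal sketch of the relational framing, assuming SQLite: service descriptions stored as rows of a relation and selected with ordinary SQL. The θ-operator extension itself belongs to the paper and is not reproduced here; the table layout and example services are invented.

```python
# Illustrative sketch: services as tuples of a relation, queried with SQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE service (
                    name     TEXT PRIMARY KEY,
                    input    TEXT,          -- input data type
                    output   TEXT,          -- output data type
                    endpoint TEXT)""")
conn.executemany("INSERT INTO service VALUES (?, ?, ?, ?)", [
    ("geocode", "address",     "coordinates", "http://example.org/geocode"),
    ("weather", "coordinates", "forecast",    "http://example.org/weather"),
])

# A join over matching input/output types suggests a candidate composition:
# geocode followed by weather (address -> coordinates -> forecast).
for row in conn.execute("""SELECT a.name, b.name
                           FROM service a JOIN service b
                           ON a.output = b.input"""):
    print("compose:", row)
```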
Conceptual modelling is one of the central activities in Computer Science. Conceptual models are mainly used as intermediate artifacts for system construction. They are schematic descriptions of a system, a theory, or a phenomenon of an origin, thus forming a model. A conceptual model is a model enhanced by concepts. The process of conceptual modelling is governed by the purpose of modelling and the models. It is based on a number of modelling acts, a number of correctness conditions, modelling principles and postulates, and paradigms of the background or substance theories. Purposes determine the (surplus) value of a model. Conceptual modelling is performed by a modeller who directs the process based on his/her experience, education, understanding, intention and attitude.
Conceptual models are products that are used by other stakeholders such as programmers, learners, business users, and evaluators. Conceptual models use a language as a carrier for the modelling artifact and are restricted by the expressiveness of this carrier. This language is often also used for the description of the concepts that are incorporated into a modelling result. Concepts can be explicitly defined or can be implicitly assumed based on some common sense within an application domain, a Computer Science sub-culture, or a community of practice.
A theory of conceptual models and a theory of modelling acts have been developed in [26,27]. This paper aims at the development of a general theory of modelling as an art in the sense of [9]. A general theory of modelling also considers modelling as an apprenticeship and as a technology. We distinguish between the art of modelling within a creation and production process, within an explanation and exploration process, within an optimisation and variation process, and within a verification process. This distinction allows us to relate the specific purpose to the macro-steps of modelling and to criteria for the approval or refusal of modelling results.
A computational reconstruction of communication in dialogue requires an analysis not only of language expressions but also of utterances. In DBS, an utterance is defined as (i) a propositional content and (ii) a cluster of pointers called STAR. The STAR serves to anchor a content to the interpretation's parameter values of Space, Time, Agent, and Recipient. The STAR also provides the referents for a certain kind of sign in natural language, namely the indexical.
In this paper, three different STARs are used to code three different perspectives on content: (i) the STAR-0 for coding the agent's perspective on non-language content resulting from recording current recognition and action, (ii) the STAR-1 for coding the speaker's perspective on content underlying language production, and (iii) the STAR-2 for coding the hearer's perspective on content resulting from language interpretation. It is shown that the computation of these perspectives by means of DBS inferences is completely software mechanical.
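The role of the STAR can be sketched informally as follows; the data structure and the indexical mapping are illustrative assumptions, not the DBS representation itself.

```python
# Illustrative sketch: a content anchored by a STAR whose values supply the
# referents of indexicals such as "I", "you", "here" and "now".
from dataclasses import dataclass

@dataclass
class STAR:
    space: str      # S: where the utterance is located
    time: str       # T: when it is located
    agent: str      # A: who produces or records the content
    recipient: str  # R: whom it is addressed to

def resolve_indexicals(content: list[str], star: STAR) -> list[str]:
    mapping = {"I": star.agent, "you": star.recipient,
               "here": star.space, "now": star.time}
    return [mapping.get(word, word) for word in content]

star_1 = STAR(space="Tallinn", time="2011-06-06", agent="speaker_a", recipient="hearer_b")
print(resolve_indexicals(["I", "saw", "you", "here"], star_1))
# ['speaker_a', 'saw', 'hearer_b', 'Tallinn']
```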
A large number of news articles are published on the Web every day, and the demand for discovering news articles on new and important topics has been growing. In this paper, we present a method for detecting characteristic words that co-occur with a target word (characteristic co-occurrence words) to help users find important topics related to the target word. The method divides news articles published in a certain period of time into two groups according to whether or not they include the target word, and then computes a score for each word co-occurring with the target word by counting the number of news articles that include the co-occurring word in each of the two groups. We can detect characteristic co-occurrence words more effectively by clustering news articles in advance and computing the score only in the clusters to which news articles including the target word belong.
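A minimal sketch of this scheme; the exact scoring function of the paper is not reproduced, and a simple smoothed document-frequency ratio stands in for it, applied to invented toy articles.

```python
# Illustrative sketch: split articles by presence of the target word and score
# co-occurring words by their document-frequency ratio between the two groups.
from collections import Counter

def characteristic_cooccurrence(articles, target, top_n=10):
    """articles: list of token lists for one time period."""
    with_target = [set(a) for a in articles if target in a]
    without_target = [set(a) for a in articles if target not in a]
    df_with = Counter(w for a in with_target for w in a if w != target)
    df_without = Counter(w for a in without_target for w in a)
    score = {w: (df_with[w] / (1 + len(with_target))) /
                ((df_without[w] + 1) / (1 + len(without_target)))
             for w in df_with}
    return sorted(score, key=score.get, reverse=True)[:top_n]

docs = [["flu", "mask", "vaccine"], ["flu", "mask", "school"],
        ["camera", "lens"], ["mask", "fashion"]]
print(characteristic_cooccurrence(docs, "flu"))   # e.g. ['mask', 'vaccine', ...]
```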
Modeling of information systems is typically based on conceptual (intensional) primitives abstracted from a Universe of Discourse. Among the different relationships, the hierarchical is-a and part-of relationships play a central role in modeling. Traditionally these are viewed as separate structures. However, there are approaches that aim to view them through one intensional containment relationship among concepts. In this paper, we seek a solution to this philosophically hard problem. In other words, we aim to find one universal relationship at the intensional level that covers both the is-a and part-of relationships. The study is based on the integration of two established theories: Kauppi's concept theory, which is an intensional theory, and Mereology, which is a theory of parts and wholes at the extensional level.
We present archetype-based techniques that we are using in the development of real-life laboratory information management system (LIMS) software and of a LIMS Software Factory. These archetype-based techniques combine the software engineering triptych with archetypes and archetype patterns for software factories. Following the software engineering triptych, to write software we have to know the requirements; to know the requirements we have to know the domain; and to know the domain we have to analyze and model it. We call our techniques Archetypes Based Development (ABD). In ABD the domain is analyzed according to the Zachman Framework by asking the questions what, how, where, who, when, and why. The domain model is developed by using the product (what), business process (how), organization structure (where), person (who), order and inventory (when), and rule (why) archetype patterns. We use the domain model analyzed and developed in this way as a domain-specific language for prescribing requirements. In our understanding, ABD makes it possible to increase the dependability of developed software, reduce the semantic heterogeneity of models and data types, improve the maturity of the development process, and lead the development of one-off software towards a software factory.
Data mining algorithms aim to expose the hidden information behind data. A particular problem statement, however, raises the question of which algorithm should be employed and, moreover, how and which processing steps should be nested to form a goal-directed knowledge discovery process. Existing approaches such as CRISP-DM are mainly focused on the management or description of such processes, but they do not really describe how such a discovery process should be designed.
In the presented work we propose a framework for the design of a knowledge discovery process in which the user's prior knowledge and goals are central to the process design. We discuss three stages of knowledge, leading to a framework with three layers. For each layer we show the particular meaning and transformation of knowledge, which becomes more specific as we move towards the lower layers of the framework. As an illustration of the framework, we demonstrate an approach for supposition-based algorithm recommendation in terms of clustering.
This paper proposes a method to discover consumer behavior from buzz marketing sites. For example, in 2009, the super-flu virus had significant effects on various product marketing domains around the globe. Using text mining technology, we found a relationship between the flu pandemic and the reluctance of consumers to buy digital single-lens reflex cameras. We could easily expect more air purifiers to be sold due to the flu pandemic; however, the reluctance to buy digital single-lens reflex cameras because of the flu is not something we would have expected. This paper applies text mining techniques to analyze expected and unexpected consumer behavior caused by a current topic such as the flu. The unforeseen relationship between a current topic and products is modeled and visualized using a directed graph that shows implicit knowledge. Consumer behavior is further analyzed based on the time series variation of the directed graph structures.
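A rough sketch of how such a topic-product graph could be assembled, assuming networkx; the posts, sentiment word lists and scoring are invented placeholders, not the paper's text mining pipeline.

```python
# Illustrative sketch: an edge topic -> product is drawn when the pair co-occurs
# in buzz posts, labelled with a naive sentiment balance.
import networkx as nx

posts = [
    "flu worries, postponing the camera purchase",
    "bought an air purifier because of the flu",
    "flu season again, air purifier sold out",
]
positive, negative = {"bought", "sold"}, {"postponing", "worries"}

graph = nx.DiGraph()
for product in ("camera", "air purifier"):
    related = [p for p in posts if "flu" in p and product in p]
    if related:
        balance = sum(+1 if w in positive else -1 if w in negative else 0
                      for p in related for w in p.replace(",", "").split())
        graph.add_edge("flu", product, weight=balance)

for u, v, data in graph.edges(data=True):
    print(f"{u} -> {v}  (sentiment balance {data['weight']:+d})")
```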
This paper presents a concept model for solving bond mathematics problems; it captures the manner in which our knowledge of bond mathematics is organized into a cognitive architecture. We build a concept model for bond mathematics to achieve good organization and to integrate the knowledge of bond mathematics. The ultimate goal is to enable many students to understand the solution process without difficulty. Our concept model comprises entity-relationship diagrams, and by using our concept models, students can integrate financial theories and mathematical formulas. This paper illustrates concept models for bond mathematics with concrete examples of mathematical word problems. It also describes our principles in developing the concept model and the descriptive power of our model.
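As an example of the kind of bond mathematics word problem such a concept model targets, the following sketch prices a coupon bond as the present value of its cash flows; the figures are invented for illustration and are not taken from the paper.

```python
# Worked example: bond price = present value of coupons + discounted face value.
def bond_price(face, coupon_rate, yield_rate, years):
    coupon = face * coupon_rate
    pv_coupons = sum(coupon / (1 + yield_rate) ** t for t in range(1, years + 1))
    pv_face = face / (1 + yield_rate) ** years
    return pv_coupons + pv_face

# A 5-year bond, face value 100, 4% annual coupon, priced at a 5% yield:
print(round(bond_price(100, 0.04, 0.05, 5), 2))   # 95.67
```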
Nowadays, modern Web applications provide flexible, desktop-application-like user experiences without using explicit requests. Modern Web applications need complex components: server-side logic programs, server-side communication programs, output Web pages, client-side logic programs, and client-side communication programs. We present a new generation model called the client-side centric model. This model allows us to define one-page modern Web applications dealing with databases easily. From GUI-based declarative definitions, we can generate COMET-based modern Web applications such as chat applications and calendar applications.
Robust and reliable transaction processing is a key quality factor in many database Internet information systems. We give three design patterns that model reliable session and transaction management in transactional web applications: session timeout, server default action, and split client-server state representation. Only the first design pattern can be successfully implemented with the standard WCF communication facilities available for the Silverlight .NET subset. We therefore also include a simple, robust and WCF-compatible communication stack. Surprisingly, overriding the standard WCF facility has resulted in an almost thirty-fold performance boost for local calls.
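A generic illustration of the session timeout pattern, with an assumed server default action of rolling back abandoned transactions; this is a language-neutral Python sketch with invented class names, not the authors' WCF/Silverlight implementation.

```python
# Illustrative sketch: sessions idle longer than their timeout are discarded
# and any open transaction is rolled back.
import time

class Session:
    def __init__(self, session_id, timeout_seconds=300):
        self.session_id = session_id
        self.timeout = timeout_seconds
        self.last_activity = time.monotonic()
        self.open_transaction = None          # e.g. a database transaction handle

    def touch(self):
        """Record client activity so the session stays alive."""
        self.last_activity = time.monotonic()

    def expired(self):
        return time.monotonic() - self.last_activity > self.timeout

class SessionManager:
    def __init__(self):
        self.sessions = {}

    def sweep(self):
        """Assumed server default action: roll back and drop expired sessions."""
        for sid, session in list(self.sessions.items()):
            if session.expired():
                if session.open_transaction is not None:
                    session.open_transaction.rollback()
                del self.sessions[sid]
```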
In this paper we summarise our acquaintance with preference learning after a series of papers presenting models, algorithms and experiments with preference learning in an e-shop environment. We recall some achievements, several observations and remaining problems, together with a thorough description of preference learning. We conclude with our future plans.
The CRUD (Create, Read, Update, Delete) matrix is a popular way of specifying the relation between a software system's functionality and its data classes. This article proposes a visual model based on Formal Concept Analysis for examining the hidden structure of such a matrix. We describe the relevance of this kind of model to software engineering in terms of the discovery of separate subsystems, use-case coverage and data table stability.
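A minimal sketch of applying Formal Concept Analysis to a tiny invented CRUD-style context: use cases as objects, data classes as attributes, and "accesses" as the incidence relation. The brute-force concept enumeration below only illustrates the idea and is not the article's visual model.

```python
# Illustrative sketch: formal concepts (closed extent/intent pairs) reveal groups
# of use cases that share exactly the same data classes - candidate subsystems.
from itertools import combinations

context = {                      # use case -> data classes it accesses
    "RegisterCustomer": {"Customer"},
    "PlaceOrder":       {"Customer", "Order", "Product"},
    "ShipOrder":        {"Order", "Product"},
    "RestockProduct":   {"Product"},
}
attributes = set().union(*context.values())

def intent(objs):                # common attributes of a set of use cases
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

def extent(attrs):               # all use cases having all of these attributes
    return {o for o, a in context.items() if attrs <= a}

concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        e = extent(set(attrs))
        concepts.add((frozenset(e), frozenset(intent(e))))

for e, i in sorted(concepts, key=lambda c: -len(c[0])):
    print(sorted(e), "<->", sorted(i))
```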
A technique for automatic ontology integration at the level of alignment has been developed. The technique is oriented towards information search in heterogeneous resources. The thesaurus that results from the integration is used to translate search queries into the terms of particular ontologies. The technology is based on the use of a weighted association relation that is modified as the integrated ontology is used. This makes it possible to avoid involving experts in the ontology integration process and thereby to raise the level of automation of this process.
The article is based on the defense research project "Knowledge Management of the NEC in the Army of the Czech Republic – MENTAL". The theoretical basis of the project is Topic Maps. The key issue for the project solution is designing and creating a suitable ontology. The implementation environment is the Tovek Tools and ATOM2 technology. The paper describes the procedure from the selection of an Upper Ontology through the Core Ontology design to the processing of the Domain Ontology. Ontology definitions are stated and their meaning is explained. The paper then explains ways of reusing an existing taxonomy in ontology construction, presents the possibilities of using the taxonomy built in the selected domain when creating its domain ontology, explains the difference between taxonomies and ontologies in various contexts, and focuses on the description of the specific domain ontology and the use of the existing taxonomy for building it.
A set of memes was extracted from a knowledge-exchange chain experiment to investigate the communication and exchange of information in physics classes. Meme change patterns were analysed using two approaches. First, a meme similarity matrix was created using a similarity measure, a Võhandu maximum correlation path/tree (similar to a minimum spanning tree approach on an inverted scale) was computed, and the tree was partitioned using Võhandu partitioning rules. Second, a monotone systems minus technique algorithm was used to reorder the memes and to partition the ordering. The two approaches gave roughly similar partitionings of the memes into possible memeplexes.
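A minimal sketch of the first approach with an invented similarity matrix, assuming SciPy: a maximum correlation tree obtained as a minimum spanning tree on inverted similarities. The Võhandu partitioning rules and the monotone systems minus technique are not reproduced here.

```python
# Illustrative sketch: invert pairwise similarities and compute a minimum
# spanning tree, which corresponds to a maximum correlation tree.
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

similarity = np.array([          # pairwise meme similarities (made-up values)
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.2, 0.7, 1.0],
])
dissimilarity = 1.0 - similarity          # invert the scale
np.fill_diagonal(dissimilarity, 0.0)

tree = minimum_spanning_tree(dissimilarity).toarray()
edges = [(i, j, 1.0 - tree[i, j]) for i, j in zip(*tree.nonzero())]
print(edges)    # e.g. [(0, 1, 0.8), (1, 2, 0.3), (2, 3, 0.7)]
```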
In this paper, we focus on the communication and reasoning of agents in Multi-Agent Systems. We propose a method of modelling the dynamic behaviour of these systems using Transparent Intensional Logic. We also describe several issues where the formalization affects the implementation of both the system and the agents, and propose solutions to the problems that may arise. To make the paper easily readable, we illustrate our approach with a simple example.