Ebook: Information Modelling and Knowledge Bases XXIV
Information modelling and knowledge bases have become important topics in academic communities related to information systems and computer science. The amount and complexity of information itself, the number of abstraction levels of information, and the size of databases and knowledge bases are continuously growing.
The aim of the Information Modelling and Knowledge Bases series is to advance research communities by sharing scientific results and experiences achieved using innovative methods and systems in computer science and other disciplines that have a common interest in understanding and solving problems of information modelling and knowledge bases, and in applying the results of research to practice. The research topics in this series concentrate on a variety of themes in important domains: conceptual modelling, design and specification of information systems, multimedia information modelling, multimedia systems, ontology, software engineering, knowledge and process management, knowledge bases, cross-cultural communication and context modelling. Much attention is also paid to theoretical disciplines including cognitive science, artificial intelligence, logic, linguistics and analytical philosophy.
The selected papers cover many areas of information modelling and knowledge bases, namely theory of concepts, database semantics, knowledge representation, software engineering, WWW information management, context-based information retrieval, ontological technology, image databases, temporal and spatial databases, document data management, process management, cultural modelling, social networks, personalization, interfaces, data mining and many others. This new issue also contains a paper initiated by the panel “Multimedia Information Systems for Social, Cross-Cultural and Environmental Computing”.
We believe that this series of Information Modelling and Knowledge Bases will be productive, valuable and fruitful in advancing research and applications in these academic areas.
The Editors
Peter Vojtáš
Yasushi Kiyoki
Hannu Jaakkola
Takehiro Tokuda
Naofumi Yoshida
Collections of visual symbols are context-specific and include symbols and signs found, for example, at airports, in hotels, and in traffic. They symbolize our living environments and guide us in different situations. We introduce a pattern recognition system, based on bag-of-features descriptors and a support vector machine classifier, for recognizing semantic elements in pictorial symbols. We evaluate our approach by extracting semantic tags from Finnish and Japanese road signs and by translating their meaning. Our paper demonstrates the idea of a visual dictionary that anyone from any culture can understand in any context of life.
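To make the pipeline named above concrete, the following is a minimal, hedged sketch of a generic bag-of-features plus SVM classifier, not the authors' implementation: the ORB descriptor, the k-means vocabulary size, and the file names are assumptions introduced for illustration.

```python
# Minimal bag-of-features + SVM sketch (illustrative only; not the authors' code).
# Assumes OpenCV and scikit-learn; ORB stands in for whatever local descriptor
# the paper actually employs.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def local_descriptors(image_paths):
    orb = cv2.ORB_create()
    per_image = []
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = orb.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.empty((0, 32)))
    return per_image

def bag_of_features(per_image, kmeans):
    # Histogram of visual-word occurrences per image.
    hists = []
    for desc in per_image:
        words = kmeans.predict(desc.astype(np.float64)) if len(desc) else []
        hist, _ = np.histogram(words, bins=np.arange(kmeans.n_clusters + 1))
        hists.append(hist / max(hist.sum(), 1))
    return np.array(hists)

# Hypothetical training data: sign images with semantic-element labels.
train_paths, train_labels = ["sign1.png", "sign2.png"], ["pedestrian", "speed_limit"]
train_desc = local_descriptors(train_paths)
vocab = KMeans(n_clusters=50, n_init=10).fit(np.vstack(train_desc).astype(np.float64))
clf = SVC(kernel="rbf").fit(bag_of_features(train_desc, vocab), train_labels)

test_desc = local_descriptors(["unknown_sign.png"])
print(clf.predict(bag_of_features(test_desc, vocab)))
```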
The paper describes the creation of a knowledge system about knowledge management based on Topic Maps theory, using the Ontopia open source software. Special attention is given to ontology design and to the application processing that uses the knowledge base. The components employed are specified, as is the architecture of the knowledge system, and an example of the resulting application is presented.
Large data masses require extensive processing. Data mining is the traditional way of transforming data into knowledge. In visual analytics, humans are integrated into the process through continuous interaction between the analyst and the analysis software. Data mining methods can also be utilized in visual analytics, where priority is given to the visualization of information and to dimension reduction. However, the data provided is not always sufficient: there is a large amount of background contextual information that should be included in the automated process. This paper describes a context-sensitive approach in which we utilize visual analytics by studying all phases of the process. To observe the process from all points of view, we divide it into three architectural sections: sensing, processing, and actuation. Thanks to the versatility of visual analytics methods, our future goal is to apply these methods to energy production in a laboratory-scale power plant.
In addition to functions, the architecture of a software system defines numerous other properties, commonly referred to as non-functional or quality properties. In web applications, two well-known architectural styles are common: the resource-oriented style, implemented using RESTful principles, and the message-passing style, implemented with XMPP. These architectural styles have different benefits, and in real applications the properties of both are sometimes needed. In this paper we discuss how a RESTful web architecture can be complemented with XMPP. As an example, we use a distributed content management system, which is built using RESTful design guidelines and complemented with the message-passing architectural style implemented on XMPP. Features of both architectural styles are used in a fashion where no negative feature interaction takes place. In the completed system, both architectural styles are clearly identifiable and their non-functional properties are preserved.
Twitter has recently attracted attention as a new way of collecting, providing and sharing information on the Internet. However, it is difficult to find Twitter users to follow in order to obtain valuable information on a topic of interest. In this paper, we propose a method for ranking Twitter users using Twitter keyword search. The method utilizes tweet relations among Twitter users and tweets (retweets and replies) as well as the tweet count of each user. Experimental results show that our method outperforms methods using only tweet count, tweet relations, or follow relations.
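As a rough illustration of how tweet counts and retweet/reply relations might be combined, the sketch below applies a PageRank-style score to the user-to-user retweet/reply graph and mixes it with the keyword-matching tweet count; the weighting scheme and parameters are assumptions, not the paper's exact formula.

```python
# Illustrative user-ranking sketch (assumed scoring, not the paper's exact method):
# combine each user's matching-tweet count with a PageRank-style score over the
# retweet/reply graph between users.

def rank_users(tweet_counts, relations, damping=0.85, iterations=30, alpha=0.5):
    """tweet_counts: {user: number of tweets matching the keyword search}
    relations: list of (source_user, target_user) edges, meaning that source
               retweeted or replied to target."""
    users = set(tweet_counts) | {u for edge in relations for u in edge}
    out_degree = {u: 0 for u in users}
    incoming = {u: [] for u in users}
    for src, dst in relations:
        out_degree[src] += 1
        incoming[dst].append(src)

    score = {u: 1.0 / len(users) for u in users}
    for _ in range(iterations):
        score = {
            u: (1 - damping) / len(users)
               + damping * sum(score[v] / out_degree[v] for v in incoming[u])
            for u in users
        }

    max_count = max(tweet_counts.values()) or 1
    max_score = max(score.values()) or 1
    combined = {u: alpha * tweet_counts.get(u, 0) / max_count
                   + (1 - alpha) * score[u] / max_score
                for u in users}
    return sorted(combined, key=combined.get, reverse=True)

print(rank_users({"alice": 12, "bob": 3, "carol": 7},
                 [("bob", "alice"), ("carol", "alice"), ("alice", "carol")]))
```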
The Internet provides a huge amount of information and knowledge through Web pages. For personal and effective use of such resources, partial information extraction technology breaks a new path that enables end-users to obtain and integrate only the needed information from various Web pages into original compositions. However, the traditional XPath-only extraction method fails when Web sites use different templates to construct Web pages or change the layout of Web pages, which we call the stability problem. In this paper, we propose a novel hybrid extraction mechanism for stably extracting the partial information. We compare the original and changed Web pages to obtain the unchanged nodes as a stable-part list and use them to generate new paths. Since the list is re-ranked as new stable parts are found, the extraction success rate is self-evolving, correspondingly reducing manual intervention. We show the usefulness of our approach by experiments on real Web sites in practice.
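The following sketch illustrates the stable-part idea under simplifying assumptions (lxml, and a purely text-based notion of an unchanged node); the paper's matching and re-ranking are more elaborate.

```python
# Sketch of the stable-part idea (assumptions: lxml, and a simple text-based
# notion of an "unchanged node"; the paper's actual algorithm is richer).
from lxml import html

def stable_texts(old_html, new_html):
    """Return texts of leaf nodes that appear unchanged in both page versions."""
    def leaf_texts(doc):
        return {el.text.strip() for el in doc.iter()
                if el.text and el.text.strip() and len(el) == 0}
    return leaf_texts(html.fromstring(old_html)) & leaf_texts(html.fromstring(new_html))

def path_relative_to_stable_part(new_html, stable, target_text):
    """Generate an XPath anchored at a stable node instead of the page root."""
    doc = html.fromstring(new_html)
    tree = doc.getroottree()
    anchors = [el for el in doc.iter() if el.text and el.text.strip() in stable]
    targets = [el for el in doc.iter() if el.text and target_text in el.text]
    if anchors and targets:
        # Hypothetical strategy: express the target path via the first stable anchor.
        return tree.getpath(anchors[0]) + "/following::" + targets[0].tag
    return None

old = "<div><h2>Weather</h2><span>Sunny</span></div>"
new = "<table><tr><td><h2>Weather</h2></td><td><span>Rainy</span></td></tr></table>"
stable = stable_texts(old, new)          # {'Weather'} survives the re-templating
print(path_relative_to_stable_part(new, stable, "Rainy"))
```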
In this paper we summarize the efforts of our research group on web semantization – the process of increasing the degree of automation of web processing – and some of its applications. We present several methods for mining textual information and for assisted annotation, as we believe these are the first steps towards the semantic web. Several methods for processing the gathered data are then described. The proposed methods mainly aim at modeling users and their preferences and then helping them reach their goals. We focus on the complex environment of web semantization as a wider context in which both tools providing and tools consuming semantic data are placed.
This paper describes a method for directly encoding and sharing the unique knowledge and perspective of Website usage and relationships that each person using the Web holds. Knowledge sharing is achieved by creating a Real-time Dynamic Collaborative Space which combines a user's, their friends', and the public's knowledge in a new space. This new space provides a context in which the abstract properties of a user's knowledge are expressed through an intuitive visualization that uses real-world-space and behavioral knowledge-sharing analogues, allowing the user to understand their Personal Web Context in relation to other people's. This new space and visualization are dynamic, reacting in real time to a user's and their friends' changing knowledge of the Web. This paper describes the new space and the modeling of Web knowledge, and discusses implementation strategies and an in-progress prototype.
Software products are a substantial part of companies' production and a necessary condition of their business success. However, proper specification and verification of software remains a problem. We propose a formal specification method for software processes based on Transparent Intensional Logic (TIL). This method is logic-oriented, because logical specification within a rich formal framework makes it possible to explicitly define process resources as well as process logic. Moreover, our novel contribution consists in integrating a knowledge-based method with dynamic process modeling.
Once a disaster occurs, people discuss various topics in social media such as electronic bulletin boards, SNSs and video services, and their decision-making tends to be affected by these discussions. Under these circumstances, a mechanism to detect topics in social media has become important. This paper targets the Great East Japan Earthquake and proposes a time-series topic detection method based on the modularity measure, which quantifies the quality of a division of a network into modules or communities. Our proposed method clarifies emerging topics in social media messages by computing the modularity, analyzing it over time, and visualizing topic structures. Experimental results with actual social media data about the Great East Japan Earthquake are also shown.
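For reference, the modularity measure referred to here, presumably in its standard Newman–Girvan form (the paper's time-series method builds on it), is:

```latex
% Modularity of a division of a network into communities: A_{ij} is the adjacency
% matrix, k_i the degree of node i, m the number of edges, c_i the community of
% node i, and \delta the Kronecker delta.
Q \;=\; \frac{1}{2m}\sum_{i,j}\left[ A_{ij} - \frac{k_i k_j}{2m} \right]\delta(c_i, c_j)
```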
This paper describes the concept of, and some preliminary experiments with, an extension of the sitIT.cz portal – a social network of ICT specialists in the Czech Republic. SitIT.cz interconnects ICT specialists and offers effective search over several types of structured, machine-readable profiles. It is intended to support technology transfer, information sharing, team building and discussion, especially with the aim of increasing the competitiveness of R&D in ICT. The main stimulus for the sitIT.cz extension came from acquaintance with software development, which often needs extensive testing – not only technological but also from the users' point of view. The extension can make the network more attractive for both developers and users. We propose new structured profiles that can be used for evaluating software and that allow developers to obtain feedback that is as valuable as possible. One of the main outcomes of this system is the creation of baseline knowledge by humans for further comparison and/or training.
Data analysis based on spatial and temporal relationships leads to new knowledge discovery in multi-database environments. As a virtually unlimited variety of relationships potentially exists among heterogeneous databases, it is important to realize an objective-based dynamic data analysis environment with appropriate data collection from selected databases.
These databases are integrated within a meso database, which combines the data from the different databases into one redundant data store. Typically the data store consists of a number of data cubes. These cubes are recharged whenever the micro data changes, depending on a recharge policy. The meso database is then used to populate the analysis databases, which contain data according to the analysis demands. After analysis or data mining functions are applied, the result presentation database is populated.
We develop a novel approach to data analysis by turning the analysis task upside down: the analysis task drives the features of the data collectors. These collectors are small databases which collect data within their interest profile. Data from the collector databases are then used to populate the presentation database.
The key feature of this approach is that it realizes dynamic data integration and analysis among heterogeneous databases by computing spatial and temporal interrelationships in an objective-dependent way; such integration makes it possible to retrieve, analyse and extract new information generated from the viewpoint of spatial and temporal occurrences among legacy databases.
Database querying is still based on programming languages. The user must learn the database programming language and the corresponding database schemata before posing a data question to the system. Programming languages use either the classical textual style or some kind of visualisation for queries (e.g., query-by-example or VisualSQL). Conceptual queries, in contrast, are specified first at the conceptual level, using concepts and languages easily understood by the application domain experts who must be qualified to validate the query.
Question verbalisation is already a difficult cognitive problem. Data understanding and assessment is no less difficult for casual users of database infrastructures. The formalisation of such questions within a programming language environment may be too difficult for these users, and evaluating whether a formal query corresponds to the question the user had in mind is a very difficult task.
This paper focuses on the verbalisation of questions to a database system without profound knowledge of SQL or other query languages. We propose a six-step procedure for query verbalisation, based on query and answer forms which can be mapped to database queries depending on the database schema and on the profile of the database system.
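As a toy illustration of the final mapping step, the sketch below turns a filled-in query form into SQL; the form fields and the schema (patient, visit, city) are invented for illustration, and the paper's six-step procedure is considerably richer.

```python
# Hypothetical mapping from a filled-in query form to SQL. The form fields and
# the schema ("patient", "visit") are invented; they only illustrate how a
# verbalised question could be bound to a concrete database query.

FORM_TO_SCHEMA = {
    "patients": ("patient", "name"),
    "visits":   ("visit", "visit_date"),
}

def form_to_sql(query_form):
    """query_form: {'ask_for': ..., 'condition_field': ..., 'condition_value': ...}"""
    table, column = FORM_TO_SCHEMA[query_form["ask_for"]]
    sql = (f"SELECT {column} FROM {table} "
           f"WHERE {query_form['condition_field']} = :value")
    return sql, {"value": query_form["condition_value"]}

# Verbalised question: "Which patients live in Kiel?"
sql, params = form_to_sql({"ask_for": "patients",
                           "condition_field": "city",
                           "condition_value": "Kiel"})
print(sql, params)
```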
Recognizing the central quality of a metallographic sample is, for a metallographic laboratory expert, a challenging task that involves analyzing the size and number of defects in the evaluated segment. To help human experts, a software tool was developed for the machine evaluation of these metallographic samples. The aim of the paper is to present the properties of the developed software and to demonstrate its practical application in industry. The developed software tool is based on the machine recognition of small but important non-homogeneity objects located in the central part of the metallographic sample and on a statistical evaluation of the properties of these extracted central objects. Because the central objects are very similar to noise, machine recognition is not straightforward. Nevertheless, this paper contains promising results comparing machine and expert assessments on an industrial image database of digitized metallographic samples. The image database is provided by the research department of ArcelorMittal Ostrava, a.s., in the Czech Republic.
This paper presents an evaluation model for computing performance and usability benchmarks. The target of the evaluation is context analysis of video streams. The paper describes the required system components, the system architecture and the evaluation model, as well as the test cases that can be performed using the designed environment. During the modelling, design, implementation and preliminary tests of the environment, several issues, problems and key points worth knowing were discovered; these are explained in this paper. Reference guidance for future test cases is also given. The modelling and experimental results presented in this paper are based on the test environment and on observations from the cooperation project between Tampere University of Technology Pori Unit and Keio University Shonan Fujisawa Campus.
Our imagination-based query creation method is a new approach to image-recall functions reflecting color-based imaginations in human brains. The method dynamically represents a user's imaginations and creates queries for image retrieval. Its central aim is to express a user's abstract intentions as computable objects and to recall those imaginations from image databases through color-analytical image retrieval. The main features of this method are: (1) a hierarchical model for color indexing to express image contexts, (2) color zooming for context-dependent color histogram generation utilizing color names as contextual words, (3) integration of several histograms using five histogram-combining operations and dynamic semantic weighting, and (4) threshold control for semantic correlation. This method exploits color information to retrieve intended images expressing a user's imagination. The paper also presents a “Database for Imagination Expression” (DBIE), which stores, shares and reuses created queries. Additionally, several qualitative experiments are presented to examine the effectiveness and feasibility of our method.
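The sketch below illustrates histogram combination and a simple correlation threshold under stated assumptions: only two plausible combining operations (weighted sum and element-wise minimum) are shown, and the bins and weights are invented, so it should not be read as the paper's five operations or its color-name hierarchy.

```python
# Sketch of combining color histograms with semantic weights (assumptions only;
# the paper defines five combining operations and a color-name hierarchy).
import numpy as np

def normalize(h):
    h = np.asarray(h, dtype=float)
    return h / max(h.sum(), 1e-12)

def weighted_sum(histograms, weights):
    return normalize(sum(w * normalize(h) for h, w in zip(histograms, weights)))

def intersection(histograms):
    # Element-wise minimum: colors present in every combined imagination.
    return normalize(np.minimum.reduce([normalize(h) for h in histograms]))

def correlation(query_hist, image_hist):
    # Histogram intersection as a simple similarity; thresholding this value
    # could play the role of the method's "threshold controlling".
    return float(np.minimum(normalize(query_hist), normalize(image_hist)).sum())

# Hypothetical "sunset" imagination: mostly warm reds/oranges, a little green.
red_ish = [8, 1, 0, 0]          # bins: red, orange, blue, green (illustrative)
orange  = [2, 7, 0, 1]
query_sum = weighted_sum([red_ish, orange], weights=[0.7, 0.3])
query_and = intersection([red_ish, orange])
print(query_sum, query_and, correlation(query_sum, [5, 4, 1, 0]))
```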
We present a methodology for modelling business domains based on business archetypes and archetype patterns. A business domain model (e.g. in health care, banking, transportation, etc.) describes the universe of discourse of a business without any reference to software requirements or to the design of any software system. A business archetype is an abstract model describing a common term that consistently and universally occurs in business domains and in business software systems. A business archetype pattern is a collaboration between business archetypes. The product, process, party, party relationship, inventory, order, quantity and rule archetype patterns are members of this archetype pattern framework. We exemplify the usefulness of this framework of business archetypes and archetype patterns by utilizing it in the development of a clinical laboratory domain model and of clinical laboratory information management system (LIMS) software based on this domain model. In software development we follow the software engineering triptych: from domain modelling via requirements construction to design and implementation of software. In our understanding, domain modelling with archetypes and archetype patterns complements Bjørner's domain-facets-based domain analysis methodology and results in more flexible, customizable, reliable and interoperable software.
This paper deals with document similarity based on the contextual similarity of words occurring in the documents. The method makes it possible to cluster documents in which words with similar meanings occur, even though the words and their meanings are not identical. These similarities can be discovered by tracing context. On the other hand, we can distinguish between homonyms bearing different meanings. Thus the proposed method provides fine-grained information mining from particular documents.
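A hedged sketch of the general idea follows: words are represented by their co-occurrence contexts, compared by cosine similarity, and documents are compared via the best word-to-word matches between them; the window size and averaging scheme are assumptions rather than the paper's exact method.

```python
# Illustrative context-based similarity (assumed scheme, not the paper's exact one):
# each word is represented by the words co-occurring around it, words are compared
# by cosine similarity of these context vectors, and documents by their best
# word-to-word matches.
from collections import Counter
import math

def context_vectors(docs, window=2):
    vectors = {}
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            ctx = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            vectors.setdefault(word, Counter()).update(ctx)
    return vectors

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def doc_similarity(doc1, doc2, vectors):
    words1, words2 = set(doc1.lower().split()), set(doc2.lower().split())
    # For each word in doc1, take its best contextual match in doc2, then average.
    scores = [max(cosine(vectors[w1], vectors[w2]) for w2 in words2) for w1 in words1]
    return sum(scores) / len(scores)

docs = ["the car drives on the road", "the automobile drives on the highway",
        "the bank approved the loan"]
vecs = context_vectors(docs)
print(doc_similarity(docs[0], docs[1], vecs), doc_similarity(docs[0], docs[2], vecs))
```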
This paper presents a context-based multi-dimensional corporate analysis method that evaluates companies based on user-specified contextual settings. The contextual settings are translated and decomposed into distinct spaces – finance, technology, and brand – each of which consists of a subspace containing multiple parameters. The contextual settings determine the relevance of each such parameter in evaluating companies by assigning it an appropriate weight. The important feature of this corporate analysis method is that it allows the user to analyze companies seamlessly using only the contextual settings, without knowledge of the multi-dimensional decomposition.
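A minimal sketch of such contextual weighting is shown below; the subspace parameters, their values, and the weights are hypothetical and serve only to illustrate how a context can steer the evaluation.

```python
# Hypothetical contextual weighting over the finance, technology and brand
# subspaces; parameter names, values and weights are invented for illustration.

def corporate_score(company, context_weights):
    """company: {space: {parameter: normalized value in [0, 1]}}
    context_weights: {space: {parameter: weight}} derived from the contextual settings."""
    score = 0.0
    for space, params in context_weights.items():
        for parameter, weight in params.items():
            score += weight * company.get(space, {}).get(parameter, 0.0)
    return score

acme = {"finance":    {"revenue_growth": 0.6, "debt_ratio": 0.3},
        "technology": {"patent_count": 0.8},
        "brand":      {"recognition": 0.4}}

# Context "innovation-focused investor": technology dominates the evaluation.
context = {"finance":    {"revenue_growth": 0.2},
           "technology": {"patent_count": 0.6},
           "brand":      {"recognition": 0.2}}
print(corporate_score(acme, context))   # 0.2*0.6 + 0.6*0.8 + 0.2*0.4 = 0.68
```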
Present-day computer science and informatics have to produce IT solutions that are user-friendly and, in general, globally appropriate, since we are nowadays much more connected to each other than in the past. We live in multicultural and multilingual societies where intercultural dialogue and cultural awareness are more and more important. With increased mobility, even small societies are becoming multicultural and international, whereas in the past mostly only large societies with many immigrants were described as multicultural and multilingual. Internationalization and mobility push us to take care of global conceptual modeling. This means that our partners in the modeling process (users and customers) come, or could come, from different organizational, cultural and language environments. Consequently, this can influence their way of understanding conceptual modeling, as well as the methods, expectations, results and other activities in the process. Moreover, it can also influence their understanding of expert topics, even for common, everyday topics such as counting and banking that are known all around the world.
We live in the “Age of Information”, but we still do not understand well what information is and what its properties are. Is there some kind of information conservation law similar to the conservation laws for energy and matter, or does information behave like a nuclear reaction, feeding itself and constantly trying to propagate further? In order to understand information better, we should consider the principles which have guided the best producer and consumer of information – life, and especially its highest form – Man. Memory, which stores information about their environment, and the ability to replicate allow living things to change the thermodynamic balance of their environment – they decrease disorder (entropy) in their population while increasing it in their environment. For collecting information (thus also increasing entropy in the environment), Man invented a totally new tool – language, which moved the process from the level of individual entities to the level of the whole of Mankind. Language is Mankind's model of the world; it reflects the world's structure and (in the limit) converges to a similar structure, i.e. has entropy close to the entropy of the world which it describes.
Here the essence of the concept of “information” and different uses of the word are considered. Of the many kinds of information, the most important in everyday life is social, macro information – the secondary information created by social communication from the individual, primary information obtained by our senses in perceptions. The tool for creating this social information, shared by the whole of Mankind, is language.
To understand a phenomenon, we should consider why and where it appeared. Here the emergence of language is modelled in computer simulations of communication and information exchange in a community of agents. In the simulations, agents created a (new) language for exchanging their perceptions, following some very simple principles: they were eager to distribute their perceptions, inventing new signals for them; when receiving signals from others, they followed the principle of maximizing the similarity of their language with their perceptions and with the received messages, using only minimal assumptions about the meaning of received signals; and they also utilized information compression by using names.
To measure the process, several measures of entropy were introduced: for the world, Shannon's entropy applied to Pawlak's model of information systems, and an object-oriented approach using the vector of differences between objects; for the entropy of languages, viewed as weighted many-to-many relations between real-world objects, their attributes and names, and their denotations (words), a new formula is presented. Measurements of the entropy of language made in the simulations show that language continued to develop even when agents already understood each other well; the entropy of languages steadily increases, while still remaining smaller than the entropy of the world which the language models.
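For reference, the world-entropy measure mentioned above builds on Shannon's entropy of a discrete distribution; the paper's new formula for language entropy is not reproduced here.

```latex
% Shannon entropy of a discrete random variable X with outcome probabilities p(x_i),
% the basis of the world-entropy measure applied to Pawlak-style information systems.
H(X) \;=\; -\sum_{i} p(x_i)\,\log_2 p(x_i)
```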
Exceptions are considered to be unusual states that can, but need not, be primarily taken into account. They form exclusions, represent cases to which a rule does not apply, form specific states that are not going to be handled, at least by the current system, or may represent legal objections against the typical state. Exceptions are currently considered to be a sign of poor culture, wrong implementation, bugs in the code, poor understanding of the application domain area, wrong deployment of technology, or poor education of people. This understanding is completely different from the understanding and treatment of exceptions in real life.
In our projects we discovered that exceptions should not be neglected in this way. Architectures can be made more flexible to cope with exceptions. This paper thus aims at extending the architectures of modern software and information systems so that these systems are exception-aware and provide management of exceptions in a coherent form.
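As a loose illustration only (not the architecture proposed in the paper), the sketch below treats exceptional states as first-class citizens: they are routed to registered handlers as part of the normal flow instead of being escalated as failures; the component and handler names are invented.

```python
# Minimal sketch of an "exception-aware" component (illustration of the idea only):
# exceptional states are modelled explicitly and routed to registered handlers
# rather than treated as failures.

class ExceptionAwareSystem:
    def __init__(self):
        self._handlers = {}          # exception kind -> handler

    def register(self, kind, handler):
        self._handlers[kind] = handler

    def process(self, order):
        try:
            return self._normal_case(order)
        except KeyError as exc:
            handler = self._handlers.get("missing_field")
            if handler is None:
                raise                     # unmanaged exception: escalate as usual
            return handler(order, exc)    # managed exception: part of normal flow

    def _normal_case(self, order):
        return f"shipping {order['quantity']} x {order['item']}"

system = ExceptionAwareSystem()
system.register("missing_field",
                lambda order, exc: f"order {order.get('id', '?')} queued for clarification")
print(system.process({"id": 7, "item": "bolt"}))   # quantity missing -> managed path
```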
Website usage continues to increase, and as the number of servers grows, their total energy consumption grows as well. Meanwhile, server utilization remains consistently very low. Servers can be consolidated to effectively increase the utilization rate and thereby reduce energy consumption, while also reducing maintenance costs. This requires, however, up-to-date and sufficient knowledge of how the services are used. Usage of services changes as a consequence of users' habits and as a result of internal changes to the service. Therefore, it must be possible to analyze service usage continuously and to express the results in an easily understandable manner. The literature offers several ways to analyze the usage of a web service, but so far none has been found suitable for continuous and automated analysis that could produce results for programmatic use. We present an automated method for modeling web service usage. The model can help increase knowledge of website usage. As a result, valid actions can be taken in capacity planning, consequently reducing energy consumption while ensuring that service quality remains satisfactory in relation to real demand.
In this paper, we explain an approach for a real-time solution on a mobile device that automatically senses and responds to the consumer's context. The approach uses bags of words to record the state and context of the device. Using the Experience Matrix, the device predicts the next states, without the use of a schema, by processing bags of words with the Random Index (RI) algorithm. The index forms an associative store of data that provides the appropriate data–instruction pairs for the next steps or states. This also turned out to be a very effective approach for providing real-time response with almost static storage requirements. Moreover, there was no need for training sequences to learn and record the context. In this article we explain the overall solution architecture, in which a dictionary is the model of the world. The context is specialized by inserting information from mobile sensors into a bag of words, which is used to record and predict events on the mobile device. The Random Index algorithm is used to recall similar events from the past. After recall, the consumer selects one of the predicted choices as an individual way to continue processing, and a new state is entered. We argue that this new approach is a basis for activity-based computing that could potentially replace the current application-based paradigm in mobile communication devices. As an example, we have implemented a predictive browser that uses the event matrix to predict the next pages to be navigated to.
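The following is a hedged sketch of the random indexing idea behind such context recall: each context word receives a sparse random index vector, events are encoded as sums of these vectors, and similar past events are recalled by cosine similarity; the dimensionality, sparsity, and the event store are assumptions, and the Experience Matrix described in the paper is more elaborate.

```python
# Sketch of random indexing for context recall (parameters and the event store
# are assumptions; the paper's Experience Matrix is richer).
import numpy as np

rng = np.random.default_rng(0)
DIM, NONZERO = 512, 8
index_vectors = {}                  # word -> sparse random index vector

def index_vector(word):
    if word not in index_vectors:
        vec = np.zeros(DIM)
        pos = rng.choice(DIM, size=NONZERO, replace=False)
        vec[pos] = rng.choice([-1.0, 1.0], size=NONZERO)
        index_vectors[word] = vec
    return index_vectors[word]

def encode(bag_of_words):
    return sum(index_vector(w) for w in bag_of_words)

def recall(query_bag, stored_events):
    q = encode(query_bag)
    def cos(a, b):
        n = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / n) if n else 0.0
    return max(stored_events, key=lambda ev: cos(q, encode(ev["context"])))

events = [{"context": ["morning", "home", "wifi"], "action": "open_news_page"},
          {"context": ["evening", "tram", "cellular"], "action": "open_timetable"}]
print(recall(["morning", "office", "wifi"], events)["action"])   # -> open_news_page
```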