
Ebook: Databases and Information Systems VII

The series of International Baltic Biennial Conferences on Databases and Information Systems (Baltic DB&IS) was initiated by Prof. Janis A. Bubenko Jr. (Royal Institute of Technology, Stockholm, Sweden) and Prof. Arne Sølvberg (Norwegian University of Science and Technology, Norway). The first conference was held in 1994 in Trakai, Lithuania. Professors Sølvberg and Bubenko also helped to arrange funding and even trained researchers from the Baltic countries in Western European conference organization practices. This initiative of Janis A. Bubenko Jr. and Arne Sølvberg was indeed a great idea. It helped to consolidate the research communities in the Baltic countries, to establish contacts with research centres in European countries, both Western and Eastern, and even to renew contacts with researchers in Russia. The Baltic DB&IS conferences now make a major contribution to the development of regional research and boost international cooperation.
The Baltic DB&IS 2012 conference continues this series of successful conferences, held in Riga (1998, 2004, 2010), Tallinn (1996, 2002, 2008), and Vilnius (2000, 2006). This is the tenth (jubilee) conference in the series. The conference took place on July 8–11, 2012, in Vilnius. It was organized by the Lithuanian Academy of Sciences, the Vilnius University Institute of Mathematics and Informatics, and the Lithuanian Computer Society, and was supported by the Research Council of Lithuania.
The International Programme Committee of Baltic DB&IS 2012 consisted of 62 members from 23 countries. Sixty-nine papers from 14 countries were submitted to the conference. As a rule, each paper was reviewed by at least three referees from different countries. As a result of this process, 43 papers were accepted and presented at the conference. Twenty-seven of the best revised papers have been selected for this volume, which also includes two papers presented by invited speakers.
The papers present original results concerning query processing and optimization, data mining, information systems engineering, business process modelling and implementation, service engineering, software evaluation and testing, information systems engineering tools and techniques, e-learning environments, and other related issues.
We would like to express our warmest thanks to all the authors who contributed to the 10th Jubilee Baltic DB&IS Conference. Our special thanks go to the invited speakers, Prof. Janis A. Bubenko Jr., Prof. Marite Kirikova, and Prof. Arne Sølvberg. We are grateful to the members of the International Programme Committee and the additional referees for their careful reviewing of the submitted papers, and we also take this opportunity to express our very special thanks and deep gratitude to all our sponsors and to the conference organizing team. Last, but not least, we also thank all conference participants.
September 2012
Albertas Caplinskas
Gintautas Dzemyda
Audrone Lupeikiene
Olegas Vasilecas
Over the years we have grown used to a 10-fold increase in computer hardware performance every 5 years. This exponential growth has been consistent with Moore's law. Around 2004, exponential performance growth at this staggering pace seemed to have reached a ceiling. The major reason was that further performance increases led to intolerably high levels of heat generation from the circuitry. A dominant challenge for future hardware designs is to limit the heat generation of the electronic circuitry while improving overall computational performance. An important question is whether exponential performance growth can also be expected in the future, and at what pace. The paper discusses which parameters are important for future development and what can be expected in the future. The presentation is intended for a non-specialist audience.
By adapting to more domains, database management systems (DBMSs) continuously increase their functionality. This leads to DBMSs that often include unnecessary functionality, which decreases performance. One result of this trend is that new specialized systems arise that focus only on a certain application scenario but often reimplement already existing functionality. To avoid bloated DBMSs, we propose to introduce variability into DBMS implementations, allowing users to select only the functionality needed for a specific application scenario. In this paper, we focus on the query optimizer, as it is a key component of DBMSs. We describe the potential of tailoring query optimizers. Furthermore, we analyze common and differing functionality of the query optimizers of three industrial DBMSs (SQLite, Oracle, and PostgreSQL) to create a variability model for query optimizers that can be used as a basis for future variability-aware implementations.
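To give a flavour of what a variability model can express, the following minimal sketch encodes optional optimizer features with "requires" constraints and checks whether a user's selection is valid. The feature names and constraints are illustrative assumptions, not the variability model derived in the paper.

```python
# Hypothetical feature model for a query optimizer: each feature maps to the
# set of features it requires. Names and constraints are illustrative only.
FEATURES = {
    "parser": set(),
    "statistics": set(),
    "cost_based_optimization": {"statistics"},
    "join_reordering": {"cost_based_optimization"},
    "materialized_view_rewrite": {"cost_based_optimization"},
}
MANDATORY = {"parser"}

def valid_selection(selection):
    """A selection is valid if it contains all mandatory features and,
    for every selected feature, every feature it requires."""
    if not MANDATORY <= selection:
        return False
    return all(FEATURES[f] <= selection for f in selection)

print(valid_selection({"parser", "statistics", "cost_based_optimization"}))  # True
print(valid_selection({"parser", "join_reordering"}))                        # False
```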
Resource allocation (RA) is one of the key stages of distributed query processing in the Data Grid environment. In the last decade, a number of works dealing with different aspects of the problem have been published. We believe that those studies paid too little attention to such important aspects as the definition of the allocation space and the criterion for determining the degree of parallelism. In this paper we propose an RA method that extends existing solutions in these two points of interest and solves the problem under the specific conditions of the large-scale heterogeneous environment of Data Grids. Firstly, we propose to use the geographical proximity of nodes to data sources to define the Allocation Space (AS). Secondly, we present the principle of execution-time parity between scan and join (build and probe) operations for determining the degree of parallelism and for generating load-balanced query execution plans. We conducted an experiment that demonstrated the superiority of our GeoLoc method, in terms of response time, over the RA method chosen for comparison. The present study also provides a brief description of existing methods and their qualitative comparison with the proposed method.
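The execution-time-parity principle can be illustrated with a small calculation: pick the number of nodes so that the parallel scan time roughly matches the join (build and probe) time. The sketch below uses a hypothetical, deliberately simplified cost model and invented parameter names; it is not the authors' GeoLoc algorithm.

```python
# Minimal sketch of scan/join execution-time parity (hypothetical cost model).
def parallelism_degree(relation_size_mb, scan_rate_mb_s, join_cost_s_per_mb,
                       max_nodes):
    """Choose n so that the parallel scan time matches the join time."""
    scan_time_single = relation_size_mb / scan_rate_mb_s
    join_time = relation_size_mb * join_cost_s_per_mb
    # scan_time_single / n == join_time  =>  n == scan_time_single / join_time
    n = max(1, round(scan_time_single / join_time)) if join_time > 0 else 1
    return min(n, max_nodes)

# Example: a 4 GB relation scanned at 100 MB/s, join costing 0.002 s per MB
print(parallelism_degree(4096, 100, 0.002, max_nodes=32))
```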
Intuitive ontology visualization is key to ontology learning and exchange, as well as to ontology usage in conceptual modeling and semantic database schema design. OWLGrEd is a visual tool for compact, graphical, UML-style rendering and editing of OWL 2.0 ontologies. We describe here the extensibility features of OWLGrEd that allow tailoring the editor for specific ontology-based modeling needs, including custom entity annotation visualizations and the description of integrity constraints for semantic database schemas. We discuss the application of concrete OWLGrEd extensions in the context of ontology-centered information system engineering.
The discovery of high-level, long-term system behavior patterns is essential for several tasks such as system analysis, performance tuning, and adaptive data placement in cloud data centers. This paper describes techniques for mining long-term activities from database query logs. We describe algorithms for the extraction of query groups with similar occurrence patterns and the identification of periodic groups, and we experimentally evaluate the correspondence of the query groups with business processes in the system.
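As an illustration of what identifying a periodic group can involve, the sketch below estimates the dominant period of one query template's hourly occurrence counts via autocorrelation. The log representation and the toy counts are assumptions made for the example; the paper's own algorithms are not reproduced here.

```python
# Minimal sketch: detect the dominant period of a query template, assuming
# the query log has already been reduced to per-hour occurrence counts.
import numpy as np

def dominant_period(counts, min_lag=2):
    """Return (lag, normalized autocorrelation) for the strongest period."""
    x = np.asarray(counts, dtype=float)
    x = x - x.mean()
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    if acf[0] == 0:
        return None
    acf = acf / acf[0]
    lag = min_lag + int(np.argmax(acf[min_lag:len(x) // 2]))
    return lag, acf[lag]

# Hourly counts of one query template over 3 days (a period of 24 is expected)
counts = [5, 1, 0, 0, 0, 0, 2, 8, 20, 25, 22, 18,
          15, 14, 16, 20, 18, 10, 6, 4, 3, 2, 1, 1] * 3
print(dominant_period(counts))
```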
Index selection is an important part of physical database design. Its goal is to select an appropriate set of indexes that minimizes the cost of a given workload under a storage constraint. However, selecting a suitable configuration of indexes is a difficult problem to solve. The problem becomes even more complex for indexes defined on multiple tables, such as bitmap join indexes, since it requires the exploration of a much larger search space. Studies dealing with the bitmap join index selection problem have mainly focused on pruning the search space by means of data mining techniques or heuristic approaches. So far, the data mining based approaches have used closed frequent itemsets to reduce the search space for the selection process. These approaches have two notable shortcomings. Firstly, they generate a huge number of indexes with a lot of redundancy, which is very difficult to manage given system limitations (number of indexes per table, storage space constraint). Secondly, when constructing the extraction context for mining frequent sets of attributes, they use the indexable attributes of each query in the workload only once, which does not reflect the importance of a given query in the workload. Indeed, the queries in a workload are unlikely to have the same probability of being requested. To overcome these limitations, we propose to combine maximal frequent itemsets and query frequencies to improve the quality of the generated indexes. This paper describes an approach that refines the index selection process by incorporating query frequencies into the extraction context for mining frequent sets of attributes. We experimentally show that our approach reduces the storage space and improves the quality of the recommended indexes.
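A minimal sketch of the core idea follows: the extraction context is weighted by query frequencies, and only maximal frequent itemsets of indexable attributes are retained as candidates. The attribute names, frequencies, and support threshold are illustrative and are not taken from the paper's experiments.

```python
# Mine maximal frequent itemsets of indexable attributes from a workload in
# which each query carries a frequency (brute force, for illustration only).
from itertools import combinations

def maximal_frequent_itemsets(weighted_queries, min_support):
    """weighted_queries: list of (set_of_attributes, frequency) pairs."""
    total = sum(freq for _, freq in weighted_queries)
    attrs = sorted({a for q, _ in weighted_queries for a in q})

    def support(itemset):
        return sum(f for q, f in weighted_queries if itemset <= q) / total

    frequent = [frozenset(c)
                for k in range(1, len(attrs) + 1)
                for c in combinations(attrs, k)
                if support(set(c)) >= min_support]
    # Keep only itemsets that have no frequent proper superset (maximal ones)
    return [s for s in frequent if not any(s < t for t in frequent)]

workload = [({"cust_city", "prod_category"}, 40),   # frequently issued query
            ({"cust_city", "time_month"}, 10),
            ({"prod_category", "time_month"}, 5)]
print(maximal_frequent_itemsets(workload, min_support=0.3))
```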
Over the years, more and more software has been produced. The quality of the software architecture, however, plays an important role in a system's operation, as it determines the maintainability and extensibility of an application. Recently, more emphasis has been put on the quality of the design, so that new features can be added with ease. To preserve code readability and extensibility, the software architecture must be refactored from time to time to cope with modifications. Nevertheless, reviewing the whole source code is time-consuming and does not return any immediate surplus, so it is often skipped, causing the software architecture to decay over successive modifications and making it harder to add new functionality in the future. An automated method for recognizing “bad” code would help to solve some of these issues. In this article the authors propose the concept of a refactoring tool which uses an ontology to find “smelly” design and tackle the aforementioned problems. Several aspects of the tool are discussed – how it works and how it can be used to improve the software architecture and thus augment its quality.
The relationship between clustering and data quality has not been thoroughly studied. It is usually assumed that the input dataset does not contain any errors or contains some “noise”, and this concept of “noise” is not related to any data quality concept. In this paper we focus on the four most commonly used data quality dimensions, namely accuracy, completeness, consistency, and timeliness. We evaluate the impact of these quality dimensions on clustering outcomes in order to find out which of them has the most negative effect. Four different clustering algorithms and five real datasets were selected to show the interaction between data quality and cluster validity.
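The kind of experiment described can be set up by deliberately degrading one quality dimension and comparing the resulting partition against the clean baseline. The sketch below does this for accuracy, simulated as random noise in a fraction of cells, using k-means and the adjusted Rand index; the actual datasets, algorithms, error models, and validity measures used in the paper may differ.

```python
# Degrade accuracy of a dataset and measure the effect on clustering results.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)
X, _ = load_iris(return_X_y=True)

def degrade_accuracy(X, fraction, scale=2.0):
    """Add random noise to a fraction of cells (simulated inaccuracy)."""
    Xd = X.copy()
    mask = rng.random(X.shape) < fraction
    Xd[mask] += rng.normal(0.0, scale, size=int(mask.sum()))
    return Xd

baseline = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for frac in (0.0, 0.1, 0.3):
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(
        degrade_accuracy(X, frac))
    # Agreement with the clean baseline; lower values mean stronger impact
    print(frac, round(adjusted_rand_score(baseline, labels), 3))
```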
In this paper the author describes an original approach to the visual analysis of data represented as general graphs, based on a modification of the magnetic-spring model and color-coded cognitive manipulation of graph elements. The theoretical background of magnetic fields as applied to graph drawing is presented, along with a discussion of appropriate visualization techniques for improved information analysis and comprehension. The use of other existing graph layout strategies (e.g., hierarchical, circular) in conjunction with the magnetic-spring approach is also considered for improved data representation capabilities. A concept of an integrated virtual workshop for graph visualization is introduced, which relies on the aforementioned model and can be used in graph visualization systems (GVS). A case study applying the proposed approach is presented, along with conclusions on its usability and potential future work in this field.
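For readers unfamiliar with the magnetic-spring family of layouts, the following sketch shows one iteration of a classical spring embedder extended with a force that rotates edges towards a global field direction (here a parallel field pointing right). The constants and force law are illustrative and do not reproduce the author's modified model.

```python
# One illustrative iteration of a magnetic-spring layout (toy parameters).
import numpy as np

def magnetic_spring_step(pos, edges, field=np.array([1.0, 0.0]),
                         k_spring=0.1, k_rep=0.5, k_mag=0.05,
                         rest_len=1.0, step=0.1):
    disp = np.zeros_like(pos)
    # Pairwise repulsion keeps nodes apart
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i != j:
                d = pos[i] - pos[j]
                dist = np.linalg.norm(d) + 1e-9
                disp[i] += k_rep * d / dist**2
    # Spring force plus magnetic alignment along each edge
    unit_field = field / np.linalg.norm(field)
    for u, v in edges:
        d = pos[v] - pos[u]
        dist = np.linalg.norm(d) + 1e-9
        spring = k_spring * (dist - rest_len) * d / dist
        disp[u] += spring
        disp[v] -= spring
        # Rotate the edge direction towards the field direction
        mag = k_mag * (unit_field - d / dist)
        disp[v] += mag
        disp[u] -= mag
    return pos + step * disp

pos = np.random.default_rng(1).random((4, 2))
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
for _ in range(200):
    pos = magnetic_spring_step(pos, edges)
print(pos.round(2))
```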
The enormous amount of information available over the Internet has forced users to face information overload while browsing the World Wide Web. Alongside search engines, recommender systems and web personalization are seen as a remedy to this problem, since users browse the web according to their informational expectations while having a sort of implicit conceptual model in mind. The latter is partially shared with other site visitors. In this paper we apply ontological modeling of anonymous, ad-hoc web users' behavior to improve online user action prediction for web personalization via recommendations.
The formal foundation of the Topological Functioning Model (TFM) makes it a powerful tool for analyzing the functioning of a problem domain and for formally relating problem domain artifacts to the artifacts that should exist in the solution domain. The TFM captures the system functioning specification in the form of a topological space consisting of functional features and cause-and-effect relations among them, and is represented as a directed graph. The functional features together with the topological relationships contain the information necessary to formally analyze problem domain workflows. To specify the behavior of system execution, a new element is added to the TFM: logical relations. The presence of logical relations within the TFM denotes forking, joining, decision making, and merging during the functioning of the problem and solution domains. Thus it is necessary to identify and carefully analyze the logical relations within the TFM in order to have all the information needed to perform a formal analysis of problem domain workflows. In this paper, problem domain workflows are represented by means of Activity diagrams.
There are many organizations whose everyday life involves numerous tasks performed, or let us say executed, by many different people. Since processes have nowadays become much more complex, a big challenge for humans is simply understanding what has to be done, when, and how in order to reach their goals. Business process models are frequently used in organizations to make a process understandable to its performers and to alleviate their work by connecting the process to the organization's information system, thus making processes human-executable. However, when developing a solution, there are usually only two extremes to choose from – either we use an all-in-one solution for describing process steps or we develop a domain-specific process modeling language from scratch. In this paper we propose the golden mean – a good base for domain-specific process modeling languages and appropriate tooling that can be used in a large portion of related organizations and relatively easily integrated into their information systems. We define what is meant by “good” by binding the process language base to a natural language generator. We also demonstrate the approach on a case study of a process modeling language for the University of Latvia.
There are several data silos containing information about business entities and people, but they are not semantically connected. If trust management is also employed in the process of integrating data sources, then we can expect a much higher success rate in discovering relations among entities. The majority of current mash-up approaches that deal with the integration of information from several data sources omit or do not fully address the aspect of trust. In this paper we discuss the semantic integration of personal and business information from various data sources, coupled with a trust layer. The resulting system is more solid and better defined, since trust is defined both for single entities and for data sources. The case study presented in the paper focuses on the integration of personal information from data sources mainly maintained by government authorities, which are more trustworthy than information from social networks, but we also include other, less trusted sources. The developed SocioLeaks system allows users to traverse the data and discover further relations in a graph-based manner.
The topological functioning model (TFM) of a system can be automatically transformed into behavioral specifications, e.g., UML Activity Diagrams and BPMN diagrams. However, the TFM lacks a formal definition and specification of topological cause-and-effect relations. This paper addresses this challenge by using inference means suggested by classical logic, namely, the notions of necessity and sufficiency and logical operators. The result can be applied to reduce human participation in transformations, as well as to verify the results of system analysis.
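For reference, the textbook classical-logic reading of these notions, as it would apply to a cause-and-effect relation between functional features A and B, can be written as follows; this is a standard formulation, not the paper's exact formalization.

```latex
% Cause A is sufficient for effect B: whenever A occurs, B occurs.
A \rightarrow B
% Cause A is necessary for effect B: B can occur only if A occurs.
B \rightarrow A
% A is both necessary and sufficient for B:
A \leftrightarrow B
```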
Business Process Management Systems (BPM systems) are used to control, analyze, and manage business processes in organizations. BPM systems help to reduce the amount of administrative effort and to focus on the processes which add value. With the current move towards cloud-based Software-as-a-Service (SaaS) architectures, additional requirements for successful BPM implementation have been identified. One of the main challenges is how to integrate SaaS BPM systems with existing on-premises systems, data sources, and devices. In this paper mobile agents are proposed as a technology addressing this new challenge. A mobile agent is a composition of computer software and data which is able to migrate from one device to another autonomously and continue its execution on the destination device. The paper starts with an overview of SaaS BPM and existing approaches addressing SaaS integration challenges. Then the concept of mobile agents is described and the idea of how mobile agents may be used in SaaS BPM integration scenarios is presented. The paper continues with a comparison of widely used integration approaches with the proposed mobile-agent-based mechanism. Finally, the newly proposed architecture is demonstrated in a prototype, outlining its advantages and proposing directions for future research.
It is hard to automatically find a semantically meaningful web service composition over the huge collection of web services available on the web. However, recent results in semantic web service research and technology can be effectively used within specific domains. E-government is one of the sectors that needs horizontal integration. Therefore, semantic web services and their composition become necessary and applicable in this domain. The paper proposes a semantic method for the automatic composition of e-government services. It uses domain ontologies presented in OWL, semantic web services described in SAWSDL, quality of service (QoS) characteristics, ontology reasoning, and an AI planner in order to automatically produce service plans that can be presented in a service execution language. The approach is motivated by a case study from the domain of the Estonian state information systems.
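To illustrate the flavour of QoS-aware composition, the sketch below greedily chains service descriptions (highest QoS first) until the goal concepts are produced; it is a simplified stand-in for the ontology reasoning and AI planning used in the paper, and the service names, concepts, and scores are invented, not taken from the Estonian case study.

```python
# Greedy forward chaining over simplified semantic service descriptions.
def compose(services, available, goal):
    """Chain services (highest QoS first) until all goal concepts hold."""
    plan, facts = [], set(available)
    while not goal <= facts:
        candidates = [s for s in services
                      if s["inputs"] <= facts and not s["outputs"] <= facts]
        if not candidates:
            return None  # no composition found
        best = max(candidates, key=lambda s: s["qos"])
        plan.append(best["name"])
        facts |= best["outputs"]
    return plan

services = [
    {"name": "GetPersonData",  "inputs": {"person_id"},
     "outputs": {"person_record"}, "qos": 0.9},
    {"name": "CheckResidence", "inputs": {"person_record"},
     "outputs": {"residence_ok"},  "qos": 0.8},
    {"name": "IssuePermit",    "inputs": {"person_record", "residence_ok"},
     "outputs": {"permit"},        "qos": 0.7},
]
print(compose(services, available={"person_id"}, goal={"permit"}))
```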
The aim of this article is to identify the factors influencing enterprises' choice of communication channels with government, comparing e-government with traditional service delivery channels such as telephone, mail, fax, or visiting a government office, in the context of media richness theory. Through logistic regression of data from a survey of Estonian and German enterprises, the article assesses the preferred communication channels in terms of the nature of the enterprises' interaction with government and other characteristics of the enterprises, as well as their experience with using e-government services. The analysis indicates that the use of communication channels between enterprises and government differs between countries and depends on the size and location of the enterprise, its economic sector, its growth trend, and some other characteristics. The study also shows that the quality and ease of use of e-government services have an impact on increasing the use of electronic communication channels.
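As an indication of the kind of model behind such an analysis, the sketch below fits a logistic regression for a binary "prefers electronic channels" outcome. The predictors, the toy data, and the regularized scikit-learn estimator are illustrative assumptions; the survey data and the exact model specification are the authors' own.

```python
# Toy logistic regression of channel preference on enterprise characteristics.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Invented stand-in for the survey data (values are for illustration only)
data = pd.DataFrame({
    "prefers_egov":     [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0],
    "employees":        [250, 12, 90, 400, 8, 60, 15, 700, 20, 130, 45, 10],
    "country":          ["EE", "DE", "EE", "DE", "DE", "EE",
                         "DE", "EE", "DE", "EE", "EE", "DE"],
    "growing":          [1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0],
    "used_egov_before": [1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0],
})

X = pd.get_dummies(data.drop(columns="prefers_egov"),
                   columns=["country"], drop_first=True)
clf = LogisticRegression(max_iter=1000).fit(X, data["prefers_egov"])
print(dict(zip(X.columns, clf.coef_[0])))  # sign/size of each factor's effect
```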