
Ebook: Databases and Information Systems IV

This publication contains papers that present original results in business modeling and enterprise engineering, database research, data engineering, data quality and data analysis, IS engineering, Web engineering, and the application of AI methods. The contributions come from academics and practitioners from all over the world. We hope that the presented results will contribute to the further development of research in the DB and IS field.
This volume contains the best papers presented at the 7th International Baltic Conference on Databases and Information Systems (BalticDB&IS'2006). The series of Baltic DB&IS conferences was initiated by Janis Bubenko jr. and Arne Solvberg in 1994. The conferences are highly international, bringing together academics and practitioners from all over the world, and are organized in turn by the Baltic countries. The first conference was held in Trakai (1994), followed by conferences in Tallinn (1996), Riga (1998), Vilnius (2000), Tallinn (2002), Riga (2004), and again in Vilnius. BalticDB&IS'2006 took place on July 3–6, 2006. It was organized by the Department of Information Systems (Vilnius Gediminas Technical University) and the Software Engineering Department (Institute of Mathematics and Informatics). The conference was approved by the IEEE Communication Society for Technical Cosponsorship.
The call for papers attracted 84 submissions from 21 countries. In a rigorous reviewing process, the international program committee selected 48 papers for presentation at the conference and 27 for publication in the proceedings published by the IEEE. After the conference, the program committee selected the 20 best papers to be published in this volume. All of these papers have been extended significantly and rewritten completely. They have been reviewed by at least three reviewers from different countries, who evaluated their originality, significance, relevance, and presentation, and found their quality suitable for publication in this volume. These papers present original results in business modeling and enterprise engineering, database research, data engineering, data quality and data analysis, IS engineering, Web engineering, and the application of AI methods. We hope that the presented results will contribute to the further development of research in the DB and IS field.
We would like to express our warmest thanks and acknowledgements to all the people who contributed to BalticDB&IS'2006:
– the authors, who submitted papers to the conference,
– the members of the international program committee and the additional referees, who voluntarily reviewed the submitted papers in order to ensure the quality of the scientific program,
– the sponsors of the conference, who made this conference possible,
– the members of the local organizing team, who voluntarily gave their time and expertise to ensure the success of the conference.
We express our special thanks to Audrone Lupeikiene for all her assistance during the preparation of this volume.
Olegas Vasilecas, Johann Eder, Albertas Caplinskas
Proponents of active systems have proposed Event-Condition-Action (ECA) rules, a mechanism by which behavior is invoked automatically in response to events, without user or application intervention. The environment (the programming language and the operating system) in which a system is built influences how the event detector is designed and implemented. Sentinel provided active capability to an object-oriented database environment implemented in C++. However, the C++ environment had certain limitations that proved a deterrent to implementing some features of active capability. This paper discusses the re-design and implementation of the active subsystem in the Java environment. The main motivations for re-designing and implementing the active subsystem in the Java environment are: i) to overcome the limitations of the C++ environment, and ii) to exploit some of the capabilities provided by the Java environment that are critical for an active system. The paper also provides a novel approach for supporting the creation of rules, and of composite and temporal events, dynamically at run time, which is indispensable for several classes of monitoring applications. This avoids recompilation and restart of the system, which are inappropriate in many environments that require fine-tuning of rules on the fly. A generic set of classes is provided that is designed to handle rules dynamically; this set of classes is application-independent, making the system a general-purpose tool.
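The ECA idea described above can be illustrated with a minimal sketch. The names here (EcaRule, RuleManager, the sample stock event) are purely illustrative and are not Sentinel's actual API; the point is only to show how rules subscribe to events and how rules can be added at run time without recompilation.

```python
# Minimal illustrative sketch of the Event-Condition-Action pattern.
# All class and event names are assumptions, not Sentinel's API.

class EcaRule:
    def __init__(self, event, condition, action):
        self.event = event          # name of the event this rule subscribes to
        self.condition = condition  # predicate over the event's parameters
        self.action = action        # invoked only when the condition holds

class RuleManager:
    """Dispatches events to rules; rules can be added at run time."""
    def __init__(self):
        self._rules = []

    def add_rule(self, rule):       # dynamic rule creation: no recompilation
        self._rules.append(rule)

    def signal(self, event, **params):
        fired = []
        for rule in self._rules:
            if rule.event == event and rule.condition(params):
                rule.action(params)
                fired.append(rule)
        return fired

# Usage: react automatically when a stock quantity falls below a threshold.
log = []
mgr = RuleManager()
mgr.add_rule(EcaRule(
    event="stock_update",
    condition=lambda p: p["quantity"] < 10,
    action=lambda p: log.append(f"reorder {p['item']}"),
))
mgr.signal("stock_update", item="widget", quantity=5)   # condition holds, rule fires
mgr.signal("stock_update", item="gadget", quantity=50)  # condition false, no action
```

Because rules are plain objects registered with the manager, new rules can be introduced while the system is running, which is the property the abstract emphasizes.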
In philosophy, the term ontology has been used since the 17th century to refer both to a philosophical discipline (Ontology with a capital “O”) and to a domain-independent system of categories that can be used in the conceptualization of domain-specific scientific theories. In the past decades there has been a growing interest in the subject of ontology in computer and information sciences. In the last few years, this interest has expanded considerably in the context of the Semantic Web and MDA (Model-Driven Architecture) research efforts, due to the role ontologies are perceived to play in these initiatives. In this paper, we explore the relations between Ontology and ontologies in the philosophical sense and domain ontologies in computer science. Moreover, we elaborate on formal characterizations of the notions of ontology, conceptualization and metamodel, as well as on the relations between these notions. Additionally, we discuss a set of criteria that a modeling language should meet in order to be considered suitable for modeling phenomena in a given domain, and present a systematic framework for language evaluation and design. Furthermore, we argue for the importance of ontology in both of the aforementioned philosophical senses for designing and evaluating a suitable general ontology representation language, and we address the question of whether the so-called Ontology Web languages can be considered suitable general ontology representation languages. Finally, we motivate the need for two complementary classes of modeling languages in Ontology Engineering, addressing two separate sets of concerns.
Modern software engineering attacks its complexity problems by applying well-understood development principles. In particular, the systematic adoption of design patterns has brought a significant improvement to software engineering and is one of the most effective remedies for what was formerly called the software crisis. Design patterns and their utilization constitute an increasing body of knowledge in software engineering. Due to their regular structure, their orthogonal applicability, and the availability of meaningful examples, design patterns can serve as an excellent set of use cases for organizational memories, for software development tools, and for e-learning environments.
Patterns are defined and described on two levels [1]: by real-world examples—e.g., textual or graphical content on their principles, best practices, structure diagrams, code etc.—and by conceptual models—e.g., on categories of application problems, software solutions, deployment consequences etc. This intrinsically dualistic nature of patterns makes them good candidates for conceptual content management (CCM). In this paper we report on the application of the CCM approach to a repository for teaching and training in pattern-based software design as well as for the support of the corresponding e-learning processes.
Workflow time management provides predictive features to forecast potential upcoming deadline violations and proactive strategies to speed up late processes. Existing time management approaches assume that communication with external processes or services is conducted synchronously. This is not the case with inter-organizational processes, which very frequently communicate in an asynchronous manner. Therefore we examine diverse asynchronous communication patterns, show how to map them onto an interval-based time model, and describe their application to inter-organizational workflow environments.
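The core of an interval-based time model can be sketched briefly. The sketch below is an illustration under simple assumptions, not the paper's actual model: each step contributes a [best, worst] duration interval, an asynchronous reply contributes the window between sending the request and the latest admissible arrival of the response, and a predictive deadline check sums worst cases.

```python
# Hedged sketch of an interval-based time model for predictive deadline
# checking. Interval arithmetic and the example durations are assumptions.

from dataclasses import dataclass

@dataclass
class Interval:
    best: float   # earliest possible duration
    worst: float  # latest possible duration

    def __add__(self, other):
        return Interval(self.best + other.best, self.worst + other.worst)

def deadline_violation_possible(steps, deadline):
    """Predictive check: can the accumulated worst case exceed the deadline?"""
    total = Interval(0.0, 0.0)
    for step in steps:
        total = total + step
    return total.worst > deadline

# An asynchronous call contributes the gap between sending the request and
# the latest admissible receipt of the reply, not a synchronous round trip.
local_work = Interval(2.0, 3.0)
async_reply = Interval(1.0, 8.0)   # reply may arrive anywhere in this window
print(deadline_violation_possible([local_work, async_reply], deadline=10.0))
```

With a worst case of 11 time units against a deadline of 10, the check flags a possible violation early, which is the predictive feature the abstract refers to.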
The paper deals with the modeling of enterprise knowledge management. The enterprise knowledge base, together with an explicitly modeled knowledge management activity, is considered the major component of the knowledge-based enterprise. The Knowledge-Based Enterprise Model is developed for the analysis and design of the knowledge management activity domain, as well as for modeling its relationships with the business domain and the information technology domain. A knowledge management layer is identified within the model and is further decomposed and represented as formalized interactions – knowledge management controls.
Ontologies are well suited to describing scientific databases and thus provide appropriate means to link different databases. As a first step, it is useful to generate initial ontologies from the respective database schemas; these can be completed with more specific information later on. If the database schemas change, however, these changes should then automatically be propagated to the respective ontologies. Using a realistic example from the biological field of signal transduction pathways, we demonstrate that the suggested approach is both feasible and reasonable.
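A common way to generate such an initial ontology is to map tables to classes, plain columns to datatype properties, and foreign keys to object properties. The sketch below illustrates this classic mapping under stated assumptions; the schema fragment, dictionary shape, and property naming are hypothetical, not taken from the paper.

```python
# Illustrative sketch of generating an initial ontology from a relational
# schema: tables -> classes, columns -> datatype properties, foreign keys ->
# object properties. Schema format and names are assumptions for exposition.

def schema_to_ontology(tables):
    """tables: {name: {"columns": [...], "foreign_keys": {col: target_table}}}"""
    onto = {"classes": [], "datatype_properties": [], "object_properties": []}
    for name, spec in tables.items():
        onto["classes"].append(name)
        for col in spec["columns"]:
            if col in spec.get("foreign_keys", {}):
                # a foreign key links two tables -> an object property
                target = spec["foreign_keys"][col]
                onto["object_properties"].append((f"has_{target}", name, target))
            else:
                onto["datatype_properties"].append((col, name))
    return onto

# Hypothetical fragment of a signal-transduction schema.
schema = {
    "Pathway": {"columns": ["id", "name"], "foreign_keys": {}},
    "Reaction": {"columns": ["id", "kind", "pathway_id"],
                 "foreign_keys": {"pathway_id": "Pathway"}},
}
onto = schema_to_ontology(schema)
print(onto["object_properties"])  # [('has_Pathway', 'Reaction', 'Pathway')]
```

Re-running the generator after a schema change is the simplest form of the change propagation the abstract calls for; the paper's contribution is to propagate such changes automatically rather than regenerate from scratch.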
Fragmentation and allocation are database distribution design techniques used to improve system performance by increasing data localisation and reducing data transportation costs between different network sites. Fragmentation and allocation are often considered separately, disregarding the fact that they use the same input information to achieve the same objective. Vertical fragmentation is often considered a complicated problem, because the huge number of alternatives makes it nearly impossible to obtain an optimal solution. Therefore, many researchers seek heuristic solutions, among which affinity-based vertical fragmentation approaches form the main stream in the literature. However, using attribute affinities to perform fragmentation cannot really reflect the local needs for data at each site, so there is no guarantee that remote data transportation costs will be reduced. This paper addresses vertical fragmentation and allocation simultaneously in the context of the relational data model. The core of the paper is a heuristic approach to vertical fragmentation that uses a cost model and is targeted at globally minimising these costs. Further, based on the proposed vertical fragmentation, an integrated methodology is proposed that applies vertical and horizontal fragmentation simultaneously to produce mixed fragmentation schemata.
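The difference between affinity-based and cost-driven fragmentation can be made concrete with a toy cost model. The sketch below is not the paper's cost model; it simply counts, per query, the attribute accesses that must cross the network because the needed fragment is allocated at another site, and compares two fragmentation candidates on that basis.

```python
# Hedged sketch of a cost-driven comparison of vertical fragmentation
# candidates (illustrative, not the paper's actual model). The cost counts,
# per query, attributes that live in fragments allocated at a remote site.

def remote_cost(fragments, allocation, queries):
    """fragments: list of attribute sets; allocation: fragment index -> site;
    queries: list of (issuing site, frequency, attributes used)."""
    cost = 0
    for site, freq, attrs in queries:
        for i, frag in enumerate(fragments):
            needed = attrs & frag
            if needed and allocation[i] != site:
                cost += freq * len(needed)   # remote attribute accesses
    return cost

queries = [
    ("S1", 10, {"a", "b"}),   # site S1 queries attributes a, b often
    ("S2", 5,  {"c"}),        # site S2 queries attribute c
]
# Candidate 1: fragments split along the sites' local needs.
c1 = remote_cost([{"a", "b"}, {"c"}], {0: "S1", 1: "S2"}, queries)
# Candidate 2: a single fragment, stored at S1 only.
c2 = remote_cost([{"a", "b", "c"}], {0: "S1"}, queries)
print(c1, c2)  # the split candidate incurs no remote accesses here
```

A heuristic in this spirit evaluates candidates directly against site-local access needs, which is exactly the information that pure attribute-affinity measures discard.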
In many application domains (e.g., WWW mining, molecular biology), large string datasets are available and yet under-exploited. The inductive database framework assumes that both such datasets and the various patterns holding within them can be queried. In this setting, queries which return patterns are called inductive queries, and solving them is one of the core research topics in data mining. Indeed, constraint-based mining techniques on string datasets have been studied extensively. Efficient algorithms enable the computation of complete collections of patterns (e.g., substrings) which satisfy conjunctions of monotonic and/or anti-monotonic constraints in large datasets (e.g., conjunctions of minimal and maximal support constraints). We consider fault-tolerance and softness to be extremely important issues for tackling real-life data analysis. We address some of the open problems in evaluating soft-support constraints, which imply the computation of pattern soft-occurrences instead of classical exact matches. Solving soft-support constraints efficiently is challenging since it prevents the clever use of monotonicity properties. We describe our proposal and provide an experimental validation on real-life clickstream data which confirms the added value of this approach.
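The notion of a soft occurrence can be illustrated with the simplest possible fault-tolerant matching rule. The sketch below is an assumption for exposition (one of several possible softness definitions, not necessarily the paper's): a pattern softly occurs at a position when it matches within a bounded Hamming distance, and soft support sums these occurrences over the dataset.

```python
# Minimal illustration of soft occurrences of a string pattern: count
# positions where the pattern matches within a bounded Hamming distance,
# instead of only exact matches. The thresholds are illustrative.

def soft_occurrences(pattern, text, max_mismatches):
    count = 0
    m = len(pattern)
    for i in range(len(text) - m + 1):
        mism = sum(1 for a, b in zip(pattern, text[i:i + m]) if a != b)
        if mism <= max_mismatches:
            count += 1
    return count

def satisfies_min_soft_support(pattern, sequences, min_support, max_mismatches):
    """A minimal-soft-support constraint over a dataset of sequences."""
    return sum(soft_occurrences(pattern, s, max_mismatches)
               for s in sequences) >= min_support

# "abc" occurs exactly once in "abcabd", but softly twice (via "abd").
print(soft_occurrences("abc", "abcabd", 0))  # exact matches only
print(soft_occurrences("abc", "abcabd", 1))  # exact match + one near-miss
```

The example also hints at why softness breaks monotonicity tricks: extending a pattern can create new near-misses, so soft support does not shrink as predictably as exact support does.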
On-Line Analytical Processing (OLAP) is a technology created to provide users with tools for exploring and navigating data cubes. Unfortunately, with huge and sparse data, exploration becomes a tedious task, and the user's intuition or experience alone does not lead to efficient results. In this paper, we propose to exploit the results of Multiple Correspondence Analysis (MCA) in order to enhance data cube representations and make them more suitable for visualization and thus easier to analyze. Our approach addresses the issue of organizing data in an interesting way and detects relevant facts. Our purpose is to aid the interpretation of multidimensional data through efficient and simple visual effects. To validate our approach, we compute its efficiency by measuring the quality of the resulting multidimensional data representations. To do so, we propose a homogeneity criterion to measure the visual relevance of data representations. This criterion is based on the concepts of geometric neighborhood and similarity between cells. Experimental results on real data have shown the benefit of using our approach on sparse data cubes.
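A homogeneity criterion of this kind can be sketched in miniature. The version below is an illustration of the idea only, not the paper's exact criterion: over a 2-D slice of a cube, it measures the fraction of geometrically neighboring non-empty cell pairs whose values are similar, so representations that cluster similar facts together score higher.

```python
# Hedged sketch of a homogeneity measure on a 2-D data cube representation:
# among neighboring non-empty cells, count the pairs with similar values.
# The neighborhood (right/down) and similarity rule are assumptions.

def homogeneity(grid, sim_threshold):
    similar, neighbors = 0, 0
    rows, cols = len(grid), len(grid[0])
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is None:        # empty cell in a sparse cube
                continue
            for dr, dc in ((0, 1), (1, 0)):   # right and down neighbors
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols and grid[nr][nc] is not None:
                    neighbors += 1
                    if abs(grid[r][c] - grid[nr][nc]) <= sim_threshold:
                        similar += 1
    return similar / neighbors if neighbors else 0.0

# A sparse slice where similar facts are clustered scores 1.0; the same
# facts scattered across the slice have no similar adjacent pairs at all.
clustered = [[1, 1, None], [1, 2, None], [None, None, 9]]
scattered = [[1, None, 9], [None, 2, None], [9, None, 1]]
print(homogeneity(clustered, 1), homogeneity(scattered, 1))
```

Reordering the cube's members (e.g., guided by MCA axes) so that this score increases is the intuition behind making sparse cubes easier to read.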
The goal of this paper is to demonstrate how the interfaces of Data Warehouses and OLAP may be used in a holistic Model-driven development process for enterprise applications. Traditionally, the main responsibilities of Data Warehouses and OLAP concern data analysis and reporting activities. However, the real power of data analysis lies in the possibility of supporting the dynamic formulation of queries and analyzable data structures in response to the emerging needs of users and of other applications of the enterprise and the Web. From the viewpoint of business goals and processes, there is no difference between operational systems and Data Warehouses/OLAP. The proposed Model-driven methodology makes it possible to automate the development of enterprise applications extended with analytical capabilities, to incorporate feedback from OLAP tools into computerized business processes, and to offer analysis results for business improvement. The proposed metamodels and transformations are an extension of our previous work, which focused on generating normalized multidimensional data models on demand. In this paper, wider possibilities for obtaining well-formed warehouse schemas, ensuring the completeness of warehouse data, and using them in a Model-driven development process are considered.
The work is supported by Lithuanian State Science and Studies Foundation according to Eureka programme project “IT-Europe” (Reg. No 3473).
The analysis of events ordered over time, and the discovery of significant hidden relationships in this temporal data, is becoming a major concern of the information society. Using temporal data as raw temporal sequences, without any preprocessing, fails to reveal the key features of these data. Therefore, before applying mining techniques, an appropriate representation of temporal sequences is needed. Our representation of time series can be used in different fields, such as aviation science and earth science, and can also be applied to, for instance, Temporal Web Mining (TWM) [1], [2], [3], [4]. It aims at improving the possibility of specifying and finding an important occurrence. In our new concept, we use data band ranges and areas to determine the importance, or weight, of a segment. Based on the closeness of a segment to a data band range, this representation of time series can help to find a significant event.
This paper focuses on our representation of time series.
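The idea of weighting segments by data band ranges and areas can be illustrated with a small sketch. The band limits, segmentation, and weighting rule below are assumptions chosen for exposition, not the paper's exact definitions: a segment's weight is the area by which it departs from a "normal" band, so segments near or beyond the band boundary surface as candidate significant events.

```python
# Illustrative sketch of weighting time-series segments against a data band
# range: the weight is the area of the segment lying outside a normal band.
# Band limits, segment length, and the weighting rule are assumptions.

def segment_weight(segment, band_low, band_high):
    """Area (sum of per-point excess) of the segment outside the band."""
    weight = 0.0
    for v in segment:
        if v > band_high:
            weight += v - band_high
        elif v < band_low:
            weight += band_low - v
    return weight

def most_significant(series, seg_len, band_low, band_high):
    segments = [series[i:i + seg_len] for i in range(0, len(series), seg_len)]
    weights = [segment_weight(s, band_low, band_high) for s in segments]
    return weights.index(max(weights)), weights

# A series with an excursion above the normal band [3, 7] in its second half.
series = [5, 6, 5, 4, 9, 12, 11, 5, 5, 6]
idx, weights = most_significant(series, seg_len=5, band_low=3, band_high=7)
print(idx, weights)  # the second segment carries the larger weight
```

The segment with the largest out-of-band area is flagged first, which mirrors the abstract's claim that closeness to a data band range helps find a significant event.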
This paper discusses the possibility of taking business rules out of software system code. It proposes a method, based on the presented framework, for using such business rules in dynamic transformations of integrated software system models into final software system code or system requirements, while tracking the relations between them throughout the entire business rule based information systems engineering process.
Aspect-oriented development has become one of the most intensively investigated themes in software development. In this paper, a method is proposed for the reconfigurable modeling of aspect-oriented information systems, in which <<Core>> and <<Aspect>> concerns may be represented separately and combined in different ways without changing their models or implementation. <<Aspect>> concerns are represented consistently throughout the development process, from <<Aspect>> use cases to crosscutting interfaces and templates for tailoring aspects to specific contexts. Examples are given from the IT-Europe project, where aspect-oriented concepts were used for modeling the behavior of software agents performing the self-management functionality of the IT Knowledge Portal.
The work is supported by Lithuanian State Science and Studies Foundation according to Eureka programme project “IT-Europe” (Reg. No 3473).
Current component-oriented theory does not provide adequate support for information systems development; more precisely, it is still taking shape. The main purpose of this paper is to discuss attempts to combine classical enterprise information systems development methodologies with the component-based approach. The paper argues for the use of the component-based paradigm for the development of systems at all three layers of the enterprise. It presents the concepts and principles of this paradigm in the context of enterprise engineering. The integrated enterprise information system components, their structure and their types are discussed and defined.
Web services are one of the most popular ways of building distributed Web Information Systems (WIS). In this paper we propose a distributed implementation of the Hera methodology, a methodology based on models that specify the different aspects of a WIS. The Web services used in the implementation are responsible for capturing the required data transformations built around specific Hera models. A service orchestrator coordinates the different Web services so that the required WIS presentation is built. Based on the degree of support for user interaction with the system, two architectures are identified: one for the construction of static applications, and another for the building of dynamic applications.
This paper explores the challenges of constructing an architecture for inter-organisational collaborative interactions based on Service-Oriented Architecture (SOA), Web services choreographies, and software agents. We present an approach to harmonising the “global” or neutral definition of business collaborations with partner-specific implementations, which can differ in terms of platform, environment, implementation technology, etc. By introducing the concept of pluggable business service handlers into our architecture, we draw on the work carried out by the ebXML initiative, business service interfaces in particular. Due to the increasing need for better management of collaborative interactions, Virtual Organisations (VOs) have become an important tool for the creation and maintenance of federated trust domains among collaboration partners. We look into the ability of software agents to serve as the background support mechanism for the automation and management of the Virtual Organisation lifecycle.
The development of web-based applications needs tools that make application development more effective by using and re-using business rules as well as web services. These tools should incorporate facilities for capturing different semantic aspects of an application. This paper presents a new conceptual and technological framework for using a rule language and rule engine to capture semantics in modern web-based systems. The framework covers two aspects of semantics in web-based systems: business rules and web service composition logic. The technology consists of two main parts: the application server Xstone for creating three-layered systems, and the RqlGandalf rule solver. The middleware server Xstone connects to Oracle and PostgreSQL databases and to the RqlGandalf rule system. The RqlGandalf rule system is targeted at two different tasks: defining and using business logic rules, and rule-based synthesis of complex queries over web services. The presented rule-based system development technology is implemented for the Linux platform as open source software.