
Ebook: Databases and Information Systems X

The importance of databases and information systems to the functioning of 21st century life is indisputable.
This book presents papers from the 13th International Baltic Conference on Databases and Information Systems, held in Trakai, Lithuania, from 1–4 July 2018. Since the first of these events in 1994, the Baltic DB&IS has proved itself to be an excellent forum for researchers, practitioners and PhD students to deliver and share their research in the field of advanced information systems, databases and related areas.
For the 2018 conference, 69 submissions were received from 15 countries. Each paper was assigned for review to at least three referees from different countries. Following review, 24 regular papers were accepted for presentation at the conference, and from these the 14 best papers were selected, in revised and extended form, for publication in this volume, together with a preface and three invited papers written by leading experts. The selected revised and extended papers present original research results in a number of subject areas: information systems, requirements and ontology engineering; advanced database systems; internet of things; big data analysis; cognitive computing; and applications and case studies.
These results will contribute to the further development of this fast-growing field, and will be of interest to all those working with advanced information systems, databases and related areas.
The book contains the post-proceedings of the 13th International Baltic Conference on Databases and Information Systems, held on 1–4 July 2018 in Trakai, Lithuania. Baltic DB&IS 2018 continued this series of biennial conferences, held in Trakai (1994), Tallinn (1996, 2002, 2008, 2014), Riga (1998, 2004, 2010, 2016), and Vilnius (2000, 2006, 2012). The conference was accompanied by a Conference Forum and a Doctoral Consortium. Since the first event, the Baltic DB&IS has proved itself to be an excellent forum for researchers, practitioners and PhD students to deliver and share their research in the field of advanced information systems, databases and related areas.
The conference was organized by the Institute of Data Science and Digital Technologies of Vilnius University, Vilnius Gediminas Technical University, the Lithuanian Academy of Sciences, and the Lithuanian Computer Society.
The International Programme Committee consisted of 91 members from 31 countries, and 69 submissions were received from 15 countries. Each paper was assigned for review to at least three referees from different countries. As a result, 24 regular papers were accepted for presentation at the conference. From the presented papers, the 14 best have been selected, in revised and extended form, for this volume. In addition, we were pleased to receive three invited papers written by leading experts.
The selected revised and extended papers present original research results in a number of subject areas: information systems, requirements and ontology engineering; advanced database systems; internet of things; big data analysis; cognitive computing; applications and case studies. We hope that these results will contribute to the further development of this fast-growing field.
We would like to thank all authors who submitted papers for this book, as well as all participants of the Baltic DB&IS 2018 Conference. Our special thanks go to the invited speakers Prof. Janis Grundspenkis, Prof. Inguna Skadiņa, Prof. Marlon Dumas, Prof. Marko Bajec, and Dr. Milan Zdravković.
We thank all the members of the Programme Committee and the Steering Committee, especially Prof. Albertas Čaplinskas, for their valuable support, as well as the additional referees. We would also like to acknowledge our supporters and partners: Algoritmų Sistemos Ltd., "Go Vilnius" (the official development agency of the City of Vilnius), and the IEEE and its Lithuanian Section. Without their cooperation we would never have achieved the success of this conference. Moreover, we would like to express our deep appreciation for the invaluable help of the organizing team, especially Snieguolė Meškauskienė, Laima Paliulionienė and Jolanta Miliauskaitė, during the conference and its organization.
November 2018
Audrone Lupeikiene
Olegas Vasilecas
Gintautas Dzemyda
While there is no generally accepted definition of a smart city, there exists a wide consensus on what a smart city (or country) should bring to its citizens. It is believed that in such a smart environment, all the conditions required for the quality of life of individuals and communities and for sustainable economic growth are met. In most circumstances, this is associated with the availability of health and social security, the ease of use of public services, efficient mobility, security, job and business opportunities, etc. How to achieve this remains a challenge of the 21st century. In this paper, we describe a program of R&D projects that were launched in late 2016 within the Slovenian Smart Specialization initiative as a response to the aforementioned challenge.
Today, when we are surrounded by various smart devices, the life of our languages is strongly influenced by the technologies that support them in the digital environment. Language technology solutions are particularly important for languages that are small in size. This paper aims to analyze the representation of the languages of the Baltic countries – Estonian, Latvian and Lithuanian – in the digital environment. We analyze the technological challenges for these languages and the most important achievements (recently created language resources and tools) that help to narrow the technological gap with widely used languages, facilitate the use of natural language in human-computer interaction, and minimize the threat of digital extinction. Special attention is paid to the natural language understanding task, machine translation and speech technologies.
The paper summarizes the experience obtained during more than a decade of research on intelligent tutoring systems (ITSs) and, in particular, on their integral part – knowledge assessment systems. Special attention is paid to the challenges and issues of automating knowledge assessment. The possibilities of automation, the teacher's workload, the objectivity of comparison against a standard, and the assessed knowledge level according to Bloom's taxonomy are used as criteria for selecting an appropriate format for students' submitted answers and/or solutions. The focus of the paper is on the motivation to use concept maps (CMs) as a knowledge assessment tool. The advantages of CMs are discussed and the basic conceptions behind the developed adaptive intelligent knowledge assessment system IKAS are presented. A short overview of IKAS highlights the novel theoretical solutions implemented in the system. Lessons learnt from the practical use of IKAS in seventeen different study courses are used to identify the unsolved problems and open questions for future work.
In highly networked systems, such as Industry 4.0, it is essential to ensure that only trustworthy elements participate in the network; otherwise the security of the system might be compromised and its functionality negatively affected. It is therefore important to identify whether the nodes in the network can be trusted by other elements of the system. Industry 4.0 involves both human and artificial participants and imposes human-human, artefact-artefact, and human-artefact relationships in the system. This requires comparable interpretation and representation of trust across several areas. For this purpose the paper discusses three trust interpretations (in the social sciences, in information systems, and in distributed ad-hoc networks) and the respective multidimensional trust models, and shows how the elements of these trust models can be integrated into a trust handling framework for Industry 4.0.
Security in Internet of Things (IoT) systems is an important topic. In a previous study we presented a reference model for security risk management in IoT systems. In this study we analyse how it can be applied. Specifically, we consider the example of a connected vehicle and illustrate how the reference model could help in discovering and explaining security vulnerabilities, defining security risks, and introducing security countermeasures.
In Industry 4.0, large-scale discrete event systems (DES) are becoming increasingly dependent on internet of things (IoT) based real-time distributed supervisory control systems. In order to meet the growing demand of the IoT, new technologies and skills are being developed, which require automated systematic testing to be adopted successfully. Testing such systems requires an integration of computation, communication and control in the test architecture. This may pose a number of issues that are not suitably addressed by traditional centralized test architectures. In this paper, a distributed test framework for testing distributed real-time systems is presented, where online monitors (executable code as annotations) are integrated into the systems to record relevant events. The proposed test architecture is more scalable than centralized architectures with respect to timing constraints and geographical distribution. Assuming the existence of a coverage-correct centralized remote tester, we give an algorithm for partitioning it into distributed local testers, which makes it possible to meet more flexible performance constraints while preserving the remote tester's functionality. The proposed approach not only preserves the correctness of the centralized tester but also allows stronger timing constraints to be met for solving test controllability and observability issues. The effectiveness of the proposed architecture is demonstrated by an illustrative example.
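The partitioning step can be pictured, under broad assumptions, as splitting the centralized tester's globally ordered test sequence by the port (location) each action belongs to, while keeping enough coordination information to preserve the original ordering. The Python sketch below is only a generic illustration of that idea, not the paper's algorithm; the step format, port names and the "notify" coordination hint are assumptions introduced here.

```python
# Generic illustration only: split a centralized, globally ordered test
# sequence into per-port local testers, keeping a coordination hint so the
# global order (and hence controllability) can still be enforced.
from collections import defaultdict

def partition_tester(test_sequence):
    """test_sequence: list of (port, action, deadline_ms) in global order."""
    local_testers = defaultdict(list)
    for i, (port, action, deadline) in enumerate(test_sequence):
        next_port = test_sequence[i + 1][0] if i + 1 < len(test_sequence) else None
        local_testers[port].append({
            "action": action,
            "deadline_ms": deadline,
            # After executing this step, tell the tester at `notify` to proceed.
            "notify": next_port if next_port not in (port, None) else None,
        })
    return dict(local_testers)

central_sequence = [("p1", "stimulus_a", 20), ("p2", "observe_b", 50), ("p1", "stimulus_c", 30)]
for port, steps in partition_tester(central_sequence).items():
    print(port, steps)
```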
A system-to-system communication involving stateful sessions between a clustered service provider and a service consumer is investigated in this paper. An algorithm that decreases the number of calls to failed provider nodes is proposed. It is designed for a clustered client and is based on asynchronous communication. A formal specification of the algorithm is formulated in the TLA+ language and was used to investigate its correctness. An agent-based model was constructed and used to evaluate the effectiveness of the proposed algorithm through simulations.
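As a rough illustration of the problem this algorithm targets (not the TLA+-specified algorithm itself), the sketch below shows a client that remembers which provider nodes recently failed, routes new calls only to nodes it still believes healthy, and re-probes suspected nodes only after a cooldown. The node names, cooldown and simulated RPC are assumptions made for the example.

```python
# Illustrative sketch: skip provider nodes that recently failed instead of
# repeatedly paying for calls to them on the request path.
import asyncio
import time

class ProviderClient:
    def __init__(self, nodes, retry_after=5.0):
        self.nodes = list(nodes)
        self.suspected = {}                      # node -> time it was marked failed
        self.retry_after = retry_after           # cooldown before re-probing a node

    def _healthy(self):
        now = time.monotonic()
        return [n for n in self.nodes
                if n not in self.suspected or now - self.suspected[n] > self.retry_after]

    async def call(self, request):
        candidates = self._healthy() or self.nodes   # if all are suspected, try anyway
        for node in candidates:
            try:
                return await self._send(node, request)
            except ConnectionError:
                self.suspected[node] = time.monotonic()  # remember the failure
        raise RuntimeError("no provider node answered")

    async def _send(self, node, request):
        await asyncio.sleep(0.01)                # stand-in for a real asynchronous RPC
        if node == "p1":                         # simulate a permanently failed node
            raise ConnectionError(node)
        return f"{node} handled {request}"

async def main():
    client = ProviderClient(["p1", "p2", "p3"])
    print(await client.call("open-session"))     # p1 fails once, p2 answers
    print(await client.call("next-request"))     # p1 is now skipped entirely

asyncio.run(main())
```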
Big data technologies are rapidly gaining popularity and becoming widely used, which makes the choice of development methodologies, including approaches for requirements analysis, more acute. There is a position that, in the context of Data Warehousing (DW), as with other Decision Support System (DSS) technologies, defining information requirements (IR) can increase the chances that a project will be successful and achieve its goals. It is therefore important to examine this subject in the context of Big data, given the lack of research in the field of Big data requirements analysis. This paper gives an overview and evaluation of the existing methods for requirements analysis in Big data projects. In addition, we explore solutions for (semi-)automating requirements engineering phases, and reason about applying Natural Language Processing (NLP) to generate potentially useful and previously unstated information requirements.
It has been demonstrated in a number of studies that color preferences, emotions and associations are interlinked in various ways (e.g., [1, 2, 3, 4, 5]). To a lesser extent it is also known that color preferences are task-dependent. In our study we apply rating tasks in an in-group quasi-experimental setting (150 participants) to test the dependence of color ratings on different types of e-services (or interface types) in Latvia. According to our results, color ratings are highly dependent on the type of e-service, but to different degrees. There are colors that are highly variable across interface types (e.g., red, yellow or purple), and there are colors that are rated relatively positively (green, blue) or relatively negatively (black, pink) in all interface types, but to different degrees. Finally, we might generalize our findings to underline the task-dependency of color ratings and of color categorization in general.
Nowadays, customers require customized products instead of standard offers and in many cases prefer web shops and marketplaces for purchasing goods and services. Therefore, intelligent product configuration tools are of great value for many companies. One of the main components of these tools is a configuration model. In order to integrate the configuration model into the Semantic Web, new approaches to the representation of product configuration knowledge are needed. In this paper, we present a layered ontology modelling framework for building semantic product configuration models and provide the corresponding methodological approach to product configuration. The approach enables building distributed product configurators that use semantic configuration models in the form of web ontologies and the Shapes Constraint Language (SHACL) to validate the integrity constraints of individual configurations. SHACL is also used to query the model and to represent additional constraints required by different applications. The proposed method is demonstrated on a real case study from the timber industry that specifically covers thermally modified timber (sawn wood) products.
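For readers unfamiliar with SHACL, the following self-contained sketch shows the kind of integrity check described above: an individual configuration (RDF data) is validated against a SHACL shape. The shape, properties and values (e.g. ex:TimberBoard, ex:thicknessMm) are hypothetical examples invented for illustration, and validation is done here with the rdflib and pySHACL libraries, which are one possible tooling choice rather than the one used in the paper.

```python
# Hypothetical example: validate one product configuration against a SHACL shape.
from rdflib import Graph
from pyshacl import validate

SHAPES_TTL = """
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/cfg#> .

ex:TimberBoardShape a sh:NodeShape ;
    sh:targetClass ex:TimberBoard ;
    sh:property [
        sh:path ex:thicknessMm ;
        sh:datatype xsd:integer ;
        sh:minInclusive 15 ;
        sh:maxInclusive 75 ;        # allowed thickness range for this made-up product family
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
"""

DATA_TTL = """
@prefix ex: <http://example.org/cfg#> .

ex:board1 a ex:TimberBoard ;
    ex:thicknessMm 120 .            # violates the 15..75 constraint above
"""

shapes = Graph().parse(data=SHAPES_TTL, format="turtle")
data = Graph().parse(data=DATA_TTL, format="turtle")

conforms, _report_graph, report_text = validate(data, shacl_graph=shapes)
print(conforms)       # False for this configuration
print(report_text)    # human-readable validation report
```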
The glyphs of the Japanese writing system mainly consist of Chinese characters, of which there are tens of thousands. Because of the number of characters involved, glyph database creation and, more generally, character representation on computer systems have been the focus of numerous research efforts and software systems. Character information is usually represented in a computer system by an encoding. Some encodings specifically target Chinese characters: this is the case, for instance, of Big-5 and Shift-JIS. There are also encodings that aim at covering several, possibly all, writing systems: this is the case, for instance, of Unicode. However, whichever solution is adopted, a significant part of Chinese characters remains uncovered by current encoding methods. Thanks to the properties and relations featured by Chinese characters, they can be classified into a database with respect to various attributes. First, the formal structure of such a database is described in this paper as a character encoding, thus addressing the character representation issue. Importantly, we show that the proposed logical structure overcomes the limitations of existing encodings, most notably the glyph number restriction and the lack of coherency in the code. This theoretical proposal is then followed by the practical realisation of the proposed database and the visualisation of the corresponding code structure. Finally, an additional experiment is conducted to measure the memory size overhead induced by the proposed encoding, compared with the memory size required by an implementation of Unicode. Once the files are compressed, the memory size overhead is significantly reduced.
We address the task of migrating standalone model-based applications to the web, where we face the need to synchronize models between the client and the server. Because of the synchronization overhead and the limited server memory that has to be shared among all connected users, there is a risk that the model storage could become a bottleneck. We propose a model repository that uses an efficient encoding of the model, close in size to its Kolmogorov complexity, which is suitable for sending directly over the network (with almost no serialization overhead). All traversal and query operations can also be implemented efficiently by introducing just three automatic indexes. By utilizing the OS paging mechanism, we are able to hold 10,000 or more repositories on a single server.
Hadoop is a Java-based open source programming framework which supports the processing and storage of large volumes of data in a distributed computing environment. At the same time, an overwhelming majority of organizations are moving their big data processing and storage to the cloud to take advantage of cost reduction: the cloud eliminates the need to invest heavily in infrastructure that may or may not be fully used. This paper shows how organizations can alleviate some of the obstacles faced when trying to make Hadoop run in the cloud.
In clustering of unequal-length time series, how to deal with the unequal lengths is a crucial step. In this paper, the given unequal-length clustering problem is first changed into several equal-length clustering sub-problems by dividing the given group of unequal-length time series into several groups of equal-length subsequences. For each sub-problem, the standard fuzzy c-means algorithm gives a clustering result represented by a partition matrix and cluster centers. In order to obtain the final clustering result, the horizontal collaborative fuzzy clustering algorithm is employed to fuse the clustering results of the sub-problems. In the horizontal collaborative fuzzy clustering algorithm, collaborative knowledge is transmitted by partition matrices whose sizes should be the same as that of the final partition matrix. In the scenario considered here, however, the obtained partition matrices most often have different sizes, so the horizontal collaborative fuzzy clustering algorithm cannot be used directly. This paper presents two new manners of extending the partition matrices to the size of the final partition matrix. In the first manner, each added element of the extended partition matrix is the element in the same position of the extending matrix. In the second manner, each added column is the same as the corresponding column of the extending matrix, while each added element in a pre-existing column is set to 0. The main difference between the two manners is that the normalization condition does not hold for some columns in the first manner; normalization therefore has to be performed for those columns. The paper also investigates the selection of the extending matrix, which is crucial in both manners. Both extending manners allow the partition knowledge to be transmitted effectively and thus give the proposed clustering algorithms good clustering results. Experiments showed the effectiveness of the proposed manners.
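The two extension manners can be made concrete with a small numpy sketch. It assumes the usual fuzzy c-means convention of a clusters-by-objects partition matrix whose columns sum to 1, and it places the smaller matrix in the top-left block of the extended one; the exact placement used in the paper may differ, so this is only an illustration of the two rules as stated above.

```python
# Illustrative numpy sketch of the two partition-matrix extension manners.
import numpy as np

def extend_manner_1(U, E):
    """Manner 1: every added entry is copied from the same position of the
    extending matrix E; columns whose membership degrees no longer sum to 1
    are then renormalized."""
    ext = E.copy()
    ext[:U.shape[0], :U.shape[1]] = U            # keep the original block
    return ext / ext.sum(axis=0)                 # restore sum-to-1 per column

def extend_manner_2(U, E):
    """Manner 2: added columns are copied whole from E, while added entries
    inside pre-existing columns are set to 0."""
    ext = np.zeros(E.shape)
    ext[:U.shape[0], :U.shape[1]] = U            # original block
    ext[:, U.shape[1]:] = E[:, U.shape[1]:]      # whole added columns from E
    return ext

# Toy data: a 2x3 partition matrix U extended to the 2x5 shape of the extending matrix E.
U = np.array([[0.7, 0.2, 0.5],
              [0.3, 0.8, 0.5]])
E = np.array([[0.6, 0.1, 0.4, 0.9, 0.3],
              [0.4, 0.9, 0.6, 0.1, 0.7]])
print(extend_manner_1(U, E))
print(extend_manner_2(U, E))
```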
This paper proposes the Prefix-Root-Postfix-Encoding (PRPE) algorithm, which performs close-to-morphological segmentation of words as part of text pre-processing in machine translation. PRPE is a cross-language algorithm requiring only minor tweaking to adapt it to any particular language, a property which makes it potentially useful for morphologically rich languages with no morphological analysers available. As a key part of the proposed algorithm we introduce the ‘Root alignment’ principle to extract potential sub-words from a corpus, as well as a special technique for constructing words from potential sub-words. In addition, we supplemented the algorithm with specific processing for named entities based on transliteration. We conducted experiments with two different neural machine translation systems, training them on parallel corpora for English-Latvian and Latvian-English translation. Evaluation of translation quality showed improvements in BLEU scores when the data were pre-processed using the proposed algorithm, compared to a couple of baseline word segmentation algorithms. Although we were able to demonstrate improvements in both translation directions and for both NMT systems, they were relatively minor, and our experiments show that machine translation with inflected languages remains challenging, especially when translating into a highly inflected language.
Stock prediction has always been an attractive area for researchers and investors, since the financial gains can be substantial. However, stock prediction can be a challenging task, since stocks are influenced by a multitude of factors whose influence varies rapidly through time. This paper proposes a novel approach (Word2Vec) for stock trend prediction combining NLP and Japanese candlesticks. First, we create a simple language of Japanese candlesticks from the source OHLC data. Then, sentences of words are used to train the NLP Word2Vec model, where the classification of training data also takes trading commissions into account. Finally, the model is used to predict trading actions. The proposed approach was compared to three trading models, Buy & Hold, MA and MACD, according to the yield achieved. We first evaluated Word2Vec on three stocks, Apple, Microsoft and Coca-Cola, where it outperformed the comparative models. Next we evaluated Word2Vec on stocks from the Russell Top 50 Index, where our Word2Vec method was also very successful in the test phase and fell behind only the Buy & Hold method in the validation phase. Word2Vec achieved positive results in all scenarios, while the average yields of MA and MACD remained lower than those of Word2Vec.
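The core idea of turning candlesticks into a "language" can be illustrated with a short sketch: each OHLC candle is mapped to a discrete word, consecutive words form sentences, and a Word2Vec model is trained on them. The word scheme (direction plus a coarse body-size bin), the sentence length and all other parameters below are illustrative assumptions, not the encoding used in the paper; the example uses the gensim library.

```python
# Illustrative sketch: OHLC candles -> discrete "words" -> Word2Vec embeddings.
from gensim.models import Word2Vec

def candle_word(o, h, l, c, n_bins=3):
    """Hypothetical word: candle direction plus a coarse body-size bin."""
    direction = "up" if c >= o else "down"
    rng = max(h - l, 1e-9)
    body_bin = min(int(abs(c - o) / rng * n_bins), n_bins - 1)
    return f"{direction}{body_bin}"

def to_sentences(ohlc_rows, sentence_len=3):
    """Group consecutive candle words into overlapping fixed-length 'sentences'."""
    words = [candle_word(*row) for row in ohlc_rows]
    return [words[i:i + sentence_len] for i in range(len(words) - sentence_len + 1)]

ohlc = [(10.0, 10.5, 9.8, 10.3), (10.3, 10.4, 9.9, 10.0), (10.0, 10.9, 10.0, 10.8),
        (10.8, 11.0, 10.5, 10.6), (10.6, 10.7, 10.1, 10.2), (10.2, 10.6, 10.1, 10.5)]
sentences = to_sentences(ohlc)
model = Word2Vec(sentences, vector_size=16, window=2, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar(sentences[0][0]))   # candle "words" closest to the first one
```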
Most systems that rely on solving the shortest path problem or the constrained shortest path problem demand a real-time response to unexpected real-world events that affect the input graph of the problem, such as car accidents, road repair works or simply dense traffic. We developed a new incremental algorithm that uses data already present in the system in order to quickly update a solution under new conditions. We conducted experiments on real data sets represented by road graphs of the cities of Oldenburg and San Joaquin. We tested the algorithm against that of Muhandiramge and Boland [1] and show that it provides up to a 50% decrease in computation time compared to solving the problem from scratch.