Ebook: Thirteenth Scandinavian Conference on Artificial Intelligence
Artificial intelligence (AI) has featured widely in the news recently. It is vital to the continued development of computer science and informatics, and is indispensable for the effective functioning of a multitude of systems in fields such as medicine, economics, linguistics, philosophy, psychology and logical analysis, as well as industry.
This book presents the proceedings of the 13th Scandinavian Conference on Artificial Intelligence (SCAI 2015), held in Halmstad, Sweden, in November 2015. SCAI is the main biennial conference for the AI research communities of Scandinavia, but also attracts the attendance of a wide range of international participants. The book features 17 accepted papers from the conference as well as extended abstracts describing the work of six Ph.D. students who presented their research-in-progress to a panel of experts in the doctoral symposium which forms part of the conference. A wide range of topics are covered, including machine learning, data mining, logical reasoning, robotics and planning, and the papers included here focus on both the theory and practical applications of AI.
The book will be of interest to all those wishing to keep abreast of the latest developments in the field of AI.
The Thirteenth Scandinavian Conference on Artificial Intelligence is held in Halmstad, Sweden, on the 4th and 5th of November 2015. The conference is organised by Halmstad University together with SAIS, the Swedish Artificial Intelligence Society. SCAI is the main biennial conference for the AI research communities of Scandinavia. The first SCAI conference was held in 1988 in Tromsø, Norway. Although its primary function is to assemble researchers from the Nordic countries, it has consistently attracted broad international participation.
This year's program consists of a Doctoral Symposium, two workshops (Intelligent and Connected Vehicles and Intelligent Environments Supporting Health and Wellbeing) and the main conference, with seventeen regular paper presentations and two invited speakers (Christine Chevallereau and Christopher Nugent), as well as the annual meeting of the Swedish AI Society (SAIS). The contributions submitted for presentation at SCAI 2015 cover a wide range of topics, including machine learning, data mining, logical reasoning, robotics, planning, and more. Papers focusing on theory as well as those presenting applications of AI are well represented in the selection.
This year we feature the second edition of the SCAI Doctoral Symposium, where PhD students have the opportunity to present their research-in-progress to a panel of experts. Extended abstracts of these presentations are included in these proceedings.
The conference organisers and editors of this volume would like to thank all the authors who submitted their papers to the conference, as well as the members of the program committee for their work in evaluating those submissions. The PC members ensured that each contribution was reviewed by at least two, and in most cases by at least three, competent referees.
Sławomir Nowaczyk
September 2015
Halmstad
Electroencephalography (EEG) measures the neural activity of the central nervous system, is widely used in diagnosing brain activity, and therefore plays a vital role in clinical and Brain-Computer Interface applications. However, analysis of EEG signals is often complex, since the recordings are frequently contaminated with noise or artifacts, such as ocular and muscle artifacts, which can mislead the diagnosis. Identifying artifacts in the EEG signal and handling them in a proper way is therefore becoming an important and interesting research area. This paper presents an automated EEG artifact handling approach that combines Independent Component Analysis (ICA) with a second-order clustering approach, which in turn combines hierarchical clustering with a Gaussian mixture model. The effectiveness of the proposed approach has been examined on real EEG recordings. According to the results, artifacts in the EEG signals are identified and removed successfully, and the artifact-handled EEG signals are acceptable under visual inspection.
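To make the pipeline concrete, here is a minimal sketch assuming scikit-learn and synthetic placeholder data; the per-component features and the Gaussian-mixture labelling of components are illustrative simplifications, not the authors' exact second-order clustering method.

```python
# Hedged sketch: ICA-based EEG artifact handling, loosely following the
# pipeline described above. X is a placeholder for (samples, channels) EEG.
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 8))   # hypothetical 8-channel recording

ica = FastICA(n_components=8, random_state=0)
S = ica.fit_transform(X)             # independent components

# Simple per-component features; high kurtosis often flags ocular artifacts.
feats = np.c_[kurtosis(S, axis=0), S.var(axis=0)]
labels = GaussianMixture(n_components=2, random_state=0).fit_predict(feats)

# Treat the smaller cluster as artifactual and zero those components out.
artifact = labels == np.argmin(np.bincount(labels))
S_clean = S.copy()
S_clean[:, artifact] = 0.0
X_clean = ica.inverse_transform(S_clean)  # reconstructed, artifact-reduced EEG
```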
In this paper we perform an exploratory analysis of a financial data set from a Spanish bank. Our goal is risk prediction in credit operations, and as data is collected continuously and reported on a monthly basis, this gives rise to a streaming data classification problem. Our analysis reveals some practical problems that have not previously been thoroughly analyzed in the context of streaming data analysis: the class labels are not immediately available, and the relevant predictive features and entities under study (in this case the set of customers) may vary over time. In order to address these problems, we propose to use a dynamic classifier with wrapper feature subset selection to find relevant features at different time steps. The proposed model is a special case of a more general framework that can also accommodate more expressive models containing latent variables as well as more sophisticated feature selection schemes.
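As a hedged illustration of the wrapper idea (not the authors' dynamic model), the sketch below greedily grows a feature subset by cross-validated accuracy; the data, classifier, and stopping rule are hypothetical stand-ins.

```python
# Hedged sketch: greedy forward wrapper feature selection.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def score(X, y, feats):
    return cross_val_score(LogisticRegression(max_iter=1000),
                           X[:, feats], y, cv=5).mean()

def wrapper_select(X, y, max_feats=5):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_feats:
        scores = {f: score(X, y, selected + [f]) for f in remaining}
        best = max(scores, key=scores.get)
        # Stop once no candidate improves on the current subset.
        if selected and scores[best] <= score(X, y, selected):
            break
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 10))
y = (X[:, 0] + 0.5 * X[:, 3] + rng.standard_normal(300) > 0).astype(int)
print(wrapper_select(X, y))  # likely picks features 0 and 3 first
```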
In machine learning we are often faced with the problem of incomplete data, which can lead to lower predictive accuracies in both feature-based and relational machine learning. It is therefore important to develop techniques to compensate for incomplete data. In inductive logic programming (ILP) incomplete data can be in the form of missing values or missing predicates. In this paper, we investigate whether an ILP learner can compensate for missing background predicates through predicate invention. We conduct experiments on two datasets in which we progressively remove predicates from the background knowledge whilst measuring the predictive accuracy of three ILP learners with differing levels of predicate invention. The experimental results show that as the number of background predicates decreases, an ILP learner which performs predicate invention has higher predictive accuracies than the learners which do not perform predicate invention, suggesting that predicate invention can compensate for incomplete background knowledge.
Given multiple user-input rank lists, rank aggregation, i.e., combining the rankings to obtain a consensus (joint ordering), is an interesting and classical domain of research, pertinent to applications across information retrieval, natural language processing, web search, etc. Efficient computation of such a joint ranking poses a challenging task, as optimal rank aggregation based on the Kemeny measure has been shown to be NP-hard.
This paper proposes a novel rank aggregation framework, CRAAR, incorporating a linear combination of the input rank lists, based on user groups exhibiting similar ranking preferences, obtained via unsupervised hierarchical clustering. To this end, we also present the Accordance Ratio as a measure to capture the inter-user preference similarity. Extensive experiments on real datasets show an improved performance of our approach (based on optimal Kemeny ranking) over the state of the art, thereby better capturing the preference of the majority.
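For readers unfamiliar with the Kemeny measure, the brute-force sketch below makes the objective explicit: the consensus is the permutation minimizing the total number of pairwise disagreements (Kendall tau distance) with the input lists. It is exponential in the number of items, which is precisely why heuristic frameworks such as CRAAR are needed; the toy lists are hypothetical.

```python
# Hedged sketch: exact Kemeny-optimal aggregation for tiny item sets only.
from itertools import combinations, permutations

def kendall_tau(a, b):
    """Number of item pairs ordered differently in rankings a and b."""
    pos_a = {x: i for i, x in enumerate(a)}
    pos_b = {x: i for i, x in enumerate(b)}
    return sum((pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
               for x, y in combinations(a, 2))

def kemeny_optimal(rank_lists):
    items = rank_lists[0]
    return min(permutations(items),
               key=lambda r: sum(kendall_tau(r, rl) for rl in rank_lists))

lists = [(1, 2, 3, 4), (2, 1, 3, 4), (2, 3, 1, 4)]
print(kemeny_optimal(lists))  # consensus ordering: (2, 1, 3, 4)
```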
For an information security manager it can be a daunting task to keep up with new cyber vulnerabilities and assess which to prioritize for patching. Every day, numerous new vulnerabilities and exploits are reported for a wide variety of software configurations. We use machine learning to make automatic predictions for unseen vulnerabilities based on previous exploit patterns. As sources of historic vulnerability data, we use the National Vulnerability Database (NVD) and the Exploit Database (EDB). Our work shows that common words from the vulnerability descriptions, external references, and vendor products are the most important features to consider. Common Vulnerability Scoring System (CVSS) scores, categorical parameters, and Common Weakness Enumeration (CWE) numbers are redundant when a large number of common words are used, since this information is often contained within the vulnerability description. Using machine learning algorithms, it is possible to obtain a prediction accuracy of 83% for binary classification. The performance differences between some of the algorithms are marginal with respect to metrics such as accuracy, precision, and recall. The best classifier with respect to both performance metrics and execution time is a linear-time Support Vector Machine (SVM) algorithm. We conclude that in order to get better predictions the data quality must be enhanced.
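A minimal sketch of the text-feature approach, assuming scikit-learn; the toy descriptions and labels are hypothetical stand-ins for NVD/EDB data, and LinearSVC plays the role of the linear-time SVM.

```python
# Hedged sketch: binary exploit prediction from vulnerability descriptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

descriptions = [
    "buffer overflow in http server allows remote code execution",
    "cross-site scripting in admin panel",
    "sql injection in login form allows authentication bypass",
    "information disclosure via verbose error messages",
]
exploited = [1, 0, 1, 0]  # hypothetical labels: exploit observed or not

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(descriptions, exploited)
print(clf.predict(["remote buffer overflow in ftp server"]))
```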
In the automotive industry, cost effective methods for predictive maintenance are increasingly in demand. The traditional approach for developing diagnostic methods on commercial vehicles is heavily based on knowledge of human experts, and thus it does not scale well to modern vehicles with many components and subsystems.
In previous work we have presented a generic self-organising approach called COSMO that can detect, in an unsupervised manner, many different faults. In a study based on a commercial fleet of 19 buses operating in Kungsbacka, we have been able to predict, for example, fifty percent of the compressors that break down on the road, in many cases weeks before the failure.
In this paper we compare those results with a state-of-the-art approach currently used in the industry, and we investigate how features suggested by experts for detecting compressor failures can be incorporated into the COSMO method. We perform several experiments, using both real and synthetic data, to identify issues that need to be considered to improve the accuracy. The final results show that the COSMO method outperforms the expert method.
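The following sketch conveys the flavour of fleet-based, unsupervised deviation detection; the histogram features, distance measure, and z-score test are illustrative assumptions, not the COSMO implementation.

```python
# Hedged sketch: each vehicle is summarized by a histogram of some on-board
# signal; a vehicle whose histogram sits far from the fleet consensus is
# flagged as deviating. Data is synthetic; vehicle 7 is made anomalous.
import numpy as np

rng = np.random.default_rng(2)
fleet = [np.histogram(rng.normal(0, 1, 1000), bins=20, range=(-4, 4),
                      density=True)[0] for _ in range(19)]
fleet[7] = np.histogram(rng.normal(1.5, 1, 1000), bins=20, range=(-4, 4),
                        density=True)[0]

H = np.array(fleet)
D = np.linalg.norm(H[:, None, :] - H[None, :, :], axis=2)  # pairwise distances
central = D.sum(axis=1).argmin()        # the "most central" fleet member
dist = D[:, central]
z = (dist - dist.mean()) / dist.std()   # standardized deviation score
print("most deviating vehicle:", int(z.argmax()))  # expect 7
```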
In real-time strategy games players make decisions and control their units simultaneously. Players are required to make decisions under time pressure and should be able to control multiple units at once in order to be successful. We present the design and implementation of a multi-agent interface for the real-time strategy game STARCRAFT: BROOD WAR. This makes it possible to build agents that control each of the units in a game. We make use of the Environment Interface Standard, thus enabling different agent programming languages to use our interface, and we show how agents can control the units in the game in the Jason and GOAL agent programming languages.
Building complex systems such as autonomous robots usually requires the integration of a wide variety of components, including high-level reasoning functionalities. One important challenge is integrating the information in a system by setting up the data flow between the components. This paper extends our earlier work on semantic matching with support for adaptive on-demand semantic information integration based on ontology-based introspection. We take two important standpoints. First, we consider streams of information, to handle the fact that information often becomes available continually and incrementally. Second, we explicitly represent, in an ontology, the semantics of the components and the information that they can provide. Based on the ontology, our custom-made stream configuration planner automatically sets up the stream processing needed to generate the requested streams of information. Furthermore, subscribers are notified when the properties of a stream change, which allows them to adapt accordingly. Since the ontology represents both the system's information about the world and its internal stream processing, many other powerful forms of introspection are also made possible. The proposed semantic matching functionality is part of the DyKnow stream reasoning framework and has been integrated in the Robot Operating System (ROS).
This study introduces the conformal prediction framework to the task of predicting the presence of adverse drug events in electronic health records with an associated measure of statistically valid confidence. The imbalanced nature of the problem was addressed both by evaluating different machine learning algorithms, and by comparing different types of conformal predictors. A novel solution was also evaluated, where different underlying models, each model optimized towards one particular class, were combined into a single conformal predictor. This novel solution proved to be superior to previously existing approaches.
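A minimal sketch of an inductive conformal predictor, assuming scikit-learn and synthetic data; the class-conditional (Mondrian) variant relevant to the imbalance discussed above would calibrate scores per class, which is omitted here for brevity.

```python
# Hedged sketch: inductive conformal prediction with a random forest.
# Nonconformity = 1 - predicted probability of the true class.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
X = rng.standard_normal((1000, 5))
y = (X[:, 0] + rng.standard_normal(1000) > 1.0).astype(int)  # imbalanced

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.3,
                                            random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
cal_scores = 1 - model.predict_proba(X_cal)[np.arange(len(y_cal)), y_cal]

def predict_set(x, eps=0.1):
    """Return all labels whose conformal p-value exceeds eps."""
    probs = model.predict_proba(x.reshape(1, -1))[0]
    region = []
    for label, p in enumerate(probs):
        pval = (np.sum(cal_scores >= 1 - p) + 1) / (len(cal_scores) + 1)
        if pval > eps:
            region.append(label)
    return region

print(predict_set(np.array([2.0, 0, 0, 0, 0])))  # e.g. [1]
```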
Given a simple graph G=(V,E), the problem dealt with in this paper asks to transform G, by removing only a minimum number of edges, into a disjoint union of cliques. This optimization problem is known to be NP-hard and is referred to as the Cluster Deletion (CD) problem. This has motivated us to improve the chance of finding a disjoint union of cliques within a limited amount of search. To this end, we propose an encoding of CD as a Weighted Constraint Satisfaction Problem (WCSP), a framework which has been widely used for solving hard combinatorial problems.
We compare our approach with a fixed-parameter tractable (FPT) algorithm, one of the most widely used algorithms for solving the Cluster Deletion problem, and experimentally show that the best results are obtained using the new encoding. We report a comparison of the solution quality and running times of the WCSP and FPT algorithms on both random graphs and protein similarity graphs derived from the COG dataset.
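To make the objective concrete, the sketch below evaluates the Cluster Deletion cost of a candidate partition on a toy graph: a partition is feasible only if every part induces a clique, and its cost is the number of inter-part edges that must be deleted. The WCSP encoding in the paper searches this space far more efficiently than naive enumeration would.

```python
# Hedged sketch: the Cluster Deletion objective on a toy instance.
from itertools import combinations

def deletion_cost(graph_edges, partition):
    E = {frozenset(e) for e in graph_edges}
    part_of = {v: i for i, part in enumerate(partition) for v in part}
    # Feasible only if every part induces a clique in the original graph.
    for part in partition:
        if any(frozenset((u, v)) not in E for u, v in combinations(part, 2)):
            return None
    # Cost: edges running between different parts must be deleted.
    return sum(1 for e in E if len({part_of[v] for v in e}) == 2)

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
print(deletion_cost(edges, [{1, 2, 3}, {4, 5}]))  # 1 (delete edge 3-4)
```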
Today, research is ongoing on the various essential functions needed to bring autonomous vehicles to the roads. However, manually driven vehicles will remain on the roads for many years before vehicles become fully automated, and in complex situations automated vehicles will need human assistance for a long time. Since human drivers must be monitored and able to take over control at short notice, driver monitoring is even more important for road safety in the context of autonomous vehicles, to keep the driver alert and awake. Yet limited effort has been devoted to the full integration between automated vehicle and human driver. This paper provides an overview of autonomous vehicles and of unobtrusive driver-monitoring approaches that can be implemented in future autonomous vehicles to monitor the driver, e.g., to diagnose and predict stress and fatigue in semi-automated vehicles.
A set of novel time-domain features characterizing multi-channel surface EMG (sEMG) signals of six muscles (rectus femoris, vastus lateralis, and semitendinosus of each leg) is proposed for the prediction of physiological parameters considered important in cycling: blood lactate concentration and oxygen uptake. Fifty-one different features, including phase shifts between muscles, active time percentages, sEMG amplitudes, as well as symmetry measures between both legs, were defined from sEMG data and used to train linear and random forest models. The random forest models achieved coefficients of determination R²=0.962 (lactate) and R²=0.980 (oxygen). The linear models were less accurate. Feature pruning enabled creating accurate random forest models (R²>0.9) using as few as 7 (lactate) or 4 (oxygen) time-domain features. sEMG amplitude was important for both types of models. Models predicting lactate also relied on measurements describing the interaction between front and back muscles, while models predicting oxygen uptake relied on front muscles only, but also included interactions between the two legs.
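A hedged sketch of the modelling step, assuming scikit-learn; the synthetic features stand in for the 51 sEMG features, and the importance ranking illustrates the kind of feature pruning described above.

```python
# Hedged sketch: random forest regression on synthetic sEMG-style features,
# followed by importance-based feature pruning.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 51))            # 51 hypothetical features
lactate = 2 + 1.5 * X[:, 0] - X[:, 10] + 0.1 * rng.standard_normal(200)

rf = RandomForestRegressor(n_estimators=300, random_state=0)
print("R^2:", cross_val_score(rf, X, lactate, cv=5, scoring="r2").mean())

rf.fit(X, lactate)
top = np.argsort(rf.feature_importances_)[::-1][:7]
print("top features:", top)  # keep only the most informative ones
```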
Cloud computing has recently drawn much attention due to the benefits it can provide in terms of high-performance and parallel computing. However, many industrial applications require a certain quality of service, which calls for efficient resource management of the cloud infrastructure. In this paper, we focus mainly on the problem of allocating services, usually executed within virtual machines, in the cloud network. To meet the quality-of-service requirements, we investigate algorithms that achieve load balancing, which may require migrating virtual machines between nodes/servers at runtime, considering both CPU and communication resources. Three allocation algorithms, based on a Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and a best-fit heuristic, are applied in this paper. We evaluate the three algorithms in terms of cost/objective function and calculation time. In addition, we explore how tuning different parameters (including population size, probability of mutation, and probability of crossover) affects the cost/objective function in GA. Based on the evaluation, we conclude that algorithm performance depends on the circumstances, i.e., available resources, number of VMs, etc.
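Of the three strategies, the best-fit heuristic is the simplest to illustrate; the sketch below places each VM on the server whose remaining CPU capacity fits it most tightly. Capacities and demands are hypothetical, and communication resources are ignored for brevity.

```python
# Hedged sketch: best-fit VM-to-server allocation on CPU capacity only.
def best_fit(vm_demands, server_capacities):
    remaining = list(server_capacities)
    placement = {}
    for vm, demand in enumerate(vm_demands):
        candidates = [(cap - demand, s) for s, cap in enumerate(remaining)
                      if cap >= demand]
        if not candidates:
            raise ValueError(f"VM {vm} cannot be placed")
        _, server = min(candidates)     # tightest feasible fit
        placement[vm] = server
        remaining[server] -= demand
    return placement

print(best_fit([4, 2, 7, 1, 3], [8, 10, 6]))
```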
In the English-speaking world, the idea of human-robot interaction in natural language is well established, but the tools for other languages are lacking; in particular, Scandinavian languages are not supported by robot programming environments. The RobotLab at Lund University has a programming environment with natural language programming in English. In this paper, a module for natural language programming in Swedish is presented. Program statements for force-based assembly tasks for an industrial robot are extracted from unstructured Swedish text. The goal is to create action sequences with motion and force constraints for the robot. The method produces tuples with actions and objects and uses dependency relations to find nested temporal conditions.
Learning to recognize common activities such as traffic activities and robot behavior is an important and challenging problem related both to AI and robotics. We propose an unsupervised approach that takes streams of observations of objects as input and learns a probabilistic representation of the observed spatio-temporal activities and their causal relations. The dynamics of the activities are modeled using sparse Gaussian processes and their causal relations using a probabilistic graph. The learned model supports, in limited form, both estimating the most likely current activity and predicting the most likely future activities. The framework is evaluated by learning activities in a simulated traffic monitoring application and by learning the flight patterns of an autonomous quadcopter.
Conditioning on some set of confounders that causally affect both treatment and outcome variables can be sufficient for eliminating bias introduced by all such confounders when estimating the causal effect of the treatment on the outcome from observational data. In the potential outcome framework for causal inference this is done by including them in the propensity score model, whereas in the causal graphical modeling framework one conditions on them directly. However, in the former framework it is confusing when the modeler finds a variable that is non-causally associated with both the treatment and the outcome. Some argue that such variables should also be included in the analysis to remove bias. Others argue that they introduce no bias, so they should be excluded, and that conditioning on them introduces a spurious dependence between the treatment and the outcome, resulting in extra bias in the estimate. We show that both arguments can be in error, depending on the context. When such a variable is found, neither action may give the correct causal effect estimate, and one action must be selected over the other in order to be less wrong. We discuss how to select the better action.
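A toy simulation makes the dilemma concrete: when such a variable M is linked to treatment T and outcome Y only through hidden causes U1 and U2 (an "M-structure"), adjusting for M biases an otherwise correct null estimate. The linear structural equations below are illustrative assumptions, not the paper's setup.

```python
# Hedged sketch: M-bias in a linear simulation. T has no effect on Y, yet
# adjusting for the non-causally associated M distorts the estimate.
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
U1 = rng.standard_normal(n)
U2 = rng.standard_normal(n)
M = U1 + U2 + 0.5 * rng.standard_normal(n)   # collider of U1 and U2
T = U1 + rng.standard_normal(n)
Y = U2 + rng.standard_normal(n)              # true causal effect of T: zero

def ols_slope(x, y):
    return np.cov(x, y)[0, 1] / np.var(x)

print("unadjusted:", ols_slope(T, Y))        # ~0, correct
# Adjust for M by residualizing both T and Y on M (Frisch-Waugh-Lovell).
resid_T = T - ols_slope(M, T) * M
resid_Y = Y - ols_slope(M, Y) * M
print("adjusted for M:", ols_slope(resid_T, resid_Y))  # biased, ~ -0.45
```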
Recent research has been carried out on multiple kernel methods for data fusion in self-organizing maps and clustering, a vector quantization and dimensionality reduction technique. However, the fixed grid size of the Self-Organizing Map (SOM) requires knowing the nature of the data in advance. This hinders the capabilities of SOM in various aspects (e.g., it limits its applicability to complex data sets). This paper explores the adaptation of multiple kernel methods for a dynamically growing SOM, to adapt to variations in the distribution of the data. The primary focus has been the application of kernel methods in the form of similarity measures, while using the kernels in an optimal combination. This approach has been applied in the domain of road traffic visual information analysis. Several application-specific video feature extraction techniques have been explored, based on recent research, to preprocess the video data in order to make it usable with the core algorithms. The inability to capture more useful knowledge from individual sources has been demonstrated and successfully addressed by means of fusion with multiple kernels. The individual performance of similar techniques in clustering is compared with the performance of this novel fusion approach. The experimental results convey the major improvements expected from the data fusion, including novelty detection and stability of results. Moreover, they signify the necessity of heterogeneous data fusion in unsupervised learning to leverage complex real-world data.
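The kernel-fusion idea can be sketched independently of the growing SOM: combine per-source kernels convexly and cluster on the fused similarity. Since no growing SOM is available in scikit-learn, spectral clustering stands in for it here; the two views, the fusion weight, and the data are all hypothetical.

```python
# Hedged sketch: multiple-kernel fusion of two heterogeneous feature views.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(6)
n = 120
labels_true = np.repeat([0, 1, 2], n // 3)
view1 = labels_true[:, None] + 0.4 * rng.standard_normal((n, 2))  # e.g. motion
view2 = labels_true[:, None] + 0.8 * rng.standard_normal((n, 3))  # e.g. texture

K1, K2 = rbf_kernel(view1), rbf_kernel(view2)
beta = 0.7                       # fusion weight; tuned in practice
K = beta * K1 + (1 - beta) * K2  # convex combination stays a valid kernel

pred = SpectralClustering(n_clusters=3, affinity="precomputed",
                          random_state=0).fit_predict(K)
print(pred[:10])
```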
We consider the problem of finding collision-free trajectories for a fleet of automated guided vehicles (AGVs) working in ship ports and freight terminals. Our solution computes trajectories that allow each AGV to pick up one or more containers and transport them to a given goal without colliding with other AGVs and obstacles. We propose an integrated framework for solving the goal assignment and trajectory planning problem, minimizing the maximum cost over all vehicle trajectories using the classical Hungarian algorithm. To deal with the dynamics in the environment, we refine our final trajectories with CHOMP (Covariant Hamiltonian Optimization for Motion Planning) in order to trade off between path smoothness and dynamic obstacle avoidance.
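A minimal sketch of the goal-assignment step, using SciPy's Hungarian-algorithm implementation on a hypothetical cost matrix; the min-max (bottleneck) objective described above would require a variant, so the standard sum-minimizing form is shown for brevity.

```python
# Hedged sketch: assigning AGVs to pickup goals via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

cost = np.array([[4.0, 1.0, 3.0],   # rows: AGVs, cols: pickup goals
                 [2.0, 0.5, 5.0],   # entries: hypothetical trajectory costs
                 [3.0, 2.0, 2.0]])

rows, cols = linear_sum_assignment(cost)
for agv, goal in zip(rows, cols):
    print(f"AGV {agv} -> goal {goal} (cost {cost[agv, goal]})")
print("total cost:", cost[rows, cols].sum())
```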
My research centres on AI planning, with my current work focused on optimising low-quality plans generated by existing planners. Since the beginning of my PhD studies, I have concentrated on studying existing plan-optimisation techniques. Efforts have been made to compare, evaluate, and test them on several benchmark domains and planning engines. From my work, I have concluded that adapting different techniques to work together is pivotal to establishing an efficient technique. Another important aspect to study was the theoretical properties of these techniques, including completeness and complexity.