Ebook: Applications of Intelligent Systems
The deployment of intelligent systems to tackle complex processes is now commonplace in many fields, from medicine and agriculture to industry and tourism.
This book presents scientific contributions from the 1st International Conference on Applications of Intelligent Systems (APPIS 2018) held at the Museo Elder in Las Palmas de Gran Canaria, Spain, from 10 to 12 January 2018. The aim of APPIS 2018 was to bring together scientists working on the development of intelligent computer systems and methods for machine learning, artificial intelligence, pattern recognition, and related techniques with an emphasis on their application to various problems.
The 34 peer-reviewed papers included here cover an extraordinarily wide variety of topics – everything from semi-supervised learning to matching electro-chemical sensor information with human odor perception – but what they all have in common is the design and application of intelligent systems and their role in tackling diverse and complex challenges.
The book will be of particular interest to all those involved in the development and application of intelligent systems.
APPIS 2018 featured three plenary lectures by the invited speakers: Petia Radeva from the Universitat de Barcelona, Michael Biehl from the University of Groningen, and Theo Gevers from the University of Amsterdam.
We would like to thank the members of the International Technical Program Committee, who provided timely and thorough reviews of the submitted papers and guaranteed that only high-quality contributions were selected for inclusion in these proceedings. We would also like to thank the University of Groningen, the University of Las Palmas de Gran Canaria and the Gran Canaria Tourism Bureau for their sponsorship.
We are very grateful to the administration of the Museo Elder of Science and Technology, especially to the director Mr. José Gilberto Moreno García and Mrs. Arantxa Rodríguez Quintana for making this unique venue available for APPIS 2018, and also for their help with local arrangements.
Carlos M. Travieso-González
(co-chairs of APPIS 2018 and co-editors of the proceedings)
Domestic pigs vary in the age at which they reach slaughter weight, even under the controlled conditions of modern pig farming. Early and accurate estimates of when a pig will reach slaughter weight can improve logistical efficiency on farms. In this study, we compare four methods for predicting the age at which a pig reaches slaughter weight (120 kg). Specifically, we compare the following regression-tree-based ensemble methods: random forest (RF), extremely randomized trees (ET), gradient boosted machines (GBM), and XGBoost. Data from 32,979 pigs are used, comprising a combination of phenotypic features and estimated breeding values (EBV). We found that the boosting ensemble methods, GBM and XGBoost, achieve lower prediction errors than the parallel ensemble methods, RF and ET. On the other hand, RF and ET have fewer parameters to tune and perform adequately with default parameter settings.
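To make the contrast between the two ensemble families concrete, the following minimal sketch builds a bagged "parallel" ensemble and a boosted "sequential" ensemble from regression stumps. The one-feature synthetic data, depth-1 trees, and all parameter values are illustrative stand-ins, not the paper's data or models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the pig data: one feature (an EBV-like score) and a
# noisy target (age at 120 kg, in days). Purely illustrative.
X = rng.uniform(0, 1, size=(300, 1))
y = 150 + 30 * X[:, 0] + rng.normal(0, 2, size=300)

def fit_stump(X, y):
    """Fit a depth-1 regression tree (best single split on feature 0)."""
    best, xs = None, X[:, 0]
    for t in np.quantile(xs, np.linspace(0.1, 0.9, 9)):
        left, right = y[xs <= t], y[xs > t]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lmean, rmean = best
    return lambda Z: np.where(Z[:, 0] <= t, lmean, rmean)

def bagging(X, y, n_trees=50):
    """Parallel ensemble (RF/ET flavour): average stumps fit on bootstraps."""
    stumps = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), len(y))
        stumps.append(fit_stump(X[idx], y[idx]))
    return lambda Z: np.mean([s(Z) for s in stumps], axis=0)

def boosting(X, y, n_trees=50, lr=0.3):
    """Sequential ensemble (GBM/XGBoost flavour): each stump fits the residual."""
    pred, stumps = np.full(len(y), y.mean()), []
    for _ in range(n_trees):
        s = fit_stump(X, y - pred)
        pred = pred + lr * s(X)
        stumps.append(s)
    return lambda Z: y.mean() + lr * sum(s(Z) for s in stumps)

for name, model in [("bagging", bagging(X, y)), ("boosting", boosting(X, y))]:
    rmse = np.sqrt(((model(X) - y) ** 2).mean())
    print(name, round(float(rmse), 2))
```

On data like this the sequential ensemble keeps reducing the residual while the average of similar stumps cannot, which mirrors the paper's finding that the boosting methods achieve lower errors.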
The structure of the vessel tree in a retinal fundus image has been demonstrated to be a valid biometric feature for person identification. Most existing methods in this field rely on vessel segmentation algorithms, which may be computationally intensive and may lack robustness to noisy images and to images with pathologies. We propose a method that matches the spatial arrangement of five bifurcations between two retinal fundus images. It does not involve vessel segmentation and, as a result, is more efficient and more robust to noisy images. The proposed method uses a hierarchical approach of trainable COSFIRE filters: the bottom layer consists of bifurcation-selective COSFIRE filters and the top layer models the spatial arrangement of the concerned bifurcations. We demonstrate the effectiveness of our approach on the benchmark Retinal Identification Database (RIDB) and the VARIA data set, obtaining an accuracy of 100% on both. The proposed method does not rely on domain knowledge and can thus be adapted to other vision-based biometric systems, such as fingerprint and palmprint recognition.
The main purpose of Automatic Speech Recognition systems is to convert audio signals into text sequences in a reliable way. Most recent systems have been developed for the English language. For other languages, such as Spanish, the development of these techniques is not as advanced, owing to the lack of properly transcribed data. As a consequence, the implementation of a Spanish speech-to-text system is a complex and time-consuming task. Nevertheless, semi-supervised learning approaches are suitable when a low amount of data poses a barrier to building precise automatic speech recognition methods. Among the most remarkable approaches to the speech-to-text task, Deep Neural Networks have obtained outstanding results thanks to their ability to generalize with a smaller number of parameters than classical methods such as Gaussian Mixture Models. In this contribution, we propose a Spanish Automatic Speech Recognition system based on a semi-supervised learning approach, which uses Deep Neural Networks to encode sounds into sequences of words. A Neural Network is first trained in order to obtain an acoustic model. In the test phase, the sentences with the lowest error, according to the Word Error Rate and Fuzzy Match Score metrics, are included in the initial training dataset for re-training the acoustic model. Moreover, our proposal has been compared to a Gaussian Mixture Model-based approach, and a relative improvement of 5% in Word Error Rate was obtained. These results suggest that our technique is a promising support for building an Automatic Speech Recognition system when audio-transcription resources are scarce.
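The selection step of such a self-training loop hinges on the Word Error Rate. As a minimal, self-contained sketch (the sentence pairs, the 0.2 threshold, and the variable names are illustrative, not from the paper), the following computes word-level edit distance and keeps only low-error hypotheses for re-training:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance divided by reference length."""
    r, h = reference.split(), hypothesis.split()
    # dp[i][j] = edits to turn the first i reference words into the first j
    # hypothesis words (standard Levenshtein dynamic program).
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[len(r)][len(h)] / max(len(r), 1)

# Keep only hypotheses below an error threshold for re-training (self-training).
THRESHOLD = 0.2   # illustrative value, not taken from the paper
pairs = [("el gato duerme", "el gato duerme"),
         ("hola buenos dias", "hola buenas dia")]
selected = [hyp for ref, hyp in pairs if wer(ref, hyp) <= THRESHOLD]
print(selected)
```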
In this paper, we present a hybrid algorithm for the reconstruction of compressively sensed signals that are sparse in the Hermite transform (HT) domain. The aim is to combine the advantages of two reconstruction approaches: the gradient algorithm, well known for its wide applicability, and the recently proposed, highly efficient Hermite coefficient thresholding algorithm. The latter is based on a theoretically derived signal support detection threshold, which takes into account the specific properties of the observed sparsity domain. By reducing the compressive sensing noise level and increasing the component coefficient values with a partial time-domain reconstruction, the gradient algorithm prepares the signal for the thresholding procedure.
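The interplay of a gradient step and domain thresholding can be sketched in a few lines. The following uses an orthonormal DCT basis as a stand-in sparsity domain (the paper works in the Hermite transform domain, and its threshold is theoretically derived rather than the fixed keep-K rule used here); all sizes are illustrative:

```python
import numpy as np

N, M, K = 64, 48, 3   # signal length, number of kept samples, sparsity

# Orthonormal DCT-II matrix as a stand-in sparsity basis.
n = np.arange(N)
D = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
D[0] /= np.sqrt(2)
D *= np.sqrt(2 / N)                      # rows of D form an orthonormal basis

rng = np.random.default_rng(1)
coeffs = np.zeros(N)
coeffs[rng.choice(N, K, replace=False)] = rng.uniform(1, 2, K)
signal = D.T @ coeffs                    # time-domain signal, K-sparse in D

keep = np.sort(rng.choice(N, M, replace=False))   # random time samples (CS)
y = signal[keep]
A = D.T[keep]                            # measurement matrix: sampled basis rows

# Gradient step followed by hard thresholding in the sparsity domain
# (an iterative-hard-thresholding rendering of the gradient + thresholding idea).
x = np.zeros(N)                          # estimate of the coefficient vector
for _ in range(300):
    x = x + A.T @ (y - A @ x)            # gradient step on ||y - A x||^2
    small = np.argsort(np.abs(x))[:-K]   # keep only the K largest coefficients
    x[small] = 0

print(round(float(np.max(np.abs(x - coeffs))), 6))
```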
Super-resolution (SR) methodologies allow the construction of high-resolution images from several noisy low-resolution images. This methodology can be applied to overcome the inherent resolution limitation and improve the performance of digital imaging in scanning transmission electron microscopes (STEM). Here we apply SR to column-resolved images of a monocrystalline GaAs/InAs quantum dot covered by a GaAsSb capping layer. Series of 8 images with a dwell time of 1 s were taken to apply SR. The improvement of the SR images has been evaluated in terms of SNR and spatial reliability by comparison with images captured with a dwell time of 8 s. The results indicate that SR emerges as a remarkable and powerful tool for improving the quality of STEM imaging.
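The SNR side of combining a series of short-dwell frames can be illustrated directly (full SR additionally exploits sub-pixel shifts between frames to raise resolution, which this sketch omits). The 32×32 random "specimen", the noise level, and the frame count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
truth = rng.uniform(0, 1, size=(32, 32))            # stand-in specimen image
frames = [truth + rng.normal(0, 0.2, truth.shape)   # 8 noisy short-dwell frames
          for _ in range(8)]

single_err = np.abs(frames[0] - truth).mean()
stacked = np.mean(frames, axis=0)                   # register-and-average step
stacked_err = np.abs(stacked - truth).mean()

# For independent noise, the ideal gain of averaging 8 frames is sqrt(8) ~ 2.83.
print(round(float(single_err / stacked_err), 2))
```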
Sampled values of volumetric data form three-way data arrays, which are expressed as tensors. From the viewpoint of object-oriented data analysis, we are required to process and analyse volumetric data without embedding it into a higher-dimensional vector space. Multi-way forms of volumetric data require quantitative methods for their discrimination. Therefore, we define a distance metric for subspaces of multi-way data arrays using transportation between the Stiefel manifolds constructed from those subspaces.
We extend the histogram of oriented gradients (HOG) method to three-channel images captured by commercial RGB cameras. Firstly, we reformulate the HOG method from the viewpoints of gradient-based image pattern recognition and directional statistics. Secondly, we develop operations for the unification of the three directional histograms constructed in the channels of a colour image.
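The basic objects involved, one magnitude-weighted orientation histogram per channel and a unified descriptor, can be sketched as follows. Summing the per-channel histograms is only the simplest possible unification rule (the paper develops dedicated operations); the patch and bin count are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
img = rng.uniform(0, 1, size=(16, 16, 3))   # synthetic RGB patch

def orientation_histogram(channel, bins=9):
    """Magnitude-weighted histogram of gradient orientations of one channel."""
    gy, gx = np.gradient(channel)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist

# One directional histogram per colour channel ...
per_channel = [orientation_histogram(img[:, :, c]) for c in range(3)]

# ... unified into a single descriptor by summation (a simple stand-in for the
# unification operations developed in the paper), then normalised.
unified = np.sum(per_channel, axis=0)
unified = unified / unified.sum()
print(unified.shape)
```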
Finding interesting stellar structures inside large astronomical datasets is a challenging task, and different types of data call for different techniques. In this paper, we focus on the GAIA survey DR1, which provides sky positions and G-band magnitudes for more than 10⁹ stars. We compare the use of the magnitude distribution (or luminosity function) and of Difference-of-Gaussian filter responses, extracted from small patches of the sky, to detect globular clusters (GCs). Using a nearest-neighbour retrieval strategy, we find windows whose magnitude distributions or filter responses are similar to those extracted from known GCs. Our first results show that Difference-of-Gaussian filters are advantageous for finding spherical structures such as GCs when only limited information, such as sky position and G-band magnitude, is available.
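A Difference-of-Gaussian filter acts as a band-pass that responds strongly to roughly spherical overdensities at a matching scale, which is why it suits cluster detection. A minimal sketch on a synthetic "star field" (the grid size, scales, and cluster position are all illustrative, not GAIA data):

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur via 1-D convolutions along rows, then columns."""
    k = gaussian_kernel(sigma, radius=int(3 * sigma) + 1)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, out)

# Synthetic field: uniform background plus one Gaussian overdensity at (32, 32).
rng = np.random.default_rng(4)
sky = rng.uniform(0, 0.1, size=(64, 64))
yy, xx = np.mgrid[0:64, 0:64]
sky += np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 4.0 ** 2))

dog = blur(sky, 3.0) - blur(sky, 6.0)   # band-pass (Difference of Gaussians)
peak = np.unravel_index(np.argmax(dog), dog.shape)
print(peak)
```

The response peaks at the cluster-like blob, which is the behaviour the retrieval strategy exploits.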
Current progress in technology enables high-level automatic analysis of human motion. Next to action and gesture recognition, sport analysis is one of the active research subjects in this area, and in many disciplines athletes benefit from the introduction of technology to support the training process. In this work we present a novel system supporting weapon practice in fencing. Active markers mounted on the weapon, together with a first-person-perspective camera, are employed to allow accurate tracking of blade trajectories and rotations. With the help of fencing experts, data are collected to train models of how specific actions should be performed. Based on these models, our system is able to assess how accurately a given action is performed by a practising person. Feedback is delivered in real time via visualization of trajectories as well as by numerical measures. The proposed system was evaluated in a number of experiments as well as by fencers and a fencing coach.
This article describes the tasks and first results of the work package “Manipulator and Control” of the EU project Trimbot2020. This project develops a mobile robot for outdoor hedge, rose and bush trimming. The Kinova Jaco
Visual search for relevant targets in the environment is a crucial robot skill. We propose a preliminary framework for the execution monitor of a robot task, which governs the robot's behaviour in visually searching the environment for the targets involved in the task. Visual search is also relevant for recovering from a failure. The framework exploits deep reinforcement learning to acquire a common-sense scene structure, and it takes advantage of a deep convolutional network to detect objects and the relevant relations holding between them. The framework builds on these methods to introduce vision-based execution monitoring, which uses classical planning as a backbone for task execution. Experiments show that with the proposed vision-based execution monitor the robot can complete simple tasks and can recover from failures autonomously.
This work deals with the problem of gas source localization by a mobile robot with gas and wind sensing capabilities. In particular, we address the problem for indoor environments, where the presence of obstacles and the complex structure cause chaotic dispersion of the gases. Under these challenging conditions, where traditional approaches based on mathematical modeling of the plume cannot be applied, we propose the use of numerical methods to solve the gas dispersion and to exploit it in a probabilistic formulation that estimates the likelihood of the gas source location from a set of sparse observations. We validate our approach with a simulated set of experiments in an office-like environment composed of multiple connected rooms. Two search strategies are compared (active and passive), demonstrating the suitability of our approach for inferring the location of the source even when the robot is not actively searching for it.
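The probabilistic formulation can be sketched as a grid of candidate source cells scored by how well each one explains the sparse observations. Here an exponential decay with distance serves only as an illustrative stand-in for the paper's numerical dispersion simulation, and the grid size, poses, and noise level are invented:

```python
import numpy as np

rng = np.random.default_rng(5)
GRID = 20
true_src = (14, 6)
SIGMA = 0.02   # sensor noise std, illustrative

def expected_concentration(src, cell):
    """Illustrative forward model: concentration decays with distance.
    The paper uses a numerical gas-dispersion simulation instead."""
    d = np.hypot(src[0] - cell[0], src[1] - cell[1])
    return np.exp(-d / 4.0)

# Sparse, noisy observations collected at a few robot poses.
poses = [(2, 2), (10, 5), (15, 8), (5, 15)]
obs = [expected_concentration(true_src, p) + rng.normal(0, SIGMA) for p in poses]

# Log-posterior over candidate source cells (uniform prior, Gaussian noise).
log_post = np.zeros((GRID, GRID))
for i in range(GRID):
    for j in range(GRID):
        for p, z in zip(poses, obs):
            r = z - expected_concentration((i, j), p)
            log_post[i, j] += -(r ** 2) / (2 * SIGMA ** 2)

est = np.unravel_index(np.argmax(log_post), log_post.shape)
print(est)
```

Even four observations suffice to concentrate the posterior near the true cell in this toy setting, which is the intuition behind inferring the source from sparse measurements.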
Manufacturing industries are increasingly adopting data-driven decision-making systems, moving towards the Industry 4.0 paradigm. In the context of this data revolution, the innovative SiMoDiM project aims at developing a smart predictive maintenance system for the stainless steel industry. In its first stage, it focuses on the assets within the hot rolling process, one of the core components involved in the manufacturing of steel sheets, and more specifically on the coiler drums of Steckel mills. These drums operate under mechanical and thermal stresses that degrade them, and their replacement directly impacts the product value chain. In this work we present the data analysis stage of SiMoDiM, in which the huge amount of available historical and real-time data from the hot rolling process (collected by onboard sensors in the mills) is studied in order to find which variables and descriptors are valid indicators of the coiler drums' condition. This analysis is the first step towards an intelligent system that takes advantage of such descriptors to perform predictive maintenance of the machinery.
Smell maps are geo-localized representations of the odors present in an environment as perceived by humans. They provide a convenient means to assess the smellscape of urban areas, determine regions with a heavy impact on the population, and measure the reach of industrial emissions. However, their use is not widespread because they are laborious to generate and easily become outdated, as they rely on in-place human annotations of the perceived smells. In this work we study the feasibility of automating the generation of smell maps by means of a wearable electronic nose (e-nose) as a replacement for the human sense of smell; our main objective is to analyse whether this technology can be employed to map the subjective information inherent in smells. To that end, we collected a dataset of more than 450 labeled samples of 10 different smells with a wearable e-nose, and performed a thorough comparison of several machine learning algorithms to evaluate their suitability for this task. As a second contribution, we present a smartphone application developed to record, in situ, e-nose measurements and GPS coordinates as well as the human perception of smells (using a form-based input method). Finally, we present an illustrative example with several automatically generated smell distribution maps and discuss their accuracy.
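A comparison of classifiers on e-nose data follows a standard pattern that can be sketched on synthetic sensor-array readings. The class/sensor/sample counts, the noise model, and the two simple classifiers below are illustrative stand-ins for the real dataset and the algorithms compared in the paper:

```python
import numpy as np

rng = np.random.default_rng(6)
N_CLASSES, N_SENSORS, PER_CLASS = 4, 8, 40   # illustrative sizes

# Synthetic e-nose readings: each smell class has a characteristic response
# pattern across the sensor array, blurred by measurement noise.
centers = rng.uniform(0, 1, size=(N_CLASSES, N_SENSORS))
X = np.vstack([c + rng.normal(0, 0.15, (PER_CLASS, N_SENSORS)) for c in centers])
y = np.repeat(np.arange(N_CLASSES), PER_CLASS)

idx = rng.permutation(len(y))                # simple train/test split
tr, te = idx[:120], idx[120:]

def nearest_centroid(Xtr, ytr, Xte):
    cents = np.array([Xtr[ytr == c].mean(axis=0) for c in range(N_CLASSES)])
    d = ((Xte[:, None, :] - cents[None]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def one_nn(Xtr, ytr, Xte):
    d = ((Xte[:, None, :] - Xtr[None]) ** 2).sum(axis=2)
    return ytr[d.argmin(axis=1)]

for name, pred in [("centroid", nearest_centroid(X[tr], y[tr], X[te])),
                   ("1-NN", one_nn(X[tr], y[tr], X[te]))]:
    print(name, round(float((pred == y[te]).mean()), 2))
```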
Human-computer interaction, as a field of study focusing on the design of computer technology, provides foundations for the instructional design of activities performed in cyberspace. The educational context of the instructional designer's work requires adopting a specific approach based on learning theories. The aim of the article is to present a new approach to instructional design grounded in the theory of the Danish scientist Knud Illeris, who distinguished three dimensions of learning (cognitive, emotional, and social). Owing to the specificity of performance in a virtual space, as well as the development of intelligent systems and virtual reality accessories, the author proposes to add another, psychomotor, dimension. The new four-dimensional instructional design approach focuses on the cognitive, emotional, social and psychomotor aspects of learning. This conceptual framework is universal: it allows for harmonious and well-thought-out planning of e-course components on various applications and devices. The foundations of the approach are based on preliminary research with a small number of respondents. The results are promising and constitute an incentive for further research.
We develop a variational image registration method for a pair of images with different resolutions observed in the same imaging modality. The multiple modalities used in medical imaging determine the resolutions and qualities of the resulting images. For the unified analysis of such multimodal images, a registration method for a collection of images with different resolutions and qualities is required whenever the resolution and quality of the target image differ from those of the reference image.
Deep Reinforcement Learning (DRL) is a promising approach for handling various tasks in the field of (simulated) autonomous driving. However, recent publications mainly consider learning in unusual driving environments. This paper presents Driving School for Autonomous Agents (DSA2), a software tool for validating DRL algorithms in more usual driving environments based on artificial and realistic road networks. We also present the results of applying DSA2 to the task of driving on a straight road while regulating the velocity of one vehicle according to different speed limits.
Text mining applications in the investment process involve a complex interaction between computational linguistics, natural language processing (NLP) and know-how of the financial domain. Given the progress in big data and multimodal data fusion, this state-of-the-art survey provides a timely consolidation of this ever-evolving topic, together with new perspectives on acquisition, input, variable relevance, feature extraction, fusion, and decision making based on a conjoint treatment of text and standard financial variables. This insight is then used as a basis to introduce an overarching framework for text-based big data in the investment process. The proposed approach is both novel and flexible, and can be seamlessly employed across a variety of investable assets, including stocks, credit instruments, rates, FX and market indices. Another unique aspect is its modularity, whereby emerging techniques in signal processing and machine learning, as well as traditional econometric techniques, are readily incorporated and combined towards an informed decision. A further virtue of the proposed concept is its ability to identify the semantic nature (context) of the source, even for general text-based sources (financial reports, social media, market news), while at the same time maintaining investors' intuition, as news does affect asset prices and market moves. An example of recent stock market performance during a company takeover demonstrates the advantages of the proposed framework.
Visual lifelogging is the process of keeping track of one's life through wearable cameras. The focus of this research is to automatically classify images captured by a wearable camera into indoor and outdoor scenes. The results of this classification may be used in several applications; for instance, one can quantify the time a person spends outdoors and indoors, which may give insights into the psychology of the person concerned. We use transfer learning from two VGG convolutional neural networks (CNNs), one pre-trained on the ImageNet data set and the other on the Places data set, and we investigate two methods of combining features from the two pre-trained CNNs. We evaluate the performance on the new UBRug data set and the benchmark SUN397 data set and achieve accuracy rates of 98.24% and 97.06%, respectively. Features obtained from the ImageNet pre-trained CNN turned out to be more effective than those obtained from the Places pre-trained CNN, and fusing the feature vectors obtained from the two CNNs is an effective way to improve the classification. In particular, the performance that we achieve on the SUN397 data set outperforms the state-of-the-art.
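Feature fusion by concatenation, the simplest of the combination strategies, can be sketched without the actual networks. The two random feature sources below are stand-ins for the ImageNet- and Places-trained VGG embeddings, each carrying the class signal along a different direction, and all sizes and the nearest-centroid scorer are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200
labels = rng.integers(0, 2, N)          # 0 = indoor, 1 = outdoor (synthetic)

# Stand-ins for the two CNN embeddings: each source separates the classes
# along a different feature direction, so fusion can pool both signals.
feat_a = rng.normal(0, 1, (N, 4)); feat_a[:, 0] += 1.5 * labels
feat_b = rng.normal(0, 1, (N, 4)); feat_b[:, 1] += 1.5 * labels
fused = np.hstack([feat_a, feat_b])     # concatenation fusion

def centroid_accuracy(X, y):
    """Nearest-centroid accuracy; enough for a relative comparison."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = (np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)).astype(int)
    return float((pred == y).mean())

for name, F in [("source A", feat_a), ("source B", feat_b), ("fused", fused)]:
    print(name, round(centroid_accuracy(F, labels), 2))
```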
Wearable cameras capture a first-person view of the daily activities of the camera wearer, offering a visual diary of the user's behaviour. The detection of the appearance of the people the camera user interacts with is of high interest for the analysis of social interactions. Generally speaking, social events, lifestyle and health are highly correlated, but there is a lack of tools to monitor and analyse them. We consider that egocentric vision provides a tool to obtain information about, and to understand, users' social interactions. We propose a model that enables us to evaluate and visualize social traits obtained by analysing the appearance of social interactions within egocentric photostreams. Given sets of egocentric images, we detect the faces appearing during the days of the camera wearer and rely on clustering algorithms to group their feature descriptors in order to re-identify persons. The recurrence of detected faces within the photostreams allows us to shape an idea of the user's social pattern of behaviour. We validated our model over several weeks of recordings by different camera wearers. Our findings indicate that social profiles are potentially useful for the interpretation of social behaviour.
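The re-identification step, grouping face feature descriptors so that each cluster corresponds to one person, can be sketched with a minimal k-means on synthetic descriptors. The descriptor dimension, the number of people, and the farthest-point initialisation are illustrative choices, not the paper's exact clustering algorithm:

```python
import numpy as np

rng = np.random.default_rng(8)
K, DIM, PER = 3, 16, 30   # 3 recurring people, 16-D descriptors (illustrative)

# Synthetic face descriptors: one tight cloud per person across the stream.
true_centers = rng.normal(0, 1, (K, DIM))
desc = np.vstack([c + rng.normal(0, 0.1, (PER, DIM)) for c in true_centers])
person = np.repeat(np.arange(K), PER)

# Farthest-point initialisation, then Lloyd's k-means iterations.
centers = [desc[0]]
for _ in range(K - 1):
    dist = np.min([((desc - c) ** 2).sum(axis=1) for c in centers], axis=0)
    centers.append(desc[dist.argmax()])
centers = np.array(centers)
for _ in range(20):
    d = ((desc[:, None] - centers[None]) ** 2).sum(axis=2)
    assign = d.argmin(axis=1)
    centers = np.array([desc[assign == k].mean(axis=0) for k in range(K)])

# Purity: fraction of each cluster belonging to its dominant person.
purity = float(np.mean([np.bincount(person[assign == k]).max() / (assign == k).sum()
                        for k in range(K)]))
print(round(purity, 2))
```

When descriptor clouds are well separated, each cluster maps cleanly to one person, which is the property the recurrence analysis builds on.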
There is an increasing interest in using adaptive technologies in the cultural heritage field to personalize and enhance the user's visit experience. However, personalizing the cultural experience is a challenging task that requires deep knowledge of those aspects of the user that influence the visit, such as physical, socio-cultural, and educational aspects. In other words, to be effective and to facilitate the learning process during the visit, personalized systems should consider the differences between users. In this paper, we propose a user model ontology for cultural tourism applications with the aim of adapting their functionalities to the end users' background and interests.
In recent years, a new set of recommender systems has been developed to demonstrate the potential of emotion-based recommendations. Most of them are based on the acquisition and analysis of data from micro-blogging sites in order to predict the user's current emotional state. Considering that emotional aspects play a key role in cultural heritage consumption, in this paper we present a facial emotion recognition approach in the context of personalized cultural recommendations. This work represents a first step towards a cultural recommendation system capable of modulating users' emotions to make predictions.
Sustainability of buildings depends on their usage as much as on their static, physical properties. One can improve on the latter by the effective employment of information technology. Sensing technology can provide real-time accurate context information, in turn enabling intelligent building management. In this view, people's needs, preferences, habits, and commands become central. The present work aims at defining the scope of and trends in the area of Energy Intelligent Buildings by stating ten relevant and timely questions, together with a brief discussion on each one of them. The questions are based on the experience on Energy Intelligent Buildings gained by the Distributed Systems group of the University of Groningen over a decade of funded research and development projects on the topic.
Indoor energy consumption can be understood by breaking overall power consumption down into individual components and appliance activations. The classification of the components of energy usage is known as load disaggregation or appliance recognition. Most previous efforts address the separation of devices with high energy demands. In many contexts, though, such as an office, the devices to separate are numerous, heterogeneous, and have low consumption. The disaggregation problem then becomes more challenging and, at the same time, crucial for understanding the user context. In fact, from the disaggregation one can deduce the number of people in an office room, their activities, and their current energy needs. In this paper, we review the characteristics of office appliance load disaggregation efforts. We then propose a classification model based on a Recurrent Neural Network (RNN), which is used to infer device activations from aggregated energy consumption. The approach shows promising results, recognizing 14 classes of 5 different devices operated in our office and reaching a Cohen's kappa of 99.4%.
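For intuition about what disaggregation extracts from an aggregate signal, the following sketches the classic event-based baseline: detect step changes and match them to known device wattages. This is a deliberately simple alternative to the paper's trained RNN, and the device list, wattages, and timings are invented:

```python
import numpy as np

# Known (illustrative) steady-state draws of three office devices, in watts.
DEVICES = {"monitor": 30.0, "laptop": 60.0, "heater": 1500.0}

# Synthetic aggregate: heater on at t=20, laptop on at t=50, heater off at t=80.
power = np.full(120, 5.0)                 # baseline load
power[20:80] += DEVICES["heater"]
power[50:] += DEVICES["laptop"]
power += np.random.default_rng(9).normal(0, 1.0, 120)   # measurement noise

# Detect step changes in the aggregate and match each step to the closest
# device wattage (event-based disaggregation baseline).
steps = np.diff(power)
events = []
for t in np.where(np.abs(steps) > 15)[0]:
    name = min(DEVICES, key=lambda d: abs(abs(steps[t]) - DEVICES[d]))
    events.append((t + 1, name, "on" if steps[t] > 0 else "off"))
print(events)
```

A recurrent model replaces the hand-set threshold and wattage table with state learned from the sequence, which is what makes the many-small-device office setting tractable.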