Ebook: Artificial Intelligence Research and Development
The Catalan Association for Artificial Intelligence (ACIA) was formed in 1994 with the aim of promoting cooperation among artificial intelligence researchers within the Catalan-speaking community. This objective has been achieved and widened since the association held its first conference in 1998, and the annual conference of the association has become an international event presenting and discussing the latest research in AI, attracting AI researchers from around the world.
This book presents the proceedings of the 19th International Conference of the Catalan Association for Artificial Intelligence (CCIA 2016), held in Barcelona, Spain, on 19-21 October 2016. From a total of 50 original contributions, 16 long papers and 22 short papers were accepted for presentation at the conference on the basis of their relevance, originality and technical validity. The book is divided into 7 sections: Invited Talks (synopsis only); Vision and Robotics; Logic, Constraint Satisfaction and Qualitative Theory; Classification and Clustering; Modelling; Planning and Recommender Systems; Lexical Knowledge Representation and Natural Language Processing.
Providing an overview of the latest developments in the field, this book will be of interest to all those whose work involves research into, and the application of, artificial intelligence.
The 19th edition of the International Conference of the Catalan Association for Artificial Intelligence was held in Barcelona from the 19th to the 21st of October 2016, hosted by the Universitat Pompeu Fabra and organized by the AI groups of the Universitat Pompeu Fabra (UPF), the Artificial Intelligence Research Institute (IIIA-CSIC) and the Universitat Politècnica de Catalunya (UPC-BarcelonaTech). The conference has come a long way since its first edition, held in Tarragona in 1998. The main goal was to create a forum for communication and research collaboration within the Catalan AI community. This objective has been achieved and has widened over the last 19 years. CCIA, the conference organized by the Catalan Association for Artificial Intelligence (ACIA), is currently an international event for researchers in the Catalan Countries that also attracts researchers from other countries around the world.
CCIA 2016 received 50 original contributions, each carefully reviewed by two members of the program committee. Of these 50 submissions, 16 were accepted as long papers (10 pages) and 22 as short papers (6 pages). These 38 submissions were accepted for inclusion in this book on the basis of their relevance, originality and technical validity. In the current edition, the majority of Catalan scientific institutions are represented through the participation of 142 authors from the following institutions: Universitat Pompeu Fabra (UPF), Universitat de Vic (UVIC), Universitat Politècnica de Catalunya (UPC-BarcelonaTech), Universitat Rovira i Virgili (URV), Consejo Superior de Investigaciones Científicas (CSIC), Universitat de Barcelona (UB), Universitat de Girona (UdG), Universitat Ramon Llull (URL), Universitat de Lleida (UdL), Hospital Universitari Sant Joan de Reus, Eurecat – Centre Tecnològic de Catalunya, Institut d'Investigació Sanitària Pere Virgili, Institut d'Investigació Biomèdica de Girona Dr. Josep Trueta and Centre Mèdic Teknon – Quirón. We are also very pleased to have the participation of authors from the rest of Spain (10), Colombia (3), France (2), Germany (1), India (2), Iran (5), Italy (1), Mexico (1), the Netherlands (1), Norway (1), Poland (1), Romania (2) and the United States (3).
For this conference we preferred single-track presentations over parallel sessions, with the goal of promoting discussion among the conference attendees. Sixteen of the accepted papers were selected for oral presentations and 22 for short oral presentations followed by posters in the poster sessions. The conference program also featured invited talks by three outstanding researchers: Emilia Gómez (Associate Professor (Serra-Húnter and ICREA Fellow) at the Department of Information and Communication Technologies, Universitat Pompeu Fabra), Petia Radeva (Head of the Computer Vision Group at the University of Barcelona and Head of the Medical Imaging Laboratory at the Computer Vision Center) and Jordi Nin (Senior Data Scientist at BBVA D&A).
We would like to express our sincere gratitude to the authors of the contributed papers, to the invited speakers for their enlightening talks, to all members of the scientific and organizing committees who worked hard to make this conference a success, to Eva Armengol, scientific programme chair of CCIA 2015, for her help during this last year, and to Beatriz López, president of ACIA, for her kind support.
Financial support was generously provided by BigML and Universitat Politècnica de Catalunya (UPC-BarcelonaTech).
We are confident that this book is a good example of the quality of the research that is currently being carried out within the Catalan AI community.
Barcelona, October 2016
Àngela Nebot, Scientific Programme Chair, UPC-BarcelonaTech
Xavier Binefa, Local Arrangement Chair, UPF
Ramon López de Mántaras, General Chair, IIIA-CSIC
In this talk I will provide an overview of my research in the field of Music Information Retrieval (MIR), which tries to understand the way humans describe music and to emulate these descriptions with computational models dealing with big music data. By integrating knowledge from signal processing, music theory, cognition and artificial intelligence, we have developed methods to automatically describe music audio signals in terms of melody, tonality and rhythm, to measure similarity between pieces, and to automatically classify music according to style, emotion or culture. Over recent years, we have focused on two different application contexts. On the one hand, we try to innovate the way we experience classical music concerts. On the other hand, we work on the computational modeling of flamenco music, improving current techniques for singing voice description and style classification.
The analysis of people's nutrition habits is one of the most important mechanisms for the thorough monitoring of several medical conditions (e.g. diabetes and obesity) that affect a high percentage of the global population. Methods for automatically logging one's meals could not only make the process easier, but also make it objective, independent of the user's point of view and interpretation. One solution adopted recently that could ease the automatic construction of nutrition diaries is to ask individuals to take photos with their mobile phones. An alternative technique is visual lifelogging, which consists of using a wearable camera that automatically captures pictures from the user's point of view (the egocentric point of view) with the aim of analysing different patterns of his/her daily life and extracting highly relevant information such as nutritional habits. In this talk we will show how deep learning applied to the food detection and food recognition problems can help to automatically infer the user's eating pattern.
Data is changing our society. Because of data, we are rethinking our industries to build better products: agriculture, education, finance, legal services, etc. With the advent of data, a prodigal son of the machine learning family has returned to the fore to play a leading role: artificial neural networks, also known as Deep Learning. In this talk, I will provide some insights into its application to detecting fraudulent credit card transactions conducted in online stores and with retailers. I will also describe the data we use, how the neural networks are trained and how their performance is measured. In addition, I will discuss more general thoughts about how the possibility of processing huge amounts of data has boosted deep learning and machine learning in industry.
Automated video and image analysis can be a very efficient tool for behavior study, especially in environments that are hard for researchers to access. The understanding of this social behavior can play a key role in the sustainable design of control policies for many species. This paper proposes the use of computer vision algorithms to identify and track the Norway lobster, Nephrops norvegicus, a burrowing decapod of considerable commercial value that is captured by trawling. These animals can only be captured when they are engaged in seabed excursions, which are strongly related to their social behavior. This emergent behavior is modulated by the day-night cycle, but the social interactions remain unknown to the scientific community. The paper introduces an identification scheme made of four distinguishable black and white tags (geometric shapes). The project has recorded 15-day experiments in the laboratory, under monochromatic blue light (472 nm) and darkness conditions (recorded using infrared light). Using this massive image set, we propose a comparison of state-of-the-art computer vision algorithms to distinguish and track the movements of the different animals. We evaluate robustness to the high level of noise in the infrared video signals and to the free out-of-plane rotations caused by animal movement. The experiments show promising accuracies under a cross-validation protocol, and the approach is adaptable to the automation and analysis of large-scale data. As a second contribution, we created an extensive dataset of shapes (46,027 different shapes) from four daily experimental video recordings, which will be made available to the community.
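To make the tag-based identification concrete, the following is a minimal sketch, assuming OpenCV, of how contours could be classified into four geometric tag shapes; the shape set, file name and thresholds are illustrative placeholders, not the paper's actual pipeline.

```python
# Illustrative tag classification by polygonal approximation of contours,
# assuming binarized frames and four geometric tag shapes.
import cv2

def classify_tag(contour):
    """Label a contour by the number of vertices of its polygonal approximation."""
    peri = cv2.arcLength(contour, True)
    approx = cv2.approxPolyDP(contour, 0.04 * peri, True)
    vertices = len(approx)
    if vertices == 3:
        return "triangle"
    if vertices == 4:
        return "square"
    if vertices == 5:
        return "pentagon"
    return "circle"  # smooth contours yield many vertices

frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder file name
_, binary = cv2.threshold(frame, 128, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > 100:  # ignore small noise blobs
        print(classify_tag(c))
```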
Robots equipped with a set of simple action skills should be able to complete complex tasks, defined as the concatenation of a number of those basic abilities. Traditionally, planners have been used to decide which skills to activate and in which sequence, much like state machines. Recently, cognitive architectures like SOAR have been proposed to act as the reasoner, selecting which competence the robot should perform and guiding it towards the goal. However, they have been integrated unidirectionally: once the plan is completely designed by the cognitive architecture, it is sent to the robot, but no feedback is provided to the reasoner. Instead, our proposal establishes bi-directional communication between the reasoner and the robot. In this way, the reasoner can develop incomplete plans under the assumption that part of the information needed to complete the plan for achieving the goal will arrive later from the robot's environment, as well as from the user. Our work develops this bi-directional communication between the SOAR cognitive architecture and the ROS (Robot Operating System) environment, widely used in mobile robotics. The proposed architecture has been tested on a UAV (Unmanned Aerial Vehicle), a Parrot AR.Drone 2.0 acting as the mobile robot, in a searching task.
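As a rough illustration of what such a bi-directional link looks like on the ROS side, here is a minimal sketch using rospy; the topic names and message contents are our own placeholders, not the actual SOAR-ROS interface of the paper.

```python
# Minimal sketch of a reasoner-robot bridge in ROS: commands flow to the
# robot, feedback flows back so the reasoner can complete its plan.
import rospy
from std_msgs.msg import String

def on_feedback(msg):
    # Feedback from the robot's environment lets the reasoner refine
    # a partially specified plan.
    rospy.loginfo("reasoner received feedback: %s", msg.data)

rospy.init_node("reasoner_bridge")
command_pub = rospy.Publisher("/reasoner/command", String, queue_size=10)
rospy.Subscriber("/robot/feedback", String, on_feedback)

rate = rospy.Rate(1)  # issue one skill activation per second
while not rospy.is_shutdown():
    command_pub.publish(String(data="activate: search_waypoint"))
    rate.sleep()
```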
Convolutional Neural Networks (CNN) are the most popular deep network models due to their applicability and success in image processing. Although plenty of effort has been made in designing and training better discriminative CNNs, little is yet known about the internal features these models learn. Questions such as what specific knowledge is coded within CNN layers, and how it can be used for purposes other than discrimination, remain to be answered. To make progress on these questions, in this work we extract features from CNN layers, building vector representations from CNN activations. The resulting vector embedding is used to represent first images and then known image classes. On those representations we perform an unsupervised clustering process, with the goal of studying the hidden semantics captured in the embedding space. Several abstract entities never taught to the network emerge in this process, effectively defining a taxonomy of knowledge as perceived by the CNN. We evaluate and interpret these sets using WordNet, while studying the different behaviours exhibited by the layers of a CNN model according to their depth. Our results indicate that, while the top (i.e., deeper) layers provide the most representative space, the lower layers also define descriptive dimensions.
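As an illustration of the general approach, the following sketch builds vector representations from the activations of a pre-trained CNN and clusters them; the network (torchvision's VGG16), the chosen layer and the cluster count are our assumptions, not necessarily the paper's setup.

```python
# Sketch: embed images as CNN activation vectors, then cluster the embedding.
import torch
import torchvision.models as models
from sklearn.cluster import KMeans

model = models.vgg16(pretrained=True).eval()  # any pre-trained CNN would do

def embed(images):
    """Return one activation vector per image from the first fully connected layer."""
    with torch.no_grad():
        feats = model.features(images)    # convolutional layers
        flat = torch.flatten(feats, 1)
        vecs = model.classifier[0](flat)  # fc6 activations (4096-d)
    return vecs.numpy()

images = torch.randn(8, 3, 224, 224)  # stand-in for a preprocessed image batch
vectors = embed(images)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(vectors)
print(labels)
```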
Diabetic Retinopathy (DR) has nowadays become a considerable worldwide threat due to the growing number of people who become blind at an early age. From the engineering viewpoint, the detection of DR pathologies (microaneurysms, hemorrhages and exudates) through computer vision techniques is of prime importance in medical assistance. Such methodologies outperform traditional screening of retinal color fundus images. Moreover, the identification of landmark features such as the optic disk (OD), fovea and retinal vessels is a key pre-processing step for detecting the aforementioned potential pathologies. In this vein, this paper applies the well-known Convexity Shape Prior algorithm to segment the main anatomical structure of the retina, the OD. First, pre-processing techniques such as Contrast Limited Adaptive Histogram Equalization (CLAHE) and Brightness Preserving Dynamic Fuzzy Histogram Equalization (BPDFHE) are applied to enhance the image contrast and eliminate artifacts. Subsequently, several morphological operations are performed to improve the subsequent segmentation of the OD. Finally, blood vessels are extracted through a novel fusion of the average, median, Gaussian and Gabor wavelet filters.
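By way of illustration, here is a minimal sketch of the CLAHE enhancement step with OpenCV, followed by a morphological opening; the clip limit, tile grid, kernel size and file names are illustrative defaults rather than the paper's tuned values.

```python
# Sketch: CLAHE contrast enhancement plus a morphological opening on a
# fundus image.
import cv2

image = cv2.imread("fundus.png")  # placeholder file name
green = image[:, :, 1]  # the green channel usually shows the best contrast

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
enhanced = clahe.apply(green)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))
opened = cv2.morphologyEx(enhanced, cv2.MORPH_OPEN, kernel)  # suppress thin vessels
cv2.imwrite("fundus_preprocessed.png", opened)
```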
In real-life environments where robots must deal with complex situations and with humans, generalist robots that adapt to novel situations are needed. They are composed of two sub-systems, perception/actuation and knowledge representation, and they require that symbols in the high-level area be coupled to objects and actions in the low-level area. This is the so-called Anchoring Problem. In this paper we present the system we are using to study this problem. It is based on ROSPlan, a framework that provides a generic method for task planning in a ROS system. The high-level area is composed of a planner that uses PDDL files and a knowledge representation system, while the low-level area is defined as a set of robot services exported using ROS actions, services and topics. We plan to contribute to this problem by applying human-robot interaction and learning techniques, and our main objectives are: (1) to link an existing symbol with an action learned through interaction, and (2) automated code generation of ad-hoc ROS nodes that connect symbols to specific perceptions/actions.
This paper analyzes and discusses the performance of Bag of Visual Words (BoVW), a well-known image encoding and classification technique utilized to recognize object categories, in the particular application scope of complex scene recognition. Given a set of training images containing examples of the different objects of interest, a dictionary of prototypical SIFT descriptors (visual words) is first obtained by applying unsupervised clustering. The contents of any input image can then be encoded by computing a histogram that denotes the relative frequency of every visual word in the SIFT descriptors of that input image. A Support Vector Machine (SVM) is then trained for every object category by using as positive examples the histograms corresponding to training images with objects belonging to that category, and as negative examples, the histograms from the other categories. Given an image with an unknown object to be classified, its histogram of visual words is obtained and fed into the available SVMs. The image is then classified into the object category of the SVM that yields the highest score. Although BoVW has proven to provide acceptable results for recognizing complex object categories, its performance for scene recognition needs to be assessed, since scenes have intrinsic properties that make them significantly differ from individual object categories.
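Since the abstract spells out the pipeline, a minimal sketch of its training stage is given below, assuming opencv-contrib (for SIFT) and scikit-learn; the dictionary size and the use of a single multi-class SVM (instead of one SVM per category, as described above) are simplifications.

```python
# Sketch of BoVW training: SIFT descriptors -> k-means dictionary ->
# visual-word histograms -> SVM.
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def train_bovw(train_paths, train_labels, k=200):
    """Train a BoVW model from labelled image paths (k is the dictionary size)."""
    sift = cv2.SIFT_create()

    def descriptors(path):
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        return desc  # (n_keypoints, 128)

    def histogram(desc, kmeans):
        words = kmeans.predict(desc)
        hist, _ = np.histogram(words, bins=np.arange(k + 1))
        return hist / max(hist.sum(), 1)  # relative frequency of each visual word

    per_image = [descriptors(p) for p in train_paths]
    kmeans = KMeans(n_clusters=k, n_init=10).fit(np.vstack(per_image))
    X = np.array([histogram(d, kmeans) for d in per_image])
    svm = SVC().fit(X, train_labels)  # multi-class handled internally
    return kmeans, svm
```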
Diabetic Retinopathy is one of the main causes of blindness and visual impairment in the diabetic population. The detection and diagnosis of the disease is usually done with the help of retinal images taken with a mydriatic camera. In this paper we propose an automatic retina image classifier that, using supervised deep learning techniques, is able to classify retinal images into five standard levels of severity. At each level, different irregularities appear in the image, due to microaneurysms, hemorrhages, exudates and edemas. This problem has been approached before using traditional computer vision techniques based on manual feature extraction. In contrast, we explore the use of the recent machine learning approach of deep convolutional neural networks, which has given good results in other image classification problems. From a training dataset of around 35,000 human-classified images, different convolutional neural networks with different input image sizes are tested in order to find the model that performs best over a test set of around 53,000 images. Results show that it is possible to achieve a quadratic weighted kappa classification score over 0.75, not far from the human expert reported score of 0.80.
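For reference, the quadratic weighted kappa score mentioned above can be computed with scikit-learn as follows; the label vectors here are toy stand-ins, not the paper's data.

```python
# Sketch: quadratic weighted kappa between expert labels and predictions.
from sklearn.metrics import cohen_kappa_score

y_true = [0, 1, 2, 3, 4, 2, 1, 0]  # expert-assigned severity levels (0-4)
y_pred = [0, 1, 2, 4, 4, 2, 0, 0]  # network predictions

kappa = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(f"quadratic weighted kappa = {kappa:.3f}")
```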
Human activity recognition methods are used in several applications such as human-computer interaction, robot learning, and video surveillance analysis. Although several methods have been proposed for activity recognition, most of them ignore the relation between adjacent video frames and thus fail to recognize some actions. In this study we propose an unsupervised algorithm to segment the input video into subsequences, each containing a part of the main action happening in the video. This algorithm analyzes the temporal coherence of adjacent frames using several similarity measures. We show preliminary results on two state-of-the-art action recognition datasets, namely HMDB51 and Hollywood2.
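One simple instance of such a temporal-coherence measure is color-histogram correlation between adjacent frames, sketched below with OpenCV; this particular measure and threshold are our illustration, whereas the paper combines several similarity measures.

```python
# Sketch: place segment boundaries where adjacent-frame similarity drops.
import cv2

def frame_similarity(f1, f2, bins=32):
    """Correlation between the color histograms of two frames."""
    h1 = cv2.calcHist([f1], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    h2 = cv2.calcHist([f2], [0, 1, 2], None, [bins] * 3, [0, 256] * 3)
    return cv2.compareHist(h1, h2, cv2.HISTCMP_CORREL)

def segment_boundaries(video_path, threshold=0.7):
    cap = cv2.VideoCapture(video_path)
    boundaries, prev, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if prev is not None and frame_similarity(prev, frame) < threshold:
            boundaries.append(idx)  # low coherence: start a new subsequence
        prev, idx = frame, idx + 1
    cap.release()
    return boundaries
```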
Recent advances in lifelogging technologies, and in particular in the field of wearable cameras, have made it possible to capture our daily life continuously from a first-person point of view and in a free-hand fashion. However, given the huge number of images captured and the rate at which they accumulate (up to 2,000 images per day), there is a strong need for efficient and scalable indexing and retrieval systems for egocentric images. To cope with those requirements, we develop a full Content-Based Image Retrieval system based on Convolutional Neural Network (CNN) features. We use egocentric images to create a Lucene index with off-the-shelf features extracted from a pre-trained CNN. Finally, we provide a web-based prototype for egocentric image search and retrieval and test its performance on the EDUB egocentric dataset.
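As a simplified illustration of the retrieval step, the sketch below performs brute-force cosine-similarity search over pre-extracted CNN feature vectors; the actual system indexes these features with Lucene, which is not reproduced here, and the dimensions are placeholders.

```python
# Sketch: nearest-neighbour retrieval over CNN feature vectors.
import numpy as np

def retrieve(query_vec, index_vecs, top_k=5):
    """Return indices of the top_k most similar images by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    X = index_vecs / np.linalg.norm(index_vecs, axis=1, keepdims=True)
    scores = X @ q
    return np.argsort(scores)[::-1][:top_k]

index_vecs = np.random.rand(2000, 4096)  # stand-in for per-day egocentric features
print(retrieve(np.random.rand(4096), index_vecs))
```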
This contribution deals with the problem of aggregating T-equivalence relations, in the sense that we look for functions that preserve reflexivity, symmetry and transitivity with respect to a given t-norm T. Under additional conditions on the t-norm, we obtain a complete description of those functions in terms of what we call T-triangular triplets.
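For reference, the standard definitions assumed by this abstract are the following: a fuzzy relation E : X × X → [0,1] is a T-equivalence when, for all x, y, z in X,

```latex
\begin{align*}
  &E(x,x) = 1 && \text{(reflexivity)}\\
  &E(x,y) = E(y,x) && \text{(symmetry)}\\
  &T\bigl(E(x,y),\,E(y,z)\bigr) \le E(x,z) && \text{(T-transitivity)}
\end{align*}
% A triplet (a,b,c) in [0,1]^3 is called T-triangular when
% T(a,b) <= c, T(b,c) <= a and T(a,c) <= b.
```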
We define a clause tableau calculus for MinSAT, and prove its soundness and completeness. The calculus allows one to compute the maximum number of clauses that can be falsified in a multiset of clauses by applying, finitely many times, tableaux-like inference rules. We also describe how the calculus can be extended to solve weighted MinSAT and weighted partial MinSAT.
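As an illustrative instance (ours, not taken from the paper), consider the following toy MinSAT multiset:

```latex
\[
  \{\, x,\; \neg x,\; x \lor y \,\}
\]
% The assignment x = 0, y = 0 falsifies x and x \lor y, so the maximum
% number of falsified clauses is 2; no assignment can falsify all three,
% since x and \neg x cannot be false simultaneously.
```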
This paper presents a new qualitative color harmony (QCharm) theory based on a Qualitative Color Descriptor (QCD), together with the definition of 5 basic color combination operations. QCharm can construct color palettes (or color schemes) of compatible colors. Moreover, in this paper, palettes of 3 colors are constructed and their harmonious quality is predicted with a model built from data in large datasets such as ColorLovers (http://www.colorlovers.com/).
Tasks in architectural and interior design range from defining the building floor plans and ensuring the desired functionality, to deciding furnishing styles and arrangement choices. The process of design, as a whole, has remained hard to master for computer-based optimization in general and for computational intelligence approaches in particular. Numerous attempts to tackle different subfields of this problem in a machine learning fashion have emerged over the last few years. In this paper, we present an overview of current advances of computational intelligence in architectural science, with a focus on interior design. This is accompanied by a description of ongoing research towards the development of a robust and scalable commercial solution for automatic furniture arrangement.
Breast cancer is one of the most dangerous diseases for women. Although mammography is the most common method for its early detection, thermography, which uses infrared cameras to measure the temperature of breast tissue, has been used to analyze breast cancer, particularly in young women. The region that contains a tumor is warmer than normal tissue, and this difference in temperature can be easily detected by infrared cameras. This paper proposes a new method to model the evolution of the temperature of women's breasts using texture features and a learning-to-rank method. It produces a descriptive and compact representation of a sequence of infrared images acquired during the different time intervals of a thermography protocol, which is then used to discriminate between healthy and cancerous cases. The proposed method achieves good classification results and outperforms state-of-the-art methods.
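As an example of the kind of texture descriptor such a method might build on, the sketch below computes grey-level co-occurrence statistics with scikit-image; the paper's actual descriptors and its learning-to-rank stage are not reproduced here, and the frame is a random stand-in.

```python
# Sketch: GLCM texture features for a single (uint8) thermal frame.
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def texture_features(image_u8):
    """Contrast/homogeneity/energy/correlation at two co-occurrence angles."""
    glcm = graycomatrix(image_u8, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])

frame = (np.random.rand(64, 64) * 255).astype(np.uint8)  # stand-in thermal frame
print(texture_features(frame))
```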
Freezing of gait (FoG) is one of the most disturbing and incapacitating symptoms of Parkinson's Disease. It is defined as a sudden block in effective stepping, and provokes anxiety, stress and falls. FoG is usually evaluated by means of different questionnaires; however, this method has proven unreliable, since it is subjective, depending on the judgement of patients and caregivers. Several authors have analysed the use of MEMS inertial systems to detect FoG, with the aim of evaluating this symptom objectively. So far, a threshold-based method based on the accelerometer's frequency response has been employed in many works; nonetheless, since it was developed and tested under laboratory conditions, it provides much lower accuracy at patients' homes. This work proposes a new set of features to detect FoG by using accelerometers, which is compared with three previously reported approaches to FoG detection. The different feature sets are trained by means of several machine learning classifiers, and different window sizes are also evaluated. Results show that the proposed method detects FoG at patients' homes with 91.7% sensitivity and 87.4% specificity, improving on the results of former methods by between 5% and 11% and providing a more balanced rate of true positives and true negatives.
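As a concrete example of the threshold-based frequency features discussed above, the sketch below computes the classic freeze index (power in the 3-8 Hz freeze band over power in the 0.5-3 Hz locomotor band) on fixed windows; the sampling rate, window size and bands follow common practice in the literature, not necessarily this paper's settings.

```python
# Sketch: per-window freeze index from a 1-D acceleration signal.
import numpy as np

def freeze_index(accel, fs=40.0, win_s=3.2):
    """Freeze-band / locomotor-band power ratio per non-overlapping window."""
    win = int(win_s * fs)
    out = []
    for start in range(0, len(accel) - win + 1, win):
        seg = accel[start:start + win]
        spec = np.abs(np.fft.rfft(seg - seg.mean())) ** 2
        freqs = np.fft.rfftfreq(win, d=1.0 / fs)
        locomotor = spec[(freqs >= 0.5) & (freqs < 3.0)].sum()
        freeze = spec[(freqs >= 3.0) & (freqs < 8.0)].sum()
        out.append(freeze / max(locomotor, 1e-12))
    return np.array(out)
```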
This paper proposes the representation of tweets using a novel set of features, which includes a bag of negated words and the information provided by seven lexicons. The polarity of tweets is determined by a classifier based on a Support Vector Machine. The system has been evaluated on the standard tweet sets used in the SemEval 2015 competition, obtaining results that, in most cases, outperform those of state-of-the-art sentiment analysis systems.
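A minimal sketch of what a bag of negated words can look like is given below: tokens in the scope of a negation cue receive a _NEG suffix before counting; the cue list and the scope rule (negation holds until the next punctuation mark) are illustrative simplifications, not necessarily the paper's definition.

```python
# Sketch: mark tokens in negation scope before building bag-of-words counts.
NEGATION_CUES = {"not", "no", "never", "n't", "cannot"}
SCOPE_BREAKERS = {".", ",", ";", "!", "?"}

def mark_negation(tokens):
    negated, out = False, []
    for tok in tokens:
        if tok.lower() in NEGATION_CUES:
            negated = True
            out.append(tok)
        elif tok in SCOPE_BREAKERS:
            negated = False
            out.append(tok)
        else:
            out.append(tok + "_NEG" if negated else tok)
    return out

print(mark_negation("i do n't like this movie at all .".split()))
# ['i', 'do', "n't", 'like_NEG', 'this_NEG', 'movie_NEG', 'at_NEG', 'all_NEG', '.']
```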
This work analyzes the relationship between interpretations of nested partitions, since there are many situations in which a refinement of the original partition arises. As a result, a new methodology, NCI-IMS, is proposed in order to maintain the consistency between interpretations of nested partitions. This methodology extends a previous one that obtains class descriptors by determining the robustness of the characteristics' significance. NCI-IMS then takes advantage of the robustness of these descriptors to obtain a deeper analysis of the relations between the descriptors of a superclass and those of its subclasses.