Ebook: Artificial Intelligence Research and Development
Artificial intelligence is no longer solely the preserve of computer scientists and researchers; it is now a part of all our lives, and hardly a day goes by without discussion and debate in the mainstream media about the implications of its many applications.
This book presents the proceedings of CCIA 2023, the 25th International Conference of the Catalan Association for Artificial Intelligence, held from 25 to 27 October 2023 in Sant Fruitós de Bages, Spain. CCIA serves as an annual forum welcoming participants from around the globe. The theme of the 2023 conference was Supportive AI, the main goals of which were to strengthen collaboration between research and industry by sharing the latest advances in artificial intelligence, and to open a discussion about how AI can better support the current needs of industry. A total of 54 submissions were received for the conference, of which the 26 full papers, 18 short papers and 6 abstracts included here were selected after peer review. The papers cover a wide range of topics in artificial intelligence, including machine learning, deep learning, social media evaluation, consensus-building, data science, recommender systems, and decision support systems, together with crucial applications of AI in fields such as health, education and disaster response, as well as the ethical impact of AI on society. The book also includes abstracts of the keynotes delivered by Professor Aida Kamišalić and Dr. Lluis Formiga.
Providing a useful overview of some of the latest developments in artificial intelligence, the book will be of interest to all those working in the field.
The International Conference of the Catalan Association for Artificial Intelligence (CCIA) serves as a unique forum, bringing together not only researchers in Artificial Intelligence within the Catalan-speaking territories (spanning southern France, Catalonia, Valencia, the Balearic Islands, and Alghero in Italy) but also drawing participants from around the globe.
This volume represents the proceedings of the 25th edition of CCIA, which took place in Sant Fruitós de Bages in October 2023. Past editions of CCIA have been hosted in various captivating locations, including Tarragona (1998), Girona (1999), Vilanova i la Geltrú (2000), Barcelona (2001, 2004, 2014, 2016), Castelló de la Plana (2002), Mallorca (2003), Alghero (Sardinia) (2005), Perpignan (France) (2006), Andorra (2007), Sant Martí d’Empúries (2008), Cardona (2009), L’Espluga de Francolí (2010), Lleida (2011), Alacant (2012), Vic (2013), València (2015), Deltebre (2017), Roses (2018), Colònia de Sant Jordi (2019), Lleida (2021), and Sitges (2022). The 2020 edition was regrettably canceled due to COVID-19 restrictions.
This compilation includes 26 long papers, 18 short papers and 6 long abstracts meticulously selected from a pool of 54 submissions, a process made possible through the dedicated efforts of a program committee consisting of 104 artificial intelligence experts. We extend our heartfelt gratitude to them for their diligent reviews and also wish to acknowledge the hard work of the authors of the 54 submissions.
The accepted papers cover a diverse range of artificial intelligence domains, including machine learning, deep learning, social media evaluation, consensus-building, data science, recommender systems, decision support systems, and computer vision, as well as crucial applications of AI in fields such as health, education, disaster response and serious games, and its ethical impact on society. Furthermore, these proceedings include the abstracts of the keynote addresses delivered by Professor Aida Kamišalić and Dr. Lluis Formiga.
We extend our sincere appreciation to the Catalan Association for Artificial Intelligence (ACIA), the Universitat Jaume I, PAL Robotics, and Universitat Ramon Llull ESADE for their invaluable support.
Sant Fruitós de Bages, Catalonia, October 2023
Ismael Sanz, Universitat Jaume I
Raquel Ros, PAL Robotics
Jordi Nin, Universitat Ramon Llull, ESADE
Distributed Space Systems (DSS) are gaining prominence in the space industry due to their ability to increase mission performance by allowing cooperation and resource sharing between multiple satellites. In DSS where communication between heterogeneous satellites is necessary, achieving autonomous cooperation while minimizing energy consumption is a critical requirement, particularly in sparse constellations with nano-satellites. In order to minimize the operating time and energy consumed by the Inter-Satellite Links (ISL) established for satellite-to-satellite communication, the temporal encounters between satellites must be anticipated. This work proposes an autonomous solution based on Supervised Learning that allows heterogeneous satellites in circular polar Low-Earth Orbits to predict their encounters, given the Orbital Elements (OE) and assuming isotropic antenna patterns. The model performance is evaluated and compared in two different scenarios: 1) a simplified scenario assuming that satellites follow Kepler orbits and 2) a realistic scenario assuming that satellites follow Simplified General Perturbations 4 (SGP4) orbits. This work can be considered the first stage of a promising alternative approach in the field of DSS.
Neural network-based treatment effect estimation algorithms are well-known in the causal inference community. Many works propose new designs and architectures and report performance metrics over benchmarking datasets, in a Machine Learning manner. Nevertheless, most authors focus solely on binary treatment scenarios. This is a limitation, as many real-world scenarios have a multi-valued treatment. In this work, we present a novel approach in which we generalize a top-performing, neural network-based algorithm for binary treatment effect estimation to a multi-valued treatment setting. Our approach yields an estimator with desirable asymptotic properties that delivers very good results in a wide range of experiments. To the best of our knowledge, this work opens the ground for the benchmarking of neural network-based algorithms for multi-valued treatment effect estimation.
The scarcity and imbalance of datasets for training deep learning models for a specific task are a common problem. This is especially true in the physiological domain, where many applications use complex data collection processes and protocols, and it is difficult to gather a significant number of subjects.
In this paper, we evaluate generative deep learning algorithms by training them to create data based on open physiological datasets and conduct a study on their potential for transfer learning. We measure the performance change of classifiers when the training data is augmented with the synthetic samples, and we also perform experiments in which we fine-tune classification models trained with the generated data by adding increasing amounts of real data, to investigate the transfer learning capabilities of synthetic datasets.
Finally, we recommend the best option for researchers interested in augmenting ECG datasets using these algorithms, as well as the fine-tuning strategies that generalize correctly when tested on new data from the same domain but for a different classification task.
This paper proposes an integrated framework for automatically segmenting road surface cracks that utilizes a Multi-Attention-Network and a modified U-Net, combined through neural network stacking, to segment the crack regions accurately. To evaluate the effectiveness of the proposed framework, we introduce a road crack dataset containing complex environmental noise. We explore several stacking scenarios and perform thorough evaluations to assess the performance of the proposed model. Our results show that the proposed method improves the IoU score by 1.5% compared to the original network, indicating its effectiveness in segmenting road cracks. The proposed framework can be a valuable tool for road maintenance and inspection, enabling timely detection and repair of cracks and improving road safety and longevity. Our findings demonstrate the importance of exploring various stacking scenarios and performing comprehensive evaluations to establish the efficacy of the proposed framework.
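For readers unfamiliar with the metric behind the reported 1.5% improvement, the following is a minimal sketch of how Intersection-over-Union is computed for binary crack masks. It is illustrative only and not the authors' code; the mask arrays are hypothetical.

```python
import numpy as np

def iou_score(pred_mask: np.ndarray, gt_mask: np.ndarray, eps: float = 1e-7) -> float:
    """Intersection-over-Union for binary segmentation masks (1 = crack, 0 = background)."""
    pred = pred_mask.astype(bool)
    gt = gt_mask.astype(bool)
    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(intersection / (union + eps))

# Toy example: two 4x4 masks that overlap on part of the crack region.
pred = np.array([[0, 1, 1, 0]] * 4)
gt   = np.array([[0, 0, 1, 1]] * 4)
print(iou_score(pred, gt))  # ~0.333 (4 shared pixels out of 12 in the union)
```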
In nature, generalization in object recognition is based not on the stimulus space but on the internal representation, as Shepard proposed in his Universal Law of Generalization. The fact that this law applies to many biological systems gives it its universal character. The extension of this universality to artificial systems remains to be studied. In this paper, we present how an artificial agent generates the internal representation of a collection of stimuli from previous experience. Moreover, from the internal representation and the classification of the artificial agent, we are already able to verify Shepard’s law. It should be noted that the presented methodology is independent of the classifier. In our case, we have applied it to an artificial system that captures haptic information from a collection of stimuli. We have verified that the relationship between the perceived distances in the internal representation and the probability of confusion between stimuli follows Shepard’s law. Verifying compliance with this law in artificial systems and studying its implications can be relevant to understanding generalization in learning.
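For reference, the textbook statement of Shepard’s Universal Law of Generalization relates the probability of generalizing (confusing) two stimuli to their distance in the internal psychological space; the notation below is a standard form, not necessarily the paper's own.

```latex
% Shepard's Universal Law of Generalization (standard exponential form):
% the probability of confusing stimulus y with stimulus x decays
% exponentially with their distance d(x, y) in the internal representation.
g(x, y) \approx e^{-k\, d(x, y)}, \qquad k > 0
```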
Citizen Science brings together scientists and public participants to collaborate on a wide range of applications and fields. With this approach, Citizen Science advances scientific research and communication while accounting for various stakeholders. Across many Citizen Science projects, digital discussion platforms play an essential role for self-governance and self-organisation. In order to increase the quality of the discussions held on these platforms, we propose a model that recommends users to new discussions in which they are likely to contribute meaningful content. Our model learns relevant user representations based on the quality of past interactions between users and discussion threads, as well as the text content of questions, using a ranking loss function, an approximation of the NDCG metric, and matrix factorization. We demonstrate that our approach is able to predict potential experts on unseen discussion threads and outperforms several baselines. Compared to state-of-the-art expert finding techniques, the architecture of our model is significantly less complex, while focusing on a mostly overlooked ranking loss function.
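For context, the metric that the ranking loss approximates is usually defined as follows; this is the standard NDCG definition rather than the paper's exact notation.

```latex
% Standard definition of NDCG@k, the ranking metric approximated by the loss:
\mathrm{DCG@}k = \sum_{i=1}^{k} \frac{2^{rel_i} - 1}{\log_2(i + 1)},
\qquad
\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}
% where rel_i is the relevance of the item ranked at position i and
% IDCG@k is the DCG@k of the ideal (relevance-sorted) ranking.
```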
In this research, we are attempting to extract relevant knowledge from sensor data of 3D printers. This is in continuation of our previous work to characterize the operational status of 3D printers using the Bootstrap-CURE technique, including the post-processing techniques oriented to understanding and conceptualizing the clusters. Four clusters were identified regarding normal successful printing and abnormal situations due to different reasons, such as insufficient operating conditions, sensor failure, and imbalance in internal temperature. In the current work, the representation of data moves from punctual sensor readings to a vectorial description of a printing job in terms of the sequence of operational statuses over time. By representing a job by its sequence of clusters, it can be analyzed which sequential patterns are associated with successful or failing jobs. This opens the door to identifying trends and anticipating failures, which is crucial to improving 3D printers’ control and management. To cluster qualitative time series, specific qualitative distances are required, such as the χ²-distance. One of the main challenges in clustering a time series dataset is measuring the similarities among the various time series. However, when the series have irregular lengths and are highly skewed, special representation methods are required to avoid biases in the results and the curse of dimensionality. This paper proposes methods to overcome these limitations.
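As a reference for the distance mentioned above, one common formulation of the χ²-distance between two qualitative profiles (as used in correspondence analysis) is given below; the notation is ours and may differ from the paper's.

```latex
% Chi-square distance between rows i and i' of a frequency table,
% with relative frequencies p_{ij}, row masses p_{i.} and column masses p_{.j}:
d_{\chi^2}(i, i') =
\sqrt{\sum_{j} \frac{1}{p_{\cdot j}}
      \left( \frac{p_{ij}}{p_{i\cdot}} - \frac{p_{i'j}}{p_{i'\cdot}} \right)^{2}}
```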
Cognitive similarity is a fundamental concept in the peer review process and in research funding programs. It can help to find qualified experts in a field during the refereeing or evaluation process. The main objective of this work is to find an automatic way to determine cognitive similarity based on bibliographic information retrieval using the Scopus database. In this paper, we compare different measures of cognitive similarity between pairs of authors through the whole publication portfolio of both. These measures are based on the authors’ publications and citations, at different levels of depth of the authorship and citation networks. We have applied bibliographic coupling and text-based techniques, and we have expressed these measures as cosine similarity. For this work, we use a small empirical case study with authors in fields related to artificial intelligence to compare the measures of cognitive similarity. The results of this study show that combined measures using bibliographic coupling and text-based techniques can provide more information on the cognitive similarity between a pair of authors.
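Since all measures are expressed as cosine similarity, a minimal illustration over hypothetical author profile vectors (for instance, counts of shared references for bibliographic coupling, or TF-IDF text vectors) could look like this; the vectors are made up for the example.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two author profile vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

# Hypothetical reference-count vectors over a common list of cited works.
author_x = np.array([3, 0, 1, 2, 0])
author_y = np.array([1, 1, 0, 2, 0])
print(round(cosine_similarity(author_x, author_y), 3))
```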
Head pose estimation, a crucial task in computer vision, involves determining the orientation of a person’s head in 3D space through yaw, pitch, and roll angles. While recent techniques present excellent results in estimating head pose from a single 2D RGB image when the head faces the camera directly, few methods exist for pose estimation from arbitrary viewpoints. This problem is emphasised when the input data is in 3D, such as head models reconstructed from magnetic resonance scans, where an accurate estimation of the pose is necessary for diagnostic purposes. To overcome these limitations, we make a first step by proposing a method for fine-grained head pose estimation across the full range of yaw angles using 3D synthetic head models. Our approach involves transforming the 3D pose estimation problem into a multi-class 2D image classification problem by representing 3D head models as multi-view projection images. Leveraging a fine-tuned ResNet50 convolutional neural network, we tackle the task of head pose estimation with a fine granularity of 5°, effectively discretizing the 360° of yaw orientations. For the evaluation of our proposal, we train and test our models with the publicly available FaceScape and 3D BIWI datasets, obtaining promising results.
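As a rough illustration of the classification setup described (5° yaw bins over 360°, i.e. 72 classes, on a fine-tuned ResNet50), a PyTorch head swap might be sketched as below. The checkpoint, input size and training details are assumptions for the example, not the authors' exact configuration.

```python
import torch
from torch import nn
from torchvision import models

# 360 degrees of yaw discretized into 5-degree bins -> 72 classes.
NUM_YAW_BINS = 360 // 5  # 72

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_YAW_BINS)  # replace the classifier head

# A multi-view projection image rendered from a 3D head model would be fed in
# as a standard 3x224x224 tensor (dummy data here).
dummy_view = torch.randn(1, 3, 224, 224)
logits = model(dummy_view)            # shape: (1, 72)
predicted_bin = logits.argmax(dim=1)  # index of the predicted 5-degree yaw bin
```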
This ongoing work outlines a computer vision and deep learning-based pipeline to identify and detect brain biomarkers of diagnostic potential from magnetic resonance imaging (MRI) scans. In this context, this paper describes and analyses two strategies for brain landmark detection, which is a key step in brain biomarker identification: one based on a single Deep Convolutional Neural Network (DCNN) that detects multiple landmarks, and the other based on an ensemble of DCNNs trained to detect one landmark each. Based on our evaluation using two distinct datasets, our preliminary findings demonstrate that the ensemble of DCNNs achieves superior accuracy in landmarking. Specifically, it successfully detects 84% of the landmarks within a 3mm proximity to their actual locations, with an average error of less than 2mm. In contrast, a single DCNN exhibits an average error of approximately 3mm and locates only 59% of the landmarks within a 3mm distance from their true positions.
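The evaluation criterion described (mean error and percentage of landmarks within 3 mm of their true position) can be computed directly from predicted and ground-truth coordinates; the following minimal sketch uses hypothetical values and is not the authors' evaluation code.

```python
import numpy as np

def landmark_metrics(pred: np.ndarray, true: np.ndarray, tol_mm: float = 3.0):
    """pred, true: (n_landmarks, 3) coordinates in millimetres.
    Returns the mean Euclidean error and the fraction of landmarks
    located within `tol_mm` of their true position."""
    errors = np.linalg.norm(pred - true, axis=1)
    return errors.mean(), (errors <= tol_mm).mean()

# Hypothetical example with three landmarks.
pred = np.array([[10.0, 5.0, 2.0], [0.0, 0.0, 0.0], [7.0, 7.0, 7.0]])
true = np.array([[11.0, 5.0, 2.0], [0.0, 4.0, 0.0], [7.0, 7.5, 7.0]])
mean_err, within_3mm = landmark_metrics(pred, true)
print(f"mean error = {mean_err:.2f} mm, within 3 mm = {within_3mm:.0%}")
```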
This work is part of the research project “Sons al Balcó” conducted by La Salle - Universitat Ramon Llull, which examines the impacts of noise pollution on human perception and mental health, specifically focusing on the perception of noise in Catalonia during the lockdown in 2020 and the return to normalcy in 2021. The purpose of this research is to identify patterns between the soundscape and the visual landscape of participants’ environments. To achieve this, we have developed a pipeline to automatically analyse the visual landscape of participants’ environments by semantically segmenting the keyframes of their videos using deep neural networks. Specifically, we use the SegFormer model, a Transformer-based framework for semantic segmentation that integrates Transformers with lightweight MLP decoders. This pipeline facilitates the efficient and accurate identification of different objects, to understand the complex relationships among the acoustic environment, visual landscape, and human perception. We expect that our findings will offer insights into the design of urban and suburban areas that promote well-being and quality of life.
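As a hedged sketch of the segmentation step, assuming the Hugging Face transformers implementation of SegFormer and a publicly available ADE20K-pretrained checkpoint (the project's own fine-tuned model, label set and keyframe file are not specified here):

```python
import torch
from PIL import Image
from transformers import SegformerImageProcessor, SegformerForSemanticSegmentation

# Assumption: a generic ADE20K-pretrained SegFormer checkpoint;
# the project's own model may differ.
ckpt = "nvidia/segformer-b0-finetuned-ade-512-512"
processor = SegformerImageProcessor.from_pretrained(ckpt)
model = SegformerForSemanticSegmentation.from_pretrained(ckpt)

keyframe = Image.open("keyframe.jpg")  # hypothetical keyframe extracted from a video
inputs = processor(images=keyframe, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_classes, H/4, W/4)

# Upsample to the original resolution and take the per-pixel class.
upsampled = torch.nn.functional.interpolate(
    logits, size=keyframe.size[::-1], mode="bilinear", align_corners=False
)
segmentation = upsampled.argmax(dim=1)[0]  # per-pixel semantic labels
```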
The appearance of new trends in the field of cognitive neuroscience, for example object persistence, has paved the way for the evolution of deep CNNs into Siamese Neural Network architectures such as OPNet. These networks allow for image recognition without the need for expensive labelled data. In this work, we apply this technology to a small Spanish e-commerce technology company struggling with the production of its customizable products. Our goal was to automatically identify each product’s order in the company’s internal system by matching photos of the products taken by workers with system-generated images. After testing various architectures, we achieved 91% accuracy with a triplet loss model using deep CNN embedding networks. The algorithm was trained on a dataset of 9696 unique product images captured in the company’s production department. The paper details the technical aspects of the Siamese Neural Network architecture, including the triplet loss and the SoftMax distance function used to train it. Our results demonstrate the potential of these deep learning models to generate practical benefits for firms, since they reduce human errors while improving the effectiveness and efficiency of the company’s internal processes.
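A minimal PyTorch sketch of the triplet-loss setup described above is shown below; the toy embedding network and random tensors stand in for the company's architecture and data, which are not reproduced here.

```python
import torch
from torch import nn

# A shared embedding network maps anchor / positive / negative images to vectors;
# the triplet margin loss pulls matching pairs together and pushes
# non-matching pairs apart.
embed = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 64),
)
triplet_loss = nn.TripletMarginLoss(margin=1.0)

anchor   = torch.randn(8, 3, 128, 128)  # worker photo of a product
positive = torch.randn(8, 3, 128, 128)  # system-generated image of the same order
negative = torch.randn(8, 3, 128, 128)  # image of a different order

loss = triplet_loss(embed(anchor), embed(positive), embed(negative))
loss.backward()
```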
This study examines whether visits to the Car Configurator website from a specific area in Spain, referred to as a “compound,” have a similar impact compared to visits from other locations. The impact is measured by the correlation between clickstream data and sales records. To analyze this relationship, genetic algorithms are employed. The findings reveal that the genetic algorithm surpasses the benchmark values by more than 65 points, indicating its effectiveness. Moreover, the correlation achieved from locations outside the compound is found to be equivalent to the fitness obtained from the regions comprising these compounds. This suggests that the impact of website visits on sales is consistent across different geographic locations.
This study evaluates Reinforcement Learning (RL) techniques for financial trading during unpredictable market conditions, such as black swan events. Three experiments were conducted: one where the algorithms were trained and tested over the same period; another where they were trained and tested over different periods; and the final one where they were trained over a certain period and then tested during a period that included a black swan event (the market crash of March 2020). Results show that RL methods outperform traditional strategies in the in-sample period, but struggle to adapt during the black swan event. The results show the potential of RL techniques in financial trading with the right approach.
Particle physics is a source of engineering challenges, also for Machine Learning techniques. We showcase three current uses of Machine Learning in the LHCb experiment, one of the four main experiments at the Large Hadron Collider (LHC) at CERN. Two are in the Real Time Analysis framework, which is in charge of processing the detector’s 4 TB/s dataflow in real time: one to locate the points where particles issued from the accelerator collisions decay, and the other to ensure a smooth choice of the data to be stored. A third use is about speeding up the detector simulation with generative techniques. In all three cases, computing speed is the key factor for using Machine Learning algorithms.
Existing research has shown the effectiveness of genetic strategies in generating Petri-Net (PN)-based controllers, but limitations exist in the ease of controller generation due to the designer’s ability and the system’s complexity. In the case of automated controller generators based on genetic programming (GP), limitations arise from the static nature of their chromosomes over the evolution process. In this short paper we introduce a first discrete PN-based controller designer that can accept systems modeled either continuously or discretely, making it more flexible in handling a wide range of systems. By utilizing genetic algorithms and PNs, the program can generate controllers tailored to the specific requirements of a given system, including the optimal size of the controller. This novel approach has the potential for far-reaching applications in various fields.
Semantic segmentation of LiDAR point clouds has received significant attention due to its applications in autonomous driving, forestry, and urban planning. Despite their potential, accurately classifying three-dimensional points remains a significant challenge due to the irregular distribution of data and density variation. To address this, state-of-the-art approaches use various techniques, such as voxelization, point-based networks, and graph-based methods. However, these techniques have limitations regarding the point cloud size they can handle and can be computationally expensive. Therefore, in this work, we propose a method to process point clouds of different scales and densities for point classification.
Although the negative consequences of noise during induction have been widely studied, previous work often lacks the use of validated data to measure its impact. We propose a framework based on Bayesian Networks for modeling class noise and generating synthetic datasets where the kind and amount of class noise are under control. The benefits of the proposed approach are illustrated by evaluating the filtering of noise completely at random in class labels when inducing decision trees. Unexpectedly, this kind of noise showed a low effect on accuracy and a low occurrence in real datasets. The framework and the methodology developed here seem promising for studying other kinds of noise in class labels.
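As an illustration of the kind of noise studied, the simple routine below flips class labels completely at random; the paper itself controls noise through a Bayesian-network model, so this is only a sketch of the noise mechanism, with made-up labels.

```python
import numpy as np

def add_class_noise_car(labels: np.ndarray, noise_rate: float, rng=None) -> np.ndarray:
    """Flip class labels Completely At Random: each instance is selected with
    probability `noise_rate` and its label is replaced by a different class
    drawn uniformly (illustrative sketch only)."""
    rng = np.random.default_rng(rng)
    noisy = labels.copy()
    classes = np.unique(labels)
    flip = rng.random(len(labels)) < noise_rate
    for i in np.flatnonzero(flip):
        noisy[i] = rng.choice(classes[classes != labels[i]])
    return noisy

y = np.array([0, 1, 1, 0, 2, 2, 1, 0])
print(add_class_noise_car(y, noise_rate=0.25, rng=42))
```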
According to a survey by the National Institute of Statistics in Spain, around 2.2% of the hearing-impaired population, equivalent to 27,300 people, use Sign Language to communicate. This highlights the importance of ensuring accessibility to Sign Language as a fundamental resource to guarantee inclusion and equal opportunities for deaf and hearing-impaired people. To achieve this goal, it is essential to promote and disseminate Sign Language and to remove barriers to its usage and access. Technology can be of great help in this field. Therefore, this work presents EduSign, a new AI-based application that classifies the alphabet of the Spanish Sign Language in real time, aiming to facilitate the communication of people with hearing disabilities. Specifically, EduSign is a web application that runs a machine learning algorithm (i.e., a neural network) with which users can learn and practice the Spanish Sign Language alphabet. The methodology that enabled the creation of EduSign consisted of capturing the coordinates of the gestures made in Sign Language using the MediaPipe tool. These coordinates were then used to train several machine learning classifiers (i.e., K-Nearest Neighbors, Ridge Regression, Gradient Boosting, Random Forest, Logistic Regression and a Neural Network). Of these classifiers, the neural network offered the best performance. More specifically, using a custom dataset with the complete Spanish alphabet, it showed an accuracy of 99%. These results reinforce the idea that applying artificial intelligence techniques can be of great help in improving communication for deaf and hearing-impaired people. Hence, it is hoped that the system can be used effectively by the target audience.
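The following is a hedged sketch of the landmark-capture step, assuming the MediaPipe Hands solution and a frame already loaded as an RGB array; EduSign's actual pipeline, dataset and hyperparameters are not reproduced here.

```python
import numpy as np
import mediapipe as mp
from sklearn.neural_network import MLPClassifier

# Extract the 21 hand landmarks (x, y, z) from an RGB frame -> 63 features.
hands = mp.solutions.hands.Hands(static_image_mode=True, max_num_hands=1)

def landmark_features(rgb_frame: np.ndarray):
    result = hands.process(rgb_frame)
    if not result.multi_hand_landmarks:
        return None  # no hand detected in this frame
    lm = result.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in lm]).flatten()  # shape (63,)

# With a feature matrix X of shape (n_samples, 63) and letter labels y,
# a small neural network classifier could then be trained, e.g.:
# clf = MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=500).fit(X, y)
```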
Accurate evaluation of capsule endoscopy videos plays a crucial role in diagnosing gastrointestinal disorders. One important aspect is the assessment of the cleansing quality, which indicates the visibility of the gastrointestinal mucosa during the examination. In this study, we propose a novel method for assigning a cleansing score to capsule endoscopy videos. Moreover, our system proposes a segmentation mask for each frame, highlighting the regions of the image that are not visible. This additional information aids in the interpretation and analysis of the videos, enabling more accurate diagnoses and improved patient care.