Ebook: Artificial Intelligence Research and Development
In its broadest sense, artificial intelligence (AI) is intelligence exhibited by machines, in particular computer systems. Although AI was founded as an academic discipline in 1956, it was not until 2012, when deep learning began to outperform previous AI techniques, that interest and investment increased enormously, ultimately leading to the situation today, where it is sometimes hard to imagine life without AI.
This book presents the proceedings of CCIA2024, the 26th edition of the International Conference of the Catalan Association for Artificial Intelligence, held from 2 to 4 October 2024 in Barcelona, Spain. The CCIA conferences now serve as a forum to bring together participants working on AI research from around the world. A total of 68 papers were received for the conference, and this book contains the 26 long papers, 19 short papers, and 4 long abstracts (49 papers in total) that were ultimately accepted following a thorough peer-review process, resulting in an acceptance rate of 72%. These are divided into 5 sections: machine learning and deep learning applications; natural language processing (NLP); artificial intelligence ethics and decision support systems; robotics and autonomous systems; and specialized applications of AI. The papers cover a wide range of AI domains, such as machine learning, deep learning, natural language processing, computer vision, data science, and reinforcement learning. Critical AI applications across a number of fields are explored, including healthcare, smart services, environmental monitoring, and manufacturing, and topics such as the ethical considerations of AI, voice cloning, and decision-making support systems are also examined.
Providing a current overview of new developments and ideas, the book will be of interest to all those working in the field of AI development.
The International Conference of the Catalan Association for Artificial Intelligence (CCIA) serves as a unique forum, bringing together not only researchers in Artificial Intelligence within the Catalan-speaking territories (spanning southern France, Catalonia, Valencia, the Balearic Islands, and Alghero in Italy) but also drawing participants from around the globe.
This volume represents the proceedings of the 26th edition of CCIA, which took place at La Salle Barcelona – Universitat Ramon Llull (Barcelona) in October 2024. Past editions of CCIA have been hosted in various captivating locations, including Tarragona (1998), Girona (1999), Vilanova i la Geltrú (2000), Barcelona (2001, 2004, 2014, 2016), Castelló de la Plana (2002), Mallorca (2003), Alghero (Sardinia) (2005), Perpignan (France) (2006), Andorra (2007), Sant Martí d’Empúries (2008), Cardona (2009), L’Espluga de Francolí (2010), Lleida (2011), Alacant (2012), Vic (2013), València (2015), Deltebre (2017), Roses (2018), Colònia de Sant Jordi (2019), Lleida (2021), Sitges (2022) and Sant Fruitós de Bages (2023). The 2020 edition was canceled due to COVID-19 restrictions.
This compilation features 26 long papers, 19 short papers, and 4 long abstracts, all carefully chosen from a total of 68 submissions. This selection process was made possible thanks to the tireless efforts of our program committee, composed of 83 experts in artificial intelligence. We deeply appreciate their thorough reviews and also wish to recognize the hard work of the authors.
The accepted papers cover a wide range of artificial intelligence domains, such as machine learning, deep learning, natural language processing, computer vision, data science, and reinforcement learning. They explore critical applications of AI across various fields, including healthcare, smart services, environmental monitoring, and manufacturing. The proceedings also address the ethical considerations of AI, as well as topics such as voice cloning and decision-making support systems.
We extend our sincere appreciation to the Catalan Association for Artificial Intelligence (ACIA), the Universitat Jaume I, PAL Robotics, and Universitat Ramon Llull ESADE for their invaluable support.
La Salle Barcelona – Universitat Ramon Llull (Barcelona), Catalonia, October 2024
Teresa Alsinet, Universitat de Lleida
Xavier Vilasís, La Salle – Universitat Ramon Llull
Daniel García, Universitat de València
Elena Álvarez, Universitat de València
The future faces escalating water scarcity due to population growth, climate change, and inefficient resource management. Therefore, innovative solutions for sustainable access and usage are needed. Seawater Reverse Osmosis (SWRO) desalination stands out as a key technology in tackling this challenge. However, SWRO is energy-intensive, primarily due to the need to pressurize seawater to overcome the osmotic pressure and produce fresh water. In this regard, real-time management of operating parameters in SWRO plants makes it possible to minimize energy consumption and chemical usage and to adjust water production in response to demand and water conditions, highlighting the need for real-time monitoring and advanced simulation tools such as digital twins. In response, this study explores the potential of eleven machine learning algorithms to simulate the SWRO process using a vast dataset of 18,816 scenarios generated through a solution-diffusion transport model. Our investigation covers both non-ensemble and ensemble models. Additionally, a Shapley additive explanation analysis was carried out to gain insights into the most influential predictors and confirm the models’ ability to comprehend the Reverse Osmosis (RO) process. The findings underscore the high accuracy of the algorithms, particularly XGBoost, CatBoost, and ANN, in predicting key parameters such as permeate flow, permeate salinity, and specific energy consumption. Furthermore, the Support Vector Machine regression model shows promise in predicting permeate flow. These findings highlight the potential of data-driven models, particularly ensemble-based algorithms, in simulating SWRO behavior, laying the groundwork for future process optimization.
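To make the kind of analysis described above concrete, the sketch below (not the authors’ code) fits a gradient-boosted regressor to predict permeate flow from a few hypothetical SWRO operating variables and ranks feature influence with SHAP; the feature names and data are placeholders.

# Hedged sketch: gradient boosting plus Shapley additive explanations on toy SWRO data.
import numpy as np
import pandas as pd
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
features = ["feed_pressure", "feed_salinity", "feed_temperature", "feed_flow"]  # hypothetical names
X = pd.DataFrame(rng.uniform(0, 1, size=(1000, len(features))), columns=features)
y = 2.0 * X["feed_pressure"] - 1.5 * X["feed_salinity"] + rng.normal(0, 0.05, 1000)  # toy permeate flow

model = xgb.XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.05)
model.fit(X, y)

explainer = shap.TreeExplainer(model)            # Shapley additive explanation analysis
shap_values = explainer.shap_values(X)
mean_abs = np.abs(shap_values).mean(axis=0)      # global importance per predictor
print(dict(zip(features, mean_abs.round(3))))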
A fundamental task in science is to determine the underlying causal relations, because it is the knowledge of this functional structure that leads to the correct interpretation of an effect given the apparent associations in the observed data. In this sense, Causal Discovery is a technique that tackles this challenge by analyzing the statistical properties of the constituent variables. In this work, we target the generalizability of the discovery method by following a reductionist approach that involves only two variables, i.e., the pairwise or bivariate setting. We question the current (possibly misleading) baseline results on the basis that they were obtained through supervised learning, which is arguably contrary to this genuinely exploratory endeavor. In consequence, we approach the problem in an unsupervised way, using robust Mutual Information measures and observing the impact of the different variable types, which is oftentimes ignored in the design of solutions. Thus, we provide a novel set of standard, unbiased results that can serve as a reference to guide future discovery tasks in completely unknown environments.
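As one illustration of an unsupervised pairwise procedure (a generic additive-noise heuristic, not necessarily the measure used in the paper), the sketch below scores both causal directions by the mutual information between the presumed cause and the regression residual and keeps the direction with the more independent residual.

# Hedged sketch: pairwise causal direction via residual mutual information.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.feature_selection import mutual_info_regression

def residual_dependence(cause, effect):
    # Fit a flexible regressor effect ~ f(cause) and measure how much the
    # residual still depends on the cause (lower = more plausible direction).
    model = KNeighborsRegressor(n_neighbors=20).fit(cause.reshape(-1, 1), effect)
    residual = effect - model.predict(cause.reshape(-1, 1))
    return mutual_info_regression(cause.reshape(-1, 1), residual, random_state=0)[0]

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x ** 3 + 0.3 * rng.normal(size=2000)        # ground truth: x causes y

score_xy = residual_dependence(x, y)
score_yx = residual_dependence(y, x)
print("inferred direction:", "x -> y" if score_xy < score_yx else "y -> x")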
In an era characterized by the rapid evolution of data-driven applications, the generation of high-quality synthetic data has become increasingly indispensable, both for advancing research and development and for ensuring the responsible management of sensitive information. However, the synthesis of fundus images presents unique challenges due to the intricate and highly detailed structures inherent in retinal images. While Generative Adversarial Networks (GANs) show promise in image synthesis, they often encounter training difficulties and struggle to produce truly realistic images. This paper introduces SynthRetina, an innovative system that harnesses the capabilities of GANs to generate lifelike fundus images. SynthRetina combines a generator network and a discriminator network, facilitating the creation of synthetic fundus images with diverse applications across the medical field. The generator network specializes in transforming input fundus images from one class to another, while the discriminator network rigorously evaluates the authenticity of the generated images. SynthRetina effectively addresses the limited availability of medical data for research and development, offering a solution that enhances data augmentation and improves the performance of fundus image classification tasks. An evaluation of the SynthRetina architecture on a real fundus image dataset demonstrates its ability to produce a more diverse and realistic collection of fundus images than other GAN-based methods.
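For readers unfamiliar with the setup, the fragment below is a bare-bones generator/discriminator pair in PyTorch in the spirit of the description above; it is a toy sketch, not the SynthRetina architecture, and the images are random placeholders.

# Hedged sketch: minimal image-to-image generator and real/fake discriminator.
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),          # translated fundus image
)
discriminator = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 1),  # real/fake logit
)

real = torch.rand(2, 3, 128, 128)                            # placeholder fundus images
fake = generator(real)
bce = nn.BCEWithLogitsLoss()
d_loss = bce(discriminator(real), torch.ones(2, 1)) + bce(discriminator(fake.detach()), torch.zeros(2, 1))
g_loss = bce(discriminator(fake), torch.ones(2, 1))          # generator tries to fool the discriminator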
As the usage of the edge-cloud continuum rises, Kubernetes presents itself as a solution that allows easy control and deployment of applications in these highly distributed and heterogeneous environments. In this context, Artificial Intelligence methods have been proposed to aid in the task allocation process and to optimize different aspects of the system, such as application execution time, load balancing, or energy consumption. In this paper, we propose a space-time combinational model that uses Deep Reinforcement Learning (DRL) to recommend node allocations for Kubernetes pods, with the objective of optimizing the overall energy consumption of the cluster while maintaining the pod execution ratio. In particular, our approach uses Proximal Policy Optimization (PPO) with custom Neural Networks to train a DRL agent and includes a custom Kubernetes operator to enforce allocations based on the node recommendations generated by the agent. Using our custom solution, we performed a series of experiments with different workloads and compared the performance with the base Kubernetes scheduler. Our experimental results demonstrate a notable reduction of up to 24% in the energy consumption of the Kubernetes cluster.
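The sketch below illustrates the general recipe only (a toy Gymnasium environment plus PPO from stable-baselines3), not the authors’ operator or cluster model: the agent picks a node for each incoming pod and is rewarded by the negative of a made-up energy estimate.

# Hedged sketch: DRL node allocation for pods with a toy energy model.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class ToyPodScheduler(gym.Env):
    def __init__(self, n_nodes=4, episode_len=50):
        super().__init__()
        self.n_nodes, self.episode_len = n_nodes, episode_len
        # Observation: per-node CPU load plus the incoming pod's CPU request.
        self.observation_space = spaces.Box(0.0, 1.0, shape=(n_nodes + 1,), dtype=np.float32)
        self.action_space = spaces.Discrete(n_nodes)        # which node receives the pod

    def _obs(self):
        return np.append(self.load, self.pod).astype(np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.load = self.np_random.uniform(0.1, 0.5, self.n_nodes)
        self.pod = self.np_random.uniform(0.05, 0.2)
        self.t = 0
        return self._obs(), {}

    def step(self, action):
        self.load[action] = min(1.0, self.load[action] + self.pod)
        energy = np.sum(0.1 + self.load ** 2)                # toy energy model: penalise hot nodes
        self.pod = self.np_random.uniform(0.05, 0.2)
        self.t += 1
        return self._obs(), -float(energy), self.t >= self.episode_len, False, {}

model = PPO("MlpPolicy", ToyPodScheduler(), verbose=0)
model.learn(total_timesteps=10_000)                          # short run, illustration only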
Predictive maintenance has emerged as a crucial strategy in industrial settings to enhance operational efficiency and minimize downtime. This study focused on implementing predictive maintenance in the context of the Monoblock machine, utilizing data from vibration sensors integrated with production stoppage records. The objective was to forecast future stoppages in the production line within a narrow time window so as to enable proactive maintenance interventions. Extreme Gradient Boosting (XGBoost) with oversampling techniques was employed to address the challenges of imbalanced data classes. Through extensive experimentation and analysis, promising results were achieved, demonstrating the ability to predict future stoppages with high accuracy within a 3-minute time window. The findings underscore the effectiveness of machine learning approaches in predictive maintenance applications, particularly when dealing with real-time sensor data and complex operational environments.
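A minimal sketch of this setup follows: XGBoost trained on an imbalanced "stoppage vs. normal" task after oversampling the minority class. The abstract does not name the oversampler, so SMOTE is used here only as a common example, and the vibration features are synthetic placeholders.

# Hedged sketch: XGBoost with minority-class oversampling for stoppage prediction.
import numpy as np
from xgboost import XGBClassifier
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 12))                     # e.g. vibration-sensor features
y = (rng.random(5000) < 0.03).astype(int)           # ~3% stoppage events (imbalanced)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_tr, y_tr)   # oversample the minority class

clf = XGBClassifier(n_estimators=400, max_depth=6, learning_rate=0.05)
clf.fit(X_bal, y_bal)
print(classification_report(y_te, clf.predict(X_te), digits=3))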
This study delves into the characterization of synthetic lung nodules using latent diffusion models applied to chest CT scans. Our experiments involve guiding the diffusion process by means of a binary mask for localization and various nodule attributes. In particular, the mask indicates the approximate position of the nodule in the shape of a bounding box, while the other scalar attributes are encoded in an embedding vector. The diffusion model operates in 2D, producing a single synthetic CT slice during inference. The architecture comprises a VQ-VAE encoder to convert between the image and latent spaces, and a U-Net responsible for the denoising process. Our primary objective is to assess the quality of synthesized images as a function of the conditional attributes. We discuss possible biases and whether the model adequately positions and characterizes synthetic nodules. Our findings on the capabilities and limitations of the proposed approach may be of interest for downstream tasks involving limited datasets with non-uniform observations, as is often the case for medical imaging.
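A minimal sketch of how such conditioning can be wired (details assumed, not taken from the paper): the bounding-box mask is appended as an extra latent channel, and the scalar attributes are projected to an embedding vector for the denoising U-Net to consume.

# Hedged sketch: spatial (mask) plus scalar (embedding) conditioning for a latent diffusion model.
import torch
import torch.nn as nn

class NoduleCondition(nn.Module):
    def __init__(self, n_attrs=4, embed_dim=128):
        super().__init__()
        self.attr_proj = nn.Sequential(nn.Linear(n_attrs, embed_dim), nn.SiLU(),
                                       nn.Linear(embed_dim, embed_dim))

    def forward(self, latent, bbox_mask, attrs):
        # latent: (B, C, H, W); bbox_mask: (B, 1, H, W); attrs: (B, n_attrs)
        x = torch.cat([latent, bbox_mask], dim=1)    # spatial conditioning channel
        emb = self.attr_proj(attrs)                  # attribute embedding for the U-Net
        return x, emb

cond = NoduleCondition()
latent = torch.randn(2, 4, 32, 32)
mask = torch.zeros(2, 1, 32, 32); mask[:, :, 8:16, 10:20] = 1.0   # approximate bounding box
attrs = torch.randn(2, 4)                                          # hypothetical scalar attributes
x, emb = cond(latent, mask, attrs)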
Multiple sclerosis (MS) is a chronic autoimmune disease that predominantly affects the central nervous system (CNS) and is a leading cause of neurological disability among young adults. MS diagnosis relies heavily on clinical symptoms coupled with the detection of demyelinating lesions in the CNS, as depicted in conventional magnetic resonance images (MRI). These lesions may evolve over time and serve as critical biomarkers for assessing disease activity and the effectiveness of therapeutic interventions. Manual lesion delineation in images is a tedious and error-prone process, prompting active research into automated systems. MS lesion segmentation can be conducted in two very different ways: cross-sectionally, using a single MRI, or longitudinally, by identifying changes such as new lesions or variations in lesion size across consecutive MRIs. Deep learning architectures such as U-Net, which employs convolutional neural networks with skip connections, have proven effective for MRI lesion segmentation. In this paper, we utilize the nnUNet v2 architecture and introduce an enhanced method capable of segmenting both cross-sectional and longitudinal MRIs. We used the ISBI 2015 dataset, training our U-Net model on a subset of patients using a generative pipeline and testing on another subset of patients from the same dataset. For training the model, we used a Bayesian generative approach to create synthetic data simulating random variations in lesion masks. In this process, we randomly simulated multiple temporal variations of the lesion masks, including erosion, dilation, and the removal of individual lesions within each mask. We then adapted the U-Net to train using a single MRI and its corresponding synthetic segmentation, which simulates the lesions at a previous time point. Thanks to the large amount of synthetic data generated, the U-Net is able to learn the intrinsic behavior of the lesions, which includes expansion, shrinkage, appearance, and disappearance. This approach enables, for the first time, the two classical operational modes for MS lesion segmentation within a single model: cross-sectional segmentation, where the model receives an MRI and an empty mask as input, and longitudinal segmentation, where the input includes an MRI and a mask from a previous time point. Our approach achieved a Dice Similarity Coefficient (DSC) of 0.75 for cross-sectional segmentation and 0.81 for longitudinal segmentation, indicating improved performance when incorporating temporal information. This study demonstrates that it is possible to apply a generative method in conjunction with a U-Net architecture for both cross-sectional and longitudinal MRI lesion segmentation in MS, yielding promising results.
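The fragment below sketches the kind of mask perturbation described above (not the authors’ exact pipeline): individual lesions in a binary mask are randomly eroded, dilated, or dropped to simulate a previous time point.

# Hedged sketch: simulate temporal lesion-mask variations (erosion, dilation, removal).
import numpy as np
from scipy import ndimage

def perturb_lesion_mask(mask, rng, drop_prob=0.3):
    labels, n = ndimage.label(mask)                    # connected lesions
    out = np.zeros_like(mask)
    for lesion_id in range(1, n + 1):
        lesion = labels == lesion_id
        if rng.random() < drop_prob:
            continue                                   # lesion "disappears"
        if rng.random() < 0.5:
            lesion = ndimage.binary_erosion(lesion)    # lesion shrinks
        else:
            lesion = ndimage.binary_dilation(lesion)   # lesion grows
        out |= lesion
    return out

rng = np.random.default_rng(0)
mask = np.zeros((64, 64), dtype=bool)
mask[10:15, 10:15] = True                              # two toy lesions
mask[40:44, 50:55] = True
previous_timepoint = perturb_lesion_mask(mask, rng)    # synthetic earlier mask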
The rise of wearable EEG devices has opened the opportunity to develop new tools for monitoring neurological disorders, particularly conditions such as epilepsy. Machine learning plays a key role in processing the EEG signal to assess the person’s state and, eventually, to evaluate the risk of some condition. However, existing approaches often rely on raw EEG data, keeping a numerical representation of the information contained in the data. Conversely, in a previous work, we explored representing the EEG signals using sequential patterns. In this work, we analyze the potential of such a representation through several machine learning methods, including decision trees, support vector machines, k-nearest neighbors, and random forest. The experiments carried out with the CHB-MIT scalp EEG database of PhysioNet show that random forest outperforms the other methods.
Birthweight (BW) is a critical marker for neonatal health and subsequent development. Neonates born with low BW (LBW) face heightened risks of various health complications, both immediate and long-term. While previous research has predominantly delved into physiological, ultrasound, and lifestyle determinants of BW, recent clinical studies indicate a potential association between neonatal BW and maternal nutrition, including factors like vitamin B12, plasma folate, and iron levels. However, this area remains largely unexplored in terms of predictive analytics. This study aims to bridge this gap by exploiting various machine learning (ML) models to estimate BW and investigate the influence of maternal nutritional factors on BW. Leveraging a cohort of 729 pregnant women monitored during their second trimester in Reus and Tarragona, Spain, the extreme gradient boosting model demonstrated robust predictive performance, achieving a mean absolute error (MAE) of 210 grams and an R-squared (R2) value of 0.857. The findings of this study not only validate the predictive significance of maternal nutritional status but also underscore plasma folate and vitamin B12 as the most influential predictors of BW. Furthermore, the predictive behaviour of the proposed model is clarified through partial dependence plots, which show how variations in maternal factors significantly affect BW.
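As an illustration of the partial dependence analysis mentioned above (not the study’s data or exact model; a scikit-learn gradient-boosted regressor stands in for XGBoost, and the feature names are hypothetical):

# Hedged sketch: gradient boosting for birthweight plus a partial dependence plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

rng = np.random.default_rng(0)
cols = ["plasma_folate", "vitamin_b12", "iron", "maternal_bmi"]   # hypothetical features
X = pd.DataFrame(rng.normal(size=(729, len(cols))), columns=cols)
y = 3300 + 80 * X["plasma_folate"] + 60 * X["vitamin_b12"] + rng.normal(0, 150, 729)  # toy BW in grams

model = HistGradientBoostingRegressor(max_iter=300).fit(X, y)
PartialDependenceDisplay.from_estimator(model, X, ["plasma_folate"])  # effect of one maternal factor
plt.show()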
Diagnosis using time series data is an important area in medical domains. Machine learning models rely on large collections of data for successful generalisation. However, data collection in fields such as medicine is difficult, which limits the effectiveness of these models. Although techniques like data augmentation can help increase dataset sizes, they work best mainly with image data and not as well with other types of data. Generative models can fill this gap. We present a study of the application of a range of generative models (TimeGAN, WaveGAN, DDPM) and data inpainting models (SSSD, ExtraMAE) for time series in the domain of the classification of Inertial Measurement Unit (IMU) data. The aim is to assess their capabilities and the improvements obtained when they are used for data augmentation with different training and transfer learning methods. The results show that these methods generate synthetic data that, when added to the training data or used as pretraining data, improve accuracy. GAN methods lag behind denoising diffusion methods in generating realistic data, and are also more difficult to train. Inpainting methods obtained results similar to GAN methods but generated samples more similar to the real data and with more stable training.
Breast density is a crucial biomarker for predicting breast cancer (BC) risk and recurrence. Women with dense breast tissue have a higher likelihood of developing BC, and dense tissue can obscure lesions, reducing detection sensitivity. Mammograms are vital for evaluating breast density, typically classified using the BI-RADS system. The main challenge in breast density segmentation is accurately localizing dense tissues. While segmentation models require detailed pixel-wise annotations, obtaining these labels is time-consuming and requires medical expertise. This paper proposes a weakly supervised approach for breast density localization, allowing deep neural network classifiers to generate saliency maps that highlight dense tissue regions based on image-level labels. We validate this model on the RSNA dataset and achieve a Dice score of 0.754, comparable to state-of-the-art supervised methods.
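The abstract does not detail the saliency method, so the sketch below shows one common weakly supervised option, a class-activation-map style computation: a classifier trained only on image-level density labels weights its last convolutional features with the classification-layer weights to localise dense tissue.

# Hedged sketch: class activation maps from an image-level classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCamNet(nn.Module):
    def __init__(self, n_classes=4):                        # e.g. BI-RADS density classes
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Linear(32, n_classes)                  # applied after global average pooling

    def forward(self, x):
        fmap = self.features(x)                             # (B, 32, H, W)
        logits = self.fc(fmap.mean(dim=(2, 3)))             # image-level prediction
        return logits, fmap

    def cam(self, fmap, class_idx):
        w = self.fc.weight[class_idx]                       # weights of the predicted class
        return F.relu(torch.einsum("c,bchw->bhw", w, fmap))

model = TinyCamNet()
x = torch.randn(1, 1, 256, 256)                             # placeholder mammogram
logits, fmap = model(x)
saliency = model.cam(fmap, logits.argmax(dim=1).item())     # map highlighting dense-tissue regions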
Electronic Health Records (EHRs) contain valuable historical information for building clinical decision support systems. In this study, we focus on exploring novel techniques for improving the prediction of the severity degree of Diabetic Retinopathy (DR) in Diabetes Mellitus patients. In a previous paper, we evaluated the behaviour of different classifiers using the patients’ retrospective EHR data to assess their current level of DR, achieving good results. Continuing that work, we now focus on studying different methods for encoding numerical variables in order to improve the accuracy of these predictions. We propose three normalization methods based on fuzzy sets for encoding numerical data. Because of the inherent uncertainty of medical data, using fuzzy logic to represent the numerical variables can enhance the accuracy of a classifier. The results of the experimental tests, conducted on a dataset of 2108 patients, show that for low-complexity classifiers (such as KNN or CNN) a classical fuzzification technique works best, while for more complex architectures (like TapNet or ResNet) a fuzzy two-hot encoding gives the best performance. The final aim of the research is to build a clinical decision support system that can make an accurate and personalised prediction of DR evolution.
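The sketch below gives one plausible reading of classical fuzzification (not necessarily the encodings evaluated in the paper): a numerical variable is replaced by its membership degrees in triangular fuzzy sets instead of a single normalised value.

# Hedged sketch: encode a lab value as "low/medium/high" fuzzy memberships.
import numpy as np

def triangular(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    return np.clip(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0, 1.0)

def fuzzify(x, lo, hi):
    mid = (lo + hi) / 2.0
    return np.stack([
        triangular(x, lo - (mid - lo), lo, mid),   # low
        triangular(x, lo, mid, hi),                # medium
        triangular(x, mid, hi, hi + (hi - mid)),   # high
    ], axis=-1)

hba1c = np.array([5.2, 6.4, 7.1, 9.0])             # hypothetical lab values
print(fuzzify(hba1c, lo=4.0, hi=10.0).round(2))    # one 3-dimensional code per patient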
Brain imaging techniques, particularly magnetic resonance imaging (MRI), play a crucial role in understanding the neurocognitive phenotype and associated challenges of many neurological disorders, providing detailed insights into the structural alterations in the brain. Despite advancements, the links between cognitive performance and brain anatomy remain unclear. The complexity of analyzing brain MRI scans requires expertise and time, prompting the exploration of artificial intelligence for automated assistance. In this context, unsupervised deep learning techniques, particularly Transformers and Autoencoders, offer a solution by learning the distribution of healthy brain anatomy and detecting alterations in unseen scans. In this work, we evaluate several unsupervised models to reconstruct healthy brain scans and detect synthetic anomalies.
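To illustrate the general idea (not the specific models evaluated in the paper), the sketch below trains a small autoencoder on healthy-looking slices and flags anomalies wherever the reconstruction error is high.

# Hedged sketch: reconstruction-based anomaly detection with an autoencoder.
import torch
import torch.nn as nn

class SliceAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.dec(self.enc(x))

model = SliceAutoencoder()
healthy = torch.rand(8, 1, 64, 64)                  # placeholder healthy brain slices
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                                   # tiny training loop, illustration only
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(healthy), healthy)
    loss.backward()
    opt.step()

unseen = torch.rand(1, 1, 64, 64)
error_map = (model(unseen) - unseen).abs()           # high values suggest anomalous regions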
Structural defects such as cracks are a critical concern in various infrastructures, and their accurate delineation is paramount for maintenance. However, existing methods often struggle to segment cracks precisely. Despite the advent of deep learning in image segmentation, repeated convolution and pooling operations tend to overlook vital edge information, thus compromising the final segmentation accuracy. This paper proposes a pixel-level crack segmentation network using a UNet architecture with a pre-trained ConvNeXt as the encoder, combined with Multiple Dimension Attention Enhancement (MDAE) blocks. The MDAE block enhances the acquisition of local edge information, leading to more precise crack segmentation. Experimental results on a public dataset, Crack500, demonstrate the proposed network’s effectiveness, achieving an IoU of 59.6% and an F1-score of 74.7%, thus significantly improving crack segmentation performance.
LHCb is one of the four largest high-energy physics experiments at CERN, focused on high-precision measurements in particle physics. The LHCb detector has undergone a recent upgrade [1] involving changes to the subdetectors, the data-taking conditions, and the data processing model. Information from the subdetectors is processed at 30 MHz in a first trigger phase, built entirely on GPUs, to reduce this rate down to 1 MHz. Afterwards, the same information is processed in a second trigger phase that runs on CPUs, performing a complete reconstruction and identification of particles. This upgrade implies an evolution of the algorithms used at trigger level. In order to maintain performance and speed up processing time, some of them have been replaced by machine learning algorithms. To perform particle identification, one of the LHCb approaches uses a neural network that combines the information from all subdetectors. In this paper we explain the advantages of this method and the capabilities that machine learning brings to LHCb, focusing on global particle identification and the throughput improvement achieved with it.
In recent years, there has been strong concern over the robustness of machine learning systems, especially when they are used in critical systems. One such critical domain is cybersecurity, and a particular example is malware detection. This work aims to provide a formal technique to check the robustness of neural networks applied to the detection of malware. The technique is based on the automatic translation of the neural network into an equivalent set of equations that can subsequently be rigorously analyzed with respect to certain conditions on its input and output. That is, given a particular input to the neural network, we check whether there exist slight variations of that input that can modify the output of the neural network. As a case study, we present preliminary results of a robustness analysis for a neural network that detects Windows PE malware. The results of the robustness analysis can be used to certify the robustness of the classifier or to improve it by fixing the detected flaws.
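A toy sketch of this idea follows (not the paper’s tool or network): a tiny ReLU network is encoded as SMT constraints, and the solver is asked whether any input within a small perturbation ball can flip the sign of the output, i.e. the predicted class.

# Hedged sketch: local robustness check of a 2-2-1 ReLU network with an SMT solver.
from z3 import Real, Solver, If, And, sat

def relu(e):
    return If(e > 0, e, 0)

# Fixed toy weights of an already-trained network (placeholders).
w1 = [[1.0, -2.0], [0.5, 1.5]]; b1 = [0.1, -0.2]
w2 = [-1.0, 1.0]; b2 = 0.05

x0, x1 = Real("x0"), Real("x1")
h = [relu(w1[i][0] * x0 + w1[i][1] * x1 + b1[i]) for i in range(2)]
out = w2[0] * h[0] + w2[1] * h[1] + b2

nominal = (0.3, 0.7)                   # concrete input to certify (output is positive here)
eps = 0.05                             # allowed perturbation per feature

s = Solver()
s.add(And(x0 >= nominal[0] - eps, x0 <= nominal[0] + eps,
          x1 >= nominal[1] - eps, x1 <= nominal[1] + eps))
s.add(out <= 0)                        # look for an input in the ball that flips the output sign
if s.check() == sat:
    print("counterexample found:", s.model())     # robustness violated in this ball
else:
    print("robust within eps for this input")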
We formalise human teamwork in tasks involving judgment as a public goods game. Our focus is on tasks where members’ contributions are combined through weighted averaging, such as brainstorming. Using a multiagent system, we examine the alignment between learned agent strategies and Nash Equilibria. Overall, our results demonstrate that our multiagent system effectively approximates the Nash Equilibria of the game.
Accurate detection of white matter (WM) lesions is essential for diagnosing and monitoring Multiple Sclerosis (MS), but manual lesion identification is challenging and time-consuming. This study employs the “no new U-Net” (nnU-Net) version 2 architecture to enhance the lesion segmentation process. We trained our model with a fine-tuned version of the default nnU-Net configuration, incorporating extreme oversampling and a smaller learning rate to improve the detection of new or evolving lesions. Results showed that our nnU-Net v2 achieved an F1 score of 0.73 for baseline lesions and 0.75 for new or evolving lesions, demonstrating notable performance in identifying both types of lesions, and that the model generalized well to the MSSEG-2 dataset. This study highlights the capabilities of the nnU-Net v2 architecture for robust WM lesion detection in longitudinal cohorts. The final phase involved packaging our top-performing ensemble of models into a Docker container for easy use, enabling the automatic distinction between baseline and new or evolving lesions.
Home assistants are essential today, but they typically support only popular languages. Promoting products that support underrepresented languages is crucial for their preservation. Using a home assistant in one’s native language, such as Catalan, is a significant step toward this goal. Keyword spotting (KWS) and speech recognition are two potential solutions. The lightweight architecture of KWS models is promising for low-powered edge devices in domotic environments. However, there is a lack of resources to train such models, especially for Catalan. This paper presents a solution that uses forced alignment techniques with speech-to-text models to extract any set of words from any speech resource. While our focus is on Catalan, this methodology can be applied to other languages.
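Once word-level timestamps are available from a forced aligner, building a KWS training set reduces to cutting out every keyword occurrence. The sketch below assumes a simple (word, start, end) alignment format and a stand-in recording; it is an illustration of the extraction step, not the paper’s pipeline.

# Hedged sketch: extract keyword clips from assumed forced-alignment timestamps.
import numpy as np
import soundfile as sf

sr = 16000
audio = (np.random.randn(3 * sr) * 0.01).astype(np.float32)   # stand-in recording
keywords = {"llum", "obre", "tanca"}                            # example Catalan keywords
# Assumed aligner output: (word, start_seconds, end_seconds).
alignment = [("obre", 0.20, 0.55), ("la", 0.55, 0.65), ("llum", 0.70, 1.05)]

for i, (word, start, end) in enumerate(alignment):
    if word in keywords:
        clip = audio[int(start * sr):int(end * sr)]
        sf.write(f"kws_{word}_{i}.wav", clip, sr)               # one KWS training clip per hit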
Cardiac digital twins represent the required functional mechanisms of patient hearts to evaluate therapies and inform clinical decision-making virtually. Scalable generation of cardiac digital twins can enable virtual clinical trials on virtual cohorts to fast-track therapy development. Here, we present an open-source digital twinning framework for personalising electrophysiological function based on routinely acquired magnetic resonance imaging (MRI) data and the standard 12-lead electrocardiogram (ECG). We extended a Bayesian-based inference framework to infer electrical repolarisation characteristics. Fast simulations are conducted with a decoupled reaction-Eikonal model, including the Purkinje network and biophysically detailed subcellular ionic current dynamics. Parameter uncertainty is represented by inferring a population of ventricular models rather than a single one, which means that parameter uncertainty can be propagated to virtual therapy evaluations. The framework is demonstrated in a healthy female subject, where our inferred reaction-Eikonal models reproduced the subject’s ECG with a Pearson’s correlation coefficient of 0.93. The methodologies for cardiac digital twinning presented here are a step towards personalised virtual therapy testing. The tools developed for this study are open-source, ensuring accessibility, inclusivity, and reproducibility, and are available on GitHub.
In this article, we introduce a tool, Popinns, to implement Deep Neural Networks (DNNs) on fixed-point architectures. Popinns takes as input the TensorFlow model of a DNN whose coefficients are floating-point numbers and generates C code in fixed-point arithmetic. The approach implemented in Popinns is based on a formal semantics describing the propagation of errors through the computations performed by the network. From this semantics, we deduce a system of constraints made of inequalities between linear expressions over integers and of min and max operations. The solution of this system, computed by an optimizing SMT solver, gives the optimal formats of the fixed-point numbers at each point of the DNN. As a result, we synthesize fixed-point C code that satisfies an error bound set by the user with respect to the initial TensorFlow model. The present article describes the Popinns architecture and its features, as well as the intermediary and final results computed by the tool.
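As a toy illustration of the underlying idea (not Popinns itself, which derives formats via formal constraints and an SMT solver), the sketch below picks a fixed-point Q-format for a set of floating-point coefficients given a total word length, then quantises them and measures the worst-case representation error.

# Hedged sketch: choosing a fixed-point format and measuring quantisation error.
import math

def fixed_point_format(values, word_length=16):
    max_abs = max(abs(v) for v in values)
    int_bits = max(1, math.ceil(math.log2(max_abs + 1)))   # bits before the binary point
    frac_bits = word_length - 1 - int_bits                  # one sign bit
    return int_bits, frac_bits

def quantise(v, frac_bits):
    return round(v * (1 << frac_bits)) / (1 << frac_bits)

weights = [0.731, -1.204, 2.518, -0.067]                     # example DNN coefficients
int_bits, frac_bits = fixed_point_format(weights)
errors = [abs(w - quantise(w, frac_bits)) for w in weights]
print(f"Q{int_bits}.{frac_bits}, worst-case error = {max(errors):.2e}")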
Users can share their opinion after visiting a restaurant or a hotel through Online Generated Reviews (OGRs) on platforms such as TripAdvisor, Booking, or Yelp. Taken together, these reviews amount to thousands of sentences, which are quite difficult for a human to digest in order to form a comprehensive opinion of a location. This study proposes a Decision Support System (DSS) composed of three modules: extraction of information from TripAdvisor comments, summarizing, and rating. Compared to prior research, our solution proposes a Neural Network Transformer-based system to summarize and rate thousands of TripAdvisor comments. Our results are twofold. First, the analysis of massive comment downloads reveals a bias between the real customer experience, based on verbal opinions, and the ratings scored in stars. Second, we present and host online a DSS which provides a summary of customer experiences per hotel. For research in Tourism and Hospitality, it represents a new milestone in the artificial intelligence journey and an application of the Generative Pretrained Transformer (GPT) model. For operations managers, it is a novel application of artificial intelligence to embrace the digital revolution. Indeed, it helps to determine what customers value most and to define action plans adequate to business requirements.
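As a minimal sketch of the summarization module only (not the system described above), the fragment below condenses a batch of review sentences with a pretrained transformer summarisation pipeline; the model name is just one common example, and the reviews are invented.

# Hedged sketch: transformer-based summarisation of hotel reviews.
from transformers import pipeline

reviews = [
    "The room was spotless and the staff went out of their way to help.",
    "Breakfast was underwhelming and the coffee machine was broken twice.",
    "Great location, five minutes from the beach, but the walls are thin.",
]

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")
summary = summarizer(" ".join(reviews), max_length=60, min_length=15)[0]["summary_text"]
print(summary)                          # one short summary of the customer experience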