
Ebook: Artificial Intelligence Research and Development

Artificial intelligence has become an integral part of all our lives. Development is rapid in this exciting and far-reaching field, and keeping up to date with the latest research and innovation is crucial to all those working with the technology.
This book presents the proceedings of the 24th edition of CCIA, the International Conference of the Catalan Association for Artificial Intelligence, held in Sitges, Spain, from 19–21 October 2022. This annual event serves as a meeting point not only for AI researchers from the Catalan-speaking territories (southern France, Catalonia, Valencia, the Balearic Islands and Alghero in Italy) but also for researchers from around the world. The programme committee of 62 experts received 59 submissions, from which the 26 long papers and 23 short papers included here were selected for presentation at the conference. The book is divided into the following sections: combinatorial problem solving and logics for artificial intelligence; sentiment analysis and text analysis; data science, recommender systems and decision support systems; machine learning; computer vision; and explainability and argumentation. This book also includes an abstract of the invited talk given by Prof. Fosca Giannotti.
Providing a comprehensive overview of research and development, this book will be of interest to all those working in the field of Artificial Intelligence.
The International Conference of the Catalan Association for Artificial Intelligence (CCIA) is an event which serves as a meeting point, not only for researchers in Artificial Intelligence based in the Catalan-speaking territories (southern France, Catalonia, Valencia, the Balearic Islands and Alghero in Italy), but also for researchers from around the world.
This book constitutes the proceedings of the 24th edition of the CCIA, held in Sitges, in October 2022. Previous editions of the CCIA were held in Tarragona (1998), Girona (1999), Vilanova i la Geltrú (2000), Barcelona (2001, 2004, 2014, 2016), Castelló de la Plana (2002), Mallorca (2003), Alghero (Sardinia) (2005), Perpignan (France) (2006), Andorra (2007), Sant Martí d’Empúries (2008), Cardona (2009), L’Espluga de Francolí (2010), Lleida (2011), Alacant (2012), Vic (2013), València (2015), Deltebre (2017), Roses (2018), Colònia de Sant Jordi (2019) and Lleida (2021). CCIA was cancelled in 2020 due to the restrictions caused by the COVID-19 outbreak.
The 26 long papers and the 23 short papers presented in this volume were carefully reviewed and selected from 59 submissions. This reviewing process was made possible thanks to the 62 artificial intelligence experts who make up the programme committee. We especially thank them for their efforts in this task, and would also like to express our appreciation for the work of the authors of the 59 submissions.
The accepted papers deal with all aspects of artificial intelligence, including combinatorial problem solving and logics for artificial intelligence, sentiment analysis and text analysis, data science, recommender systems and decision support systems, machine learning, computer vision, and explainability and argumentation. This book of proceedings also includes the abstract of the invited talk, given by Prof. Fosca Giannotti.
We would like to express our sincere gratitude to the Catalan Association for Artificial Intelligence (ACIA), the Institut d’Investigació en Intel·ligència Artificial (IIIA-CSIC), the Universitat de València (UV) and the Barcelona Supercomputing Center (BSC) for their support.
Institut d’Investigació en Intel·ligència Artificial (IIIA-CSIC), October 2022
Tommaso Flaminio, Institut d’Investigació en Intel·ligència Artificial
Francisco Grimaldo, Universitat de València
Atia Cortés, Barcelona Supercomputing Center
It is well known that the order in which touristic activities are experienced plays a role in how enjoyable they are. This is why tourists often book carefully prepared day tours on arrival at a new destination, as these allow them to see the essence of the destination while traversing scenic routes. Tours are convenient, but they are expensive, leave little room for personal exploration, and are built as one-size-fits-all products that do not consider the individual preferences of the tourist. In contrast, it is possible to make an optimal selection and ordering of touristic activities from a larger set of possibilities that matches a tourist’s personal preferences, balancing important aspects such as diversity, spatial proximity, or degree of interest in popular places. We propose a multi-objective genetic algorithm that uses a weighted averaging operator to balance four diverse objective functions crafted to capture diversity, proximity, interest in popular places, and cultural preference. The system has been evaluated against four baseline algorithms and found to perform significantly better for the specified purpose.
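The weighted-averaging scheme described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the objective functions, weights, and genetic operators here are hypothetical stand-ins for the four crafted objectives the paper describes.

```python
import random

def fitness(tour, objectives, weights):
    """Weighted average of the objective scores for a candidate tour.

    `objectives` is a list of functions scoring an ordered tour in [0, 1];
    in the paper these cover diversity, proximity, popularity and culture.
    """
    scores = [obj(tour) for obj in objectives]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def mutate(tour):
    """Swap two activities to produce a reordered variant of the tour."""
    i, j = random.sample(range(len(tour)), 2)
    t = list(tour)
    t[i], t[j] = t[j], t[i]
    return t

def evolve(activities, objectives, weights, pop_size=30, generations=100):
    """Simple elitist genetic loop: keep the best half, mutate it to refill."""
    pop = [random.sample(activities, len(activities)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda t: fitness(t, objectives, weights), reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=lambda t: fitness(t, objectives, weights))
```

A real system would add crossover and a proper multi-objective selection; the point here is only how a weighted average collapses several objectives into one scalar fitness.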
In recent decades, many families of aggregation functions have been presented, playing a fundamental role in research fields such as decision making and fuzzy mathematical morphology. For this reason, it is necessary to study the different types of operators that can potentially be used in a concrete application, as well as the properties they satisfy. In this paper, conjunctive and disjunctive rational bivariate aggregation functions of degree two in the numerator and degree one in the denominator are studied. In particular, a characterization of conjunctive and disjunctive rational aggregation functions of degrees (2,1) is presented. Moreover, the symmetry property of these operators is investigated.
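As a sketch of the general shape of the objects studied (the coefficient constraints that make such a function a conjunctive or disjunctive aggregation function are the subject of the paper's characterization), a rational bivariate function of degrees (2,1) can be written as

```latex
A(x,y) \;=\; \frac{a_1 x^2 + a_2 y^2 + a_3 xy + a_4 x + a_5 y + a_6}{b_1 x + b_2 y + b_3},
```

subject at least to the boundary conditions $A(0,0)=0$ and $A(1,1)=1$ and to monotonicity in each argument, which any aggregation function must satisfy.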
In a recent work we introduced the problem of finding the most highly polarized bipartition of a weighted, labeled graph that represents a debate developed through a social network, where nodes represent users’ opinions and edges represent agreement or disagreement between users. Finding this target bipartition is an optimization problem that can be seen as a generalization of the MaxCut problem, so we first introduced a basic local search algorithm to find approximate solutions. In this paper we go one step further and present an exact algorithm for finding the optimal solution, based on an integer programming formulation, and compare the performance of a new variant of our local search algorithm with the exact algorithm. Our results show that, at least on real instances of the problem obtained from Reddit debates, the approximate solutions are almost always identical to the optimal solutions.
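To make the MaxCut-style objective concrete, here is a hedged sketch of an exact solver by exhaustive enumeration. The paper's exact algorithm uses an integer programming formulation instead; this brute-force version only illustrates the objective being optimized, and is feasible only for very small graphs.

```python
from itertools import product

def polarization(weights, sides):
    """Total weight of edges cut by a bipartition (the MaxCut objective).

    `weights` maps node pairs (u, v) to an edge weight; `sides` maps each
    node to 0 or 1, identifying its side of the bipartition.
    """
    return sum(w for (u, v), w in weights.items() if sides[u] != sides[v])

def best_bipartition(nodes, weights):
    """Exact search over all 2^n bipartitions of `nodes` (small n only)."""
    best_sides, best_val = None, float("-inf")
    for bits in product([0, 1], repeat=len(nodes)):
        sides = dict(zip(nodes, bits))
        val = polarization(weights, sides)
        if val > best_val:
            best_sides, best_val = sides, val
    return best_sides, best_val
```

Negative edge weights can encode disagreement, so the optimum tends to separate opposing groups while keeping agreeing users on the same side.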
Obviously, we do not prove P = NP in this article. In fact, the title only refers to the first part, where the proof that we present contains an error that, to make reading more attractive, is only revealed in the second part.
In the second part, we describe how the reduction of SAT to Max2XOR and the proof system presented in the first part (although they do not solve one of the Millennium Prize Problems) may trigger new complementary ways of solving the SAT problem.
We define a new MaxSAT tableau calculus based on resolution. Given a multiset of propositional clauses ϕ, we prove that the calculus is sound in the sense that if the minimum number of contradictions derived among the branches of a completed tableau for ϕ is m, then the minimum number of unsatisfied clauses in ϕ is m. We also prove that it is complete in the sense that if the minimum number of unsatisfied clauses in ϕ is m, then the minimum number of contradictions among the branches of any completed tableau for ϕ is m. Moreover, we describe how to extend the proposed calculus to solve Weighted Partial MaxSAT.
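The quantity that soundness and completeness pin down, the minimum number of unsatisfied clauses of ϕ, can be computed directly by brute force on small instances. This sketch is not the tableau calculus itself; it is a reference oracle that any sound and complete MaxSAT calculus must agree with. The clause encoding (tuples of signed integers, DIMACS-style) is an assumption.

```python
from itertools import product

def min_unsat(clauses, n_vars):
    """Minimum number of unsatisfied clauses over all truth assignments.

    A clause is a tuple of nonzero ints: literal v means variable v is
    true, -v means variable v is false (DIMACS-style encoding).
    """
    clauses = [tuple(c) for c in clauses]
    best = len(clauses)  # trivially, no assignment falsifies more than all
    for bits in product([False, True], repeat=n_vars):
        def sat(lit):
            value = bits[abs(lit) - 1]
            return value if lit > 0 else not value
        unsat = sum(1 for c in clauses if not any(sat(l) for l in c))
        best = min(best, unsat)
    return best
```

For instance, the multiset {x1, ¬x1} has minimum 1: every assignment falsifies exactly one of the two unit clauses, which is also the number of contradictions a completed MaxSAT tableau for it must derive.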
Prioritization of activities is one of the many facets of digital product management. In this study, we propose an alternative point of view on Project Portfolio Management, in order to better understand the relationship between performance and importance and how resources can be allocated taking these two categories into account. An application of the presented methodology to a real use case in the Digital Department of SEAT S.A. is discussed to show current results and future developments.
Preparing data for Natural Language Processing involves reshaping and refining data sets into data that can be used for analysis, ensuring that the data is well formatted. Data scientists spend most of their time preparing data, and this efficiency gap is an opportunity for the technology sector to develop solutions to the problem. For this reason, a web tool has been developed that is capable of, on the one hand, speeding up the text-cleaning process and, on the other, facilitating the extraction of metrics by analyzing and processing texts through customized dictionaries in LIWC format, uploaded by the users themselves, and through sentiment analysis. All of this is offered from a single interface that allows the user to customize the whole pipeline, with different modules for pre-processing and metrics extraction, in order to facilitate, streamline and automate the whole process.
Social media offers an invaluable wealth of data for understanding what is taking place in our society. However, using social media data to understand phenomena occurring in populations is difficult, because the data we obtain is not representative and the tools we use to analyze it introduce hidden biases on characteristics such as gender or age. For instance, in France in 2021 women represented 51.6% of the population [1], whereas on Twitter they represented only 33.5% of the French users [2]. With such a difference between social network user demographics and the real population, detecting gender or age before going into a deeper analysis becomes a priority. In this paper we provide the results of ongoing work on a comparative study of three different methods to estimate gender. Based on the results of this comparative study, we evaluate avenues for future work.
This paper presents an approach for analysing food-porn images and their related comments published on the Instagram account of the cooking school Getcookingcanada. Our approach processes the published images to extract colour parameters, counts the number of likes, and analyses the comments related to each publication. A dataset containing all of these was built, and several methods were applied to study correlations in the data: a regression analysis, an ANOVA and a sentiment analysis of the comments, to explain the relation between the number of likes and the sentiment evoked by the food images. Our results show a correlation between the number of likes and the sentiment analysis of the comments: images that evoke a positive sentiment receive more likes and comments. Users’ experience in creating posts is also analysed, confirming a positive correlation between the number of likes and the publisher’s experience.
Memes evolve and mutate through their diffusion in social media. They have the potential to propagate ideas and, by extension, products. Many studies have focused on memes, but none so far, to our knowledge, on the users that post them, their relationships, and the reach of their influence. In this article, we define a meme influence graph together with suitable metrics to visualize and quantify influence between users who post memes, and we also describe a process to implement our definitions using a new approach to meme detection based on text-to-image area ratio and contrast. After applying our method to a set of users of the social media platform Instagram, we conclude that our metrics add information to already existing user characteristics.
Cities are becoming data-driven, re-engineering their processes to adapt to dynamically changing needs. A.I. brings new capabilities, effectively enlarging the space of policy interventions that can be explored and applied. Therefore, new tools are needed to augment our capacity to traverse this space and find adequate policy interventions. Digital twins are revealing themselves to be powerful tools for policy experimentation and exploration, allowing faster and more complete explorations while avoiding costly interventions. However, they face some problems, among them data availability and model scalability. We introduce a digital twin framework based on A.I. and a synthetic data model of NO2 pollution as a proof of concept, showing that this approach is feasible for policy evaluation and (autonomous) intervention and solves the problems of data scarcity and model scalability while enabling city-level Open Innovation.
When working with Intelligent Decision Support Systems (IDSS), poor data quality can compromise decisions and therefore cause undesirable behaviour in the supported system. In this paper, a novel methodology for online imputation of time-series data is proposed. A Case-Based Reasoning (CBR) system is used to provide this imputation approach. The CBR principle (i.e., solving the current problem using past solutions to similar problems) can be applied to data imputation by using values from similar past situations to replace incorrect or missing values. To improve the performance of the data imputation process, optimal case feature weights are obtained using genetic algorithms (GA). The proposed methodology is validated with data obtained from a real Waste Water Treatment Plant (WWTP) process.
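The CBR imputation idea, retrieving similar past cases under a weighted distance and reusing their values, can be sketched as follows. The distance metric, the k-nearest-neighbour retrieval and the mean-based reuse step are assumptions for illustration; the weights are exactly what the paper's genetic algorithm would tune.

```python
import math

def weighted_distance(a, b, weights, missing_idx):
    """Weighted Euclidean distance over features, skipping the missing one."""
    return math.sqrt(sum(w * (x - y) ** 2
                         for i, (x, y, w) in enumerate(zip(a, b, weights))
                         if i != missing_idx))

def cbr_impute(case, case_base, weights, missing_idx, k=3):
    """Impute the missing feature of `case` from its k most similar past cases.

    Retrieval ranks the case base by weighted distance on the observed
    features; reuse here is simply the mean of the neighbours' values.
    """
    ranked = sorted(case_base,
                    key=lambda past: weighted_distance(case, past,
                                                       weights, missing_idx))
    neighbours = ranked[:k]
    return sum(p[missing_idx] for p in neighbours) / k
```

With well-chosen feature weights, irrelevant sensors contribute little to the distance, so retrieval finds genuinely similar past plant states rather than superficially close ones.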
Federated learning implies the integration of shared data, and privacy-enforcing platforms should be implemented to provide a secure environment for it. We propose integrating real-world data from local data lakes and generating and using general synthetic data, in order to simplify, and eventually avoid, encryption or differential learning, and to use general architectures for data spaces.
This paper describes a preliminary approach towards automating the compliance checking of constructions with respect to building regulations. We describe a prototype that supports such automated checking by specifying regulations in terms of an ontology and reasoning with the Building Information Models (BIM) of constructions. The first step in our approach is to translate regulations into a machine-readable format with the support of controlled natural language specifications of rules. Then, we propose a formal specification of the building regulations in OWL2, the de facto standard for ontology engineering on the web. We subsequently populate this ontology with data from real-world BIM specifications based on Industry Foundation Classes (IFC) in order to check their compliance with the formalized regulations. Finally, our prototype offers end-users a textual verification report and a graphical visualiser showing the results of the compliance check. To explain how our prototype works and to demonstrate its applicability, we show some examples taken from a concrete use case.
The darknet is an encrypted portion of the internet for users who intend to hide their identity. Its anonymous nature makes it an effective tool for illegal online activities such as drug trafficking, terrorist activities, and dark marketplaces. Darknet traffic recognition is therefore essential for monitoring and detecting malicious online activities. However, due to the anonymizing strategies used on the darknet to conceal users’ identities, traffic recognition is practically challenging. State-of-the-art recognition systems use artificial intelligence techniques to segregate darknet traffic data, but because they rely on processed features and balancing techniques, they suffer from low performance, an inability to discover hidden relations in the data, and high computational complexity. In this paper, we propose a novel decision support system, named the Tor-VPN detector, to classify raw darknet traffic into four classes: Tor, non-Tor, VPN, and non-VPN. The detector discovers complex non-linear relations in raw darknet traffic using a deep neural network architecture with 79 input neurons and 6 hidden layers. To evaluate the performance of the proposed method, analyses are conducted on the DIDarknet benchmark dataset. Our model outperforms the state-of-the-art neural network for darknet traffic classification, with an accuracy of 96%. These results demonstrate the power of our model in handling darknet traffic without any preprocessing, such as feature extraction or balancing techniques.
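The stated architecture, 79 input neurons, 6 hidden layers, 4 output classes, can be sketched as a forward pass in plain NumPy. The hidden-layer widths, activations, and weight initialization below are assumptions, since the abstract does not specify them; only the input width, hidden-layer count, and class count come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# 79 inputs and 6 hidden layers per the paper; the widths in between are
# illustrative guesses. The 4 outputs are Tor, non-Tor, VPN, non-VPN.
sizes = [79, 128, 96, 64, 48, 32, 16, 4]

weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x):
    """Forward pass: ReLU on the 6 hidden layers, softmax over the 4 classes."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(0.0, x @ W + b)
    logits = x @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Feeding raw per-flow feature vectors directly into such a network is what lets the model skip hand-crafted feature extraction, at the cost of the network having to learn the relevant non-linear combinations itself.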
Reproducibility is a challenging issue that considerably affects the quality of most scientific papers. To deal with this, many open frameworks allow researchers to build, test, and benchmark recommender systems for single users. Group recommender systems involve additional tasks with respect to those for single users, such as the identification of groups or their modeling. While this clearly amplifies the possible reproducibility issues, to date no framework to benchmark group recommender systems exists. In this work, we enable reproducibility in group recommender systems by extending the LibRec library, which stands out as one of the richest, with more than 70 recommender algorithms, good performance, and several evaluation metrics. Specifically, we include several approaches for all the stages of group recommendation: group formation, group modeling strategies, and evaluation. To validate our framework, we consider a use case that compares several group building, recommendation, and group modeling approaches.
Recommender systems are a form of artificial intelligence used to suggest items to users of digital platforms. They use large data sets to infer models of users’ behavior and preferences in order to recommend items that a user may be interested in. Following the trend set by digital media companies, and wishing to adapt to the media consumption habits of their customers, TV broadcasters are starting to realize the potential of recommender systems to personalize access to their online catalogs. By understanding what viewers are watching and what they might like, TV broadcasters can improve the quality of their programming, increase viewership, and attract new viewers.
In this work, we analyze one specific group of users that TV broadcasters must take into account when creating a recommender system: non-logged users. In this scenario, the challenge is to use contextual information about the interaction to predict recommendations, as no information about the user themselves is available. We propose a method that leverages data from other types of users (logged users and identified devices) by using Graph Convolutional Networks, in order to build a more accurate recommender system for unidentified users.
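The core operation of a Graph Convolutional Network, propagating feature information between connected nodes (here, interactions, devices, and logged users sharing a graph), can be sketched as a single layer in NumPy. This is the standard symmetric-normalization GCN layer, not the paper's specific architecture; the graph construction and feature choices would be application-specific.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One graph-convolution layer with symmetric normalization.

    A: (n, n) adjacency matrix; X: (n, f_in) node features;
    W: (f_in, f_out) learnable weights. Self-loops are added so each
    node retains its own features, then neighbours' features are
    averaged with degree normalization before the linear transform.
    """
    A_hat = A + np.eye(A.shape[0])              # add self-loops
    d = A_hat.sum(axis=1)                       # node degrees
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))      # D^{-1/2}
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W)
```

Stacking such layers lets information from logged users reach the embeddings of anonymous-session nodes a few hops away, which is what makes contextual recommendation for non-logged users feasible.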