Ebook: Fuzzy Systems and Data Mining X
A fuzzy system uses fuzzy logic to relate variables and process information. Fuzzy systems are used in many applications, including machine control, air conditioning, and traffic control. They also underlie almost every news item with a qualitative component, and numbers are translated into words in many daily activities, such as the weather forecast, or in topics such as the economy and industry.
This book presents the proceedings of FSDM 2024, the 10th International Conference on Fuzzy Systems and Data Mining, held from 5 to 8 November 2024 in Matsue, Japan. With an emphasis on fuzzy theory, algorithms and systems, fuzzy applications, data mining and the interdisciplinary field of fuzzy logic and data mining, FSDM 2024 also included special sessions on hot topics in related research fields, including applied mathematics and intelligent algorithms for modern industry (AMIAMI), the application of generative AI, and safeguarding AI-based automotive and automation products. A total of 237 submissions were received for the conference, and after a thorough review process, 71 papers were accepted for presentation and inclusion here, resulting in an acceptance rate of 30%. The papers are divided into three sections: fuzzy set theory, algorithm and system; data mining; and the interdisciplinary field of fuzzy logic and data mining.
Providing an overview of current research and development, the book will be of interest to all those using fuzzy systems and data mining as part of their work.
The 10th edition of FSDM, FSDM 2024, attracted many professors and postgraduate students – particularly Ph.D. scholars from numerous universities worldwide – as well as technicians, policymakers and top managers of organizations and businesses from various industries. This year, as the conference celebrated its first decade, the theme was fuzzy systems and data mining. The book is divided into three parts: Part I, Fuzzy Set Theory, Algorithm and System; Part II, Data Mining; and Part III, Interdisciplinary Field of Fuzzy Logic and Data Mining. The most popular topics are feature selection, instance selection, outlier detection and missing-value imputation; nonetheless, in the last couple of years data normalisation has also become a trend. Fuzzy systems are present in almost every news item containing a qualitative part, and numbers have been replaced by words in most daily activities, such as the weather forecast, the pace of development, and topics such as the economy and industry.
All the submitted papers were thoroughly reviewed by dedicated international Technical Programme Committee (TPC) members and anonymous reviewers, who took into account the breadth and depth of the research topics falling within the scope of FSDM. The 71 most promising and FAIA mainstream-relevant contributions from the 237 submissions have been included in this book, resulting in an acceptance rate of 30%.
I would like to take this opportunity to thank all keynote and invited speakers, authors, programme committee members and anonymous reviewers whose work made this conference possible. Thanks are also due to the members of the TPC and the local committees for their efforts in fostering a successful FSDM conference. Last but not least, I would like to thank the editors and other colleagues at IOS Press for their joint efforts in publishing this volume, the tenth edition of the FSDM conference series, in the book series Frontiers in Artificial Intelligence and Applications (FAIA).
November 2024
Antonio J. Tallón-Ballesteros
ORCID: https://orcid.org/0000-0002-9699-1894
Department of Electronic, Computer Systems and Automation Engineering, University of Huelva, Huelva, Spain
Considering the uncertainty of stock data, this paper studies an interval-valued fuzzy portfolio decision model with ESG and some realistic constraints. First, a screening process is employed to identify investable stocks based on their ESG ratings. Then, a multi-period portfolio decision model with a short-selling constraint is established by maximizing terminal wealth and minimizing terminal risk. Using the weighted programming method, the proposed portfolio model is transformed into a single-objective model to solve for the optimal decision strategy. Finally, a numerical example using stock data from the Shenzhen Stock Exchange is given to illustrate the efficiency of the presented portfolio decision model.
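As an illustration of the weighted programming step described in this abstract, the sketch below collapses a bi-objective portfolio model (maximize terminal wealth, minimize terminal risk) into a single objective with a trade-off weight. The midpoint treatment of interval-valued returns, the function names, and all numbers are assumptions for illustration only, not the paper's actual model or data.

```python
# Hypothetical sketch of weighted-sum scalarization for a bi-objective
# fuzzy portfolio model; data and midpoint defuzzification are illustrative.

def midpoint(interval):
    """Defuzzify an interval-valued return by its midpoint."""
    lo, hi = interval
    return (lo + hi) / 2.0

def scalarized_objective(weights, interval_returns, risks, lam=0.5):
    """Weighted-sum scalarization: lam * wealth - (1 - lam) * risk."""
    wealth = sum(w * midpoint(r) for w, r in zip(weights, interval_returns))
    risk = sum(w * s for w, s in zip(weights, risks))
    return lam * wealth - (1.0 - lam) * risk

# Score one candidate allocation under an equal trade-off weight.
score = scalarized_objective([0.6, 0.4], [(0.05, 0.15), (0.1, 0.3)],
                             [0.02, 0.08], lam=0.5)
```

Maximizing this single scalar objective over feasible weight vectors then yields one compromise portfolio per choice of lam.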
In today’s world, making decisions as a group is common, whether choosing a restaurant or deciding on a holiday destination. Group decision-making (GDM) systems play a crucial role by facilitating consensus among participants with diverse preferences. Discussions are one of the main tools people use to make decisions. When people discuss alternatives, they use natural language to express their opinions. Traditional GDM systems generally require participants to provide explicit opinion values to the system. However, in real-life scenarios, participants often express their opinions through text (e.g., in comments, social media, messengers, etc.). This paper introduces a sentiment- and emotion-aware multi-criteria fuzzy GDM system designed to enhance consensus-reaching effectiveness in group settings. This system incorporates natural language processing to analyze sentiments and emotions expressed in textual data, enabling an understanding of participant opinions beyond the explicit numerical preference inputs. Once all the experts have provided their preferences for the alternatives, the individual preferences are aggregated into a single collective preference matrix. This matrix represents the collective expert opinion on the alternatives. Then, sentiments, emotions, and preference scores are fed into a fuzzy inference system to obtain the overall score. The proposed system was used for a small decision-making process – choosing a hotel for a vacation by a group of friends. Our findings demonstrate that integrating sentiment and emotion analysis into GDM systems allows everyone’s feelings and opinions to be considered during discussions and significantly improves consensus among participants.
Pseudorandom number generators are essential for various stochastic simulations in physics, economics, engineering, etc. In this study, we propose a practical method for estimating appropriate sample sizes for lagged Fibonacci generators, one of the most traditional and widely used pseudorandom number generators. This estimation is based on an analogy with the theoretical one-dimensional random walk. The proposed method is formulated using a weight enumeration polynomial and the MacWilliams identity in coding theory. Generally, a weight enumeration polynomial requires an intractable exhaustive check of the seeds. However, under certain conditions, the MacWilliams identity allows for the direct derivation of the weight enumeration polynomial. To improve efficiency, we employ a heuristic technique to approximate the weight enumeration polynomial by truncating “non-essential” terms motivated by Fourier analysis in signal processing.
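For readers unfamiliar with the generator family studied here, the sketch below is a minimal additive lagged Fibonacci generator, x_n = (x_{n-r} + x_{n-s}) mod m, with the common lags (r, s) = (55, 24). The LCG used to seed the lag table and the modulus are illustrative choices, not details taken from the paper.

```python
# Minimal additive lagged Fibonacci generator (sketch; seeding is illustrative).
class LaggedFibonacci:
    """x_n = (x_{n-r} + x_{n-s}) mod m, with a circular lag buffer."""

    def __init__(self, seed=12345, r=55, s=24, m=2**32):
        self.r, self.s, self.m = r, s, m
        self.i = 0
        # Fill the initial lag table with a simple LCG (assumed seeding scheme).
        state = seed
        self.buf = []
        for _ in range(r):
            state = (1103515245 * state + 12345) % (2**31)
            self.buf.append(state % m)

    def next(self):
        # buf[i] holds x_{n-r}; x_{n-s} sits r-s positions ahead (mod r).
        new = (self.buf[self.i] + self.buf[(self.i + self.r - self.s) % self.r]) % self.m
        self.buf[self.i] = new
        self.i = (self.i + 1) % self.r
        return new
```

A sample-size estimate such as the one the paper derives would bound how many draws from such a generator can be used before lag-induced correlations become visible.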
This study explored an improved English language evaluation algorithm for virtual learning environments, particularly in metaverse educational settings. By integrating big data analysis with fuzzy measure theory, the model extracted learners’ language usage habits, progress, and difficulties from large datasets, overcoming the limitations of traditional subjective evaluation methods. Fuzzy measures quantified subjective factors such as clarity, fluency and intonation, while a Sugeno integral approach combined these measures into an overall score. Comparisons with traditional methods show significant improvements in the assessment of speaking skills across a range of proficiency levels.
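To make the aggregation step concrete, the sketch below computes a standard Sugeno integral of criterion scores with respect to a fuzzy measure. The criteria names, the example scores, and the capped-additive measure are illustrative assumptions, not the paper's calibrated values.

```python
# Standard Sugeno integral (sketch); example measure and scores are assumed.
def sugeno_integral(scores, measure):
    """Sugeno integral: max over i of min(f(x_(i)), g(A_(i))),
    with scores sorted in descending order and A_(i) the top-i criteria."""
    items = sorted(scores.items(), key=lambda kv: -kv[1])
    best, subset = 0.0, set()
    for crit, val in items:
        subset.add(crit)
        best = max(best, min(val, measure(frozenset(subset))))
    return best

# Illustrative fuzzy measure: sum of per-criterion weights, capped at 1.
weights = {"clarity": 0.5, "fluency": 0.3, "intonation": 0.4}
g = lambda A: min(1.0, sum(weights[c] for c in A))

overall = sugeno_integral({"clarity": 0.9, "fluency": 0.6, "intonation": 0.4}, g)
# overall == 0.6: fluency's score meets the measure of {clarity, fluency}.
```

Unlike a weighted average, the Sugeno integral is a min/max aggregation, which makes it robust to a single inflated criterion score.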
There are many different types of numerical analysis for partial differential equations. Numerical methods that have been studied over a long period of time and used in engineering practice include the finite difference method (FDM), the finite element method (FEM), and the boundary element method (BEM). This paper focuses on the FDM. Conventionally, numerical calculations using the FDM have mostly been performed with second-order accuracy, which usually guarantees a calculation accuracy of 3 to 4 significant figures – considered sufficient from an engineering perspective. We generally recognize that numerical calculations are approximate calculations. However, when the FDM is generalized in a way that incorporates the conventional theory, it becomes possible to perform numerical calculations of unlimited accuracy, and to easily perform calculations that converge to a theoretical numerical solution with a given number of significant figures. This type of numerical calculation system is defined as the interpolation FDM (IFDM). The essential requirement for establishing the IFDM is the algebraic polynomial approximation of the (real-valued) analytic function. This paper describes this concept and outlines the theoretical features and computational examples of the IFDM, which has three characteristics: (i) the ability to handle arbitrary domains and arbitrary boundary conditions, (ii) unlimited high-accuracy calculations, and (iii) high-speed calculations.
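The "3 to 4 significant figures" claim for conventional second-order FDM can be checked with a one-line example: the central-difference approximation of u''(x) for u = sin(x) has leading truncation error (h²/12)·u''''(x), so a moderate step like h = 0.05 gives a relative error on the order of 10⁻⁴. The test function and step size below are illustrative choices, not from the paper.

```python
# Second-order central difference for u''(x); sin(x) and h = 0.05 are
# illustrative, chosen to exhibit the typical 3-4 significant figures.
import math

def second_derivative(f, x, h):
    """Second-order central-difference approximation of f''(x)."""
    return (f(x - h) - 2.0 * f(x) + f(x + h)) / (h * h)

approx = second_derivative(math.sin, 1.0, 0.05)
exact = -math.sin(1.0)
rel_err = abs(approx - exact) / abs(exact)  # roughly h**2 / 12, ~2e-4
```

An interpolation-based generalization, as the IFDM proposes, drives this error down systematically rather than being stuck at the fixed order of the stencil.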
The q-rung orthopair hesitant fuzzy set (q-ROHFS) is a powerful instrument for addressing uncertainty problems. Nevertheless, classification methods for three-way multi-attribute group decision-making (TWD-MAGDM) under this new model have seldom been researched, and current TWD-MAGDM methods in a hesitant fuzzy environment fail to consider the psychological behavior and fuzzy correlation of decision-makers, resulting in insufficient distinction among classified objects. To resolve this issue, we present a novel TWD-MAGDM classification model for a q-rung orthopair hesitant fuzzy (q-ROHF) environment. Firstly, this paper accounts for fuzzy correlation by allocating weights through Shapley values, and combines prospect theory with a Gaussian function to develop a preference function that accurately describes losses and gains. Based on this function, it presents a relative utility function that measures utility more accurately. Secondly, we provide a conditional probability that considers psychological factors and has enhanced recognition capabilities. Finally, a novel TWD-MAGDM classification model for q-ROHF is provided based on the new relative utility function and conditional probabilities. We subsequently verify the efficacy of the proposed approach.
Interval-valued hesitant Fermatean fuzzy sets (IVHFFSs) offer a powerful mathematical tool for addressing decision-making problems full of uncertainty and ambiguity. Despite their potential, multi-attribute group decision-making (MAGDM) methods in this context have not been extensively explored. This paper addresses this gap by introducing a consensus-based MAGDM approach for IVHFFSs. First, we propose an interval-valued hesitant Fermatean fuzzy (IVHFF) distance measure without normalization, which does not generate data redundancy. Next, we propose an objective expert weighting method based on hesitancy and consensus level. Furthermore, combining the proposed distance measure and expert weighting method, we propose an IVHFF consensus model. Since the consensus model directly adjusts the expert evaluation values through the distance matrix, it effectively improves the efficiency of consensus reaching. Finally, based on the proposed consensus model, we propose a novel MAGDM method under an IVHFF environment.
This study explores the effectiveness of the traditional Firefly algorithm (FA) in optimizing the Gaussian Kernel-based Fuzzy C-means clustering (GKFCM) algorithm by tuning ‘sigma’ and ‘m’. We compare GKFCM with FA optimization (“With FA”) against GKFCM without it (“Without FA”) using the Calinski-Harabasz (CH) index and the number of iterations. For all four datasets analyzed in this study, the findings consistently indicate that the GKFCM algorithm optimized with the FA performs substantially better than its non-optimized counterpart, achieving higher CH scores and requiring fewer iterations across various data types. Results from two initial particle distribution styles confirm the FA’s robustness in refining clustering outcomes and emphasize its role in enhancing clustering quality and efficiency.
This study evaluates various distance functions in Gaussian Kernel-based fuzzy C-means clustering across six datasets. Key findings include the superior performance of the Cosine distance, which consistently yielded the lowest Davies-Bouldin scores, notably 0.3823 and 0.5226 for Datasets 5 and 6, and required fewer iterations to converge, with figures as low as 7 and 8. Squared Euclidean distance also showed effectiveness, particularly with fewer iterations needed for convergence, such as 22 and 41 for Datasets 1 and 3. In contrast, Chebyshev and Minkowski distances, requiring up to 119 iterations for Dataset 2, demonstrated lower efficiency. The evaluation process also involved additional clustering quality metrics, including the Silhouette and Calinski-Harabasz indices, though only the Davies-Bouldin results are detailed in this paper. This analysis highlights the importance of choosing appropriate distance metrics to optimize clustering quality and computational efficiency.
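For reference, the four distance functions compared in this abstract have the standard definitions sketched below; the exact Minkowski order used in the paper is not stated, so p = 3 here is an assumption.

```python
# Standard definitions of the compared distance functions (sketch).
def sq_euclidean(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def chebyshev(a, b):
    """Chebyshev (L-infinity) distance: largest coordinate gap."""
    return max(abs(x - y) for x, y in zip(a, b))

def minkowski(a, b, p=3):
    """Minkowski (L-p) distance; the paper's p is unspecified, p=3 assumed."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def cosine(a, b):
    """Cosine distance: 1 minus the cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return 1.0 - dot / (na * nb)
```

Because cosine distance depends only on direction, not magnitude, it can behave very differently from the L-p family inside a kernelized C-means update, which is consistent with the divergent iteration counts reported above.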
Interval-valued Fermatean hesitant fuzzy sets (IVFHFSs) are a new data model for dealing with complex, uncertain information. However, existing score functions (SFs) for IVFHFSs have a low discrimination rate, and multi-attribute group decision-making methods under IVFHFSs are scarce and lack the ability to classify. Therefore, this paper establishes a new three-way multi-attribute group decision-making (3W-MAGDM) method based on IVFHFSs. First, novel IVFHFS SFs are proposed. Next, an objective conditional-probability calculation method is derived using the probabilistic dominance relation, and a calculation method for the relative loss functions in three-way decision is developed; the subjectivity of 3W-MAGDM under IVFHFSs is thereby greatly reduced. Finally, a new 3W-MAGDM framework based on IVFHFSs is constructed. The new approach has a high discrimination rate in its SFs, and it provides not only a ranking function but also a categorization function.
Multiply connected regions bounded by circles are crucial for analyzing physical problems and reducing the amount of computation. However, finding a conformal mapping function that maps a multiply connected region to a circular domain is challenging. Koebe’s iterative method provides a theoretically feasible path for the conformal mapping of a multiply connected region to a circular region. In this study, a numerical implementation of Koebe’s iterative method is accomplished using the charge simulation method, and an algorithm for conformal circular mapping of unbounded multiply connected regions is proposed. Numerical experiments verify the effectiveness of the proposed algorithm.
The motivation behind fuzzy logic in data mining is to address the inherent uncertainty and imprecision in real-world data and to make the mined results more interpretable for humans. Temporal Fuzzy High Utility Itemset Mining, which incorporates transaction time, is an emerging field with significant potential for analyzing time-sensitive data. Although several studies in this area have been conducted – for instance, recent fuzzy-list-based approaches – a significant challenge remains in the join operations of conditional fuzzy lists when generating candidate itemsets. To solve this, we propose a pruning strategy based on item co-occurrences that reduces the number of join operations using the anti-monotonic property. Experiments on real datasets show that our approach outperforms traditional algorithms in terms of runtime and candidate generation with little memory overhead; up to 95% of non-promising candidates are pruned.
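The co-occurrence pruning idea can be sketched generically: record which item pairs ever appear together in a transaction, then skip any candidate extension whose pair never co-occurs, since by anti-monotonicity no superset of it can be promising either. The function names and the plain-set representation are illustrative; the paper works with fuzzy lists and utility thresholds rather than raw pairs.

```python
# Generic item co-occurrence pruning (sketch); the paper's fuzzy-list
# structures and utility bounds are replaced here by plain transactions.
def cooccurrence_pairs(transactions):
    """Collect every unordered item pair that co-occurs in some transaction."""
    seen = set()
    for items in transactions:
        items = sorted(set(items))
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                seen.add((items[i], items[j]))
    return seen

def prune(candidates, seen):
    """Drop candidate pairs that never co-occur: by the anti-monotonic
    property, none of their supersets needs a join operation either."""
    return [c for c in candidates if tuple(sorted(c)) in seen]
```

Building the co-occurrence map costs one pass over the data, after which each candidate join is vetoed or allowed in O(1).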
Density peak clustering is a clustering strategy that groups data points based on their density in the dataset; it determines cluster centers by finding density-peak points and clustering around these centers. It requires neither an iterative process nor many user-supplied parameters, which makes it efficient and easy to use. However, selecting cluster centers manually via the decision graph is a major limitation of the algorithm. In existing research, methods to automatically generate cluster centers have been proposed, but they do not take into account the contributions of different distances when calculating the local density. In this paper, a fuzzy neighborhood is employed to measure the proximity between data points and automatically identify the cluster centers. We redefine the fuzzy local density and the fuzzy relative distance in density peak clustering based on the fuzzy neighborhood, and automatically generate cluster centers based on their statistics. Compared to traditional algorithms, this method adds no additional parameters. To verify the effectiveness of the proposed algorithm, we conducted comparative experiments with existing algorithms.
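As background for the quantities this paper fuzzifies, the sketch below computes the two classic density-peak statistics: the crisp local density rho (neighbor count within a cutoff) and delta (distance to the nearest higher-density point), with candidate centers maximizing gamma = rho * delta. The crisp indicator here is exactly what the paper's fuzzy neighborhood replaces with a membership function; the toy points are illustrative.

```python
# Classic (crisp) density-peak statistics; the paper replaces the
# indicator "dist < d_c" with a fuzzy-neighborhood membership.
def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def density_peak_scores(points, d_c):
    n = len(points)
    dist = [[euclid(points[i], points[j]) for j in range(n)] for i in range(n)]
    # Crisp local density: number of neighbors strictly within the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    # Delta: distance to the nearest point of higher density
    # (for the global density peak, the farthest distance instead).
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    gamma = [r * d for r, d in zip(rho, delta)]  # center score
    return rho, delta, gamma
```

Points with both large rho and large delta stand out as centers; outliers get large delta but near-zero rho, so their gamma stays small.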
As an interpretable technique, the fuzzy neural network (FNN) can be incorporated into deep models, but it faces the problem of high dimensionality when used to design deep networks. To break the curse of dimensionality, in this paper we use a regularization method to reduce the influence of dimensionality on the FNN, and two different regularizer terms based on the L2,1 norm (Group Lasso) are designed. Using the gradient method, the FNN learns to evaluate the importance of features and rules, respectively, and thereby realizes feature selection (FS) and rule generation (RG). Simulation results on the benchmark classification problems Iris and Sonar verify the validity of the proposed fuzzy classifier: the structure of the fuzzy model can be simplified without decreasing classification performance. The regularized FNN can easily be used for interpretable deep model design.
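The L2,1 (Group Lasso) penalty mentioned here has a simple form: the sum over groups of each group's Euclidean norm. Minimizing it drives entire groups (e.g., all weights attached to one feature or one rule) to exactly zero at once, which is what enables feature selection and rule pruning. The row-as-group layout below is an illustrative convention, not the paper's exact parameterization.

```python
# L2,1 (Group Lasso) norm (sketch); rows play the role of groups,
# e.g. all weights tied to one input feature or one fuzzy rule.
def l21_norm(weight_matrix):
    """Sum over rows of the row-wise L2 norms."""
    return sum(sum(w * w for w in row) ** 0.5 for row in weight_matrix)

# A zero row contributes nothing: the penalty rewards switching whole
# groups off, unlike a plain L2 penalty which only shrinks them.
penalty = l21_norm([[3.0, 4.0],   # active group, norm 5
                    [0.0, 0.0],   # pruned group, norm 0
                    [1.0, 0.0]])  # active group, norm 1
```

Adding `lam * l21_norm(W)` to the training loss and differentiating per group gives the gradient terms the paper's learning rule would use.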
Boolean Networks (BNs) and Probabilistic Boolean Networks (PBNs) are useful models for genetic regulatory networks, healthcare service systems, manufacturing systems and financial risk. This paper focuses on the construction problem of PBNs. We propose the Division Pre-processing algorithm (DPre), which breaks a non-negative integer matrix P with constant positive column sum into two non-negative integer matrices Q˜ and R˜, each with constant column sum, such that P = dQ˜ + R˜ for some positive integer d. We combine DPre with two existing PBN construction algorithms to form a novel PBN construction algorithm called the Single Division Pre-processed SER2-GER algorithm (SDS2G). Our computational experiments reveal that SDS2G performs significantly better than other existing PBN construction algorithms, and that SDS2G is fast. Lastly, we derive a new lower bound related to PBN construction; this new theorem generalizes a lower-bound theorem from a previous paper.
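To illustrate the kind of split DPre produces, the sketch below performs the naive elementwise division with remainder, Q = P // d and R = P mod d, so that P = dQ + R by construction. Note this is only a sketch: elementwise division does not in general preserve the constant-column-sum property, which the actual DPre algorithm must guarantee; the example matrix is chosen so the property happens to hold.

```python
# Naive division-with-remainder split (sketch, NOT the DPre algorithm
# itself): P = d*Q + R holds elementwise, but constant column sums of
# Q and R are only checked here, not enforced.
def divide_split(P, d):
    Q = [[x // d for x in row] for row in P]
    R = [[x % d for x in row] for row in P]
    return Q, R

def column_sums(M):
    return [sum(col) for col in zip(*M)]

# Example where the split preserves constant column sums:
P = [[2, 5],
     [5, 2]]          # both column sums equal 7
Q, R = divide_split(P, 3)
```

Checking `column_sums(Q)` and `column_sums(R)` for constancy is the extra condition that distinguishes a valid DPre decomposition from a plain integer division.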
Image processing has become a central topic in the era of big data, particularly within computer vision, due to the growing volume and diverse resolutions of images. Low-resolution images introduce uncertainty, underscoring the need for high-performance classification methods. Convolutional Neural Networks (CNN), especially the U-Net architecture, are widely applied for pixel-level segmentation due to their encoder-decoder structure. This study applied U-Net on a CT scan image dataset to segment lung images, followed by a CNN classifier to classify lung cancer stages (I, II, IIIa, IIIb). The U-Net model outperformed standard CNNs, achieving 99% in accuracy, precision, sensitivity, and F1 score, compared to the conventional CNN’s 97%, 95%, 97%, and 96%, respectively.
Community search (CS), which aims to find a densely connected subgraph containing given query vertices, is an important research topic in social network analysis and is widely used in similar-item recommendation, team organization, friendship recommendation and other practical applications. The purpose of a CS system is to display the searched community to users in visual form, helping them better understand and analyze networks and make better decisions. However, existing CS systems are mostly designed for static graphs; they cannot capture dynamic attributes or intuitively display the dynamic changes of a community. In this paper, we develop a CS system over dynamic graphs based on a graph neural network (GNN), aiming to locate communities with cohesive attributes over dynamic graphs and to visualize them so as to intuitively display the dynamic changes of vertices and the relationships between them. We design a GNN-based method to capture the dynamic changes of attributes, and a friendly front-end interface that visualizes the resulting community in the form of a timeline. Users can view the status of the resulting community at any snapshot and fine-tune it according to their own conditions.
Sequential recommendation aims to predict users’ next preferred items according to their interaction sequences. Existing methods mainly utilize user-item interaction information, which may suffer from the issue of semantic information loss. In this paper, a Meta-Path guided Pre-training method for sequential Recommendation (MPPRec) is proposed to capture rich and meaningful semantic information between users and items. Specifically, MPPRec first learns node embeddings guided by meta-paths in the pre-training phase. Then, the node embeddings are optimized for the recommendation task in the fine-tuning phase. Extensive experiments conducted on four real datasets demonstrate that MPPRec outperforms the baseline methods.
Precise and effective software promotion strategies are crucial for software companies. By applying causal machine learning techniques to software promotion data provided by software companies, we aim to reveal the extent to which the promotion strategy of technical support services affects sales effectiveness and to explore the heterogeneity of this effect across different customer groups. The results show that the introduction of technical support can generate a 60% increase in sales revenue for software companies. By comparing multiple causal machine learning models, Linear DML emerges as the optimal model, which is used to further explore customer subgroups with high causal effects and to interpret the model with SHapley Additive exPlanations (SHAP). We find that larger company size, more personal computers, fewer employees, and lower IT spending characterize the customers who are most responsive to technical support services. The findings are expected to provide software companies with a theoretical basis for developing more effective promotion strategies and identifying promotional customer groups.
In the rapidly evolving landscape of e-commerce, online shopping has gradually gained widespread acceptance. Yet, physical retail channels continue to exert a significant influence on product sales. Within this framework, sales forecasting occupies a pivotal position in traditional commerce and is indispensable for guiding corporate strategy formulation. However, most existing forecasting models fail to fully harness the potential impact of multi-source external information on consumer purchasing behavior. To address these challenges, this study proposes a novel sales forecasting model that synergizes the Extreme Gradient Boosting (XGBoost) algorithm with multi-source data integration techniques. Not only have we comprehensively collected an extensive range of external data, but we have also integrated the embedding and aggregation techniques of Point of Interest (POI) data to enrich the set of external information features available to the model. Leveraging feature engineering techniques, these heterogeneous data are transformed into formats amenable to model training. Relying on the nonlinear modeling capabilities of XGBoost and its efficacy in handling large-scale datasets, we trained and optimized the model. Empirical results indicate that our proposed method, predicated on multi-source data integration, significantly outperforms traditional models based on a single data source, thereby enhancing prediction accuracy and providing more precise inventory management strategy support for businesses.
As a knowledge representation tool, the knowledge graph (KG) has been widely used. In this study, a question answering (Q&A) system for geriatric diseases based on a knowledge graph was constructed to help the elderly obtain medical information. Initially, a total of 6,376 disease data items were collected and analyzed to identify the characteristics of these diseases. The KG was then constructed with the Neo4j graph database. The Q&A system starts from semantic recognition: the Aho-Corasick (AC) automaton is utilized to filter user input questions, the Cypher language is employed to query the graph database, and the obtained results are imported into predefined templates for output. The accuracy of our system for two different categories of questions is 87% and 94%, respectively. Finally, a random forest model is introduced to address disease diagnosis; the feature variables were vectorized using the TF-IDF model and the target variables using one-hot encoding. In general, we introduce a novel knowledge-graph-driven Q&A system that provides a new tool for health management of the elderly population; its construction will promote the development of smart medicine and help address the health concerns of the elderly.
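For the feature vectorization step mentioned above, the sketch below implements plain TF-IDF over tokenized documents: term frequency times the log inverse document frequency. The toy symptom tokens and the exact weighting variant (there are several) are assumptions; the paper does not specify which variant it uses.

```python
# Plain TF-IDF vectorization (sketch); the paper's exact TF-IDF variant
# and tokenization are not specified, so this is a generic version.
import math
from collections import Counter

def tf_idf(docs):
    """Return (vocab, vectors): TF-IDF vectors for tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf, total = Counter(doc), len(doc)
        # tf weight * idf weight; a term in every document gets idf 0.
        vectors.append([(tf[t] / total) * math.log(n / df[t]) for t in vocab])
    return vocab, vectors

# Illustrative symptom descriptions (hypothetical tokens, not the dataset):
vocab, vecs = tf_idf([["fever", "cough"], ["fever", "rash"]])
```

A term like "fever" that appears in every document receives weight zero, so the random forest sees only the symptoms that actually discriminate between cases.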
Defect detection of substation equipment components is an indispensable part of grid security situational awareness, and the regular inspection of equipment is related to the secure operation of the power system. To address the low recognition accuracy of current defect classification models for substation equipment components, this paper proposes a defect classification method based on an improved Dilated Convolution Swin Transformer (DC-Swin). First, an improved Dilated Convolution Self-Attention Module is constructed to extract the regions of equipment components that contain rich defect-specific information. Second, an image dataset of infrared images of defective equipment components is constructed to pre-train the module, which enables the Self-Attention Module to learn the important regions in an image and reduces interference from irrelevant information. Finally, the pre-trained module is incorporated into the Swin Transformer in the channel dimension. By seamlessly integrating crucial feature regions into the original image, the network model can delve deeper into the intricate dependencies among features, resulting in a more discriminative feature representation. The proposed method improves accuracy by 6.17% compared to the original Swin Transformer model. The results show that the method makes optimal use of the defect classification model on the acquired dataset and provides a solid foundation for substation safety situational awareness.