Ebook: Fuzzy Systems and Data Mining VII
Fuzzy systems and data mining are indispensable aspects of the computer systems and algorithms on which the world has come to depend.
This book presents papers from FSDM 2021, the 7th International Conference on Fuzzy Systems and Data Mining. The conference, originally due to take place in Seoul, South Korea, was held online on 26-29 October 2021, due to ongoing restrictions connected with the COVID-19 pandemic. The annual FSDM conference provides a platform for knowledge exchange between international experts, researchers, academics and delegates from industry. This year, the committee received 266 submissions, and this book contains 52 papers, including keynotes and invited presentations, oral and poster contributions. The papers cover four main areas: 1) fuzzy theory, algorithms and systems – including topics like stability; 2) fuzzy applications – which are widely used and cover various types of processing as well as hardware and architecture for big data and time series; 3) the interdisciplinary field of fuzzy logic and data mining; and 4) data mining itself. The topic most frequently addressed this year is fuzzy systems.
The book offers an overview of research and developments in fuzzy logic and data mining, and will be of interest to all those working in the field of data science.
Fuzzy Systems and Data Mining (FSDM) is a well-established annual international conference dealing with four main groups of topics: a) fuzzy theory, algorithms and systems; b) fuzzy applications; c) the interdisciplinary field of fuzzy logic and data mining; and d) data mining. Following the great success of previous editions, which began in 2015, FSDM 2021 is the seventh edition in the series. Originally planned for Seoul (South Korea), it was ultimately held as an online conference, providing a forum for experts, researchers, academics and industry practitioners to present the latest advances in the field of fuzzy sets and data mining. The most frequent topic this year is fuzzy systems.
This volume contains the papers accepted and presented at the 7th International Conference on Fuzzy Systems and Data Mining (FSDM 2021), held online on 26–29 October 2021 due to the pandemic. All papers were carefully reviewed by program committee members, who took into account the breadth and depth of the research topics that fall within the scope of FSDM. From 266 submissions, the 52 most promising and relevant contributions were included in this volume; they present original ideas or results of general significance, supported by clear reasoning, compelling evidence and methods.
The conference program of FSDM 2021 included keynote and invited presentations as well as oral and poster contributions. The event brought together more than 100 qualified, high-level researchers and experts from over 20 countries, including 6 keynote speakers, creating a good platform for academic exchange among researchers and engineers worldwide.
I would like to thank all the keynote and invited speakers, authors and anonymous reviewers for the effort they put into preparing their contributions to the conference. We are also very grateful to the people, especially the program committee members and reviewers, who devoted time to assessing the papers. It is an honour to continue with the publication of these proceedings in the prestigious series Frontiers in Artificial Intelligence and Applications (FAIA), published by IOS Press. Our particular thanks also go to the FAIA series editors, Joost Breuker, Nicola Guarino, Pascal Hitzler, Joost N. Kok, Jiming Liu, Ramon Lopez de Mantaras, Riichiro Mizoguchi, Mark Musen, Sankar K. Pal and Ning Zhong, for supporting this conference.
Last but not least, I hope you enjoyed your online stay and first virtual contact with Seoul city, which may be followed with a face-to-face conference in Seoul in the future if the situation of the global pandemic of COVID-19 is brought under control.
Antonio J. Tallón-Ballesteros
University of Huelva, Spain
Recent developments in Artificial Intelligence, Machine Learning and Big Data technologies are posing a threat to white-collar workers and managers alike. Whereas until a few years ago only manual jobs were at risk of replacement, nowadays there is no guarantee that intellectual jobs will remain in human hands for long. Hopes that new technologies will generate more new jobs than those lost are vanishing. The new challenge faced by human workers comes from entities that display more intelligence, whether real or just simulated, than they do. We, as humans, need to invent new strategies to survive in a job market that seems ready, in one or two generations, to get rid of us. This paper identifies the main issues around intelligence and provides hints about what intelligence imitation/simulation cannot do that human intelligence can. It also discusses aspects of creativity, allegedly a human-only skill, which looks like the most promising area in which human development may progress, reducing the risk of silicon replacement.
With the wide use of recommendation systems in mobile applications, privacy leakage has become a longstanding threat. Researchers have proposed many methods that offer some degree of protection, but their scope is limited, especially with regard to protecting the original data. To address this issue, we propose a data-perturbation-based Rényi differential privacy algorithm to protect the SVD recommendation model. The method perturbs the original training dataset in the data preprocessing stage and then trains the SVD model on the perturbed data; the unperturbed data is used as a test set to verify the accuracy of the model. Compared with objective perturbation, gradient perturbation and output perturbation, data perturbation protects a broader range and, through the post-processing property of differential privacy, can realize the corresponding functions of the other three perturbation methods. Experimental results show that the proposed method can effectively protect user privacy, improve the effectiveness of data, and generate better recommendation results without seriously affecting the accuracy of the model.
Cardiovascular disease (CVD) is one of the major causes of death worldwide, and its mortality rate is higher than that of other causes. Hence, we propose a novel deep neural network (DNN)-based prediction model for the occurrence of major adverse cardiovascular events (MACE) in patients with non-ST-elevation myocardial infarction (NSTEMI) to improve the prediction accuracy for CVD. First, we use the Korean Acute Myocardial Infarction Registry (KAMIR-NIH) dataset with 2-year follow-ups and preprocess the extracted data by handling missing values, addressing the class imbalance problem, and applying a normalization method to scale all the data to the same range. Then we design a DNN-based prognosis model for the occurrence of MACE in NSTEMI patients. Finally, we evaluate the proposed model's performance and compare it with several commonly applied machine learning algorithms, such as logistic regression, K-Nearest Neighbors, decision tree, and support vector machine. The results show that the proposed method outperforms the other machine learning-based prediction models.
In the Internet era, online user reviews play an important role in many fields, and industries including real estate attach great importance to them. However, the existing literature offers no clear conclusion as to whether online user reviews have an impact on real estate marketing. To address this question, this paper combines traditional real estate theories with machine learning technology to mine data from Chinese real estate online user reviews. We use natural language processing and panel data regression analysis to explore whether the emotional tendency of online user reviews has an impact on the price of second-hand housing offered by real estate companies, and examine in more depth its impact on the marketing of those companies. Our research provides a reference for real estate companies seeking to devise effective marketing strategies.
Bipolar fuzzy numbers play a vital role in any decision-making problem modelled under a bipolar fuzzy environment. In 2018, Akram and Arshad [1] introduced a new ranking function on the class of trapezoidal bipolar fuzzy numbers (TrBFNs), based on the area of the left and right membership functions of a TrBFN, and used it to discriminate between any two TrBFNs. The ranking principle introduced by Akram and Arshad [1] works well only when two bipolar fuzzy numbers have different rankings. We show with counterexamples that the ranking function fails when two or more bipolar fuzzy numbers have the same ranking. In this paper, we improve the ranking principle introduced in [1] by introducing a new Improved Score function. Firstly, we discuss the drawbacks and limitations of the ranking function introduced by Akram and Arshad [1]. Secondly, we introduce a new ranking function and study its properties. Thirdly, we introduce a new ranking principle by combining Akram and Arshad's [1] ranking function and the proposed ranking function. Finally, we show the efficiency of the proposed ranking principle in comparing arbitrary TrBFNs.
Reviews have been commonly used to alleviate the sparsity problem in recommender systems, which has significantly improved recommendation performance. Review-based recommender systems can extract user and item features from review texts. Existing models such as D-Attn and NARRE employ convolutional neural networks and a coarse-grained attention mechanism to encode reviews that have been embedded using static word embeddings; they ignore long-distance text information and lack interpretability. To overcome these problems, this paper proposes the DNRDR (Dual-feature Neural Recommender with Dual-attention using Reviews) model, which extracts dual features from review text and enhances interpretability through word-level and review-level attention mechanisms. The proposed model is verified by experiments and compared with state-of-the-art models. In addition, the dual-level attention mechanism can be visualized to improve interpretability.
This paper discusses a facial expression recognition model and a description generation model to build descriptive sentences for images and facial expressions of people in images. Our study shows that YOLOv5 achieves better results than a traditional CNN for all emotions on the KDEF dataset. In particular, the accuracies of the CNN and YOLOv5 models for emotion recognition are 0.853 and 0.938, respectively. A model for generating descriptions for images based on a merged architecture is proposed using VGG16 with the descriptions encoded over an LSTM model. YOLOv5 is also used to recognize dominant colors of objects in the images and correct the color words in the descriptions generated if it is necessary. If the description contains words referring to a person, we recognize the emotion of the person in the image. Finally, we combine the results of all models to create sentences that describe the visual content and the human emotions in the images. Experimental results on the Flickr8k dataset in Vietnamese achieve BLEU-1, BLEU-2, BLEU-3, BLEU-4 scores of 0.628; 0.425; 0.280; and 0.174, respectively.
A hierarchical multiprocessor system designed to control complex production facilities is investigated. It is shown that distributed computing in such systems is performed at the strategic, tactical and functional-logical levels. Since the control process unfolds in real physical time, in which both digital controllers and a control computer operate, when developing software for such systems, the problem of estimating the time complexity of control algorithms arises. A matrix equation is obtained that describes a closed multi-loop control system, in which data asymmetry and pure delays are taken into account. It is shown that time delays worsen the characteristics of transient processes during the transition of an object to a steady state of operation. A method for soft estimation of time complexity is developed, based on the sequential simplification of the semi-Markov process, represented by the algorithm, with recalculation of its time characteristics at each step. It is shown that the method allows one to estimate both data distortions and delays in the feedback loops.
Since their introduction in 1971 to approximate two-dimensional geographical surfaces, multivariate radial basis functions (RBFs) have received a great deal of attention from scientists and engineers. In 1987 the idea was extended to the construction of neural networks, at the beginning of the era of artificial intelligence, forming what is now called ‘Radial Basis Function Neural Networks (RBFNs)’. Ever since, RBFNs have been developed and applied to a wide variety of problems (approximation, interpolation, classification, prediction) in today's science, engineering, and medicine. This also includes numerically solving partial differential equations (PDEs), another essential branch of RBFNs under the name of the ‘Meshfree/Meshless’ method. Amongst many, the so-called ‘Multiquadric (MQ)’ is known as one of the most widely used forms of RBFs, and yet only a couple of its versions have been extensively studied. This study aims to extend the idea toward more general forms of MQ. At the same time, the key factor that plays a crucial role for MQ, the ‘shape parameter’ (for which selecting a reliable value remains an open problem), is also under investigation. The scheme was applied to the problem of function recovery, as well as the approximation of derivatives, using six forms of MQ with two choices of variable shape parameter. The numerical results obtained in this study provide useful information on selecting both a suitable form of MQ and a reliable choice of MQ shape parameter for further applications in general.
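As a rough illustration of the function-recovery setting described in this abstract, the following minimal sketch interpolates samples of a function with the classic multiquadric sqrt(r^2 + c^2) and also evaluates the derivative of the interpolant. It is not the authors' scheme: the constant shape parameter `c`, the sine test function, the node layout and all function names are assumptions made here for illustration, whereas the paper studies six MQ forms with variable shape parameters.

```python
import math

def multiquadric(r, c):
    """Classic multiquadric basis sqrt(r^2 + c^2); c is the shape parameter."""
    return math.sqrt(r * r + c * c)

def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_mq(centers, values, c):
    """Find weights w so that sum_j w_j * phi(|x_i - x_j|) = values[i] at every node."""
    a = [[multiquadric(abs(xi - xj), c) for xj in centers] for xi in centers]
    return solve(a, values)

def evaluate_mq(x, centers, weights, c):
    """Evaluate the MQ interpolant at x."""
    return sum(w * multiquadric(abs(x - xj), c) for w, xj in zip(weights, centers))

def evaluate_mq_derivative(x, centers, weights, c):
    """Derivative of the interpolant: d/dx sqrt((x - xj)^2 + c^2) = (x - xj) / sqrt(...)."""
    return sum(w * (x - xj) / multiquadric(abs(x - xj), c)
               for w, xj in zip(weights, centers))

# Assumed toy problem: recover sin(2*pi*x) on [0, 1] from 11 equally spaced samples.
centers = [i / 10 for i in range(11)]
values = [math.sin(2 * math.pi * x) for x in centers]
weights = fit_mq(centers, values, c=0.2)
```

Calling `evaluate_mq(0.25, centers, weights, 0.2)` gives the recovered function value between nodes, and rerunning the fit with a different `c` shows how sensitive the result is to the shape parameter.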
Bolts are the most numerous fasteners in power distribution networks and are a cornerstone of the safety and reliability of the power system. However, missing pins, missing nuts, loose nuts and rust can damage the power system and even cause serious accidents. Because the large number of bolt defects makes traditional manual identification difficult and inefficient, this paper proposes a bolt defect identification algorithm based on attention models. The method improves the traditional deep residual network (ResNet) by adding a channel attention mechanism to obtain key channel features, and uses random flipping, translation and other data augmentation methods to expand the bolt defect dataset. The experimental results show that, compared with the traditional model, the improved model identifies different types of bolt defect images more accurately, and the mAP on the test set reaches 85.9%, which verifies the feasibility and reliability of the ATT-ResNet50 model in bolt defect recognition. The proposed method achieves high recognition accuracy and realizes the intelligent recognition of common bolt defects.
Advanced sensor and Internet of Things (IoT) technologies enable real-time data analytics based on multiple sensors covering the target industrial production system and its manufacturing processes. Fault diagnosis of rolling bearings is one of the most urgent problems and can be addressed using convolutional neural networks (CNNs) and edge artificial intelligence (edge AI) devices. The limitations of the hardware platform must be taken into account to achieve maximum performance. In this paper, we analyze an efficient CNN architecture for bearing fault diagnosis that is able to process data in real time on edge AI devices. We observe that the accuracy of the proposed CNN is unsatisfactory for practical use, and that better accuracy is possible by increasing the number of bearings in the training dataset.
Under the influence of environmental issues and energy crises, wind and solar power generation technologies have developed rapidly. Compared with terrestrial micro-grids, these technologies have relatively few applications on ships. To address the low energy utilization rate of ship micro-grids, imperfect control strategies and the limited range of simulated scenarios, this paper uses the construction method of terrestrial micro-grids to build a detailed ship micro-grid model based on wind and solar power generation on the MATLAB/Simulink simulation platform, and uses the hill-climbing search method and the disturbance observation method to control the wind and photovoltaic power generation systems. Several scenarios involving sudden changes in wind speed, light intensity and load during micro-grid operation were simulated. The simulation results show that the micro-grid model can track the maximum power point in real time, the wind energy utilization rate is increased to 0.48, and the bus voltage and current are equal. The actual operation requirements are met, and the correctness and effectiveness of the simulation model and control strategy are verified, which supports further in-depth study of ship micro-grid model construction.
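The "disturbance observation method" named in this abstract is usually rendered in English as perturb and observe, a form of hill climbing for maximum power point tracking. The sketch below shows the general idea only, not the authors' Simulink controller: the quadratic power curve, its 30 V peak, the step size and the function names are all hypothetical.

```python
def perturb_and_observe(power_at, v_start, step, iterations):
    """Hill-climb toward the maximum of power_at(v) by perturbing the operating voltage.

    If the last perturbation lowered the measured power, reverse direction;
    otherwise keep moving the same way.
    """
    v = v_start
    p = power_at(v)
    direction = 1.0
    for _ in range(iterations):
        v_new = v + direction * step
        p_new = power_at(v_new)
        if p_new < p:
            direction = -direction  # power dropped, so reverse the next perturbation
        v, p = v_new, p_new
    return v

def toy_pv_power(v):
    """Hypothetical concave power curve with its maximum at 30 V (illustration only)."""
    return 200.0 - 0.5 * (v - 30.0) ** 2

v_final = perturb_and_observe(toy_pv_power, v_start=20.0, step=0.5, iterations=100)
```

Once the peak is reached, the operating point oscillates within one step of it. That steady-state ripple is the usual trade-off when choosing the step size.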
The objective of this research project was to spark academic discourse on the need for practitioner-oriented research in big data marketing applications. For this purpose, a specific TikTok Hashtag Challenge, #FordWatchMe, was selected for its over 2.7 billion campaign impressions. Little to no scholarly research was identified that critically appraises the value of TikTok's user engagement and reported campaign metrics, despite TikTok's growing relevance for marketers. A sequential mixed research method was designed, consisting of interviews, qualitative content reviews and a user engagement experiment, to assess the relevance of the #FordWatchMe campaign for a defined sample. An analysis of over 450 campaign videos found that more than 88% of user-generated content contributions were unrelated to the campaign. The user perception experiment revealed a low probability of both brand recall and content engagement. The findings showcase the need for more scholarly research on the value of superlative impression counts and their implied effect on brand recall, brand perception and purchase intent stimuli.
To address the problems that current virtual experimental platforms focus on a single scenario and support only a small number of compatible users, a virtual experimental platform for international trade is designed. The platform is built with desktop virtual reality technology and constructed in a modular fashion. It improves the international trade course system. Finally, a comparison with existing virtual experiment platforms shows that the designed International Trade Virtual Experiment Platform supports more compatible users and has a faster response speed.
Globalization and information technology are developing rapidly. Chaozhou Chinese paper cutting, which exists in physical form, faces difficulties in preservation and dissemination. The purpose of this study is to advance the preservation and personalized, recommendation-based dissemination of the intangible cultural heritage represented by Chaozhou Chinese paper cutting. Information technologies such as knowledge graphs, multimedia applications, Bi-RNN neural networks, and probabilistic decomposition models are used. This paper delves into the visual communication and personalized recommendation of intangible culture in order to preserve, pass on, and spread it better.
Sentence semantic matching (SSM) is central to many natural language processing tasks. This is especially the case for Chinese sentence semantic matching, where, due to the complexity of the semantics, missing semantics and semantic confusion are more likely to occur. Existing methods have used enhanced text representations and multiple matching strategies to address these problems, but there is still great potential to capture deep semantic information for Chinese text. This paper proposes a Multi-Granularity and Internal-External correlation Residual model (MGIER) to better capture deep semantic information and to alleviate missing semantics effectively. First, the MGIER model utilizes character/word granularity to capture fine-grained information. Then, soft alignment attention is employed to enhance the correlation between characters/words in a sentence, called internal correlation, and the correlation between sentences, called external correlation. In particular, this method uses residual connections to preserve more semantic information from the bottom embedding layer to the top prediction layer. Experimental results show that the proposed method achieves state-of-the-art performance for Chinese SSM and, compared with pre-trained models, achieves better performance with fewer parameters.
A sampling method is one of the popular ways to deal with the imbalance problem in machine learning. A dataset with an imbalance problem contains noticeably different numbers of instances belonging to different classes. Three sampling techniques are used to solve this problem by balancing class distributions. The first is undersampling, which removes noise from the class with a large number of instances, called the majority class. The second is oversampling, which synthesizes instances from the class with a small number of instances, called the minority class. The third combines undersampling and oversampling. This research applies the combined technique using the mass ratio variance scores of instances from each individual class. For the majority class, instances with high mass ratio variances are removed, whereas for the minority class, instances with high mass ratio variances are used in synthesizing minority instances. The proposed sampling technique helps improve recall for standard classifiers (a decision tree, a random forest, a linear SVM and an MLP) on all synthesized datasets; however, it may yield low precision. The F1-score, which combines precision and recall, is therefore also used. Recall and F1-scores on synthesized datasets and UCI datasets are significantly better for collections of datasets with a small imbalance ratio. Moreover, the Wilcoxon signed-rank test is used to confirm the improvement for datasets with an imbalance ratio smaller than or equal to 0.2.
Granularity analysis and measurement in complex data environments are important tools for describing the essential attributes of the granular computing model (GCM). First, based on fuzzy multiple relations, this paper defines a multi-fuzzy granular structure and studies the hierarchical characteristics and related mathematical conclusions of the four fuzzy multiple partial-order relations on the structure. Second, a method for measuring multi-fuzzy information granularity is proposed, and the properties of the measurements are analyzed. Finally, the axiomatic definition of multi-fuzzy information granularity and its properties are discussed.
Excess weight and obesity are indicators of an unhealthy or harmful accumulation of fat that can be dangerous to health. Body mass index (BMI) is a weight-to-height ratio and is often used to identify overweight and obesity in adults. Although BMI is commonly used to diagnose obesity and overweight, it is ineffective in differentiating between high muscle mass and elevated body fat mass. Body fat percentage (BF%) is one of the best predictors of obesity because it quantifies adipose tissue. The Deurenberg equation is among the indirect methods of measuring BF%; it uses BMI, age, and sex as parameters to calculate the BF%. Machine learning techniques have been shown to be good classifiers of overweight, obesity, and diseases related to insulin resistance and metabolic syndrome. This study evaluates anthropometric parameters as classifiers of BF% alteration using support vector machines (SVMs), with the Deurenberg equation used for BF% estimation. The database consisted of 1978 individuals with 24 different anthropometric measurements. The results suggest that the SVM is a suitable technique for classifying individuals with normal and abnormal BF% values. Accuracy, F1 score, PPV, NPV, and sensitivity were above 0.8. However, the specificity value is below 0.7, which indicates that false positives may occur. As future work, this research intends to apply neural networks as a classification technique.
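For readers unfamiliar with the Deurenberg equation mentioned in this abstract, the commonly cited adult form is BF% = 1.20 × BMI + 0.23 × age − 10.8 × sex − 5.4, with sex coded as 1 for male and 0 for female, and BMI computed as weight in kilograms divided by height in metres squared. The short sketch below evaluates it. The function names and the example person are made up here, and the abstract does not state which variant of the equation the authors used.

```python
def body_mass_index(weight_kg, height_m):
    """BMI = weight (kg) / height (m) squared."""
    return weight_kg / (height_m ** 2)

def deurenberg_body_fat(bmi, age_years, is_male):
    """Commonly cited adult Deurenberg estimate of body fat percentage.

    BF% = 1.20 * BMI + 0.23 * age - 10.8 * sex - 5.4, with sex = 1 (male) or 0 (female).
    """
    sex = 1 if is_male else 0
    return 1.20 * bmi + 0.23 * age_years - 10.8 * sex - 5.4

# Hypothetical example: a 30-year-old man, 70 kg, 1.75 m.
bmi = body_mass_index(70.0, 1.75)
bf_percent = deurenberg_body_fat(bmi, 30, is_male=True)
```

With the same BMI and age, the estimate for a woman is 10.8 percentage points higher, which is the sex term at work.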
Pythagorean fuzzy sets (PFS) can better express and handle uncertain information and have a larger representation space. A reasonable and effective method for measuring the uncertainty of a PFS therefore helps in analyzing information. From the viewpoint of Dempster-Shafer evidence theory, the hesitancy degree can include two focal elements (membership and non-membership), so it is important to consider the number of focal elements of the hesitancy degree when measuring uncertainty. In addition, the difference between the membership and non-membership degrees plays an essential role in the uncertainty measure. Based on these observations, the paper proposes a new uncertainty measure, from which the cross entropy and divergence of PFS are derived. Numerical examples are used to explain the proposed methods and compare them with other methods. Finally, the proposed divergence is applied to pattern recognition to verify its effectiveness.
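To make the "larger representation space" concrete: a Pythagorean fuzzy value (μ, ν) only needs μ² + ν² ≤ 1, whereas an intuitionistic fuzzy value needs μ + ν ≤ 1, and the hesitancy degree is π = sqrt(1 − μ² − ν²). The sketch below checks these standard definitions. It does not implement the paper's uncertainty measure, cross entropy or divergence, and the function names are illustrative.

```python
import math

def is_valid_pfs(mu, nu, tol=1e-12):
    """A Pythagorean fuzzy value requires mu, nu in [0, 1] and mu^2 + nu^2 <= 1."""
    return 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu * mu + nu * nu <= 1.0 + tol

def is_intuitionistic(mu, nu):
    """The stricter intuitionistic condition: mu + nu <= 1."""
    return 0.0 <= mu <= 1.0 and 0.0 <= nu <= 1.0 and mu + nu <= 1.0

def hesitancy(mu, nu):
    """Hesitancy degree pi = sqrt(1 - mu^2 - nu^2) of a Pythagorean fuzzy value."""
    if not is_valid_pfs(mu, nu):
        raise ValueError("not a valid Pythagorean fuzzy value")
    return math.sqrt(max(0.0, 1.0 - mu * mu - nu * nu))

# (0.8, 0.5) is a valid Pythagorean fuzzy value but not an intuitionistic one.
pi_value = hesitancy(0.8, 0.5)
```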
The ethnic group domain in the Mekong River Basin (MRB) is characterized by rich and diverse data sets. Ethnic groups’ vocabulary and relevant data come from various sources that span history, language, and geography. As a result, specialized groups use distinct language to characterize their artifacts, which makes data interoperability among multiple catalogs highly challenging. The use of controlled vocabularies and thesauri is generally considered a major practice in preparing for standardization, which is essential for data reuse and sharing. When used together, thesauri eliminate ambiguity in natural language, making it easier to identify and integrate data from different sources and allowing scholars and computer programs to understand data more efficiently. This paper presents the rationale behind the EGMRB Thesaurus and describes its modeling process, its integration and role in the infrastructure, its publication as Linked Open Data, and the results of this work after six months of development. The EGMRB Thesaurus (http://thesaurus.asiana.net/vocab/) provides a semantic resource on ethnic groups in the Mekong River Basin. EGMRB is the outcome of interdisciplinary cooperation between specialists from the domains of ethnic groups and information science, working together in the context of collaborative research. The thesaurus was developed in the Simple Knowledge Organization System (SKOS), a standard data format based on the Resource Description Framework (RDF), using standard semantic web technologies. EGMRB is freely available online, with a SPARQL endpoint (http://thesaurus.asiana.net/vocab/sparql.php) for querying and an API (http://thesaurus.asiana.net/vocab/services.php) for system integration.
Digital collections, digital exhibits, and a virtual study environment are being built as part of a digital platform that will give scholars and general users search and content curation services. EGMRB, which provides unified concepts with unique and resolvable URIs, can substantially reduce the barriers to data discovery, integration, and sharing if it is adopted as a standard and carefully implemented and expanded by the academic community.
To address the excessive load on back-end data caused by the sharp increase in visits and consultations on intelligent learning platforms during the novel coronavirus epidemic, this study proposes introducing a co-attention mechanism (Co-attention) into the Bidirectional Long Short-Term Memory model (Bi-LSTM). A Multi-layer Perceptron (MLP) is employed for classification and screening in order to judge semantic repetition accurately. Finally, the study carried out comparative experiments with different models, using 1150 consultation posts from the MOOC platform of Chinese universities on the following topics: the transposed determinant, using the Newton-Leibniz formula to calculate a definite integral, using Laplace's theorem to calculate a determinant, how to carry out model analysis of STATA panel data, the circumstances under which the weighted least squares method is applicable, how to achieve Pareto optimality, and finding the area of a curvilinear trapezoid. The results show that this model performs better than other existing models in judgment accuracy, reaching an accuracy of up to 89.42%.
This short note aims to improve the results presented in [1] through some modifications. A. Ghodousian et al. discussed the resolution of a system of max-Dubois-Prade fuzzy relation equations based on their proposed index sets Ji, i ∈ I. It is found that not every e ∈ E = J1 × J2 × ⋯ × Jm corresponds to a solution. To overcome this flaw, we modify the expression of the index sets, denoted by J=i, i ∈ I. Based on the modified index sets, resolution of the max-Dubois-Prade fuzzy relation equations becomes easier in terms of computational cost.
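For orientation, the Dubois-Prade t-norm is T(x, y) = xy / max(x, y, γ) with γ in [0, 1], and a max-T fuzzy relation equation system is commonly written as max over j of T(a_ij, x_j) = b_i for each row i. The sketch below only checks whether a candidate vector x satisfies such a system. It assumes this common form rather than quoting the paper, it does not construct the index sets Ji (which the abstract does not define), and the matrix, right-hand side and γ are made-up values.

```python
def dubois_prade(x, y, gamma):
    """Dubois-Prade t-norm: x*y / max(x, y, gamma); defined as 0 when all three are 0."""
    denom = max(x, y, gamma)
    return 0.0 if denom == 0.0 else (x * y) / denom

def max_dp_compose(a, x, gamma):
    """Row-wise max-T composition: result_i = max_j T(a[i][j], x[j])."""
    return [max(dubois_prade(a_ij, x_j, gamma) for a_ij, x_j in zip(row, x))
            for row in a]

def is_solution(a, x, b, gamma, tol=1e-9):
    """Check whether x satisfies max_j T(a_ij, x_j) = b_i for every row i."""
    return all(abs(lhs - rhs) <= tol
               for lhs, rhs in zip(max_dp_compose(a, x, gamma), b))

# Hypothetical 2x2 system with gamma = 0.5.
a = [[0.5, 0.8],
     [0.4, 0.2]]
b = [0.5, 0.4]
gamma = 0.5
```

Here `is_solution(a, [1.0, 0.5], b, gamma)` holds while `is_solution(a, [0.2, 0.2], b, gamma)` does not. A candidate built from index choices can be rejected by this kind of direct check.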
The traditional analytic hierarchy process (AHP), fuzzy evaluation method and the Delphi method of group decision-making are organically combined and a new method of system analysis called Fuzzy Delphi Analytic Hierarchy Process (FDAHP) is proposed. Based on the Delphi survey of the group decision-making information, the group’s pairwise judgment of objects is fuzzily processed, and the results of the group’s overall judgment are used as decision-making environmental parameters. Furthermore, the group’s comprehensive weight of objects is determined according to the optimistic coefficient of group decision-making consideration. A simple example is given to illustrate the specific implementation steps and feasibility of the method. Finally, the advantages and disadvantages of the method are briefly discussed, and the possible research topics in the future are proposed.
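As background for the AHP component of this abstract, the sketch below derives priority weights from a single crisp pairwise comparison matrix using the row geometric mean, one standard approximation in classical AHP. It does not reproduce FDAHP itself: the fuzzy processing of group judgments, the Delphi rounds and the optimistic coefficient are not specified in the abstract, and the 3 × 3 matrix and function name are hypothetical.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalising the row geometric means."""
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical reciprocal judgment matrix: entry [i][j] says how strongly
# object i is preferred to object j, and [j][i] is its reciprocal.
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)
```

The weights sum to 1 and preserve the order implied by the judgments. In a group setting, a fuzzy method would first aggregate several such matrices before a step like this.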