Ebook: Fuzzy Systems and Data Mining VI
The interdisciplinary field of fuzzy logic encompasses applications in the electrical, industrial, chemical and engineering realms, as well as in areas of management and environmental issues, while data mining covers new approaches to big data, massive data, and scalable, parallel and distributed algorithms.
This book presents papers from the 6th International Conference on Fuzzy Systems and Data Mining (FSDM 2020). The conference was originally due to be held from 13-16 November 2020 in Xiamen, China, but was changed to an online conference held on the same dates due to ongoing restrictions connected with the COVID-19 pandemic. The annual FSDM conference provides a platform for knowledge exchange between international experts, researchers, academics and delegates from industry. This year, the committee received 316 submissions, of which 76 papers were selected for inclusion in the conference, an acceptance rate of 24%. The conference covers four main areas: fuzzy theory; algorithms and systems, which includes topics like stability; foundations and control; and fuzzy applications, which are widely used and cover various types of processing as well as hardware and architecture for big data and time series.
Providing a current overview of research and developments in fuzzy logic and data mining, the book will be of interest to all those working in the field of data science.
This year’s edition of the International Conference on Fuzzy Systems and Data Mining (FSDM), the first of the 2020s, was disrupted by the COVID-19 pandemic. FSDM has had a short but very intense history on the way to this 6th edition of the series, and has become one of the top conferences whose proceedings are published in the prestigious book series, Frontiers in Artificial Intelligence and Applications (FAIA), by IOS Press. The last two proceedings of the European-Japanese Conference on Information Modelling and Knowledge Bases (EJC-IMKB) can be found in [1] and [2] respectively; the 2019 proceedings of Legal Knowledge Based Systems (JURIX) can be accessed in [3]; and the papers for JURIX 2020 are currently under assessment. The proceedings of the International Conference on Software Methodologies, Tools and Techniques (SOMET) 2020 have recently been published in FAIA [4], and those of SOMET 2019 can be found in [5]. These are the three conferences with the highest number of edited volumes published as part of the FAIA series.
FSDM 2015 [6] was held in Shanghai, China; FSDM 2016 [7] took place in Macau, China; FSDM 2017 [8] visited Hualien, Taiwan; FSDM 2018 [9] travelled to Bangkok, Thailand; and finally FSDM 2019 [10] was held in Kitakyushu City, Japan. Nowadays, data science, big data and deep learning are the typical keywords which abound in any high-level conference in the field of machine learning. FSDM is focused on data mining and fuzzy systems, and the aforementioned topics are dealt with in many of the papers included here.
This volume includes the papers accepted and presented at the 6th International Conference on Fuzzy Systems and Data Mining (FSDM 2020), which was initially scheduled to be held from 13–16 November 2020 in Xiamen, China, but which was ultimately changed to an online conference due to ongoing restrictions connected with the COVID-19 pandemic. All sessions were available via live streaming, with a high level of interaction from all participants.
All papers were carefully reviewed by program committee members with regard to the quality, novelty, breadth and depth of the research themes falling within the scope of FSDM. FSDM 2020 was an outstanding conference of reference, attracting three remarkable keynote speakers: Prof. Dr. Rongrong Ji from Xiamen University (China); Prof. Dr. Juan Manuel Corchado, Director of the European IoT Digital Innovation Hub and Director of the BISITE Research Group at the University of Salamanca (Spain); and Prof. Dr. Milan Tuba, Vice Rector for International Relations, Singidunum University, Serbia. This meant that the conference enjoyed three keynotes from very different locations.
The current FAIA volume contains selected contributions from FSDM 2020. If you have contributed to the FSDM conference before or in this edition, we would like to see you on board again. Otherwise, if you have not yet submitted a paper to FSDM, we would like to invite you to prepare a strong contribution and to join us in either Europe or South Korea, one of which will be the location of FSDM 2021.
I am very glad to inform you that FSDM received 316 submissions this year. After an intense discussion stage, the committee, which included many experts, decided to accept 76 papers, which represents an acceptance rate of 24%. The profile of the authors is very remarkable and the number of full professors who contributed is very high. As a follow-up to the conference, some special issues in well-regarded journals such as the International Journal of Information Technology and Web Engineering (IJITWE), CMES-Computer Modeling in Engineering & Sciences, International Journal of Fuzzy Systems and Mathematics are scheduled to be published; this is an important leap in the number of journal issues, which is increasing yearly. Special issues with Intelligent Data Analysis and Journal of Nonlinear and Convex Analysis have been published in previous years.
I would like to take this opportunity to thank all the keynote and invited speakers, as well as the authors who made the effort to prepare a contribution to the conference. Furthermore, I also wish to express our gratitude to everyone, especially the program committee members and reviewers, who devoted time to assessing the papers. It is an honor to continue with the publication of these proceedings in the outstanding series FAIA by IOS Press. My particular thanks and regards also go to J. Breuker, N. Guarino, J.N. Kok, R. López de Mántaras, J. Liu, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong, the FAIA series editors, for supporting this conference.
1st October 2020
Huelva city, Spain
Antonio J. Tallón-Ballesteros
University of Huelva (Spain)
References
[1] B. Thalheim, M. Tropmann-Frick, H. Jaakkola, & Y. Kiyoki (Eds.). (2020). Proceedings of the 30th International Conference on Information Modelling and Knowledge Bases (EJC 2020).
[2] A. Dahanayake, J. Huiskonen, & Y. Kiyoki (Eds.). (2020). Information Modelling and Knowledge Bases XXXI (Vol. 321). IOS Press.
[3] M. Araszkiewicz, & V. Rodríguez-Doncel (Eds.). (2019). Legal Knowledge and Information Systems: JURIX 2019: The Thirty-second Annual Conference (Vol. 322). IOS Press.
[4] H. Fujita, A. Selamat, & S. Omatu (Eds.). (2020). Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques. IOS Press. ISBN 978-1-64368-114-6 DOI:10.3233/FAIA200574.
[5] H. Fujita, & A. Selamat (Eds.). (2019). Advancing Technology Industrialization Through Intelligent Software Methodologies, Tools and Techniques: Proceedings of the 18th International Conference on New Trends in Intelligent Software Methodologies, Tools and Techniques (SoMeT_19) (Vol. 318). IOS Press.
[6] G. Chen, F. Liu, & M. Shojafar (Eds.). (2016). Fuzzy System and Data Mining: Proceedings of FSDM 2015 (Vol. 281). IOS Press.
[7] S. L. Sun, A. J. Tallón-Ballesteros, & D. S. Pamučar (Eds.). (2016). Fuzzy Systems and Data Mining II: Proceedings of FSDM 2016 (Vol. 293). IOS Press.
[8] A. J. Tallón-Ballesteros, & K. Li (Eds.). (2017). Fuzzy Systems and Data Mining III: Proceedings of FSDM 2017 (Vol. 299). IOS Press.
[9] A. J. Tallón-Ballesteros, & K. Li (Eds.). (2018). Fuzzy Systems and Data Mining IV: Proceedings of FSDM 2018 (Vol. 309). IOS Press.
[10] A. J. Tallón-Ballesteros (Ed.). (2019). Fuzzy Systems and Data Mining V: Proceedings of FSDM 2019 (Vol. 320). IOS Press.
Widely used recommendation systems do not meet all industry requirements, so the search for more advanced methods for creating recommendations continues. The newly proposed methods based on Generative Adversarial Networks (GAN) have been compared with other recommendation algorithms theoretically; however, real-world comparisons are needed to introduce new methods into industry. In our work, we compare recommendations from a Generative Adversarial Network with recommendations from the Deep Semantic Similarity Model (DSSM) on a real-world case of airline flight tickets. We found a way to train the GAN so that users receive appropriate recommendations, and during A/B testing we noted that the GAN-based recommendation system can successfully compete with other neural networks in generating recommendations. One of the advantages of the proposed approach is that the GAN training process avoids negative sampling, which causes a number of distortions in the final ratings of recommendations. Due to the ability of the GAN to generate new objects from the distribution of the training set, we assume that the Conditional GAN is able to solve the cold start problem.
This paper discusses the gH-directional differentiability of fuzzy mappings and proposes the concept of the gH-directional differentiability of fuzzy mappings. Based on the concept of the gH-directional differentiability of interval-valued mappings and its related properties, two properties of gH-directionally differentiable fuzzy mappings are proposed. At the same time, the relation between gH-differentiability and gH-directional differentiability for a fuzzy mapping is discussed, and it is proved that both the gH-derivative and the gH-partial derivative are directional derivatives of fuzzy mappings in the direction of the coordinate axes.
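As background (not taken from the paper), the generalized Hukuhara difference and the corresponding directional derivative are commonly defined as follows, where A and B denote intervals (or level sets of fuzzy numbers); the exact formulation used in the paper may differ:

```latex
% Generalized Hukuhara (gH) difference:
A \ominus_{gH} B = C \;\Longleftrightarrow\; A = B + C \ \ \text{or} \ \ B = A + (-1)\,C .

% gH-directional derivative of a fuzzy (or interval-valued) mapping F at x in
% direction d, when the limit exists:
F'_{gH}(x; d) \;=\; \lim_{h \to 0^{+}} \frac{F(x + h\,d) \ominus_{gH} F(x)}{h} .
```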
With the development of science and technology, magnetic suspension products are gradually being favored for their characteristics of no friction and wear, no lubrication, and long life. With the increasing consumption of fossil energy, magnetic levitation green energy-saving equipment is increasingly being studied by researchers. With the magnetically suspended momentum flywheel (MSMFW) as a typical representative, magnetic suspension products in the aerospace field have seen great development. As the core component of the magnetic suspension flywheel, the magnetic levitation motor has been widely studied. To obtain better performance from the magnetic levitation motor when it is used in a magnetically suspended momentum flywheel, the motor rotor system structure is optimized in this paper. Firstly, the moment of inertia of the rotor system was calculated and analyzed, and the optimum ratio of the polar moment of inertia to the equatorial moment of inertia was obtained. Then the basic structure of the motor rotor was designed on the basis of the rotor system dynamic analysis. Thirdly, based on the flywheel motor system model, an ANSYS Parametric Design Language (APDL) file was established and applied in the optimization software ISIGHT to complete the optimization. In the optimization process, the boundary conditions of the design variables were given, the Sequential Quadratic Programming method was used to drive the optimization process, and convergent optimization results were obtained. The magnetic flux density, taken as the optimization objective, is increased from 0.393 T to 0.53 T through the optimization, which is 34.9% larger than before. This is of great significance for magnetic levitation motor design, and the engineering application of the magnetic levitation motor based on the optimization results will be carried out in the future.
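As an illustration only of the Sequential Quadratic Programming step, a minimal SciPy SLSQP sketch with bound constraints is shown below; the true objective is evaluated by the APDL finite-element model, so a hypothetical surrogate and made-up design variables stand in for it:

```python
# Minimal SQP-style sketch with SciPy's SLSQP solver.
# The real objective (magnetic flux density from the APDL model) is replaced by
# a hypothetical placeholder; the bounds stand in for design-variable limits.
import numpy as np
from scipy.optimize import minimize

def neg_flux_density(x):
    """Placeholder surrogate; in the paper this value comes from ANSYS/APDL."""
    rotor_len, magnet_thickness, air_gap = x
    return -(0.3 + 0.05 * magnet_thickness + 0.02 * rotor_len - 0.1 * air_gap)

x0 = np.array([4.0, 3.0, 0.8])                      # initial design (hypothetical units)
bounds = [(2.0, 6.0), (1.0, 5.0), (0.5, 1.5)]       # design-variable limits

res = minimize(neg_flux_density, x0, method="SLSQP", bounds=bounds)
print("optimal design:", res.x, "flux-density surrogate:", -res.fun)
```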
The article is devoted to the current topic of choosing the optimal organizational option for the multilateral integration of the scientific and educational sphere, business and the state in the process of the globalization of the world economy. The authors justify the format of the formation of a scientific and educational complex on the basis of network interaction, which allows the greatest synergistic effect to be obtained. In order to justify the effective network interaction of the scientific and educational complex, an analysis of existing methods of assessing the efficiency of its functioning was carried out, and the authors' system of performance indicators and its assessment was proposed in accordance with the general purpose of the integration mechanism and the specific purpose of each interaction subject. The model contains a system of heterogeneous indicators reflecting the principles of the formation of a scientific and educational complex on the basis of network interaction, which allows, along with the evaluation task, the use of factor models to determine further directions of the inter-network relations of the subjects, in order to better understand the current processes and identify problem areas in the coordination of their innovative activity.
In the case of extremely unbalanced data, the results of traditional classification algorithms are very unbalanced: most samples are assigned to the majority classes, so the accuracy of judgment for the minority classes is reduced. In this paper, we propose a classification algorithm for unbalanced data based on the random subspace method (RSM) and binomial undersampling. Rather than using all features, each base classifier is trained on a random subset of features drawn by the RSM, so that each classifier works in a reduced dimension; this dimension reduction indirectly lifts the relative weight of the minority class samples. Using this dimension-reducing property of the RSM alleviates the problem that the minority class has too few samples in unbalanced data classification, and it also identifies the important attributes, giving the model explanatory ability. Experiments show that our algorithm has high classification accuracy and model interpretation ability when classifying unbalanced data.
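A minimal sketch of the general idea of combining random feature subspaces with per-classifier undersampling, assuming scikit-learn; this is an illustration only, not the exact algorithm of the paper:

```python
# Random-subspace ensemble with per-classifier undersampling of the majority
# class (illustrative sketch; not the paper's exact algorithm).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class SubspaceUndersampleEnsemble:
    def __init__(self, n_estimators=25, subspace_frac=0.4, random_state=0):
        self.n_estimators = n_estimators
        self.subspace_frac = subspace_frac
        self.rng = np.random.default_rng(random_state)
        self.members = []          # list of (feature indices, fitted classifier)

    def fit(self, X, y):
        # assumes binary labels with minority class = 1 and majority class = 0
        n_features = X.shape[1]
        k = max(1, int(self.subspace_frac * n_features))
        minority = np.flatnonzero(y == 1)
        majority = np.flatnonzero(y == 0)
        for _ in range(self.n_estimators):
            feats = self.rng.choice(n_features, size=k, replace=False)
            # undersample the majority class to the minority class size
            maj_sample = self.rng.choice(majority, size=len(minority), replace=False)
            idx = np.concatenate([minority, maj_sample])
            clf = DecisionTreeClassifier(max_depth=5).fit(X[np.ix_(idx, feats)], y[idx])
            self.members.append((feats, clf))
        return self

    def predict(self, X):
        votes = np.mean([clf.predict(X[:, feats]) for feats, clf in self.members], axis=0)
        return (votes >= 0.5).astype(int)
```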
To address the problem that current facial expression analysis based on convolutional neural networks (CNN) uses only the features of the last convolutional layer and does not achieve a high recognition rate, this paper proposes the use of sub-deep convolutional layer features and builds a CNN model which fuses the features of multiple convolutional layers. The model uses a CNN for feature extraction and saves the deepest feature vectors and sub-deep feature vectors of the expression images. The sub-deep feature vector is used as the input of the multilayer CNN established in this paper. The processed fourth convolutional layer feature is fused with the previously saved deepest feature to perform facial expression analysis. Experiments are performed on the FERPLUS dataset, the Cohn-Kanade dataset (CK+) and the JAFFE dataset. The experimental results show that the improved network structure proposed in this paper can capture richer feature information during facial expression analysis, which greatly improves the accuracy of expression recognition and the stability of the network. Compared with the original CNN-based facial expression analysis using only the features of the last convolutional layer, using multi-layer fusion features improves the expression recognition rate on the three datasets by 33.3%, 2.3% and 22%, respectively.
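A minimal PyTorch sketch of fusing a sub-deep convolutional feature with the deepest feature before classification; the layer sizes and input shape are hypothetical, and the paper's actual architecture differs:

```python
# Sketch: concatenate pooled features from an intermediate ("sub-deep")
# convolutional stage with the deepest stage before the classifier.
import torch
import torch.nn as nn

class FusionCNN(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))   # sub-deep features
        self.block3 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))  # deepest features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64 + 128, n_classes)   # classify on the fused vector

    def forward(self, x):
        x = self.block1(x)
        sub_deep = self.block2(x)
        deepest = self.block3(sub_deep)
        fused = torch.cat([self.pool(sub_deep).flatten(1), self.pool(deepest).flatten(1)], dim=1)
        return self.classifier(fused)

logits = FusionCNN()(torch.randn(4, 1, 48, 48))   # e.g. a batch of 48x48 grayscale faces
```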
The DIVA (Directions Into Velocities of Articulators) model is an adaptive neural network model that is used to control the movement of an analog vocal tract to generate words, syllables, or phonemes. The input signal to the DIVA model is the EEG (electroencephalogram) signal acquired from the human brain. However, due to the influence of power frequency interference and other forms of noise, the input signal can be non-stationary and can also contain a variety of multi-form waveforms in its instantaneous structure. Inputting such a signal into the DIVA model affects normal speech processing. Therefore, based on the concept of sparse decomposition, this paper applies and improves an adaptive sparse decomposition model for feature extraction of the general EEG signal structure, and then uses the Matching Pursuit algorithm to compute the optimal atoms. The original EEG signal can then be represented by atoms from a complete atomic library. This model removes noise from the EEG signal, resulting in a better signal than the wavelet transform method. Finally, the EEG signal de-noised by this model is applied to the DIVA model. Simulation results show that the method greatly improves phonetic pronunciation.
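A small NumPy sketch of the Matching Pursuit step (greedy atom selection over a unit-norm dictionary); the paper's adaptive, over-complete atomic library for EEG is not reproduced here:

```python
# Matching Pursuit sketch: greedily represent a signal with atoms (columns) of a
# normalized dictionary; the residual after n_atoms steps is the "noise" part.
import numpy as np

def matching_pursuit(signal, dictionary, n_atoms=10):
    """dictionary: (signal_len, n_dict) array with unit-norm columns."""
    residual = signal.astype(float).copy()
    atoms, coeffs = [], []
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        best = int(np.argmax(np.abs(correlations)))
        atoms.append(best)
        coeffs.append(correlations[best])
        residual = residual - correlations[best] * dictionary[:, best]
    return atoms, coeffs, residual

rng = np.random.default_rng(0)
D = rng.standard_normal((256, 1024))
D /= np.linalg.norm(D, axis=0)                       # unit-norm atoms
x = 2.0 * D[:, 5] + 0.7 * D[:, 100] + 0.05 * rng.standard_normal(256)
atoms, coeffs, res = matching_pursuit(x, D, n_atoms=5)
print(atoms, np.linalg.norm(res))
```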
In several published works, new measurement concepts have been proposed to reinterpret measurement theory, leading to a new interpretation method for measurement theory. By directly comparing the differences in measurement concepts between the new theory and the traditional theory, this paper reveals that the root of these differences lies in different understandings of the mathematical concept of a random variable, and clarifies how the other conceptual differences evolved. Also, by reviewing the mathematical concept, it points out that the traditional theory has gone astray, and that the concept of error classification is actually a product of this, while the new conceptual theory deserves active attention and research.
The rapid development of technology and increasing numbers of customers have saturated the communication market. Communication operators must give focused attention to the problem of customer churn. Analyzing customers' communication behavior and building a prediction model of customer churn can provide advance evidence for communication operators to minimize churn. This paper describes how to design a Hidden Markov Model (HMM) to predict customer churn based on communication data. First, we oversample churners to increase the number of positive samples and establish a relative balance of positive and negative samples. Second, the continuous numerical attributes that affect communication customer churn are discretized in relative terms, with their monthly values converted into monthly change tendencies. Next, we select the communication features by calculating the information gains and information gain rates of all communication attributes. We then construct and optimize a prediction model of customer churn based on the HMM. Finally, we test and evaluate the model using a Spark cluster and the communication data set of the Taizhou Branch of China Telecom. The experimental evaluation provides proof that our prediction model is exceptionally reliable.
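A small sketch of two of the preprocessing steps described above, converting monthly values into change tendencies and ranking attributes by information gain; the data layout, tolerance, and column names are hypothetical:

```python
# Sketch of (1) turning monthly values into change tendencies and
# (2) ranking a discrete attribute by information gain.
import numpy as np

def change_tendency(monthly_values, tol=0.05):
    """Map consecutive monthly values to 'up' / 'down' / 'flat' tendencies."""
    v = np.asarray(monthly_values, dtype=float)
    diff = np.diff(v) / np.maximum(np.abs(v[:-1]), 1e-9)
    return np.where(diff > tol, "up", np.where(diff < -tol, "down", "flat"))

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(feature, labels):
    """IG(labels; feature) for a discrete feature."""
    gain = entropy(labels)
    for value in np.unique(feature):
        mask = feature == value
        gain -= mask.mean() * entropy(labels[mask])
    return gain

print(change_tendency([120, 95, 96, 60]))                  # ['down' 'flat' 'down']
calls = np.array(["down", "down", "flat", "up", "down"])   # monthly call-volume tendency
churn = np.array([1, 1, 0, 0, 1])                          # churn label
print(information_gain(calls, churn))
```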
Alzheimer’s disease (AD) is one of the most common forms of dementia. The early stage of the disease is defined as Mild Cognitive Impairment (MCI). Recent research results have shown the promise of combining Magnetic Resonance Imaging (MRI) scans of the brain with deep learning to diagnose AD. However, a CNN deep learning model requires a large number of samples for training. Transfer learning is the key to achieving a model with high accuracy while using limited data for training. In this paper, DenseNet and Inception V4, pre-trained on the ImageNet dataset to obtain initial weight values, are respectively used for the image classification task. An ensemble method is employed to enhance the effectiveness and efficiency of the classification models, and the results of the different models are eventually combined through probability-based fusion. Our experiments were conducted entirely on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) public dataset. Only ternary classification is performed, due to the higher demands of medical detection and diagnosis. The AD/MCI/Normal Control (NC) accuracies of the different models are estimated in this paper. The results of the experiments show that the accuracy of the method reached a maximum of 92.65%, which is a remarkable outcome compared with the accuracies of state-of-the-art methods.
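As a concrete illustration of the probability-based fusion step (the weights and example numbers below are hypothetical, not taken from the paper):

```python
# Probability-based fusion sketch: average (optionally weighted) the class
# probabilities of several models and take the argmax per sample.
import numpy as np

def fuse_probabilities(prob_list, weights=None):
    """prob_list: list of (n_samples, n_classes) softmax outputs, one per model."""
    probs = np.stack(prob_list)                      # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list)) / len(prob_list)
    fused = np.tensordot(weights, probs, axes=1)     # weighted average over models
    return fused, fused.argmax(axis=1)

# e.g. DenseNet-style and Inception-style outputs for 2 scans, 3 classes (AD / MCI / NC)
p_densenet = np.array([[0.70, 0.20, 0.10], [0.30, 0.45, 0.25]])
p_inception = np.array([[0.55, 0.30, 0.15], [0.20, 0.35, 0.45]])
fused, labels = fuse_probabilities([p_densenet, p_inception], weights=[0.5, 0.5])
print(fused, labels)          # labels: fused class index per scan
```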
At present, there are many kinds of electricity theft, and the corresponding approaches to combat this are insufficient. Manual approaches result in a heavy staff workload and are inefficient. In this paper, the data from an electricity information acquisition system is collected and mined using Python. Based on an understanding of the business and an analysis using the information value (IV) measure, important characteristic indexes are selected and an improved decision tree algorithm is used to construct a model of power theft by users. This method effectively narrows the range of users suspected of power theft, improving the pertinence of audits, and providing strong support for reducing the financial losses of power supply enterprises and ensuring the safety of power grid operation.
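As an illustration of the information value (IV) screening idea mentioned above (the bin counts are hypothetical, not from the paper):

```python
# Weight-of-Evidence / Information Value sketch for one binned characteristic.
import numpy as np

def information_value(event_counts, non_event_counts, eps=0.5):
    """IV = sum_i (p_event_i - p_non_event_i) * ln(p_event_i / p_non_event_i)."""
    event = np.asarray(event_counts, dtype=float) + eps       # smoothing for empty bins
    non_event = np.asarray(non_event_counts, dtype=float) + eps
    p_event = event / event.sum()
    p_non = non_event / non_event.sum()
    woe = np.log(p_event / p_non)
    return float(np.sum((p_event - p_non) * woe))

# bins of a usage-pattern index: counts of theft users (events) vs normal users
theft_per_bin = [40, 25, 10, 5]
normal_per_bin = [100, 300, 700, 900]
print(information_value(theft_per_bin, normal_per_bin))   # larger IV = stronger index
```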
We extend the classical approach to supervised classification based on local likelihood estimation to the case of functional covariates. The estimation procedure for the functional parameter (slope parameter) in the linear model when the covariate is functional is investigated. We show, on simulated as well as on real data, using classification error rates estimated on test samples, that the local likelihood estimation procedure seems to lead to better estimators than classical kernel estimation. In addition, this approach no longer assumes that the linear predictors have a specific parametric form. However, the approach also has two drawbacks: it is more computationally expensive and slower than kernel regression, and kernels other than the Gaussian kernel can lead to divergence of the Newton-Raphson algorithm. In contrast, with a Gaussian kernel, 4 to 6 iterations are sufficient to achieve convergence.
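For reference, one common form of the kernel-weighted (local) log-likelihood for binary classification with a functional covariate is shown below; the exact criterion and link used in the paper may differ. Here d is a semi-metric on the functional space, K a kernel and h a bandwidth:

```latex
% Local log-likelihood with a logit link on the functional linear predictor
% <beta, X_i>, maximized over beta by Newton-Raphson (illustrative form):
\ell_{x}(\beta) \;=\; \sum_{i=1}^{n} K\!\Big(\tfrac{d(X_i,\,x)}{h}\Big)
\Big[\, Y_i \log p_{\beta}(X_i) \;+\; (1-Y_i)\log\big(1-p_{\beta}(X_i)\big) \Big],
\qquad
p_{\beta}(X) \;=\; \frac{\exp\langle \beta, X\rangle}{1+\exp\langle \beta, X\rangle} .
```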
Statistics show that power theft is one of the main reasons for the dramatic increase in power grid line loss. In this paper, a genetic algorithm is used to optimize a neural network and establish a power theft prediction model. With the grey prediction model, the predicted values of variables are obtained and then applied to the prediction model of a GA-BP neural network to obtain relatively accurate predictions from limited samples, reducing the absolute error. Through the two levels of prediction and analysis, the model is demonstrated to have good universality in predicting power theft behavior, and is a practical and effective method for power companies to carry out power theft analysis.
This paper presents the stabilization of positive nonlinear systems using polynomial fuzzy models. To conform better to practical scenarios in which the system states are not completely measurable, a static output feedback (SOF) control strategy, rather than the state feedback control method, is employed to achieve the stability and positivity of the positive polynomial fuzzy system (PPFS) while satisfying an L1-induced performance. However, some troublesome problems arise in the analysis and control design, such as non-convexity; fortunately, the non-convex problem is skillfully dealt with through suitable mathematical manipulations. Furthermore, neglecting external disturbances may have a strongly negative impact on the performance of positive systems. For the sake of guaranteeing asymptotic stability and positivity while satisfying the optimal performance of the PPFS, it is important to take the L1-induced performance requirement into consideration as well. In addition, a linear co-positive Lyapunov function is chosen so that the positivity can be exploited well and the stability analysis becomes simple. By using the sum of squares (SOS) technique, convex stability and positivity conditions in the form of SOS are derived. Finally, to illustrate the advantages of the proposed method, a simulation example is presented in the simulation section.
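To make the Lyapunov choice concrete, recall the standard linear co-positive form used for positive systems, stated here for the linear case only; the SOS conditions in the paper generalize this idea to the polynomial fuzzy setting:

```latex
% Linear co-positive Lyapunov function on the positive orthant (x >= 0):
V(x) \;=\; \lambda^{\mathsf T} x, \qquad \lambda \succ 0 .
% For a linear positive system \dot{x} = A x (A Metzler), asymptotic stability
% is equivalent to the existence of \lambda \succ 0 such that
A^{\mathsf T}\lambda \prec 0,
\qquad \text{so that} \qquad
\dot V(x) \;=\; \lambda^{\mathsf T} A x \;<\; 0
\quad \forall\, x \geq 0,\ x \neq 0 .
```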
Belief functions have always played an indispensable role in modeling cognitive uncertainty. As an inherited version, the theory of D numbers has been proposed and developed in a more efficient and robust way. Within the framework of D number theory, two more generalized properties are extended: (1) the elements in the frame of discernment (FOD) of D numbers are not required to be strictly mutually exclusive; (2) the completeness constraint is released. The investigation shows that the distance function is very significant for measuring the difference between two D numbers, especially in information fusion and decision making. Modeling methods of uncertainty that incorporate D numbers have become increasingly popular; however, very few approaches have tackled the challenges of distance metrics. In this study, the distance measure of two D numbers is presented for several cases, including complete information, incomplete information, and non-exclusive elements.
Schaffrin and Toutenburg [1] proposed a weighted mixed estimation based on sample information and stochastic prior information, and they also showed that the weighted mixed estimator is superior to the ordinary least squares estimator under the mean squared error criterion. However, no paper has discussed the performance of the two estimators under Pitman's closeness criterion. This paper presents a comparison of the weighted mixed estimator and the ordinary least squares estimator using Pitman's closeness criterion. A simulation study is performed to illustrate the performance of the weighted mixed estimator and the ordinary least squares estimator under Pitman's closeness criterion.
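As an illustration of how such a simulation comparison under Pitman's closeness can be set up (this sketch substitutes a generic shrinkage estimator for the weighted mixed estimator, whose exact form depends on the stochastic prior information in [1]):

```python
# Monte Carlo sketch of a Pitman-closeness comparison: estimator 1 is "Pitman
# closer" than estimator 2 if it is nearer the true beta in more than half of
# the repetitions. A ridge-type estimator stands in for the weighted mixed one.
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 50, 4, 2000
beta_true = np.array([1.0, -2.0, 0.5, 3.0])
wins = 0
for _ in range(reps):
    X = rng.standard_normal((n, p))
    y = X @ beta_true + rng.standard_normal(n)
    XtX = X.T @ X
    beta_ols = np.linalg.solve(XtX, X.T @ y)
    beta_alt = np.linalg.solve(XtX + 2.0 * np.eye(p), X.T @ y)   # shrinkage stand-in
    if np.linalg.norm(beta_alt - beta_true) < np.linalg.norm(beta_ols - beta_true):
        wins += 1
print("Pitman closeness probability of the alternative vs OLS:", wins / reps)
```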
Microscopic hyperspectral imaging has become an emerging technique for various medical applications. However, the high dimensionality of hyperspectral images (HSI) makes image processing and the extraction of important diagnostic information challenging. In this paper, a novel dimensionality reduction method named spatial-spectral density peaks based discriminant projection (SSDP) is proposed by considering the spatial-spectral density distribution characteristics of immune complexes. The proposed SSDP, coupled with a support vector machine (SVM) classifier, yields high-precision automatic diagnosis of membranous nephropathy (MN). Detailed ex-vivo validation of the proposed method demonstrates the potential clinical value of the system in identifying hepatitis B virus-associated membranous nephropathy (HBV-MN) and primary membranous nephropathy (PMN).
This paper proposes an intelligent method for the identification of potential customers for electricity substitution. This is developed on the basis of an original model, where a related indicator system of potential customers is constructed through exploratory analysis, improving the results. At the same time, ANOVA is used to screen the indicators and the XGBoost algorithm is employed to output the index importance score and identify likely electricity substitution customers. This method can accurately identify such customers, accelerate the fundamental transformation of energy development, and adapt to the new strategy of Energy Internet development.
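A minimal sketch of the screening-plus-ranking pipeline described above, assuming scikit-learn and the xgboost package, on synthetic stand-in data (the indicator names are hypothetical):

```python
# Sketch: ANOVA F-test screening followed by XGBoost feature-importance ranking.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from xgboost import XGBClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)

# Step 1: keep the indicators with the highest ANOVA F-scores
selector = SelectKBest(f_classif, k=8).fit(X, y)
X_screened = selector.transform(X)

# Step 2: fit XGBoost and read the importance scores of the retained indicators
model = XGBClassifier(n_estimators=100, max_depth=3, eval_metric="logloss").fit(X_screened, y)
for idx, score in zip(selector.get_support(indices=True), model.feature_importances_):
    print(f"indicator_{idx}: importance {score:.3f}")
```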
To study the indicators influencing women’s social status, we perform an ordered Logit regression analysis on the data of the China Comprehensive Social Survey in 2012, 2013 and 2015, selecting the self-assessment of social status in the female sample as the dependent variable and using the influencing indicators as independent variables to explore the impact of each variable on women’s social status. At the same time, we apply k-means clustering analysis based on MapReduce to mine the relationship between employment and education level across genders. We find that a high level of education among women does not necessarily result in good employment treatment, and that gender discrimination in the Chinese labor market persists.
In this paper, we build a deep learning network to predict the trends of natural gas prices. Given a time series, for each day the gas price trend is classified as “up” or “down” according to the price compared to the previous day. Meanwhile, we collect news articles as experimental materials from several natural gas related websites. Every article was then embedded into vectors by word2vec, weighted with its sentiment score, and labeled with the corresponding day’s price trend. A fused CNN and LSTM network was then trained to predict the price trend from these news vectors. Finally, the model’s predictive accuracy reached 62.3%, which outperformed most traditional classifiers.
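A small sketch of the labeling and article-embedding steps described above; the word vectors, sentiment score, and prices below are hypothetical placeholders (the paper trains word2vec embeddings on its own corpus):

```python
# Sketch of (1) labeling daily price trends and (2) building a sentiment-weighted
# article vector by averaging word embeddings.
import numpy as np

def price_trend_labels(prices):
    """Label each day 'up' (1) or 'down' (0) versus the previous day."""
    prices = np.asarray(prices, dtype=float)
    return (np.diff(prices) > 0).astype(int)

def embed_article(tokens, word_vectors, sentiment_score):
    """Average the word vectors of an article and weight by its sentiment score."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return sentiment_score * np.mean(vecs, axis=0)

rng = np.random.default_rng(0)
word_vectors = {w: rng.standard_normal(8) for w in ["gas", "supply", "storage", "cold"]}
article = ["gas", "supply", "falls", "cold", "snap"]
x = embed_article(article, word_vectors, sentiment_score=-0.6)
y = price_trend_labels([2.61, 2.58, 2.70, 2.75])      # -> [0 1 1]
print(x.shape, y)
```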
Virtual Reality (VR) has been much researched as an education technology, particularly from the constructivist perspective of learning environments. Whilst some education institutes have already applied VR within their curriculum or for marketing purposes, interest in the technology has since calmed considerably. However, since the start of the global COVID-19 pandemic, many educators have been faced with the need to teach at a distance, which has sparked a new wave of interest in VR as a potential education technology solution, even to the extent of classroom substitution. Whilst the advantages of VR for education have been researched, this paper seeks to address a specific research gap: assisting educators in the selection of VR as a suitable education technology. Based on qualitative interviews, literature research and the practitioner experience of the authors in the field of education and VR, a technology fit assessment model is proposed to support education institutes in their VR evaluation process.
Terrorism is a major issue facing the world today. It has a negative impact on the economy of a nation suffering terrorist attacks, from which it can take years to recover. Many developing countries are facing threats from terrorist groups and organizations. This paper examines various terrorism-related factors, using data mining on historical data to predict the terrorist groups most likely to attack a nation. We focus on sampled data, primarily from India over the past two decades, and also consider an international database. To create meaningful insights, data mining and machine learning techniques and algorithms such as Decision Tree, Naïve Bayes, Support Vector Machine, ensemble methods and Random Forest classification are implemented to analyze comparative classification results. Patterns and predictions are presented in the form of visualizations with the help of Python and Jupyter Notebook. This analysis will help in taking appropriate preventive measures to stop terrorist attacks, and in increasing investment and growing the economy and tourism.
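A minimal scikit-learn sketch of such a comparative classification setup, run on synthetic stand-in data rather than the actual incident database:

```python
# Comparative-classification sketch with the algorithm families named above.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# synthetic stand-in: 4 "groups" (classes), 15 incident features
X, y = make_classification(n_samples=1000, n_features=15, n_classes=4,
                           n_informative=8, random_state=42)
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```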
Cyberbullying is a critical issue in societies worldwide; however, in Ecuador it is not given the necessary importance to mitigate this cybercrime. This work develops an exhaustive analysis of the laws currently used to sanction cyberbullying when this act is denounced. The deductive method and exploratory research were employed to analyze the information consulted from the various sources available on the network about the topic discussed. The investigation revealed how cyberbullying cases arise, showing that only 0.07% have been reported; from this result it can be deduced that the number of reported cases is very low in relation to the total number of cellphones activated in Ecuador and the number of people who may be victims of this cybercrime. In addition, a “Criminal Process Diagram” was produced that sets out the sequence of the judicial process. The applied method yielded the following result: a draft law could be created that addresses each type of derived cyberbullying. It was concluded that several institutions in Ecuador work together with organizations to prevent cyberbullying; however, when it occurs, the laws are not sufficient to punish the act.
As a feature selection technique in rough set theory, attribute reduction has been extensively explored from various viewpoints, especially the aspect of granularity, and multi-granularity attribute reduction has attracted much attention. Nevertheless, it should be pointed out that multiple granularities need to be considered simultaneously to evaluate the significance of candidate attributes in the corresponding process of computing a reduct, which may result in a long elapsed time when searching for a reduct. To alleviate this problem, an acceleration strategy for neighborhood-based multi-granularity attribute reduction is proposed in this paper, which aims to improve the computational efficiency of searching for a reduct. Our proposed approach is realized through the positive approximation mechanism, and the process of searching for qualified attributes is executed by evaluating candidate attributes over a gradually reduced sample space rather than over all samples. The experimental results over 12 UCI data sets demonstrate that the acceleration strategy provides performance superior to that of the naive approach to deriving a multi-granularity reduct in terms of the elapsed time of computing the reduct, without generating different reducts.
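For orientation, a compact sketch of the naive greedy search for a neighborhood-based reduct (a single granularity, evaluated over all samples, i.e. the baseline that the paper's positive-approximation acceleration improves on); the neighborhood radius and data are hypothetical:

```python
# Naive neighborhood-rough-set reduct search: greedily add the attribute that
# most increases the neighborhood dependency, evaluated over ALL samples.
import numpy as np

def dependency(X, y, attrs, delta=0.2):
    """Fraction of samples whose delta-neighborhood (Euclidean distance on the
    selected attributes) contains only samples with the same label."""
    if not attrs:
        return 0.0
    Z = X[:, attrs]
    pos = 0
    for i in range(len(Z)):
        nbrs = np.linalg.norm(Z - Z[i], axis=1) <= delta
        if np.all(y[nbrs] == y[i]):
            pos += 1
    return pos / len(Z)

def greedy_reduct(X, y, delta=0.2):
    remaining, reduct, best = list(range(X.shape[1])), [], 0.0
    target = dependency(X, y, list(range(X.shape[1])), delta)
    while remaining and best < target:
        gain, a = max((dependency(X, y, reduct + [a], delta), a) for a in remaining)
        if gain <= best:
            break
        best, reduct = gain, reduct + [a]
        remaining.remove(a)
    return reduct, best

rng = np.random.default_rng(0)
X = rng.random((100, 6))                       # attributes normalized to [0, 1]
y = (X[:, 0] + X[:, 3] > 1.0).astype(int)      # labels depend on attributes 0 and 3
print(greedy_reduct(X, y, delta=0.2))
```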