Ebook: Fuzzy Systems and Data Mining V
The Fuzzy Systems and Data Mining (FSDM) conference is an annual event encompassing four main themes: fuzzy theory, algorithms and systems, which includes topics such as stability, foundations and control; fuzzy applications, which cover different kinds of processing as well as hardware and architectures for big data and time series, and which have wide applicability; the interdisciplinary field of fuzzy logic and data mining, encompassing applications in electrical, industrial, chemical and engineering fields as well as management and environmental issues; and data mining, outlining new approaches to big data, massive data, and scalable, parallel and distributed algorithms. The annual conference provides a platform for knowledge exchange between international experts, researchers, academics and delegates from industry.
This book includes the papers accepted and presented at the 5th International Conference on Fuzzy Systems and Data Mining (FSDM 2019), held in Kitakyushu, Japan on 18-21 October 2019. This year, FSDM received 442 submissions. All papers were carefully reviewed by program committee members, taking account of the quality, novelty, soundness, breadth and depth of the research topics falling within the scope of FSDM. The committee finally decided to accept 137 papers, which represents an acceptance rate of about 30%. The papers presented here are arranged in two sections: Fuzzy Sets and Data Mining, and Communications and Networks.
Providing an overview of the most recent scientific and technological advances in the fields of fuzzy systems and data mining, the book will be of interest to all those working in these fields.
Fuzzy Systems and Data Mining (FSDM) is a consolidated annual international conference comprising four main themes: a) fuzzy theory, algorithms and systems, including topics such as stability, foundations and control; b) fuzzy applications, covering different kinds of processing as well as hardware and architectures for big data and time series, with applicability, among other things, to recognition and diagnostics, multimedia and industry; c) the interdisciplinary field of fuzzy logic and data mining, encompassing applications in electrical, industrial, chemical, bio- and many other types of engineering, as well as management and environmental issues; and d) data mining, outlining new approaches to big data, massive data, and scalable, parallel and distributed algorithms.
This annual conference was first held in Shanghai in 2015, and has taken place in a different city each year. Up to now, two countries have hosted the conference. Following the great success of FSDM 2015, held in Shanghai (China), the second edition took place in Macau (China), the third edition in the FSDM series was hosted by National Dong Hwa University in Hualien, Taiwan (China), and the fourth edition was held in Bangkok (Thailand). The fifth edition of the series, FSDM 2019, is being held in Kitakyushu, Japan, making Japan the third country to host the conference, and provides an important meeting point for experts, researchers, academics and delegates from industry to present the latest advances in the field of fuzzy systems and data mining.
This book includes the papers accepted and presented at the 5th International Conference on Fuzzy Systems and Data Mining (FSDM 2019), held in Kitakyushu, Japan on 18–21 October 2019. All manuscripts were carefully reviewed by program committee members, bearing in mind the quality, novelty, soundness, breadth and depth of the research topics falling within the scope of FSDM. Additionally, FSDM 2019 was an outstanding conference which attracted four remarkable keynote speakers, three of them from overseas: Prof. Takeshi Yamakawa, Professor Emeritus of Kyushu Institute of Technology, Japan and Founding Director of the Fuzzy Logic Systems Institute (FLSI), Japan; Prof. Sheng-Lung Peng from the Department of Computer Science and Information Engineering, National Dong Hwa University, Taiwan; Dr. Chetneti Srisa-An from the College of Digital Innovation and Information Technology, Rangsit University, Thailand; and Prof. Dr. Malin Song, Dean of the Graduate School, Anhui University of Finance and Economics, China. Previous proceedings have been published as part of the prestigious series Frontiers in Artificial Intelligence and Applications (FAIA) by the publisher IOS Press, as follows: Vol. 281, Vol. 293, Vol. 299 and Vol. 309 for FSDM 2015, FSDM 2016, FSDM 2017 and FSDM 2018, respectively. You are now reading the FAIA volume presenting the selected contributions from FSDM 2019.
This year, FSDM received 442 submissions. After an intense discussion stage, the committee, which is composed of many experts, decided to accept 137 papers, representing an acceptance rate of about 30%. As a follow-up to the conference, some well-known and respected journals, such as IEEE Transactions on Fuzzy Systems, the International Journal of Fuzzy Systems, Entropy, Symmetry, Algorithms and Evolutionary Intelligence, the Journal of Nonlinear and Convex Analysis and the Journal of Information Science and Engineering, are planning to publish special issues; this is an important achievement, as the number of associated journal issues is increasing yearly.
We want to express our special thanks to all the keynote and invited speakers and authors who put their effort into preparing a contribution for the conference. We are also very grateful to all those people, especially the program committee members and reviewers, who devoted time to evaluating the papers. It is an honor to continue with the publication of these proceedings in the prestigious series Frontiers in Artificial Intelligence and Applications (FAIA) from IOS Press. Our particular thanks also go to J. Breuker, N. Guarino, J.N. Kok, R. López de Mántaras, J. Liu, R. Mizoguchi, M. Musen, S.K. Pal and N. Zhong, the FAIA series editors, for supporting this conference.
August 2019
Antonio J. Tallón-Ballesteros
University of Huelva
Seville, Spain
Experimental modal analysis (EMA) is a procedure for extracting the modal parameters of a structure from its measured response under impulse excitation. The traditional EMA method is based on the assumption of linear mode theory, which can easily produce pseudo-modes; it is therefore desirable to develop an EMA method that does not rely on this theory. To obtain modal parameters without recourse to modal theory, this paper proposes a method based on data mining. As a data mining technique, symbolic regression can obtain modal parameters simply by mining function-expression rules from single-point response data. Taking modal parameter extraction based on modal theory as a reference, we designed a simulation test and an experiment to verify the feasibility of the proposed method. The approach also shows good application prospects for data-driven, theory-free mining of modal rules.
Haze has recently become one of the most pressing environmental problems, and a number of cities all over the world suffer from severe haze pollution. An increasing number of scholars are dedicated to haze research. Using scientometric and quantitative methods, this paper assesses the status quo and explores the evolution of haze research. The literature data from 1987 to 2017 are retrieved from the Web of Science. We establish co-citation networks and use visualization tools to reveal the main research clusters. Based on our results, the corresponding key elements, highly cited papers, and the most influential and outstanding authors are determined. Furthermore, we identify the authors, institutions, and countries most active in academic collaboration. Finally, we use burst detection to trace emerging hotspots and future directions of research. The results may be helpful in obtaining a general understanding of the evolution of haze research and in identifying future research directions.
Published scientific and technical abstracts provide semantic factual data on the problems, methods, and results of scientific research activities, as well as reliable factual data for the use of artificial intelligence in knowledge services. If these can be accurately identified and separated, an intelligent and innovative knowledge question-answering system can be realized. This study proposes a semantic recognition and classification method for scientific and technical abstracts. The method first sorts the abstracts according to their syntactic and semantic functions, and then performs a statistical analysis of the distribution of classes and sentence positions, sentence types, and the semantic structure of each sentence. Based on this analysis, the semantic word-order characteristics of sentences are classified and merged, and the classification of the problem statements in scientific and technical abstracts is realized, with a classification accuracy of 99%. The effectiveness of the semantic recognition and classification algorithm was verified manually. The experimental results show that the method uses a simple algorithm while displaying high classification accuracy and good universality.
An approach to finding the probabilistic characteristics of fuzzy events is considered. The concept of the fuzzy probability of a fuzzy event is introduced, and examples of tasks in a fuzzy formulation are given.
Protein structure can be regarded as one of the most significant determinants of protein function. In this work, the amino acid residue frequency feature set and the E-H description method are employed as classification features, and a support vector machine and a neural network are employed as classifiers. Among the employed features, some novel features are able to distinguish between the α/β and α+β types. To demonstrate the performance of the proposed method, three benchmark datasets are used to train and test the proposed approach. The results show that the proposed method outperforms existing methods on several measures, especially in the classification of the α/β and α+β types.
A brand’s official Weibo contains a large number of comments that reflect people’s emotional responses to the brand’s products, services and other aspects. Because negative emotional information can attract much attention and spread widely, it is very important to manage and control it effectively. This paper uses sentiment analysis to establish a sentiment classification model of brand affect based on the official Huawei Weibo comments. First, the official Huawei Weibo comments are obtained and word2vec is used to preprocess and characterize the comment text. Second, three machine learning algorithms are used to learn sentiment classification for brand affect: support vector machines, random forests, and deep belief networks. By comparing the classification accuracies in the experimental results, the best model is selected and the brand’s negative emotional comments are obtained. Third, the word frequencies of these negative comments are computed, along with the monthly trends in the proportions of high-frequency words and negative emotional comments. The existing problems are then analyzed and corresponding countermeasures and suggestions are proposed, so that the company’s brand strategies can be adjusted in a timely and reasonable way.
DNA microarray technology enables researchers to obtain thousands of gene expression values simultaneously, but for biological and technical reasons missing values are inevitable. We developed an optimization algorithm that preselects imputation candidates based on numerical features and biological meaning, combining statistical similarity, functional similarity and a further similarity obtained through text mining. Applying the proposed algorithm, we found that it improved the accuracy of several traditional imputation algorithms.
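The candidate-preselection idea can be illustrated with a small sketch: blend several similarity scores into one, then fill missing entries with a similarity-weighted average of the most similar candidate rows. The blend weights, data and function names below are hypothetical, not the paper's algorithm:

```python
import numpy as np

def combined_similarity(stat_sim, func_sim, text_sim, w=(0.5, 0.3, 0.2)):
    """Blend statistical, functional and text-mining similarities.
    The weights are illustrative, not the paper's values."""
    return w[0] * stat_sim + w[1] * func_sim + w[2] * text_sim

def impute_missing(target, candidates, sims, k=2):
    """Fill NaNs in `target` with a similarity-weighted average of the
    k most similar candidate rows (a KNN-style imputation sketch)."""
    top = np.argsort(sims)[-k:]          # indices of the k best candidates
    w = sims[top] / sims[top].sum()      # normalise the weights
    estimate = w @ candidates[top]       # weighted average expression profile
    return np.where(np.isnan(target), estimate, target)

# toy gene-expression rows; sims are precomputed combined similarities
candidates = np.array([[1.0, 2.0, 3.0],
                       [1.1, 2.1, 2.9],
                       [5.0, 5.0, 5.0]])
sims = np.array([0.9, 0.8, 0.1])
target = np.array([1.0, np.nan, 3.0])
filled = impute_missing(target, candidates, sims)
print(filled)   # only the missing middle entry is estimated
```

Observed entries are left untouched; only the gap is filled, and the dissimilar third row contributes nothing because it is excluded from the top-k candidates.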
Fuzzy models have been shown to be ambiguous and characteristically primitive when applied to data analysis problems in most settings. This is because, with fuzzy models, it may not be practically possible to extract meaningful information about the underlying process elements when confronted with unstructured datasets. To this end, the work in this paper demonstrates that it is possible to extract abstract knowledge and greatly improve the information value of such models by carefully integrating and tuning the semantic metrics that fuzzy models lack. Theoretically, the work introduces a semantic fuzzy mining approach that uses fuzzy logic and theories to represent imprecise and uncertain (unstructured) data about the different domain processes, and then presents the resulting models in a format that allows the available datasets to be analysed on the basis of concepts rather than the tags or labels in the event logs of the processes in question. In other words, this paper adopts fuzzy logic, which permits a proposition to take a state other than true or false, and evaluates the outcomes of the method through a series of case study experiments and a comparison against other benchmark algorithms used for process mining. The results show that it is possible to determine, through the classification process (e.g. using a classifier/reasoner), the presence of different patterns or traces within the discovered models, as well as the relationships that the different process elements share amongst themselves within the knowledge base. Essentially, the method is described as a fusion theory that integrates the fuzzy model with other tools or methods, and thus supports a hybrid intelligent system.
The sliding door is an important part of a commercial vehicle. To address the problem of damage caused by the resonance of the chuck and wheel in the sliding guideway, the three-dimensional forces and the smoothness of the sliding mechanism are studied, and an improved guide rail system is designed using a grey model. In the non-linear optimization design [1], a trust-region multi-objective optimization algorithm is used to divide the model into a series of sub-regions. The model is solved iteratively in the trust region of each sub-region, and design parameters satisfying the smoothness requirements of the guide rail are obtained. In ADAMS, the oblique vibration and force motion of the components are simulated and analyzed while the track system is running normally. The results show that the force acting on each group of guide wheel mechanisms can be effectively decomposed by using a multi-wheel design with separated forces, improving the smoothness and reliability of the system.
The problem of detecting radar maritime targets that deviate from historical routes is considered. A new method is proposed for finding normal historical routes and for detecting abnormal routes using radar data. The method is based on grid-based modelling, an algorithm for finding the routing angles of all trajectories in each cell, and a modified DBSCAN algorithm with a newly introduced distance function between two routing angles. It is shown that classical distance functions such as the Euclidean and Hausdorff distances cannot be used with the DBSCAN algorithm for clustering routing angles. A test with real data is given; the false alarm rate (FAR) of the test is 0.
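The need for a custom distance is the crux of the modified DBSCAN: routing angles wrap around at 360°, so the plain Euclidean difference misjudges angles near the wrap point. A minimal sketch of the idea (the `eps` value and function names are illustrative, not the paper's):

```python
def angular_distance(a, b):
    """Distance between two routing angles (degrees) on a circle:
    359 deg and 1 deg are 2 deg apart, not 358 deg as |a - b| suggests."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def region_query(angles, i, eps):
    """DBSCAN neighbourhood query using the angular distance
    (a sketch of the modified step; eps is a hypothetical threshold)."""
    return [j for j, a in enumerate(angles) if angular_distance(angles[i], a) <= eps]

angles = [1, 3, 359, 180]
print(angular_distance(359, 1))        # 2, whereas |359 - 1| = 358
print(region_query(angles, 0, eps=5))  # angles 1, 3 and 359 fall in one neighbourhood
```

With the Euclidean difference, the trajectories at 1° and 359°, which head in almost the same direction, would never be clustered together; the circular distance fixes exactly this.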
Convolutional neural networks have been widely used in object recognition, an important aspect of computer vision. For the particular task of face recognition, a softmax loss is usually combined with some other loss function as the cost function in the training phase. To enhance the power of the feature representation and speed up the training phase, this paper proposes a new supervised method called Improve-Center which, like center-loss, is based on feature centers. It learns a center vector of features for every label and pulls the feature of every sample closer to its center, focusing on moving outlying features toward their center. The experiments demonstrate that the approach is efficient: under the joint supervision of softmax loss and Improve-Center, a better model can be trained that makes intra-class features more compact and inter-class features more separated, and the training process is faster.
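The feature-center idea can be sketched with a center-loss-style formulation: a penalty on each sample's distance to its class center, plus a moving-average center update during training. This is a hypothetical illustration of the general mechanism, not the paper's exact Improve-Center loss:

```python
import numpy as np

def center_pull_loss(features, labels, centers):
    """Mean squared distance of each feature to its class center:
    minimising this pulls intra-class features together."""
    diffs = features - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

def update_centers(features, labels, centers, alpha=0.5):
    """Move each class center toward the mean of its current features
    (alpha is a hypothetical center learning rate)."""
    new_centers = centers.copy()
    for c in np.unique(labels):
        new_centers[c] += alpha * (features[labels == c].mean(axis=0) - centers[c])
    return new_centers

features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
labels = np.array([0, 0, 1])
centers = np.array([[2.0, 0.0], [0.0, 4.0]])
loss = center_pull_loss(features, labels, centers)
print(loss)   # small: both classes already sit near their centers
```

In a real training loop this penalty would be added to the softmax loss, and the centers updated once per mini-batch.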
Time delays in network transmission have been a concern in teleoperation robots. To address this problem, this paper proposes a delay prediction and compensation scheme based on a Wolf Pack Algorithm (WPA)-optimized Back Propagation (BP) neural network model combined with a generalized predictive control algorithm. The WPA optimizes the initial weights and thresholds of the BP neural network, improving its convergence speed and prediction accuracy so that the WPA-BP model can accurately predict the network time delay. The time delay obtained by the model is combined with the improved generalized predictive control algorithm to calculate the control increments for the controller design. The improved generalized predictive control algorithm can identify the controller parameters directly without solving the Diophantine equation, saving online computation time. The simulation results show that the scheme compensates well for the network transmission time delay, ensuring a real-time and stable system.
Collaborative filtering is a well-known technique successfully used in various recommender systems. However, it suffers from the major drawbacks of scalability and data sparsity when the system makes recommendations based on the rating records of similar users. This study aims to solve these problems by exploiting user interest in movie genres to build clusters of users. We make a slight variation of the Fuzzy C-means algorithm such that several centers, one for each genre, are maintained per cluster. Experimental results show that the proposed strategy performs comparably to a conventional method that does not use clustering, and significantly better than the well-known K-means clustering algorithm.
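One way to realize "several centers, one per genre, per cluster" is to take a cluster's distance to a user as the distance to its nearest genre center, and plug that into the standard Fuzzy C-means membership update. The following is a hypothetical sketch of that idea, not the authors' exact variation:

```python
import numpy as np

def memberships(x, cluster_centers, m=2.0):
    """Fuzzy C-means membership of point x in each cluster, where each
    cluster keeps one center per genre; the distance to a cluster is
    taken as the distance to its closest genre center (an assumption)."""
    d = np.array([min(np.linalg.norm(x - g) for g in genre_centers)
                  for genre_centers in cluster_centers])
    d = np.maximum(d, 1e-12)              # guard against division by zero
    inv = d ** (-2.0 / (m - 1.0))         # standard FCM membership weighting
    return inv / inv.sum()

# two clusters, each with two hypothetical genre centers in a 2-D rating space
clusters = [np.array([[0.0, 0.0], [1.0, 0.0]]),
            np.array([[5.0, 5.0], [6.0, 5.0]])]
u = memberships(np.array([0.5, 0.0]), clusters)
print(u)   # memberships sum to 1 and heavily favour cluster 0
```

Recommendations would then be drawn from the rating records of users in the high-membership cluster, shrinking the neighbourhood search and easing the scalability problem.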
The probabilistic linguistic term set (PLTS) extends the notion of the linguistic variable (LV). The existing operations on PLTSs are mainly based on the subscripts of linguistic terms and their probabilities, while the membership functions of the linguistic terms are ignored; consequently, these operations may cause a loss of information. The extension principle is useful for calculations between LVs and takes membership functions into account. Inspired by this idea, in this study we give a novel representation of the PLTS and present the union of PLTSs. We then define the extension principle for PLTSs and derive the algebraic operations of PLTSs from it. These algebraic operations account for both the probabilities and the membership functions of linguistic terms, and thus can enhance the precision of the final results.
The purpose of this project is to determine which Starbucks drinks (Frappuccino blended beverages and espressos) among all the coffee and tea options are best for cardiovascular disease (CVD) prevention. A health index was constructed from different nutritional components of the drinks, including saturated fat, cholesterol, sodium, carbohydrates, dietary fiber, sugars, protein, and caffeine. The antioxidant activity of the flavonoids associated with the caffeine component can reduce free radical formation and scavenge free radicals. Principal Component Analysis (PCA) was used to explore all the factors in the analysis and to inform on the utility of the health index in relation to CVD prevention. Principal Component 1 is most related to unhealthy components such as sugars, carbohydrates, saturated fat, and total fat, while Principal Component 2 is most related to caffeine. Additionally, dietary fiber and caffeine lie most nearly opposite the unhealthy components along the direction of the first Principal Component. PCA eigen-analysis is very powerful for recognizing coffee product types based on nutrition patterns. To remove scale effects in the PCA analysis, the original data were Z-transformed. The new PCA-based health index was derived from the eigenvalues and eigenvectors of the first two Principal Components, and was compared to a science-based health index (about 70%-80% R-squared in curve fitting).
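The index construction described above (Z-transform, then scores on the first two principal components) can be sketched as follows; weighting the two component scores by their eigenvalues is an assumption about the exact formula, and the toy nutrition matrix is illustrative:

```python
import numpy as np

def pca_health_index(X):
    """Sketch of a PCA-based index: Z-transform the drink-by-nutrient
    matrix, eigendecompose its covariance, and combine the first two
    principal-component scores with eigenvalue weights."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # Z-transform removes scale effects
    cov = np.cov(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]             # sort descending
    vals, vecs = vals[order], vecs[:, order]
    scores = Z @ vecs[:, :2]                   # PC1 and PC2 score per drink
    w = vals[:2] / vals[:2].sum()              # eigenvalue weights (assumption)
    return scores @ w

# toy matrix: rows = drinks, columns = (sugars, saturated fat, caffeine, fiber)
X = np.array([[40.0, 10.0,  80.0, 1.0],
              [ 5.0,  1.0, 150.0, 3.0],
              [25.0,  6.0, 100.0, 2.0]])
index = pca_health_index(X)
print(index)   # one index value per drink, centered around zero
```

Because the data are centered by the Z-transform, the index values sum to zero across drinks; only their relative ordering is meaningful.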
Machine Learning (ML) and Artificial Intelligence (AI) allow researchers to view and analyze students’ behavior as never before. By monitoring students online, teachers can help those who need assistance in advance. Many research papers have revealed a positive correlation between good classroom behavior and academic achievement. The RSU-AI-Monitoring System is an effective tool for behavioral improvement: it tracks many attributes in user activity logs, such as attendance, quiz marks, login and logout timestamps, IP addresses, and names. This research traces and directs students online to improve their behavior. The experiment traced and tracked two courses, THAI106 and ENG101, which are general education courses offered online; students can access them on an e-learning platform via mobile devices. Sample data were collected from 1,890 students in one semester. This paper discusses the data mining, ML and AI techniques used to construct a new method enabling personalized learning. The experiment revealed good progress in the overall student marks in the final examinations: 65 of 345 students received a grade of B+ after participating in the tutoring program run by the AI bot, and 240 students who had failed the midterm examinations or received low quiz scores were able to pass their final examinations. Overall, 69.5% of the students passed their final examinations and 18% obtained B+ grades.
This paper introduces object storage technology for building mass storage systems and proposes an object-based system model. In structured and unstructured P2P network topologies, this model replaces block-based storage with object-based storage, which facilitates data storage with high performance, high scalability, easy management, data sharing and high security. In engineering projects, the model can optimize the I/O performance of mass data storage systems comprising heterogeneous OSDs, following quantitative performance indicators.
Traditional clustering methods usually compute pairwise similarity between data points. In many cases, however, especially for high-dimensional data in computer vision, more than two data points should be involved in representing the similarities. In this case, hypergraph clustering is an ideal tool for data analysis: high-order similarities on data subsets, represented by hyperedges, can reflect the similarity among more than two data points. Hypergraph clustering usually comprises hypergraph construction and hypergraph partition. Two important questions in hypergraph construction are how to generate the hyperedges and how many hyperedges should be used to represent the original data. Recently, Pulak Purkait et al. proposed a method for generating large pure hyperedges, which proved more effective than traditional methods for computer vision tasks. However, the method requires the number of hyperedges to be specified in advance and uses random sampling to generate the hyperedges, which may lead to suboptimal clustering results. Therefore, this work proposes a novel sampling method called greedy neighborhood search, which generates large pure hyperedges based on Shared Reverse k-Nearest Neighbors (SRNN) and learns the number of hyperedges simultaneously. Experiments show the benefits of applying the proposed method to high-dimensional data.
Entropy plays a powerful role in fuzzy set theory in quantifying the uncertainty between alternatives as well as their weights. In this paper, an entropy measure is constructed as an extension of Szmidt and Kacprzyk’s entropy to the interval-valued neutrosophic set, using the cardinality of fuzzy sets and ratios of distances. We then propose a multiple-criteria decision-making method based on this entropy on the interval-valued neutrosophic set, and work through an example to illustrate the proposed model.
On May 25th, 2018, the European Union brought into force the General Data Protection Regulation (GDPR). Its enforcement extends to non-EU countries, giving it global impact in 2018. All business and IT sectors need to review and change their systems for processing personal user data in order to respect users’ privacy rights. However, there has been no study of users’ actions or feedback regarding data privacy or data protection. This research aims to analyse users’ behaviour and perception of personal data protection using Twitter as a study platform. The population included in our study consists of 560,923 Internet users of 61 nationalities, whose engagement on Twitter was collected over eight months, from Jan 1st, 2018 to Aug 30th, 2018. We mine social media data to understand social perception of, and feedback on, important issues; fusing information from the many rich features of Twitter supports a societal analysis that is useful for authorities and policy makers. We investigate various aspects and report many interesting discoveries regarding users’ perception of their privacy rights in different regions of the world. Sentiment analysis is performed on the Twitter content to show how people in different regions feel about their rights, and we visualize word clouds of the keywords in tweet messages to understand users’ opinions. Moreover, we explore the use of Twitter hashtags, which reveal the tweet messages at a higher conceptual level, and investigate user engagement in terms of likes, retweets and replies. Many surprising results are obtained that help us understand what happens around the world as the new paradigm of data privacy takes hold.
Microwave drying is a rapid dehydration technique that can be applied to preserve agricultural products. Because of the complex application environment, the temperature and humidity in the drying material cannot be precisely controlled during the microwave drying process, yet microwave drying without exact control of temperature and humidity cannot guarantee the quality of agricultural products. To achieve high quality, this paper proposes a fuzzy control method for the microwave dryer that controls the temperature and humidity precisely. To assess the drying effect of the proposed fuzzy control scheme, the drying time of Chinese jujube under fuzzy microwave (FM) drying, ordinary microwave (OM) drying, and temperature-controlled microwave (TM) drying was investigated, and quality attributes of the dried samples such as vitamin C (VC), color and total flavonoid content (TFC) were evaluated. The results showed that the total drying time required to reduce the moisture content of 500 g of jujube from 74% to 6% on a dry basis was 30, 26, and 31 min for FM, OM and TM drying, respectively. The drying rate with FM drying was the fastest. Jujube dried by FM drying showed better color and higher retention of VC and TFC than with TM and OM drying. Compared with TM and OM drying, FM drying achieved a shorter drying time and better sample quality, indicating that a microwave dryer with a fuzzy controller provides a practical and effective method for drying jujube with acceptable product quality.
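A Sugeno-style rule base gives the flavour of such a fuzzy controller: fuzzify the temperature error with triangular memberships and blend the rule outputs into a power command. The membership ranges, rule set and power levels below are illustrative assumptions, not the paper's controller:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def fuzzy_power(temp_error):
    """Sugeno-style sketch: fuzzify the temperature error (setpoint minus
    measured, in deg C) and blend three hypothetical rules into a
    microwave power command in watts."""
    low  = tri(temp_error, -20, -10, 0)   # material too hot  -> reduce power
    ok   = tri(temp_error, -5, 0, 5)      # on target         -> hold power
    high = tri(temp_error, 0, 10, 20)     # material too cold -> raise power
    weights = [low, ok, high]
    powers  = [200.0, 500.0, 800.0]       # hypothetical rule consequents
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, powers)) / total if total else 500.0

print(fuzzy_power(10))   # far below setpoint: the "raise" rule dominates
print(fuzzy_power(0))    # on target: hold the mid power level
```

A real dryer controller would add a second input (humidity) and a full rule table, but the fuzzify-infer-defuzzify cycle is the same.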
Subgroup discovery is a problem in machine learning and data mining in which population data are mined to discover interesting subgroups with respect to a target property; the goal is to find rules describing subsets of the population. In this paper, a new approach, FDG-SD, is proposed. It adopts fuzzy rule induction and uses a dynamic-programming-like algorithm to discover fuzzy subgroups, overcoming the disadvantages of existing approaches. FDG-SD finds better solutions for almost half of 30 UCI machine learning repository datasets on the significance, unusualness, support, confidence and running-time quality measures. According to the Friedman test results, FDG-SD ranks first among the most widely used algorithms with respect to the significance, unusualness, support and running-time quality measures.
This research proposes a feature extraction algorithm based on demographic and personality attributes on social media. The attributes analyzed include gender, age group, political affiliation, religion, and personality type. Two feature sets are extracted for each user: the comment text and community activity. Naïve Bayes and logistic regression classifiers are used to evaluate the attribute prediction, with a dataset of comments from the Reddit website as a case study. Experimental results measured in terms of F1 score are 88% for predicting a user’s political affiliation, 85% for gender, 57% for religion, 46% for personality type, and 42% for age group. We found that the feature set obtained from user activity provides better performance in the user recognition task.