Ebook: Information Modelling and Knowledge Bases XXXI
Information modeling and knowledge bases have become an important area of academic and industry research in the 21st century, addressing complexities of modeling that reach beyond the traditional borders of information systems and academic computer science research.
This book presents 32 reviewed, selected and updated papers delivered at the 29th International Conference on Information Modeling and Knowledge Bases (EJC2019), held in Lappeenranta, Finland, from 3 to 7 June 2019. In addition, two papers based on the keynote presentations and one paper edited from the discussion of the panel session are included in the book. The conference provided a forum to exchange scientific results and experience, and attracted academics and practitioners working with information and knowledge. The papers cover a wide range of topics, ranging from knowledge discovery through conceptual and linguistic modeling, knowledge and information modeling and discovery, cross-cultural communication and social computing, environmental modeling and engineering, and multimedia data modeling and systems to complex scientific problem-solving. The conference presentation sessions: Learning and Linguistics; Systems and Processes; Data and Knowledge Representation; Models and Interface; Formalizations and Reasoning; Models and Modeling; Machine Learning; Models and Programming; Environment and Predictions; and Emotion Modeling and Social Networks reflect the main themes of the conference. The book also includes two extended publications of keynote addresses: ‘Philosophical Foundations of Conceptual Modeling’ and ‘Sustainable Solid Waste Management using Life Cycle Modeling for Environmental Impact Assessment’, as well as additional material covering the discussion and findings of the panel session.
Providing an overview of current research in the field, the book will be of interest to all those working with information systems, information modeling and knowledge bases.
The conference on Information Modeling and Knowledge Bases has become an important technology contributor for 21st century academic and industry research, addressing the complexities of modeling in digital transformation and digital innovation that reach beyond the traditional borders of information systems and computer science academic research.
The international conference on Information Modeling and Knowledge Bases originated from co-operation between Japan and Finland in 1982, and was then known as the European Japanese Conference (EJC). At that time, Professor Ohsuga from Japan and Professors Hannu Kangassalo and Hannu Jaakkola from Finland (Nordic countries) carried out the pioneering work for this longstanding academic collaboration. Over the years, the organization has been widened to include European countries as well as many other countries worldwide. In 2014, because of this expanded geographical scope, the ‘European Japanese’ part of the title was replaced by ‘International’. The conference characteristically opens with an ‘appetizer session’ that allows participants to introduce their topic in a short three-minute presentation, followed by presentation sessions with enough time for discussion. A limited number of participants is typical for this conference.
The 29th International Conference on Information Modeling and Knowledge Bases (EJC2019), held in Lappeenranta, Finland, constituted a research forum for exchanging scientific results and experience, attracting academics and practitioners who deal with information and knowledge. The main topics of EJC2019 covered a wide range of themes extending from knowledge discovery through Conceptual Modeling, Knowledge and Information Modeling and Discovery, Linguistic Modeling, Cross-Cultural Communication and Social Computing, Environmental Modeling and Engineering, and Multimedia Data Modeling and Systems to complex scientific problem solving. The conference presentation sessions (Learning and Linguistics, Systems and Processes, Data and Knowledge Representation, Models and Interfaces, Formalizations and Reasoning, Models and Modeling, Machine Learning, Models and Programming, Environment and Predictions, and Emotion Modeling and Social Networks) reflected the main themes of the conference.
EJC2019 included five keynote presentations on Information Modeling and Knowledge Bases and a panel discussion on ‘Artificial Intelligence and Environment Modeling’ in the context of information and conceptual modeling. ‘Philosophical Foundations of Conceptual Modeling’ was the keynote address given by Professor John Mylopoulos of Toronto University, Canada and the University of Trento, Italy. ‘Quantification of Uncertainties of Mathematical Modeling’ was the keynote address by Professor Heikki Haario of Lappeenranta-Lahti University of Technology, Finland. ‘CFD-based Optimization in Industrial Applications’ was the keynote address by Professor Jari Hämäläinen of Lappeenranta-Lahti University of Technology (LUT), Finland. ‘Sustainable Solid Waste Management using Life Cycle Modeling for Environmental Impact Assessment’ was the keynote address by Professor Mika Horttanainen of Lappeenranta-Lahti University of Technology, Finland, and ‘Object Embedding: Can we Catch the Meaning in a Vector?’ was the keynote address by Professor Aleksei Shpilman of the Higher School of Economics, St Petersburg, Russia.
The proceedings of the 29th International Conference on Information Modeling and Knowledge Bases feature thirty-two reviewed, selected, and updated contributions that are the result of presentations, comments, and discussions during the conference. This volume also includes two extended publications of the keynote addresses: ‘Philosophical Foundations of Conceptual Modeling’ and ‘Sustainable Solid Waste Management using Life Cycle Modeling for Environmental Impact Assessment’, with additional material covering the discussion and findings of the panel session. The proceedings also include the long and short papers presented at the conference, which were revised after the conference before being published in the series ‘Frontiers in Artificial Intelligence’ by IOS Press (Amsterdam). The proceedings volume ‘Information Modeling and Knowledge Bases’ is edited by the editorial committee of the conference.
This paper contributes to the philosophical foundations of conceptual modeling by addressing a number of foundational questions such as: What is a conceptual model? Among models used in computer science, which are conceptual, and which are not? How are conceptual models different from other models used in the Sciences and Engineering? The paper takes a stance in answering these questions and, in order to do that, it draws from a broad literature in philosophy, cognitive science, and logic, as well as several areas of Computer Science (including Databases, Software Engineering, Artificial Intelligence, and Information Systems Engineering, among others). After a brief history of conceptual modeling, the paper addresses the aforementioned questions by proposing a characterization of conceptual models with respect to conceptual semantics and ontological commitments. Finally, we position our work with respect to a “Reference Framework for Conceptual Modeling” recently proposed in the literature.
Life cycle assessment (LCA) can be used as a tool to find the most environmentally sustainable alternatives for waste management solutions. It is used as a research tool to compare different system and technology solutions and to find the hot spots of the systems where development measures are the most effective from environmental and economic points of view. LCA can be used to support strategic decision making in the waste management operations of cities and industries. It can also be used as an R&D tool for new waste treatment and recovery technologies, so that they are designed and applied as sustainably as possible. One of the challenges of LCA, and an obstacle to using it easily for all of these purposes, is its laboriousness. The data for each waste management LCA has to be acquired from many different sources: primary data from different stakeholders and secondary data from several literature sources. Secondary data often has to be converted to fit the scale of the application, and there are many uncertainties and inaccuracies in the data. These facts make it very difficult to automate data collection for LCA purposes. Implementing a waste management LCA demands a very good understanding of the entire waste management and recovery system, and so far it can be done properly only by skilled LCA and waste management experts. There is some ongoing development to increase data collection from existing waste management systems and waste materials, but these efforts do not solve a very large share of the data management problems of waste management LCA.
The aim of our study is to shed light on how academics experience using recorded audio feedback (RAF) as a feedback method in a multi-cultural higher e-Education context. We adopted a qualitative content analysis approach, applying thematic network analysis to the data received from three academics (a case study). This approach proposes graphical networks as an aid for analyzing and synthesizing qualitative data into basic, organizing and global themes. The thematic network analysis produced two global, six organizing and 48 basic themes. The two global themes were named “Speaking style” and “Culture neutrality/sensitivity”. Based on our analysis, academics can, by using RAF in a multi-cultural e-Education context, provide learners with neutral and caring feedback. Culture neutrality in RAF treats all learners equally, and culture sensitivity in RAF promotes learning and progress by taking learners’ diversity into account. Based on our analysis we introduce a preliminary RAF process model for the multi-cultural higher e-Education context.
The paper presents the results of an experimental study, which examined the effectiveness of learning database fundamentals depending on students’ educational background as an influencing factor. In addition, students’ perceived knowledge and their actual knowledge were analyzed with regard to their educational background. The results demonstrate that students’ educational background did not have an effect on the overall learning process. We also found that the background does not have the same influence on students’ self-assessment throughout the entire learning process.
For the purpose of text classification or information retrieval, we apply preprocessing such as stemming and stopword removal to texts. Almost all of these techniques are useful only for well-formed text such as textbooks and news articles, and not for social network services (SNS) or other texts on the internet. In this investigation, we propose a way to extract stopwords in the context of social network services. To do that, we first discuss what stopwords mean and how they differ from conventional ones, and we propose the statistical filters TFIG and TFCHI to identify them. We examine categorical estimation to extract characteristic values, focusing on Kullback-Leibler Divergence (KLD) over temporal sequences of SNS data. Moreover, we apply several preprocessing steps to manage unknown words and to improve morphological analysis.
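As a rough illustration of the Kullback-Leibler Divergence mentioned in this abstract, the sketch below compares the word distributions of two time windows of posts. It is not the authors' method: the TFIG and TFCHI filters are not specified here and are not implemented, and the function names, the additive smoothing constant `alpha`, and the sample tokens are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def word_distribution(tokens: Iterable[str], vocabulary: List[str],
                      alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed relative frequency of each vocabulary word (alpha is an assumed constant)."""
    counts = Counter(tokens)
    total = sum(counts[w] for w in vocabulary) + alpha * len(vocabulary)
    return {w: (counts[w] + alpha) / total for w in vocabulary}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KLD(p || q) = sum over words of p(w) * log(p(w) / q(w))."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


# Hypothetical token streams from two consecutive time windows.
window_1 = ["the", "game", "the", "goal", "the", "win"]
window_2 = ["the", "rain", "the", "storm", "the", "flood"]
vocabulary = sorted(set(window_1) | set(window_2))

p = word_distribution(window_1, vocabulary)
q = word_distribution(window_2, vocabulary)
divergence = kl_divergence(p, q)
```

Smoothing keeps every probability strictly positive, so the logarithm is always defined even when a word appears in only one window.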
Sequence operators are effective for efficiently combining multiple events when state recognition is performed by combining time series events. Since sensor data are inherently noisy, one can take a strict attitude in dealing with them: it is conceivable that any of the time series events could be a false positive. All complex events should then be constructed carefully. Such an attitude is called the skip-till-any-match model in the sequence operator. When using this model, huge numbers of potential complex events are generated. A sequence operator usually supports both Kleene closure and non-Kleene closure. While efficient methods have been studied for Kleene closure so far, methods for non-Kleene closure are still being explored. In this paper, we propose the reduced expression method to improve the efficiency of sequence operator processing for the skip-till-any-match model. Experimental results showed that the method improved on SASE, the conventional method, in both processing time and memory size, by up to a factor of several thousand.
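To make the blow-up under skip-till-any-match concrete, here is a minimal sketch of a naive non-Kleene sequence operator that enumerates every in-order combination of events matching a pattern of event types. It is neither the reduced expression method nor SASE. The event representation, the function name, and the absence of a time window are assumptions made for illustration.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (event type, timestamp); an assumed representation


def skip_till_any_match(stream: List[Event], pattern: List[str]) -> List[List[Event]]:
    """Enumerate all matches of a non-Kleene sequence pattern.

    Under skip-till-any-match every relevant event may be either taken or
    skipped, so each partial match is kept and also extended.
    """
    partial: List[List[Event]] = [[]]  # start with the empty partial match
    complete: List[List[Event]] = []
    for event in stream:
        extended = [
            match + [event]
            for match in partial
            if pattern[len(match)] == event[0]
        ]
        for match in extended:
            if len(match) == len(pattern):
                complete.append(match)
            else:
                partial.append(match)  # the unextended match also stays in `partial`
    return complete


stream = [("A", 1), ("A", 2), ("B", 3), ("B", 4), ("C", 5)]
matches = skip_till_any_match(stream, ["A", "B", "C"])
# 2 A events x 2 B events x 1 C event = 4 complex events
```

The number of partial matches grows multiplicatively with the number of candidate events per pattern position, which is the cost the abstract refers to.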
The research results, published in the article as a proof of concept, aim to contribute to a universal model of a controlled vocabulary for various scientific themes. The primary motivation is the theme of cyber security, for which internationally accepted definitions of terms are missing. The literature review has confirmed that a vocabulary model in cyber security is not a frequently published topic, and that in some disciplines (for example, simulation) a quality online vocabulary is missing. The solution is an ontology-driven application in the format of a knowledge management system (KMS) that can model the functions of thesauri. The implementation tool is the software ATOM, a product of the company AION-CS Zlín, Czech Republic. The key part of the research is designing a proper ontology (classes, associations, and characteristics) that offers complexity and flexibility. The final shape of the online vocabulary is valuable for analysis, integration, education and study purposes. The ontology includes recursive associations to the class CONCEPT (term), which make it possible to link terms according to their hierarchy, opposition, similarity or other relation, and to increase understanding of the vocabulary theme. The vocabulary contains only several tens of terms, chosen for the proof of concept to highlight the advantage of an online vocabulary in a KMS format. One problem concerns the terms themselves: most vocabularies are protected by copyright, and their licences permit only short excerpts to be used in research. This is one of the complications of preparing an online cyber security vocabulary in practice.
Public administrations are under increasing pressure to improve their effectiveness and efficiency, usually by means of new IT solutions. However, IT solutions do not necessarily bring the desired results if they are based on ineffective processes. With this in mind, we carried out a project in the public administration sector whose aim was to provide a precise technical specification for a new IT system based on redesigned business processes. As a result, a 58-page process specification was created, consisting of nine processes specified and modelled from several viewpoints. This paper evaluates the applicability of the (novel) technologies and approaches used in the specification. Additionally, lessons learned and challenges related to activities and process outcomes are presented and critically evaluated.
The paper addresses the field of Process Optimization in Slovenian companies, and their attitude toward optimization approaches. A survey was conducted, including professionals from companies, as well as students, examining approaches to assure business process quality, its validation and optimization, containing several methods. Survey results, both in companies (24 participants) and among students (24 participants) indicate that, although a combination of methods provides the best results, modeling, KPI identification and simulation are the best indicators of process strengths and weaknesses, as well as optimization potentials. Therefore, the research was complemented with a review of 5 available modeling and simulation tools, providing evaluation and comparison. To get insight into which of the tools best supports the optimization process, a comparative analysis was conducted, identifying key differences between them, and providing suggestions for optimization enhancement in companies.
There is a large number of NoSQL data systems, which can be classified into four different types of NoSQL databases. However, there are no generally well-known and established software quality frameworks that help information system developers decide which database or system is the most useful in their case. We present a comparison and definition of NoSQL systems according to selected quality attributes based on a literature review, and an overview and evaluation of how different NoSQL solutions meet the most important quality criteria. A further contribution of this work is the definition of a utility function. Together with the evaluation, it can help developers, architects and software engineers to understand different quality attributes and how they are reflected in selected NoSQL products, to see how a certain NoSQL database system would help solve their particular problem, and to select the most appropriate solution.
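The form of the utility function is not given in this abstract. As a hedged sketch of how such a function might support selection, the example below assumes a simple weighted additive utility over quality attribute scores. The attribute names, weights, scores, and the candidates `store_a` and `store_b` are hypothetical and do not come from the paper's evaluation.

```python
from typing import Dict, List


def utility(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of attribute scores (assumed additive form, scores in [0, 1])."""
    total_weight = sum(weights.values())
    return sum(weights[a] * scores.get(a, 0.0) for a in weights) / total_weight


def rank(candidates: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> List[str]:
    """Candidate names ordered from highest to lowest utility."""
    return sorted(candidates,
                  key=lambda name: utility(candidates[name], weights),
                  reverse=True)


# Hypothetical project priorities and hypothetical evaluation scores.
weights = {"scalability": 0.5, "consistency": 0.3, "availability": 0.2}
candidates = {
    "store_a": {"scalability": 0.9, "consistency": 0.4, "availability": 0.8},
    "store_b": {"scalability": 0.5, "consistency": 0.9, "availability": 0.6},
}
ranking = rank(candidates, weights)
```

Changing the weights to reflect a different project's priorities can change the ranking, which is the point of separating the evaluation from the utility function.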
Citizen science project applications can be created in two different ways: by using an existing platform or by building them from scratch. When creating applications for citizen science projects, data quality, privacy and provenance are important characteristics of such applications. Therefore, the objective of this research is to find out how well data quality, privacy and provenance are handled in ongoing citizen science projects. A number of citizen science projects have been compared against the characteristics of the ISO/IEC 25012 data quality standard. Results show that data quality is mostly lacking in the areas of accuracy, privacy, provenance and availability. Projects have not implemented adequate accuracy checks on submitted data, and projects show the real names and locations of participants, although such data should not be available to others. Management of provenance is not found in many projects, but where it is found, provenance is handled either very well or extremely poorly. Many of these issues could be easily solved with proper data management and testing. At the end of this article, multiple suggestions are given to improve data quality, provenance and privacy in citizen science project applications.
In this article the focus is on software evolution, which is an important part of software engineering. In practice, software development does not stop when a system is delivered but continues throughout the lifetime of the system. After the system has been deployed, external pressure for change can generate new requirements for the existing software. This change aspect, which is a characteristic of software engineering, should be taken into consideration when developing and modeling new software systems. In this paper the theme was studied using experience gained from the piloting of a reference system developed in an earlier research project carried out by Tampere University of Technology. Software evolution is examined from the point of view of system developers, administrators (maintenance), and end users based on a concrete long-term piloting period.
Precision Agriculture and Smart Farming are increasingly important concepts in agriculture. While the former is mainly related to crop production, the latter is more general and also involves the carbon capture capacity of crop fields (Carbon Farming), as well as optimization of farming costs taking into account the dynamics of market prices. In this paper we present our recent work in building a web-based decision support system for farmers to help them comply with these trends and requirements. The system is based on the Oskari platform, developed in Finland for the visualization and analysis of geospatial data. Our main focus so far has been on developing tools for Big Data and Deep Learning based modelling, which will form the analytical engine of the decision support platform. We first give an overview of the various applications of deep learning in crop production. We also present our recent results on within-field crop yield prediction using a Convolutional Neural Network (CNN) model. The model is based on multispectral data acquired using UAVs during the growth season. The results indicate that both the crop yield and the prediction error have significant within-field variance, emphasizing the importance of developing field-wise modelling tools as a part of a decision support platform for farmers. Finally, we present the general architecture of the overall decision support platform currently under development.
When viewing the world, each of us has a unique perception of his or her surroundings. The way we notice important and interesting details is heavily influenced by things like our education, our academic discipline, or our personal interests. The cognitive patterns we prefer, our (working) habits, and in general our focus are often affected by the way we perceive the world.
In contexts like multidisciplinary research projects or companies with diverse departments, that is, in contexts where many modes of perception and thus many different foci of interest have to collaborate, this can be challenging. Different groups need different information to perform their working tasks. Different (data) structures, formats, and namespaces are requested. Even when examining the exact same object, different experts will have different perspectives.
In this paper we approach this situation from a data manager’s point of view. In general, these perspectives do not indicate deficiencies. The different perspectives just mirror the differing tasks and purposes that are addressed. Mostly, the existing structures and forms of information (re)presentation are reasonable and adapted to the needs of the people following the perspective. Nevertheless, the number of perspectives can be large, as potentially each task requires a slightly different way of structuring the available information. Thus, in the general case it is infeasible to maintain all possible perspectives as physical data structures. We therefore suggest systematically describing perspectives and using the information from these models to generate appropriate interfaces on demand.
Improved computing environments performing large-scale data processing and high-speed computational processing facilitate the delivery of new algorithms to businesses while considering cost efficiency for small-scale investments. Implementing the proposed method more as a criterion for feasibility and economic rationality in specific problem areas rather than as an approach to generic issues, we aim to develop technologies of practical use in the real world. Recently, it has become possible for customers to monitor their buying behavior through smart devices, and with the improvement of computing performance, it has become possible to improve the accuracy of prediction and recommendation cycles through active online learning.
This study proposes a method for dynamically recommending products that are highly likely to be selected by the user by combining the user’s reaction with reuse of knowledge and real-time online learning to cyclically repeat feedback that is more specific to the user. We propose a method to sense streaming data by utilizing a user’s behavior, intervening in a user’s behavioral change through interactions, such as recommendations, and evaluating the user’s buying intention and interest in each product. Using the evaluation results for recommendations helps achieve positive feedback and effectively support the selection of more exciting or different products. We propose a recommendation method specific to individual customers based on past transaction data, where changes can be monitored in real-time by reusing the knowledge acquired in advance through batch processing of knowledge discovery and data mining and processing the stream data in real-time online.
We will present the implementation of our proposed method targeting the database system and machine learning algorithm.
This study proposes a data labelling scheme for the bed position classification task. The labelling scheme provides a set of bed positions for the purpose of preventing bed falls and bedsore injuries, which seriously imperil the health of aging people. Most elderly people who fall do so when they attempt to get out of bed unassisted. There is also a high possibility of rolling out of bed when an elderly person lies close to the edge of the bed. In addition, a bedridden person who cannot reposition by him/herself has a high risk of bedsores. Repositioning every two hours alleviates prolonged pressure on the body. We collected data from a specific set of bed sensors and classified the signal into five positions on the bed: off-bed, sitting, lying center, lying left, and lying right. These five positions are the fundamental information for developing a model to capture the movement of the elderly on the bed. A precaution strategy can then be designed for bed fall and bedsore prevention. The data for the five different positions are manually annotated by observing the synchronized video through a specially designed workbench. The combination of the off-bed, sitting, and lying positions is used to detect a bed exit situation, and the combination of the positions in the lying state, i.e. lying center, lying left, and lying right, is used to detect a rolling-out-of-bed situation. Moreover, to give notice that a bedridden person needs assistance with repositioning, the three lying positions are used to calculate the time spent in the abiding position.
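The sketch below illustrates how the five position labels might feed the two uses described in this abstract: timing the abiding lying position and flagging a bed exit. The abstract does not say how the labels are combined, so the rule that a bed exit is the label sequence lying, sitting, off-bed is an assumption. So are the sample format and the function names. The two-hour limit follows the repositioning interval named in the abstract.

```python
from typing import List, Tuple

LYING = {"lying center", "lying left", "lying right"}
REPOSITION_LIMIT_S = 2 * 60 * 60  # two hours, the interval named in the abstract

Sample = Tuple[float, str]  # (timestamp in seconds, position label); assumed format


def abiding_time(samples: List[Sample]) -> float:
    """Seconds for which the most recent lying position has been held unchanged."""
    if not samples or samples[-1][1] not in LYING:
        return 0.0
    last_time, last_position = samples[-1]
    start = last_time
    for timestamp, position in reversed(samples):
        if position != last_position:
            break
        start = timestamp
    return last_time - start


def needs_reposition(samples: List[Sample]) -> bool:
    """True once the current lying position has been held for the limit or longer."""
    return abiding_time(samples) >= REPOSITION_LIMIT_S


def bed_exit(samples: List[Sample]) -> bool:
    """Assumed rule: the latest state changes read lying, then sitting, then off-bed."""
    states: List[str] = []
    for _, position in samples:
        state = "lying" if position in LYING else position
        if not states or states[-1] != state:
            states.append(state)
    return states[-3:] == ["lying", "sitting", "off-bed"]


long_rest = [(0.0, "lying center"), (3600.0, "lying center"), (7200.0, "lying center")]
leaving = [(0.0, "lying left"), (10.0, "sitting"), (20.0, "off-bed")]
```

With these samples, `needs_reposition(long_rest)` and `bed_exit(leaving)` both hold, while a change between lying positions resets the abiding time.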
We herein introduce the matrix node graph data structure and its application for credibility assessment with temporal transition of intention classification. Information that is disseminated on the World Wide Web (WWW) has two meanings. Those are the apparent meaning and the implicit intention. Implicit intention is defined as the purpose of information dissemination. However, the intention of disseminated information cannot be recognized by only examining text. Especially in the case of Fake News, intention is artfully hidden, and the intention will change in the process of the information spreading. The recognition of the intention of information and following its temporal transition are effective to assess the credibility of the information. The matrix node graph structure is a graph that has a matrix as a node. In this data structure, we propose a method to recognize and classify the intention of information by use of an intention matrix while following the temporal transition of intention using a graph structure. By using this structure, the intention and temporal transition of particular information can be derived. This paper shows the feasibility and usefulness of the structure through experiments. An application for credibility assessment is also introduced.
The article considers an ontology as a set of relations (unary and binary) which are represented by specialized matrix-like structures called C-systems. That allows us to consider inference tasks on ontologies as constraint satisfaction problems. A method has been developed for a priori analysis and transformation of SPARQL query patterns into a form which speeds up the subsequent execution of concrete user queries. The method is oriented towards ontologies developed using content ontology design patterns, which ensures the predictability of the structure of potential queries. The method is based on combining methods of structural decomposition with original methods for non-numerical constraint satisfaction.
Models are one of the main vehicles in everyday and scientific communication, understanding, learning, explanation, exploration, comprehension, representation, starting points for investigation, pattern detection and exploration, system development, problem solution, hypothetical reasoning, and theory development. Models are mediators, explainers, shortcuts, etc. Models are used as instruments in these scenarios. Their functions vary, and thus so do their properties.
This paper investigates the functions of models in dependence on their scenarios. We concentrate the investigation on engineering and construction scenarios, which are the main uses of models in Computer Science and Computer Engineering. The problem solving scenarios, the science scenarios, and the social scenarios are considered as well in a brief form.
The significant computation in global environmental analysis is “context-oriented semantic computing” to interpret the meanings of natural phenomena occurring in nature. Our semantic computing method realizes the semantic interpretation of natural phenomena and analyzes the changes of various environmental situations. It is important to realize a global environmental computing methodology for analyzing the difference and diversity of nature and living things in a context-dependent way with a large amount of information resources in global environments. Semantic computations contribute to making “appropriate and urgent solutions” to the changes of environmental situations.
It is also significant to memorize those situations and compute environmental changes in various aspects and contexts, in order to discover what is happening in the nature of our planet. There are various (almost infinite) aspects and contexts in environmental changes, and it is essential to realize a new analyzer for computing the meanings of those situations and making solutions for discovering actual aspects and contexts.
We propose a new method for semantic computing in our Multi-dimensional World map. We utilize a multi-dimensional computing model, the Mathematical Model of Meaning (MMM) [1–3], and a multi-dimensional space with an adaptive axis adjustment mechanism. In semantic computing for environmental changes in multi-aspects and contexts, we present important functional pillars for analyzing natural environment situations. We also present a method to analyze and visualize the highlighted pillars using our Multi-dimensional World Map (5-Dimensional World Map) System.
We introduce the concept of “SPA (Sensing, Processing and Analytical Actuation Functions)” for realizing a global environmental system, to apply it to Multi-dimensional World Map System. This concept is essential to design environmental systems with Physical-Cyber integration to detect environmental phenomena in a physical-space (real space), map them to cyber-space to make analytical and semantic computing, and actuate the analytically computed results to the real space with visualization for expressing environmental phenomena, causalities and influences. This system currently realizes the integration and semantic-analysis for KEIO-MDBL-UN-ESCAP Joint system for global ocean-water analysis with image databases. We have implemented an actual space integration system for accessing environmental information resources and image analysis.
This paper presents a 5D World Map System’s application for disaster-resilience monitoring as “Environmental AI System” of each player’s implementation of United Nation’s SDG 9 and DGS 11 from global-level to regional-level, country-level, sub-regional-level and city-level. In Asia-Pacific, disaster risk is outpacing disaster resilience. The gap between risk and resilience-building is growing in those countries with the least capacity to prepare for and respond to disasters. Using the Sensing-Processing-Actuation (SPA) functions of 5D World Map System, a disaster risk analysis can be conducted in multiple contexts, including regional, national, and sub-national. At the regional, national and subnational levels, the analysis will focus on identifying disaster risk hotspots through incorporating existing multi-hazard disaster risk and socio-economic risk information. The system will further be used to assess future risks through integration of global climate scenarios downscaled to the region as well as countries. This paper presents the design of two new actuation functions of 5D World Map System: (1) Short-term warning with prediction and push alert and (2) Long-term warning with context-dependent multidimensional visualization, and examines the applicability of these functions by indicating that (1) will support both resident and those who are working at the operational level by being customized to disaster risk analysis for each target region/country/area, and (2) assist both policy-makers and sectoral ministries in target countries to use the analysis for evidence-based policy formulation, planning and investment towards building disaster-resilient society.
Artificial intelligence systems require logic calculation to give true or false judgments. However, artificial intelligence systems cannot be implemented simply by basic Boolean logic calculation. Deep artificial neural networks, implemented by multiple matrix calculations, are one of the efficient methods for constructing artificial intelligence systems. We have previously presented semantic computing models in which input data are mapped into a semantic space and represented as points in that space. From the point of view of our semantic computing model, multiple matrix calculation, as in artificial neural networks, is a data mapping operation: input data are mapped into a semantic space by the multiple matrix calculation. In our method, the true or false logic judgment is transformed into the calculation of Euclidean distances between those points in the semantic space. In order to apply the semantic computation model to the development of artificial intelligence systems, it is important to understand the relationship between logic calculation, the semantic space and the deep-learning mechanism. In this paper, we present logic calculation implemented by multiple matrix calculation, which is the basic calculation method for implementing artificial intelligence systems. The most important contribution of this paper is that it presents, for the first time, the mechanism for implementing logic calculation with the semantic space model and machine learning. We use three example cases to illustrate the mechanism. The first implements combinational logic calculations based on linear space mapping. In the second, the semantic space is constructed based on principal component analysis. The third concerns sequential logic operations. The concepts of semantic space, subspace selection and the learning mechanism used in the example cases are also illustrated.
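The idea of replacing a true/false judgment with a Euclidean distance calculation in a semantic space can be sketched as follows. This is a minimal illustration, not the authors' model: the one-dimensional mapping matrix `MAPPING`, the reference points in `AND_GATE` and `XOR_GATE`, and the nearest-reference rule in `judge` are all assumptions chosen to keep the example small.

```python
import math


def map_to_space(matrix, vector):
    """Linear mapping: multiply the input vector by the mapping matrix."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def judge(point, true_points, false_points):
    """Judge True when the nearest 'true' reference is closer than the nearest 'false' one."""
    d_true = min(math.dist(point, p) for p in true_points)
    d_false = min(math.dist(point, p) for p in false_points)
    return d_true < d_false


# Assumed mapping: both inputs are summed into a one-dimensional space.
MAPPING = [[1.0, 1.0]]

# Assumed reference points in that space for two gates.
AND_GATE = {"true": [[2.0]], "false": [[0.0], [1.0]]}
XOR_GATE = {"true": [[1.0]], "false": [[0.0], [2.0]]}


def evaluate(gate, a, b):
    """Map the two binary inputs into the space, then judge by distance."""
    point = map_to_space(MAPPING, [float(a), float(b)])
    return judge(point, gate["true"], gate["false"])


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, evaluate(AND_GATE, a, b), evaluate(XOR_GATE, a, b))
```

In this sketch the same linear mapping serves both gates, and only the reference points differ. XOR, which a single linear threshold cannot express, is handled here because the judgment compares distances to several reference points.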
In this paper we deal with machine learning methods and algorithms applied to geographic data. First, we briefly introduce supervised learning, which is the approach applied in our case. Then we describe the algorithm ‘Framework’ together with the heuristic methods used in it. Definitions of particular geographic objects, i.e. their concepts, are formulated as TIL constructions in our background theory, Transparent Intensional Logic (TIL). These concepts serve as general hypotheses. The basic principles of supervised machine learning are generalization and specialization: given a positive example, the learner generalizes, while after a near-miss example it specializes. The heuristic methods govern how generalization and specialization are applied.
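The interplay of generalization and specialization can be sketched with a toy learner over sets of properties. This is only an assumed simplification: hypotheses here are plain Python sets rather than TIL constructions, and the class `ConceptLearner`, its update rules and the property names are hypothetical, not the ‘Framework’ algorithm itself.

```python
class ConceptLearner:
    """Toy hypothesis: properties an object must have and must not have."""

    def __init__(self, first_positive):
        self.required = set(first_positive)       # start maximally specific
        self.seen_positive = set(first_positive)  # properties seen in positives
        self.forbidden = set()

    def covers(self, example):
        """True if the example satisfies the current hypothesis."""
        ex = set(example)
        return self.required <= ex and not (self.forbidden & ex)

    def generalize(self, positive):
        """Positive example: keep only the requirements it shares."""
        ex = set(positive)
        self.required &= ex
        self.seen_positive |= ex
        self.forbidden -= ex  # a positive example may not be excluded

    def specialize(self, near_miss):
        """Near-miss: if still covered, forbid properties never seen in a positive."""
        ex = set(near_miss)
        if self.covers(ex):
            self.forbidden |= ex - self.seen_positive


# Hypothetical geographic examples for the concept "lake".
learner = ConceptLearner({"water", "standing", "inland", "natural"})
learner.generalize({"water", "standing", "inland", "artificial"})
learner.specialize({"water", "standing", "inland", "natural", "vegetation_covered"})
```

After these three steps the hypothesis requires `water`, `standing` and `inland`, and forbids `vegetation_covered`. The order in which examples arrive and the choice of which property to forbid are exactly the kind of decisions a heuristic has to make.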
In an era of extremely fast technological development, we are now living in the age of Industry 4.0, the age of realizing Cyber-Physical Systems (CPS). The virtual space realized by the digital-space concept will completely merge with our physical dimension in the very near future. Smart ecosystems could make our lives more convenient. However, this technology could also be a severe weapon, able to damage our lives, our assets, organizational security and national sovereignty, and could even contribute to the extinction of humankind. Recognizing this concern, we propose one solution for securing our lives in the coming smart world: the Holistic Framework of Using Machine Learning for an Effective Incoming Cyber Threats Detection. We present an effective holistic framework that is easy to understand, easy to follow and easy to implement as a system for protecting our digital space in an initial state. The approach describes all steps with their significant modules (the I-D-A-R: Idea-Dataset-Algorithm-Result framework with the B-L-P-A: Brain-Learning-Planning-Action concept) and explains all major issues of concern for developers. As a result of the I-D-A-R framework, we provide an important key success factor for each state. Finally, a comparison of detection accuracy among Multinomial Naïve Bayes, Support Vector Machine (SVM) and Deep Learning algorithms, together with a comparison of the feature engineering techniques Principal Component Analysis (PCA) and Standard Deviation, shows that computation time can be reduced by using an algorithm that matches the characteristics of each dataset, while all prediction results remain promising.
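One plausible reading of “Standard Deviation” as a feature engineering technique is to keep only the features whose values vary most across the dataset, which shrinks the input and thus the computation time of the later learning step. The sketch below follows that reading. It is an assumption, not the procedure reported in the paper, and the function name `select_by_std` and the sample data are hypothetical.

```python
from statistics import pstdev


def select_by_std(rows, k):
    """Keep the k feature columns with the largest population standard deviation.

    Returns the kept column indices (in original order) and the reduced rows.
    """
    columns = list(zip(*rows))                    # transpose rows into columns
    spreads = [pstdev(column) for column in columns]
    ranked = sorted(range(len(columns)), key=lambda i: spreads[i], reverse=True)
    keep = sorted(ranked[:k])                     # restore original column order
    reduced = [[row[i] for i in keep] for row in rows]
    return keep, reduced


# Hypothetical dataset: column 0 is constant, column 1 varies widely,
# column 2 varies slightly.
SAMPLE = [
    [1.0, 10.0, 0.1],
    [1.0, 50.0, 0.2],
    [1.0, 90.0, 0.1],
    [1.0, 30.0, 0.3],
]

kept, reduced = select_by_std(SAMPLE, k=2)
```

Here the constant column is dropped because it has zero spread. Unlike PCA, this selection keeps original features rather than building new combined ones, so the reduced columns stay directly interpretable.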