The field of data mining is currently experiencing a very dynamic period. It has reached a level of maturity that has enabled it to be incorporated into the IT systems and business processes of companies across a wide range of industries. Information technology and e-commerce companies such as Amazon, Google, Yahoo, Microsoft, IBM, HP and Accenture are naturally at the forefront of these developments. In addition, data mining technologies are becoming well established in other industries and government sectors, such as health, retail, automotive, finance, telecom and insurance, in large corporations such as Siemens, Daimler, Walmart, Washington Mutual, Progressive Insurance and Portugal Telecom, as well as in governments across the world.
As data mining becomes a mainstream technology in businesses, data mining research has been experiencing explosive growth. In addition to well-established application areas such as targeted marketing, customer churn, and market basket analysis, we are witnessing a wide range of new application areas, such as social media, social networks, and sensor networks. More traditional industries and business processes, such as healthcare, manufacturing, customer relationship management and marketing, are also applying data mining technologies in new and interesting ways. These areas pose new challenges both in terms of the nature of the data available (e.g., complex and dynamic data structures) and in terms of the underlying supporting technology (e.g., low-resource devices). These challenges can sometimes be tackled by adapting existing algorithms, but at other times they require new classes of techniques. This can be observed in the topics covered by the existing major data mining conferences and journals, as well as in the introduction of new ones.
A major reason behind the success of the data mining field has been the healthy relationship between the research and the business worlds. This relationship is strong in many companies where researchers and domain experts collaborate to solve practical business problems. Many of the companies that integrate data mining into their products and business processes also employ some of the best researchers and practitioners in the field. Some of the most successful recent data mining companies have also been started by distinguished researchers. Even researchers in universities are getting more connected with businesses and are getting exposed to business problems and real data. Often, new breakthroughs in data mining research have been motivated by the needs and constraints of practical business problems. This can be observed at data mining scientific conferences, where companies are participating very actively and there is a lot of interaction between academia and industry.
As part of our (small) contribution to strengthening the collaboration between companies and universities in data mining, we have been helping to organize a series of workshops on Data Mining for Business Applications, held together with major conferences in the field:
• “Data Mining for Business” workshop, with ECML/PKDD, organized by Carlos Soares, Luís Moniz (SAS Portugal) and Catarina Duarte (SAS Portugal), which was held in Porto, Portugal, in 2005 (http://www.liaad.up.pt/dmbiz/).
• “Data Mining for Business Applications” workshop, with KDD, organized by Rayid Ghani and Carlos Soares, in Philadelphia, USA, in 2006 (http://labs.accenture.com/kdd2006%5Fworkshop/).
• “Practical Data Mining: Applications Experiences and Challenges” workshop, with ECML/PKDD, organized by Markus Ackermann (Univ. of Leipzig), Carlos Soares and Bettina Guidemann (SAS Deutschland), which took place in Berlin, Germany, in 2006 (http://wortschatz.uni-leipzig.de/ macker/dmbiz06/).
• “Data Mining for Business Applications” workshop, with KDD, organized by Rayid Ghani, Carlos Soares, Françoise Soulié-Fogelman (KXEN), Katharina Probst (Accenture Technology Labs) and Patrick Gallinari (Univ. of Paris), which was held in Las Vegas, USA, in 2008 (http://labs.accenture.com/kdd2008%5Fworkshop/).
This book contains extended versions of a selection of papers from these workshops. The chapters of this book cover the entire spectrum of issues in the development of data mining systems, with special attention to methodological issues. Although data mining has reached a reasonable level of maturity, and a large number and variety of algorithms, tools and knowledge are available to develop good models and integrate them into business processes, there is still room for research in new data mining methods. Many methodological issues remain open, affecting several phases of data mining projects, from business and data understanding to evaluation and deployment. As data mining is applied to new business problems, new research challenges are encountered, opening up large unexplored areas of research. The chapters in Part 1 discuss some of the most important of those issues. The authors offer diverse perspectives on them, owing to their different backgrounds and experience, which include the automotive industry, the data mining industry and the research community.
The book also covers a wide range of business domains, illustrating both classical applications and emerging ones. The chapters in Part 2 describe typical problems for which data mining has proved to be an invaluable tool, such as churn and fraud detection, and customer relationship management (CRM). They also cover some of the more important industries, namely banking, government, energy and healthcare. The issues addressed in these chapters include important aspects such as how to incorporate domain-specific knowledge in the development of data mining systems and how to integrate data mining technology into larger systems that aim to support core business processes. The applications in this book clearly show that data mining projects must not be regarded as independent efforts. They need to be integrated into larger systems to align with the goals of the organization and those of its customers and partners. Additionally, the output of data mining components must, in most cases, be integrated into the IT systems of the business and, therefore, into its (decision-making) processes, sometimes as part of decision-support systems (DSS).
The chapters in Part 3 are devoted to emerging applications of data mining. These chapters discuss the application of novel methods that deal with complex data like social networks and spatial data, to explore new opportunities in domains such as criminology and marketing intelligence. These chapters illustrate some of the exciting developments going on in the field and identify some of the most challenging opportunities. They stress the need for researchers to keep up with emerging business problems, identify potential applications and develop suitable solutions. They also show that companies must not only pay attention to the latest developments in research but also continuously challenge the research community with new problems. We believe that the flow of new and interesting applications will continue for many years and drive the research community to come up with exciting and useful data mining methods.
This book presents a collection of contributions that illustrates the importance of maintaining close contact between data mining researchers and practitioners. For researchers, it is essential to be exposed to and motivated by real problems, and to understand that business problems provide not only interesting challenges but also practical constraints, which must be taken into account for their work to have high practical impact. For practitioners, it is important not only to be aware of the latest technology developments in data mining, but also to interact continuously with the research community, both to identify new opportunities to apply existing technologies and to provide the motivation to develop new ones.
We believe that this book will be interesting not only for data mining researchers and practitioners who are looking for new research and business opportunities in data mining, but also for students who wish to gain a better understanding of the practical issues involved in building data mining systems and to find further research directions. We hope that our readers will find this book useful.
Porto, Chicago – July 2010,
Carlos Soares and Rayid Ghani