Data Mining: Concepts and Techniques

Jiawei Han and Micheline Kamber
Simon Fraser University

Note: This manuscript is based on a forthcoming book by Jiawei Han and Micheline Kamber, © 2000 Morgan Kaufmann Publishers. All rights reserved.


Our capabilities of both generating and collecting data have been increasing rapidly in the last several decades. Contributing factors include the widespread use of bar codes for most commercial products, the computerization of many business, scientific, and government transactions and management tasks, and advances in data collection tools ranging from scanned text and image platforms, to on-line instrumentation in manufacturing and shopping, to satellite remote sensing systems. In addition, the popular use of the World Wide Web as a global information system has flooded us with a tremendous amount of data and information. This explosive growth in stored data has generated an urgent need for new techniques and automated tools that can intelligently assist us in transforming the vast amounts of data into useful information and knowledge.

This book explores the concepts and techniques of data mining, a promising and flourishing frontier in database systems and new database applications. Data mining, also popularly referred to as knowledge discovery in databases (KDD), is the automated or convenient extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, and other massive information repositories. Data mining is a multidisciplinary field, drawing on work from areas including database technology, artificial intelligence, machine learning, neural networks, statistics, pattern recognition, knowledge-based systems, knowledge acquisition, information retrieval, high-performance computing, and data visualization.

We present the material in this book from a database perspective. That is, we focus on issues relating to the feasibility, usefulness, efficiency, and scalability of techniques for the discovery of patterns hidden in large databases.
As a result, this book is not intended as an introduction to database systems, machine learning, statistics, and so on, although we do provide the background necessary in these areas to help the reader understand their respective roles in data mining. Rather, the book is a comprehensive introduction to data mining, presented with database issues in focus. It should be useful for computing science students, application developers, and business professionals, as well as researchers involved in any of the disciplines listed above.

Data mining emerged during the late 1980s, made great strides during the 1990s, and is expected to continue to flourish into the new millennium. This book presents an overall picture of the field from a database researcher's point of view, introducing interesting data mining techniques and systems, and discussing applications and research directions.

An important motivation for writing this book was the need to build an organized framework for the study of data mining: a challenging task, owing to the extensive multidisciplinary nature of this fast-developing field. We hope that this book will encourage people with different backgrounds and experiences to exchange their views regarding data mining, so as to contribute to the further promotion and shaping of this exciting and dynamic field.

To the teacher
This book is designed to give a broad yet in-depth overview of the field of data mining. You will find it useful for teaching a course on data mining at an advanced undergraduate level or the first-year graduate level. In addition, individual chapters may be included as material for courses on selected topics in database systems or in artificial intelligence. We have tried to make the chapters as self-contained as possible. For a course taught at the undergraduate level, you might use Chapters 1 to 8 as the core course material. Remaining class material may be...