Data Mining

Pages: 12 (2871 words) Published: April 21, 2012
Introduction to the Mining of Clinical Data
James H. Harrison, Jr, MD, PhD
Division of Clinical Informatics, Departments of Public Health Sciences and Pathology, University of Virginia, Suite 3181 West Complex, 1335 Hospital Drive, Charlottesville, VA 22908-0717, USA

E-mail address: james.harrison@virginia.edu
0272-2712/08/$ - see front matter © 2008 Elsevier Inc. All rights reserved. doi:10.1016/j.cll.2007.10.001 labmed.theclinics.com

The progressive increase in the amount of clinical data stored in electronic form is, for the first time in history, making it possible to carry out large-scale studies that focus on the interaction between genotype, phenotype, and disease at a population level. Such studies have extraordinary potential to determine the effectiveness of treatment and monitoring strategies, identify subpopulations at risk for disease, define the real variability in the natural history of disease and comorbidities, discover rational bases for targeting therapies to particular patients, and determine the incidence and contexts of unwanted health care outcomes. Matching patient responses (phenotype) with gene expression and known metabolic pathway relationships across large numbers of individuals may be the best hope for understanding the complex interplay between multiple genes and the environment that underlies some of the most common and debilitating health problems [1]. Although serious issues remain to be resolved before the large-scale secondary use of health data for research can become routine, this topic has been recognized and identified as a national priority in Canada [2] and the United States [3].

Clinical laboratory databases contain perhaps the largest available collection of structured medical data representing human phenotypes of disease progression and response to therapy. Alone and especially in combination with other clinical and environmental data, laboratory databases have substantial value for translational research, including correlative studies linking gene expression with phenotype, and for identifying groups of patients with similar characteristics for follow-up analysis or inclusion in clinical studies. Large-scale clinical databases permit targeted observational and correlative studies that complement randomized clinical trials [4,5]. These databases also hold the promise of more comprehensive analyses to reveal unknown, useful real-world relationships among clinical data.

The data volume and comprehensiveness that make these data sets useful, however, also make them difficult or impossible to analyze by manual or traditional statistical methods. Analogous challenges have occurred previously in other domains, including the need to identify purchasing associations among billions of retail transactions [6], the need to identify similarities in patterns among terabytes of geologic data for oil exploration [7], and the need to identify patterns in planetary mapping data [8], among many other examples. These needs have been addressed using a set of techniques from the machine learning and pattern recognition fields collectively referred to as "data mining." In recent years, biomedical science has also begun to apply these techniques to large-scale data analysis, as evidenced by the dramatic increase in biomedical publications referring to data mining over the past 10 years (Fig. 1).

Brief overview of data mining

Data mining has been described as the "extraction of implicit, previously unknown and potentially useful information" [9], such as associations and correlations between data elements, from large repositories of data. It is the technical and statistical component of the process of "knowledge discovery in databases" (KDD, Refs. [10,11]), which has a primary goal of identifying useful new information and is sometimes used synonymously with KDD. Although the data mining label is sometimes also applied to techniques designed to determine whether and to what...