Definitions of data mining analysis

  • Published: July 17, 2010
1. Discriminant analysis

A procedure for determining the group to which an individual belongs, based on the characteristics of that individual. Suppose we have measurements on p characteristics for each of a sample of individuals. We know that each individual belongs to one of g groups, but we do not know which. Discriminant analysis attempts to maximize the probability of correct allocation. It differs from cluster analysis in that we have an initial data set, the training set, whose group allocations are known.

For the case g=2, suppose that the p×1 vectors of sample means are x̄1 and x̄2. Let n1 and n2 be the numbers of members of the training set falling in the two groups and let S1 and S2 be the variance–covariance matrices for the two parts of the training set. If we define the matrix S by

$$S = \frac{(n_1 - 1)S_1 + (n_2 - 1)S_2}{n_1 + n_2 - 2},$$

a future individual with measurement vector x will be assigned to group 1 if and only if

$$a'\left\{x - \tfrac{1}{2}(\bar{x}_1 + \bar{x}_2)\right\} > 0,$$

where a′ is the transpose of the vector a given by

$$a = S^{-1}(\bar{x}_1 - \bar{x}_2).$$

The function a′x is called Fisher's linear discriminant function. The terms 'discriminant analysis' and 'discriminant function' were coined by Sir Ronald Fisher in his articles in 1936 and 1938 that introduced the procedure.
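The two-group rule can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation. It assumes the usual form of the rule: the pooled matrix S = ((n1 − 1)S1 + (n2 − 1)S2)/(n1 + n2 − 2), the vector a = S⁻¹(x̄1 − x̄2), and assignment to group 1 when a′x exceeds a′(x̄1 + x̄2)/2. The function names and the training data are made up for the example.

```python
def mean_vector(rows):
    """Column-wise sample mean of a list of measurement vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample variance-covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fisher_rule(group1, group2):
    """Return the vector a and a function assigning a new x to group 1 or 2."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    S1, S2 = covariance_matrix(group1), covariance_matrix(group2)
    p = len(m1)
    # Pooled variance-covariance matrix S.
    S = [[((n1 - 1) * S1[i][j] + (n2 - 1) * S2[i][j]) / (n1 + n2 - 2)
          for j in range(p)] for i in range(p)]
    # a = S^{-1}(m1 - m2), obtained by solving S a = m1 - m2.
    a = solve(S, [u - v for u, v in zip(m1, m2)])
    # Threshold: a' applied to the midpoint of the two sample means.
    threshold = sum(ai * (u + v) / 2 for ai, u, v in zip(a, m1, m2))

    def assign(x):
        score = sum(ai * xi for ai, xi in zip(a, x))
        return 1 if score > threshold else 2

    return a, assign

# Made-up training set with p = 2 characteristics per individual.
group1 = [[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 1.0]]
group2 = [[6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [7.0, 6.0]]

a, assign = fisher_rule(group1, group2)
print(assign([2.0, 2.0]))  # prints 1
print(assign([7.0, 7.0]))  # prints 2
```

Solving S a = x̄1 − x̄2 directly avoids forming the inverse of S explicitly. The midpoint threshold treats the two groups symmetrically, so it implicitly assumes equal prior probabilities and equal misclassification costs.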

2. Logistic regression

In statistics, logistic regression (sometimes called the logistic model or logit model) is used to predict the probability of occurrence of an event by fitting data to a logistic curve.

* Example
The application of a logistic regression may be illustrated using a fictitious example of death from heart disease. This simplified model uses only three risk factors (age, sex, and blood cholesterol level) to predict the 10-year risk of death from heart disease. This is the model that we fit:
β0 = −5.0 (the intercept)
β1 = +2.0
β2 = −1.0
β3 = +1.2
x1 = age in years, less 50
x2 = sex, where 0 is male and 1 is female
x3 = cholesterol level, in mmol/L above 5.0
This means the model is

$$\text{risk of death} = \frac{1}{1 + e^{-z}}, \qquad z = -5.0 + 2.0\,x_1 - 1.0\,x_2 + 1.2\,x_3.$$

In this model, increasing...
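
The fictitious model can be evaluated numerically with a short Python sketch. It assumes the standard logistic form, risk = 1/(1 + e^(−z)) with z = β0 + β1·x1 + β2·x2 + β3·x3, and uses the coefficients listed in the example. The function name `risk_of_death` and the sample individual (a 50-year-old man with cholesterol of 7.0 mmol/L) are hypothetical choices made for illustration.

```python
import math

# Coefficients of the fictitious heart-disease model: (β0, β1, β2, β3).
BETA = (-5.0, 2.0, -1.0, 1.2)

def risk_of_death(age_years, is_female, cholesterol_mmol_l):
    """Predicted 10-year risk of death under the fictitious model."""
    x1 = age_years - 50            # age in years, less 50
    x2 = 1 if is_female else 0     # 0 is male, 1 is female
    x3 = cholesterol_mmol_l - 5.0  # cholesterol in mmol/L above 5.0
    z = BETA[0] + BETA[1] * x1 + BETA[2] * x2 + BETA[3] * x3
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical individual: z = -5.0 + 0 + 0 + 1.2 * 2.0 = -2.6
print(round(risk_of_death(50, False, 7.0), 4))  # prints 0.0691
```

For this individual the linear predictor is z = −2.6, which the logistic curve maps to a risk of about 7%.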
