Statistical Methods in Consumer Credit Scoring
Introduction

Credit scoring is the term used to describe formal statistical (data analytical!) methods used for classifying applicants for credit into ‘good’ and ‘bad’ risk classes. Such methods have become increasingly important with the dramatic growth in consumer credit in recent years.

New Types of Data

A traditional view of statistics might be that it deals with relatively small data sets, which are numerical, which are clean, and which permit straightforward answers. This view is understandable if one’s perspective is gained from texts on the subject. By necessity, as a result of the limited space available in which to develop the techniques, these are characteristics of the data sets typically used to illustrate methods, but this is misleading. Modern statistics and, more generally, modern data analysis must contend with data which depart from this ideal in many ways (Hand, 1998).

Modern electronic data capture methods mean that vast databases are being accrued. These may contain tens of millions or even billions of records. Examples are the transaction databases of modern industrial and commercial conglomerates, which may be on an international scale. Electronic point-of-sale data capture means that even the smallest sale is recorded and is available for analysis. To take advantage of this rich source of data, entirely novel techniques are required. They are slowly appearing under the name of data mining.
WHAT IS CREDIT SCORING?

Credit scoring is a method of evaluating the credit risk of loan applications. Using historical data and statistical techniques, credit scoring tries to isolate the effects of various applicant characteristics on delinquencies and defaults. The method produces a “score” that a bank can use to rank its loan applicants or borrowers in terms of risk.

To build a scoring model, or “scorecard,” developers analyze historical data on the performance of previously made loans to determine which borrower characteristics are useful in predicting whether the loan performed well. A well-designed model should give a higher percentage of high scores to borrowers whose loans will perform well and a higher percentage of low scores to borrowers whose loans won’t perform well. But no model is perfect, and some bad accounts will receive higher scores than some good accounts.

Information on borrowers is obtained from their loan applications. Data such as the applicant’s monthly income, outstanding debt, financial assets, how long the applicant has been in the same job, whether the applicant has defaulted or was ever delinquent on a previous loan, whether the applicant owns or rents a home, and the type of bank account the applicant has are all potential factors that may relate to loan performance and may end up being used in the scorecard. Regression analysis relating loan performance to these variables is used to pick out which combination of factors best predicts delinquency or default, and how much weight should be given to each of the factors. (See Scoring Methods for an overview of the statistical methods being used.) Given the correlations between the factors, it is quite possible some of the factors the model developer
begins with won’t make it into the final model, since they add little value given the other variables in the model. Indeed, according to Fair, Isaac and Company, Inc., a leading developer of scoring models, 50 or 60 variables might be considered when developing a typical model, but eight to 12 might end up in the final scorecard as yielding the most predictive combination (Fair, Isaac).

In most (but not all) scoring systems, a higher score indicates lower risk, and a lender sets a cutoff score based on the amount of risk it is willing to accept. Strictly adhering to the model, the lender would approve applicants with scores above the cutoff and deny applicants...
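
The scorecard-and-cutoff idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular developer. The text says "regression analysis" without naming a model, so logistic regression fitted by gradient descent is assumed here. The twelve loans are made-up numbers. The three features (debt-to-income ratio, job tenure, prior delinquency) are picked from the factors listed above. The 0 to 100 score scale, the cutoff of 50, and the treatment of a score exactly at the cutoff as a denial are all hypothetical choices.

```python
import math

# Hypothetical historical loans (made-up numbers, for illustration only).
# Features: [debt-to-income ratio, years in current job / 10, prior delinquency (0 or 1)]
LOANS = [
    [0.10, 0.8, 0], [0.20, 0.5, 0], [0.15, 1.0, 0],
    [0.30, 0.6, 0], [0.25, 0.3, 0], [0.35, 0.9, 1],
    [0.60, 0.1, 1], [0.70, 0.2, 1], [0.55, 0.1, 0],
    [0.80, 0.3, 1], [0.65, 0.0, 0], [0.50, 0.2, 1],
]
# 1 = the loan performed well, 0 = delinquency or default.
PERFORMED_WELL = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def p_good(w, x):
    """Estimated probability that a loan performs well; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=3000):
    """Fit logistic-regression weights by batch gradient descent on the mean log-loss."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        errors = [p_good(w, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(errors) / n]
        for j in range(k):
            grad.append(sum(e * xi[j] for e, xi in zip(errors, X)) / n)
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def score(w, x):
    """Map the estimated probability to a 0-100 score; higher means lower risk."""
    return round(100 * p_good(w, x))


def decide(w, x, cutoff):
    """Approve only scores above the cutoff (denying ties is an assumption)."""
    return "approve" if score(w, x) > cutoff else "deny"


weights = fit_logistic(LOANS, PERFORMED_WELL)
for applicant in ([0.10, 0.8, 0], [0.80, 0.3, 1]):
    print(applicant, score(weights, applicant), decide(weights, applicant, cutoff=50))
```

Raising the cutoff in `decide` approves fewer applicants and accepts less risk, and lowering it does the opposite, which is the trade-off the lender makes when choosing a cutoff score.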