3.3 Categorical Distributions
In Chapter I we discussed the difference between attributes, or qualitative data, and variables, or quantitative data. When some observable characteristic of an elementary unit is not inherently numerical, we call that characteristic an attribute. Such a characteristic can only be observed as to its presence or absence. However, numbers can be associated with an attribute by counting the elementary units that possess or lack the specific attribute. When the classes in a frequency distribution are specified in terms of attributes rather than numerical values, the distribution is called a categorical frequency distribution or simply a categorical distribution. The number of observations that fall into each category is the class frequency for that category.
Categorical distributions are very common. Classification of registered voters by political party results in a categorical distribution. The classification of licensed drivers by sex, or the classification of accounts receivable as either current or delinquent, also results in categorical distributions. Suppose, as another illustration, that the Advance Machine Company is bidding for a government contract. In order to qualify, the company is required to submit data on the racial distribution of its employees.
The personnel manager of the company conducts a survey of the 200 employees of the company, which results in the categorical distribution shown in Table 3.12.
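The counting that produces a categorical distribution can be sketched in a few lines of Python. This is only an illustration: the list `accounts` is hypothetical data invented for the example (it borrows the current/delinquent classification mentioned above), and the function name `categorical_distribution` is an assumption, not something taken from the text.

```python
from collections import Counter

# Hypothetical observations (illustrative only): the attribute
# recorded for each account receivable.
accounts = ["current", "delinquent", "current", "current", "delinquent", "current"]


def categorical_distribution(observations):
    """Return {category: class frequency} by counting the units in each category."""
    return dict(Counter(observations))


frequencies = categorical_distribution(accounts)

# Print the distribution as a small two-column table.
for category, count in sorted(frequencies.items()):
    print(f"{category:<12}{count}")
```

Because every observation is counted exactly once, the class frequencies always add up to the number of elementary units surveyed.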
Note that in constructing a categorical distribution there are no real problems regarding class limits, class boundaries, or class marks. These characteristics of a distribution are essentially quantitative in nature and simply do not apply to categorical distributions. Care must be taken, however, to be sure that the categories are all inclusive and mutually exclusive. As with distributions of quantitative observations, “all inclusive” means that every observation must fit into some category, and “mutually exclusive” means that the categories must not overlap. Categories such as “black” and “nonwhite” are not mutually exclusive and would create a problem in classifying the data.
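One way to test a proposed set of categories against these two requirements is sketched below. It assumes that each category can be written as the set of raw responses it covers. The voter responses and both category schemes are hypothetical, and `check_categories` is a name chosen for this example.

```python
def check_categories(observations, categories):
    """Check a category scheme against a list of raw responses.

    `categories` maps each category name to the set of raw responses it covers.
    Returns (unclassified, overlapping):
      unclassified: responses that fit no category (scheme is not all inclusive)
      overlapping:  responses that fit more than one category
                    (scheme is not mutually exclusive)
    """
    unclassified, overlapping = set(), set()
    for response in observations:
        matches = [name for name, members in categories.items() if response in members]
        if not matches:
            unclassified.add(response)
        elif len(matches) > 1:
            overlapping.add(response)
    return unclassified, overlapping


# Hypothetical raw responses from a voter registration list.
responses = ["Democrat", "Republican", "Independent", "Democrat"]

# A flawed scheme: "Democrat" falls in two categories, "Independent" in none.
flawed = {"Democrat": {"Democrat"}, "Major party": {"Democrat", "Republican"}}

# A sound scheme: every response fits exactly one category.
sound = {
    "Democrat": {"Democrat"},
    "Republican": {"Republican"},
    "Other": {"Independent"},
}

print(check_categories(responses, flawed))
print(check_categories(responses, sound))
```

A scheme is acceptable for the data at hand only when both returned sets are empty.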
The number of classes or categories in a categorical distribution cannot be governed by general rules. It is foolish to require at least five classes in the distribution when five pertinent categories of the attribute simply do not exist. The number of classes used must be related to the type of information required and the purpose for which the data will be used. It is possible for the Advance Machine Company to classify its employees into only two categories, “white” and “nonwhite”, as shown in Table 3.13. Whether this classification satisfies the government requirements for information is questionable, however, and those requirements are the determining factor.
Though not every picture is worth a thousand words, most people agree that a picture is generally easier to comprehend than a set of numbers in a table, and a picture in conjunction with a table is even more useful. The most common and useful picture or graph of a frequency distribution is the HISTOGRAM. A histogram is a graph in which the horizontal scale represents the values of the variable and the vertical scale represents the frequencies or relative frequencies. Each class in the distribution is represented by a vertical rectangle or bar whose base corresponds to the class interval and whose height is proportional to the frequency or relative frequency of the class.
The vertical edges of each bar correspond to class boundaries, and consequently there are no gaps between the bars. The numerical values on the horizontal scale can be placed to correspond with either the class boundaries or the class marks.
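The geometry of the bars can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions: the class boundaries and frequencies are hypothetical values with equal class intervals, not the data of any table in this chapter, and `histogram_bars` is a name introduced here for the example.

```python
def histogram_bars(boundaries, frequencies, relative=False):
    """Describe each bar as (left edge, right edge, class mark, height).

    Consecutive class boundaries form the vertical edges of a bar, so
    adjacent bars share an edge and no gaps appear between them.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than classes")
    total = sum(frequencies)
    bars = []
    for i, freq in enumerate(frequencies):
        left, right = boundaries[i], boundaries[i + 1]
        mark = (left + right) / 2                  # class mark: midpoint of the class
        height = freq / total if relative else freq
        bars.append((left, right, mark, height))
    return bars


# Hypothetical distribution with four classes of equal width.
boundaries = [9.5, 19.5, 29.5, 39.5, 49.5]
frequencies = [3, 7, 8, 2]

# Crude text rendering: one row per class, bar length equal to the frequency.
for left, right, mark, height in histogram_bars(boundaries, frequencies):
    print(f"{left:>5}-{right:<5} mark {mark:<5} {'#' * height}")
```

Calling `histogram_bars(boundaries, frequencies, relative=True)` gives heights that sum to 1, which changes the vertical scale but not the shape of the histogram.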
Histograms corresponding to the frequency distributions of Tables 3.5 and 3.7 appear as Figures 3.1 and 3.2. The values on the horizontal scale in Figure 3.1 correspond to class boundaries, while those in Figure 3.2 correspond to class marks.
Notice that as long as the class intervals in the frequency distributions are the same, the bars in...