Unequal probability sampling occurs when some units in the population have different probabilities of being selected than others. This handout introduces the Hansen-Hurwitz (H-H) estimator and the Horvitz-Thompson (H-T) estimator, examines the properties of both estimators for the population total and mean, and compares the two estimators by way of an example.

The Hansen-Hurwitz (H-H) Estimator for random sampling with replacement

• Suppose a sample of size $n$ is selected randomly with replacement from a population of $N$ units, but that on each draw, unit $i$ has probability $p_i$ of being selected, where $\sum_{i=1}^{N} p_i = 1$.
The probability $p_i$ here is called the selection probability for the $i$th unit. Let $y_i$ be the response variable measured on each unit selected. Note that if a unit is selected more than once, it is used as many times as it is selected. An unbiased estimator of the population total $\tau = \sum_{i=1}^{N} y_i$ is given by:

$$\hat{\tau}_p = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i}.$$
An unbiased estimator of the population mean is $\hat{\mu}_p = (1/N)\hat{\tau}_p$.
• Dividing by $p_i$ gives higher weight to units less likely to be selected.
• What happens to this estimator if $p_i = 1/N$, $i = 1, \ldots, N$, so that each unit has an equal chance of selection?
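As an illustration of the estimator above, here is a minimal Python sketch. The function names `hh_total` and `hh_mean`, the population size, and the sample values are assumptions made for this example and do not come from the handout. The last lines check numerically that with equal selection probabilities $1/N$ the estimate of the total coincides with $N$ times the ordinary sample mean.

```python
from fractions import Fraction

def hh_total(ys, ps):
    """Hansen-Hurwitz estimate of the population total.

    ys: responses for the n draws (a unit drawn twice appears twice)
    ps: selection probabilities of those same draws
    """
    if len(ys) != len(ps) or not ys:
        raise ValueError("ys and ps must be non-empty and of equal length")
    n = len(ys)
    return sum(y / p for y, p in zip(ys, ps)) / n

def hh_mean(ys, ps, N):
    """Hansen-Hurwitz estimate of the population mean (requires N)."""
    return hh_total(ys, ps) / N

# Hypothetical sample of n = 3 draws from a population of N = 5 units;
# the unit with response 4 happened to be drawn twice.
N = 5
ys = [Fraction(4), Fraction(10), Fraction(4)]
equal_ps = [Fraction(1, N)] * len(ys)

total_equal = hh_total(ys, equal_ps)
sample_mean = sum(ys) / len(ys)
print(total_equal, N * sample_mean)   # both quantities agree
```

Exact fractions are used only to keep the comparison free of rounding; ordinary floats work with the same functions.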
Example: Consider a population of size $N = 3$, with values and corresponding selection probabilities given in the first two columns of the table below. Note that the true population total is $\tau = 14$. Consider taking a sample of size 1. The H-H estimates of the total for each of the 3 values (samples) are given in the third column of the table.

Values        Probabilities    $\hat{\tau}_p$
$y_1 = 3$     $p_1 = .2$       15
$y_2 = 2$     $p_2 = .5$       4
$y_3 = 9$     $p_3 = .3$       30
The expected value of $\hat{\tau}_p$ is: $E(\hat{\tau}_p) = .2(15) + .5(4) + .3(30) = 14 = \tau$.
• So, in $\hat{\tau}_p = \frac{1}{n} \sum_{i=1}^{n} \frac{y_i}{p_i}$, each $\frac{y_i}{p_i}$ is unbiased for $\tau$.
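The arithmetic in this example can be reproduced with a short Python sketch. The variable names are assumptions for illustration; exact fractions are used so that the expectation comes out as exactly 14 rather than a rounded float.

```python
from fractions import Fraction

# Population from the example: values and selection probabilities
y = [Fraction(3), Fraction(2), Fraction(9)]
p = [Fraction(2, 10), Fraction(5, 10), Fraction(3, 10)]

tau = sum(y)  # true population total

# H-H estimate of the total for each possible sample of size 1
estimates = [yi / pi for yi, pi in zip(y, p)]

# Expected value: weight each estimate by the probability of drawing that unit
expected = sum(pi * est for pi, est in zip(p, estimates))

print("tau =", tau)
print("estimates =", [int(e) for e in estimates])
print("E(tau_hat) =", expected)
```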
Why would you want to select units with unequal probabilities?
• It may be the most convenient way to sample. Recall the example of taking a sample of ponds by selecting a random point on a map. If the point lands in a pond, then that pond is selected for the sample. It would require a lot more effort to enumerate all the ponds so that an SRS could be selected. See also the farm example below.
• If the response variable is positively correlated with the selection probability, then the Hansen-Hurwitz estimator can have lower variance than the estimator based on an SRS.
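To make the second point concrete, the following Python sketch compares exact variances for a single draw ($n = 1$), where an SRS is the same as drawing with equal probabilities $1/N$. It reuses the values 3, 2, 9 and the probabilities .2, .5, .3 from the example above; the probabilities in `p_aligned` are a hypothetical choice, not from the handout, picked to be roughly proportional to the values. The helper name `design_variance` is likewise an assumption.

```python
from fractions import Fraction

def design_variance(y, p):
    """Exact variance of y_i / p_i when one unit is drawn with probabilities p."""
    tau = sum(y)  # the estimator is unbiased, so its mean is the true total
    return sum(pi * (yi / pi - tau) ** 2 for yi, pi in zip(y, p))

y = [Fraction(3), Fraction(2), Fraction(9)]
N = len(y)

p_srs = [Fraction(1, N)] * N                                   # equal probabilities (SRS of size 1)
p_handout = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]  # probabilities from the example
p_aligned = [Fraction(1, 5), Fraction(1, 5), Fraction(3, 5)]   # hypothetical, roughly proportional to y

for name, p in [("SRS", p_srs), ("handout", p_handout), ("aligned", p_aligned)]:
    print(name, design_variance(y, p))
```

Under these assumptions the variances are 86 for the SRS, 127 for the example's probabilities, and 4 for the aligned probabilities. Unequal probabilities help only when they track the response: the example's probabilities give the smallest value the largest chance of selection and do worse than an SRS.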
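Properties of the Hansen-Hurwitz Estimator

$$E(\hat{\tau}_p) = \frac{1}{n}\sum_{i=1}^{n} E\left(\frac{y_i}{p_i}\right) = \frac{1}{n}\, n\tau = \tau$$

$$\mathrm{Var}(\hat{\tau}_p) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{y_i}{p_i}\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}\left(\frac{y_i}{p_i}\right) \quad \text{(indep. due to sampling with replacement)}$$

$$= \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{N} p_j\left(\frac{y_j}{p_j}-\tau\right)^2 = \frac{1}{n}\sum_{j=1}^{N} p_j\left(\frac{y_j}{p_j}-\tau\right)^2$$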
where $\tau$ is unknown, so we need to estimate it in this variance. An unbiased estimate of the variance can be computed as:

$$\widehat{\mathrm{Var}}(\hat{\tau}_p) = \frac{1}{n(n-1)}\sum_{j=1}^{n}\left(\frac{y_j}{p_j}-\hat{\tau}_p\right)^2$$
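The variance and its estimate can be checked against each other by brute force. The Python sketch below, with function names that are assumptions for illustration, enumerates every ordered with-replacement sample of size $n = 2$ from the three-unit example population (values 3, 2, 9 with probabilities .2, .5, .3) and compares the average of the estimated variance with the true variance.

```python
from fractions import Fraction
from itertools import product

def hh_total(ys, ps):
    """H-H estimate of the total from the sampled values and their probabilities."""
    return sum(y / p for y, p in zip(ys, ps)) / len(ys)

def hh_true_variance(y, p, n):
    """(1/n) * sum_j p_j (y_j/p_j - tau)^2, summed over the whole population."""
    tau = sum(y)
    return sum(pj * (yj / pj - tau) ** 2 for yj, pj in zip(y, p)) / n

def hh_variance_estimate(ys, ps):
    """1/(n(n-1)) * sum_j (y_j/p_j - tau_hat)^2, summed over the sample (n >= 2)."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two draws")
    tau_hat = hh_total(ys, ps)
    return sum((yj / pj - tau_hat) ** 2 for yj, pj in zip(ys, ps)) / (n * (n - 1))

y = [Fraction(3), Fraction(2), Fraction(9)]
p = [Fraction(1, 5), Fraction(1, 2), Fraction(3, 10)]
n = 2

mean_of_estimates = Fraction(0)
mean_of_var_estimates = Fraction(0)
for draws in product(range(len(y)), repeat=n):   # all ordered samples with replacement
    prob = Fraction(1)
    for j in draws:
        prob *= p[j]                             # draws are independent
    ys = [y[j] for j in draws]
    ps = [p[j] for j in draws]
    mean_of_estimates += prob * hh_total(ys, ps)
    mean_of_var_estimates += prob * hh_variance_estimate(ys, ps)

print(mean_of_estimates, hh_true_variance(y, p, n), mean_of_var_estimates)
```

For this population the expected estimate is 14 and both the true variance and the average estimated variance equal 127/2, in line with the unbiasedness claims.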
Note that the properties of $\hat{\mu}_p = (1/N)\hat{\tau}_p$ follow easily: $E(\hat{\mu}_p) = (1/N)\tau = \mu$ (unbiased), $\mathrm{Var}(\hat{\mu}_p) = (1/N)^2\,\mathrm{Var}(\hat{\tau}_p)$, $\widehat{\mathrm{Var}}(\hat{\mu}_p) = (1/N)^2\,\widehat{\mathrm{Var}}(\hat{\tau}_p)$.

Notes on the Hansen-Hurwitz Estimator
1. We only need $p_i$ for the units in the sample (not the whole population).
2. We need not know $N$ in order to estimate $\tau$.
3. If we let $y_i = 1$, $i = 1, \ldots, N$, then $\tau = N$ and $\hat{\tau}_p = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{p_i} = \hat{N}$ is an estimator of $N$.
4. If there is low variability between the values of $y_j/p_j$, then the H-H estimator will have low variance, with the extreme case being when $y_j$ and $p_j$ are exactly proportional to each other. On the other hand, the H-H estimator will have high variance when there is high variability among the values of $y_j/p_j$.

Example: Consider a population of farms of varying sizes and shapes on a 25x25 grid, as given on the last page of this handout. If we randomly select a single square on this grid, then letting $x_i$ = the area of farm $i$ and $A = 625$ total units, the probability that farm $i$ is selected is: $p_i = \frac{x_i}{A} = \frac{x_i}{625}$.
Let $y_i$ = the response variable of interest (which might be $x_i$).
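The selection mechanism just described can be sketched in Python on a much smaller, made-up grid. Everything concrete here is an assumption for illustration: the 4x4 layout, the farm labels, the response values in `y`, and the function name `hh_total`. The handout's actual 25x25 layout is not reproduced. The sketch computes $p_i = x_i / A$ by counting squares and then applies the H-H estimator to farms selected by drawing random squares.

```python
import random
from collections import Counter
from fractions import Fraction

# Hypothetical 4x4 grid; each cell holds the label of the farm covering it
grid = [
    ["a", "a", "b", "b"],
    ["a", "a", "b", "c"],
    ["d", "d", "d", "c"],
    ["d", "d", "d", "c"],
]
cells = [farm for row in grid for farm in row]
A = len(cells)                 # total number of squares (16 here, 625 in the handout)
area = Counter(cells)          # x_i = number of squares covered by farm i
p = {farm: Fraction(x, A) for farm, x in area.items()}   # p_i = x_i / A

# Hypothetical response values, one per farm
y = {"a": 12, "b": 7, "c": 10, "d": 20}

def hh_total(sampled_farms):
    """H-H estimate of the total from a list of drawn farms (repeats allowed)."""
    n = len(sampled_farms)
    return sum(y[f] / p[f] for f in sampled_farms) / n

rng = random.Random(0)                            # fixed seed, chosen arbitrarily
draws = [rng.choice(cells) for _ in range(5)]     # 5 random squares, with replacement
print(draws, hh_total(draws))

# Exact expectation over a single random square equals the true total
expected = sum(p[f] * (y[f] / p[f]) for f in p)
print(expected, sum(y.values()))
```

Choosing a square uniformly at random and recording the farm that covers it selects each farm with probability proportional to its area, so no list of farms is needed in advance, only the areas of the farms that end up in the sample.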
• If $y_i = x_i$, then $\tau = \sum_{i=1}^{N} y_i$ = the total area of all farms. In this...