Pages: 13 (3121 words). Published: June 3, 2012
CSSELECT
This document describes the algorithm used by CSSELECT to draw samples according to
complex designs. The data file does not have to be sorted. Population units can appear more
than once in the data file and they do not have to be in a consecutive block of cases.

Notation
The following notation is used throughout this chapter unless otherwise stated:
N     Population size
n     Sample size
f     Sampling fraction
h_i   Hit count of the i-th population unit (i = 1, ..., N)
M_i   Size measure of the i-th population unit (i = 1, ..., N)
M     Total size, $M = \sum_{i=1}^{N} M_i$
p_i   Relative size of the i-th population unit (i = 1, ..., N), $p_i = M_i / M$
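
As a small illustration of the size notation, the following Python sketch computes the total size M and the relative sizes p_i from a list of size measures. The function name `relative_sizes` and the example values are hypothetical and are not part of CSSELECT.

```python
from fractions import Fraction

def relative_sizes(size_measures):
    """Return (M, [p_1, ..., p_N]) with M = sum of M_i and p_i = M_i / M."""
    total = sum(size_measures)  # M, the total size
    if total <= 0:
        raise ValueError("total size M must be positive")
    # Exact fractions avoid floating-point noise in this illustration.
    return total, [Fraction(m, total) for m in size_measures]

# Hypothetical integer size measures M_i for N = 3 population units.
M, p = relative_sizes([10, 30, 60])
print(M, [float(x) for x in p])
```

By construction the relative sizes sum to 1.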

Stratification
Stratification partitions the sampling frame into disjoint sets. Sampling is carried out
independently within each stratum. Therefore, without loss of generality, the algorithm
described in this document only considers sampling from one population.
In the first stage of selection, the sampling frame is partitioned by the stratification variables
specified in stage 1. In the second stage, the sampling frame is stratified by the first-stage strata
and cluster variables as well as the strata variables specified in stage 2. If sampling with
replacement is used in the first stage, the first-stage duplication index is also one of the
stratification variables. Stratification of the third stage proceeds in the same way.
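
The partitioning step can be sketched in Python as follows. This is a minimal illustration, assuming cases are held in memory as dictionaries; the function name `partition_by_strata` and the variable names `region` and `urban` are hypothetical.

```python
from collections import defaultdict

def partition_by_strata(cases, strata_vars):
    """Group cases into disjoint strata keyed by their stratification values."""
    strata = defaultdict(list)
    for case in cases:
        # One stratum per unique combination of stratification values.
        key = tuple(case[v] for v in strata_vars)
        strata[key].append(case)
    return dict(strata)

# Hypothetical first-stage frame with two stratification variables.
cases = [
    {"region": "N", "urban": 1},
    {"region": "S", "urban": 0},
    {"region": "N", "urban": 1},
]
strata = partition_by_strata(cases, ["region", "urban"])
print({key: len(members) for key, members in strata.items()})
```

Each resulting group can then be sampled on its own, which is why the rest of the description treats a single population.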

Population Size
Sampling units in a population are identified by all unique level combinations of cluster
variables within a stratum. Therefore, the population size N of a stratum is equal to the
number of unique level combinations of the cluster variables within that stratum. When a
sampling unit is selected, all cases having the same sampling unit identifier are included in
the sample. If no cluster variable is defined, each case is a sampling unit.
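
A minimal Python sketch of how sampling units and N could be derived from the cases of one stratum is shown here. The function name `sampling_units` and the cluster variable `school` are hypothetical; the sketch only assumes that cases are dictionaries in arbitrary order.

```python
def sampling_units(cases, cluster_vars):
    """Map each unique cluster-variable combination to the indices of its cases."""
    units = {}
    for idx, case in enumerate(cases):
        if cluster_vars:
            key = tuple(case[v] for v in cluster_vars)
        else:
            key = (idx,)  # no cluster variable: every case is its own unit
        units.setdefault(key, []).append(idx)
    return units

# Cases of unit "A" are not in a consecutive block, which is allowed.
cases = [{"school": "A"}, {"school": "B"}, {"school": "A"}]
units = sampling_units(cases, ["school"])
N = len(units)               # population size of this stratum
print(N, units[("A",)])      # selecting unit "A" brings in cases 0 and 2
```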

Sample Size
CSSELECT uses a fixed sample size approach in selecting samples. If the sample size n is
supplied by the user, it should satisfy 0 ≤ n ≤ N for any without-replacement design and
n ≥ 0 for any with-replacement design.
If a sampling fraction f is specified, it should satisfy 0 < f ≤ 1 for any
without-replacement design and f > 0 for any with-replacement design. The actual sample size is
determined by the formula n = round(f * N). When the option RATEMINSIZE is
specified, a sample size less than RATEMINSIZE is raised to RATEMINSIZE. Likewise, when
RATEMAXSIZE is specified, a sample size exceeding RATEMAXSIZE is lowered to RATEMAXSIZE.
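
The sample size rule can be sketched in Python as below. The source only says "round", so rounding halves upward is an assumption here, as is the order in which the two bounds are applied; the function and parameter names are hypothetical.

```python
import math

def sample_size(N, f, rate_min_size=None, rate_max_size=None):
    """Compute n = round(f * N), then apply the optional size bounds."""
    if f <= 0:
        raise ValueError("sampling fraction f must be positive")
    # Assumption: halves round upward. Python's built-in round() rounds
    # halves to even, so it is deliberately not used here.
    n = math.floor(f * N + 0.5)
    if rate_min_size is not None and n < rate_min_size:
        n = rate_min_size  # raise to RATEMINSIZE
    if rate_max_size is not None and n > rate_max_size:
        n = rate_max_size  # lower to RATEMAXSIZE
    return n

print(sample_size(1000, 0.05))
print(sample_size(100, 0.01, rate_min_size=5))
print(sample_size(1000, 0.5, rate_max_size=200))
```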

Simple Random Sampling
This algorithm selects n distinct units out of N population units with equal probability; see
Fan, Muller & Rezucha (1962) for more information.


Inclusion probability of i-th unit = n/N

Sampling weight of i-th unit = N/n

Algorithm

1. If f is supplied, compute n = round(f*N).

2. Set k=0, i=0 and start data scan.

3. Get a population unit and set k=k+1. If no more population units are left, terminate.

4. Test if the k-th unit should go into the sample:

   a) Generate a uniform (0,1) random number U.

   b) If (n − i)/(N − k + 1) > U, the k-th population unit is selected; set i=i+1.

   c) If i=n, terminate. Otherwise, go to step 3.
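
The sequential selection steps above can be sketched in Python as follows. This is an illustrative sketch, not CSSELECT's own code: it assumes N is known before the scan, uses Python's `random.Random` as the uniform generator, and the function name `simple_random_sample` is hypothetical.

```python
import random

def simple_random_sample(N, n, rng=None):
    """Select n distinct unit numbers from 1..N in a single sequential scan."""
    if not 0 <= n <= N:
        raise ValueError("n must satisfy 0 <= n <= N")
    if rng is None:
        rng = random.Random()
    selected = []
    i = 0                          # units selected so far
    for k in range(1, N + 1):      # k-th population unit in scan order
        if i == n:
            break                  # sample is complete
        U = rng.random()           # uniform on [0, 1)
        if (n - i) / (N - k + 1) > U:
            selected.append(k)
            i += 1
    return selected

sample = simple_random_sample(100, 10, random.Random(1))
print(sample)
```

Because U is always below 1, a unit is selected with certainty once the number of units still needed equals the number of units remaining, so the scan always ends with exactly n selections.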

Unrestricted Random Sampling
This algorithm selects n units out of N population units with equal probability and with
replacement.


Inclusion probability of i-th unit = 1 − (1 − 1/N)^n

Sampling weight of i-th unit = N/n (for use with the Hansen-Hurwitz (1943) estimator)

Expected number of hits of i-th unit = n/N

Algorithm

1. Set i=0 and initialize all hit counts to zero.

2. Generate an integer k between 1 and N uniformly.

3. Increase the hit count of the k-th population unit by 1.

4. Set i=i+1.

5. If i=n, terminate. Otherwise, go to step 2.

At the end of the procedure, population units with hit count greater than zero are selected.
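
A minimal Python sketch of this with-replacement procedure is given below. It is only an illustration: the function name `unrestricted_random_sample` is hypothetical, and `random.Random.randint` stands in for the uniform integer generator.

```python
import random

def unrestricted_random_sample(N, n, rng=None):
    """Draw n units from 1..N with replacement; return the hit counts h_1..h_N."""
    if N < 1 or n < 0:
        raise ValueError("need N >= 1 and n >= 0")
    if rng is None:
        rng = random.Random()
    hits = [0] * N                 # all hit counts start at zero
    for _ in range(n):
        k = rng.randint(1, N)      # uniform integer in 1..N, inclusive
        hits[k - 1] += 1           # one more hit for the k-th unit
    return hits

hits = unrestricted_random_sample(10, 25, random.Random(0))
selected = [k for k, h in enumerate(hits, start=1) if h > 0]
print(hits, selected)
```

The hit counts always sum to n, and a unit belongs to the sample exactly when its hit count is positive.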

Systematic Sampling
This algorithm selects n distinct units out of N population units. If the selection...