PPS Sampling without Replacement

If you specify the option METHOD=PPS, PROC SURVEYSELECT selects units with probability proportional to size and without replacement. The selection probability for unit i in stratum h equals $n_h Z_{hi}$, where $n_h$ is the sample size for stratum h and $Z_{hi} = M_{hi} / M_h$ is the relative size of unit i: its size measure $M_{hi}$ divided by the total size $M_h$ of all units in stratum h. The procedure uses the Hanurav-Vijayan algorithm for PPS selection without replacement. Hanurav (1967) introduced this algorithm for the selection of two units per stratum, and Vijayan (1968) generalized it to the selection of more than two units. The algorithm enables computation of joint selection probabilities, and the joint selection probabilities it provides usually ensure nonnegativity and stability of the Sen-Yates-Grundy variance estimator. Refer to Fox (1989), Golmant (1990), and Watts (1991) for details.
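The first-order selection probability $n_h Z_{hi}$ can be computed independently for each stratum. The Python sketch below is illustrative only: the function name `pps_selection_probs` is hypothetical, the exact `Fraction` arithmetic is our choice, and so is raising an error when some $n_h Z_{hi}$ exceeds 1 (a probability cannot exceed 1; how PROC SURVEYSELECT itself treats such units is not described in this section). It takes $Z_{hi} = M_{hi} / M_h$, the unit's share of its stratum's total size.

```python
from fractions import Fraction


def pps_selection_probs(sizes, n):
    """Hypothetical helper: n * Z_i for one stratum, with Z_i = M_i / M."""
    if not 0 < n <= len(sizes):
        raise ValueError("sample size must be between 1 and the stratum size")
    total = sum(Fraction(m) for m in sizes)       # M, the stratum's total size
    probs = [n * Fraction(m) / total for m in sizes]
    # Our own feasibility check: a selection probability cannot exceed 1.
    if any(p > 1 for p in probs):
        raise ValueError("n * Z_i exceeds 1 for at least one unit")
    return probs


# Strata are handled independently, each with its own n_h and M_h.
strata = {"A": ([10, 20, 30, 40], 2), "B": ([5, 5, 10], 1)}
for h, (sizes, n_h) in strata.items():
    print(h, [str(p) for p in pps_selection_probs(sizes, n_h)])
```

Within each stratum the probabilities sum to $n_h$, because the relative sizes $Z_{hi}$ sum to 1.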

Notation in the remainder of this section drops the stratum subscript h for simplicity, but selection is still done independently within strata if you specify a stratified design. For a stratified design, n now denotes the sample size for the current stratum, N denotes the stratum population size, $M_i$ denotes the size measure for unit i in the stratum, and M denotes the total of the size measures in the stratum. If the design is not stratified, this notation applies to the entire sampling frame.

According to the Hanurav-Vijayan algorithm, PROC SURVEYSELECT first sorts the units within the stratum in ascending order of size measure, so that $M_1 \le M_2 \le \cdots \le M_N$. Then the procedure selects the PPS sample of n observations as follows:

  1. The procedure randomly chooses one of the integers 1, 2, ..., n with probabilities $\theta_1, \theta_2, \ldots, \theta_n$, where
    $$\theta_i = n \, (Z_{N-n+i+1} - Z_{N-n+i}) \, (T + i Z_{N-n+1}) / T$$
    with $Z_j = M_j / M$ and $T = \sum_{j=1}^{N-n} Z_j$. By definition, $Z_{N+1} = 1/n$, which ensures that $\sum_{i=1}^{n} \theta_i = 1$.
  2. If i is the integer selected in step 1, the procedure includes the last (n-i) units of the stratum in the sample, where the units are ordered by size measure as described previously. The procedure then selects the remaining i units according to steps 3 through 6 below.
  3. The procedure defines new normed size measures for the remaining (N-n+i) stratum units that were not selected in steps 1 and 2:
    $$Z_j^{*} = \begin{cases} Z_j / (T + i Z_{N-n+1}) & \text{for } j = 1, \ldots, N-n+1 \\ Z_{N-n+1} / (T + i Z_{N-n+1}) & \text{for } j = N-n+2, \ldots, N-n+i \end{cases}$$
  4. The procedure selects the next unit from the first (N-n+1) stratum units with probability proportional to $a_j(1)$, where
    $$a_1(1) = i Z_1^{*}$$
    $$a_j(1) = i Z_j^{*} \prod_{k=1}^{j-1} \left[1 - (i-1) P_k\right] \quad \text{for } j = 2, \ldots, N-n+1$$
    and $P_k = M_k / (M_{k+1} + M_{k+2} + \cdots + M_{N-n+i})$.
  5. If stratum unit $j_1$ is the unit selected in step 4, then the procedure selects the next unit from units $j_1+1$ through $N-n+2$ with probability proportional to $a_j(2, j_1)$, where
    $$a_{j_1+1}(2, j_1) = (i-1) Z_{j_1+1}^{*}$$
    $$a_j(2, j_1) = (i-1) Z_j^{*} \prod_{k=j_1+1}^{j-1} \left[1 - (i-2) P_k\right] \quad \text{for } j = j_1+2, \ldots, N-n+2$$
  6. The procedure repeats step 5 until all n sample units are selected.
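Steps 1 through 3 reduce to a few lines of arithmetic. The following Python sketch, written under our own assumptions, sorts the size measures, computes $\theta_1, \ldots, \theta_n$ and the normed sizes $Z_j^{*}$ with exact fractions, draws the integer i, and returns the sorted positions of the units that step 2 includes with certainty. It does not implement the sequential draws of steps 4 through 6. The function names `hv_setup`, `normed_sizes`, and `hv_steps_1_and_2` are hypothetical, and the checks for positive sizes, N > n, and $n Z_N \le 1$ (which keeps $\theta_n$ nonnegative) are our additions.

```python
import random
from fractions import Fraction


def hv_setup(sizes, n):
    """Return sorted sizes M, 1-based Z (with Z[N+1] = 1/n), T, and theta."""
    M = sorted(Fraction(m) for m in sizes)           # M_1 <= M_2 <= ... <= M_N
    N = len(M)
    if not 0 < n < N:
        raise ValueError("need 0 < n < N")
    if any(m <= 0 for m in M):
        raise ValueError("size measures must be positive")
    total = sum(M)                                    # M, the total size
    Z = [None] + [m / total for m in M] + [Fraction(1, n)]  # Z[1..N], Z[N+1] = 1/n
    if n * Z[N] > 1:
        raise ValueError("n * Z_N exceeds 1; theta_n would be negative")
    T = sum(Z[1:N - n + 1])                           # T = Z_1 + ... + Z_{N-n}
    c = Z[N - n + 1]
    # theta[i-1] holds theta_i from step 1.
    theta = [n * (Z[N - n + i + 1] - Z[N - n + i]) * (T + i * c) / T
             for i in range(1, n + 1)]
    return M, Z, T, theta


def normed_sizes(Z, T, N, n, i):
    """Step 3: Z*_j for j = 1, ..., N-n+i, returned as a 0-based list."""
    c = Z[N - n + 1]
    denom = T + i * c
    # Units 1..N-n+1 keep their own Z_j; units N-n+2..N-n+i all get Z_{N-n+1}.
    return [Z[j] / denom for j in range(1, N - n + 2)] + [c / denom] * (i - 1)


def hv_steps_1_and_2(sizes, n, rng=random):
    """Steps 1-2: draw i, then take the last n - i sorted units with certainty."""
    M, Z, T, theta = hv_setup(sizes, n)
    N = len(M)
    i = rng.choices(range(1, n + 1), weights=[float(t) for t in theta])[0]
    certain = list(range(N - n + i + 1, N + 1))      # 1-based sorted positions
    return i, certain


M, Z, T, theta = hv_setup([2, 3, 4, 5, 6], n=2)
print([str(t) for t in theta])                        # ['7/45', '38/45']
```

For this frame $T = 9/20$ and $Z_{N-n+1} = 1/4$, so $\theta_1 = 7/45$ and $\theta_2 = 38/45$, which sum to 1 as required. Exact fractions make that identity checkable without floating-point tolerance.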

If you request the JTPROBS option, PROC SURVEYSELECT computes the joint selection probabilities for all pairs of selected units in each stratum. The joint selection probability for units i and j in the stratum equals

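$$P_{(ij)} = \sum_{r=1}^{n} \theta_r K_{ij}^{(r)}$$

where, for i < j,

$$K_{ij}^{(r)} = \begin{cases} 1 & \text{for } N-n+r < i \le N-1 \\ r Z_{N-n+1} / (T + r Z_{N-n+1}) & \text{for } N-n < i \le N-n+r, \ j > N-n+r \\ r Z_i / (T + r Z_{N-n+1}) & \text{for } 1 \le i \le N-n, \ j > N-n+r \\ \pi_{ij}^{(r)} & \text{for } j \le N-n+r \end{cases}$$

$$\pi_{ij}^{(r)} = \frac{r(r-1)}{2} P_i Z_j \prod_{k=1}^{i-1} (1 - P_k)$$

and $P_k = M_k / (M_{k+1} + M_{k+2} + \cdots + M_{N-n+r})$. In the first case both units are among the last n - r units, which step 2 includes with certainty; in the second and third cases only unit j is such a unit, and $K_{ij}^{(r)}$ equals $r Z_i^{*}$, with $Z_i^{*}$ as defined in step 3 for i = r.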
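The case structure of $K_{ij}^{(r)}$ can be sketched as follows. This Python helper (the name `k_certainty_cases` is hypothetical) covers only pairs in which unit j lies among the last n - r sorted units, that is, $j > N-n+r$; it returns `None` for pairs with $j \le N-n+r$, which require $\pi_{ij}^{(r)}$ and are deliberately not implemented here. Positions i < j are 1-based positions in ascending size order, as in the steps above.

```python
from fractions import Fraction


def k_certainty_cases(sizes, n, r, i, j):
    """K_ij^(r) for sorted positions i < j when j > N-n+r; otherwise None."""
    M = sorted(Fraction(m) for m in sizes)
    N = len(M)
    if not (1 <= i < j <= N and 1 <= r <= n < N):
        raise ValueError("need 1 <= i < j <= N and 1 <= r <= n < N")
    total = sum(M)
    Z = [None] + [m / total for m in M]              # Z[1..N]
    T = sum(Z[1:N - n + 1])                          # Z_1 + ... + Z_{N-n}
    c = Z[N - n + 1]
    if j <= N - n + r:
        return None                                  # needs pi_ij^(r); not sketched
    if i > N - n + r:
        return Fraction(1)                           # both units taken with certainty
    if i > N - n:
        return r * c / (T + r * c)                   # N-n < i <= N-n+r
    return r * Z[i] / (T + r * c)                    # 1 <= i <= N-n
```

With n = 2 and r = 1, unit N is the only certainty unit and exactly one other unit is drawn, so the values $K_{iN}^{(1)}$ for i = 1, ..., N-1 sum to 1.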

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.