 Introduction to Survey Sampling and Analysis Procedures

Survey sampling is the process of selecting a probability-based sample from a finite population according to a sample design. You then collect data from these selected units and use them to estimate characteristics of the entire population.

A sample design encompasses the rules and operations by which you select sampling units from the population and the computation of sample statistics, which are estimates of the population values of interest. The objective of your survey often determines appropriate sample designs and valid data collection methodology. A complex sample design often includes stratification, clustering, multiple stages of selection, and unequal weighting.

For more detailed information, refer to Cochran (1977), Kalton (1983), Kish (1965), and Hansen, Hurwitz, and Madow (1953).

To select a sample with the SURVEYSELECT procedure and to analyze your survey data with the SURVEYMEANS and SURVEYREG procedures, you need to supply your sample design information to those procedures. This information includes design strata, clusters, and sampling weights.

#### Population

Population refers to the target population or group of individuals of interest for study. Often, the primary objective is to estimate certain characteristics of this population, called population values. A sampling unit is an element or an individual in the target population. A sample is a subset of the population that is selected for the study.

Before you use the survey procedures, you should have a well-defined target population, sampling units, and an appropriate sample design.

In order to select a sample according to your sample design, you need to have a list of sampling units in the population. This is called a sampling frame. PROC SURVEYSELECT selects a sample using this sampling frame.

#### Stratification

Stratified sampling involves selecting samples independently within strata, which are nonoverlapping subgroups of the survey population. Stratification controls the distribution of the sample size in the strata. It is widely used in practice to meet a variety of survey objectives. For example, with stratification you can ensure adequate sample sizes for subgroups of interest, including small subgroups, or you can use stratification to improve the precision of overall estimates. To improve precision, units within strata should be as homogeneous as possible for the characteristics of interest.
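The idea of selecting independently within each stratum can be illustrated outside of SAS. The following is a minimal Python sketch, not PROC SURVEYSELECT itself: the function `stratified_sample`, the `urban`/`rural` frame, and the per-stratum sample sizes are all hypothetical, and simple random sampling without replacement is assumed within each stratum.

```python
import random
from collections import defaultdict

def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Draw a simple random sample without replacement in each stratum.

    frame          -- list of sampling units (the sampling frame)
    stratum_of     -- function mapping a unit to its stratum label
    n_per_stratum  -- dict {stratum label: sample size in that stratum}
    """
    rng = random.Random(seed)

    # Partition the frame into nonoverlapping strata.
    by_stratum = defaultdict(list)
    for unit in frame:
        by_stratum[stratum_of(unit)].append(unit)

    # Select independently within each stratum.
    return {
        label: rng.sample(units, n_per_stratum[label])
        for label, units in by_stratum.items()
    }

# Hypothetical frame: 90 urban units and 10 rural units.
frame = [("urban", i) for i in range(90)] + [("rural", i) for i in range(10)]

# Oversample the small rural stratum so it has an adequate sample size.
sample = stratified_sample(
    frame,
    stratum_of=lambda unit: unit[0],
    n_per_stratum={"urban": 9, "rural": 5},
)
print({label: len(units) for label, units in sample.items()})
# {'urban': 9, 'rural': 5}
```

Because the per-stratum sizes are fixed in advance, the small rural subgroup is guaranteed five sampled units, which an unstratified draw of 14 units would not guarantee.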

#### Clustering

Cluster sampling involves selecting clusters, which are groups of sampling units. For example, clusters may be schools, hospitals, or geographical areas, and sampling units may be students, patients, or citizens. Cluster sampling can provide efficiency in frame construction and other survey operations. However, it can also result in a loss in precision of your estimates, compared to a nonclustered sample of the same size. To minimize this effect, units within clusters should be as heterogeneous as possible for the characteristics of interest.

#### Multistage Sampling

In multistage sampling, you select an initial or first-stage sample based on groups of elements in the population, called primary sampling units or PSUs. Then you create a second-stage sample by drawing a subsample from each selected PSU in the first-stage sample. By repeating this operation, you can select a higher-stage sample.

If you include all the elements from each selected primary sampling unit, then the two-stage sampling reduces to cluster sampling.
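A two-stage selection, and its cluster-sampling special case, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the procedure's algorithm: the function `two_stage_sample`, the school and student identifiers, and the sample sizes are hypothetical, and equal-probability selection without replacement is assumed at both stages.

```python
import random

def two_stage_sample(psus, n_psus, n_within, seed=0):
    """Select PSUs, then subsample elements within each selected PSU.

    psus      -- dict {PSU id: list of elements in that PSU}
    n_psus    -- number of PSUs to select at the first stage
    n_within  -- elements to select per PSU at the second stage;
                 None keeps every element (cluster sampling)
    """
    rng = random.Random(seed)

    # First stage: select primary sampling units.
    chosen = rng.sample(sorted(psus), n_psus)

    # Second stage: subsample within each selected PSU.
    result = {}
    for psu in chosen:
        elements = psus[psu]
        if n_within is None:
            result[psu] = list(elements)
        else:
            result[psu] = rng.sample(elements, min(n_within, len(elements)))
    return result

# Hypothetical population: 10 schools (PSUs) with 20 students each.
psus = {f"school{k}": [f"student{k}_{j}" for j in range(20)] for k in range(10)}

two_stage = two_stage_sample(psus, n_psus=3, n_within=5)
cluster = two_stage_sample(psus, n_psus=3, n_within=None)

print(sum(len(v) for v in two_stage.values()))  # 15 students
print(sum(len(v) for v in cluster.values()))    # 60 students
```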

#### Sampling Weights

Sampling weights, or survey weights, are positive values associated with each unit in your sample. Ideally, the weight of a sampling unit should be the "frequency" that the sampling unit represents in the target population. Therefore, the sum of the weights over the sample should estimate the population size N. If you normalize the weights such that the sum of the weights over the sample equals the population size N, then the weighted sum of a characteristic y estimates the population total value Y.
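The two estimates described above amount to simple weighted sums. The following Python sketch shows the arithmetic only. The five (weight, y) pairs are made-up values, not data from any survey.

```python
# Hypothetical sample: each unit has a sampling weight and an observed y value.
sample = [
    (10.0, 3.0),
    (10.0, 5.0),
    (25.0, 2.0),
    (25.0, 4.0),
    (30.0, 1.0),
]

# The sum of the weights estimates the population size N.
N_hat = sum(w for w, _ in sample)

# The weighted sum of y estimates the population total Y.
Y_hat = sum(w * y for w, y in sample)

# Their ratio estimates the population mean of y.
mean_hat = Y_hat / N_hat

print(N_hat, Y_hat, mean_hat)  # 100.0 260.0 2.6
```

Here each of the first two units "stands for" 10 population units, the next two for 25 each, and the last for 30, so the sample of 5 represents an estimated 100 units.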

Often, sampling weights are the reciprocals of the selection probabilities for the sampling units. When you use PROC SURVEYSELECT, the procedure generates the sampling weight component for each stage of the design, and you can multiply these sampling weight components to obtain the final sampling weights. Sometimes, sampling weights also include nonresponse adjustments, post-sampling stratification, or regression adjustments using supplemental information.
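As a worked example of multiplying weight components across stages, consider the following Python sketch. It assumes equal-probability selection at each stage, so the selection probability at a stage is simply the number selected divided by the number available. The helper `stage_weight` and the school and student counts are hypothetical, and the code does not reproduce PROC SURVEYSELECT output.

```python
def stage_weight(n_selected, n_available):
    """Weight component for one stage under equal-probability selection.

    The selection probability is n_selected / n_available, and the
    weight component is its reciprocal.
    """
    if not 0 < n_selected <= n_available:
        raise ValueError("need 0 < n_selected <= n_available")
    return n_available / n_selected

# Hypothetical two-stage design:
# stage 1 selects 5 of 50 schools; stage 2 selects 10 of 200 students
# in a selected school.
w_school = stage_weight(5, 50)      # 1 / (5/50)   = 10.0
w_student = stage_weight(10, 200)   # 1 / (10/200) = 20.0

# The final sampling weight is the product of the stage components.
final_weight = w_school * w_student
print(final_weight)  # 200.0
```

Under these assumptions, each sampled student represents 200 students in the target population. Nonresponse or other adjustments, if any, would be applied on top of this base weight.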

When the sampling units have unequal weights, you must provide the weights to the survey analysis procedures. If you do not specify sampling weights, the procedures use equal weights in the analysis.

#### Population Totals and Sampling Rates

The ratio of the sample size n (the number of sampling units in the sample) to the population size N (the total number of sampling units in the target population) is written as
$f = n/N$
This ratio is called the sampling rate or the sampling fraction. If you select a sample without replacement, the extra efficiency compared to selecting a sample with replacement can be measured by the finite population correction (fpc) factor, (1-f). If your analysis should include a finite population correction factor, you can input either the sampling rate or the population total. If you specify neither, the procedures do not use the fpc when computing variance estimates. For fairly small sampling fractions, it is appropriate to ignore this correction. Refer to Cochran (1977) and Kish (1965).
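To see the effect of the fpc numerically, the following Python sketch applies the textbook variance estimate for a sample mean under simple random sampling, s²/n, with and without the factor (1-f). It is an illustration only and covers just the simple random sampling case, not the variance estimators the survey procedures use for complex designs. The function `srs_mean_variance`, the data values, and the population size of 20 are hypothetical.

```python
import statistics

def srs_mean_variance(y, N=None):
    """Estimated variance of the sample mean under simple random sampling.

    y -- observed values in the sample
    N -- population size; if None, no finite population correction is applied
    """
    n = len(y)
    s2 = statistics.variance(y)           # sample variance with divisor n - 1
    fpc = 1.0 if N is None else 1.0 - n / N
    return fpc * s2 / n

# Hypothetical sample of n = 5 values from a population of N = 20 units.
y = [2.0, 4.0, 4.0, 6.0, 9.0]

without_fpc = srs_mean_variance(y)        # s^2 / n
with_fpc = srs_mean_variance(y, N=20)     # (1 - f) * s^2 / n, with f = 0.25

print(round(without_fpc, 4), round(with_fpc, 4))  # 1.4 1.05
```

With a sampling fraction of 0.25 the correction reduces the variance estimate by a quarter. With a small fraction such as 0.01, the factor would be 0.99 and the two estimates would be nearly identical, which is why the correction is often ignored in that case.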

As stated in the section "Variance Estimation", for a multistage sample design, the variance estimation method depends only on the first stage of the sample design. Therefore, if you are specifying the sampling rate, you should input the first-stage sampling rate, which is the ratio of the number of PSUs in the sample to the total number of PSUs in the target population.
