Introduction to Survey Sampling and Analysis Procedures

This section demonstrates how you can use the survey procedures to select a probability-based sample, compute descriptive statistics from the sample, perform regression analysis, and make inferences about income and expenditures of a group of households in North Carolina and South Carolina. The goals of the survey are to

- estimate total income and total basic living expenses
- investigate the linear relationship between income and living expenses

In this example, the sample design is stratified simple random sampling, with households as the sampling units. The sampling frame (the list of households in the group) is stratified by State and Region. Within each stratum, households are selected by simple random sampling. Under this design, the following PROC SURVEYSELECT statements select a probability sample of households from the HHSample data set.

proc surveyselect data=HHSample out=Sample
                  method=srs n=(3, 5, 3, 6, 2);
   strata State Region;
run;

The STRATA statement names the stratification variables State and Region. In the PROC SURVEYSELECT statement, the DATA= option names the SAS data set HHSample as the input data set (the sampling frame) from which to select the sample. The OUT= option stores the sample in the SAS data set named Sample. The METHOD=SRS option specifies simple random sampling as the sample selection method. The N= option specifies the stratum sample sizes.
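The statements described above can be made repeatable with two small additions. The following is a minimal sketch, not part of the original example: it assumes HHSample is not yet sorted by the stratification variables (PROC SURVEYSELECT expects the input in STRATA order), and it adds the SEED= option with an arbitrary, hypothetical seed value so that a rerun draws the same households. The five N= values are applied to the strata in their sorted order, one value per stratum.

```sas
/* Sort the sampling frame by the stratification variables so that
   the strata appear in the order PROC SURVEYSELECT expects. */
proc sort data=HHSample;
   by State Region;
run;

/* SEED= fixes the random number stream; 47279 is an arbitrary,
   hypothetical value chosen only for illustration. */
proc surveyselect data=HHSample out=Sample
                  method=srs n=(3, 5, 3, 6, 2)
                  seed=47279;
   strata State Region;
run;
```

Without SEED=, each run can select a different sample, which is usually fine for production sampling but inconvenient when checking results.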

The SURVEYSELECT procedure then selects a stratified random sample of households and produces the output data set Sample, which contains the selected households together with their selection probabilities and sampling weights. The data set Sample also contains the sampling unit identification variable Id and the stratification variables State and Region from the data set HHSample.
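The selection probabilities and sampling weights mentioned above follow directly from the design. As a sketch in notation introduced here (not in the original text), let stratum h contain N_h households in the frame, of which n_h are sampled by simple random sampling:

```latex
% Selection probability and sampling weight of household i in stratum h
% under simple random sampling within strata.
\pi_{hi} = \frac{n_h}{N_h},
\qquad
w_{hi} = \frac{1}{\pi_{hi}} = \frac{N_h}{n_h}
```

For instance, in a stratum with n_h = 3 sampled households and a hypothetical frame count of N_h = 30, each selected household has selection probability 0.1 and weight 10, meaning it represents ten households in the population.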

To estimate the total income and expenditure in the population from the sample, you specify the input data set containing the sample, the statistics to be computed, the variables to be analyzed, and any stratification variables. The statements to compute the descriptive statistics are as follows:

proc surveymeans data=Sample sum clm;
   var Income Expense;
   strata State Region;
   weight Weight;
run;

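The PROC SURVEYMEANS statement invokes the procedure, specifies the input data set, and requests estimates of the population totals together with the standard deviations of those estimates (SUM), and confidence limits (CLM). Note that in PROC SURVEYMEANS the CLM keyword requests confidence limits for the mean; confidence limits for the total are requested with the CLSUM keyword.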
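Because the stated goal is to estimate totals, a variant of the call that asks for interval estimates of the totals themselves may be closer to what is wanted. The following is a sketch under that assumption; it changes only the statistic keywords and keeps the data set and variable names used in this example.

```sas
/* SUM   : estimated population totals and their standard deviations
   CLSUM : two-sided confidence limits for those totals            */
proc surveymeans data=Sample sum clsum;
   var Income Expense;
   strata State Region;
   weight Weight;
run;
```

Both CLM and CLSUM can be listed together if limits for the means and for the totals are needed in the same run.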

The VAR statement specifies the two analysis variables, Income and Expense. The STRATA statement identifies State and Region as the stratification variables in the sample design. The WEIGHT statement specifies the sampling weight variable Weight.
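To make the roles of the STRATA and WEIGHT statements concrete, the estimate of a population total and its variance under a stratified design can be sketched as follows. This is the standard textbook form, written in notation assumed here: H strata, n_h sampled households in stratum h, weight w_{hi}, and observed value y_{hi} (Income or Expense). The sampling fraction f_h is taken as 0 when no population totals or sampling rates are supplied, as in the statements above. The procedure's own documentation remains the authority on exactly what it computes.

```latex
% Weighted estimate of the population total
\hat{Y} = \sum_{h=1}^{H} \sum_{i=1}^{n_h} w_{hi}\, y_{hi}

% Stratified variance estimate, with stratum mean of the weighted values
\widehat{\mathrm{Var}}(\hat{Y})
  = \sum_{h=1}^{H} \frac{n_h\,(1 - f_h)}{n_h - 1}
    \sum_{i=1}^{n_h} \left( w_{hi}\, y_{hi} - \bar{z}_h \right)^2,
\qquad
\bar{z}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} w_{hi}\, y_{hi}
```

The weights scale each sampled household up to the households it represents, while the strata determine which observations are compared with one another when the variability of the estimate is assessed.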

You can also use the SURVEYREG procedure to perform regression analysis for sample survey data. Suppose that, in order to explore the relationship between the total income and the total basic living expenses of a household in the survey population, you choose the following linear model to describe the relationship:

Expense = α + β * Income + error

The following statements fit this linear model.

proc surveyreg data=Sample;
   strata State Region;
   model Expense = Income;
   weight Weight;
run;

In the PROC SURVEYREG statement, the DATA= option specifies the input sample survey data as Sample. The STRATA statement identifies the stratification variables as State and Region. The MODEL statement specifies the model, with Expense as the dependent variable and Income as the independent variable. The WEIGHT statement specifies the sampling weight variable Weight.
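The effect of the WEIGHT statement on the fitted line can be sketched as a weighted least squares problem. In the notation assumed here, y_{hi} is Expense, x_{hi} is Income, and w_{hi} is the sampling weight of household i in stratum h; the intercept and slope are written α and β. This shows only the general form of the point estimates, not the procedure's variance computations.

```latex
% Weighted least squares criterion
(\hat{\alpha}, \hat{\beta})
  = \arg\min_{\alpha,\,\beta}
    \sum_{h} \sum_{i} w_{hi} \left( y_{hi} - \alpha - \beta\, x_{hi} \right)^2

% Closed form for one regressor, using weighted means
\bar{x}_w = \frac{\sum_{h,i} w_{hi}\, x_{hi}}{\sum_{h,i} w_{hi}},
\qquad
\bar{y}_w = \frac{\sum_{h,i} w_{hi}\, y_{hi}}{\sum_{h,i} w_{hi}}

\hat{\beta}
  = \frac{\sum_{h,i} w_{hi}\,(x_{hi} - \bar{x}_w)(y_{hi} - \bar{y}_w)}
         {\sum_{h,i} w_{hi}\,(x_{hi} - \bar{x}_w)^2},
\qquad
\hat{\alpha} = \bar{y}_w - \hat{\beta}\, \bar{x}_w
```

The strata do not change these point estimates; they enter when the standard errors of the coefficients are computed, which is why the STRATA statement is still needed for valid inference.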

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.