 The SURVEYREG Procedure

## Stratified Sampling

Suppose that the previous student sample is actually drawn using a stratified sample design. The strata are the grades in the junior high school: 7th, 8th, and 9th. Within each stratum, a simple random sample is selected. Table 62.1 provides the number of students in each grade.

| Grade | Number of Students |
|---|---|
| 7 | 1,824 |
| 8 | 1,025 |
| 9 | 1,151 |
| Total | 4,000 |

In order to analyze this sample using PROC SURVEYREG, you need to input the stratification information by creating a SAS data set for Table 62.1. The following SAS statements create a data set called StudentTotal.

```
data StudentTotal;
   input Grade _TOTAL_;   /* stratum identifier and stratum population total */
   datalines;
7 1824
8 1025
9 1151
;
```

The variable Grade is the stratification variable, and the variable _TOTAL_ contains the total numbers of students in the strata in the survey population. PROC SURVEYREG requires you to use the keyword _TOTAL_ as the name of the variable that contains the population total information.

The following statements demonstrate how you can fit the linear model while incorporating the sample design information (stratification).

```
title1 'Ice Cream Spending Analysis';
title2 'Stratified Simple Random Sampling Design';
proc surveyreg data=IceCream total=StudentTotal;
   strata Grade / list;   /* stratification variable; LIST prints stratum information */
   class Kids;
   model Spending = Income Kids / solution;
run;
```

Comparing these statements to those in the section "Simple Random Sampling", you can see that the TOTAL=StudentTotal option replaces the previous TOTAL=4000 option. When the population totals and sample sizes differ among strata, you must provide the population totals in a data set.

The STRATA statement specifies the stratification variable Grade. The LIST option in the STRATA statement requests that the stratification information be included in the output.

Ice Cream Spending Analysis
Stratified Simple Random Sampling Design

The SURVEYREG Procedure
Regression Analysis for Dependent Variable Spending

| Data Summary | |
|---|---|
| Number of Observations | 40 |
| Mean of Spending | 8.75000 |
| Sum of Spending | 350.00000 |

| Design Summary | |
|---|---|
| Number of Strata | 3 |

| Fit Statistics | |
|---|---|
| R-square | 0.8132 |
| Root MSE | 2.4506 |
| Denominator DF | 37 |

Figure 62.4: Summary of the Regression

Figure 62.4 summarizes the data information, the sample design information, and the fit information. Note that, because of the stratification, the denominator degrees of freedom for F tests and t tests are 37 (the 40 observations minus the 3 strata), which differs from the analysis in Figure 62.1.

| Stratum Index | Grade | N Obs | Population Total | Sampling Rate |
|---|---|---|---|---|
| 1 | 7 | 20 | 1824 | 0.01 |
| 2 | 8 | 9 | 1025 | 0.01 |
| 3 | 9 | 11 | 1151 | 0.01 |

| Class Variable | Levels | Values |
|---|---|---|
| Kids | 4 | 1 2 3 4 |

Figure 62.5: Stratification and Classification Information

For each stratum, Figure 62.5 displays the stratum identification, the number of observations (the sample size), the total number of students, and the calculated sampling rate (the sampling fraction). It also lists the levels of the classification variable Kids.
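The sampling rate in each stratum is the stratum sample size divided by the stratum population total. The following DATA step is a minimal sketch that recomputes the rates from the counts shown in Figure 62.5. The data set name StratumRates and the variable names NObs and SamplingRate are hypothetical and are used only for this illustration.

```sas
/* Sketch only: recompute the stratum sampling rates shown in Figure 62.5. */
/* StratumRates, NObs, and SamplingRate are illustrative names.            */
data StratumRates;
   input Grade NObs _TOTAL_;
   SamplingRate = NObs / _TOTAL_;   /* sample size / population total */
   datalines;
7 20 1824
8  9 1025
9 11 1151
;

proc print data=StratumRates noobs;
   format SamplingRate 8.4;         /* show four decimal places */
run;
```

With four decimal places the three rates are no longer identical (roughly 0.0110, 0.0088, and 0.0096), although each rounds to the 0.01 displayed in the figure.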

ANOVA for Dependent Variable Spending

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 4 | 915.310 | 228.8274 | 38.10 | <.0001 |
| Error | 35 | 210.190 | 6.0054 | | |
| Corrected Total | 39 | 1125.500 | | | |

Tests of Model Effects

| Effect | Num DF | F Value | Pr > F |
|---|---|---|---|
| Model | 4 | 114.60 | <.0001 |
| Intercept | 1 | 150.05 | <.0001 |
| Income | 1 | 317.63 | <.0001 |
| Kids | 3 | 0.93 | 0.4355 |

NOTE: The denominator degrees of freedom for the F tests is 37.

Figure 62.6: Testing Effects

Figure 62.6 displays the ANOVA table for the regression and the tests for the significance of model effects under the stratified sample design. The Income effect is significant (p < 0.0001), while the Kids effect is not significant at the 5% level (p = 0.4355).

Estimated Regression Coefficients

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Intercept | -26.084677 | 2.48241893 | -10.51 | <.0001 |
| Income | 0.775330 | 0.04350401 | 17.82 | <.0001 |
| Kids 1 | 0.897655 | 1.11778377 | 0.80 | 0.4271 |
| Kids 2 | 1.494032 | 1.25209199 | 1.19 | 0.2404 |
| Kids 3 | -0.513181 | 1.36853454 | -0.37 | 0.7098 |
| Kids 4 | 0.000000 | 0.00000000 | . | . |

NOTE: The denominator degrees of freedom for the t tests is 37.

NOTE: Matrix X'X is singular and a generalized inverse was used to solve the normal equations. Estimates are not unique.

Figure 62.7: Regression Coefficients

The regression coefficient estimates for the stratified sample are displayed in Figure 62.7. The standard errors of the estimates and associated t tests are also shown in this table.
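Each t value in Figure 62.7 is the estimate divided by its standard error, and the p-value is taken from a t distribution with the 37 denominator degrees of freedom. The following DATA step is a rough sketch of that relationship, using the estimates and standard errors copied from the figure. The data set name CoefCheck and its variable names are hypothetical, and the two-sided p-value formula is an assumption about how the displayed values are obtained, not a statement of the procedure's internal computation.

```sas
/* Sketch only: relate the t values and p-values in Figure 62.7 to the */
/* estimates, standard errors, and the 37 denominator DF.              */
/* CoefCheck and its variable names are illustrative.                  */
data CoefCheck;
   length Parameter $ 12;
   input Parameter $ Estimate StdErr;
   tValue = Estimate / StdErr;                  /* estimate / standard error   */
   pValue = 2 * (1 - probt(abs(tValue), 37));   /* two-sided p-value, 37 DF    */
   datalines;
Intercept -26.084677 2.48241893
Income      0.775330 0.04350401
Kids1       0.897655 1.11778377
Kids2       1.494032 1.25209199
Kids3      -0.513181 1.36853454
;

proc print data=CoefCheck noobs;
   format tValue 8.2 pValue pvalue6.4;
run;
```

The reference level Kids 4 is omitted because its estimate and standard error are both zero, so no t value is defined for it.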

You can request other statistics and tests using PROC SURVEYREG. You can also analyze data from a more complex sample design. The remainder of this chapter provides more detailed information.
