 The SURVEYREG Procedure

## Example 62.5: Regression Estimator for Stratified Sample

This example uses the corn yield data from the previous example to illustrate how to construct a regression estimator for a stratified sample design.

As in Example 62.3, incorporating auxiliary information into a regression estimator enables the procedure to produce more accurate estimates of the population characteristics of interest. In this example, the sample is selected with a stratified sampling design. The auxiliary information is the total farm area in each state (summed over the state's regions), as displayed in Table 62.4. You want to use this information to estimate the total corn yield under the three linear models given in Example 62.4.

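Table 62.4: Information for Each Stratum

| Stratum | State | Region | Number of Farms in Population | Number of Farms in Sample | Total Farm Area in State |
|---:|---|---:|---:|---:|---:|
| 1 | Iowa | 1 | 100 | 3 | 13,200 |
| 2 | Iowa | 2 | 50 | 5 | |
| 3 | Iowa | 3 | 15 | 3 | |
| 4 | Nebraska | 1 | 30 | 6 | 8,750 |
| 5 | Nebraska | 2 | 40 | 2 | |
| Total | | | 235 | 19 | 21,950 |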
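The population counts and farm areas in Table 62.4 supply the coefficients for the ESTIMATE statements in this example. As a quick arithmetic check of the table totals (the symbols $N$, $n$, and $X$ are introduced here only for illustration):

$$
\begin{aligned}
N &= 100 + 50 + 15 + 30 + 40 = 235, \\
N_{\text{Iowa}} &= 100 + 50 + 15 = 165, \qquad N_{\text{Nebraska}} = 30 + 40 = 70, \\
n &= 3 + 5 + 3 + 6 + 2 = 19, \\
X &= 13{,}200 + 8{,}750 = 21{,}950 .
\end{aligned}
$$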

The regression estimator of the total corn yield under Model I can be obtained by using PROC SURVEYREG with an ESTIMATE statement:

```
title1 'Estimate Corn Yield from Farm Size';
title2 'Model I: Same Intercept and Slope';
proc surveyreg data=Farms total=TotalInStrata;
   strata State Region / list;     /* 5 strata: State x Region          */
   class  State Region;            /* needed to use strata as an effect */
   model  CornYield = FarmArea State*Region / solution;
   weight Weight;
   /* coefficients are population totals from Table 62.4 */
   estimate 'Estimate of CornYield under Model I'
            INTERCEPT 235 FarmArea 21950
            State*Region 100 50 15 30 40 / e;
run;
```

To impose the constraint that, in each stratum, the weighted total number of farms equals the total number of farms in that stratum, you can include the strata as an effect in the MODEL statement (the effect State*Region). Consequently, the CLASS statement must list the STRATA variables, State and Region, as classification variables. The following ESTIMATE statement specifies the regression estimator, which is a linear function of the regression parameters:

```
   estimate 'Estimate of CornYield under Model I'
            INTERCEPT 235 FarmArea 21950
            State*Region 100 50 15 30 40 / e;
```

This linear function contains the population total of each explanatory variable in the model. Because the sampling units in this example are farms, the coefficient for Intercept in the ESTIMATE statement is the total number of farms (235), the coefficient for FarmArea is the total farm area listed in Table 62.4 (21,950), and the coefficients for the effect State*Region are the numbers of farms in the individual strata (100, 50, 15, 30, and 40, as displayed in Table 62.4).
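In notation introduced here only for illustration, write Model I with the strata effect as $y_i = \beta_0 + \beta_1 x_i + \sum_{h=1}^{5} \gamma_h \delta_{hi} + \varepsilon_i$, where $x_i$ is FarmArea and $\delta_{hi}$ indicates whether farm $i$ is in stratum $h$. Because the model contains an intercept and stratum indicators, the weighted residuals sum to zero, so the regression estimator reduces to the sum of the fitted values over all $N$ farms in the population. That sum is a linear function of the population totals, and those totals are the ESTIMATE coefficients:

$$
\hat{T}_y = 235\,\hat\beta_0 + 21{,}950\,\hat\beta_1
  + 100\,\hat\gamma_1 + 50\,\hat\gamma_2 + 15\,\hat\gamma_3 + 30\,\hat\gamma_4 + 40\,\hat\gamma_5 .
$$

The model is overparameterized, but this function is estimable because the Intercept coefficient equals the sum of the stratum coefficients: $100 + 50 + 15 + 30 + 40 = 235$.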

Output 62.5.1: Regression Estimator for the Total of CornYield under Model I

 Estimate Corn Yield from Farm Size Model I: Same Intercept and Slope

 The SURVEYREG Procedure Regression Analysis for Dependent Variable CornYield

Analysis of Estimable Functions

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---:|---:|---:|---:|
| Estimate of CornYield under Model I | 7463.52329 | 926.841541 | 8.05 | <.0001 |

 NOTE: The denominator degrees of freedom for the t tests is 14.

Output 62.5.1 displays the results of the ESTIMATE statement. The regression estimator for the total of CornYield in Iowa and Nebraska is 7464 under Model I, with a standard error of 927.
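The t value in Output 62.5.1 is the estimate divided by its standard error. The 14 denominator degrees of freedom agree with the usual design-based rule for a stratified design without clusters, namely the number of sampled units minus the number of strata ($n = 19$ farms and $H = 5$ strata, from Table 62.4):

$$
t = \frac{7463.52329}{926.841541} \approx 8.05, \qquad \text{df} = n - H = 19 - 5 = 14 .
$$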

Under Model II, a regression estimator for totals can be obtained using the following statements.

```
title1 'Estimate Corn Yield from Farm Size';
title2 'Model II: Same Intercept, Different Slopes';
proc surveyreg data=FarmsByState total=TotalInStrata;
   strata State Region;
   class  State Region;
   model  CornYield = FarmAreaIA FarmAreaNE
                      State*Region / solution;
   weight Weight;
   estimate 'Total of CornYield under Model II'
            INTERCEPT 235 FarmAreaIA 13200 FarmAreaNE 8750
            State*Region 100 50 15 30 40 / e;
run;
```

In this model, you again include the strata as a fixed effect in the MODEL statement. The other regressors are the auxiliary variables FarmAreaIA and FarmAreaNE (defined in Example 62.4). In the following ESTIMATE statement, the coefficient for Intercept is still the total number of farms, and the coefficients for FarmAreaIA and FarmAreaNE are the total farm areas in Iowa and Nebraska, respectively, as displayed in Table 62.4. The coefficients for the strata effect are the numbers of farms in the individual strata.
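Under Model II, the same reasoning as for Model I gives a linear function of population totals. The sketch below assumes, as the ESTIMATE coefficients imply, that FarmAreaIA carries the farm area for Iowa farms and is 0 for Nebraska farms (and FarmAreaNE the reverse), so that their population totals are 13,200 and 8,750. The parameter symbols are introduced here for illustration only:

$$
\hat{T}_y = 235\,\hat\beta_0 + 13{,}200\,\hat\beta_{\text{IA}} + 8{,}750\,\hat\beta_{\text{NE}}
  + \sum_{h=1}^{5} N_h\,\hat\gamma_h,
\qquad (N_1, \dots, N_5) = (100, 50, 15, 30, 40).
$$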

```
   estimate 'Total of CornYield under Model II'
            INTERCEPT 235 FarmAreaIA 13200 FarmAreaNE 8750
            State*Region 100 50 15 30 40 / e;
```

Output 62.5.2: Regression Estimator for the Total of CornYield under Model II

 Estimate Corn Yield from Farm Size Model II: Same Intercept, Different Slopes

 The SURVEYREG Procedure Regression Analysis for Dependent Variable CornYield

Analysis of Estimable Functions

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---:|---:|---:|---:|
| Total of CornYield under Model II | 7580.48657 | 859.180439 | 8.82 | <.0001 |

 NOTE: The denominator degrees of freedom for the t tests is 14.

Output 62.5.2 shows that the regression estimator of the total corn yield in the two states under Model II is 7580, with a standard error of 859. This standard error is slightly smaller than the one under Model I (927).
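To put "slightly smaller" in numbers, compare the two standard errors from Output 62.5.1 and Output 62.5.2, and their squares (the variances):

$$
\frac{859.180439}{926.841541} \approx 0.927, \qquad
\left(\frac{859.180439}{926.841541}\right)^{2} \approx 0.859 .
$$

That is, the standard error under Model II is about 7% smaller, and the variance about 14% smaller, than under Model I.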

Finally, you can apply Model III to the data to estimate the total corn yield. Under Model III, you can also obtain regression estimators of the total corn yield for each state. The following statements use three ESTIMATE statements to produce the three regression estimators:

```
title1 'Estimate Corn Yield from Farm Size';
title2 'Model III: Different Intercepts and Slopes';
proc surveyreg data=FarmsByState total=TotalInStrata;
   strata State Region;
   class  State Region;
   /* NOINT with the State effect gives a separate intercept per state */
   model  CornYield = State FarmAreaIA FarmAreaNE
                      State*Region / noint solution;
   weight Weight;
   estimate 'Total CornYield in Iowa under Model III'
            State 165 0 FarmAreaIA 13200 FarmAreaNE 0
            State*Region 100 50 15 0 0 / e;
   estimate 'Total CornYield in Nebraska under Model III'
            State 0 70 FarmAreaIA 0 FarmAreaNE 8750
            State*Region 0 0 0 30 40 / e;
   estimate 'Total CornYield in both states under Model III'
            State 165 70 FarmAreaIA 13200 FarmAreaNE 8750
            State*Region 100 50 15 30 40 / e;
run;
```

The fixed effect State is added to the MODEL statement, together with the NOINT option, so that each state has its own intercept. The coefficients for the explanatory variables differ among the ESTIMATE statements, depending on which total is being estimated. For example, consider the following ESTIMATE statement:

```
   estimate 'Total CornYield in Iowa under Model III'
            State 165 0 FarmAreaIA 13200 FarmAreaNE 0
            State*Region 100 50 15 0 0 / e;
```

The coefficients for the effect State are 165 and 0. The coefficient 165 is the total number of farms in Iowa, and the Nebraska coefficient is 0 because only the total corn yield in Iowa is being estimated. Likewise, FarmAreaIA receives the Iowa farm area (13,200) and FarmAreaNE receives 0. The coefficients for the strata effect State*Region are the numbers of farms in the three Iowa regions (100, 50, and 15, as displayed in Table 62.4), followed by 0 for the two Nebraska strata.
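Written out with parameter symbols introduced here for illustration ($\alpha$ for the State intercepts, $\beta$ for the farm-area slopes, $\gamma_h$ for the stratum effects), the Iowa estimator is

$$
\hat{T}_{\text{IA}} = 165\,\hat\alpha_{\text{IA}} + 0\cdot\hat\alpha_{\text{NE}}
  + 13{,}200\,\hat\beta_{\text{IA}} + 0\cdot\hat\beta_{\text{NE}}
  + 100\,\hat\gamma_1 + 50\,\hat\gamma_2 + 15\,\hat\gamma_3 .
$$

Because State*Region is nested in State, a function like this is estimable when each State coefficient equals the sum of the State*Region coefficients for that state: $165 = 100 + 50 + 15$ for Iowa and, in the Nebraska statement, $70 = 30 + 40$.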

Output 62.5.3: Regression Estimator for the Total of CornYield under Model III

 Estimate Corn Yield from Farm Size Model III: Different Intercepts and Slopes

 The SURVEYREG Procedure Regression Analysis for Dependent Variable CornYield

Analysis of Estimable Functions

| Parameter | Estimate | Standard Error | t Value | Pr > \|t\| |
|---|---:|---:|---:|---:|
| Total CornYield in Iowa under Model III | 6246.10697 | 851.272372 | 7.34 | <.0001 |
| Total CornYield in Nebraska under Model III | 1334.37961 | 116.302948 | 11.47 | <.0001 |
| Total CornYield in both states under Model III | 7580.48657 | 859.180439 | 8.82 | <.0001 |

 NOTE: The denominator degrees of freedom for the t tests is 14.

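Output 62.5.3 displays the results from the three regression estimators under Model III. Because the ESTIMATE coefficients for the two states add up to the coefficients for both states, the estimated total for both states equals the sum of the estimated totals for Iowa and Nebraska: 6246 + 1334 = 7580. This estimate and its standard error (859) are the same as under Model II. Because the strata effect State*Region already allows a separate intercept in each state, adding the State effect does not change the fitted model. Moreover, Model III fits separate parameters for each state and no stratum crosses a state boundary, so the Iowa and Nebraska estimators are uncorrelated. As a result, the variance of the regression estimator of the total corn yield in both states is the sum of the variances of the regression estimators for the individual states. Therefore, it is not necessary to use Model III to obtain the regression estimator of the total corn yield unless you also need to estimate the total corn yield for each individual state.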
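Both additivity statements can be checked directly against the values in Output 62.5.3. The last digit of the first sum differs only because of rounding in the printed output:

$$
\begin{aligned}
6246.10697 + 1334.37961 &= 7580.48658 \approx 7580.48657, \\
851.272372^2 + 116.302948^2 &\approx 724{,}664.65 + 13{,}526.38 = 738{,}191.03 \approx 859.180439^2 .
\end{aligned}
$$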
