The REG Procedure

Example 55.1: Aerobic Fitness Prediction


In this example, aerobic fitness (measured by the ability to consume oxygen) is modeled as a function of measurements from some simple exercise tests. The goal is to develop an equation to predict fitness based on the exercise tests rather than on expensive and cumbersome oxygen consumption measurements. Three model-selection methods are used: forward selection, backward selection, and MAXR selection. The following statements produce Output 55.1.1 through Output 55.1.5. (Collinearity diagnostics for the full model are shown in Figure 55.41.)

   *-------------------Data on Physical Fitness-------------------*
   | These measurements were made on men involved in a physical   |
   | fitness course at N.C.State Univ. The variables are Age      |
   | (years), Weight (kg), Oxygen intake rate (ml per kg body     |
   | weight per minute), time to run 1.5 miles (minutes), heart   |
   | rate while resting, heart rate while running (same time      |
   | Oxygen rate measured), and maximum heart rate recorded while |
   | running.                                                     |
   | ***Certain values of MaxPulse were changed for this analysis.|
   *--------------------------------------------------------------*;
   data fitness;
      input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
      datalines;
   44 89.47 44.609 11.37 62 178 182   40 75.07 45.313 10.07 62 185 185
   44 85.84 54.297  8.65 45 156 168   42 68.15 59.571  8.17 40 166 172
   38 89.02 49.874  9.22 55 178 180   47 77.45 44.811 11.63 58 176 176
   40 75.98 45.681 11.95 70 176 180   43 81.19 49.091 10.85 64 162 170
   44 81.42 39.442 13.08 63 174 176   38 81.87 60.055  8.63 48 170 186
   44 73.03 50.541 10.13 45 168 168   45 87.66 37.388 14.03 56 186 192
   45 66.45 44.754 11.12 51 176 176   47 79.15 47.273 10.60 47 162 164
   54 83.12 51.855 10.33 50 166 170   49 81.42 49.156  8.95 44 180 185
   51 69.63 40.836 10.95 57 168 172   51 77.91 46.672 10.00 48 162 168
   48 91.63 46.774 10.25 48 162 164   49 73.37 50.388 10.08 67 168 168
   57 73.37 39.407 12.63 58 174 176   54 79.38 46.080 11.17 62 156 165
   52 76.32 45.441  9.63 48 164 166   50 70.87 54.625  8.92 48 146 155
   51 67.25 45.118 11.08 48 172 172   54 91.63 39.203 12.88 44 168 172
   51 73.71 45.790 10.47 59 186 188   57 59.08 50.545  9.93 49 148 155
   49 76.32 48.673  9.40 56 186 188   48 61.24 47.920 11.50 52 170 176
   52 82.78 47.467 10.50 53 170 172
   ;
   proc reg data=fitness;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=forward;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=backward;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=maxr;
   run;

The FORWARD model-selection method begins with no variables in the model and adds RunTime, then Age,...

Output 55.1.1: Forward Selection Method: PROC REG
 
The REG Procedure
Model: MODEL1
Dependent Variable: Oxygen
Forward Selection: Step 1

 

Variable RunTime Entered: R-Square = 0.7434 and C(p) = 13.6988

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        632.90010     632.90010     84.01   <.0001
Error              29        218.48144       7.53384
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             82.42177          3.85530   3443.36654    457.05   <.0001
RunTime               -3.31056          0.36119    632.90010     84.01   <.0001

Bounds on condition number: 1, 1
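
The R-Square and F Value reported for Step 1 can be recovered from the sums of squares in the Analysis of Variance table: R-Square is the Model sum of squares divided by the Corrected Total sum of squares, and F is the Model mean square divided by the Error mean square. The DATA step below is a minimal sketch of that arithmetic; the variable names are illustrative and are not produced by PROC REG.

```sas
/* Sketch: recompute the Step 1 statistics from the printed sums of squares. */
data _null_;
   ss_model = 632.90010;   /* Model SS, 1 DF            */
   ss_error = 218.48144;   /* Error SS, 29 DF           */
   ss_total = 851.38154;   /* Corrected Total SS, 30 DF */

   rsquare = ss_model / ss_total;               /* about 0.7434 */
   f_value = (ss_model / 1) / (ss_error / 29);  /* about 84.01  */
   p_value = sdf('F', f_value, 1, 29);          /* upper-tail F probability */

   put rsquare= 6.4 f_value= 8.2 p_value= pvalue6.4;
run;
```

The same ratios apply at every later step, with the degrees of freedom taken from the corresponding Analysis of Variance table.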

 

Forward Selection: Step 2

 

Variable Age Entered: R-Square = 0.7642 and C(p) = 12.3894

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2        650.66573     325.33287     45.38   <.0001
Error              28        200.71581       7.16842
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             88.46229          5.37264   1943.41071    271.11   <.0001
Age                   -0.15037          0.09551     17.76563      2.48   0.1267
RunTime               -3.20395          0.35877    571.67751     79.75   <.0001

Bounds on condition number: 1.0369, 4.1478


...then RunPulse, then MaxPulse,...

 

 

Forward Selection: Step 3

 

Variable RunPulse Entered: R-Square = 0.8111 and C(p) = 6.9596

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3        690.55086     230.18362     38.64   <.0001
Error              27        160.83069       5.95669
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            111.71806         10.23509    709.69014    119.14   <.0001
Age                   -0.25640          0.09623     42.28867      7.10   0.0129
RunTime               -2.82538          0.35828    370.43529     62.19   <.0001
RunPulse              -0.13091          0.05059     39.88512      6.70   0.0154

Bounds on condition number: 1.3548, 11.597

 

Forward Selection: Step 4

 

Variable MaxPulse Entered: R-Square = 0.8368 and C(p) = 4.8800

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4        712.45153     178.11288     33.33   <.0001
Error              26        138.93002       5.34346
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             98.14789         11.78569    370.57373     69.35   <.0001
Age                   -0.19773          0.09564     22.84231      4.27   0.0488
RunTime               -2.76758          0.34054    352.93570     66.05   <.0001
RunPulse              -0.34811          0.11750     46.90089      8.78   0.0064
MaxPulse               0.27051          0.13362     21.90067      4.10   0.0533

Bounds on condition number: 8.4182, 76.851
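
The bounds on the condition number rise from 1.3548, 11.597 at Step 3 to 8.4182, 76.851 once MaxPulse joins RunPulse in the model, which hints that the two pulse measurements carry overlapping information. One way to look at this more closely is to request variance inflation factors and collinearity diagnostics for the four-variable model. The following is a minimal sketch using the VIF and COLLIN options of the MODEL statement; it has not been run here, so no output is shown.

```sas
/* Sketch: collinearity checks for the four-variable model from Step 4. */
proc reg data=fitness;
   model Oxygen = Age RunTime RunPulse MaxPulse
         / vif       /* variance inflation factor for each estimate  */
           collin;   /* eigenvalue and condition-index diagnostics   */
run;
quit;
```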


...and finally, Weight. The final variable available to add to the model, RestPulse, is not added since it does not meet the 50% significance-level criterion for entry into the model (the default value of the SLE option is 0.5 for FORWARD selection).

 
Forward Selection: Step 5

 

Variable Weight Entered: R-Square = 0.8480 and C(p) = 5.1063

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5        721.97309     144.39462     27.90   <.0001
Error              25        129.40845       5.17634
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            102.20428         11.97929    376.78935     72.79   <.0001
Age                   -0.21962          0.09550     27.37429      5.29   0.0301
Weight                -0.07230          0.05331      9.52157      1.84   0.1871
RunTime               -2.68252          0.34099    320.35968     61.89   <.0001
RunPulse              -0.37340          0.11714     52.59624     10.16   0.0038
MaxPulse               0.30491          0.13394     26.82640      5.18   0.0316

Bounds on condition number: 8.7312, 104.83

No other variable met the 0.5000 significance level for entry into the model.

 

Summary of Forward Selection
Step   Variable Entered   Number Vars In   Partial R-Square   Model R-Square      C(p)   F Value   Pr > F
   1   RunTime                         1             0.7434           0.7434   13.6988     84.01   <.0001
   2   Age                             2             0.0209           0.7642   12.3894      2.48   0.1267
   3   RunPulse                        3             0.0468           0.8111    6.9596      6.70   0.0154
   4   MaxPulse                        4             0.0257           0.8368    4.8800      4.10   0.0533
   5   Weight                          5             0.0112           0.8480    5.1063      1.84   0.1871
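
The entry threshold used by forward selection can be stated explicitly with the SLENTRY= option (SLE= is its short form) instead of relying on the 0.5 default. The following is a minimal sketch; the value 0.10 is an illustrative choice and does not come from the example.

```sas
/* Sketch: forward selection with a stricter, explicitly stated entry level. */
proc reg data=fitness;
   model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=forward
           slentry=0.10;   /* illustrative value; the default for FORWARD is 0.5 */
run;
quit;
```

Judging from the Pr > F column of the summary, Age (0.1267) would presumably fail a 0.10 threshold at Step 2, so a much smaller model would be expected. This variant has not been run here.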

The BACKWARD model-selection method begins with the full model.

Output 55.1.2: Backward Selection Method: PROC REG
 
The REG Procedure
Model: MODEL2
Dependent Variable: Oxygen
Backward Elimination: Step 0

 

All Variables Entered: R-Square = 0.8487 and C(p) = 7.0000

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               6        722.54361     120.42393     22.43   <.0001
Error              24        128.83794       5.36825
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            102.93448         12.40326    369.72831     68.87   <.0001
Age                   -0.22697          0.09984     27.74577      5.17   0.0322
Weight                -0.07418          0.05459      9.91059      1.85   0.1869
RunTime               -2.62865          0.38456    250.82210     46.72   <.0001
RunPulse              -0.36963          0.11985     51.05806      9.51   0.0051
RestPulse             -0.02153          0.06605      0.57051      0.11   0.7473
MaxPulse               0.30322          0.13650     26.49142      4.93   0.0360

Bounds on condition number: 8.7438, 137.13
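
The C(p) values printed in these outputs appear consistent with the usual definition of Mallows' statistic, C(p) = SSE_p / s^2 - (n - 2p), where SSE_p is the Error sum of squares of the candidate model, s^2 is the Error mean square of the full model, n is the number of observations, and p is the number of parameters including the intercept. That PROC REG uses exactly this form is an assumption here, checked only against two of the printed values. The variable names in the sketch are illustrative.

```sas
/* Sketch: reproduce two printed C(p) values, assuming the usual Mallows formula. */
data _null_;
   n  = 31;        /* observations: Corrected Total DF (30) + 1    */
   s2 = 5.36825;   /* Error mean square of the full model (Step 0) */

   /* Full model: 6 variables plus intercept, Error SS = 128.83794 */
   p = 7;
   cp_full = 128.83794 / s2 - (n - 2*p);      /* about 7.0000  */

   /* RunTime-only model: 1 variable plus intercept, Error SS = 218.48144 */
   p = 2;
   cp_runtime = 218.48144 / s2 - (n - 2*p);   /* about 13.6988 */

   put cp_full= 8.4 cp_runtime= 8.4;
run;
```

For the full model the statistic equals p by construction, which is why Step 0 reports C(p) = 7.0000.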


RestPulse is the first variable deleted,...

 
Backward Elimination: Step 1

 

Variable RestPulse Removed: R-Square = 0.8480 and C(p) = 5.1063

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5        721.97309     144.39462     27.90   <.0001
Error              25        129.40845       5.17634
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            102.20428         11.97929    376.78935     72.79   <.0001
Age                   -0.21962          0.09550     27.37429      5.29   0.0301
Weight                -0.07230          0.05331      9.52157      1.84   0.1871
RunTime               -2.68252          0.34099    320.35968     61.89   <.0001
RunPulse              -0.37340          0.11714     52.59624     10.16   0.0038
MaxPulse               0.30491          0.13394     26.82640      5.18   0.0316

Bounds on condition number: 8.7312, 104.83

...followed by Weight. No other variables are deleted from the model since the variables remaining (Age, RunTime, RunPulse, and MaxPulse) are all significant at the 10% significance level (the default value of the SLS option is 0.1 for the BACKWARD elimination method).

 
Backward Elimination: Step 2

 

Variable Weight Removed: R-Square = 0.8368 and C(p) = 4.8800

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4        712.45153     178.11288     33.33   <.0001
Error              26        138.93002       5.34346
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             98.14789         11.78569    370.57373     69.35   <.0001
Age                   -0.19773          0.09564     22.84231      4.27   0.0488
RunTime               -2.76758          0.34054    352.93570     66.05   <.0001
RunPulse              -0.34811          0.11750     46.90089      8.78   0.0064
MaxPulse               0.27051          0.13362     21.90067      4.10   0.0533

Bounds on condition number: 8.4182, 76.851

All variables left in the model are significant at the 0.1000 level.

 

Summary of Backward Elimination
Step   Variable Removed   Number Vars In   Partial R-Square   Model R-Square      C(p)   F Value   Pr > F
   1   RestPulse                       5             0.0007           0.8480    5.1063      0.11   0.7473
   2   Weight                          4             0.0112           0.8368    4.8800      1.84   0.1871
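
The retention threshold used by backward elimination can likewise be stated explicitly with the SLSTAY= option (SLS= is its short form). The following is a minimal sketch; the value 0.05 is an illustrative choice and does not come from the example.

```sas
/* Sketch: backward elimination with a stricter, explicitly stated stay level. */
proc reg data=fitness;
   model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=backward
           slstay=0.05;   /* illustrative value; the default for BACKWARD is 0.1 */
run;
quit;
```

With a 0.05 level, MaxPulse (Pr > F of 0.0533 after Step 2) would no longer qualify to stay, so elimination would be expected to continue past the four-variable model. This variant has not been run here.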

The MAXR method tries to find the "best" one-variable model, the "best" two-variable model, and so on. For the fitness data, the one-variable model contains RunTime; the two-variable model contains RunTime and Age...

Output 55.1.3: Maximum R-Square Improvement Selection Method: PROC REG
 
The REG Procedure
Model: MODEL3
Dependent Variable: Oxygen
Maximum R-Square Improvement: Step 1

 

Variable RunTime Entered: R-Square = 0.7434 and C(p) = 13.6988

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               1        632.90010     632.90010     84.01   <.0001
Error              29        218.48144       7.53384
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             82.42177          3.85530   3443.36654    457.05   <.0001
RunTime               -3.31056          0.36119    632.90010     84.01   <.0001

Bounds on condition number: 1, 1

The above model is the best 1-variable model found.

 

Maximum R-Square Improvement: Step 2

 

Variable Age Entered: R-Square = 0.7642 and C(p) = 12.3894

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2        650.66573     325.33287     45.38   <.0001
Error              28        200.71581       7.16842
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             88.46229          5.37264   1943.41071    271.11   <.0001
Age                   -0.15037          0.09551     17.76563      2.48   0.1267
RunTime               -3.20395          0.35877    571.67751     79.75   <.0001

Bounds on condition number: 1.0369, 4.1478

The above model is the best 2-variable model found.
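
The phrase "best ... model found" reflects that MAXR is a stepwise-style search, so the model it reports for a given size is the best one it encountered rather than a guaranteed optimum. As a cross-check, SELECTION=RSQUARE evaluates the candidate subsets of each size directly. The following is a minimal sketch; BEST=1 limits the listing to the top model of each size and CP adds Mallows' C(p) to it. It has not been run here, so no output is shown.

```sas
/* Sketch: list the highest R-square subset of each size for comparison with MAXR. */
proc reg data=fitness;
   model Oxygen = Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=rsquare
           best=1   /* report only the top subset for each number of variables */
           cp;      /* also print Mallows' C(p) for each listed subset         */
run;
quit;
```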

...the three-variable model contains RunTime, Age, and RunPulse; the four-variable model contains Age, RunTime, RunPulse, and MaxPulse...

 

Maximum R-Square Improvement: Step 3

 

Variable RunPulse Entered: R-Square = 0.8111 and C(p) = 6.9596

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               3        690.55086     230.18362     38.64   <.0001
Error              27        160.83069       5.95669
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept            111.71806         10.23509    709.69014    119.14   <.0001
Age                   -0.25640          0.09623     42.28867      7.10   0.0129
RunTime               -2.82538          0.35828    370.43529     62.19   <.0001
RunPulse              -0.13091          0.05059     39.88512      6.70   0.0154

Bounds on condition number: 1.3548, 11.597

The above model is the best 3-variable model found.

 

Maximum R-Square Improvement: Step 4

 

Variable MaxPulse Entered: R-Square = 0.8368 and C(p) = 4.8800

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               4        712.45153     178.11288     33.33   <.0001
Error              26        138.93002       5.34346
Corrected Total    30        851.38154

Variable    Parameter Estimate   Standard Error   Type II SS   F Value   Pr > F
Intercept             98.14789         11.78569    370.57373     69.35   <.0001
Age                   -0.19773          0.09564     22.84231      4.27   0.0488
RunTime               -2.76758          0.34054    352.93570     66.05   <.0001
RunPulse              -0.34811          0.11750     46.90089      8.78   0.0064
MaxPulse               0.27051          0.13362     21.90067      4.10   0.0533

Bounds on condition number: 8.4182, 76.851

The above model is the best 4-variable model found.

...the five-variable model contains Age, Weight, RunTime, RunPulse, and MaxPulse; and finally, the six-variable model contains all the variables in the MODEL statement.

 

Maximum R-Square Improvement: Step 5

 

Variable Weight Entered: R-Square = 0.8480 and C(p) = 5.1063

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               5        721.97309     144.39462     27.90   <.0001
Error 25