Example 55.1: Aerobic Fitness Prediction
Aerobic fitness (measured by the ability to consume
oxygen) is regressed on a few simple exercise tests.
The goal is to develop an equation to predict fitness
based on the exercise tests rather than on expensive
and cumbersome oxygen consumption measurements.
Three model-selection methods are used: forward
selection, backward selection, and MAXR selection.
The following statements produce
Output 55.1.1 through Output 55.1.5.
(Collinearity diagnostics for the full model are
shown in Figure 55.41.)
*-------------------Data on Physical Fitness-------------------*
| These measurements were made on men involved in a physical |
| fitness course at N.C.State Univ. The variables are Age |
| (years), Weight (kg), Oxygen intake rate (ml per kg body |
| weight per minute), time to run 1.5 miles (minutes), heart |
| rate while resting, heart rate while running (same time |
| Oxygen rate measured), and maximum heart rate recorded while |
| running. |
| ***Certain values of MaxPulse were changed for this analysis.|
*--------------------------------------------------------------*;
data fitness;
input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
datalines;
44 89.47 44.609 11.37 62 178 182 40 75.07 45.313 10.07 62 185 185
44 85.84 54.297 8.65 45 156 168 42 68.15 59.571 8.17 40 166 172
38 89.02 49.874 9.22 55 178 180 47 77.45 44.811 11.63 58 176 176
40 75.98 45.681 11.95 70 176 180 43 81.19 49.091 10.85 64 162 170
44 81.42 39.442 13.08 63 174 176 38 81.87 60.055 8.63 48 170 186
44 73.03 50.541 10.13 45 168 168 45 87.66 37.388 14.03 56 186 192
45 66.45 44.754 11.12 51 176 176 47 79.15 47.273 10.60 47 162 164
54 83.12 51.855 10.33 50 166 170 49 81.42 49.156 8.95 44 180 185
51 69.63 40.836 10.95 57 168 172 51 77.91 46.672 10.00 48 162 168
48 91.63 46.774 10.25 48 162 164 49 73.37 50.388 10.08 67 168 168
57 73.37 39.407 12.63 58 174 176 54 79.38 46.080 11.17 62 156 165
52 76.32 45.441 9.63 48 164 166 50 70.87 54.625 8.92 48 146 155
51 67.25 45.118 11.08 48 172 172 54 91.63 39.203 12.88 44 168 172
51 73.71 45.790 10.47 59 186 188 57 59.08 50.545 9.93 49 148 155
49 76.32 48.673 9.40 56 186 188 48 61.24 47.920 11.50 52 170 176
52 82.78 47.467 10.50 53 170 172
;
proc reg data=fitness;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=forward;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=backward;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=maxr;
run;
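Before reading the selection output, it can help to confirm that the double trailing @@ in the INPUT statement read every observation. The following is a minimal sketch, assuming the fitness data set was created by the DATA step above; each variable should report N = 31, which agrees with the Corrected Total DF of 30 shown in the outputs.

```sas
/* Sanity check: count and summarize the values read for each variable. */
proc means data=fitness n nmiss mean min max;
   var Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse;
run;
```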
The FORWARD model-selection method begins with no
variables in the model and adds RunTime, then Age,...
Output 55.1.1: Forward Selection Method: PROC REG
| The REG Procedure |
| Model: MODEL1 |
| Dependent Variable: Oxygen |
| Forward Selection: Step 1 |
| Variable RunTime Entered: R-Square = 0.7434 and C(p) = 13.6988 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 1 | 632.90010 | 632.90010 | 84.01 | <.0001 |
| Error | 29 | 218.48144 | 7.53384 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 82.42177 | 3.85530 | 3443.36654 | 457.05 | <.0001 |
| RunTime | -3.31056 | 0.36119 | 632.90010 | 84.01 | <.0001 |
| Bounds on condition number: 1, 1 |
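The statistics reported for Step 1 can be reproduced by hand from the sums of squares in the table. The following worked arithmetic is a sketch, not part of the original example. It assumes the usual form of Mallows' C(p), in which s^2 is the error mean square of the full six-variable model (5.36825, from Step 0 of Output 55.1.2), p = 2 counts the intercept and RunTime, and n = 31 is the number of observations.

```latex
\begin{aligned}
R^2  &= \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
      = \frac{632.90010}{851.38154} \approx 0.7434 \\[4pt]
F    &= \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
      = \frac{632.90010}{7.53384} \approx 84.01 \\[4pt]
C(p) &= \frac{SSE_p}{s^2} - (n - 2p)
      = \frac{218.48144}{5.36825} - (31 - 4)
      \approx 40.6988 - 27 = 13.6988
\end{aligned}
```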
| Forward Selection: Step 2 |
| Variable Age Entered: R-Square = 0.7642 and C(p) = 12.3894 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 2 | 650.66573 | 325.33287 | 45.38 | <.0001 |
| Error | 28 | 200.71581 | 7.16842 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 88.46229 | 5.37264 | 1943.41071 | 271.11 | <.0001 |
| Age | -0.15037 | 0.09551 | 17.76563 | 2.48 | 0.1267 |
| RunTime | -3.20395 | 0.35877 | 571.67751 | 79.75 | <.0001 |
| Bounds on condition number: 1.0369, 4.1478 |
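The contribution of Age at Step 2 can be checked against its Type II sum of squares. This is an illustrative calculation rather than part of the original example: the partial R-square is the Type II SS divided by the corrected total sum of squares, and the F value is the Type II SS divided by the error mean square of the current model.

```latex
\begin{aligned}
\text{Partial } R^2_{\text{Age}}
   &= \frac{17.76563}{851.38154} \approx 0.0209 \\[4pt]
F_{\text{Age}}
   &= \frac{17.76563}{7.16842} \approx 2.48
\end{aligned}
```

The corresponding p-value, 0.1267, is below the 0.5 entry level, which is why Age is added.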
...then RunPulse, then MaxPulse,...
| Forward Selection: Step 3 |
| Variable RunPulse Entered: R-Square = 0.8111 and C(p) = 6.9596 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 3 | 690.55086 | 230.18362 | 38.64 | <.0001 |
| Error | 27 | 160.83069 | 5.95669 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 111.71806 | 10.23509 | 709.69014 | 119.14 | <.0001 |
| Age | -0.25640 | 0.09623 | 42.28867 | 7.10 | 0.0129 |
| RunTime | -2.82538 | 0.35828 | 370.43529 | 62.19 | <.0001 |
| RunPulse | -0.13091 | 0.05059 | 39.88512 | 6.70 | 0.0154 |
| Bounds on condition number: 1.3548, 11.597 |
| Forward Selection: Step 4 |
| Variable MaxPulse Entered: R-Square = 0.8368 and C(p) = 4.8800 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 4 | 712.45153 | 178.11288 | 33.33 | <.0001 |
| Error | 26 | 138.93002 | 5.34346 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 98.14789 | 11.78569 | 370.57373 | 69.35 | <.0001 |
| Age | -0.19773 | 0.09564 | 22.84231 | 4.27 | 0.0488 |
| RunTime | -2.76758 | 0.34054 | 352.93570 | 66.05 | <.0001 |
| RunPulse | -0.34811 | 0.11750 | 46.90089 | 8.78 | 0.0064 |
| MaxPulse | 0.27051 | 0.13362 | 21.90067 | 4.10 | 0.0533 |
| Bounds on condition number: 8.4182, 76.851 |
...and finally, Weight.
The final variable available to add to the model,
RestPulse, is not added because it does not meet the 50%
significance-level criterion for entry into the model
(the default value of the SLE= option is 0.5 for FORWARD
selection).
| Forward Selection: Step 5 |
| Variable Weight Entered: R-Square = 0.8480 and C(p) = 5.1063 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 5 | 721.97309 | 144.39462 | 27.90 | <.0001 |
| Error | 25 | 129.40845 | 5.17634 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 102.20428 | 11.97929 | 376.78935 | 72.79 | <.0001 |
| Age | -0.21962 | 0.09550 | 27.37429 | 5.29 | 0.0301 |
| Weight | -0.07230 | 0.05331 | 9.52157 | 1.84 | 0.1871 |
| RunTime | -2.68252 | 0.34099 | 320.35968 | 61.89 | <.0001 |
| RunPulse | -0.37340 | 0.11714 | 52.59624 | 10.16 | 0.0038 |
| MaxPulse | 0.30491 | 0.13394 | 26.82640 | 5.18 | 0.0316 |
| Bounds on condition number: 8.7312, 104.83 |
| No other variable met the 0.5000 significance level for entry into the model. |
| Summary of Forward Selection |
| Step | Variable Entered | Number Vars In | Partial R-Square | Model R-Square | C(p) | F Value | Pr > F |
| 1 | RunTime | 1 | 0.7434 | 0.7434 | 13.6988 | 84.01 | <.0001 |
| 2 | Age | 2 | 0.0209 | 0.7642 | 12.3894 | 2.48 | 0.1267 |
| 3 | RunPulse | 3 | 0.0468 | 0.8111 | 6.9596 | 6.70 | 0.0154 |
| 4 | MaxPulse | 4 | 0.0257 | 0.8368 | 4.8800 | 4.10 | 0.0533 |
| 5 | Weight | 5 | 0.0112 | 0.8480 | 5.1063 | 1.84 | 0.1871 |
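The entry threshold that governs forward selection can also be written out explicitly with the SLENTRY= option (SLE= is its short form). The following is a minimal sketch, assuming the fitness data set from above. The first MODEL statement restates the default; the value 0.10 in the second is an arbitrary illustration, not a recommendation from the original example.

```sas
proc reg data=fitness;
   /* Default entry level for FORWARD selection, made explicit */
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=forward slentry=0.5;
   /* Stricter entry level (illustrative value only) */
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=forward slentry=0.10;
run;
quit;
```

Judging from the summary table, Age enters at Step 2 with Pr > F = 0.1267, so the stricter run would be expected to stop after RunTime.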
The BACKWARD model-selection method begins with the full
model.
Output 55.1.2: Backward Selection Method: PROC REG
| The REG Procedure |
| Model: MODEL2 |
| Dependent Variable: Oxygen |
| Backward Elimination: Step 0 |
| All Variables Entered: R-Square = 0.8487 and C(p) = 7.0000 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 6 | 722.54361 | 120.42393 | 22.43 | <.0001 |
| Error | 24 | 128.83794 | 5.36825 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 102.93448 | 12.40326 | 369.72831 | 68.87 | <.0001 |
| Age | -0.22697 | 0.09984 | 27.74577 | 5.17 | 0.0322 |
| Weight | -0.07418 | 0.05459 | 9.91059 | 1.85 | 0.1869 |
| RunTime | -2.62865 | 0.38456 | 250.82210 | 46.72 | <.0001 |
| RunPulse | -0.36963 | 0.11985 | 51.05806 | 9.51 | 0.0051 |
| RestPulse | -0.02153 | 0.06605 | 0.57051 | 0.11 | 0.7473 |
| MaxPulse | 0.30322 | 0.13650 | 26.49142 | 4.93 | 0.0360 |
| Bounds on condition number: 8.7438, 137.13 |
RestPulse is the first variable deleted,...
| Backward Elimination: Step 1 |
| Variable RestPulse Removed: R-Square = 0.8480 and C(p) = 5.1063 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 5 | 721.97309 | 144.39462 | 27.90 | <.0001 |
| Error | 25 | 129.40845 | 5.17634 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 102.20428 | 11.97929 | 376.78935 | 72.79 | <.0001 |
| Age | -0.21962 | 0.09550 | 27.37429 | 5.29 | 0.0301 |
| Weight | -0.07230 | 0.05331 | 9.52157 | 1.84 | 0.1871 |
| RunTime | -2.68252 | 0.34099 | 320.35968 | 61.89 | <.0001 |
| RunPulse | -0.37340 | 0.11714 | 52.59624 | 10.16 | 0.0038 |
| MaxPulse | 0.30491 | 0.13394 | 26.82640 | 5.18 | 0.0316 |
| Bounds on condition number: 8.7312, 104.83 |
...followed by Weight.
No other variables are deleted from the model because the
remaining variables (Age, RunTime, RunPulse, and MaxPulse)
are all significant at the 10% significance level (the
default value of the SLS= option is 0.1 for the BACKWARD
elimination method).
| Backward Elimination: Step 2 |
| Variable Weight Removed: R-Square = 0.8368 and C(p) = 4.8800 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 4 | 712.45153 | 178.11288 | 33.33 | <.0001 |
| Error | 26 | 138.93002 | 5.34346 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 98.14789 | 11.78569 | 370.57373 | 69.35 | <.0001 |
| Age | -0.19773 | 0.09564 | 22.84231 | 4.27 | 0.0488 |
| RunTime | -2.76758 | 0.34054 | 352.93570 | 66.05 | <.0001 |
| RunPulse | -0.34811 | 0.11750 | 46.90089 | 8.78 | 0.0064 |
| MaxPulse | 0.27051 | 0.13362 | 21.90067 | 4.10 | 0.0533 |
| Bounds on condition number: 8.4182, 76.851 |
| All variables left in the model are significant at the 0.1000 level. |
| Summary of Backward Elimination |
| Step | Variable Removed | Number Vars In | Partial R-Square | Model R-Square | C(p) | F Value | Pr > F |
| 1 | RestPulse | 5 | 0.0007 | 0.8480 | 5.1063 | 0.11 | 0.7473 |
| 2 | Weight | 4 | 0.0112 | 0.8368 | 4.8800 | 1.84 | 0.1871 |
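The stay threshold for backward elimination can be stated explicitly with the SLSTAY= option (SLS= is its short form). The following is a minimal sketch, assuming the fitness data set from above. The first MODEL statement restates the default; the value 0.05 in the second is chosen only for illustration.

```sas
proc reg data=fitness;
   /* Default stay level for BACKWARD elimination, made explicit */
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=backward slstay=0.10;
   /* Stricter stay level (illustrative value only) */
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=backward slstay=0.05;
run;
quit;
```

In the four-variable model of Step 2, MaxPulse has the largest p-value (0.0533), so at a 0.05 stay level it would be expected to be the next variable removed.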
The MAXR method tries to find the "best" one-variable
model, the "best" two-variable model, and so on.
For the fitness data, the one-variable model contains RunTime;
the two-variable model contains RunTime and Age...
Output 55.1.3: Maximum R-Square Improvement Selection Method: PROC REG
| The REG Procedure |
| Model: MODEL3 |
| Dependent Variable: Oxygen |
| Maximum R-Square Improvement: Step 1 |
| Variable RunTime Entered: R-Square = 0.7434 and C(p) = 13.6988 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 1 | 632.90010 | 632.90010 | 84.01 | <.0001 |
| Error | 29 | 218.48144 | 7.53384 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 82.42177 | 3.85530 | 3443.36654 | 457.05 | <.0001 |
| RunTime | -3.31056 | 0.36119 | 632.90010 | 84.01 | <.0001 |
| Bounds on condition number: 1, 1 |
| The above model is the best 1-variable model found. |
| Maximum R-Square Improvement: Step 2 |
| Variable Age Entered: R-Square = 0.7642 and C(p) = 12.3894 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 2 | 650.66573 | 325.33287 | 45.38 | <.0001 |
| Error | 28 | 200.71581 | 7.16842 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 88.46229 | 5.37264 | 1943.41071 | 271.11 | <.0001 |
| Age | -0.15037 | 0.09551 | 17.76563 | 2.48 | 0.1267 |
| RunTime | -3.20395 | 0.35877 | 571.67751 | 79.75 | <.0001 |
| Bounds on condition number: 1.0369, 4.1478 |
| The above model is the best 2-variable model found. |
...the three-variable model contains RunTime, Age, and
RunPulse; the four-variable model contains
Age, RunTime, RunPulse, and MaxPulse...
| Maximum R-Square Improvement: Step 3 |
| Variable RunPulse Entered: R-Square = 0.8111 and C(p) = 6.9596 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 3 | 690.55086 | 230.18362 | 38.64 | <.0001 |
| Error | 27 | 160.83069 | 5.95669 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 111.71806 | 10.23509 | 709.69014 | 119.14 | <.0001 |
| Age | -0.25640 | 0.09623 | 42.28867 | 7.10 | 0.0129 |
| RunTime | -2.82538 | 0.35828 | 370.43529 | 62.19 | <.0001 |
| RunPulse | -0.13091 | 0.05059 | 39.88512 | 6.70 | 0.0154 |
| Bounds on condition number: 1.3548, 11.597 |
| The above model is the best 3-variable model found. |
| Maximum R-Square Improvement: Step 4 |
| Variable MaxPulse Entered: R-Square = 0.8368 and C(p) = 4.8800 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 4 | 712.45153 | 178.11288 | 33.33 | <.0001 |
| Error | 26 | 138.93002 | 5.34346 | | |
| Corrected Total | 30 | 851.38154 | | | |
| Variable | Parameter Estimate | Standard Error | Type II SS | F Value | Pr > F |
| Intercept | 98.14789 | 11.78569 | 370.57373 | 69.35 | <.0001 |
| Age | -0.19773 | 0.09564 | 22.84231 | 4.27 | 0.0488 |
| RunTime | -2.76758 | 0.34054 | 352.93570 | 66.05 | <.0001 |
| RunPulse | -0.34811 | 0.11750 | 46.90089 | 8.78 | 0.0064 |
| MaxPulse | 0.27051 | 0.13362 | 21.90067 | 4.10 | 0.0533 |
| Bounds on condition number: 8.4182, 76.851 |
| The above model is the best 4-variable model found. |
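MAXR reports the best model it finds at each size, which is not guaranteed to be the best possible subset of that size. The following is a hedged sketch of two variations, assuming the fitness data set from above: the first stops MAXR once a four-variable model has been found, and the second uses SELECTION=RSQUARE to list the top subset of each size for comparison. The choices STOP=4 and BEST=1 are illustrative and are not part of the original example.

```sas
proc reg data=fitness;
   /* MAXR, stopping once the "best" 4-variable model is found */
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=maxr stop=4;
   /* All-subsets comparison: top model of each size, with C(p) */
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=rsquare best=1 cp;
run;
quit;
```

If the two listings agree at every size, the MAXR search did not miss a better subset for these data.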
...the five-variable model contains Age,
Weight, RunTime, RunPulse, and MaxPulse; and
finally, the six-variable model contains
all the variables in the MODEL statement.
| Maximum R-Square Improvement: Step 5 |
| Variable Weight Entered: R-Square = 0.8480 and C(p) = 5.1063 |
| Analysis of Variance |
| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
| Model | 5 | 721.97309 | 144.39462 | 27.90 | <.0001 |
| Error |
25 |
| |