Example 49.2: Best Subset Selection
An alternative to stepwise selection of variables is best subset
selection. PROC PHREG uses the branch-and-bound algorithm of
Furnival and Wilson (1974) to find a specified number of best models
containing one, two, three variables, and so on, up to the single model
containing all of the explanatory variables. The criterion used to
determine "best" is based on the global score chi-square statistic.
For two models A and B, each having the same number of explanatory
variables, model A is considered to be better than model B if the
global score chi-square statistic for A exceeds that for B.
Best subset selection is requested by specifying the SELECTION=SCORE
option in the MODEL statement. The BEST=3 option requests that the
procedure identify only the three best models for each size. In other
words, PROC PHREG lists the three models having the highest score
statistics among all possible models with a given number of covariates.
proc phreg data=Myeloma;
   model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC
                         Frac LogPBM Protein SCalc
         / selection=score best=3;
run;
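To give a sense of how much the BEST=3 option shortens the listing, the following count is offered as a side calculation and is not part of the original example. It assumes only that the MODEL statement above names nine covariates.

```latex
\[
\binom{9}{k} \text{ models of size } k, \qquad
\sum_{k=1}^{9} \binom{9}{k} = 2^{9} - 1 = 511 \text{ candidate models in total.}
\]
\[
\text{With BEST=3, at most } \min\!\left\{3, \binom{9}{k}\right\}
\text{ models are listed for size } k, \text{ giving } 3 \times 8 + 1 = 25 \text{ rows.}
\]
```

For instance, there are 36 possible two-variable models, of which only the three with the largest score statistics are displayed.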
Output 49.2.1 displays the results of this analysis. The number of
explanatory variables in the model is given in the first column, the
score chi-square statistic in the second column, and the names of the
variables on the right. The models are listed in descending order of
their score chi-square values within each model size. For example,
among all models containing two explanatory variables, the model that
contains LogBUN and HGB has the largest score value (12.7252), the
model that contains LogBUN and Platelet has the second largest
(11.1842), and the model that contains LogBUN and SCalc has the third
largest (9.9962).
Output 49.2.1: Best Variable Combinations

Regression Models Selected by Score Criterion

Number of    Score
Variables    Chi-Square    Variables Included in Model

    1          8.5164      LogBUN
    1          5.0664      HGB
    1          3.1816      Platelet
    2         12.7252      LogBUN HGB
    2         11.1842      LogBUN Platelet
    2          9.9962      LogBUN SCalc
    3         15.3053      LogBUN HGB SCalc
    3         13.9911      LogBUN HGB Age
    3         13.5788      LogBUN HGB Frac
    4         16.9873      LogBUN HGB Age SCalc
    4         16.0457      LogBUN HGB Frac SCalc
    4         15.7619      LogBUN HGB LogPBM SCalc
    5         17.6291      LogBUN HGB Age Frac SCalc
    5         17.3519      LogBUN HGB Age LogPBM SCalc
    5         17.1922      LogBUN HGB Age LogWBC SCalc
    6         17.9120      LogBUN HGB Age Frac LogPBM SCalc
    6         17.7947      LogBUN HGB Age LogWBC Frac SCalc
    6         17.7744      LogBUN HGB Platelet Age Frac SCalc
    7         18.1517      LogBUN HGB Platelet Age Frac LogPBM SCalc
    7         18.0568      LogBUN HGB Age LogWBC Frac LogPBM SCalc
    7         18.0223      LogBUN HGB Platelet Age LogWBC Frac SCalc
    8         18.3925      LogBUN HGB Platelet Age LogWBC Frac LogPBM SCalc
    8         18.1636      LogBUN HGB Platelet Age Frac LogPBM Protein SCalc
    8         18.1309      LogBUN HGB Platelet Age LogWBC Frac Protein SCalc
    9         18.4550      LogBUN HGB Platelet Age LogWBC Frac LogPBM Protein SCalc
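
The listing in Output 49.2.1 shows score statistics only. One possible next step, sketched here as an assumption and not as part of the original example, is to refit a candidate from the listing as an ordinary model in order to examine its parameter estimates. The choice of the two-variable model containing LogBUN and HGB is purely illustrative; the Myeloma data set and the variable names are those used in the example above.

```sas
/* Illustrative sketch: refit one candidate model from the score listing. */
/* The selection of LogBUN and HGB is an assumption made for illustration, */
/* not a recommendation from the original example.                         */
proc phreg data=Myeloma;
   model Time*VStatus(0)=LogBUN HGB;
run;
```

Because no SELECTION= option is given, this step fits the stated model as is; any other row of the listing could be refit in the same way by changing the variable list.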

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.