 The DISCRIM Procedure

## Example 25.2: Bivariate Density Estimates and Posterior Probabilities

In this example, four more discriminant analyses of the iris data are run, each using two quantitative variables: petal width and petal length. The example produces Output 25.2.1 through Output 25.2.5. First, a scatter plot drawn with the %PLOTIT macro shows the joint sample distribution. See Appendix B, "Using the %PLOTIT Macro," for more information on the macro.

```
%plotit(data=iris, plotvars=PetalWidth PetalLength,
        labelvar=_blank_, symvar=symbol, typevar=symbol,
        symsize=0.35, symlen=4, exttypes=symbol, ls=100);
```

Output 25.2.1: Joint Sample Distribution of Petal Width and Petal Length in Three Species

Another data set is created for plotting, containing a grid of points suitable for contour plots. The large number of points in the grid makes the following analyses very time-consuming. If you attempt to duplicate these examples, begin with a small number of points in the grid.

```
data plotdata;
   do PetalLength=-2 to 72 by 0.25;
      h + 1;    * Number of horizontal cells;
      do PetalWidth=-5 to 32 by 0.25;
         n + 1; * Total number of cells;
         output;
      end;
   end;
   * Make variables to contain H and V grid sizes;
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;
```
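As a starting point for the advice above, a coarser grid can be obtained by increasing the BY increment. The following sketch is a variation on the preceding DATA step, not part of the original example; the increment of 1 is an arbitrary illustrative choice. It yields 75 x 38 = 2,850 grid points instead of the 297 x 149 = 44,253 points of the full grid.

```sas
/* Sketch only: a coarse version of the plotting grid for trial runs. */
/* The BY value of 1 is an assumption chosen for illustration.        */
data plotdata;
   do PetalLength=-2 to 72 by 1;     /* 75 horizontal cells */
      h + 1;
      do PetalWidth=-5 to 32 by 1;   /* 38 vertical cells   */
         n + 1;
         output;
      end;
   end;
   /* Same macro variables as in the full-size grid */
   call symput('hnobs', compress(put(h    , best12.)));
   call symput('vnobs', compress(put(n / h, best12.)));
   drop n h;
run;
```

Because the grid dimensions are passed through the HNOBS and VNOBS macro variables, the rest of the example should run unchanged once the increment is reduced again.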

A macro CONTOUR is defined to make contour plots of density estimates and posterior probabilities. Classification results are also plotted on the same grid.

```
%macro contour;
   data contour(keep=PetalWidth PetalLength symbol density);
      set plotd(in=d) iris;
      if d then density = max(setosa,versicolor,virginica);
   run;

   title3 'Plot of Estimated Densities';
   %plotit(data=contour, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=density white black, rgbtypes=contour,
           hnobs=&hnobs, vnobs=&vnobs, excolors=white,
           rgbround=-16 1 1 1,  extend=close, options=noclip,
           types  =Setosa Versicolor Virginica  '',
           symtype=symbol symbol     symbol     contour,
           symsize=0.6    0.6        0.6        1,
           symfont=swiss  swiss      swiss      solid)

   data posterior(keep=PetalWidth PetalLength symbol
                       prob _into_);
      set plotp(in=d) iris;
      if d then prob = max(setosa,versicolor,virginica);
   run;

   title3 'Plot of Posterior Probabilities '
          '(Black to White is Low to High Probability)';
   %plotit(data=posterior, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=prob black white 0.3 0.999, rgbtypes=contour,
           hnobs=&hnobs, vnobs=&vnobs,  excolors=white,
           rgbround=-16 1 1 1, extend=close, options=noclip,
           types  =Setosa Versicolor Virginica  '',
           symtype=symbol symbol     symbol     contour,
           symsize=0.6    0.6        0.6        1,
           symfont=swiss  swiss      swiss      solid)

   title3 'Plot of Classification Results';
   %plotit(data=posterior, plotvars=PetalWidth PetalLength,
           labelvar=_blank_, symvar=symbol, typevar=symbol,
           symlen=4, exttypes=symbol contour, ls=100,
           paint=_into_ CXCCCCCC CXDDDDDD white,
           rgbtypes=contour, hnobs=&hnobs, vnobs=&vnobs,
           excolors=white,
           extend=close, options=noclip,
           types  =Setosa Versicolor Virginica  '',
           symtype=symbol symbol     symbol     contour,
           symsize=0.6    0.6        0.6        1,
           symfont=swiss  swiss      swiss      solid)

%mend;
```

A normal-theory analysis (METHOD=NORMAL) assuming equal covariance matrices (POOL=YES) illustrates the linearity of the classification boundaries. These statements produce Output 25.2.2:

```
proc discrim data=iris method=normal pool=yes
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Normal Density Estimates with Equal Variance';
run;
%contour
```
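The linearity follows from the form of the normal-theory rule. As a sketch, in notation introduced here for illustration (class mean vectors $m_t$, pooled within-class covariance matrix $S$, and equal priors as in this example), the squared distance from a point $x$ to class $t$ and the resulting posterior probability are

$$
d_t^2(x) = (x - m_t)' S^{-1} (x - m_t), \qquad
p(t \mid x) = \frac{\exp\{-\tfrac{1}{2} d_t^2(x)\}}{\sum_u \exp\{-\tfrac{1}{2} d_u^2(x)\}} .
$$

On the boundary between classes $t$ and $u$, $d_t^2(x) = d_u^2(x)$. Expanding both sides, the common term $x' S^{-1} x$ cancels, leaving

$$
(m_t - m_u)' S^{-1} x = \tfrac{1}{2}\left( m_t' S^{-1} m_t - m_u' S^{-1} m_u \right),
$$

which is linear in $x$, so each boundary is a straight line in the (PetalWidth, PetalLength) plane.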

Output 25.2.2: Normal Density Estimates with Equal Variance

Discriminant Analysis of Fisher (1936) Iris Data Using Normal Density Estimates with Equal Variance

The DISCRIM Procedure

| Quantity | Value |
|---|---|
| Observations | 150 |
| Variables | 2 |
| Classes | 3 |
| DF Total | 149 |
| DF Within Classes | 147 |
| DF Between Classes | 2 |

Class Level Information

| Species | Variable Name | Frequency | Weight | Proportion | Prior Probability |
|---|---|---|---|---|---|
| Setosa | Setosa | 50 | 50.0000 | 0.333333 | 0.333333 |
| Versicolor | Versicolor | 50 | 50.0000 | 0.333333 | 0.333333 |
| Virginica | Virginica | 50 | 50.0000 | 0.333333 | 0.333333 |

Classification Results for Calibration Data: WORK.IRIS. Cross-validation Results using Linear Discriminant Function

Posterior Probability of Membership in Species

| Obs | From Species | Classified into Species | Setosa | Versicolor | Virginica |
|---|---|---|---|---|---|
| 5 | Virginica | Versicolor * | 0.0000 | 0.8453 | 0.1547 |
| 9 | Versicolor | Virginica * | 0.0000 | 0.2130 | 0.7870 |
| 25 | Virginica | Versicolor * | 0.0000 | 0.8322 | 0.1678 |
| 57 | Virginica | Versicolor * | 0.0000 | 0.8057 | 0.1943 |
| 91 | Virginica | Versicolor * | 0.0000 | 0.8903 | 0.1097 |
| 148 | Versicolor | Virginica * | 0.0000 | 0.3118 | 0.6882 |

\* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS. Cross-validation Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| From Species | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Setosa | 50 (100.00) | 0 (0.00) | 0 (0.00) | 50 (100.00) |
| Versicolor | 0 (0.00) | 48 (96.00) | 2 (4.00) | 50 (100.00) |
| Virginica | 0 (0.00) | 4 (8.00) | 46 (92.00) | 50 (100.00) |
| Total | 50 (33.33) | 52 (34.67) | 48 (32.00) | 150 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |

Error Count Estimates for Species

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Rate | 0.0000 | 0.0400 | 0.0800 | 0.0400 |
| Priors | 0.3333 | 0.3333 | 0.3333 | |

Classification Summary for Test Data: WORK.PLOTDATA. Classification Summary using Linear Discriminant Function

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Total | 14507 (32.78) | 16888 (38.16) | 12858 (29.06) | 44253 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |
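The total error-count estimate in Output 25.2.2 can be checked by hand. As a sketch, assuming the total is the prior-weighted average of the class-specific cross-validation error rates $e_t$ (the symbols $q_t$ for the priors and $e_t$ for the rates are introduced here for illustration):

$$
e_{\text{Versicolor}} = \frac{2}{50} = 0.04, \qquad
e_{\text{Virginica}} = \frac{4}{50} = 0.08, \qquad
\sum_t q_t\, e_t = \tfrac{1}{3}\,(0.0000 + 0.0400 + 0.0800) = 0.0400 .
$$

This agrees with the Total entry in the Error Count Estimates table.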

A normal-theory analysis assuming unequal covariance matrices (POOL=NO) illustrates quadratic classification boundaries. These statements produce Output 25.2.3:

```
proc discrim data=iris method=normal pool=no
             testdata=plotdata testout=plotp testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Normal Density Estimates with Unequal Variance';
run;
%contour
```
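The quadratic shape of the boundaries can be seen from the distance function. As a sketch, in notation introduced here for illustration (class means $m_t$, within-class covariance matrices $S_t$, equal priors), the generalized squared distance to class $t$ is

$$
D_t^2(x) = (x - m_t)' S_t^{-1} (x - m_t) + \ln |S_t| .
$$

Setting $D_t^2(x) = D_u^2(x)$ for two classes $t$ and $u$ gives

$$
x' \left( S_t^{-1} - S_u^{-1} \right) x
- 2 \left( S_t^{-1} m_t - S_u^{-1} m_u \right)' x
+ \left( m_t' S_t^{-1} m_t - m_u' S_u^{-1} m_u \right)
+ \ln \frac{|S_t|}{|S_u|} = 0 .
$$

Because $S_t \neq S_u$ in general, the quadratic term no longer cancels, and each boundary is a conic section in the (PetalWidth, PetalLength) plane rather than a straight line.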

Output 25.2.3: Normal Density Estimates with Unequal Variance

Discriminant Analysis of Fisher (1936) Iris Data Using Normal Density Estimates with Unequal Variance

The DISCRIM Procedure

| Quantity | Value |
|---|---|
| Observations | 150 |
| Variables | 2 |
| Classes | 3 |
| DF Total | 149 |
| DF Within Classes | 147 |
| DF Between Classes | 2 |

Class Level Information

| Species | Variable Name | Frequency | Weight | Proportion | Prior Probability |
|---|---|---|---|---|---|
| Setosa | Setosa | 50 | 50.0000 | 0.333333 | 0.333333 |
| Versicolor | Versicolor | 50 | 50.0000 | 0.333333 | 0.333333 |
| Virginica | Virginica | 50 | 50.0000 | 0.333333 | 0.333333 |

Classification Results for Calibration Data: WORK.IRIS. Cross-validation Results using Quadratic Discriminant Function

Posterior Probability of Membership in Species

| Obs | From Species | Classified into Species | Setosa | Versicolor | Virginica |
|---|---|---|---|---|---|
| 5 | Virginica | Versicolor * | 0.0000 | 0.7288 | 0.2712 |
| 9 | Versicolor | Virginica * | 0.0000 | 0.0903 | 0.9097 |
| 25 | Virginica | Versicolor * | 0.0000 | 0.5196 | 0.4804 |
| 91 | Virginica | Versicolor * | 0.0000 | 0.8335 | 0.1665 |
| 148 | Versicolor | Virginica * | 0.0000 | 0.4675 | 0.5325 |

\* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS. Cross-validation Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| From Species | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Setosa | 50 (100.00) | 0 (0.00) | 0 (0.00) | 50 (100.00) |
| Versicolor | 0 (0.00) | 48 (96.00) | 2 (4.00) | 50 (100.00) |
| Virginica | 0 (0.00) | 3 (6.00) | 47 (94.00) | 50 (100.00) |
| Total | 50 (33.33) | 51 (34.00) | 49 (32.67) | 150 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |

Error Count Estimates for Species

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Rate | 0.0000 | 0.0400 | 0.0600 | 0.0333 |
| Priors | 0.3333 | 0.3333 | 0.3333 | |

Classification Summary for Test Data: WORK.PLOTDATA. Classification Summary using Quadratic Discriminant Function

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Total | 5461 (12.34) | 5354 (12.10) | 33438 (75.56) | 44253 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |

A nonparametric analysis (METHOD=NPAR) follows, using normal kernels (KERNEL=NORMAL) and equal bandwidths (POOL=YES) in each class. The value of the radius parameter r that, assuming normality, minimizes an approximate mean integrated square error is 0.50 (see the "Nonparametric Methods" section). These statements produce Output 25.2.4:

```
proc discrim data=iris method=npar kernel=normal
             r=.5 pool=yes
             testdata=plotdata testout=plotp
             testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Kernel Density Estimates with Equal Bandwidth';
run;
%contour
```
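The value r = 0.50 can be reproduced by hand. The following is a sketch that assumes the formulas from the "Nonparametric Methods" section take the form shown here; the symbols ($p$ variables, $n_t$ observations in class $t$, scale matrix $V_t$) are introduced for illustration. With a normal kernel, the density estimate for class $t$ is

$$
f_t(x) = \frac{1}{n_t} \sum_{y \in \text{class } t} K_t(x - y), \qquad
K_t(z) = \frac{1}{(2\pi)^{p/2}\, r^{p}\, |V_t|^{1/2}}
\exp\!\left( -\frac{z' V_t^{-1} z}{2 r^{2}} \right),
$$

where $V_t$ is the pooled covariance matrix when POOL=YES and the within-class covariance matrix when POOL=NO. Under normality, the radius that minimizes the approximate mean integrated square error is

$$
r = \left( \frac{A(K_t)}{n_t} \right)^{1/(p+4)}, \qquad
A(K_t) = \frac{4}{2p + 1} \ \text{for a normal kernel}.
$$

With $p = 2$ and $n_t = 50$, this gives $A(K_t) = 0.8$ and

$$
r = \left( \frac{0.8}{50} \right)^{1/6} = 0.016^{1/6} \approx 0.50 .
$$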

Output 25.2.4: Kernel Density Estimates with Equal Bandwidth

Discriminant Analysis of Fisher (1936) Iris Data Using Kernel Density Estimates with Equal Bandwidth

The DISCRIM Procedure

| Quantity | Value |
|---|---|
| Observations | 150 |
| Variables | 2 |
| Classes | 3 |
| DF Total | 149 |
| DF Within Classes | 147 |
| DF Between Classes | 2 |

Class Level Information

| Species | Variable Name | Frequency | Weight | Proportion | Prior Probability |
|---|---|---|---|---|---|
| Setosa | Setosa | 50 | 50.0000 | 0.333333 | 0.333333 |
| Versicolor | Versicolor | 50 | 50.0000 | 0.333333 | 0.333333 |
| Virginica | Virginica | 50 | 50.0000 | 0.333333 | 0.333333 |

Classification Results for Calibration Data: WORK.IRIS. Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species

| Obs | From Species | Classified into Species | Setosa | Versicolor | Virginica |
|---|---|---|---|---|---|
| 5 | Virginica | Versicolor * | 0.0000 | 0.7474 | 0.2526 |
| 9 | Versicolor | Virginica * | 0.0000 | 0.0800 | 0.9200 |
| 25 | Virginica | Versicolor * | 0.0000 | 0.5863 | 0.4137 |
| 91 | Virginica | Versicolor * | 0.0000 | 0.8358 | 0.1642 |
| 148 | Versicolor | Virginica * | 0.0000 | 0.4123 | 0.5877 |

\* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS. Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| From Species | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Setosa | 50 (100.00) | 0 (0.00) | 0 (0.00) | 50 (100.00) |
| Versicolor | 0 (0.00) | 48 (96.00) | 2 (4.00) | 50 (100.00) |
| Virginica | 0 (0.00) | 3 (6.00) | 47 (94.00) | 50 (100.00) |
| Total | 50 (33.33) | 51 (34.00) | 49 (32.67) | 150 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |

Error Count Estimates for Species

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Rate | 0.0000 | 0.0400 | 0.0600 | 0.0333 |
| Priors | 0.3333 | 0.3333 | 0.3333 | |

Classification Summary for Test Data: WORK.PLOTDATA. Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Total | 12631 (28.54) | 9941 (22.46) | 21681 (48.99) | 44253 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |

Another nonparametric analysis is run with unequal bandwidths (POOL=NO). These statements produce Output 25.2.5:

```
proc discrim data=iris method=npar kernel=normal
             r=.5 pool=no
             testdata=plotdata testout=plotp
             testoutd=plotd
             short noclassify crosslisterr;
   class Species;
   var Petal:;
   title2 'Using Kernel Density Estimates with Unequal Bandwidth';
run;
%contour
```

Output 25.2.5: Kernel Density Estimates with Unequal Bandwidth

Discriminant Analysis of Fisher (1936) Iris Data Using Kernel Density Estimates with Unequal Bandwidth

The DISCRIM Procedure

| Quantity | Value |
|---|---|
| Observations | 150 |
| Variables | 2 |
| Classes | 3 |
| DF Total | 149 |
| DF Within Classes | 147 |
| DF Between Classes | 2 |

Class Level Information

| Species | Variable Name | Frequency | Weight | Proportion | Prior Probability |
|---|---|---|---|---|---|
| Setosa | Setosa | 50 | 50.0000 | 0.333333 | 0.333333 |
| Versicolor | Versicolor | 50 | 50.0000 | 0.333333 | 0.333333 |
| Virginica | Virginica | 50 | 50.0000 | 0.333333 | 0.333333 |

Classification Results for Calibration Data: WORK.IRIS. Cross-validation Results using Normal Kernel Density

Posterior Probability of Membership in Species

| Obs | From Species | Classified into Species | Setosa | Versicolor | Virginica |
|---|---|---|---|---|---|
| 5 | Virginica | Versicolor * | 0.0000 | 0.7826 | 0.2174 |
| 9 | Versicolor | Virginica * | 0.0000 | 0.0506 | 0.9494 |
| 91 | Virginica | Versicolor * | 0.0000 | 0.8802 | 0.1198 |
| 148 | Versicolor | Virginica * | 0.0000 | 0.3726 | 0.6274 |

\* Misclassified observation

Classification Summary for Calibration Data: WORK.IRIS. Cross-validation Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| From Species | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Setosa | 50 (100.00) | 0 (0.00) | 0 (0.00) | 50 (100.00) |
| Versicolor | 0 (0.00) | 48 (96.00) | 2 (4.00) | 50 (100.00) |
| Virginica | 0 (0.00) | 2 (4.00) | 48 (96.00) | 50 (100.00) |
| Total | 50 (33.33) | 50 (33.33) | 50 (33.33) | 150 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |

Error Count Estimates for Species

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Rate | 0.0000 | 0.0400 | 0.0400 | 0.0267 |
| Priors | 0.3333 | 0.3333 | 0.3333 | |

Classification Summary for Test Data: WORK.PLOTDATA. Classification Summary using Normal Kernel Density

Number of Observations and Percent Classified into Species (count, with percent in parentheses)

| | Setosa | Versicolor | Virginica | Total |
|---|---|---|---|---|
| Total | 5447 (12.31) | 5984 (13.52) | 32822 (74.17) | 44253 (100.00) |
| Priors | 0.33333 | 0.33333 | 0.33333 | |
