The CANDISC Procedure

The CANDISC procedure performs a canonical discriminant analysis, computes squared Mahalanobis distances between class means, and performs both univariate and multivariate one-way analyses of variance. Two output data sets can be produced: one containing the canonical coefficients and another containing, among other things, scored canonical variables. The canonical coefficients output data set can be rotated by the FACTOR procedure.

It is customary to standardize the canonical coefficients so that the canonical variables have means equal to zero and pooled within-class variances equal to one. PROC CANDISC displays both standardized and unstandardized canonical coefficients. It also displays the class means for the canonical variables and the correlations between the canonical variables and the original variables; these correlations, sometimes known as loadings, are called canonical structures.

The scored canonical variables output data set can be used with the PLOT procedure or the %PLOTIT macro to plot pairs of canonical variables, which aids visual interpretation of group differences.

Given two or more groups of observations with measurements on several
quantitative variables, canonical discriminant analysis derives a linear
combination of the variables that has the highest possible multiple correlation
with the groups. This maximal multiple correlation is called the *first
canonical correlation*. The coefficients of the linear combination are the
*canonical coefficients* or *canonical weights*.
The variable defined by the linear combination is the *first
canonical variable* or *canonical component*.
The second canonical correlation is obtained by finding the
linear combination uncorrelated with the first canonical
variable that has the highest possible multiple correlation
with the groups. The process of extracting canonical
variables can be repeated until the number of canonical
variables equals the number of original variables or the
number of classes minus one, whichever is smaller.
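
The quantity being maximized can be made concrete with a small numerical sketch. The Python below is an illustration only and is not part of PROC CANDISC; the function names and the data are hypothetical. It computes the multiple correlation between one linear combination and the class variable as the square root of the between-class sum of squares divided by the total sum of squares, and it returns the limit on the number of canonical variables stated above.

```python
from math import sqrt

def max_canonical_variables(n_variables, n_classes):
    """Upper limit on the number of canonical variables that can be extracted."""
    return min(n_variables, n_classes - 1)

def correlation_with_groups(coefficients, rows, labels):
    """Multiple correlation between a linear combination and the class variable.

    For a single score this equals sqrt(between-class SS / total SS).
    """
    scores = [sum(c * v for c, v in zip(coefficients, row)) for row in rows]
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    between_ss = 0.0
    for cls in set(labels):
        members = [s for s, l in zip(scores, labels) if l == cls]
        class_mean = sum(members) / len(members)
        between_ss += len(members) * (class_mean - grand_mean) ** 2
    return sqrt(between_ss / total_ss)

# Hypothetical data: two variables measured on three classes.
rows = [(1, 2), (2, 1), (3, 3), (6, 5), (7, 7), (8, 6), (4, 9), (5, 8), (6, 10)]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

print(max_canonical_variables(n_variables=2, n_classes=3))
for candidate in [(1, 0), (0, 1), (1, -1)]:
    print(candidate, round(correlation_with_groups(candidate, rows, labels), 3))
```

The canonical coefficients are the particular choice of `coefficients` for which this correlation is as large as possible; the candidates tried here are arbitrary.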

The first canonical correlation is at least as large as the multiple correlation
between the groups and any of the original variables. If the original variables
have high within-group correlations, the first canonical correlation can be
large even if all the multiple correlations are small. In other words, the
first canonical variable can show substantial differences between the classes,
even if none of the original variables does. Canonical variables are sometimes
called *discriminant functions*, but this usage is ambiguous because the DISCRIM
procedure produces very different functions for classification that are also
called discriminant functions.
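
A constructed example shows how a large first canonical correlation can coexist with small individual correlations. The data below are hypothetical: two variables that move together closely within each class, with the classes separated only along the difference of the two variables. The sketch assumes exactly two classes, in which case the first canonical direction is proportional to the inverse of the pooled within-class matrix times the difference of the class means; the helper names are illustrative, not SAS names.

```python
from math import sqrt

def group_correlation(values, labels):
    """sqrt(between-class SS / total SS) for one variable."""
    grand = sum(values) / len(values)
    total_ss = sum((v - grand) ** 2 for v in values)
    between_ss = 0.0
    for cls in set(labels):
        members = [v for v, l in zip(values, labels) if l == cls]
        between_ss += len(members) * (sum(members) / len(members) - grand) ** 2
    return sqrt(between_ss / total_ss)

# Hypothetical data: x1 and x2 are almost perfectly correlated within a class.
noise = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1]
points, labels = [], []
for t, e in zip(range(-3, 4), noise):
    points.append((t + 0.25, t - 0.25 + e)); labels.append("A")
    points.append((t - 0.25, t + 0.25 + e)); labels.append("B")

def class_mean(cls):
    pts = [p for p, l in zip(points, labels) if l == cls]
    return (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))

mean_a, mean_b = class_mean("A"), class_mean("B")

# Pooled within-class sums of squares and cross products (2 x 2, symmetric).
w11 = w12 = w22 = 0.0
for (x, y), l in zip(points, labels):
    mx, my = mean_a if l == "A" else mean_b
    w11 += (x - mx) ** 2
    w12 += (x - mx) * (y - my)
    w22 += (y - my) ** 2

# Two-class case: direction proportional to W^{-1} (mean_a - mean_b).
d1, d2 = mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]
det = w11 * w22 - w12 ** 2
c1 = (w22 * d1 - w12 * d2) / det
c2 = (-w12 * d1 + w11 * d2) / det

r_x1 = group_correlation([p[0] for p in points], labels)
r_x2 = group_correlation([p[1] for p in points], labels)
r_canonical = group_correlation([c1 * x + c2 * y for x, y in points], labels)
print(round(r_x1, 3), round(r_x2, 3), round(r_canonical, 3))
```

In this construction each original variable separates the classes poorly on its own, while the linear combination removes the shared within-class variation and separates them almost completely.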

For each canonical correlation, PROC CANDISC tests the hypothesis that it and
all smaller canonical correlations are zero in the population. An *F*
approximation (Rao 1973; Kshirsagar 1972) is used that gives better small-sample
results than the usual chi-square approximation. The variables should have an
approximate multivariate normal distribution within each class, with a common
covariance matrix in order for the probability levels to be valid.
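
The nested structure of these hypotheses can be sketched as follows. The statistic commonly associated with such tests is the likelihood ratio (Wilks' lambda), which for the hypothesis that the current and all smaller canonical correlations are zero is the product of one minus each of those squared correlations. The sketch stops at that statistic and does not attempt the *F* approximation; the function name and the input values are hypothetical.

```python
def sequential_likelihood_ratios(canonical_correlations):
    """For each position i, multiply (1 - r^2) over correlation i and all smaller ones.

    Assumes the correlations are supplied in decreasing order.
    """
    ratios = []
    for i in range(len(canonical_correlations)):
        product = 1.0
        for r in canonical_correlations[i:]:
            product *= 1.0 - r * r
        ratios.append(product)
    return ratios

# Hypothetical canonical correlations, largest first.
ratios = sequential_likelihood_ratios([0.9, 0.5, 0.1])
print(ratios)
```

A ratio close to one indicates that the remaining canonical correlations are jointly small; a ratio close to zero indicates that at least one of them is large.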

Canonical discriminant analysis is equivalent to canonical correlation analysis between the quantitative variables and a set of dummy variables coded from the class variable. Canonical discriminant analysis is also equivalent to performing the following steps:

- Transform the variables so that the pooled within-class covariance matrix is an identity matrix.
- Compute class means on the transformed variables.
- Perform a principal component analysis on the means, weighting each mean by the number of observations in the class. The eigenvalues are equal to the ratio of between-class variation to within-class variation in the direction of each principal component.
- Back-transform the principal components into the space of the original variables, obtaining the canonical variables.
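
The four steps can be traced numerically in a minimal sketch. It is restricted to exactly two variables so that the Cholesky factor and the eigenvectors have closed forms, it uses hypothetical data, and it is not a description of how PROC CANDISC is implemented. The pooled covariance uses the divisor n minus the number of classes, which is an assumption of this sketch.

```python
from math import atan2, cos, sin, sqrt

def canonical_analysis_2d(points, labels):
    """Whiten, average, run a weighted PCA on the means, back-transform (two variables only)."""
    n = len(points)
    classes = sorted(set(labels))
    k = len(classes)
    groups = {g: [p for p, l in zip(points, labels) if l == g] for g in classes}
    means = {g: (sum(p[0] for p in pts) / len(pts), sum(p[1] for p in pts) / len(pts))
             for g, pts in groups.items()}
    grand = (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

    # Pooled within-class covariance matrix [[s11, s12], [s12, s22]].
    s11 = s12 = s22 = 0.0
    for g, pts in groups.items():
        mx, my = means[g]
        for x, y in pts:
            s11 += (x - mx) ** 2
            s12 += (x - mx) * (y - my)
            s22 += (y - my) ** 2
    s11, s12, s22 = s11 / (n - k), s12 / (n - k), s22 / (n - k)

    # Step 1: Cholesky factor L with S = L L'; z = L^{-1} x has identity within-class covariance.
    l11 = sqrt(s11)
    l21 = s12 / l11
    l22 = sqrt(s22 - l21 ** 2)

    def whiten(x, y):
        z1 = x / l11
        return z1, (y - l21 * z1) / l22

    # Step 2: class means of the transformed variables, centered at the grand mean.
    zmeans = {g: whiten(means[g][0] - grand[0], means[g][1] - grand[1]) for g in classes}

    # Step 3: principal components of the means, each weighted by its class size.
    m11 = sum(len(groups[g]) * zmeans[g][0] ** 2 for g in classes) / (n - k)
    m12 = sum(len(groups[g]) * zmeans[g][0] * zmeans[g][1] for g in classes) / (n - k)
    m22 = sum(len(groups[g]) * zmeans[g][1] ** 2 for g in classes) / (n - k)
    theta = 0.5 * atan2(2 * m12, m11 - m22)  # axis belonging to the larger eigenvalue
    axes = [(cos(theta), sin(theta)), (-sin(theta), cos(theta))]
    eigenvalues = [u1 * u1 * m11 + 2 * u1 * u2 * m12 + u2 * u2 * m22 for u1, u2 in axes]

    # Step 4: back-transform each axis u to raw coefficients a by solving L' a = u.
    coefficients = []
    for u1, u2 in axes:
        a2 = u2 / l22
        a1 = (u1 - l21 * a2) / l11
        coefficients.append((a1, a2))
    return eigenvalues, coefficients

# Hypothetical data: three classes, four observations each.
points = [(1.0, 2.0), (2.0, 3.5), (3.0, 3.0), (2.0, 1.5),
          (4.0, 6.0), (5.0, 7.5), (6.0, 7.0), (5.0, 5.5),
          (7.0, 3.0), (8.0, 4.5), (9.0, 4.0), (8.0, 2.5)]
labels = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

eigenvalues, coefficients = canonical_analysis_2d(points, labels)
# Each eigenvalue is a between/within ratio, so r^2 = eigenvalue / (1 + eigenvalue).
canonical_correlations = [sqrt(ev / (1 + ev)) for ev in eigenvalues]
print(eigenvalues, coefficients, canonical_correlations)
```

Because the transformed variables have an identity within-class covariance matrix, the scores built from the returned coefficients have pooled within-class variance one, which matches the customary standardization of the canonical variables.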

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.