The CORR Procedure


Missing Values
By default, PROC CORR uses pairwise deletion when observations contain missing values. For each pair of variables, PROC CORR includes in the statistical computations all observations in which both values are nonmissing. Therefore, the correlation statistics may be based on different numbers of observations.

If you specify the NOMISS option, PROC CORR uses listwise deletion when a value of the BY, FREQ, VAR, WEIGHT, or WITH statement variable is missing. PROC CORR excludes all observations with missing values from the analysis. Therefore, the number of observations for each pair of variables is identical. The PARTIAL statement always excludes the observations with missing values by automatically invoking NOMISS. Listwise deletion is needed to correctly calculate Cronbach's coefficient alpha when data are missing. If a data set contains missing values, use the NOMISS option when you specify ALPHA.
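As a minimal sketch of combining ALPHA with NOMISS, the following step assumes a hypothetical data set named items with four questionnaire variables (q1 through q4); the data values are invented for illustration only.

```sas
/* Hypothetical questionnaire data; a period marks a missing response */
data items;
   input q1 q2 q3 q4;
   datalines;
4 5 4 3
3 . 3 2
5 5 4 4
2 3 . 2
4 4 5 3
1 2 2 1
;
run;

/* ALPHA requests Cronbach's coefficient alpha.                  */
/* NOMISS restricts the analysis to observations with no missing */
/* values (here, four of the six observations).                  */
proc corr data=items alpha nomiss;
   var q1-q4;
run;
```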

There are two reasons to specify NOMISS and, thus, to avoid pairwise deletion. First, NOMISS is computationally more efficient, so you use fewer computer resources. Second, if you use the correlations as input to regression or other statistical procedures, a pairwise-missing correlation matrix leads to several statistical difficulties. Pairwise correlation matrices may not be nonnegative definite, and the pattern of missing values may bias the results.
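The difference between the two deletion methods can be seen by running the same analysis twice. The sketch below assumes a small, invented data set named scores; the variable names x, y, and z are illustrative.

```sas
/* Invented data: y is missing in row 2, z is missing in row 3 */
data scores;
   input x y z;
   datalines;
1 2 3
2 . 5
3 4 .
4 5 8
5 7 9
;
run;

/* Default (pairwise deletion): x-y and x-z each use 4 observations, */
/* but y-z uses only the 3 observations where both are nonmissing.   */
proc corr data=scores;
   var x y z;
run;

/* NOMISS (listwise deletion): every pair uses the same 3 complete */
/* observations.                                                   */
proc corr data=scores nomiss;
   var x y z;
run;
```

Comparing the number of observations reported for each pair in the two reports shows which correlations are affected by the choice.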

Procedure Output
By default, PROC CORR prints a report that includes descriptive statistics and correlation statistics for each variable. The descriptive statistics include the number of observations with nonmissing values, the mean, the standard deviation, the minimum, and the maximum. PROC CORR reports the following additional descriptive statistics when you request various correlation statistics:

sum
for Pearson correlation only

median
for nonparametric measures of association

partial variance
for Pearson partial correlation

partial standard deviation
for Pearson partial correlation.

If variable labels are available, PROC CORR labels the variables.

When you specify the CSSCP, SSCP, or COV option, the corresponding sum-of-squares and crossproducts matrix or covariance matrix appears at the top of the correlation report. If the data set contains missing values, PROC CORR prints additional statistics for each pair of variables. These statistics, calculated from the observations with nonmissing row and column variable values, may include

SSCP('W','V')
uncorrected sum-of-squares and crossproducts

USS('W')
uncorrected sum-of-squares for the row variable

USS('V')
uncorrected sum-of-squares for the column variable

CSSCP('W','V')
corrected sum-of-squares and crossproducts

CSS('W')
corrected sum-of-squares for the row variable

CSS('V')
corrected sum-of-squares for the column variable

COV('W','V')
covariance

VAR('W')
variance for the row variable

VAR('V')
variance for the column variable

DF('W','V')
divisor for calculating covariance and variances.
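As an illustration, the following sketch requests all three matrices for a small data set that contains missing values, so the report should also show the pairwise statistics described above. The data set name scores and its values are invented.

```sas
/* Invented data with one missing y value and one missing z value */
data scores;
   input x y z;
   datalines;
1 2 3
2 . 5
3 4 .
4 5 8
5 7 9
;
run;

/* SSCP, CSSCP, and COV each add a matrix to the top of the report */
proc corr data=scores sscp csscp cov;
   var x y z;
run;
```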

For each pair of variables, PROC CORR always prints the correlation coefficients, the number of observations used to calculate the coefficient, and the significance probability. When you specify the ALPHA option, PROC CORR prints Cronbach's coefficient alpha, the correlation between the variable and the total of the remaining variables, and Cronbach's coefficient alpha using the remaining variables for the raw variables and the standardized variables.

Output Data Sets
When you specify the OUTP=, OUTS=, OUTK=, or OUTH= option, PROC CORR creates an output data set containing statistics for Pearson correlation, Spearman correlation, Kendall correlation, or Hoeffding's D, respectively. By default, the output data set is a special data set type (TYPE=CORR) that many SAS/STAT procedures recognize, including PROC REG and PROC FACTOR. When you specify the NOCORR option and the COV, CSSCP, or SSCP option, use the TYPE= data set option to change the data set type to COV, CSSCP, or SSCP. For example, the following statement

   proc corr nocorr cov outp=b(type=cov);
specifies the output data set type as COV.

PROC CORR does not print the output data set. Use PROC PRINT, PROC REPORT, or another SAS reporting tool to print the output data set.
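The following sketch shows one way to create, print, and reuse a TYPE=CORR output data set. The data set names trial and trialcorr, the variable names, and the data values are assumptions made for illustration; the PROC REG step requires SAS/STAT software.

```sas
/* Invented data with no missing values */
data trial;
   input y x1 x2;
   datalines;
10 1 3
12 2 1
15 3 4
19 4 2
20 5 6
24 6 5
27 7 8
29 8 7
;
run;

/* Write Pearson statistics to trialcorr; NOPRINT suppresses the report */
proc corr data=trial outp=trialcorr noprint;
   var y x1 x2;
run;

/* PROC CORR does not print the output data set, so print it explicitly */
proc print data=trialcorr;
run;

/* A TYPE=CORR data set can serve as input to PROC REG */
proc reg data=trialcorr;
   model y = x1 x2;
run;
quit;
```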

The output data set includes the following variables

BY variables
identifies the BY group when using a BY statement.

_TYPE_ variable
identifies the type of observation.

_NAME_ variable
identifies the variable that corresponds to a given row of the correlation matrix.

INTERCEP variable
identifies variable sums when specifying the SSCP option.

VAR variables
identifies the variables listed in the VAR statement.

You can use a combination of the _TYPE_ and _NAME_ variables to identify the contents of an observation. The _NAME_ variable indicates which row of the correlation matrix the observation corresponds to. The values of the _TYPE_ variable are

SSCP
uncorrected sums of squares and crossproducts

CSSCP
corrected sums of squares and crossproducts

COV
covariances

MEAN
mean of each variable

STD
standard deviation of each variable

N
number of nonmissing observations for each variable

SUMWGT
sum of the weights for each variable when using a WEIGHT statement

CORR
correlation statistics for each variable.
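Because each observation is tagged by _TYPE_, a WHERE statement can pull out one kind of statistic. The sketch below keeps only the correlation rows; the data set names scores, corrout, and corr_only and the data values are invented for illustration.

```sas
/* Invented data with no missing values */
data scores;
   input x y z;
   datalines;
1 2 3
2 3 5
3 4 4
4 5 8
5 7 9
;
run;

proc corr data=scores outp=corrout noprint;
   var x y z;
run;

/* Keep only the observations that hold correlation coefficients; */
/* _NAME_ identifies the matrix row for each one                  */
data corr_only;
   set corrout;
   where _type_ = 'CORR';
run;

proc print data=corr_only;
run;
```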

When you specify the SSCP option, the OUTP= data set includes an additional observation that contains intercept values. When you specify the ALPHA option, the OUTP= data set also includes observations with the following _TYPE_ values:

RAWALPHA
Cronbach's coefficient alpha for raw variables

STDALPHA
Cronbach's coefficient alpha for standardized variables

RAWALDEL
Cronbach's coefficient alpha for raw variables after deleting one variable

STDALDEL
Cronbach's coefficient alpha for standardized variables after deleting one variable

RAWCTDEL
correlation between a raw variable and the total of the remaining raw variables

STDCTDEL
correlation between a standardized variable and the total of the remaining standardized variables.

When you use a PARTIAL statement, the previous statistics are calculated for the variables after partialling. If PROC CORR computes Pearson correlation statistics, MEAN equals zero and STD equals the partial standard deviation associated with the partial variance for the OUTP=, OUTK=, or OUTS= data set. Otherwise, PROC CORR assigns missing values to MEAN and STD.

The listing titled "OUTP= Data Set with Pearson Partial Correlations" shows the observations in an OUTP= data set when the COV option and PARTIAL statement are used to compute Pearson partial correlations. The _TYPE_ variable identifies COV, MEAN, STD, N, and CORR as the statistical values for the variables Weight, Oxygen, and Runtime. MEAN always equals 0, while STD is a partial standard deviation.

OUTP= Data Set with Pearson Partial Correlations
   Pearson Correlation Statistics Using the PARTIAL Statement  1
                 Output Data Set from PROC CORR

     _TYPE_    _NAME_       Weight      Oxygen     Runtime

      COV      Weight      72.4374    -12.7511      2.0677
      COV      Oxygen     -12.7511     27.0165     -5.5937
      COV      Runtime      2.0677     -5.5937      1.9451
      MEAN                  0.0000      0.0000      0.0000
      STD                   8.5110      5.1977      1.3947
      N                    28.0000     28.0000     28.0000
      CORR     Weight       1.0000     -0.2882      0.1742
      CORR     Oxygen      -0.2882      1.0000     -0.7716
      CORR     Runtime      0.1742     -0.7716      1.0000
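
A data set with this layout could be produced by a step like the following sketch. The input data set name fitness, the partialled variable Age, and every data value here are assumptions made for illustration, so the printed numbers will not match the listing above.

```sas
/* Invented illustrative values; not the data behind the listing above */
data fitness;
   input Age Weight Oxygen Runtime;
   datalines;
44 89.5 44.6 11.4
40 75.1 45.3 10.1
38 81.9 60.1  8.6
47 77.5 44.8 11.6
51 69.6 40.8 10.9
49 81.4 49.2  8.9
54 83.1 51.9 10.3
45 87.7 37.4 14.0
;
run;

/* COV adds covariance rows; PARTIAL removes the linear effect of Age */
/* from Weight, Oxygen, and Runtime (and automatically invokes NOMISS) */
proc corr data=fitness cov nosimple outp=partcorr;
   var Weight Oxygen Runtime;
   partial Age;
run;

proc print data=partcorr;
run;
```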


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.