
The CORR Procedure

Interpreting Correlation Coefficients

The scatterplots in Examining Correlations Using Scatterplots depict the relationship between two numeric random variables.

[Figure: Examining Correlations Using Scatterplots]

When the relationship between two variables is nonlinear or when outliers are present, the correlation coefficient can misstate the strength of the relationship. Plotting the data before computing a correlation coefficient enables you to verify that the relationship is linear and to identify potential outliers.
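As a small illustration of why plotting matters, the following Python sketch computes the Pearson coefficient for three made-up data sets: an exact parabola, an exact line, and the same line with one outlying point appended. It is not part of PROC CORR; the helper name `pearson_r` and all data values are assumptions chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is completely determined by x, but along a parabola.
x = [-3, -2, -1, 0, 1, 2, 3]
y_parabola = [v * v for v in x]
r_parabola = pearson_r(x, y_parabola)

# Hypothetical data: an exact line, then the same line plus one outlier.
y_line = [2 * v + 1 for v in x]
r_line = pearson_r(x, y_line)
r_outlier = pearson_r(x + [4], y_line + [-20])

print(r_parabola, r_line, r_outlier)
```

In this sketch the parabola yields a coefficient of zero even though y depends entirely on x, and a single outlier turns a perfect positive correlation into a negative one. A scatterplot would reveal both situations immediately.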

Determining Computer Resources

N | number of observations in the data set
C | number of correlation types (1 to 4)
V | number of VAR statement variables
W | number of WITH statement variables
P | number of PARTIAL statement variables

T = V+W+P

K = V*W | when W>0
K = V*(V+1)/2 | when W=0

L = K | when P=0
L = T*(T+1)/2 | when P>0
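The definitions of T, K, and L can be written as a short Python helper for planning purposes. This is a sketch of the arithmetic only; the function name `corr_sizes` and its parameter names are hypothetical and do not correspond to anything in SAS.

```python
def corr_sizes(v, w=0, p=0):
    """Return (T, K, L) for V VAR, W WITH, and P PARTIAL variables."""
    t = v + w + p
    # K counts the correlations requested: a V-by-W rectangle when a WITH
    # statement is present, otherwise the V*(V+1)/2 count given above.
    k = v * w if w > 0 else v * (v + 1) // 2
    # L equals K unless a PARTIAL statement is present.
    l = k if p == 0 else t * (t + 1) // 2
    return t, k, l

# Two VAR, three WITH, and one PARTIAL variable.
print(corr_sizes(2, 3, 1))   # (6, 6, 21)
# Three VAR variables only.
print(corr_sizes(3))         # (3, 6, 6)
```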

For small N and large K, the CPU time varies as K for all types of correlations. For large N, the CPU time depends on the type of correlation and varies as

K*N | with PEARSON (default)
T*N*log N | with SPEARMAN
K*N*log N | with HOEFFDING or KENDALL
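To compare the large-N growth rates above for a planned analysis, you could use a sketch like the following. The function name `relative_cpu_cost` is hypothetical, and the result is a unitless growth term, not seconds. The text does not state the base of the logarithm; the natural logarithm is assumed here, which changes the result only by a constant factor.

```python
import math

def relative_cpu_cost(n, t, k, method="pearson"):
    """Unitless large-N growth term for one correlation type."""
    method = method.lower()
    if method == "pearson":
        return k * n
    if method == "spearman":
        return t * n * math.log(n)   # assumption: natural logarithm
    if method in ("kendall", "hoeffding"):
        return k * n * math.log(n)   # assumption: natural logarithm
    raise ValueError("unknown correlation type: " + method)

# Three VAR variables (T=3, K=6) and 1000 observations.
pearson = relative_cpu_cost(1000, 3, 6, "pearson")
spearman = relative_cpu_cost(1000, 3, 6, "spearman")
kendall = relative_cpu_cost(1000, 3, 6, "kendall")
print(pearson, spearman, kendall)
```

Only the ratios between such terms are meaningful. For example, with these inputs the KENDALL term is K/T = 2 times the SPEARMAN term.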

You can reduce CPU time by specifying NOMISS. Without NOMISS, processing is much faster when most observations do not contain missing values.

The options and statements you use in the procedure require different amounts of storage to process the data. For Pearson correlations, the amount of temporary storage in bytes (M) is

40T+16L | with NOMISS and NOSIMPLE
40T+16L+56T | with NOMISS
40T+16L+56K | with NOSIMPLE
40T+16L+56K+56T | with no options

Using a PARTIAL statement increases the amount of temporary storage by 12T bytes. Using the ALPHA option increases the amount of temporary storage by 32V+16 bytes.

The following example uses a PARTIAL statement, which invokes NOMISS.

   proc corr;
      var x1 x2;
      with y1 y2 y3;
      partial z1;
   run;

Therefore, using 40T+16L+56T+12T, the minimum temporary storage equals 984 bytes (T=2+3+1=6 and L=T(T+1)/2=21).
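The Pearson storage rules, including the PARTIAL and ALPHA increments, can be checked with a short Python sketch. The function name `pearson_storage_bytes` and its keyword arguments are hypothetical. The sketch assumes, as stated for the example, that a PARTIAL statement invokes NOMISS.

```python
def pearson_storage_bytes(v, w=0, p=0, nomiss=False, nosimple=False, alpha=False):
    """Temporary storage M in bytes for Pearson correlations."""
    t = v + w + p
    k = v * w if w > 0 else v * (v + 1) // 2
    l = k if p == 0 else t * (t + 1) // 2
    if p > 0:
        nomiss = True            # a PARTIAL statement invokes NOMISS
    m = 40 * t + 16 * l
    if not nomiss:
        m += 56 * k              # term dropped when NOMISS is in effect
    if not nosimple:
        m += 56 * t              # term dropped when NOSIMPLE is in effect
    if p > 0:
        m += 12 * t              # PARTIAL statement increment
    if alpha:
        m += 32 * v + 16         # ALPHA option increment
    return m

# VAR x1 x2; WITH y1 y2 y3; PARTIAL z1;
print(pearson_storage_bytes(v=2, w=3, p=1))   # 984
```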

Using the SPEARMAN, KENDALL, or HOEFFDING option requires additional temporary storage for each observation. For the most time-efficient processing, the amount of temporary storage in bytes is

40T+8K+8L*C+12T*N+28N+QS+QP+QK

where

QS = 0 | with NOSIMPLE
QS = 68T | otherwise

QP = 56K | with PEARSON and without NOMISS
QP = 0 | otherwise

QK = 32N | with KENDALL or HOEFFDING
QK = 0 | otherwise

The following example uses KENDALL:

   proc corr kendall;
      var x1 x2 x3;
   run;

Therefore, with T=3, K=L=6, and C=1, the minimum temporary storage in bytes is

40*3+8*6+8*6*1+12*3N+28N+3*68+32N = 420+96N
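The rank-based storage formula can be sketched in Python in the same way. The function name `rank_storage_bytes` and its arguments are hypothetical. The `types` argument lists the requested correlation types, so C is taken as its length. As in the Pearson case, the sketch assumes that a PARTIAL statement invokes NOMISS.

```python
def rank_storage_bytes(n, v, w=0, p=0, types=("kendall",),
                       nomiss=False, nosimple=False):
    """Temporary storage in bytes for the most time-efficient processing."""
    requested = {name.lower() for name in types}
    c = len(requested)                      # number of correlation types
    t = v + w + p
    k = v * w if w > 0 else v * (v + 1) // 2
    l = k if p == 0 else t * (t + 1) // 2
    if p > 0:
        nomiss = True                       # assumption: PARTIAL invokes NOMISS
    qs = 0 if nosimple else 68 * t
    qp = 56 * k if ("pearson" in requested and not nomiss) else 0
    qk = 32 * n if requested & {"kendall", "hoeffding"} else 0
    return 40 * t + 8 * k + 8 * l * c + 12 * t * n + 28 * n + qs + qp + qk

# PROC CORR KENDALL; VAR x1 x2 x3; with 1000 observations.
print(rank_storage_bytes(n=1000, v=3))      # 96420, that is, 420 + 96*1000
```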

If M bytes are not available, PROC CORR must process the data multiple times to compute all the statistics. This reduces the minimum temporary storage you need by 12(T-2)N bytes. When this occurs, PROC CORR prints a note suggesting a larger memory region.
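The multiple-pass reduction can be applied to any storage estimate with one line of arithmetic. The following Python sketch uses the hypothetical name `multipass_storage_bytes` and takes the single-pass estimate M as an input rather than recomputing it.

```python
def multipass_storage_bytes(m, n, t):
    """Storage needed when the data are processed multiple times: M - 12(T-2)N."""
    return m - 12 * (t - 2) * n

# KENDALL with three VAR variables: M = 420 + 96N and T = 3.
n = 1000
m = 420 + 96 * n
reduced = multipass_storage_bytes(m, n, t=3)
print(m, reduced)   # 96420 84420
```

For this KENDALL example the reduced requirement is 420+84N bytes, so the saving grows with both the number of observations and the number of variables beyond two.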


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.