The FREQ Procedure |
The Pearson correlation coefficient and the Spearman rank correlation coefficient are also appropriate for ordinal variables. The Pearson correlation describes the strength of the linear association between the row and column variables, and it is computed using the row and column scores specified by the SCORES= option in the TABLES statement. The Spearman correlation is computed with rank scores. The polychoric correlation (requested by the PLCORR option) also requires ordinal variables and assumes that the variables have an underlying bivariate normal distribution. The following measures of association do not require ordinal variables and are appropriate for nominal variables: lambda asymmetric, lambda symmetric, and the uncertainty coefficients.
PROC FREQ computes estimates of the measures according to the formulas given in the discussion of each measure of association. For each measure, PROC FREQ computes an asymptotic standard error (ASE), which is the square root of the asymptotic variance denoted by var in the following sections.
The confidence limits are computed as
    est \pm ( z_{\alpha/2} \times ASE )
where est is the estimate of the measure, z_{\alpha/2} is the 100(1 - \alpha/2) percentile of the standard normal distribution, and ASE is the asymptotic standard error of the estimate.
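The confidence limit computation can be sketched in a few lines of Python. This is an illustration only, not PROC FREQ code: the function name `confidence_limits`, the default `alpha=0.05`, and the sample values of est and ASE are assumptions made for the example.

```python
from statistics import NormalDist

def confidence_limits(est, ase, alpha=0.05):
    """Return (lower, upper) limits est -/+ z_{alpha/2} * ASE.

    alpha=0.05 is an assumed default for this sketch; it gives 95% limits.
    """
    # z_{alpha/2} is the 100(1 - alpha/2) percentile of the standard normal.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * ase
    return est - half_width, est + half_width

# Hypothetical estimate and asymptotic standard error.
lower, upper = confidence_limits(est=0.42, ase=0.10)
print(f"{lower:.4f} {upper:.4f}")
```

A smaller `alpha` widens the interval, because the percentile z_{alpha/2} grows as the confidence level rises.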
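To test the null hypothesis that a measure equals zero, PROC FREQ uses a standardized test statistic z, computed as
    z = est / \sqrt{var_{0}(est)}
where est is the estimate of the measure and var_{0}(est) is the variance of the estimate under the null hypothesis. Formulas for var_{0}(est) are given in the discussion of each measure of association.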
Note that the ratio of est to \sqrt{var_{0}(est)} is the same for the following measures: gamma, Kendall's tau-b, Stuart's tau-c, Somers' D(R|C), and Somers' D(C|R). Therefore, the tests for these measures are identical. For example, the p-values for the test of H_{0}: gamma = 0 equal the p-values for the test of H_{0}: tau-b = 0.
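The equality of the test statistics can be checked numerically for gamma and tau-b. The sketch below is not taken from this page: the quantities A_ij and D_ij (counts in cells concordant and discordant with cell (i, j)), P, Q, w_r, w_c, and the two null-variance expressions are assumptions, written in the form commonly attributed to Brown and Benedetti (1977). The table values are made up.

```python
from math import sqrt

def concordance_terms(table):
    """Return A[i][j] and D[i][j]: counts concordant / discordant with cell (i, j)."""
    rows, cols = len(table), len(table[0])
    A = [[0] * cols for _ in range(rows)]
    D = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(rows):
                for m in range(cols):
                    if (k > i and m > j) or (k < i and m < j):
                        A[i][j] += table[k][m]
                    elif (k > i and m < j) or (k < i and m > j):
                        D[i][j] += table[k][m]
    return A, D

def z_gamma_and_tau_b(table):
    """Return est / sqrt(var0(est)) for gamma and for tau-b (assumed formulas)."""
    n = sum(map(sum, table))
    A, D = concordance_terms(table)
    cells = [(table[i][j], A[i][j], D[i][j])
             for i in range(len(table)) for j in range(len(table[0]))]
    P = sum(nij * a for nij, a, _ in cells)
    Q = sum(nij * d for nij, _, d in cells)
    # Term shared by both null variances.
    core = sum(nij * (a - d) ** 2 for nij, a, d in cells) - (P - Q) ** 2 / n
    w_r = n * n - sum(sum(row) ** 2 for row in table)
    w_c = n * n - sum(sum(col) ** 2 for col in zip(*table))
    gamma = (P - Q) / (P + Q)
    var0_gamma = 4.0 * core / (P + Q) ** 2
    tau_b = (P - Q) / sqrt(w_r * w_c)
    var0_tau_b = 4.0 * core / (w_r * w_c)
    return gamma / sqrt(var0_gamma), tau_b / sqrt(var0_tau_b)

# Hypothetical 3 x 3 table of counts.
z_gamma, z_tau_b = z_gamma_and_tau_b([[10, 5, 2], [4, 8, 6], [1, 3, 12]])
print(z_gamma, z_tau_b)
```

Under these assumed formulas the scaling factor of each measure cancels against the same factor in its null variance, so both ratios reduce to (P - Q) divided by twice the square root of the shared term.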
PROC FREQ computes one-sided and two-sided p-values for each of these tests. When the test statistic z is greater than its null hypothesis expected value of zero, PROC FREQ computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. A small right-sided p-value supports the alternative hypothesis that the true value of the measure is greater than zero. When the test statistic is less than or equal to zero, PROC FREQ computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. A small left-sided p-value supports the alternative hypothesis that the true value of the measure is less than zero. The one-sided p-value P_{1} can be expressed as
    P_{1} = Prob(Z > z) if z > 0
    P_{1} = Prob(Z < z) if z \le 0
where Z has a standard normal distribution. The two-sided p-value P_{2} is computed as
    P_{2} = Prob(|Z| > |z|)
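The one-sided and two-sided p-values described above can be sketched as follows. The function name `asymptotic_p_values` and the sample estimate and null variance are hypothetical; the logic follows the right-sided and left-sided rule stated in the text.

```python
from statistics import NormalDist

def asymptotic_p_values(est, var0):
    """Return (z, one-sided p-value, two-sided p-value) for H0: measure = 0."""
    z = est / var0 ** 0.5
    std_normal = NormalDist()
    if z > 0:
        p_one = 1.0 - std_normal.cdf(z)   # right-sided: Prob(Z > z)
    else:
        p_one = std_normal.cdf(z)         # left-sided: Prob(Z < z)
    p_two = 2.0 * (1.0 - std_normal.cdf(abs(z)))  # Prob(|Z| > |z|)
    return z, p_one, p_two

# Hypothetical estimate and null-hypothesis variance.
z, p_one, p_two = asymptotic_p_values(est=0.30, var0=0.0225)
print(z, p_one, p_two)
```

Because the standard normal distribution is symmetric, the two-sided p-value is twice the one-sided p-value.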
Exact tests are available for two measures of association, the Pearson correlation coefficient and the Spearman rank correlation coefficient. If you specify the PCORR option in the EXACT statement, PROC FREQ computes the exact test of the hypothesis that the Pearson correlation equals zero. If you specify the SCORR option in the EXACT statement, PROC FREQ computes the exact test of the hypothesis that the Spearman correlation equals zero. See the section "Exact Statistics" for information on exact tests.
The variance of the estimator under the null hypothesis that gamma equals zero is computed as
For 2 × 2 tables, gamma is equivalent to Yule's Q. Refer to Goodman and Kruskal (1979), Agresti (1990), and Brown and Benedetti (1977).
The variance of the estimator under the null hypothesis that tau-b equals zero is computed as
Refer to Kendall (1955) and Brown and Benedetti (1977).
The variance of the estimator under the null hypothesis that tau-c equals zero is
Refer to Brown and Benedetti (1977).
The variance of the estimator under the null hypothesis that D(C|R) equals zero is computed as
Refer to Somers (1962), Goodman and Kruskal (1979), and Liebetrau (1983).
The row scores R_{i} and the column scores C_{j} are determined by the SCORES= option in the TABLES statement.
To compute an asymptotic test for the Pearson correlation, PROC FREQ uses a standardized test statistic r^{*}, which has an asymptotic standard normal distribution under the null hypothesis that the correlation equals zero. The standardized test statistic is computed as
    r^{*} = r / \sqrt{var_{0}(r)}
where var_{0}(r) is the variance of the correlation under the null hypothesis.
The asymptotic variance is derived for multinomial sampling in a contingency table framework, and it differs from the form obtained under the assumption that both variables are continuous and normally distributed. Refer to Brown and Benedetti (1977).
PROC FREQ also computes the exact test for the hypothesis that the Pearson correlation equals zero when you specify the PCORR option in the EXACT statement. See the section "Exact Statistics" for information on exact tests.
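As an illustration of how a Pearson correlation is obtained from a contingency table and a set of row and column scores, the following sketch weights each score pair by its cell count. The function name, the integer scores, and the table values are assumptions for the example; the formula is the ordinary weighted product-moment correlation, not a transcription of PROC FREQ's internals.

```python
from math import sqrt

def pearson_from_table(table, row_scores, col_scores):
    """Pearson correlation of row and column scores, weighted by cell counts."""
    n = sum(map(sum, table))
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    r_bar = sum(t * s for t, s in zip(row_totals, row_scores)) / n
    c_bar = sum(t * s for t, s in zip(col_totals, col_scores)) / n
    ss_r = sum(t * (s - r_bar) ** 2 for t, s in zip(row_totals, row_scores))
    ss_c = sum(t * (s - c_bar) ** 2 for t, s in zip(col_totals, col_scores))
    ss_rc = sum(table[i][j] * (row_scores[i] - r_bar) * (col_scores[j] - c_bar)
                for i in range(len(table)) for j in range(len(table[0])))
    return ss_rc / sqrt(ss_r * ss_c)

# Hypothetical 3 x 3 table with scores 1, 2, 3 on both margins.
r = pearson_from_table([[10, 5, 2], [4, 8, 6], [1, 3, 12]], [1, 2, 3], [1, 2, 3])
print(r)
```

Changing the scores changes the coefficient, which is why the choice of score type matters for ordinal variables.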
Refer to Snedecor and Cochran (1989) and Brown and Benedetti (1977).
To compute an asymptotic test for the Spearman correlation, PROC FREQ uses a standardized test statistic r_{s}^{*}, which has an asymptotic standard normal distribution under the null hypothesis that the correlation equals zero. The standardized test statistic is computed as
    r_{s}^{*} = r_{s} / \sqrt{var_{0}(r_{s})}
where var_{0}(r_{s}) is the variance of the correlation under the null hypothesis.
The asymptotic variance is derived for multinomial sampling in a contingency table framework, and it differs from the form obtained under the assumption that both variables are continuous and normally distributed. Refer to Brown and Benedetti (1977).
PROC FREQ also computes the exact test for the hypothesis that the Spearman rank correlation equals zero when you specify the SCORR option in the EXACT statement. See the section "Exact Statistics" for information on exact tests.
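The role of rank scores can be illustrated by computing the Spearman coefficient as a Pearson correlation of midranks. This sketch assumes the usual midrank definition (each category receives the average rank of its observations); the function names and the table are hypothetical, and the arrangement of PROC FREQ's own formula may differ.

```python
from math import sqrt

def midrank_scores(totals):
    """Average rank of the observations in each category, given marginal totals."""
    scores, below = [], 0
    for t in totals:
        scores.append(below + (t + 1) / 2.0)
        below += t
    return scores

def spearman_from_table(table):
    """Spearman correlation as the count-weighted Pearson correlation of midranks."""
    n = sum(map(sum, table))
    rs = midrank_scores([sum(row) for row in table])
    cs = midrank_scores([sum(col) for col in zip(*table)])
    mean = (n + 1) / 2.0  # midranks on either margin average to (n + 1) / 2
    cells = [(table[i][j], rs[i] - mean, cs[j] - mean)
             for i in range(len(table)) for j in range(len(table[0]))]
    ss_rc = sum(nij * a * b for nij, a, b in cells)
    ss_r = sum(nij * a * a for nij, a, _ in cells)
    ss_c = sum(nij * b * b for nij, _, b in cells)
    return ss_rc / sqrt(ss_r * ss_c)

# Hypothetical 3 x 3 table of counts.
r_s = spearman_from_table([[10, 5, 2], [4, 8, 6], [1, 3, 12]])
print(r_s)
```

Unlike the Pearson correlation with table scores, this value depends only on the ordering of the categories and the marginal totals, not on any numeric values assigned to the levels.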
To estimate the polychoric correlation, PROC FREQ iteratively solves the likelihood equations by a Newton-Raphson algorithm using the Pearson correlation coefficient as the initial approximation. Iteration stops when the convergence measure falls below the convergence criterion or when the maximum number of iterations is reached, whichever occurs first. The CONVERGE= option sets the convergence criterion, and the default value is 0.0001. The MAXITER= option sets the maximum number of iterations, and the default value is 20.
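The stopping rule can be illustrated with a generic Newton-Raphson loop. This sketch does not implement the polychoric likelihood equations: it solves a toy equation, and it assumes that the convergence measure is the absolute change in the estimate between iterations, which the text does not specify. The defaults mirror the stated CONVERGE= and MAXITER= values.

```python
def newton_raphson(equation, derivative, start, converge=0.0001, maxiter=20):
    """Solve equation(x) = 0; return (estimate, iterations used, converged flag)."""
    estimate = start
    for iteration in range(1, maxiter + 1):
        step = equation(estimate) / derivative(estimate)
        estimate -= step
        # Assumed convergence measure: absolute size of the latest update.
        if abs(step) < converge:
            return estimate, iteration, True
    # Iteration limit reached before the criterion was met.
    return estimate, maxiter, False

# Toy problem: the positive root of x^2 - 2 = 0, starting from 1.0.
root, iterations, converged = newton_raphson(
    lambda x: x * x - 2.0, lambda x: 2.0 * x, start=1.0)
print(root, iterations, converged)
```

A good starting value shortens the iteration, which is consistent with the use of the Pearson correlation as the initial approximation.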
Because of the uniqueness assumptions, ties in the frequencies or in the marginal totals must be broken in an arbitrary but consistent manner. In case of ties, l is defined here as the smallest value of j such that r = n_{·j}. For a given i, if there is at least one value j such that n_{ij}=r_{i}=c_{j}, then l_{i} is defined here to be the smallest such value of j. Otherwise, if n_{il}=r_{i}, then l_{i} is defined to be equal to l. If neither condition is true, then l_{i} is taken to be the smallest value of j such that n_{ij}=r_{i}. The formulas for lambda asymmetric (R|C) can be obtained by interchanging the indices.
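The tie-breaking rule above can be sketched as follows. The definitions used here are assumptions, since they are not stated in this passage: r_i is taken as the largest count in row i, c_j as the largest count in column j, and r as the largest column total. Indices are 0-based in the code, and the function name and table are hypothetical.

```python
def modal_columns(table):
    """Return (l, [l_i for each row]) using the tie-breaking rule in the text."""
    cols = len(table[0])
    col_totals = [sum(col) for col in zip(*table)]
    r = max(col_totals)
    l = col_totals.index(r)                 # smallest j with n_.j = r
    c = [max(col) for col in zip(*table)]   # assumed c_j: column maxima
    l_rows = []
    for row in table:
        r_i = max(row)                      # assumed r_i: row maximum
        both = [j for j in range(cols) if row[j] == r_i and row[j] == c[j]]
        if both:
            l_i = both[0]                   # smallest j with n_ij = r_i = c_j
        elif row[l] == r_i:
            l_i = l                         # fall back to the overall modal column
        else:
            l_i = row.index(r_i)            # smallest j with n_ij = r_i
        l_rows.append(l_i)
    return l, l_rows

# Hypothetical table with ties in the first and second rows.
l, l_rows = modal_columns([[5, 5, 1], [2, 7, 7], [3, 3, 9]])
print(l, l_rows)
```

Applying the same function to the transposed table gives the corresponding indices for lambda asymmetric (R|C).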
Refer to Goodman and Kruskal (1979).
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.