The FREQ Procedure |
Cochran-Mantel-Haenszel Statistics
For n-way crosstabulation tables, consider the following statements:

    proc freq;
       tables A*B*C*D / cmh;
    run;

The CMH option in the TABLES statement gives a stratified statistical analysis of the relationship between C and D, after controlling for A and B. The stratified analysis provides a way to adjust for the possible confounding effects of A and B without being forced to estimate parameters for them. The analysis produces Cochran-Mantel-Haenszel statistics. For 2×2 tables, it also includes estimation of the common odds ratio and the common relative risks, as well as the Breslow-Day test for homogeneity of the odds ratios.
Let the number of strata be denoted by q, indexing the strata by h = 1, 2, ..., q. Each stratum contains an R×C contingency table with X representing the row variable and Y representing the column variable. For table h, denote the cell frequency in row i and column j by n_{hij}, with corresponding row and column marginal totals denoted by n_{hi·} and n_{h·j}, and the overall stratum total by n_{h}.
Because the formulas for the Cochran-Mantel-Haenszel statistics are more easily defined in terms of matrices, the following notation is used. Vectors are presumed to be column vectors unless they are transposed (').
$$
\begin{aligned}
\mathbf{n}_{hi}' &= (n_{hi1}, n_{hi2}, \ldots, n_{hiC}) && (1 \times C) \\
\mathbf{n}_{h}' &= (\mathbf{n}_{h1}', \mathbf{n}_{h2}', \ldots, \mathbf{n}_{hR}') && (1 \times RC) \\
p_{hi\cdot} &= n_{hi\cdot} / n_{h} && (1 \times 1) \\
p_{h\cdot j} &= n_{h\cdot j} / n_{h} && (1 \times 1) \\
\mathbf{P}_{h*\cdot}' &= (p_{h1\cdot}, p_{h2\cdot}, \ldots, p_{hR\cdot}) && (1 \times R) \\
\mathbf{P}_{h\cdot*}' &= (p_{h\cdot 1}, p_{h\cdot 2}, \ldots, p_{h\cdot C}) && (1 \times C)
\end{aligned}
$$
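Assume that the strata are independent and that the marginal totals of each stratum are fixed. The null hypothesis, H_{0}, is that there is no association between X and Y in any of the strata. The corresponding model is the multiple hypergeometric. This implies that, under H_{0}, the expected value and covariance matrix of the frequencies are, respectively,

$$
\begin{aligned}
E(\mathbf{n}_h \mid H_0) &= \mathbf{m}_h = n_h\,(\mathbf{P}_{h*\cdot} \otimes \mathbf{P}_{h\cdot*}) \\
\operatorname{Var}(\mathbf{n}_h \mid H_0) &= \mathbf{V}_h = \frac{n_h^2}{n_h - 1}\left[(\mathbf{D}_{\mathbf{P}_{h*\cdot}} - \mathbf{P}_{h*\cdot}\mathbf{P}_{h*\cdot}') \otimes (\mathbf{D}_{\mathbf{P}_{h\cdot*}} - \mathbf{P}_{h\cdot*}\mathbf{P}_{h\cdot*}')\right]
\end{aligned}
$$

where ⊗ denotes the Kronecker product and D_{a} is a diagonal matrix with the elements of a on its main diagonal.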
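The generalized CMH statistic (Landis, Heyman, and Koch 1978) is defined as

$$
Q_{\mathrm{CMH}} = \mathbf{G}'\,\mathbf{V}_G^{-1}\,\mathbf{G}
$$

where

$$
\mathbf{G} = \sum_{h=1}^{q} \mathbf{B}_h\,(\mathbf{n}_h - \mathbf{m}_h), \qquad
\mathbf{V}_G = \sum_{h=1}^{q} \mathbf{B}_h\,\mathbf{V}_h\,\mathbf{B}_h'
$$

Here m_{h} and V_{h} are the null expected value and covariance matrix of n_{h}. B_{h} = R_{h} ⊗ C_{h} is a matrix of fixed constants based on the row scores R_{h} and the column scores C_{h}. The order of the Kronecker factors matches the row-major ordering of n_{h}.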
PROC FREQ computes three CMH statistics using this formula for the generalized CMH statistic, with different row and column score definitions for each statistic. The CMH statistics that PROC FREQ computes are the correlation statistic, the ANOVA (row mean scores) statistic, and the general association statistic. These statistics test the null hypothesis of no association against different alternative hypotheses. Under H_{0}, each statistic has an asymptotic chi-square distribution. The degrees of freedom are 1 for the correlation statistic, R-1 for the ANOVA statistic, and (R-1)(C-1) for the general association statistic. The following sections describe the computation of these CMH statistics.
Caution: The CMH statistics have low power for detecting an association in which the patterns of association for some of the strata are in the opposite direction of the patterns displayed by other strata. Thus, a nonsignificant CMH statistic suggests either that there is no association or that no pattern of association has enough strength or consistency to dominate any other pattern.
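To make the generalized formula concrete, here is a minimal pure-Python sketch. It is written to illustrate the algebra, not to reproduce PROC FREQ's internal code, and it is Python rather than SAS. All helper names (kron, null_moments, generalized_cmh, and so on) are hypothetical. Each stratum is an R×C list of counts flattened in row-major order to form n_{h}, and the caller supplies each B_{h}.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def matmul(a, b):
    """Ordinary matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def centered(p):
    """D_p - p p' for a vector of marginal proportions p."""
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(len(p))]
            for i in range(len(p))]


def null_moments(table):
    """m_h and V_h for the row-major vector n_h under H0 (fixed margins)."""
    n = sum(map(sum, table))
    p_row = [sum(r) / n for r in table]          # P_{h*.}
    p_col = [sum(c) / n for c in zip(*table)]    # P_{h.*}
    m = [n * x for x in kron([p_row], [p_col])[0]]
    scale = n * n / (n - 1)
    v = [[scale * x for x in row] for row in kron(centered(p_row), centered(p_col))]
    return m, v


def generalized_cmh(strata, b_mats):
    """Q_CMH = G' V_G^{-1} G, where b_mats[h] is B_h with R*C columns."""
    k = len(b_mats[0])
    g = [0.0] * k
    vg = [[0.0] * k for _ in range(k)]
    for table, b in zip(strata, b_mats):
        m, v = null_moments(table)
        n_vec = [float(x) for row in table for x in row]   # row-major n_h
        diff = [nv - mv for nv, mv in zip(n_vec, m)]        # n_h - m_h
        bv = matmul(b, v)
        for r in range(k):
            g[r] += sum(b[r][c] * diff[c] for c in range(len(diff)))
            for s in range(k):
                vg[r][s] += sum(bv[r][c] * b[s][c] for c in range(len(diff)))
    x = solve(vg, g)
    return sum(gi * xi for gi, xi in zip(g, x))


if __name__ == "__main__":
    # B_h = R_h (x) C_h for row scores (1, 2) and column scores (1, 2).
    b_corr = kron([[1, 2]], [[1, 2]])
    strata = [[[10, 20], [30, 15]], [[8, 12], [6, 14]]]
    print(generalized_cmh(strata, [b_corr, b_corr]))
```

For 2×2 strata, this choice of B_{h} reproduces the familiar Mantel-Haenszel chi-square (without continuity correction). That statistic is the squared sum of n_{h11} - m_{h11} over strata, divided by the sum of the hypergeometric variances of n_{h11}.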
Correlation Statistic
The alternative hypothesis for the correlation statistic is that there is a linear association between X and Y in at least one stratum. If either X or Y does not lie on an ordinal (or interval) scale, then this statistic is not meaningful.
To compute the correlation statistic, PROC FREQ uses the formula for the generalized CMH statistic with the row and column scores determined by the SCORES= option in the TABLES statement. See the section "Scores" for more information on the available score types. The matrix of row scores R_{h} has dimension 1×R, and the matrix of column scores C_{h} has dimension 1×C.
When there is only one stratum, this CMH statistic reduces to (n-1)r^{2}, where r is the Pearson correlation coefficient between X and Y. When nonparametric (RANK or RIDIT) scores are specified, then the statistic reduces to (n-1)r_{s}^{2}, where r_{s} is the Spearman rank correlation coefficient between X and Y. When there is more than one stratum, then this CMH statistic becomes a stratum-adjusted correlation statistic.
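For this statistic, R_{h} and C_{h} are single rows, so G and V_{G} are scalars and the matrix formula reduces to sums over cells. Per stratum, G gains the centered cross-product of row and column scores, and V_{G} gains the product of the two centered score sums of squares divided by n_{h} - 1. The Python sketch below uses that reduced form. It assumes the scores are passed in explicitly rather than chosen through SCORES=, and the function name is hypothetical. With one stratum it should agree with (n-1)r^{2}.

```python
def cmh_correlation(strata, row_scores, col_scores):
    """Correlation CMH statistic (1 df) for a list of R x C strata."""
    g = 0.0   # sum over strata of B_h (n_h - m_h), a scalar here
    vg = 0.0  # sum over strata of B_h V_h B_h', also a scalar
    for t in strata:
        n = sum(map(sum, t))
        row_tot = [sum(r) for r in t]
        col_tot = [sum(c) for c in zip(*t)]
        # Sum of r_i * c_j over all n observations in the stratum.
        s_rc = sum(row_scores[i] * col_scores[j] * t[i][j]
                   for i in range(len(t)) for j in range(len(t[0])))
        s_r = sum(a * k for a, k in zip(row_scores, row_tot))
        s_c = sum(b * k for b, k in zip(col_scores, col_tot))
        # Centered sums of squares of the row and column scores.
        ss_r = sum(a * a * k for a, k in zip(row_scores, row_tot)) - s_r * s_r / n
        ss_c = sum(b * b * k for b, k in zip(col_scores, col_tot)) - s_c * s_c / n
        g += s_rc - s_r * s_c / n
        vg += ss_r * ss_c / (n - 1)
    return g * g / vg
```

Summing G and V_{G} across strata before forming the ratio makes this a stratum-adjusted statistic, rather than an average of per-stratum correlations.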
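ANOVA (Row Mean Scores) Statistic
The alternative hypothesis for the ANOVA statistic is that, for at least one stratum, the mean scores of the R rows are unequal. In other words, the mean response differs across rows. This statistic is meaningful only when the column variable Y lies on an ordinal (or interval) scale, so that the mean score is meaningful.

The matrix of column scores C_{h} has dimension 1×C, and the column scores are determined by the SCORES= option.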
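The matrix of row scores R_{h} has dimension (R-1)×R and is created internally by PROC FREQ as

$$
\mathbf{R}_h = \left[\, \mathbf{I}_{R-1},\; -\mathbf{J}_{R-1} \,\right]
$$

where I_{R-1} is an identity matrix of rank R-1 and J_{R-1} is an (R-1)×1 vector of ones. This matrix has the effect of forming R-1 independent contrasts of the R mean scores.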
When there is only one stratum, this CMH statistic is essentially an analysis of variance (ANOVA) statistic in the sense that it is a function of the variance ratio F statistic that would be obtained from a one-way ANOVA on the dependent variable Y. If nonparametric scores are specified in this case, then the ANOVA statistic is a Kruskal-Wallis test.
If there is more than one stratum, then this CMH statistic corresponds to a stratum-adjusted ANOVA or Kruskal-Wallis test. In the special case where there is one subject per row and one subject per column in the contingency table of each stratum, this CMH statistic is identical to Friedman's chi-square. See Example 28.8 for an illustration.
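General Association Statistic
The alternative hypothesis for the general association statistic is that, for at least one stratum, there is some kind of association between X and Y. This statistic is always interpretable because it does not require an ordinal scale for either X or Y.

For the general association statistic, the matrix R_{h} is the same as the one used for the ANOVA statistic. The matrix C_{h} is defined similarly as

$$
\mathbf{C}_h = \left[\, \mathbf{I}_{C-1},\; -\mathbf{J}_{C-1} \,\right]
$$

where I_{C-1} is an identity matrix of rank C-1 and J_{C-1} is a (C-1)×1 vector of ones. When there is only one stratum, the general association statistic reduces to Q_{P}(n-1)/n, where Q_{P} is the Pearson chi-square statistic.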
Refer to Cochran (1954); Mantel and Haenszel (1959); Mantel (1963); Birch (1965); Landis, Heyman, and Koch (1978).
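Both contrast-based statistics can be sketched by building R_{h} = [I_{R-1}, -J_{R-1}] (and, for general association, C_{h} = [I_{C-1}, -J_{C-1}]). The sketch then forms B_{h} = R_{h} ⊗ C_{h} and evaluates G' V_{G}^{-1} G. It is an illustrative Python sketch, not PROC FREQ's code. The function names are hypothetical, and it assumes every stratum has the same R×C shape and that the column scores are supplied directly. Two single-stratum facts serve as checks. The general association statistic should equal (n-1)/n times Pearson's chi-square. The ANOVA statistic should equal (n-1) times the ratio of the between-row to the total sum of squares of the column scores.

```python
def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]


def solve(a, rhs):
    """Solve a x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def centered(p):
    """D_p - p p' for a vector of marginal proportions p."""
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(len(p))]
            for i in range(len(p))]


def contrast_matrix(k):
    """[I_{k-1}, -J_{k-1}]: k-1 independent contrasts among k levels."""
    return [[1.0 if j == i else (-1.0 if j == k - 1 else 0.0) for j in range(k)]
            for i in range(k - 1)]


def cmh_statistic(strata, b):
    """G' V_G^{-1} G, using the same B_h = b for every R x C stratum."""
    k, cells = len(b), len(b[0])
    g = [0.0] * k
    vg = [[0.0] * k for _ in range(k)]
    for t in strata:
        n = sum(map(sum, t))
        pr = [sum(r) / n for r in t]            # P_{h*.}
        pc = [sum(c) / n for c in zip(*t)]      # P_{h.*}
        v = kron(centered(pr), centered(pc))    # V_h without the n^2/(n-1) factor
        scale = n * n / (n - 1)
        diff = [t[i][j] - n * pr[i] * pc[j]     # n_h - m_h, row-major
                for i in range(len(pr)) for j in range(len(pc))]
        for r in range(k):
            g[r] += sum(b[r][c] * diff[c] for c in range(cells))
            bv_r = [sum(b[r][c] * v[c][d] for c in range(cells)) for d in range(cells)]
            for s in range(k):
                vg[r][s] += scale * sum(bv_r[d] * b[s][d] for d in range(cells))
    x = solve(vg, g)
    return sum(gi * xi for gi, xi in zip(g, x))


def anova_cmh(strata, col_scores):
    """Row mean scores statistic, R - 1 df: B_h = [I, -J] (x) column scores."""
    b = kron(contrast_matrix(len(strata[0])), [list(col_scores)])
    return cmh_statistic(strata, b)


def general_association_cmh(strata):
    """General association statistic, (R - 1)(C - 1) df."""
    b = kron(contrast_matrix(len(strata[0])), contrast_matrix(len(strata[0][0])))
    return cmh_statistic(strata, b)
```

Any other full-rank set of R-1 (or C-1) contrasts would give the same value. The quadratic form does not depend on which basis of contrasts is used, so [I, -J] is simply a convenient choice.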
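Adjusted Odds Ratio and Relative Risk Estimates
For stratified 2×2 tables, the CMH option also provides adjusted estimates of the common odds ratio and the common relative risks. For each of these measures, PROC FREQ computes a Mantel-Haenszel estimate and a logit estimate. For example, consider these statements:

    proc freq;
       tables A*B*C*D / cmh;
    run;

If the row and column variables C and D both have two levels, PROC FREQ provides odds ratio and relative risk estimates, adjusting for the confounding variables A and B.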
The choice of an appropriate measure depends on the study design. For case-control (retrospective) studies, the odds ratio is appropriate. For cohort (prospective) or cross-sectional studies, the relative risk is appropriate. See the section "Odds Ratio and Relative Risks for 2×2 Tables" for more information on these measures.
Throughout this section, z denotes the 100(1-α/2) percentile of the standard normal distribution, where 100(1-α)% is the confidence level.
Odds Ratio, Case-Control Studies
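Mantel-Haenszel Estimator
The Mantel-Haenszel estimate of the common odds ratio is computed as

$$
\mathrm{OR}_{\mathrm{MH}} = \frac{\sum_h n_{h11}\, n_{h22} / n_h}{\sum_h n_{h12}\, n_{h21} / n_h}
$$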
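Using the estimated variance for log(OR_{MH}) given by Robins, Breslow, and Greenland (1986), PROC FREQ computes the corresponding 100(1-α)% confidence limits for the odds ratio as

$$
\left(\, \mathrm{OR}_{\mathrm{MH}} \exp(-z\hat{\sigma}),\;\; \mathrm{OR}_{\mathrm{MH}} \exp(z\hat{\sigma}) \,\right)
$$

where

$$
\hat{\sigma}^2 = \widehat{\operatorname{Var}}\left[\log(\mathrm{OR}_{\mathrm{MH}})\right]
= \frac{\sum_h P_h R_h}{2\left(\sum_h R_h\right)^2}
+ \frac{\sum_h (P_h S_h + Q_h R_h)}{2 \sum_h R_h \sum_h S_h}
+ \frac{\sum_h Q_h S_h}{2\left(\sum_h S_h\right)^2}
$$

$$
P_h = \frac{n_{h11} + n_{h22}}{n_h}, \quad
Q_h = \frac{n_{h12} + n_{h21}}{n_h}, \quad
R_h = \frac{n_{h11}\, n_{h22}}{n_h}, \quad
S_h = \frac{n_{h12}\, n_{h21}}{n_h}
$$

Here P_{h}, Q_{h}, R_{h}, and S_{h} are scalars, unrelated to the score matrices used by the CMH statistics.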
Note that the Mantel-Haenszel odds ratio estimator is less sensitive to small n_{h} than the logit estimator.
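The estimator and its Robins-Breslow-Greenland limits need only a few running sums. The following Python sketch is illustrative and not PROC FREQ's code. It assumes each stratum is given as [[n11, n12], [n21, n22]] with the sums of R_{h} and S_{h} both positive, and the function name is hypothetical. It uses the standard-library NormalDist to obtain z. For a single stratum, the variance collapses to 1/n_{11} + 1/n_{12} + 1/n_{21} + 1/n_{22}, the same as the logit (Woolf) variance.

```python
import math
from statistics import NormalDist


def mantel_haenszel_or(strata, alpha=0.05):
    """Mantel-Haenszel common odds ratio with Robins-Breslow-Greenland limits.

    Each stratum is a 2x2 table [[n11, n12], [n21, n22]].
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n   # P_h, Q_h
        r, s = a * d / n, b * c / n       # R_h, S_h
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    var_log = (sum_pr / (2 * sum_r ** 2)
               + sum_ps_qr / (2 * sum_r * sum_s)
               + sum_qs / (2 * sum_s ** 2))
    se = math.sqrt(var_log)
    return or_mh, (or_mh * math.exp(-z * se), or_mh * math.exp(z * se))
```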
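Logit Estimator
The adjusted logit estimate of the odds ratio (Woolf 1955) is computed as

$$
\mathrm{OR}_{\mathrm{L}} = \exp\!\left( \frac{\sum_h w_h \log(\mathrm{OR}_h)}{\sum_h w_h} \right)
$$

where

$$
\mathrm{OR}_h = \frac{n_{h11}\, n_{h22}}{n_{h12}\, n_{h21}}, \qquad
w_h = \frac{1}{\operatorname{Var}(\log \mathrm{OR}_h)} = \left[ \frac{1}{n_{h11}} + \frac{1}{n_{h12}} + \frac{1}{n_{h21}} + \frac{1}{n_{h22}} \right]^{-1}
$$

The corresponding 100(1-α)% confidence limits are

$$
\left(\, \mathrm{OR}_{\mathrm{L}} \exp\!\left(-z / \sqrt{\textstyle\sum_h w_h}\right),\;\; \mathrm{OR}_{\mathrm{L}} \exp\!\left(z / \sqrt{\textstyle\sum_h w_h}\right) \,\right)
$$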
If any cell frequency in a stratum h is zero, then PROC FREQ adds 0.5 to each cell of the stratum before computing OR_{h} and w_{h} (Haldane 1955), and prints a warning.
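The logit estimator, including the Haldane 0.5 correction for strata with a zero cell, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not PROC FREQ's implementation. The function name is hypothetical, the warning text is invented for the sketch, and strata are given as [[n11, n12], [n21, n22]].

```python
import math
import warnings
from statistics import NormalDist


def logit_or(strata, alpha=0.05):
    """Woolf's adjusted logit odds ratio with 100(1-alpha)% limits."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sum_w = sum_wlog = 0.0
    for table in strata:
        a, b, c, d = (float(x) for row in table for x in row)
        if 0.0 in (a, b, c, d):
            # Haldane correction: add 0.5 to every cell of this stratum.
            warnings.warn("zero cell: adding 0.5 to each cell of the stratum")
            a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
        w = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)   # w_h = 1 / Var(log OR_h)
        sum_w += w
        sum_wlog += w * math.log(a * d / (b * c))
    or_l = math.exp(sum_wlog / sum_w)
    half = z / math.sqrt(sum_w)
    return or_l, (or_l * math.exp(-half), or_l * math.exp(half))
```

Because the limits are symmetric on the log scale, their product equals OR_{L} squared. This gives a quick sanity check on any output.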
Relative Risks, Cohort Studies
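Mantel-Haenszel Estimator
The Mantel-Haenszel estimate of the common relative risk for column 1 is computed as

$$
\mathrm{RR}_{\mathrm{MH}} = \frac{\sum_h n_{h11}\, n_{h2\cdot} / n_h}{\sum_h n_{h21}\, n_{h1\cdot} / n_h}
$$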
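Using the estimated variance for log(RR_{MH}) given by Greenland and Robins (1985), PROC FREQ computes the corresponding 100(1-α)% confidence limits for the relative risk as

$$
\left(\, \mathrm{RR}_{\mathrm{MH}} \exp(-z\hat{\sigma}),\;\; \mathrm{RR}_{\mathrm{MH}} \exp(z\hat{\sigma}) \,\right)
$$

where

$$
\hat{\sigma}^2 = \widehat{\operatorname{Var}}\left[\log(\mathrm{RR}_{\mathrm{MH}})\right]
= \frac{\sum_h \left( n_{h1\cdot}\, n_{h2\cdot}\, n_{h\cdot 1} - n_{h11}\, n_{h21}\, n_h \right) / n_h^2}
{\left( \sum_h n_{h11}\, n_{h2\cdot} / n_h \right)\left( \sum_h n_{h21}\, n_{h1\cdot} / n_h \right)}
$$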
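A Python sketch of the Mantel-Haenszel relative risk and its Greenland-Robins limits follows. It is illustrative only. The function name is hypothetical, strata are [[n11, n12], [n21, n22]], row 1 is the group whose column-1 risk is in the numerator, and all sums are assumed positive. For a single stratum, the variance reduces to 1/n_{11} - 1/n_{1·} + 1/n_{21} - 1/n_{2·}.

```python
import math
from statistics import NormalDist


def mantel_haenszel_rr(strata, alpha=0.05):
    """MH common relative risk (column 1) with Greenland-Robins limits."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    num = den = var_num = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        n1, n2 = a + b, c + d   # row totals n_{h1.}, n_{h2.}
        m1 = a + c              # column-1 total n_{h.1}
        num += a * n2 / n
        den += c * n1 / n
        var_num += (n1 * n2 * m1 - a * c * n) / n ** 2
    rr = num / den
    se = math.sqrt(var_num / (num * den))
    return rr, (rr * math.exp(-z * se), rr * math.exp(z * se))
```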
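Logit Estimator
The adjusted logit estimate of the common relative risk for column 1 is computed as

$$
\mathrm{RR}_{\mathrm{L}} = \exp\!\left( \frac{\sum_h w_h \log(\mathrm{RR}_h)}{\sum_h w_h} \right)
$$

where

$$
\mathrm{RR}_h = \frac{n_{h11} / n_{h1\cdot}}{n_{h21} / n_{h2\cdot}}, \qquad
w_h = \frac{1}{\operatorname{Var}(\log \mathrm{RR}_h)} = \left[ \frac{1}{n_{h11}} - \frac{1}{n_{h1\cdot}} + \frac{1}{n_{h21}} - \frac{1}{n_{h2\cdot}} \right]^{-1}
$$

The corresponding 100(1-α)% confidence limits are

$$
\left(\, \mathrm{RR}_{\mathrm{L}} \exp\!\left(-z / \sqrt{\textstyle\sum_h w_h}\right),\;\; \mathrm{RR}_{\mathrm{L}} \exp\!\left(z / \sqrt{\textstyle\sum_h w_h}\right) \,\right)
$$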
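The logit relative risk weights each stratum's log relative risk by the inverse of its estimated variance. Below is a minimal Python sketch with a hypothetical function name. It assumes n_{h11} > 0 and n_{h21} > 0 in every stratum and a positive variance in each. It applies no zero-cell correction, because this section does not state one for relative risks.

```python
import math
from statistics import NormalDist


def logit_rr(strata, alpha=0.05):
    """Adjusted logit common relative risk (column 1) with 100(1-alpha)% limits."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    sum_w = sum_wlog = 0.0
    for (a, b), (c, d) in strata:
        n1, n2 = a + b, c + d                        # row totals
        rr_h = (a / n1) / (c / n2)                   # stratum relative risk
        w = 1.0 / (1 / a - 1 / n1 + 1 / c - 1 / n2)  # w_h = 1 / Var(log RR_h)
        sum_w += w
        sum_wlog += w * math.log(rr_h)
    rr = math.exp(sum_wlog / sum_w)
    half = z / math.sqrt(sum_w)
    return rr, (rr * math.exp(-half), rr * math.exp(half))
```

With a single stratum, the logit and Mantel-Haenszel relative risks and their limits coincide, because both variance formulas reduce to the same expression.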
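Breslow-Day Test for Homogeneity of the Odds Ratios
When you specify the CMH option, PROC FREQ computes the Breslow-Day test for stratified 2×2 tables. It tests the null hypothesis that the odds ratios from the q strata are all equal. When the null hypothesis is true, the statistic has approximately a chi-square distribution with q-1 degrees of freedom. The Breslow-Day statistic is computed as

$$
Q_{\mathrm{BD}} = \sum_h \frac{\left( n_{h11} - E(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}}) \right)^2}{\operatorname{Var}(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}})}
$$

E and Var denote the expected value and variance of n_{h11} when the margins of table h are fixed and its odds ratio equals OR_{MH}. The expected value is the root x of the equation below that lies between max(0, n_{h·1} - n_{h2·}) and min(n_{h1·}, n_{h·1}):

$$
\frac{x\,(n_{h2\cdot} - n_{h\cdot 1} + x)}{(n_{h1\cdot} - x)(n_{h\cdot 1} - x)} = \mathrm{OR}_{\mathrm{MH}}
$$

The variance is

$$
\operatorname{Var}(n_{h11} \mid \mathrm{OR}_{\mathrm{MH}}) = \left[ \frac{1}{x} + \frac{1}{n_{h1\cdot} - x} + \frac{1}{n_{h\cdot 1} - x} + \frac{1}{n_{h2\cdot} - n_{h\cdot 1} + x} \right]^{-1}
$$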
Caution: Unlike the Cochran-Mantel-Haenszel statistics, the Breslow-Day test requires a large sample size within each stratum, and this limits its usefulness. In addition, the validity of the CMH tests does not depend on any assumption of homogeneity of the odds ratios. Therefore, the Breslow-Day test should never be used as an indicator of the validity of the CMH tests.
Refer to Breslow and Day (1994).
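The Breslow-Day computation can be sketched in Python by solving, for each stratum, for the fitted n_{h11} whose table has the same margins and an odds ratio equal to OR_{MH}. The statistic then compares that fitted value with the observed count. This is an illustration, not PROC FREQ's implementation. The function names are hypothetical, every stratum is assumed to have positive margins, and OR_{MH} is assumed positive and finite. Bisection stands in for the closed-form quadratic root. The left-hand side x(n_{h2·} - n_{h·1} + x) - OR_{MH}(n_{h1·} - x)(n_{h·1} - x) is increasing on the feasible interval, so the root there is unique.

```python
def expected_n11(n1, n2, m1, odds_ratio):
    """Fitted n_{h11} with margins (n1, n2 rows; m1 column 1) and the given odds ratio."""
    lo, hi = max(0.0, m1 - n2), float(min(n1, m1))
    for _ in range(200):
        mid = (lo + hi) / 2
        # f(mid) < 0 means the fitted odds ratio is still below the target.
        if mid * (n2 - m1 + mid) - odds_ratio * (n1 - mid) * (m1 - mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def breslow_day(strata):
    """Breslow-Day statistic for 2x2 strata [[n11, n12], [n21, n22]].

    Returns (Q_BD, degrees of freedom q - 1).
    """
    or_mh = (sum(a * d / (a + b + c + d) for (a, b), (c, d) in strata)
             / sum(b * c / (a + b + c + d) for (a, b), (c, d) in strata))
    q_bd = 0.0
    for (a, b), (c, d) in strata:
        n1, n2, m1 = a + b, c + d, a + c
        x = expected_n11(n1, n2, m1, or_mh)
        var = 1.0 / (1 / x + 1 / (n1 - x) + 1 / (m1 - x) + 1 / (n2 - m1 + x))
        q_bd += (a - x) ** 2 / var
    return q_bd, len(strata) - 1
```

If every stratum has exactly the same odds ratio, OR_{MH} equals that common value, each fitted n_{h11} equals the observed one, and Q_{BD} is zero up to rounding.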
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.