Introduction to Regression Procedures |
Parameter estimates are formed using least-squares criteria by solving the normal equations
(X' X) b = X' Y
for the parameter estimates b, yielding
b = (X' X)^{-1} X' Y
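As a minimal sketch of this computation (illustrative NumPy, not SAS code; the design matrix and response are made up for the example):

```python
import numpy as np

# Made-up data: an intercept column plus one regressor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Solve the normal equations (X'X) b = X'Y; np.linalg.solve is
# numerically preferable to forming (X'X)^{-1} explicitly.
b = np.linalg.solve(X.T @ X, X.T @ y)
```

In practice, np.linalg.lstsq computes the same estimates through an orthogonal factorization, which is better conditioned than solving the normal equations directly.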
Assume for the present that (X' X) is full rank (this assumption is relaxed later). The variance of the error is estimated by the mean square error
s^{2} = SSE / (n - k) = (1 / (n - k)) sum_{i=1}^{n} (y_{i} - x_{i} b)^{2}
where x_{i} is the ith row of regressors, n is the number of observations, and k is the number of parameters. The parameter estimates are unbiased:
E(b) = beta
E(s^{2}) = sigma^{2}
The covariance matrix of the estimates is
VAR(b) = (X' X)^{-1} sigma^{2}
The estimate of the covariance matrix is obtained by replacing sigma^{2} with its estimate, s^{2}, in the preceding formula:
COVB = (X' X)^{-1} s^{2}
The correlations of the estimates are derived by scaling the covariance matrix to 1s on the diagonal. Let
S = diag((X' X)^{-1})^{-1/2}
Then the correlation matrix of the estimates is
CORRB = S (X' X)^{-1} S
Standard errors of the estimates are computed using the equation
STDERR(b_{i}) = sqrt((X' X)^{-1}_{ii} s^{2})
where (X' X)^{-1}_{ii} is the ith diagonal element of (X' X)^{-1}. The ratio
t = b_{i} / STDERR(b_{i})
is distributed as Student's t under the hypothesis that beta_{i} is zero. Regression procedures display the t ratio and the significance probability, which is the probability under the hypothesis of a larger absolute t value than was actually obtained. When the probability is less than some small level, the event is considered so unlikely that the hypothesis is rejected.
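These quantities can be sketched end to end (illustrative NumPy/SciPy, not SAS code; the data are made up):

```python
import numpy as np
from scipy import stats  # Student's t distribution for the significance probability

# Made-up data: an intercept column plus one regressor.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0]])
y = np.array([2.1, 3.9, 6.2, 7.8])
n, k = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y

# Mean square error: s^2 = SSE / (n - k)
resid = y - X @ b
s2 = resid @ resid / (n - k)

# Standard errors from the diagonal of s^2 (X'X)^{-1}
stderr = np.sqrt(s2 * np.diag(XtX_inv))

# t ratios and two-sided significance probabilities
t = b / stderr
p = 2 * stats.t.sf(np.abs(t), df=n - k)
```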
Type I SS and Type II SS measure the contribution of a variable to the reduction in SSE. Type I SS measure the reduction in SSE as that variable is entered into the model in sequence. Type II SS are the increment in SSE that results from removing the variable from the full model. Type II SS are equivalent to the Type III and Type IV SS reported in the GLM procedure. If Type II SS are used in the numerator of an F test, the test is equivalent to the t test for the hypothesis that the parameter is zero. In polynomial models, Type I SS measure the contribution of each polynomial term after it is orthogonalized to the previous terms in the model. The four types of SS are described in Chapter 12, "The Four Types of Estimable Functions."
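The distinction can be illustrated directly from SSE differences (a NumPy sketch with simulated, correlated regressors; not SAS code):

```python
import numpy as np

# Simulate two correlated regressors and a response.
rng = np.random.default_rng(0)
n = 30
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)          # correlated with x1
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

def sse(cols):
    # SSE from a least-squares fit of y on the given design columns.
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ b
    return r @ r

ones = np.ones(n)

# Type I SS: reduction in SSE as each variable enters in sequence.
type1_x1 = sse([ones]) - sse([ones, x1])
type1_x2 = sse([ones, x1]) - sse([ones, x1, x2])

# Type II SS: increase in SSE when each variable is removed from the full model.
type2_x1 = sse([ones, x2]) - sse([ones, x1, x2])
type2_x2 = sse([ones, x1]) - sse([ones, x1, x2])
```

For the last variable entered (x2 here), the Type I and Type II SS coincide; for x1 they differ because x1 and x2 are correlated.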
Standardized estimates are defined as the estimates that result when all variables are standardized to a mean of 0 and a variance of 1. Standardized estimates are computed by multiplying the original estimates by the sample standard deviation of the regressor variable and dividing by the sample standard deviation of the dependent variable.
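For a simple regression, this relationship can be checked numerically (illustrative NumPy, not SAS code; made-up data):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Original (unstandardized) fit with an intercept.
X = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(X, y, rcond=None)[0]

# Standardized estimate: original slope times sd(x) / sd(y).
b_std = b[1] * x.std(ddof=1) / y.std(ddof=1)

# Equivalent: regress standardized y on standardized x (no intercept needed,
# since both variables now have mean 0).
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
b_check = (zx @ zy) / (zx @ zx)
```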
R^{2} is an indicator of how much of the variation in the data is explained by the model. It is defined as
R^{2} = 1 - SSE / TSS
where SSE is the sum of squares for error and TSS is the corrected total sum of squares. The adjusted R^{2} statistic is an alternative to R^{2} that is adjusted for the number of parameters in the model. It is calculated as
ADJRSQ = 1 - ((n - i)(1 - R^{2})) / (n - p)
where n is the number of observations used to fit the model, p is the number of parameters in the model (including the intercept), and i is 1 if the model includes an intercept term, and 0 otherwise.
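A quick numeric check of both definitions (illustrative NumPy, not SAS code; made-up data):

```python
import numpy as np

X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
n, p = X.shape          # p counts the intercept as a parameter
i = 1                   # 1 because the model includes an intercept

b = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b
sse_val = resid @ resid
tss = ((y - y.mean()) ** 2).sum()   # corrected total sum of squares

r2 = 1 - sse_val / tss
adj_r2 = 1 - (1 - r2) * (n - i) / (n - p)
```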
Tolerances and variance inflation factors measure the strength of interrelationships among the regressor variables in the model. If all variables are orthogonal to each other, both tolerance and variance inflation are 1. If a variable is very closely related to other variables, the tolerance goes to 0 and the variance inflation gets very large. Tolerance (TOL) is 1 minus the R^{2} that results from the regression of the other variables in the model on that regressor. Variance inflation (VIF) is the diagonal of (X' X)^{-1} if (X' X) is scaled to correlation form. The statistics are related as
VIF = 1 / TOL
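Both statistics can be computed from the correlation matrix of the regressors (a NumPy sketch with simulated data; not SAS code):

```python
import numpy as np

# Simulate three regressors; x2 is nearly collinear with x1.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)   # closely related to x1
x3 = rng.normal(size=n)              # roughly orthogonal to the others

X = np.column_stack([x1, x2, x3])

# Scale to correlation form; the VIFs are the diagonal of the inverse.
R = np.corrcoef(X, rowvar=False)
vif = np.diag(np.linalg.inv(R))
tol = 1.0 / vif
```

The collinear pair (x1, x2) shows inflated VIFs and small tolerances, while the roughly orthogonal x3 stays near 1.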
If the model is not of full rank, a generalized inverse can be used to solve the normal equations and minimize the SSE:
b = (X' X)^{-} X' Y
However, these estimates are not unique, since there are an infinite number of solutions using different generalized inverses. PROC REG and other regression procedures choose a nonzero solution for all variables that are linearly independent of previous variables and a zero solution for other variables. This corresponds to using a generalized inverse in the normal equations, and the expected values of the estimates are the Hermite normal form of X' X multiplied by the true parameters:
E(b) = (X' X)^{-} X' X beta
Degrees of freedom for the zeroed estimates are reported as zero. The hypotheses that are not testable have t tests displayed as missing. The message that the model is not full rank includes a display of the relations that exist in the matrix.
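The non-uniqueness is easy to demonstrate (illustrative NumPy, not SAS code): two different generalized-inverse solutions give different parameter estimates but identical fitted values.

```python
import numpy as np

# Rank-deficient design: the third column equals the sum of the first two.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 1.0, 3.0]])
y = np.array([1.0, 2.0, 3.1, 3.9])

# One generalized-inverse solution: the minimum-norm (pseudoinverse) estimate.
b_pinv = np.linalg.pinv(X) @ y

# Another: zero the linearly dependent column and fit the rest,
# analogous to the choice PROC REG makes.
b_drop = np.zeros(3)
b_drop[:2] = np.linalg.lstsq(X[:, :2], y, rcond=None)[0]

# The estimates differ, but both reproduce the same fitted values.
fitted_pinv = X @ b_pinv
fitted_drop = X @ b_drop
```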
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.