 Introduction to Regression Procedures

## Linear Models

In matrix algebra notation, a linear model is written as

$$ y = X\beta + \epsilon $$

where $X$ is the $n \times k$ design matrix (rows are observations and columns are the regressors), $\beta$ is the $k \times 1$ vector of unknown parameters, and $\epsilon$ is the $n \times 1$ vector of unknown errors. The first column of $X$ is usually a vector of 1s used in estimating the intercept term.
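As an illustration of this notation, the following minimal sketch builds a design matrix with a leading column of 1s and computes least-squares estimates of the parameters. It uses NumPy rather than any regression procedure, and the data values and variable names (`x`, `y`, `beta_hat`) are hypothetical, chosen only so that the result is easy to check.

```python
import numpy as np

# Hypothetical data: n = 6 observations of a single regressor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 2.0 + 3.0 * x  # exactly linear response, so the errors are all zero here

# Design matrix X (n x k): a column of 1s for the intercept, then the regressor.
X = np.column_stack([np.ones_like(x), x])

# Least-squares estimate of the k x 1 parameter vector.
# lstsq solves the problem without forming the inverse of X'X explicitly.
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)

print(X.shape)   # n rows (observations) by k columns (regressors)
print(beta_hat)  # estimated intercept and slope
```

Because the hypothetical response is exactly linear, the estimates recover the intercept 2 and slope 3 up to floating-point rounding. With real data the errors are nonzero and the estimates only approximate the unknown parameters.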

The statistical theory of linear models is based on strict classical assumptions. Ideally, the response is measured with all the factors controlled in an experimentally determined environment. If you cannot control the factors experimentally, some tests must be interpreted as being conditional on the observed values of the regressors.

Other assumptions are that

• the form of the model is correct (all important explanatory variables have been included)
• regressor variables are measured without error
• the expected value of the errors is zero
• the variance of the errors (and thus the dependent variable) is a constant across observations (called $\sigma^2$)
• the errors are uncorrelated across observations

When hypotheses are tested, the additional assumption is made that the errors are normally distributed.
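The error assumptions listed above can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the true model, sample size, and error standard deviation are hypothetical, and the errors are drawn to be independent and normal with mean zero and constant variance. It then estimates the error variance from the residuals by dividing the residual sum of squares by n - k.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
n, sigma = 200, 1.5             # hypothetical sample size and error std. deviation

x = rng.uniform(0.0, 10.0, size=n)
eps = rng.normal(0.0, sigma, size=n)  # independent errors, mean 0, constant variance
y = 1.0 + 2.0 * x + eps               # hypothetical true model

X = np.column_stack([np.ones(n), x])  # intercept column plus one regressor
k = X.shape[1]

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat

# Unbiased estimate of the error variance: residual sum of squares over n - k.
s2 = resid @ resid / (n - k)

print(beta_hat)     # estimates of the intercept and slope
print(s2)           # should be in the neighborhood of sigma**2 = 2.25
print(resid.sum())  # essentially zero
```

Note that the residuals sum to zero whenever the model contains an intercept. That is an algebraic property of least squares, not evidence that the errors themselves have mean zero.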

### Statistical Model

If the model satisfies all the necessary assumptions, the least-squares estimates are the best linear unbiased estimates (BLUE). In other words, the estimates have minimum variance among the class of estimators that are unbiased and are linear functions of the responses. If the additional assumption that the error term is normally distributed is also satisfied, then

• the statistics that are computed have the proper sampling distributions for hypothesis testing
• parameter estimates are normally distributed
• various sums of squares are distributed proportionally to chi-square, at least under the proper hypotheses
• ratios of estimates to standard errors are distributed as Student's t under certain hypotheses
• appropriate ratios of sums of squares are distributed as F under certain hypotheses

When regression analysis is used to model data that do not meet the assumptions, the results should be interpreted in a cautious, exploratory fashion. The significance probabilities under these circumstances are unreliable.
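The test statistics mentioned in this section (ratios of estimates to standard errors, and ratios of sums of squares) can be sketched directly from the matrix formulas. The example below is a minimal illustration, not the computation performed by any particular procedure. The simulated data and all variable names are hypothetical, and the errors are generated as normal so that the classical assumptions hold by construction.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, hypothetical data
n = 50
x = rng.uniform(0.0, 10.0, size=n)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=n)

X = np.column_stack([np.ones(n), x])  # intercept column plus one regressor
k = X.shape[1]

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y          # least-squares estimates
fitted = X @ beta_hat
resid = y - fitted

sse = resid @ resid                   # error sum of squares
s2 = sse / (n - k)                    # estimate of the error variance
se = np.sqrt(s2 * np.diag(XtX_inv))   # standard errors of the estimates
t_stats = beta_hat / se               # t ratios for the hypotheses beta_j = 0

ssr = np.sum((fitted - y.mean()) ** 2)      # model sum of squares, corrected for the mean
f_stat = (ssr / (k - 1)) / (sse / (n - k))  # F ratio for the hypothesis slope = 0

print(t_stats)
print(f_stat)
```

Under the corresponding null hypotheses, each t ratio is referred to a Student's t distribution with n - k degrees of freedom and the F ratio to an F distribution with k - 1 and n - k degrees of freedom. With a single regressor, the F ratio equals the square of the slope's t ratio, which gives a quick consistency check on the arithmetic.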

Box (1966) and Mosteller and Tukey (1977, chaps. 12 and 13) discuss the problems that are encountered with regression data, especially when the data are not under experimental control.
