 The GENMOD Procedure

## What is a Generalized Linear Model?

A traditional linear model is of the form

$$y_i = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i$$

where $y_i$ is the response variable for the $i$th observation. The quantity $\mathbf{x}_i$ is a column vector of covariates, or explanatory variables, for observation $i$ that is known from the experimental setting and is considered to be fixed, or nonrandom. The vector of unknown coefficients $\boldsymbol{\beta}$ is estimated by a least squares fit to the data $\mathbf{y}$. The errors $\varepsilon_i$ are assumed to be independent, normal random variables with zero mean and constant variance. The expected value of $y_i$, denoted by $\mu_i$, is

$$\mu_i = \mathbf{x}_i'\boldsymbol{\beta}$$

While traditional linear models are used extensively in statistical data analysis, there are types of problems for which they are not appropriate.
• It may not be reasonable to assume that data are normally distributed. For example, the normal distribution (which is continuous) may not be adequate for modeling counts or measured proportions that are considered to be discrete.
• If the mean of the data is naturally restricted to a range of values, the traditional linear model may not be appropriate, since the linear predictor can take on any value. For example, the mean of a measured proportion is between 0 and 1, but the linear predictor of the mean in a traditional linear model is not restricted to this range.
• It may not be realistic to assume that the variance of the data is constant for all observations. For example, it is not unusual to observe data where the variance increases with the mean of the data.
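The range restriction in the second point can be seen in a small numeric sketch. The coefficients `b0` and `b1` and the covariate values below are hypothetical, chosen only for illustration, and the inverse logit transform is used as one common way of keeping a mean inside (0, 1); it is not the only option.

```python
import math

def identity_mean(eta):
    """Traditional linear model: the mean equals the linear predictor."""
    return eta

def inverse_logit(eta):
    """Map any real-valued linear predictor into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical intercept and slope, not taken from any real data set.
b0, b1 = 0.2, 0.3

for x in (-2.0, 0.0, 1.0, 4.0):
    eta = b0 + b1 * x  # linear predictor for this covariate value
    print(f"x={x:5.1f}  linear mean={identity_mean(eta):6.2f}  "
          f"inverse-logit mean={inverse_logit(eta):.3f}")
```

For the smallest and largest covariate values, the untransformed linear mean falls outside the interval from 0 to 1, while the transformed mean stays inside it.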
A generalized linear model extends the traditional linear model and is, therefore, applicable to a wider range of data analysis problems. A generalized linear model consists of the following components:
• The linear component is defined just as it is for traditional linear models: $\eta_i = \mathbf{x}_i'\boldsymbol{\beta}$
• A monotonic differentiable link function $g$ describes how the expected value of $y_i$ is related to the linear predictor $\eta_i$: $g(\mu_i) = \mathbf{x}_i'\boldsymbol{\beta}$
• The response variables $y_i$ are independent for $i = 1, 2, \ldots$ and have a probability distribution from an exponential family. This implies that the variance of the response depends on the mean $\mu$ through a variance function $V$: $\mathrm{Var}(y_i) = \phi V(\mu_i) / w_i$, where $\phi$ is a constant and $w_i$ is a known weight for each observation. The dispersion parameter $\phi$ is either known (for example, for the binomial or Poisson distribution, $\phi = 1$) or it must be estimated.
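The three components can be put together in a short sketch. This is plain Python written for illustration, not GENMOD code; the function names, the covariate vector `x_i`, and the coefficient vector `beta` are all hypothetical. The variance functions shown are the standard ones for the normal, Poisson, and binomial distributions.

```python
import math

def linear_predictor(x, beta):
    """eta_i = x_i' beta: dot product of covariates and coefficients."""
    return sum(xj * bj for xj, bj in zip(x, beta))

# Each entry holds a link function g and its inverse.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

# Variance functions V(mu) for three exponential-family distributions.
VARIANCE_FUNCTIONS = {
    "normal": lambda mu: 1.0,
    "poisson": lambda mu: mu,
    "binomial": lambda mu: mu * (1.0 - mu),
}

def response_variance(mu, dist, phi=1.0, weight=1.0):
    """Var(y_i) = phi * V(mu_i) / w_i."""
    return phi * VARIANCE_FUNCTIONS[dist](mu) / weight

# Hypothetical covariates (first entry is the intercept term) and coefficients.
x_i = [1.0, 2.0]
beta = [0.5, 0.25]

eta_i = linear_predictor(x_i, beta)          # linear component
g, g_inverse = LINKS["log"]                  # link function and its inverse
mu_i = g_inverse(eta_i)                      # mean implied by g(mu_i) = eta_i
var_i = response_variance(mu_i, "poisson")   # phi = 1 for the Poisson case

print(f"eta={eta_i:.3f}  mu={mu_i:.3f}  variance={var_i:.3f}")
```

With the log link and the Poisson variance function, the variance equals the mean, which is the kind of mean-variance relationship a constant-variance linear model cannot represent.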

See the section "Response Probability Distributions" for the form of a probability distribution from the exponential family of distributions.

As in the case of traditional linear models, fitted generalized linear models can be summarized through statistics such as parameter estimates, their standard errors, and goodness-of-fit statistics. You can also make statistical inference about the parameters using confidence intervals and hypothesis tests. However, specific inference procedures are usually based on asymptotic considerations, since exact distribution theory is not available or is not practical for all generalized linear models.
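As an illustration of inference based on asymptotic considerations, the sketch below computes a Wald confidence interval and a Wald chi-square test from a parameter estimate and its standard error. The Wald approach is one common asymptotic method and is assumed here for illustration; the estimate and standard error are hypothetical values, and the function names are not part of any SAS interface.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, level=0.95):
    """Asymptotic interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error

def wald_chi_square(estimate, std_error):
    """Wald statistic for testing that the parameter equals zero."""
    return (estimate / std_error) ** 2

def wald_p_value(estimate, std_error):
    """Two-sided normal p-value, equal to the 1-df chi-square p-value."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))

# Hypothetical estimate and standard error, not output from a fitted model.
estimate, std_error = 0.8, 0.25

lower, upper = wald_interval(estimate, std_error)
print(f"95% Wald interval: ({lower:.4f}, {upper:.4f})")
print(f"Wald chi-square:   {wald_chi_square(estimate, std_error):.2f}")
print(f"p-value:           {wald_p_value(estimate, std_error):.4f}")
```

These quantities rely on the estimator being approximately normal in large samples, so they can be unreliable when the sample is small.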
