## Overdispersion

For a correctly specified model, the Pearson chi-square statistic
and the deviance, divided by their degrees of freedom, should be
approximately equal to one. When their values are much larger
than one, the assumption of binomial variability may not be valid
and the data are said to exhibit overdispersion. Underdispersion,
which results in the ratios being less than one, occurs less often
in practice.
When fitting a model, several problems can cause the
goodness-of-fit statistics to exceed their degrees of freedom: outliers
in the data, a misspecified link function, important terms omitted from
the model, and predictors that need to be transformed. These problems
should be ruled out before proceeding to use the following methods to
correct for overdispersion.

One way of correcting overdispersion is
to multiply the covariance matrix by a dispersion parameter.
This method assumes that the sample sizes in each subpopulation
are approximately equal.
You can
supply the value of the dispersion parameter directly, or you can
estimate the dispersion parameter based on either the Pearson
chi-square statistic or the deviance for the fitted model.
The Pearson chi-square statistic
and the deviance
are given by

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k+1} \frac{(r_{ij} - n_i \hat{p}_{ij})^2}{n_i \hat{p}_{ij}}$$

$$D = 2 \sum_{i=1}^{m} \sum_{j=1}^{k+1} r_{ij} \log\!\left(\frac{r_{ij}}{n_i \hat{p}_{ij}}\right)$$

where *m* is the number of subpopulation profiles, *k*+1 is the number of
response levels, *r*_{ij} is the total weight associated with
*j*th level responses in the *i*th profile,
$n_i = \sum_{j=1}^{k+1} r_{ij}$, and $\hat{p}_{ij}$ is the fitted
probability for the *j*th level at the *i*th profile. Each of
these chi-square statistics has *mk* - *q* degrees of freedom,
where *q* is the number of
parameters estimated. The dispersion parameter $\sigma^2$
is estimated by

$$\hat{\sigma}^2 = \frac{\chi^2}{mk - q} \quad \text{or} \quad \hat{\sigma}^2 = \frac{D}{mk - q}$$

according to whether the Pearson chi-square statistic or the deviance is used.

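These formulas can be checked numerically. The following Python sketch is not part of the original documentation; the counts, fitted probabilities, and parameter count are hypothetical. It computes the Pearson chi-square statistic, the deviance, and the two dispersion estimates for a small set of grouped binary responses:

```python
import math

# Hypothetical grouped binary data: m = 4 subpopulation profiles,
# k + 1 = 2 response levels (event, non-event).
r = [[18, 22], [25, 15], [31, 9], [12, 28]]                       # r_ij counts
p_hat = [[0.50, 0.50], [0.60, 0.40], [0.72, 0.28], [0.35, 0.65]]  # fitted p_ij
q = 2                          # parameters estimated (e.g. intercept + slope)

m = len(r)
k = len(r[0]) - 1
df = m * k - q                 # degrees of freedom: mk - q

pearson = 0.0
deviance = 0.0
for r_i, p_i in zip(r, p_hat):
    n_i = sum(r_i)             # total weight n_i in the i-th profile
    for r_ij, p_ij in zip(r_i, p_i):
        e_ij = n_i * p_ij      # expected count under the fitted model
        pearson += (r_ij - e_ij) ** 2 / e_ij
        if r_ij > 0:           # a zero count contributes nothing (0 * log 0 = 0)
            deviance += 2.0 * r_ij * math.log(r_ij / e_ij)

# Dispersion estimates based on either statistic
disp_pearson = pearson / df
disp_deviance = deviance / df
print(pearson, deviance, disp_pearson, disp_deviance)
```

Ratios well above one would point to overdispersion, once the model-specification problems listed earlier have been ruled out.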
In order for the Pearson statistic and the deviance to be distributed as
chi-square, there must be sufficient replication within the subpopulations.
When this is not true, the data are sparse, and the *p*-values for
these statistics are not valid and should be ignored. Similarly, these
statistics, divided by their degrees of freedom, cannot serve as
indicators of overdispersion. A large difference between the
Pearson statistic and the deviance provides some evidence that the data
are too sparse to use either statistic.

You can use the AGGREGATE (or AGGREGATE=)
option to define the subpopulation
profiles. If you do not specify this option,
each observation is regarded as coming from a separate subpopulation.
For *events/trials* syntax, each observation
represents *n* Bernoulli trials,
where *n* is the value of the *trials* variable;
for *single-trial* syntax, each observation represents
a single trial.
Without the
AGGREGATE (or AGGREGATE=) option, the
Pearson chi-square statistic and the deviance are
calculated only for
*events/trials* syntax.

Note that the parameter estimates are not changed by this
method. However, their standard errors are adjusted for overdispersion,
affecting their significance tests.
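As a minimal sketch of this adjustment, with hypothetical estimates, standard errors, and dispersion value: multiplying the covariance matrix by $\hat{\sigma}^2$ scales each standard error by $\hat{\sigma}$, shrinking the Wald statistics while leaving the estimates untouched.

```python
import math

# Hypothetical unadjusted results from a logistic fit.
estimates = {"intercept": -1.20, "dose": 0.85}
std_errors = {"intercept": 0.30, "dose": 0.21}
dispersion = 2.25   # sigma-hat squared, e.g. Pearson chi-square / df

# Multiplying the covariance matrix by the dispersion parameter
# scales every standard error by sqrt(dispersion); the parameter
# estimates themselves are unchanged.
adjusted_se = {name: se * math.sqrt(dispersion) for name, se in std_errors.items()}

# Wald statistics shrink accordingly, so significance tests are affected.
z_unadjusted = {name: estimates[name] / std_errors[name] for name in estimates}
z_adjusted = {name: estimates[name] / adjusted_se[name] for name in estimates}
print(adjusted_se, z_adjusted)
```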

### Williams' Method

Suppose that the data consist of *n* binomial observations.
For the *i*th observation, let *r*_{i}/*n*_{i} be the observed
proportion and let **x**_{i} be the associated vector
of explanatory variables.
Suppose that the response probability for
the *i*th observation is a random variable *P*_{i}
with mean and variance

$$E(P_i) = p_i \quad \text{and} \quad V(P_i) = \phi p_i (1 - p_i)$$

where *p*_{i} is the probability of the event, and
$\phi$ is a nonnegative but otherwise unknown scale parameter.
Then the mean and variance of *r*_{i} are

$$E(r_i) = n_i p_i \quad \text{and} \quad V(r_i) = n_i p_i (1 - p_i) [1 + (n_i - 1)\phi]$$

Williams (1982) estimates the unknown parameter $\phi$ by
equating the value of Pearson's chi-square statistic for the
full model
to its approximate expected value.
Suppose *w*_{i}^{*} is the weight
associated with the *i*th observation. The Pearson chi-square
statistic is given by

$$\chi^2 = \sum_{i=1}^{n} \frac{w_i^{*} (r_i - n_i \hat{p}_i)^2}{n_i \hat{p}_i (1 - \hat{p}_i)}$$

Let *g*'(·) be the first derivative of the link function *g*(·). The
approximate expected value of $\chi^2$ is

$$E(\chi^2) \approx \sum_{i=1}^{n} w_i^{*} (1 - w_i^{*} v_i d_i) [1 + \phi (n_i - 1)]$$

where *v*_{i}=*n*_{i}/(*p*_{i}(1-*p*_{i})[*g*'(*p*_{i})]^{2}) and
*d*_{i} is the variance of the linear predictor
$g(\hat{p}_i)$. The scale parameter $\phi$ is estimated
by the following iterative procedure.
At the start,
let *w*_{i}^{*}=1 and let *p*_{i} be approximated by *r*_{i}/*n*_{i}, *i* = 1,2, ... ,*n*.
If you apply these weights and approximated probabilities to
$\chi^2$ and $E(\chi^2)$ and then equate them,
an initial estimate of $\phi$ is
therefore

$$\hat{\phi}_0 = \frac{\chi^2 - (n - m)}{\sum_{i=1}^{n} (n_i - 1)}$$

where *m* is the total number of parameters.
The initial estimates of the weights become
$[1 + (n_i - 1)\hat{\phi}_0]^{-1}$. After a weighted
fit of the model, $\hat{\boldsymbol{\beta}}$ is recalculated, and so is
$\chi^2$. Then a revised estimate of $\phi$ is given by

$$\hat{\phi} = \frac{\chi^2 - \sum_{i=1}^{n} w_i^{*} (1 - w_i^{*} v_i d_i)}{\sum_{i=1}^{n} w_i^{*} (n_i - 1)(1 - w_i^{*} v_i d_i)}$$

The iterative procedure is repeated until $\chi^2$ is very
close to its degrees of freedom.
Once $\phi$ has been estimated by $\hat{\phi}$ under the full model,
weights of $[1 + (n_i - 1)\hat{\phi}]^{-1}$ can be used in fitting models
that have fewer terms than the full model.
See Example 39.8 for an illustration.
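The iterative procedure can also be sketched in code. The following Python illustration is not SAS's implementation: it assumes a logit link, a single hypothetical explanatory variable, and made-up data, and it performs the weighted fits with a small hand-rolled IRLS routine.

```python
import math

# Hypothetical binomial data: r_i events in n_i trials at covariate x_i.
x = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
n = [40, 35, 45, 30, 50, 40]
r = [4, 14, 14, 20, 28, 34]

def fit_weighted_logit(x, n, r, w):
    """Weighted binomial-logit fit (intercept + slope) by IRLS."""
    a, b = 0.0, 0.0
    for _ in range(50):
        W, z = [], []
        for xi, ni, ri, wi in zip(x, n, r, w):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            v = ni * p * (1.0 - p)                     # v_i for the logit link
            W.append(wi * v)                           # IRLS working weight
            z.append(a + b * xi + (ri - ni * p) / v)   # working response
        s0, s1 = sum(W), sum(Wi * xi for Wi, xi in zip(W, x))
        s2 = sum(Wi * xi * xi for Wi, xi in zip(W, x))
        t0 = sum(Wi * zi for Wi, zi in zip(W, z))
        t1 = sum(Wi * xi * zi for Wi, xi, zi in zip(W, x, z))
        det = s0 * s2 - s1 * s1                        # 2x2 normal equations
        a_new = (s2 * t0 - s1 * t1) / det
        b_new = (s0 * t1 - s1 * t0) / det
        done = abs(a_new - a) + abs(b_new - b) < 1e-10
        a, b = a_new, b_new
        if done:
            break
    return a, b

def chi2_and_leverage(x, n, r, w, a, b):
    """Pearson chi-square and the products w_i* v_i d_i."""
    p = [1.0 / (1.0 + math.exp(-(a + b * xi))) for xi in x]
    chi2 = sum(wi * (ri - ni * pi) ** 2 / (ni * pi * (1.0 - pi))
               for wi, ri, ni, pi in zip(w, r, n, p))
    W = [wi * ni * pi * (1.0 - pi) for wi, ni, pi in zip(w, n, p)]
    s0, s1 = sum(W), sum(Wi * xi for Wi, xi in zip(W, x))
    s2 = sum(Wi * xi * xi for Wi, xi in zip(W, x))
    det = s0 * s2 - s1 * s1
    # d_i = x_i'(X'WX)^{-1}x_i is the variance of the linear predictor
    lev = [Wi * (s2 - 2.0 * s1 * xi + s0 * xi * xi) / det
           for Wi, xi in zip(W, x)]
    return chi2, lev

m_params = 2                          # total number of parameters
w = [1.0] * len(x)                    # start with w_i* = 1
a, b = fit_weighted_logit(x, n, r, w)
chi2, _ = chi2_and_leverage(x, n, r, w, a, b)
phi = (chi2 - (len(x) - m_params)) / sum(ni - 1 for ni in n)  # initial estimate

for _ in range(100):                  # iterate until chi2 ~ its d.f.
    w = [1.0 / (1.0 + (ni - 1) * phi) for ni in n]
    a, b = fit_weighted_logit(x, n, r, w)
    chi2, lev = chi2_and_leverage(x, n, r, w, a, b)
    num = chi2 - sum(wi * (1.0 - li) for wi, li in zip(w, lev))
    den = sum(wi * (ni - 1) * (1.0 - li) for wi, ni, li in zip(w, n, lev))
    phi_new = num / den               # revised estimate of phi
    if abs(phi_new - phi) < 1e-8:
        phi = phi_new
        break
    phi = phi_new

print("phi-hat:", phi, " final chi-square:", chi2)
```

At convergence the weighted chi-square settles near its degrees of freedom (here *n* - *m* = 4), as the procedure requires, and the final weights $[1 + (n_i - 1)\hat{\phi}]^{-1}$ can be reused for reduced models.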

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.