The LOGISTIC Procedure

This section uses the following notation:

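- $r_j$, $n_j$: $r_j$ is the number of event responses out of $n_j$ trials for the $j$th observation. If *events/trials* syntax is used, $r_j$ is the value of *events* and $n_j$ is the value of *trials*. For *single-trial* syntax, $n_j = 1$, and $r_j = 1$ if the ordered response is 1, and $r_j = 0$ if the ordered response is 2.
- $w_j$ is the total weight (the product of the WEIGHT and FREQ values) of the $j$th observation.
- $p_j$ is the probability of an event response for the $j$th observation, given by $p_j = F(\alpha + \boldsymbol{\beta}'\mathbf{x}_j)$, where $F(\cdot)$ is the inverse link function, $\alpha$ is the intercept, $\boldsymbol{\beta}$ is the vector of slope parameters, and $\mathbf{x}_j$ is the vector of explanatory variables for the $j$th observation.
- $\mathbf{b}$ is the maximum likelihood estimate (MLE) of $(\alpha, \boldsymbol{\beta}')'$.
- $\hat{\mathbf{V}}_b$ is the estimated covariance matrix of $\mathbf{b}$.
- $\hat{p}_j$, $\hat{q}_j$: $\hat{p}_j$ is the estimate of $p_j$ evaluated at $\mathbf{b}$, and $\hat{q}_j = 1 - \hat{p}_j$.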

Pregibon suggests using the index plots of several diagnostic statistics to identify influential observations and to quantify the effects on various aspects of the maximum likelihood fit. In an index plot, the diagnostic statistic is plotted against the observation number. In general, the distributions of these diagnostic statistics are not known, so cutoff values cannot be given for determining when the values are large. However, the IPLOTS and INFLUENCE options provide displays of the diagnostic values allowing visual inspection and comparison of the values across observations. In these plots, if the model is correctly specified and fits all observations well, then no extreme points should appear.
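As a rough illustration of what an index plot conveys, the following Python sketch draws a text-only plot of a diagnostic statistic against the observation number. The function name `index_plot` and the diagnostic values are hypothetical and are not produced by PROC LOGISTIC; the sketch only mimics the idea of scanning for points that stand apart from the rest.

```python
def index_plot(values, width=40):
    """Return text lines that plot each statistic against its observation number."""
    # Scale bars by the largest absolute value; fall back to 1.0 if all values are zero.
    top = max(abs(v) for v in values) or 1.0
    lines = []
    for j, v in enumerate(values, start=1):
        bar = "*" * round(abs(v) / top * width)
        lines.append(f"{j:>4} {v:>9.4f} {bar}")
    return lines


# Hypothetical diagnostic values for six observations (not from a real fit).
stats = [0.12, 0.08, 0.15, 0.95, 0.10, 0.11]

for line in index_plot(stats):
    print(line)

# Observation number (1-based) of the most extreme value.
extreme = max(range(len(stats)), key=lambda j: abs(stats[j])) + 1
```

In this made-up series, one observation has a bar far longer than the others, which is the kind of extreme point a visual inspection is meant to reveal.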

The next five sections give formulas for these diagnostic statistics.

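For a binary response logit model, the hat matrix diagonal elements are

$$h_{jj} = w_j n_j \hat{p}_j \hat{q}_j \, (1, \mathbf{x}_j') \, \hat{\mathbf{V}}_b \, (1, \mathbf{x}_j')'$$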

If the estimated probability is extreme (approximately less than 0.1 or greater than 0.9), then the hat diagonal may be greatly reduced in value. Consequently, when an observation has a very large or very small estimated probability, its hat diagonal value is not a good indicator of the observation's distance from the design space (Hosmer and Lemeshow 1989).
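The effect of an extreme estimated probability on the hat diagonal can be sketched numerically. The Python code below assumes the usual binary logit form, hat diagonal = (weight) x (trials) x p x (1 - p) x (1, x') V (1, x')', where V is the estimated covariance matrix of the parameter estimates. The function name `hat_diagonal`, the coefficient vector `b`, and the matrix `V` are hypothetical illustration values, not output from a fitted model.

```python
import math


def hat_diagonal(x, b, V, w=1.0, n=1):
    """Hat diagonal for one observation of a binary logit model.

    x: explanatory values (without the intercept)
    b: parameter estimates [intercept, slopes...]
    V: estimated covariance matrix of b (list of lists)
    w, n: total weight and number of trials for the observation
    """
    z = [1.0] + list(x)                           # design row (1, x')
    eta = sum(bi * zi for bi, zi in zip(b, z))    # linear predictor
    p = 1.0 / (1.0 + math.exp(-eta))              # inverse logit link
    q = 1.0 - p
    # Quadratic form (1, x') V (1, x')'
    quad = sum(z[i] * V[i][k] * z[k]
               for i in range(len(z)) for k in range(len(z)))
    return w * n * p * q * quad


# Hypothetical estimates and covariance matrix (assumed for illustration only).
b = [0.0, 1.0]
V = [[0.5, 0.0], [0.0, 0.25]]

h_near = hat_diagonal([2.0], b, V)   # estimated probability about 0.88
h_far = hat_diagonal([6.0], b, V)    # estimated probability about 0.998
print(h_near, h_far)
```

With these assumed values, the point at x = 6 lies farther from the center of the design space than the point at x = 2, yet its hat diagonal is smaller because the product p(1 - p) is close to zero.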

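The Pearson residual for the $j$th observation is

$$\chi_j = \frac{\sqrt{w_j}\,(r_j - n_j \hat{p}_j)}{\sqrt{n_j \hat{p}_j \hat{q}_j}}$$

The Pearson chi-square statistic is the sum of squares of the Pearson residuals. The deviance residual for the $j$th observation is

$$d_j = \begin{cases}
-\sqrt{-2 w_j n_j \log(\hat{q}_j)} & \text{if } r_j = 0 \\
\pm\sqrt{2 w_j \left[ r_j \log\!\left(\dfrac{r_j}{n_j \hat{p}_j}\right) + (n_j - r_j) \log\!\left(\dfrac{n_j - r_j}{n_j \hat{q}_j}\right) \right]} & \text{if } 0 < r_j < n_j \\
\sqrt{-2 w_j n_j \log(\hat{p}_j)} & \text{if } r_j = n_j
\end{cases}$$

where the plus (minus) in $\pm$ is used if $r_j / n_j$ is larger (smaller) than $\hat{p}_j$.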
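The two residuals can be sketched in a few lines of Python. The code assumes the standard definitions: the Pearson residual is sqrt(w) (r - n p) / sqrt(n p q), and the deviance residual is the signed square root of the observation's contribution to the deviance, with the sign taken from whether r/n exceeds the estimated probability. The function names and the sample data are hypothetical.

```python
import math


def pearson_residual(r, n, p, w=1.0):
    """Pearson residual for r events in n trials with estimated probability p."""
    q = 1.0 - p
    return math.sqrt(w) * (r - n * p) / math.sqrt(n * p * q)


def deviance_residual(r, n, p, w=1.0):
    """Deviance residual; positive when r/n is larger than p, negative otherwise."""
    q = 1.0 - p
    if r == 0:
        d2 = -2.0 * w * n * math.log(q)
    elif r == n:
        d2 = -2.0 * w * n * math.log(p)
    else:
        d2 = 2.0 * w * (r * math.log(r / (n * p))
                        + (n - r) * math.log((n - r) / (n * q)))
    d2 = max(d2, 0.0)                  # guard against tiny negative rounding error
    sign = 1.0 if r / n > p else -1.0
    return sign * math.sqrt(d2)


# Hypothetical single-trial data: (response r, estimated probability p).
data = [(1, 0.2), (0, 0.2), (1, 0.9), (0, 0.6)]

# Pearson chi-square: sum of squared Pearson residuals.
chi_square = sum(pearson_residual(r, 1, p) ** 2 for r, p in data)
print(chi_square)
```

An event observed where the estimated probability is only 0.2 gives a large positive residual under both definitions, which is the pattern these statistics are meant to expose.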

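For the $j$th observation, the DFBETAS diagnostic for the $i$th parameter estimate is

$$\mathrm{DFBETA}i_j = \frac{\Delta_i \mathbf{b}_j^1}{\hat{\sigma}(b_i)}$$

where $\hat{\sigma}(b_i)$ is the standard error of the $i$th component of $\mathbf{b}$, and $\Delta_i \mathbf{b}_j^1$ is the $i$th component of the one-step difference

$$\Delta \mathbf{b}_j^1 = \frac{w_j (r_j - n_j \hat{p}_j)}{1 - h_{jj}} \, \hat{\mathbf{V}}_b \, (1, \mathbf{x}_j')'$$

Here $h_{jj}$ is the $j$th hat matrix diagonal element, and $\Delta \mathbf{b}_j^1$ is the approximate change ($\mathbf{b} - \mathbf{b}_j^1$) in the vector of parameter estimates due to the omission of the $j$th observation.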
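A one-step approximation of this kind can be sketched in Python. The code assumes the one-step difference has the form w (r - n p) / (1 - h) times V (1, x')', where h is the observation's hat diagonal and V is the estimated covariance matrix of the parameter estimates, and that each component is then divided by the standard error sqrt(V[i][i]). The function name `dfbetas` and all input values are hypothetical.

```python
import math


def dfbetas(x, r, p, h, V, w=1.0, n=1):
    """One-step standardized change in each parameter estimate for one observation.

    x: explanatory values (without the intercept)
    r, n: events and trials; p: estimated probability; h: hat diagonal
    V: estimated covariance matrix of the parameter estimates
    """
    z = [1.0] + list(x)                           # design row (1, x')
    scale = w * (r - n * p) / (1.0 - h)
    # One-step difference: scale * V (1, x')'
    delta = [scale * sum(V[i][k] * z[k] for k in range(len(z)))
             for i in range(len(z))]
    # Standardize each component by its standard error sqrt(V[i][i]).
    return [delta[i] / math.sqrt(V[i][i]) for i in range(len(z))]


# Hypothetical inputs (assumed for illustration, not from a fitted model).
V = [[0.5, 0.0], [0.0, 0.25]]
result = dfbetas(x=[2.0], r=1, p=0.8, h=0.2, V=V)
print(result)
```

Because the approximation reuses quantities from the single full fit, no refitting is needed for each deleted observation, which is the practical appeal of a one-step estimate.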

Typically, to use these statistics, you plot them against an index (as the IPLOTS option does) and look for outliers.

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.