 The PROBIT Procedure

## Example 54.3: Logistic Regression

In this example, a series of people are asked whether they would subscribe to a new newspaper. For each person, the variables sex (Female, Male), age, and subs (1 = yes, 0 = no) are recorded. The PROBIT procedure is used to fit a logistic regression model to the probability of a positive response (subscribing) as a function of sex and age. Specifically, the probability of subscribing is modeled as

$$p = \Pr(\text{subs} = 1) = F(b_0 + b_1 \times \text{sex} + b_2 \times \text{age})$$

where $F$ is the cumulative logistic distribution function.
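For reference, the cumulative logistic distribution function and the equivalent log-odds form of this model are written out below. These are standard identities, shown here only to make the model concrete:

```latex
F(x) = \frac{1}{1 + e^{-x}}, \qquad
p = \frac{1}{1 + \exp\{-(b_0 + b_1 \times \text{sex} + b_2 \times \text{age})\}}, \qquad
\log\frac{p}{1 - p} = b_0 + b_1 \times \text{sex} + b_2 \times \text{age}
```

Each coefficient is therefore a change in the log odds of subscribing.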

By default, the PROBIT procedure models the probability of the lower response level for binary data, where the levels are ordered by their formatted values. One way to model Pr(subs = 1) is therefore to format the response variable so that the formatted value corresponding to subs = 1 is the lower level. The following statements format the values of subs as 1 = 'accept' and 0 = 'reject'; because 'accept' sorts before 'reject', PROBIT models Pr(accept) = Pr(subs = 1).
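The choice of modeled level changes only the signs of the estimates, not the quality of the fit. This follows from the symmetry of the logistic distribution function, sketched below with $\eta = b_0 + b_1 \times \text{sex} + b_2 \times \text{age}$:

```latex
1 - F(\eta) = 1 - \frac{1}{1 + e^{-\eta}} = \frac{e^{-\eta}}{1 + e^{-\eta}} = \frac{1}{1 + e^{\eta}} = F(-\eta)
\quad\Longrightarrow\quad
\Pr(\text{subs} = 0) = F(-b_0 - b_1 \times \text{sex} - b_2 \times \text{age})
```

So if the unformatted level 0 were modeled instead, every coefficient would have the opposite sign, which would reverse the interpretation given at the end of this example.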

The following statements produce Output 54.3.1:

```sas
/* One record per person: sex, age, and subscription status (1 = yes, 0 = no) */
data news;
   input sex $ age subs;
datalines;
Female     35    0
Male       44    0
Male       45    1
Female     47    1
Female     51    0
Female     47    0
Male       54    1
Male       47    1
Female     35    0
Female     34    0
Female     48    0
Female     56    1
Male       46    1
Female     59    1
Female     46    1
Male       59    1
Male       38    1
Female     39    0
Male       49    1
Male       42    1
Male       50    1
Female     45    0
Female     47    0
Female     30    1
Female     39    0
Female     51    0
Female     45    0
Female     43    1
Male       39    1
Male       31    0
Female     39    0
Male       34    0
Female     52    1
Female     46    0
Male       58    1
Female     50    1
Female     32    0
Female     52    1
Female     35    0
Female     51    0
;

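/* Format subs so that subs = 1 ('accept') is the lower, modeled level */
proc format;
   value subscrib 1 = 'accept' 0 = 'reject';
run;

/* Fit the logistic model; ITPRINT displays the iteration history */
/* and the final gradient and Hessian                              */
proc probit data=news;
   class subs sex;
   model subs = sex age / d=logistic itprint;
   format subs subscrib.;
   title 'Logistic Regression of Subscription Status';
run;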
```

Output 54.3.1: Logistic Regression: PROC PROBIT

 Logistic Regression of Subscription Status

 Probit Procedure

Class Level Information

| Name | Levels | Values |
|---|---|---|
| subs | 2 | accept reject |
| sex | 2 | Female Male |

Iteration History for Parameter Estimates

| Iter | Ridge | Loglikelihood | Intercept | age | sex.1 |
|---|---|---|---|---|---|
| 0 | 0 | -27.725887 | 0 | 0 | 0 |
| 1 | 0 | -20.142659 | -3.634567629 | 0.1051634384 | -1.648455751 |
| 2 | 0 | -19.52245 | -5.254865196 | 0.1506493473 | -2.234724956 |
| 3 | 0 | -19.490439 | -5.728485385 | 0.1639621828 | -2.409827238 |
| 4 | 0 | -19.490303 | -5.76187293 | 0.1649007124 | -2.422349862 |
| 5 | 0 | -19.490303 | -5.7620267 | 0.1649050312 | -2.422407743 |

Model Information

| | |
|---|---|
| Data Set | WORK.NEWS |
| Dependent Variable | subs |
| Number of Observations | 40 |
| Name of Distribution | LOGISTIC |
| Log Likelihood | -19.49030281 |

Weighted Frequency Counts for the Ordered Response Categories

| Level | Count |
|---|---|
| accept | 20 |
| reject | 20 |

Last Evaluation of the Negative of the Gradient

| Intercept | sex.1 | age |
|---|---|---|
| -5.95379E-12 | 8.76834E-10 | -1.636692E-8 |

Last Evaluation of the Negative of the Hessian

| | Intercept | sex.1 | age |
|---|---|---|---|
| Intercept | 6.4597397447 | 4.6042218284 | 292.04051848 |
| sex.1 | 4.6042218284 | 4.6042218284 | 216.20829515 |
| age | 292.04051848 | 216.20829515 | 13487.329973 |

 Algorithm converged.

Analysis of Parameter Estimates

| Variable | DF | Estimate | Standard Error | Chi-Square | Pr > ChiSq | Label |
|---|---|---|---|---|---|---|
| Intercept | 1 | -5.76203 | 2.76345 | 4.3476 | 0.0371 | Intercept |
| sex | 1 | | | 6.4220 | 0.0113 | |
| | 1 | -2.42241 | 0.95590 | 6.4220 | 0.0113 | Female |
| | 0 | 0 | 0 | . | . | Male |
| age | 1 | 0.16491 | 0.06519 | 6.3992 | 0.0114 | |

From Output 54.3.1, both sex and age appear to have an effect: each has a chi-square p-value of about 0.011. The positive coefficient for age (0.16491) indicates that older people are more likely to subscribe than younger people. The negative coefficient for sex (-2.42241 for Female, with Male as the reference level whose coefficient is fixed at 0) indicates that females are less likely to subscribe than males.
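As a worked check on this interpretation, the quantities below are computed by hand from the estimates in Output 54.3.1, with sex entering the linear predictor as an indicator for Female. The chi-square values are consistent with the Wald statistic (estimate divided by its standard error, squared). The age of 45 is an arbitrary illustrative choice, and all values are rounded:

```latex
\begin{aligned}
\chi^2 &= \left(\frac{\hat b}{\operatorname{SE}(\hat b)}\right)^2:
  \quad \text{sex: } \left(\frac{-2.42241}{0.95590}\right)^2 \approx 6.42,
  \quad \text{age: } \left(\frac{0.16491}{0.06519}\right)^2 \approx 6.40 \\
\text{odds ratios} &: \quad e^{0.16491} \approx 1.18 \text{ per year of age},
  \quad e^{-2.42241} \approx 0.089 \text{ (Female vs. Male)} \\
\hat p(\text{Male}, 45) &= \frac{1}{1 + e^{-(-5.76203 + 0.16491 \times 45)}}
  = \frac{1}{1 + e^{-1.65892}} \approx 0.840 \\
\hat p(\text{Female}, 45) &= \frac{1}{1 + e^{-(-5.76203 - 2.42241 + 0.16491 \times 45)}}
  = \frac{1}{1 + e^{0.76349}} \approx 0.318
\end{aligned}
```

At a fixed age, a woman's estimated odds of subscribing are roughly one eleventh of a man's, and each additional year of age raises the odds by about 18 percent.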
