The BOXPLOT Procedure

Example 18.1: Using Box Plots to Compare Groups

This example uses a box plot to compare the delay times for airline flights during the Christmas holidays with the delay times before the holiday period. The following statements create a data set named Times that contains the delay times, in minutes, for 25 flights each day. When a flight is canceled, the delay is recorded as a missing value.

   data Times;
      informat day date7. ;
      format   day date7. ;
      input day @ ;
      do flight=1 to 25;
         input delay @ ;
         output;
         end;
   datalines;
   16DEC88   4  12   2   2  18   5   6  21   0   0   0  14   3
             .   2   3   5   0   6  19   7   4   9   5  10
   17DEC88   1  10   3   3   0   1   5   0   .   .   1   5   7
             1   7   2   2  16   2   1   3   1  31   5   0
   18DEC88   7   8   4   2   3   2   7   6  11   3   2   7   0
             1  10   2   3  12   8   6   2   7   2   4   5
   19DEC88  15   6   9   0  15   7   1   1   0   2   5   6   5
            14   7  20   8   1  14   3  10   0   1  11   7
   20DEC88   2   1   0   4   4   6   2   2   1   4   1  11   .
             1   0   6   5   5   4   2   2   6   6   4   0
   21DEC88   2   6   6   2   7   7   5   2   5   0   9   2   4
             2   5   1   4   7   5   6   5   0   4  36  28
   22DEC88   3   7  22   1  11  11  39  46   7  33  19  21   1
             3  43  23   9   0  17  35  50   0   2   1   0
   23DEC88   6  11   8  35  36  19  21   .   .   4   6  63  35
             3  12  34   9   0  46   0   0  36   3   0  14
   24DEC88  13   2  10   4   5  22  21  44  66  13   8   3   4
            27   2  12  17  22  19  36   9  72   2   4   4
   25DEC88   4  33  35   0  11  11  10  28  34   3  24   6  17
             0   8   5   7  19   9   7  21  17  17   2   6
   26DEC88   3   8   8   2   7   7   8   2   5   9   2   8   2
            10  16   9   5  14  15   1  12   2   2  14  18
   ;
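
As a quick sanity check on the DATA step above, the following sketch (an illustrative addition, not part of the original example) tabulates the number of observations read for each day. Because the DO loop writes one observation per flight, including canceled flights, each day should show a frequency of 25.

   proc freq data=Times;
      tables day / nocum nopercent;
   run;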

In the following statements, the MEANS procedure is used to count the number of canceled flights for each day. This information is then added to the data set Times.

   proc means data=Times noprint;
      var delay;
      by day ;
      output out=cancel nmiss=ncancel;
   run;

   data Times;
      merge Times cancel;
      by day;
   run;
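
To confirm the counts before plotting, the data set cancel can be listed. The sketch below is an illustrative addition, not part of the original example; it prints one row per day. For the data shown above, ncancel should be 1 for 16DEC88 and 20DEC88, 2 for 17DEC88 and 23DEC88, and 0 for the remaining days.

   proc print data=cancel noobs;
      var day ncancel;
   run;

The output data set also contains the automatic variables _TYPE_ and _FREQ_, which the merge carries into Times but which are not used in the plot.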

The following statements create a data set named Weather that contains information about possible causes for delays. This data set is merged with the data set Times.

   data Weather;
      informat day date7. ;
      format   day date7. ;
      length reason $ 16 ;
      input day flight reason & ;
   datalines;
   16DEC88  8   Fog
   17DEC88  18  Snow Storm
   17DEC88  23  Sleet
   21DEC88  24  Rain
   21DEC88  25  Rain
   22DEC88  7   Mechanical
   22DEC88  15  Late Arrival
   24DEC88  9   Late Arrival
   24DEC88  22  Late Arrival
   ;

   data times;
      merge Times Weather;
      by day flight;
   run;
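
One way to check this merge is to list only the flights for which a reason was recorded. The following sketch is an illustrative addition, not part of the original example. With the Weather data above, nine observations should be listed.

   proc print data=times noobs;
      where reason ne ' ';
      var day flight delay reason;
   run;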

The following statements create a box plot for the complete set of data.

   symbol1 v=plus     c=salmon;
   symbol2 v=square   c=vigb;
   symbol3 v=triangle c=vig;
   goptions ftext=swiss;
   axis1 minor=none color=black label=(angle=90 rotate=0);
   title 'Box Plot for Airline Delays';

   proc boxplot data=times;
      plot delay * day = ncancel /
                         nohlabel
                         symbollegend = legend1
                         cboxes       = dagr
                         cboxfill     = ywh
                         cframe       = vligb
                         vaxis        = axis1;
      legend1 label=('Cancellations:')
              cborder=black cframe=ligr;
      label delay = 'Delay in Minutes';
   run;

The box plot is shown in Output 18.1.1. The level of the symbol-variable ncancel determines the symbol marker for each group mean, and the SYMBOLLEGEND= option controls the appearance of the legend for the symbols. In these data ncancel takes three values (0, 1, and 2), so the three SYMBOL statements supply one marker definition for each level. The NOHLABEL option suppresses the label for the horizontal axis.
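
For comparison, a minimal sketch of the same plot without the symbol-variable and without the appearance options is shown below. It is an illustrative addition rather than part of the original example. With no symbol-variable, the group means should all be drawn with a single marker, and the horizontal axis label is displayed because NOHLABEL is omitted. The TITLE and GOPTIONS statements above remain in effect until they are changed.

   proc boxplot data=times;
      plot delay * day;
      label delay = 'Delay in Minutes';
   run;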

Output 18.1.1: Box Plot for Airline Data

The delay distributions from December 22 through December 25 are drastically different from the delay distributions during the pre-holiday period. Both the mean delay and the variability of the delays are much greater during the holiday period.
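
The visual comparison can also be checked numerically. The following sketch, an illustrative addition that assumes the merged data set times created above, computes the mean and standard deviation of the delays for each day. Canceled flights (missing delays) are excluded from both statistics and are counted in the NMISS column.

   proc means data=times n nmiss mean std maxdec=1;
      var delay;
      by day;
   run;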


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.