
The FREQ Procedure

Missing Values

The following TABLES statement options change how PROC FREQ handles missing values:

- MISSPRINT: includes missing value frequencies in frequency or crosstabulation tables.
- MISSING: includes missing values in percentage and statistical calculations.

The OUT= option in the TABLES statement includes an observation in the output data set that contains the frequency of missing values. The NMISS keyword in the OUTPUT statement creates a variable in the output data set that contains the number of missing values.
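As a minimal sketch of these options, the following step requests the same one-way table three ways. The data set SURVEY and the variable ANSWER are hypothetical names used only for illustration.

```sas
/* Hypothetical data: two of the seven observations have a missing ANSWER */
data survey;
   input answer @@;
   datalines;
1 2 . 1 . 2 1
;
run;

proc freq data=survey;
   tables answer;               /* default: missing values are excluded from the table   */
   tables answer / missprint;   /* missing level is shown but not used in percentages    */
   tables answer / missing;     /* missing values are included in percentages and statistics */
run;
```

Comparing the three tables side by side is usually the quickest way to decide which treatment suits an analysis.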

Missing Values in Frequency Tables shows three ways that PROC FREQ handles missing values. The first table uses the default method; the second table uses MISSPRINT; and the third table uses MISSING.

*Missing Values in Frequency Tables*

When a combination of variable values for a crosstabulation is missing, PROC FREQ assigns zero to the frequency count for the table cell. By default, PROC FREQ omits missing combinations in list format and in the output data set that is created with a TABLES statement. To include the missing combinations, use SPARSE with LIST or OUT= in the TABLES statement.
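The effect of SPARSE can be sketched as follows. The data set PAIRS and the output data set names are hypothetical; the data are chosen so that one combination of A and B never occurs.

```sas
/* Hypothetical data: the combination A=2, B=2 does not occur */
data pairs;
   input a b @@;
   datalines;
1 1  1 2  2 1  1 1
;
run;

proc freq data=pairs;
   /* Without SPARSE, the list and the OUT= data set omit A=2, B=2 */
   tables a*b / list out=cells_default;
   /* With SPARSE, A=2, B=2 is included with a frequency of zero */
   tables a*b / list sparse out=cells_sparse;
run;
```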

PROC FREQ treats missing BY variable values like any other BY variable value. The missing values form a separate BY group. When the value of a WEIGHT variable is missing, PROC FREQ excludes the observation from the analysis.

Procedure Output

Use the following TABLES statement options to report additional information for each table cell:

- CELLCHI2: includes the cell's contribution to the total chi-square statistic.
- CUMCOL: includes the cumulative column percentage of the cell.
- DEVIATION: includes the deviation of the cell frequency from the expected value.
- EXPECTED: includes the expected cell frequency under the hypothesis of independence.
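A short sketch of a request that uses these cell options together. The data set EXPOSURE and its variables GROUP, OUTCOME, and CELLCOUNT are hypothetical; the data are entered as cell counts and read with a WEIGHT statement.

```sas
/* Hypothetical 2x2 table entered as one row per cell */
data exposure;
   input group $ outcome $ cellcount;
   datalines;
A yes 20
A no  10
B yes  5
B no  25
;
run;

proc freq data=exposure;
   weight cellcount;   /* each row stands for CELLCOUNT observations */
   /* Add expected frequency, deviation, chi-square contribution,
      and cumulative column percentage to every cell */
   tables group*outcome / expected deviation cellchi2 cumcol;
run;
```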

By default, PROC FREQ displays the next one-way frequency table on the
current page when there is enough space to display the entire table. If you
use COMPRESS in the PROC FREQ statement, the next one-way table starts to
display on the current page even when the entire table will not fit. If you
use PAGE in the PROC FREQ statement, each frequency or crosstabulation table
always displays on a separate page.

When scientific notation is used, only the first few significant digits are shown. If you need more significant digits than PROC FREQ displays, create an output data set by specifying OUT= in the TABLES statement. Then use PROC PRINT and assign an appropriate format to the variable COUNT. For example, the statement

```sas
format count 10.;
```

displays exact integer counts up to 9999999999. For more information about formats, see the section on components of the SAS language in
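The complete workflow might look like the following sketch. The data set BIG, the variables REGION and N, and the output data set REGION_COUNTS are hypothetical; the large weights are there only to produce counts that need ten digits.

```sas
/* Hypothetical data with very large cell weights */
data big;
   input region $ n;
   datalines;
east 1234567890
west 987654321
;
run;

/* Store the counts in a data set instead of displaying them */
proc freq data=big noprint;
   weight n;
   tables region / out=region_counts;
run;

/* Print COUNT with a format wide enough for every digit */
proc print data=region_counts;
   format count 10.;
run;
```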

**CAUTION:** **Multiway tables can generate a great deal of displayed output.** For example, if the variables A, B, C, D, and E each have ten levels, the table request A*B*C*D*E may generate 1000 or more pages of output. If you are primarily interested in the tests and measures of association, use NOPRINT in the TABLES statement to suppress the tables but display the statistics. Or use NOPRINT in the PROC FREQ statement to suppress all displayed output, and use the OUTPUT statement to store the statistics in an output data set. If you are interested in frequency counts and percentages, use LIST in the TABLES statement.
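The three alternatives in the caution can be sketched as follows. The data set STUDY, its variables, and the output data set ASSOC are hypothetical, and the cell weights are arbitrary.

```sas
/* Hypothetical three-way data: one row per cell with an arbitrary weight */
data study;
   do a = 1 to 2;
      do b = 1 to 2;
         do c = 1 to 2;
            n = a + b + c;
            output;
         end;
      end;
   end;
run;

/* 1. Suppress the tables but still display the statistics */
proc freq data=study;
   weight n;
   tables a*b*c / chisq noprint;
run;

/* 2. Suppress all displayed output and store the statistics */
proc freq data=study noprint;
   weight n;
   tables a*b*c / chisq;
   output out=assoc chisq;
run;

/* 3. Display only counts and percentages, in list format */
proc freq data=study;
   weight n;
   tables a*b*c / list;
run;
```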

Output Data Sets

PROC FREQ can create two types of output data sets:

- TABLES statement, OUT= option: creates an output data set that contains frequency or crosstabulation table counts and percentages.
- OUTPUT statement: creates an output data set that contains statistics.

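The output data set that is created with OUT= in the TABLES statement can contain the following variables:

- BY variables
- table request variables, such as A, B, C, and D in the table request A*B*C*D
- COUNT variable containing the cell frequency
- PERCENT variable containing the cell percentage.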

If you use OUTEXPECT and OUTPCT, the output data set also contains expected frequencies and row, column, and table percentages, respectively. The additional variables are

- EXPECTED variable containing the expected frequency
- PCT_TABL variable containing the percentage of two-way table frequency, for **n**-way tables where **n** > 2
- PCT_ROW variable containing the percentage of row frequency
- PCT_COL variable containing the percentage of column frequency.
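A minimal sketch of a request that adds these variables to the output data set. The data set STUDY, its variables, and the output data set CELLS are hypothetical, and the cell weights are arbitrary.

```sas
/* Hypothetical three-way data: one row per cell with an arbitrary weight */
data study;
   do a = 1 to 2;
      do b = 1 to 2;
         do c = 1 to 2;
            n = a + b + c;
            output;
         end;
      end;
   end;
run;

proc freq data=study noprint;
   weight n;
   /* OUTEXPECT adds EXPECTED; OUTPCT adds the percentage variables */
   tables a*b*c / out=cells outexpect outpct;
run;

proc print data=cells;
run;
```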

When you submit the following statements

```sas
proc freq;
   tables a a*b / out=d;
run;
```

the output data set D contains frequencies and percentages for the last table request, A*B. If A has two levels (1 and 2), B has three levels (1, 2, and 3), and no table cell count is zero or missing, the output data set D includes six observations, one for each combination of A and B. The first observation corresponds to A=1 and B=1; the second observation corresponds to A=1 and B=2; and so on. The data set also includes the variables COUNT and PERCENT. The value of COUNT is the number of observations that have the given combination of A and B values. The value of PERCENT is the percent of the total number of observations having that A and B combination.
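Because OUT= stores only the last table request of its TABLES statement, one way to keep both tables is to give each request its own TABLES statement. The following is a sketch; the data set AB and the output data set names D_A and D_AB are hypothetical.

```sas
/* Hypothetical data: every combination of A (2 levels) and B (3 levels) once */
data ab;
   do a = 1 to 2;
      do b = 1 to 3;
         output;
      end;
   end;
run;

proc freq data=ab;
   tables a   / out=d_a;    /* one observation per level of A          */
   tables a*b / out=d_ab;   /* one observation per combination of A, B */
run;
```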

When PROC FREQ combines different variable values into the same formatted level, the output data set contains the smallest internal value for the formatted level. For example, suppose a variable X has the values 1.1, 1.4, 1.7, 2.1, and 2.3. When you submit the statement

```sas
format x 1.;
```

in a PROC FREQ step, the formatted levels listed in the frequency table for X are 1 and 2. If you create an output data set with the frequency counts, the internal values of X are 1.1 and 1.7. To report the internal values of X when you display the output data set, use a format of 3.1 with X.
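The example can be reproduced with a sketch like the following. The data set LEVELS and the output data set XCOUNTS are hypothetical names.

```sas
/* The five values of X from the example above */
data levels;
   input x @@;
   datalines;
1.1 1.4 1.7 2.1 2.3
;
run;

/* The 1. format groups the values into the formatted levels 1 and 2 */
proc freq data=levels;
   format x 1.;
   tables x / out=xcounts;
run;

/* The 3.1 format reveals the internal value stored for each level */
proc print data=xcounts;
   format x 3.1;
run;
```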

The output data set that is created with the OUTPUT statement can include the following variables:

- BY variables
- variables that identify the stratum, such as A and B in the table request A*B*C*D
- variables that contain the specified statistics.

The output data set also includes variables with the **p**-value and degrees of freedom, asymptotic standard error (ASE), or confidence limits when PROC FREQ computes these values for a specified statistic.

The variable names for the specified statistics in the output data set are the keyword names enclosed in underscores. PROC FREQ forms variable names for the corresponding **p**-values, degrees of freedom, or confidence limits by combining the name of the keyword with one of the following prefixes:

| Prefix | Meaning |
|--------|---------|
| DF_ | degrees of freedom |
| E_ | asymptotic standard error (ASE) |
| E0_ | asymptotic standard error under the null hypothesis |
| L_ | lower confidence limit |
| P_ | p-value |
| P2_ | two-sided p-value |
| PL_ | left-sided p-value |
| PR_ | right-sided p-value |
| U_ | upper confidence limit |
| XP_ | exact p-value |
| XP2_ | exact two-sided p-value |
| XPR_ | exact right-sided p-value |
| XPL_ | exact left-sided p-value |
| XL_ | exact lower confidence limit |
| XU_ | exact upper confidence limit |
| Z_ | standardized value |
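As an illustration of the naming scheme, the following sketch stores the Pearson chi-square statistic in an output data set. The data set EXPOSURE, its variables, and the output data set STATS are hypothetical. With the PCHI keyword, the stored variables are expected to be _PCHI_ (the statistic), DF_PCHI (degrees of freedom), and P_PCHI (p-value).

```sas
/* Hypothetical 2x2 table entered as one row per cell */
data exposure;
   input group $ outcome $ cellcount;
   datalines;
A yes 20
A no  10
B yes  5
B no  25
;
run;

proc freq data=exposure;
   weight cellcount;
   tables group*outcome / chisq noprint;   /* compute the chi-square tests   */
   output out=stats pchi;                  /* keep only the Pearson chi-square */
run;

/* Inspect the variable names that PROC FREQ generated */
proc print data=stats;
run;
```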


Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.