Chapter Contents |
Previous |
Next |

The SQL Procedure |

Restriction: |
A summary function cannot appear in an ON clause or a WHERE clause. |

See also: |
GROUP BY , HAVING Clause , SELECT Clause , and table-expression |

Featured in: |
Creating a View from a Query's Result , Joining Two Tables and Calculating a New Value , and Counting Missing Values with a SAS Macro |

summary-function (<DISTINCT|ALL> sql-expression) |

- sql-expression is described in
sql-expression .

Summarizing Data |

Function Names and the Corresponding Statistics |

- AVG, MEAN
- means or average of values
- COUNT, FREQ, N
- number of nonmissing values
- CSS
- corrected sum of squares
- CV
- coefficient of variation (percent)
- MAX
- largest value
- MIN
- smallest value
- NMISS
- number of missing values
- PRT
- probability of a greater absolute value of Student's
**t** - RANGE
- range of values
- STD
- standard deviation
- STDERR
- standard error of the mean
- SUM
- sum of values
- SUMWGT
- sum of the WEIGHT variable values(footnote 1)
- T
- Student's
**t**value for testing the hypothesis that the population mean is zero - USS
- uncorrected sum of squares
- VAR
- variance

For a description and the formulas used for these statistics, see SAS Elementary Statistics Procedures

Counting Rows |

If the SELECT clause of a table-expression contains one or more summary functions and that table-expression resolves to no rows, then the summary function results are missing values. The following are exceptions that return zeros:

COUNT(*) | |

COUNT(<DISTINCT> sql-expression) | |

NMISS(<DISTINCT> sql-expression) |

See Creating a View from a Query's Result and Counting Missing Values with a SAS Macro for examples.

Calculating Statistics Based on the Number of Arguments |

proc sql; title 'Summary Table'; select * from summary;

If you use one argument in the function, the calculation is performed on that column only. If you use more than one argument, the calculation is performed on each row of the specified columns. In the following PROC SQL step, the MIN and MAX functions return the minimum and maximum of the columns they are used with. The SUM function returns the sum of each row of the columns specified as arguments:

proc sql; select min(x) as Colmin_x, min(y) as Colmin_y, max(z) as Colmax_z, sum(x,y,z) as Rowsum from summary;

Remerging Data |

NOTE: The query requires remerging summary statistics back with the original data.

The process
of **remerging** involves two passes through the data. On the first
pass, PROC SQL

- calculates and returns the value of summary functions. It then
uses the result to calculate the arithmetic expressions in which the summary
function participates.
- groups data according to the GROUP BY clause.

On the second pass, PROC SQL retrieves any additional columns and rows that it needs to show in the output.

The following examples use the PROCLIB.PAYROLL table (shown in Creating a Table from a Query's Result ) to show when remerging of data is and is not necessary.

The first query requires remerging. The first pass through the data groups the data by Jobcode and resolves the AVG function for each group. However, PROC SQL must make a second pass in order to retrieve the values of IdNumber and Salary.

proc sql outobs=10; title 'Salary Information'; title2 '(First 10 Rows Only)'; select IdNumber, Jobcode, Salary, avg(salary) as AvgSalary from proclib.payroll group by jobcode;You can change the previous query to return only the average salary for each jobcode. The following query does not require remerging because the first pass of the data does the summarizing and the grouping. A second pass is not necessary.

proc sql outobs=10; title 'Average Salary for Each Jobcode'; select Jobcode, avg(salary) as AvgSalary from proclib.payroll group by jobcode;When you use the HAVING clause, PROC SQL may have to remerge data to resolve the HAVING expression.

First, consider a query that uses HAVING but that does not require remerging.
The query groups the data by values of Jobcode, and the result contains one
row for each value of Jobcode and summary information for people in each Jobcode.
On the first pass, the summary functions provide values for the

,
**Number**

, and
**Average Age**

columns. The first pass provides everything that PROC
SQL needs to resolve the HAVING clause, so no remerging is necessary.
**Average Salary**

proc sql outobs=10; title 'Summary Information for Each Jobcode'; title2 '(First 10 Rows Only)'; select Jobcode, count(jobcode) as number label='Number', avg(int((today()-birth)/365.25)) as avgage format=2. label='Average Age', avg(salary) as avgsal format=dollar8. label='Average Salary' from proclib.payroll group by jobcode having avgage ge 30;In the following query, PROC SQL remerges the data because the HAVING clause uses the SALARY column in the comparison and SALARY is not in the GROUP BY clause.

proc sql outobs=10; title 'Employees who Earn More than the'; title2 'Average for Their Jobcode'; title3 '(First 10 Rows Only)'; select Jobcode, Salary, avg(salary) as AvgSalary from proclib.payroll group by jobcode having salary > AvgSalary;Keep in mind that PROC SQL remerges data when

- the values returned by a summary function are used in a calculation.
For example, the following query returns the values of X and the percent of
the total for each row. On the first pass, PROC SQL computes the sum of X,
and on the second pass PROC SQL computes the percentage of the total for each
value of X:
proc sql; title 'Percentage of the Total'; select X, (100*x/sum(X)) as Pct_Total from summary;

- the values returned by a summary function are compared to values
of a column that is not specified in the GROUP BY clause. For example, the
following query uses the PROCLIB.PAYROLL table. PROC SQL remerges data because
the column Salary is not specified in the GROUP BY clause:
proc sql; select jobcode, salary, avg(salary) as avsal from proclib.payroll group by jobcode having salary > avsal;

- a column from the input table is specified in the SELECT clause
and is not specified in the GROUP BY clause. This rule does not refer to columns
used as arguments to summary functions in the SELECT clause.
For example, in the following query, the presence of IdNumber in the SELECT clause causes PROC SQL to remerge the data because IdNumber is not involved in grouping or summarizing during the first pass. In order for PROC SQL to retrieve the values for IdNumber, it must make a second pass through the data.

proc sql; select IdNumber, jobcode, avg(salary) as avsal from proclib.payroll group by jobcode;

Chapter Contents |
Previous |
Next |
Top of Page |

Copyright 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.