The GLM Procedure

Computational Resources


For large problems, most of the memory resources are required for holding the X'X matrix of the sums and cross products. The section "Parameterization of PROC GLM Models" describes how columns of the X matrix are allocated for various types of effects. For each level that occurs in the data for a combination of class variables in a given effect, a row and column for X'X is needed.

The following example illustrates the calculation. Suppose A has 20 levels, B has 4 levels, and C has 3 levels. Then consider the model

   proc glm;
      class A B C;
      model Y1 Y2 Y3 = A B A*B C A*C B*C A*B*C X1 X2;
   run;

The X'X matrix (bordered by X'Y and Y'Y) can have as many as 425 rows and columns:

     1  for the intercept term
    20  for A
     4  for B
    80  for A*B
     3  for C
    60  for A*C
    12  for B*C
   240  for A*B*C
     2  for X1 and X2 (continuous variables)
     3  for Y1, Y2, and Y3 (dependent variables)

The matrix has 425 rows and columns only if all combinations of levels occur for each effect in the model. For m rows and columns, 8m² bytes are needed for cross products. In this case, 8·425² = 1,445,000 bytes, or about 1,445,000 / 1024 = 1411K.

The required memory grows as the square of the number of columns of X; most of the memory is for the A*B*C interaction. Without A*B*C, you have 185 columns and need 268K for X'X. Without either A*B*C or A*B, you need 86K. If A is recoded to have ten levels, then the full model has only 225 columns (1 + 10 + 4 + 40 + 3 + 30 + 12 + 120 + 2 + 3) and requires about 396K.
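The arithmetic above can be reproduced with a short DATA step. This is only a sketch: the variable names are illustrative and are not part of PROC GLM, and it assumes every combination of levels occurs in the data, so the counts are upper bounds.

```sas
/* Sketch only: variable names are illustrative, not part of PROC GLM. */
data _null_;
   a = 20; b = 4; c = 3;                 /* levels of A, B, and C */

   /* intercept + class effects + X1, X2 + Y1, Y2, Y3 */
   m_full   = 1 + a + b + a*b + c + a*c + b*c + a*b*c + 2 + 3;
   m_no_abc = m_full - a*b*c;            /* model without A*B*C */
   m_no_ab  = m_no_abc - a*b;            /* also without A*B */

   array cols{3} m_full m_no_abc m_no_ab;
   do i = 1 to 3;
      m      = cols{i};
      bytes  = 8 * m**2;                 /* 8 bytes per cross-product entry */
      kbytes = bytes / 1024;
      put m= bytes= comma12. kbytes= 8.1;
   end;
run;
```

The three lines written to the log correspond to 425, 185, and 105 columns. The kilobyte figures agree with the ones quoted in the text to within rounding.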

A large amount of memory is needed a second time when Type III, Type IV, or contrast sums of squares are calculated. This memory requirement is a function of the number of degrees of freedom of the model being analyzed and the maximum degrees of freedom for any single source. Let Rank be the sum of the model degrees of freedom, MaxDF the maximum number of degrees of freedom for any single source, and N_y the number of dependent variables in the model. Then the memory requirement in bytes is
\[
\left( 8 \times \frac{\mathrm{Rank} \times (\mathrm{Rank} + 1)}{2} \right)
+ (N_y \times \mathrm{Rank})
+ \left( \frac{\mathrm{MaxDF} \times (\mathrm{MaxDF} + 1)}{2} \right)
+ (N_y \times \mathrm{MaxDF})
\]

Unfortunately, these quantities are not available when the X'X matrix is being constructed, so PROC GLM may occasionally request additional memory even after you have increased the memory allocation available to the program.
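For a rough sense of scale, the following DATA step evaluates the memory expression with its terms grouped as written above. The values assigned to model_rank, maxdf, and ny are made up for illustration; in practice, as noted, they are known only after the model has been analyzed.

```sas
/* Sketch only: the three input values are hypothetical. */
data _null_;
   model_rank = 200;   /* hypothetical sum of the model degrees of freedom */
   maxdf      = 100;   /* hypothetical largest df for any single source */
   ny         = 3;     /* number of dependent variables */

   /* Terms grouped exactly as in the expression given in the text */
   bytes = (8 * (model_rank * (model_rank + 1) / 2))
         + (ny * model_rank)
         + (maxdf * (maxdf + 1) / 2)
         + (ny * maxdf);

   put bytes= comma12.;
run;
```

The term that is quadratic in Rank dominates, so reducing the model degrees of freedom is the most effective way to shrink this requirement.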

If you have a large model that exceeds the memory capacity of your computer, these are your options:

CPU Time

For large problems, two operations consume a lot of CPU time: the collection of sums and cross products and the solution of the normal equations.

The time required for collecting sums and cross products is difficult to calculate because it is a complicated function of the model. For a model with m columns and n rows (observations) in X, the worst case occurs if all columns are continuous variables, involving nm²/2 multiplications and additions. If the columns are levels of a classification, then only m sums may be needed, but a significant amount of time may be spent in look-up operations. Solving the normal equations requires time for approximately m³/2 multiplications and additions.
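To compare the two costs for a given problem size, the worst-case counts can be evaluated directly. In this sketch n and m are hypothetical values chosen only for illustration, and the results are operation counts, not seconds.

```sas
/* Sketch only: n and m are hypothetical problem sizes. */
data _null_;
   n = 10000;                      /* hypothetical number of observations */
   m = 400;                        /* hypothetical number of columns in X */

   crossprod_ops = n * m**2 / 2;   /* worst case: all columns continuous */
   solve_ops     = m**3 / 2;       /* solving the normal equations */
   ratio         = crossprod_ops / solve_ops;   /* equals n / m */

   put crossprod_ops= comma16. solve_ops= comma16. ratio= 8.1;
run;
```

Because the ratio of the two counts is n/m, collecting cross products dominates in the worst case whenever there are many more observations than columns.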

Suppose you know that Type IV sums of squares are appropriate for the model you are analyzing (for example, if your design has no missing cells). You can specify the SS4 option in your MODEL statement, which saves CPU time by requesting the Type IV sums of squares instead of the more computationally burdensome Type III sums of squares. This proves especially useful if you have a factor in your model that has many levels and is involved in several interactions.
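As a sketch, the SS4 option goes after the slash in the MODEL statement. The model here is reused from the memory example earlier in this section purely for illustration; whether Type IV sums of squares are appropriate depends on your design.

```sas
proc glm;
   class A B C;
   /* SS4 requests Type IV sums of squares for each effect */
   model Y1 Y2 Y3 = A B A*B C A*C B*C A*B*C X1 X2 / ss4;
run;
```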


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.