
Transforming Variables

The most common transformations are available in the
**Edit:Variables** menu. For example, log transformations
are often used to linearize relationships,
stabilize variances, or reduce skewness.
To perform a log transformation in a fit window,
follow these steps:

Open the BASEBALL data set.

Create a fit analysis of SALARY versus CR_HOME.

You might expect players who hit many home runs to receive high
salaries. However, most players do not hit many home runs, and
most do not have high salaries. Because the data are concentrated
at small values, the relationship between **SALARY** and **CR_HOME**
is obscured: most of the observations appear in the lower left
corner of the scatter plot, and the regression line does not fit
the data well. To make the relationship clearer, apply a
logarithmic transformation.

Select both variables in the scatter plot.

Use your host's method for noncontiguous selection.

Choose Edit:Variables:log(Y).

**Figure 20.4:** Edit:Variables Menu

This performs a log transformation on both **SALARY** and
**CR_HOME** and transforms the scatter plot into a log-log plot.
The regression line now fits the data better, and the relationship
between salary and home run production is clearer.
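A log-log plot can straighten a curved relationship. Suppose Y is roughly proportional to a power of X, Y = c·X^b. Then log(Y) = log(c) + b·log(X), which is linear in log(X). The Python sketch below is only an illustration. Its data are synthetic (not the BASEBALL data), and the helper names `fit_line` and `r_squared` are hypothetical. It fits an ordinary least-squares line to the raw values and to their logs, then compares how well each line fits.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = intercept + slope * xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def r_squared(xs, ys, intercept, slope):
    """Proportion of the variation in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_res / ss_tot

# Synthetic power-law data, y = 3 * x**0.5 (illustrative only).
xs = [1, 2, 5, 10, 20, 50, 100, 200, 400]
ys = [3 * x ** 0.5 for x in xs]

# Fit on the original scale.
raw = fit_line(xs, ys)

# Fit on the log-log scale.
log_xs = [math.log(x) for x in xs]
log_ys = [math.log(y) for y in ys]
logged = fit_line(log_xs, log_ys)

print("raw R^2:", round(r_squared(xs, ys, *raw), 3))
print("log-log R^2:", round(r_squared(log_xs, log_ys, *logged), 3))
print("log-log slope:", round(logged[1], 3))  # recovers the exponent 0.5
```

Real data such as salaries will not follow a power law exactly. The sketch only shows why taking logs of both variables can turn a curved, bunched-up scatter into a roughly straight one.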

The degrees of freedom (**DF**) decrease from 261 to 258.
This is caused by missing values that the log transformation
generates, as the following steps show.
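Here is a quick arithmetic check. It assumes that **DF** here is the error degrees of freedom of a simple linear regression with one intercept and one slope. Under that assumption, DF = n - 2, where n is the number of observations with nonmissing values of both variables:

```latex
\begin{aligned}
\mathrm{DF} &= n - 2 \\
261 = n_{\text{before}} - 2 \;&\Rightarrow\; n_{\text{before}} = 263 \\
258 = n_{\text{after}} - 2 \;&\Rightarrow\; n_{\text{after}} = 260 \\
n_{\text{before}} - n_{\text{after}} &= 3
\end{aligned}
```

If the assumption holds, the log transformation removed three observations from the fit.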

Scroll the data window to display the last four variables.

Notice that in addition to residual and predicted values from the
regression, the log transformations created two new variables:
**L_SALARY** and **L_CR_HOM**.

The log transformation is useful in many cases.
However, the result of **log( Y )** is undefined where **Y** is
less than or equal to 0. In such cases, SAS/INSIGHT software
cannot transform the value, so a missing value (.) is generated.
To see this, sort the data in the data window.
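The rule that log(Y) is missing whenever Y is less than or equal to 0 can be sketched in Python as follows. The assumptions are that `None` stands in for the SAS missing value (.), that `log_transform` is a hypothetical helper (not a SAS/INSIGHT routine), and that the home-run counts are made up for illustration.

```python
import math

def log_transform(values):
    """Return log(v) for each v, or None (standing in for SAS's '.')
    when v is missing or not positive, because log(v) is undefined there."""
    return [math.log(v) if v is not None and v > 0 else None for v in values]

# Hypothetical career home-run counts; two players never hit one.
cr_home = [30, 0, 69, 225, 0, 12]
l_cr_hom = log_transform(cr_home)

missing = sum(1 for v in l_cr_hom if v is None)
print(l_cr_hom)
print("missing values generated:", missing)  # 2
```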

Select L_CR_HOM in the data window, and
choose Sort from the data pop-up menu.

Missing values in the SAS System are considered to be less than
any other value, so they appear first in the sorted variable.
These values represent players who have never hit a home run.
Their value for **CR_HOME** is 0, and the log of 0 is undefined.
As a result, the log transformation has excluded these
observations from the fit analysis. The following steps
circumvent this problem.
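SAS sorts a missing value before every number. The Python sketch below mimics that ordering. Python's `sorted()` cannot compare `None` with a float directly, so the sketch sorts on a tuple key: first whether the value is present, then the value itself. The helper name `sort_missing_first` is hypothetical, and the values are rounded logs of made-up home-run counts.

```python
def sort_missing_first(values):
    """Sort ascending, placing missing values (None) before all others,
    mimicking the SAS convention that missing is smaller than any number."""
    # Key is (is_present, value): False sorts before True, so None comes first.
    return sorted(values, key=lambda v: (v is not None, v if v is not None else 0.0))

l_cr_hom = [3.401, None, 4.234, 5.416, None, 2.485]
print(sort_missing_first(l_cr_hom))
# [None, None, 2.485, 3.401, 4.234, 5.416]
```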

Select CR_HOME in the data window.

Choose Edit:Variables:Other.

**Figure 20.9:** Edit:Variables Menu

This displays the Edit Variables dialog shown in
Figure 20.10. In the dialog you can see that the variable
**CR_HOME** is already assigned as the **Y** variable.

Scroll down the transformation window and select log( Y + a ).

In the field for **a**, enter the value 1, then press the Return key.

Notice that the **Label** value changes from **log( CR_HOME )** to
**log( CR_HOME + 1 )** to reflect the new value of **a**.
Setting **a** to **1** avoids generating missing values:
**CR_HOME** counts career home runs and is never negative,
so **(CR_HOME + 1)** is greater than zero for every observation
in this data set.
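The shifted transformation log(Y + a) can be sketched in Python as follows. The helper name `log_shift` is hypothetical, its parameter `a` mirrors the field in the Edit Variables dialog, `None` stands in for a SAS missing value, and the counts are made up.

```python
import math

def log_shift(values, a=1.0):
    """Compute log(v + a); return None (missing) when v is missing or v + a <= 0."""
    return [math.log(v + a) if v is not None and v + a > 0 else None for v in values]

cr_home = [30, 0, 69, 225, 0, 12]   # hypothetical career home-run counts
l_cr_h_1 = log_shift(cr_home, a=1)

print(l_cr_h_1[1])                        # 0.0, because log(0 + 1) = 0
print(any(v is None for v in l_cr_h_1))   # False: no missing values
```

With a = 1, a count of zero maps to log(1) = 0. Players with no home runs therefore sit at the left edge of the transformed axis instead of being dropped. Any a > 0 would also avoid missing values for nonnegative counts, but a = 1 keeps that convenient zero point.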

Click OK to perform the transformation.

Scroll all the way to the right to see the new variable, L_CR_H_1.

Notice that the new variable contains no missing values.

Select L_SALARY and L_CR_H_1, then choose Analyze:Fit (Y X).

In the lower left corner of the scatter plot, you can now see
the observations that were excluded from the previous fit analysis.
Also note that the degrees of freedom (**DF**) are back to 261.

Related Reading: Linear Models, Chapter 39.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.