 The TRANSREG Procedure

## Example 65.4: Transformation Regression of Exhaust Emissions Data

In this example, the MORALS algorithm is applied to data from an experiment in which nitrogen oxide emissions from a single-cylinder engine are measured for various combinations of fuel, compression ratio, and equivalence ratio. The data are provided by Brinkman (1981).

The equivalence ratio and nitrogen oxide variables are continuous and numeric, so spline transformations of these variables are requested. Each spline is degree three with nine knots (one at each decile) in order to allow PROC TRANSREG a great deal of freedom in finding transformations. The compression ratio variable has only five discrete values, so an optimal scoring is requested. The character variable Fuel is nominal, so it is designated as a classification variable. No monotonicity constraints are placed on any of the transformations. Observations with missing values are excluded with the NOMISS a-option.

The squared multiple correlation for the initial model is less than 0.25. PROC TRANSREG increases the R² to over 0.95 by transforming the variables. The transformation plots show how each variable is transformed. The transformation of compression ratio (TCpRatio) is nearly linear. The transformation of equivalence ratio (TEqRatio) is nearly parabolic, and its plot shows that the optimal transformation is nearly uncorrelated with the original scoring. This suggests that the large increase in R² is due to this transformation. The transformation of nitrogen oxide (TNOx) resembles a log transformation.

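These results suggest the parametric model

$$
\log(\mathrm{NOx}) = b_0 + b_1\,\mathrm{EqRatio} + b_2\,\mathrm{EqRatio}^2 + b_3\,\mathrm{CpRatio} + \sum_j b_j\,\mathrm{Fuel}_j + \mathrm{error}
$$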

You can perform this analysis with PROC TRANSREG using the following MODEL statement:

    model log(NOx)= psp(EqRatio / deg=2) identity(CpRatio)
          class(Fuel / zero=first);


The LOG transformation computes the natural log. The PSPLINE expansion expands EqRatio into a linear term, EqRatio, and a squared term, EqRatio². A linear transformation of CpRatio and a dummy variable expansion of Fuel are requested, with the first level as the reference level. These should provide a good parametric operationalization of the optimal transformations. The final model has an R² of 0.91 (smaller than before since the model uses fewer degrees of freedom, but still quite good).
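As a cross-check on what the LOG, PSPLINE, IDENTITY, and CLASS specifications amount to, the same model can be sketched by building the transformed variables by hand and fitting them with PROC GLM. This is only a sketch: it assumes the Gas data set created by the DATA step in this example, and the names GasLog, LogNOx, and EqRatio2 are hypothetical. PROC GLM uses the last Fuel level as its reference by default, so the intercept and the individual Fuel coefficients will differ from the ZERO=FIRST parameterization, while the overall fit should agree.

```sas
data GasLog;
   set Gas;
   /* Natural log of the response; skip missing NOx to avoid LOG(.) notes */
   if not missing(NOx) then LogNOx = log(NOx);
   /* Squared term corresponding to the second PSPLINE column */
   EqRatio2 = EqRatio**2;
run;

proc glm data=GasLog;
   class Fuel;
   /* Observations with a missing response are dropped by PROC GLM */
   model LogNOx = EqRatio EqRatio2 CpRatio Fuel / solution;
run; quit;
```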

The following statements produce Output 65.4.1 through Output 65.4.3:

title 'Gasoline Example';

data Gas;
input Fuel :$8. CpRatio EqRatio NOx @@;
label Fuel    = 'Fuel'
CpRatio = 'Compression Ratio (CR)'
EqRatio = 'Equivalence Ratio (PHI)'
NOx     = 'Nitrogen Oxide (NOx)';
datalines;
Ethanol  12.0 0.907 3.741 Ethanol  12.0 0.761 2.295
Ethanol  12.0 1.108 1.498 Ethanol  12.0 1.016 2.881
Ethanol  12.0 1.189 0.760 Ethanol   9.0 1.001 3.120
Ethanol   9.0 1.231 0.638 Ethanol   9.0 1.123 1.170
Ethanol  12.0 1.042 2.358 Ethanol  12.0 1.215 0.606
Ethanol  12.0 0.930 3.669 Ethanol  12.0 1.152 1.000
Ethanol  15.0 1.138 0.981 Ethanol  18.0 0.601 1.192
Ethanol   7.5 0.696 0.926 Ethanol  12.0 0.686 1.590
Ethanol  12.0 1.072 1.806 Ethanol  15.0 1.074 1.962
Ethanol  15.0 0.934 4.028 Ethanol   9.0 0.808 3.148
Ethanol   9.0 1.071 1.836 Ethanol   7.5 1.009 2.845
Ethanol   7.5 1.142 1.013 Ethanol  18.0 1.229 0.414
Ethanol  18.0 1.175 0.812 Ethanol  15.0 0.568 0.374
Ethanol  15.0 0.977 3.623 Ethanol   7.5 0.767 1.869
Ethanol   7.5 1.006 2.836 Ethanol   9.0 0.893 3.567
Ethanol  15.0 1.152 0.866 Ethanol  15.0 0.693 1.369
Ethanol  15.0 1.232 0.542 Ethanol  15.0 1.036 2.739
Ethanol  15.0 1.125 1.200 Ethanol   9.0 1.081 1.719
Ethanol   9.0 0.868 3.423 Ethanol   7.5 0.762 1.634
Ethanol   7.5 1.144 1.021 Ethanol   7.5 1.045 2.157
Ethanol  18.0 0.797 3.361 Ethanol  18.0 1.115 1.390
Ethanol  18.0 1.070 1.947 Ethanol  18.0 1.219 0.962
Ethanol   9.0 0.637 0.571 Ethanol   9.0 0.733 2.219
Ethanol   9.0 0.715 1.419 Ethanol   9.0 0.872 3.519
Ethanol   7.5 0.765 1.732 Ethanol   7.5 0.878 3.206
Ethanol   7.5 0.811 2.471 Ethanol  15.0 0.676 1.777
Ethanol  18.0 1.045 2.571 Ethanol  18.0 0.968 3.952
Ethanol  15.0 0.846 3.931 Ethanol  15.0 0.684 1.587
Ethanol   7.5 0.729 1.397 Ethanol   7.5 0.911 3.536
Ethanol   7.5 0.808 2.202 Ethanol   7.5 1.168 0.756
Indolene  7.5 0.831 4.818 Indolene  7.5 1.045 2.849
Indolene  7.5 1.021 3.275 Indolene  7.5 0.970 4.691
Indolene  7.5 0.825 4.255 Indolene  7.5 0.891 5.064
Indolene  7.5 0.710 2.118 Indolene  7.5 0.801 4.602
Indolene  7.5 1.074 2.286 Indolene  7.5 1.148 0.970
Indolene  7.5 1.000 3.965 Indolene  7.5 0.928 5.344
Indolene  7.5 0.767 3.834 Ethanol   7.5 0.749 1.620
Ethanol   7.5 0.892 3.656 Ethanol   7.5 1.002 2.964
82rongas  7.5 0.873 6.021 82rongas  7.5 0.987 4.467
82rongas  7.5 1.030 3.046 82rongas  7.5 1.101 1.596
82rongas  7.5 1.173 0.835 82rongas  7.5 0.931 5.498
82rongas  7.5 0.822 5.470 82rongas  7.5 0.749 4.084
82rongas  7.5 0.625 0.716 94%Eth    7.5 0.818 2.382
94%Eth    7.5 1.128 1.004 94%Eth    7.5 1.191 0.623
94%Eth    7.5 1.132 1.030 94%Eth    7.5 0.993 2.593
94%Eth    7.5 0.866 2.699 94%Eth    7.5 0.910 3.177
94%Eth   12.0 1.139 1.151 94%Eth   12.0 1.267 0.474
94%Eth   12.0 1.017 2.814 94%Eth   12.0 0.954 3.308
94%Eth   12.0 0.861 3.031 94%Eth   12.0 1.034 2.537
94%Eth   12.0 0.781 2.403 94%Eth   12.0 1.058 2.412
94%Eth   12.0 0.884 2.452 94%Eth   12.0 0.766 1.857
94%Eth    7.5 1.193 0.657 94%Eth    7.5 0.885 2.969
94%Eth    7.5 0.915 2.670 Ethanol  18.0 0.812 3.760
Ethanol  18.0 1.230 0.672 Ethanol  18.0 0.804 3.677
Ethanol  18.0 0.712  .    Ethanol  12.0 0.813 3.517
Ethanol  12.0 1.002 3.290 Ethanol   9.0 0.696 1.139
Ethanol   9.0 1.199 0.727 Ethanol   9.0 1.030 2.581
Ethanol  15.0 0.602 0.923 Ethanol  15.0 0.694 1.527
Ethanol  15.0 0.816 3.388 Ethanol  15.0 0.896  .
Ethanol  15.0 1.037 2.085 Ethanol  15.0 1.181 0.966
Ethanol   7.5 0.899 3.488 Ethanol   7.5 1.227 0.754
Indolene  7.5 0.701 1.990 Indolene  7.5 0.807 5.199
Indolene  7.5 0.902 5.283 Indolene  7.5 0.997 3.752
Indolene  7.5 1.224 0.537 Indolene  7.5 1.089 1.640
Ethanol   9.0 1.180 0.797 Ethanol   7.5 0.795 2.064
Ethanol  18.0 0.990 3.732 Ethanol  18.0 1.201 0.586
Methanol  7.5 0.975 2.941 Methanol  7.5 1.089 1.467
Methanol  7.5 1.150 0.934 Methanol  7.5 1.212 0.722
Methanol  7.5 0.859 2.397 Methanol  7.5 0.751 1.461
Methanol  7.5 0.720 1.235 Methanol  7.5 1.090 1.347
Methanol  7.5 0.616 0.344 Gasohol   7.5 0.712 2.209
Gasohol   7.5 0.771 4.497 Gasohol   7.5 0.959 4.958
Gasohol   7.5 1.042 2.723 Gasohol   7.5 1.125 1.244
Gasohol   7.5 1.097 1.562 Gasohol   7.5 0.984 4.468
Gasohol   7.5 0.928 5.307 Gasohol   7.5 0.889 5.425
Gasohol   7.5 0.827 5.330 Gasohol   7.5 0.674 1.448
Gasohol   7.5 1.031 3.164 Methanol  7.5 0.871 3.113
Methanol  7.5 1.026 2.551 Methanol  7.5 0.598 0.204
Indolene  7.5 0.973 5.055 Indolene  7.5 0.980 4.937
Indolene  7.5 0.665 1.561 Ethanol   7.5 0.629 0.561
Ethanol   9.0 0.608 0.563 Ethanol  12.0 0.584 0.678
Ethanol  15.0 0.562 0.370 Ethanol  18.0 0.535 0.530
94%Eth    7.5 0.674 0.900 Gasohol   7.5 0.645 1.207
Ethanol  18.0 0.655 1.900 94%Eth    7.5 1.022 2.787
94%Eth    7.5 0.790 2.645 94%Eth    7.5 0.720 1.475
94%Eth    7.5 1.075 2.147
;

*---Fit the Nonparametric Model---;
proc transreg data=Gas dummy test nomiss;
model spline(NOx / nknots=9)=spline(EqRatio / nknots=9)
opscore(CpRatio) class(Fuel / zero=first);
title2 'Iteratively Estimate NOx, CPRATIO and EQRATIO';
output out=Results;
run;

*---Plot the Results---;
goptions goutmode=replace nodisplay;
%let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
/* Depending on your goptions, these plot options may work better:
   %let opts = haxis=axis2 vaxis=axis1 frame; */

proc gplot data=Results;
title;
axis1 minor=none label=(angle=90 rotate=0);
axis2 minor=none;
symbol1 color=blue v=dot i=none;
plot TCpRatio*CpRatio / &opts name='tregex1';
plot TEqRatio*EqRatio / &opts name='tregex2';
plot TNOx*NOx         / &opts name='tregex3';
run; quit;

goptions display;
proc greplay nofs tc=sashelp.templt template=l2r2;
igout gseg;
treplay 1:tregex1 2:tregex3 3:tregex2;
run; quit;


*-Fit the Parametric Model Suggested by the Nonparametric Analysis-;
proc transreg data=Gas dummy ss2 short nomiss;
title 'Gasoline Example';
title2 'Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 +';
title3 'b3*CpRatio + Sum b(j)*Fuel(j) + Error';
model log(NOx)= pspline(EqRatio / deg=2) identity(CpRatio)
class(Fuel / zero=first);
output out=Results2;
run;
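The observation that the transformed equivalence ratio is nearly uncorrelated with its original scoring can be checked directly from the first OUT= data set. The following is a minimal sketch, assuming the Results data set contains only score observations (no coefficient output was requested) and that the transformed variable is named TEqRatio as in the plots:

```sas
/* Correlation between the original and transformed equivalence ratio */
proc corr data=Results;
   var EqRatio;
   with TEqRatio;
run;
```

A correlation near zero would be consistent with the nearly parabolic shape of the transformation.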


Output 65.4.1: Transformation Regression Example: The Nonparametric Model

 Gasoline Example Iteratively Estimate NOx, CPRATIO and EQRATIO

 The TRANSREG Procedure

TRANSREG MORALS Algorithm Iteration History for Spline(NOx)

| Iteration Number | Average Change | Maximum Change | R-Square | Criterion Change | Note |
|---|---|---|---|---|---|
| 0 | 0.48074 | 3.86778 | 0.24597 | | |
| 1 | 0.00000 | 0.00000 | 0.95865 | 0.71267 | Converged |

 Algorithm converged.

The TRANSREG Procedure

Hypothesis Tests for Spline(NOx), Nitrogen Oxide (NOx)

Univariate ANOVA Table Based on the Usual Degrees of Freedom

| Source | DF | Sum of Squares | Mean Square | F Value | Liberal p |
|---|---|---|---|---|---|
| Model | 21 | 326.0946 | 15.52831 | 162.27 | >= <.0001 |
| Error | 147 | 14.0674 | 0.09570 | | |
| Corrected Total | 168 | 340.1619 | | | |

The above statistics are not adjusted for the fact that the dependent variable was transformed and so are generally liberal.

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 0.30935 | R-Square | 0.9586 |
| Dependent Mean | 2.34593 | Adj R-Sq | 0.9527 |
| Coeff Var | 13.1866 | | |
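One way to account for the 21 model degrees of freedom is sketched below. It rests on the assumptions that a degree-three spline with nine knots contributes 3 + 9 = 12 parameters (consistent with the "Dependent Variable Scoring Parameters=12" reported for NOx), that OPSCORE on the five CpRatio values contributes 5 − 1 = 4, and that the six Fuel levels with one reference level contribute 6 − 1 = 5:

$$
\underbrace{(3 + 9)}_{\text{spline(EqRatio)}} + \underbrace{(5 - 1)}_{\text{opscore(CpRatio)}} + \underbrace{(6 - 1)}_{\text{class(Fuel)}} = 12 + 4 + 5 = 21
$$

The reported R-square follows from the sums of squares in the ANOVA table:

$$
R^2 = \frac{326.0946}{340.1619} \approx 0.9586
$$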

Adjusted Multivariate ANOVA Table Based on the Usual Degrees of Freedom

Dependent Variable Scoring Parameters=12 S=12 M=4 N=67

| Statistic | Value | F Value | Num DF | Den DF | p |
|---|---|---|---|---|---|
| Wilks' Lambda | 0.041355 | 2.05 | 252 | 1455 | <= <.0001 |
| Pillai's Trace | 0.958645 | 0.61 | 252 | 1764 | <= 1.0000 |
| Hotelling-Lawley Trace | 23.18089 | 12.35 | 252 | 945.01 | <= <.0001 |
| Roy's Greatest Root | 23.18089 | 162.27 | 21 | 147 | >= <.0001 |

 The Wilks' Lambda, Pillai's Trace, and Hotelling-Lawley Trace statistics are a conservative adjustment of the normal statistics. Roy's Greatest Root is liberal. These statistics are normally defined in terms of the squared canonical correlations which are the eigenvalues of the matrix H*inv(H+E). Here the R-Square is used for the first eigenvalue and all other eigenvalues are set to zero since only one linear combination is used. Degrees of freedom are computed assuming all linear combinations contribute to the Lambda and Trace statistics, so the F tests for those statistics are conservative. The p values for the liberal and conservative statistics provide approximate lower and upper bounds on p. A liberal test statistic with conservative degrees of freedom and a conservative test statistic with liberal degrees of freedom yield at best an approximate p value, which is indicated by a "~" before the p value.
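With a single nonzero eigenvalue set to the R-square, the values in the multivariate table can be reproduced by hand. The sketch below uses the standard definitions of the four statistics and takes λ₁ = R² = 0.958645 from the table, with all other eigenvalues equal to zero:

$$
\begin{aligned}
\text{Wilks' Lambda} &= \prod_i (1 - \lambda_i) = 1 - 0.958645 = 0.041355 \\
\text{Pillai's Trace} &= \sum_i \lambda_i = 0.958645 \\
\text{Hotelling-Lawley Trace} &= \sum_i \frac{\lambda_i}{1 - \lambda_i} = \frac{0.958645}{0.041355} \approx 23.1809 \\
\text{Roy's Greatest Root} &= \frac{\lambda_1}{1 - \lambda_1} \approx 23.1809
\end{aligned}
$$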

Output 65.4.2: Transformation Regression Example: The Parametric Model

 Gasoline Example Now fit log(NOx) = b0 + b1*EqRatio + b2*EqRatio**2 + b3*CpRatio + Sum b(j)*Fuel(j) + Error

 The TRANSREG Procedure

 Log(NOx) Algorithm converged.

The TRANSREG Procedure

Hypothesis Tests for Log(NOx), Nitrogen Oxide (NOx)

Univariate ANOVA Table Based on the Usual Degrees of Freedom

| Source | DF | Sum of Squares | Mean Square | F Value | Pr > F |
|---|---|---|---|---|---|
| Model | 8 | 79.33838 | 9.917298 | 213.09 | <.0001 |
| Error | 160 | 7.44659 | 0.046541 | | |
| Corrected Total | 168 | 86.78498 | | | |

| Statistic | Value | Statistic | Value |
|---|---|---|---|
| Root MSE | 0.21573 | R-Square | 0.9142 |
| Dependent Mean | 0.6313 | Adj R-Sq | 0.9099 |
| Coeff Var | 34.1729 | | |
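The fit statistics for the parametric model can be recovered from the ANOVA table with the usual formulas, shown here as a worked check:

$$
\begin{aligned}
R^2 &= \frac{79.33838}{86.78498} \approx 0.9142 \\
R^2_{\text{adj}} &= 1 - \frac{7.44659 / 160}{86.78498 / 168} = 1 - \frac{0.046541}{0.516577} \approx 0.9099 \\
\text{Root MSE} &= \sqrt{0.046541} \approx 0.21573
\end{aligned}
$$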

Univariate Regression Table Based on the Usual Degrees of Freedom

| Variable | DF | Coefficient | Type II Sum of Squares | Mean Square | F Value | Pr > F | Label |
|---|---|---|---|---|---|---|---|
| Intercept | 1 | -14.586532 | 49.9469 | 49.9469 | 1073.18 | <.0001 | Intercept |
| Pspline.EqRatio_1 | 1 | 35.102914 | 62.7478 | 62.7478 | 1348.22 | <.0001 | Equivalence Ratio (PHI) 1 |
| Pspline.EqRatio_2 | 1 | -19.386468 | 64.6430 | 64.6430 | 1388.94 | <.0001 | Equivalence Ratio (PHI) 2 |
| Identity(CpRatio) | 1 | 0.032058 | 1.4445 | 1.4445 | 31.04 | <.0001 | Compression Ratio (CR) |
| Class.Fuel94_Eth | 1 | -0.449583 | 1.3158 | 1.3158 | 28.27 | <.0001 | Fuel 94%Eth |
| Class.FuelEthanol | 1 | -0.414242 | 1.2560 | 1.2560 | 26.99 | <.0001 | Fuel Ethanol |
| Class.FuelGasohol | 1 | -0.016719 | 0.0015 | 0.0015 | 0.03 | 0.8584 | Fuel Gasohol |
| Class.FuelIndolene | 1 | 0.001572 | 0.0000 | 0.0000 | 0.00 | 0.9853 | Fuel Indolene |
| Class.FuelMethanol | 1 | -0.580133 | 1.7219 | 1.7219 | 37.00 | <.0001 | Fuel Methanol |
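Reading the coefficients off the regression table gives the fitted equation below (coefficients rounded to four decimals). The interpretation that follows is a sketch: it assumes the reference level under ZERO=FIRST is 82rongas, the one fuel with no row in the table.

$$
\widehat{\log(\mathrm{NOx})} = -14.5865 + 35.1029\,\mathrm{EqRatio} - 19.3865\,\mathrm{EqRatio}^2 + 0.0321\,\mathrm{CpRatio} + b_{\mathrm{Fuel}}
$$

Because the squared term is negative, the fitted curve peaks in equivalence ratio at

$$
\mathrm{EqRatio}^{*} = \frac{35.102914}{2 \times 19.386468} \approx 0.905
$$

On the original NOx scale the fuel coefficients act multiplicatively. For example, the Methanol coefficient corresponds to a factor of

$$
e^{-0.580133} \approx 0.56
$$

relative to the reference fuel, holding the other terms fixed.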

Output 65.4.3: Plots of Compression Ratio, Equivalence Ratio, and Nitrogen Oxide
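The transformation plots come from the PROC GPLOT and PROC GREPLAY steps in the listing. Where ODS Graphics is available, a similar scatter plot can be sketched with PROC SGPLOT. This is an alternative not used in the example itself; it assumes the Results data set and the TEqRatio variable from the first PROC TRANSREG step, and the y-axis label is a hypothetical choice:

```sas
/* Transformed versus original equivalence ratio */
proc sgplot data=Results;
   scatter x=EqRatio y=TEqRatio;
   xaxis label='Equivalence Ratio (PHI)';
   yaxis label='Transformation of Equivalence Ratio';
run;
```

Replacing the x= and y= variables with CpRatio and TCpRatio, or NOx and TNOx, would give the other two panels.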
