The PRINQUAL Procedure

# Getting Started

In the following example, PROC PRINQUAL uses the MTV method. Suppose that the problem is to linearize a curve through three-dimensional space. Let

$$X_1 = X^3, \qquad X_2 = X_1 - X^5, \qquad X_3 = X_2 - X^6$$

where $X = -1.00, -0.98, -0.96, \ldots, 1.00$.

These three variables define a curve in three-dimensional space. The GPLOT procedure is used to display two-dimensional views of this curve. These data are completely described by three linear components, but they define a single curve, which could be described as a single nonlinear component.

PROC PRINQUAL is used to attempt to straighten the curve into a one-dimensional line with a continuous transformation of each variable. The N=1 option in the PROC PRINQUAL statement requests one principal component. The TRANSFORM statement requests a cubic spline transformation with nine knots.

Splines are curves that are usually required to be continuous and smooth. They are usually defined as piecewise polynomials of degree n whose function values and first n-1 derivatives agree at the points where they join. The abscissa values of the join points are called knots. The term "spline" is also used for polynomials (splines with no knots) and for piecewise polynomials with more than one discontinuous derivative. Splines with no knots are generally smoother than splines with knots, which are generally smoother than splines with multiple discontinuous derivatives. Splines with few knots are generally smoother than splines with many knots; however, increasing the number of knots usually improves the fit of the spline function to the data, because knots give the curve freedom to bend to follow the data more closely. Refer to Smith (1979) for an excellent introduction to splines. For another example of using splines, see Example 65.1 in Chapter 65, "The TRANSREG Procedure."
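As a worked illustration of the spline definition, one common way to write a spline of degree $n$ with knots $t_1 < t_2 < \cdots < t_K$ is the truncated power basis. This is a sketch of the general idea, and it is an assumption here that it is a useful mental model; it is not necessarily the basis that PROC PRINQUAL uses internally. The symbols $b_j$, $c_k$, and $t_k$ are introduced only for this illustration.

```latex
s(x) = \sum_{j=0}^{n} b_j \, x^j + \sum_{k=1}^{K} c_k \, (x - t_k)_+^{\,n},
\qquad (u)_+ = \max(u, 0)
```

Each term $(x - t_k)_+^{\,n}$ has continuous derivatives through order $n-1$ at $t_k$ and a jump in its $n$th derivative there, so the pieces agree in function value and first $n-1$ derivatives at every knot. With $n = 3$ and $K = 9$, as in this example, each transformation has $3 + 9 = 12$ coefficients in addition to the intercept $b_0$. Each added knot contributes one more coefficient, which is the extra freedom to bend mentioned earlier.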

One component accounts for 71 percent of the variance of the untransformed data, and after 50 iterations, one component accounts for over 98 percent of the variance of the transformed data (see Figure 53.2). The algorithm did not converge in 50 iterations, so more iterations may be needed for this problem.
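If more iterations are wanted, one possible approach is to rerun the procedure with a larger MAXITER= value. The following is a minimal sketch, assuming the data set X created by the DATA step in the statements for this example already exists. The value 200 and the title text are arbitrary choices made for illustration, and nothing here guarantees that the algorithm converges within that limit.

```sas
* Hypothetical rerun with a larger iteration limit (200 is an arbitrary choice);
proc prinqual data=X n=1 maxiter=200 covariance;
   title 'Iteratively Derive Variable Transformations (More Iterations)';
   transform spline(X1-X3 / nknots=9);
run;
```

The CONVERGE= and CCONVERGE= options in the PROC PRINQUAL statement control the convergence thresholds and can be adjusted together with MAXITER=, depending on how precise the final transformations need to be.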

PROC PRINQUAL creates an output data set (which is not displayed) that contains both the original and transformed variables. The original variables have the names X1, X2, and X3. Transformed variables are named TX1, TX2, and TX3. All observations in the output data set have _TYPE_='SCORE', since the CORRELATIONS option is not specified in the PROC PRINQUAL statement. The GPLOT procedure uses this output data set and displays the nonlinear transformations of all three variables and the nearly one-dimensional scatter plot (see Figure 53.3 and Figure 53.4).
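To look at the output data set directly, it can help to name it with the OUT= option and list a few observations. The following sketch assumes the data set X from this example; the data set name Results and the choice of five observations are hypothetical and used only for illustration.

```sas
* Name the output data set explicitly (Results is a hypothetical name);
proc prinqual data=X n=1 maxiter=50 covariance out=Results;
   transform spline(X1-X3 / nknots=9);
run;

* List the first five observations with original and transformed variables;
proc print data=Results(obs=5);
   var _TYPE_ X1-X3 TX1-TX3;
run;
```

Naming the data set also makes later steps less fragile, because a subsequent procedure can refer to DATA=Results instead of relying on the most recently created data set.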

PROC PRINQUAL tries to project each variable on the first principal component. Notice that the curve in this example is closer to a circle than to a function from some views (see the plot of X3 vs. X2 in Figure 53.1) and that the first component does not run approximately from one end point of the curve to the other (see Figure 53.4). Since the curve has these characteristics, PROC PRINQUAL linearizes the scatter plot by collapsing the scatter around the principal axis, not by straightening the curve into a single line. PROC PRINQUAL would straighten simpler curves.

The following statements produce Figure 53.1 through Figure 53.4:

* Generate a Three-Dimensional Curve;
data X;
   do X = -1 to 1 by 0.02;
      X1 =      X ** 3;
      X2 = X1 - X ** 5;
      X3 = X2 - X ** 6;
      output;
   end;
   drop X;
run;

goptions goutmode=replace nodisplay;
%let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
/* Depending on your goptions, these plot options may work better:
   %let opts = haxis=axis2 vaxis=axis1 frame; */

proc gplot data=X;
title;
axis1 minor=none label=(angle=90 rotate=0)
order=(-1 to 1);
axis2 minor=none order=(-1 to 1);
plot X1*X2 / &opts name='prqin1';
plot X3*X2 / &opts name='prqin2' vreverse;
plot X1*X3 / &opts name='prqin3';
symbol1 color=blue;
run; quit;

goptions display;
proc greplay nofs tc=sashelp.templt template=l2r2;
igout gseg;
treplay 1:prqin1 2:prqin2 3:prqin3;
run; quit;


* Try to Straighten the Curve;
proc prinqual data=X n=1 maxiter=50 covariance;
title 'Iteratively Derive Variable Transformations';
transform spline(X1-X3 / nknots=9);
run;

* Plot the Transformations;
goptions nodisplay;
proc gplot;
title;
axis1 minor=none label=(angle=90 rotate=0);
axis2 minor=none;
plot TX1*X1 / &opts name='prqin4';
plot TX2*X2 / &opts name='prqin5';
plot TX3*X3 / &opts name='prqin6';
symbol1 color=blue;
run; quit;

goptions display;
proc greplay nofs tc=sashelp.templt template=l2r2;
igout gseg;
treplay 1:prqin4 2:prqin6 3:prqin5;
run; quit;

* Plot the Straightened Scatter Plot;
goptions nodisplay;
proc gplot;
axis1 minor=none label=(angle=90 rotate=0)
order=(-1 to 1);
axis2 minor=none order=(-1 to 1);
plot TX1*TX2 / &opts name='prqin7';
plot TX3*TX2 / &opts name='prqin8' vreverse;
plot TX1*TX3 / &opts name='prqin9';
symbol1 color=blue;
run; quit;

goptions display;
proc greplay nofs tc=sashelp.templt template=l2r2;
igout gseg;
treplay 1:prqin7 2:prqin8 3:prqin9;
run; quit;

Figure 53.1: Three-Dimensional Curve Example

 Iteratively Derive Variable Transformations

 The PRINQUAL Procedure

PRINQUAL MTV Algorithm Iteration History

| Iteration Number | Average Change | Maximum Change | Proportion of Variance | Criterion Change | Note |
|---|---|---|---|---|---|
| 1 | 0.16253 | 1.33045 | 0.71369 | | |
| 2 | 0.07871 | 0.94549 | 0.79035 | 0.07667 | |
| 3 | 0.06518 | 0.80219 | 0.86334 | 0.07299 | |
| 4 | 0.05322 | 0.57928 | 0.91379 | 0.05045 | |
| 5 | 0.04154 | 0.38404 | 0.94204 | 0.02825 | |
| 6 | 0.03181 | 0.24391 | 0.95640 | 0.01436 | |
| 7 | 0.02461 | 0.15397 | 0.96349 | 0.00709 | |
| 8 | 0.01982 | 0.10205 | 0.96704 | 0.00355 | |
| 9 | 0.01662 | 0.07393 | 0.96894 | 0.00189 | |
| 10 | 0.01439 | 0.06232 | 0.97005 | 0.00112 | |
| 11 | 0.01288 | 0.05436 | 0.97081 | 0.00075 | |
| 12 | 0.01189 | 0.04911 | 0.97139 | 0.00058 | |
| 13 | 0.01119 | 0.04531 | 0.97188 | 0.00049 | |
| 14 | 0.01068 | 0.04276 | 0.97232 | 0.00044 | |
| 15 | 0.01027 | 0.04115 | 0.97273 | 0.00041 | |
| 16 | 0.00993 | 0.04039 | 0.97313 | 0.00040 | |
| 17 | 0.00965 | 0.04249 | 0.97351 | 0.00038 | |
| 18 | 0.00940 | 0.04400 | 0.97388 | 0.00037 | |
| 19 | 0.00919 | 0.04509 | 0.97423 | 0.00036 | |
| 20 | 0.00900 | 0.04587 | 0.97458 | 0.00034 | |
| 21 | 0.00883 | 0.04643 | 0.97491 | 0.00033 | |
| 22 | 0.00867 | 0.04681 | 0.97523 | 0.00032 | |
| 23 | 0.00852 | 0.04705 | 0.97555 | 0.00031 | |
| 24 | 0.00839 | 0.04719 | 0.97585 | 0.00031 | |
| 25 | 0.00827 | 0.04724 | 0.97615 | 0.00030 | |
| 26 | 0.00816 | 0.04722 | 0.97644 | 0.00029 | |
| 27 | 0.00805 | 0.04713 | 0.97672 | 0.00028 | |
| 28 | 0.00795 | 0.04699 | 0.97700 | 0.00027 | |
| 29 | 0.00785 | 0.04680 | 0.97726 | 0.00027 | |
| 30 | 0.00776 | 0.04656 | 0.97752 | 0.00026 | |
| 31 | 0.00768 | 0.04629 | 0.97777 | 0.00025 | |
| 32 | 0.00760 | 0.04598 | 0.97802 | 0.00025 | |
| 33 | 0.00752 | 0.04564 | 0.97826 | 0.00024 | |
| 34 | 0.00745 | 0.04528 | 0.97849 | 0.00023 | |
| 35 | 0.00739 | 0.04489 | 0.97872 | 0.00023 | |
| 36 | 0.00733 | 0.04448 | 0.97894 | 0.00022 | |
| 37 | 0.00729 | 0.04405 | 0.97915 | 0.00022 | |
| 38 | 0.00724 | 0.04361 | 0.97936 | 0.00021 | |
| 39 | 0.00720 | 0.04315 | 0.97957 | 0.00021 | |
| 40 | 0.00716 | 0.04268 | 0.97977 | 0.00020 | |
| 41 | 0.00713 | 0.04219 | 0.97997 | 0.00020 | |
| 42 | 0.00709 | 0.04170 | 0.98016 | 0.00019 | |
| 43 | 0.00706 | 0.04120 | 0.98035 | 0.00019 | |
| 44 | 0.00703 | 0.04070 | 0.98054 | 0.00019 | |
| 45 | 0.00699 | 0.04019 | 0.98072 | 0.00018 | |
| 46 | 0.00696 | 0.03967 | 0.98090 | 0.00018 | |
| 47 | 0.00693 | 0.03916 | 0.98107 | 0.00017 | |
| 48 | 0.00690 | 0.03864 | 0.98124 | 0.00017 | |
| 49 | 0.00687 | 0.03812 | 0.98141 | 0.00017 | |
| 50 | 0.00684 | 0.03760 | 0.98158 | 0.00017 | Not Converged |

 ERROR: Failed to converge.

Figure 53.2: PROC PRINQUAL MTV Iteration History

Figure 53.3: Variable Transformation Plots

Figure 53.4: Plots of the Nearly One-Dimensional Curve