## Example 51.2: Examining Outliers

This example is a continuation of Example 51.1.

A PLS model effectively models both the predictors and the responses.
To check for outliers, you should therefore look at the Euclidean
distance from each point to the PLS model in both the standardized
predictors and the standardized responses. No point should be
dramatically farther from the model than the rest. If a group of
points are all farther from the model than the rest, they may have
something in common, in which case they should be analyzed separately.
The following statements compute and plot these distances to the
reduced model, which drops the variables
L1, L2, P2, P4, S5, L5, and P5:

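    /* Fit the reduced two-factor model and save the residual sums of squares */
    proc pls data=ptrain nfac=2 noprint;
       model log_RAI = S1 P1
                       S2
                       S3 L3 P3
                       S4 L4 ;
       output out=stdres stdxsse=stdxsse
                         stdysse=stdysse;
    run;

    /* Convert the sums of squares to Euclidean distances */
    data stdres; set stdres;
       xdist = sqrt(stdxsse);
       ydist = sqrt(stdysse);
    run;

    /* Needle plots of each distance against the variable n */
    symbol1 i=needles v=dot c=blue;
    proc gplot data=stdres;
       plot xdist*n=1 / cframe=ligr;
    run;
    proc gplot data=stdres;
       plot ydist*n=1 / cframe=ligr;
    run;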
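
Written out, the quantities `xdist` and `ydist` computed by these statements can be sketched as follows. The symbols are introduced here for illustration and are not part of the original example: $e_{ij}$ denotes the residual of standardized predictor $j$ for observation $i$ under the two-factor model, $f_{ik}$ the residual of standardized response $k$, and $p$ and $q$ the numbers of predictors and responses.

```latex
$$
\mathrm{xdist}_i = \sqrt{\mathrm{STDXSSE}_i} = \sqrt{\sum_{j=1}^{p} e_{ij}^{2}},
\qquad
\mathrm{ydist}_i = \sqrt{\mathrm{STDYSSE}_i} = \sqrt{\sum_{k=1}^{q} f_{ik}^{2}}
$$
```

For the reduced model, $p = 8$ and $q = 1$, so `ydist` is simply the absolute value of the standardized residual for `log_RAI`.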

The plots are shown in Output 51.2.1 and Output 51.2.2.

**Output 51.2.1:** Distances from the X-variables to the Model (Training Set)

**Output 51.2.2:** Distances from the Y-variables to the Model (Training Set)

There appear to be no profound outliers in either the predictor space or
the response space.
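
As a numeric complement to the plots, one possible check is to list the observations with the largest distances and see whether any stands well apart from the rest. The following is a minimal sketch, not part of the original example. It assumes that the `stdres` data set created by the preceding statements still contains `n`, `xdist`, and `ydist`. The data set name `byxdist` and the choice of five observations are arbitrary.

```sas
/* Sketch: rank observations by their distance in the predictor space. */
/* byxdist is a hypothetical name for the sorted copy of stdres.       */
proc sort data=stdres out=byxdist;
   by descending xdist;
run;

/* Print the five observations farthest from the model in X */
proc print data=byxdist(obs=5);
   var n xdist ydist;
run;
```

Sorting by `ydist` instead gives the corresponding listing for the response space.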

Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.