 The PRINQUAL Procedure

## Example 53.2: Principal Components of Basketball Rankings

The data in this example are 1985-1986 preseason rankings of 35 college basketball teams by 10 different news services. The services do not all rank the same teams or the same number of teams, so there are missing values in these data. Each of the 35 teams in the data set is ranked by at least one news service. One way of summarizing these data is with a principal component analysis, since the rankings should all be related to a single underlying variable, the first principal component.

You can use PROC PRINQUAL to estimate the missing ranks and compute scores for all observations. You can formulate a PROC PRINQUAL analysis that assumes that the observed ranks are ordinal variables and replaces the ranks with new numbers that are monotonic with the ranks and better fit the one principal component model. The missing rank estimates need to be constrained since a news service would have positioned the unranked teams below the teams it ranked. PROC PRINQUAL should impose order constraints within the nonmissing values and between the missing and nonmissing values, but not within the missing values. PROC PRINQUAL has sophisticated missing data handling facilities; however, these facilities cannot directly handle this problem. The solution requires reformulating the problem.

By performing some preliminary data manipulations, specifying the N=1 option in the PROC PRINQUAL statement, and specifying the UNTIE transformation in the TRANSFORM statement, you can make the missing value estimates conform to the requirements. The PROC MEANS step finds the largest rank for each variable. The next DATA step replaces missing values with a value that is one larger than the largest observed rank. The N=1 option (in the PRINQUAL procedure) specifies that the variables should be transformed to make them as one-dimensional as possible. The UNTIE transformation in the TRANSFORM statement monotonically transforms the ranks, untying any ties in an optimal way. Because the only ties are for the values that replace the missing values, and because these values are larger than the observed values, the rescoring of the data satisfies the preceding requirements.

The following statements create the data set and perform the transformations discussed previously. These statements produce Output 53.2.1.

```
* Example 2: Basketball Data
*
* Preseason 1985 College Basketball Rankings
* (rankings of 35 teams by 10 news services)
*
* Note: (a) Various news services rank varying numbers of teams.
*       (b) Not all 35 teams are ranked by all news services.
*       (c) Each team is ranked by at least one service.
*       (d) Rank 20 is missing for UPI.;

title1 '1985 Preseason College Basketball Rankings';

data bballm;
   input School $13. CSN DurhamSun DurhamHerald WashingtonPost
         USA_Today SportMagazine InsideSports UPI AP
         SportsIllustrated;
   label CSN               = 'Community Sports News (Chapel Hill, NC)'
         DurhamSun         = 'Durham Sun'
         DurhamHerald      = 'Durham Morning Herald'
         WashingtonPost    = 'Washington Post'
         USA_Today         = 'USA Today'
         SportMagazine     = 'Sport Magazine'
         InsideSports      = 'Inside Sports'
         UPI               = 'United Press International'
         AP                = 'Associated Press'
         SportsIllustrated = 'Sports Illustrated'
         ;
   format CSN--SportsIllustrated 5.1;
datalines;
Louisville     1  8  1  9  8  9  6 10  9  9
Georgia Tech   2  2  4  3  1  1  1  2  1  1
Kansas         3  4  5  1  5 11  8  4  5  7
Michigan       4  5  9  4  2  5  3  1  3  2
Duke           5  6  7  5  4 10  4  5  6  5
UNC            6  1  2  2  3  4  2  3  2  3
Syracuse       7 10  6 11  6  6  5  6  4 10
Notre Dame     8 14 15 13 11 20 18 13 12  .
Kentucky       9 15 16 14 14 19 11 12 11 13
LSU           10  9 13  . 13 15 16  9 14  8
DePaul        11  . 21 15 20  . 19  .  . 19
Georgetown    12  7  8  6  9  2  9  8  8  4
Navy          13 20 23 10 18 13 15  . 20  .
Illinois      14  3  3  7  7  3 10  7  7  6
Iowa          15 16  .  . 23  .  . 14  . 20
Arkansas      16  .  .  . 25  .  .  .  . 16
Memphis State 17  . 11  . 16  8 20  . 15 12
Washington    18  .  .  .  .  .  . 17  .  .
UAB           19 13 10  . 12 17  . 16 16 15
UNLV          20 18 18 19 22  . 14 18 18  .
NC State      21 17 14 16 15  . 12 15 17 18
Maryland      22  .  .  . 19  .  .  . 19 14
Pittsburgh    23  .  .  .  .  .  .  .  .  .
Oklahoma      24 19 17 17 17 12 17  . 13 17
Indiana       25 12 20 18 21  .  .  .  .  .
Virginia      26  . 22  .  . 18  .  .  .  .
Old Dominion  27  .  .  .  .  .  .  .  .  .
Auburn        28 11 12  8 10  7  7 11 10 11
St. Johns     29  .  .  .  . 14  .  .  .  .
UCLA          30  .  .  .  .  .  . 19  .  .
St. Joseph's   .  . 19  .  .  .  .  .  .  .
Tennessee      .  . 24  .  . 16  .  .  .  .
Montana        .  .  . 20  .  .  .  .  .  .
Houston        .  .  .  . 24  .  .  .  .  .
Virginia Tech  .  .  .  .  .  . 13  .  .  .
;

* Find maximum rank for each news service and replace
* each missing value with the next highest rank.;

proc means data=bballm noprint;
   output out=maxrank
          max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;
run;

data bball;
   set bballm;
   if _n_=1 then set maxrank;
   array services[10] CSN--SportsIllustrated;
   array maxranks[10] mcsn--mspoi;
   keep School CSN--SportsIllustrated;
   do i=1 to 10;
      if services[i]=. then services[i]=maxranks[i]+1;
   end;
run;

* Assume that the ranks are ordinal and that unranked teams
* would have been ranked lower than ranked teams.  Monotonically
* transform all ranked teams while estimating the unranked teams.
* Enforce the constraint that the missing ranks are estimated to
* be worse (numerically larger) than the observed ranks.  Order the unranked teams
* optimally within this constraint.  Do this so as to maximize
* the variance accounted for by one linear combination.  This
* makes the data as nearly rank one as possible, given the
* constraints.
*
* NOTE: The UNTIE transformation should be used with caution.
* It frequently produces degenerate results.;

proc prinqual data=bball out=tbball scores n=1 tstandard=z;
   title2 'Optimal Monotonic Transformation of Ranked Teams';
   title3 'with Constrained Estimation of Unranked Teams';
   transform untie(CSN -- SportsIllustrated);
   id School;
run;
```

Output 53.2.1: Transformation of Basketball Team Rankings

 1985 Preseason College Basketball Rankings Optimal Monotonic Transformation of Ranked Teams with Constrained Estimation of Unranked Teams

 The PRINQUAL Procedure

PRINQUAL MTV Algorithm Iteration History

| Iteration Number | Average Change | Maximum Change | Proportion of Variance | Criterion Change | Note |
|---|---|---|---|---|---|
| 1 | 0.18563 | 0.76531 | 0.85850 | | |
| 2 | 0.03225 | 0.14627 | 0.94362 | 0.08512 | |
| 3 | 0.02126 | 0.10530 | 0.94669 | 0.00307 | |
| 4 | 0.01467 | 0.07526 | 0.94801 | 0.00132 | |
| 5 | 0.01067 | 0.05282 | 0.94865 | 0.00064 | |
| 6 | 0.00800 | 0.03669 | 0.94899 | 0.00034 | |
| 7 | 0.00617 | 0.02862 | 0.94919 | 0.00020 | |
| 8 | 0.00486 | 0.02636 | 0.94932 | 0.00013 | |
| 9 | 0.00395 | 0.02453 | 0.94941 | 0.00009 | |
| 10 | 0.00327 | 0.02300 | 0.94947 | 0.00006 | |
| 11 | 0.00275 | 0.02166 | 0.94952 | 0.00005 | |
| 12 | 0.00236 | 0.02041 | 0.94956 | 0.00004 | |
| 13 | 0.00205 | 0.01927 | 0.94959 | 0.00003 | |
| 14 | 0.00181 | 0.01818 | 0.94962 | 0.00003 | |
| 15 | 0.00162 | 0.01719 | 0.94964 | 0.00002 | |
| 16 | 0.00147 | 0.01629 | 0.94966 | 0.00002 | |
| 17 | 0.00136 | 0.01546 | 0.94968 | 0.00002 | |
| 18 | 0.00128 | 0.01469 | 0.94970 | 0.00002 | |
| 19 | 0.00121 | 0.01398 | 0.94971 | 0.00001 | |
| 20 | 0.00115 | 0.01332 | 0.94973 | 0.00001 | |
| 21 | 0.00111 | 0.01271 | 0.94974 | 0.00001 | |
| 22 | 0.00105 | 0.01213 | 0.94975 | 0.00001 | |
| 23 | 0.00099 | 0.01155 | 0.94976 | 0.00001 | |
| 24 | 0.00095 | 0.01095 | 0.94977 | 0.00001 | |
| 25 | 0.00091 | 0.01038 | 0.94978 | 0.00001 | |
| 26 | 0.00088 | 0.00986 | 0.94978 | 0.00001 | |
| 27 | 0.00084 | 0.00936 | 0.94979 | 0.00001 | |
| 28 | 0.00081 | 0.00889 | 0.94980 | 0.00001 | |
| 29 | 0.00077 | 0.00846 | 0.94980 | 0.00000 | |
| 30 | 0.00073 | 0.00805 | 0.94980 | 0.00000 | Not Converged |

 WARNING: Failed to converge, however criterion change is less than 0.0001.

An alternative approach is to use the pairwise deletion option of the CORR procedure to compute the correlation matrix and then use PROC PRINCOMP or PROC FACTOR to perform the principal component analysis. This approach has several disadvantages. The correlation matrix may not be positive semidefinite (psd), an assumption required for principal component analysis. PROC PRINQUAL always produces a psd correlation matrix. Even with pairwise deletion, PROC CORR removes the six observations with only a single nonmissing value from this data set. Finally, it is still not possible to calculate scores on the principal components for those teams that have missing values.
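The pairwise-deletion alternative described above might look roughly like the following sketch. It assumes the raw data set `bballm` with the missing ranks left as missing; the output data set name `paircorr` is hypothetical. PROC CORR uses pairwise deletion unless NOMISS is specified, and the OUTP= data set it writes is a TYPE=CORR data set that PROC PRINCOMP can read directly.

```sas
* Sketch only: pairwise-deletion correlations of the raw ranks,
* followed by principal components of that correlation matrix.
* The data set name paircorr is hypothetical.;

proc corr data=bballm outp=paircorr noprint;
   var CSN -- SportsIllustrated;
run;

proc princomp data=paircorr;
   var CSN -- SportsIllustrated;
run;
```

Because PROC PRINCOMP reads only a correlation matrix here, it has no raw observations to score, which is the last of the disadvantages listed above.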

It is possible to compute the composite ranking using PROC PRINCOMP and some preliminary data manipulations, similar to those discussed previously. Chapter 52, "The PRINCOMP Procedure," contains an example where the average of the unused ranks in each poll is substituted for the missing values, and each observation is weighted by the number of nonmissing values. This method has much to recommend it. It is much faster and simpler than using PROC PRINQUAL. It is also much less prone to degeneracies and capitalization on chance. However, PROC PRINCOMP does not allow the nonmissing ranks to be monotonically transformed and the missing values untied to optimize fit.
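A minimal sketch of that substitution-and-weighting idea, adapted to the variable names used in this example, is shown here. It is not the code from the PROC PRINCOMP chapter; the data set names `maxrank2`, `bballp`, and `pcbball` and the variable `Weight` are hypothetical. It also assumes that each poll's unused ranks run from one more than its largest observed rank through 35.

```sas
* Sketch: replace each missing rank with the mean of the unused
* ranks in that poll and weight by the number of observed ranks.
* The names maxrank2, bballp, pcbball, and Weight are hypothetical.;

proc means data=bballm noprint;
   output out=maxrank2
          max=mcsn mdurs mdurh mwas musa mspom mins mupi map mspoi;
run;

data bballp;
   set bballm;
   if _n_=1 then set maxrank2;
   array services[10] CSN--SportsIllustrated;
   array maxranks[10] mcsn--mspoi;
   keep School CSN--SportsIllustrated Weight;
   Weight=0;
   do i=1 to 10;
      * Unused ranks run from max+1 through 35 so their mean is (max+36)/2;
      if services[i]=. then services[i]=(maxranks[i]+36)/2;
      else Weight=Weight+1;
   end;
run;

proc princomp data=bballp n=1 out=pcbball;
   var CSN--SportsIllustrated;
   weight Weight;
run;
```

Unlike the PROC PRINQUAL analysis, this approach leaves the observed ranks untransformed and gives every unranked team in a poll the same substituted value.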

PROC PRINQUAL monotonically transforms the observed ranks and estimates the missing ranks (within the constraints given previously) to account for almost 95 percent of the variance of the transformed data by just one dimension. PROC FACTOR is then used to report details of the principal component analysis of the transformed data. As shown by the Factor Pattern values in Output 53.2.2, nine of the ten news services have a correlation of 0.95 or larger with the scores on the first principal component after the data are optimally transformed. The scores are sorted and the composite ranking is displayed following the PROC FACTOR output. More confidence can be placed in the stability of the scores for the teams that are ranked by the majority of the news services than in scores for teams that are seldom ranked.

The monotonic transformations are plotted for each of the ten news services. These plots are the values of the raw ranks (with the missing ranks replaced by the maximum rank plus one) versus the rescored (transformed) ranks. The transformations are the step functions that maximize the fit of the data to the principal component model. Smoother transformations could be found by using MSPLINE transformations, but MSPLINE transformations would not correctly handle the missing data problem.

The following statements perform the final analysis and produce Output 53.2.2 through Output 53.2.3:

```
* Perform the Final Principal Component Analysis;
proc factor nfactors=1;
var TCSN -- TSportsIllustrated;
title4 'Principal Component Analysis';
run;

proc sort;
by Prin1;
run;

* Display Scores on the First Principal Component;
proc print;
title4 'Teams Ordered by Scores on First Principal Component';
var School Prin1;
run;

* Plot the Transformations;
goptions goutmode=replace nodisplay;
%let opts = haxis=axis2 vaxis=axis1 frame cframe=ligr;
/* Depending on your goptions, these plot options may work better:
   %let opts = haxis=axis2 vaxis=axis1 frame; */

proc gplot;
   title;
   axis1 minor=none label=(angle=90 rotate=0)
         order=(-3 to 2 by 1);
   axis2 minor=none order=(0 to 40 by 10);
   plot TCSN*CSN                             / &opts name='prqex1';
   plot TDurhamSun*DurhamSun                 / &opts name='prqex2';
   plot TDurhamHerald*DurhamHerald           / &opts name='prqex3';
   plot TWashingtonPost*WashingtonPost       / &opts name='prqex4';
   plot TUSA_Today*USA_Today                 / &opts name='prqex5';
   plot TSportMagazine*SportMagazine         / &opts name='prqex6';
   plot TInsideSports*InsideSports           / &opts name='prqex7';
   plot TUPI*UPI                             / &opts name='prqex8';
   plot TAP*AP                               / &opts name='prqex9';
   plot TSportsIllustrated*SportsIllustrated / &opts name='prqex10';
   symbol1 c=blue;
run; quit;

goptions display;
proc greplay nofs tc=sashelp.templt template=l2r2;
igout gseg;
treplay 1:prqex1 2:prqex2 3:prqex3 4:prqex4;
treplay 1:prqex5 2:prqex6 3:prqex7 4:prqex8;
treplay 1:prqex9 3:prqex10;
run; quit;
```

Output 53.2.2: Alternative Approach for Analyzing Basketball Rankings

 1985 Preseason College Basketball Rankings Optimal Monotonic Transformation of Ranked Teams with Constrained Estimation of Unranked Teams Principal Component Analysis

 The FACTOR Procedure Initial Factor Method: Principal Components

 Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 10 Average = 1

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 9.49808040 | 9.27698055 | 0.9498 | 0.9498 |
| 2 | 0.22109985 | 0.13434105 | 0.0221 | 0.9719 |
| 3 | 0.08675881 | 0.01266762 | 0.0087 | 0.9806 |
| 4 | 0.07409119 | 0.03048596 | 0.0074 | 0.9880 |
| 5 | 0.04360523 | 0.00567364 | 0.0044 | 0.9924 |
| 6 | 0.03793160 | 0.02098385 | 0.0038 | 0.9962 |
| 7 | 0.01694775 | 0.00299099 | 0.0017 | 0.9979 |
| 8 | 0.01395675 | 0.00982630 | 0.0014 | 0.9992 |
| 9 | 0.00413045 | 0.00073249 | 0.0004 | 0.9997 |
| 10 | 0.00339797 | | 0.0003 | 1.0000 |

 1 factor will be retained by the NFACTOR criterion.

Factor Pattern

| Variable | Label | Factor1 |
|---|---|---|
| TCSN | CSN Transformation | 0.91136 |
| TDurhamSun | DurhamSun Transformation | 0.98887 |
| TDurhamHerald | DurhamHerald Transformation | 0.97402 |
| TWashingtonPost | WashingtonPost Transformation | 0.97408 |
| TUSA_Today | USA_Today Transformation | 0.98867 |
| TSportMagazine | SportMagazine Transformation | 0.95331 |
| TInsideSports | InsideSports Transformation | 0.98521 |
| TUPI | UPI Transformation | 0.98534 |
| TAP | AP Transformation | 0.99590 |
| TSportsIllustrated | SportsIllustrated Transformation | 0.98615 |

Variance Explained by Each Factor

| Factor1 |
|---|
| 9.4980804 |

Final Communality Estimates: Total = 9.498080

| Variable | Communality |
|---|---|
| TCSN | 0.83057866 |
| TDurhamSun | 0.97785439 |
| TDurhamHerald | 0.94870875 |
| TWashingtonPost | 0.94882907 |
| TUSA_Today | 0.97747798 |
| TSportMagazine | 0.90879058 |
| TInsideSports | 0.97064640 |
| TUPI | 0.97088804 |
| TAP | 0.99181626 |
| TSportsIllustrated | 0.97249026 |

 1985 Preseason College Basketball Rankings Optimal Monotonic Transformation of Ranked Teams with Constrained Estimation of Unranked Teams Teams Ordered by Scores on First Principal Component

| Obs | School | Prin1 |
|---|---|---|
| 1 | Georgia Tech | -6.20315 |
| 2 | UNC | -5.93314 |
| 3 | Michigan | -5.71034 |
| 4 | Kansas | -4.78699 |
| 5 | Duke | -4.75896 |
| 6 | Illinois | -4.19220 |
| 7 | Georgetown | -4.02861 |
| 8 | Louisville | -3.73087 |
| 9 | Syracuse | -3.47497 |
| 10 | Auburn | -1.78429 |
| 11 | LSU | -0.35928 |
| 12 | Memphis State | 0.46737 |
| 13 | Kentucky | 0.63661 |
| 14 | Notre Dame | 0.71919 |
| 15 | Navy | 0.76187 |
| 16 | UAB | 0.98316 |
| 17 | DePaul | 1.09891 |
| 18 | Oklahoma | 1.12012 |
| 19 | NC State | 1.15144 |
| 20 | UNLV | 1.28766 |
| 21 | Iowa | 1.45260 |
| 22 | Indiana | 1.48123 |
| 23 | Maryland | 1.54935 |
| 24 | Virginia | 2.01385 |
| 25 | Arkansas | 2.02718 |
| 26 | Washington | 2.10878 |
| 27 | Tennessee | 2.27770 |
| 28 | Virginia Tech | 2.36103 |
| 29 | St. Johns | 2.37387 |
| 30 | Montana | 2.43502 |
| 31 | UCLA | 2.52481 |
| 32 | Pittsburgh | 3.00907 |
| 33 | Old Dominion | 3.03324 |
| 34 | St. Joseph's | 3.39259 |
| 35 | Houston | 4.69614 |

Output 53.2.3: Monotonic Transformation for Each News Service

The ordinary PROC PRINQUAL missing data handling facilities do not work for these data because they do not constrain the missing data estimates properly. If you code the missing ranks as missing and specify linear transformations, then you can compute least-squares estimates of the missing values without transforming the observed values. The first principal component then accounts for 92 percent of the variance after 20 iterations. However, Virginia Tech is ranked number 11 by its score even though it appeared in only one poll (InsideSports ranked it number 13, anchoring it firmly in the middle). Specifying monotone transformations is also inappropriate since they too allow unranked teams to move in between ranked teams.
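The unconstrained linear analysis described above could be requested along these lines. This is a sketch rather than the exact run behind the quoted figures: it reads the raw data set `bballm` so that the missing ranks stay missing, the output name `linball` is hypothetical, and MAXITER=20 is an assumption chosen to match the 20 iterations mentioned in the text.

```sas
* Sketch: linear transformations with missing ranks left missing.
* PROC PRINQUAL estimates the missing values by least squares
* without any order constraint.  The data set name linball is
* hypothetical and maxiter=20 is an assumed setting.;

proc prinqual data=bballm out=linball scores n=1 maxiter=20;
   transform linear(CSN -- SportsIllustrated);
   id School;
run;
```

Sorting the OUT= data set by `Prin1` and printing it, as in the final analysis, is one way to see where a team such as Virginia Tech lands under this formulation.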

With these data, the combination of monotone transformations and the freedom to score the missing ranks without constraint leads to degenerate transformations. PROC PRINQUAL tries to merge the 35 points into two points, producing a perfect fit in one dimension. There is evidence for this after 20 iterations, when the Average Change, Maximum Change, and Variance Change values are all increasing, instead of the more stable decreasing change rate seen in the analysis shown. The change rates all stop increasing after 41 iterations, and it is clear by 70 or 80 iterations that one component will account for 100 percent of the transformed variables' variance after sufficient iteration. While this may seem desirable (after all, it is a perfect fit), you should, in fact, be on guard when this happens. Whenever convergence is slow, the rates of change increase, or the final data perfectly fit the model, the solution is probably degenerating due to too few constraints on the scorings.
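To watch this degeneracy develop, a run such as the following sketch could be used. It again reads the raw data set `bballm`; the output name `monball` is hypothetical, and MAXITER=80 is an assumption chosen to cover the 70 to 80 iterations mentioned above.

```sas
* Sketch: monotone transformations with unconstrained missing-value
* estimates.  Expect the change columns in the iteration history to
* grow rather than shrink.  The data set name monball is
* hypothetical and maxiter=80 is an assumed setting.;

proc prinqual data=bballm out=monball scores n=1 maxiter=80;
   transform monotone(CSN -- SportsIllustrated);
   id School;
run;
```

The iteration history table is the place to look: increasing change values and a proportion of variance approaching 1 are the warning signs described in the text.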

PROC PRINQUAL can account for 100 percent of the variance by scoring Montana and UCLA with one positive value on all variables and scoring all the other teams with one negative value on all variables. This inappropriate analysis suggests that all ranked teams are equally good except for two teams that are less good. In the data shown, Montana is ranked by only one news service and UCLA by only two, and each of their nonmissing ranks is last in its poll. This accounts for the degeneracy.
