 The PRINCOMP Procedure

## Example 52.1: Crime Rates

The following data provide crime rates per 100,000 people in seven categories for each of the fifty states in 1977. Since there are seven numeric variables, it is impossible to plot all the variables simultaneously. Principal components can be used to summarize the data in two or three dimensions, and they help to visualize the data. The following statements produce Output 52.1.1:

```
data Crime;
   title 'Crime Rates per 100,000 Population by State';
   /* State occupies columns 1-15; the seven rates follow as list input */
   input State $ 1-15 Murder Rape Robbery Assault
         Burglary Larceny Auto_Theft;
   datalines;
Alabama        14.2 25.2  96.8 278.3 1135.5 1881.9 280.7
Alaska         10.8 51.6  96.8 284.0 1331.7 3369.8 753.3
Arizona         9.5 34.2 138.2 312.3 2346.1 4467.4 439.5
Arkansas        8.8 27.6  83.2 203.4  972.6 1862.1 183.4
California     11.5 49.4 287.0 358.0 2139.4 3499.8 663.5
Colorado        6.3 42.0 170.7 292.9 1935.2 3903.2 477.1
Connecticut     4.2 16.8 129.5 131.8 1346.0 2620.7 593.2
Delaware        6.0 24.9 157.0 194.2 1682.6 3678.4 467.0
Florida        10.2 39.6 187.9 449.1 1859.9 3840.5 351.4
Georgia        11.7 31.1 140.5 256.5 1351.1 2170.2 297.9
Hawaii          7.2 25.5 128.0  64.1 1911.5 3920.4 489.4
Idaho           5.5 19.4  39.6 172.5 1050.8 2599.6 237.6
Illinois        9.9 21.8 211.3 209.0 1085.0 2828.5 528.6
Indiana         7.4 26.5 123.2 153.5 1086.2 2498.7 377.4
Iowa            2.3 10.6  41.2  89.8  812.5 2685.1 219.9
Kansas          6.6 22.0 100.7 180.5 1270.4 2739.3 244.3
Kentucky       10.1 19.1  81.1 123.3  872.2 1662.1 245.4
Louisiana      15.5 30.9 142.9 335.5 1165.5 2469.9 337.7
Maine           2.4 13.5  38.7 170.0 1253.1 2350.7 246.9
Maryland        8.0 34.8 292.1 358.9 1400.0 3177.7 428.5
Massachusetts   3.1 20.8 169.1 231.6 1532.2 2311.3 1140.1
Michigan        9.3 38.9 261.9 274.6 1522.7 3159.0 545.5
Minnesota       2.7 19.5  85.9  85.8 1134.7 2559.3 343.1
Mississippi    14.3 19.6  65.7 189.1  915.6 1239.9 144.4
Missouri        9.6 28.3 189.0 233.5 1318.3 2424.2 378.4
Montana         5.4 16.7  39.2 156.8  804.9 2773.2 309.2
Nebraska        3.9 18.1  64.7 112.7  760.0 2316.1 249.1
Nevada         15.8 49.1 323.1 355.0 2453.1 4212.6 559.2
New Hampshire   3.2 10.7  23.2  76.0 1041.7 2343.9 293.4
New Jersey      5.6 21.0 180.4 185.1 1435.8 2774.5 511.5
New Mexico      8.8 39.1 109.6 343.4 1418.7 3008.6 259.5
New York       10.7 29.4 472.6 319.1 1728.0 2782.0 745.8
North Carolina 10.6 17.0  61.3 318.3 1154.1 2037.8 192.1
North Dakota    0.9  9.0  13.3  43.8  446.1 1843.0 144.7
Ohio            7.8 27.3 190.5 181.1 1216.0 2696.8 400.4
Oklahoma        8.6 29.2  73.8 205.0 1288.2 2228.1 326.8
Oregon          4.9 39.9 124.1 286.9 1636.4 3506.1 388.9
Pennsylvania    5.6 19.0 130.3 128.0  877.5 1624.1 333.2
Rhode Island    3.6 10.5  86.5 201.0 1489.5 2844.1 791.4
South Carolina 11.9 33.0 105.9 485.3 1613.6 2342.4 245.1
South Dakota    2.0 13.5  17.9 155.7  570.5 1704.4 147.5
Tennessee      10.1 29.7 145.8 203.9 1259.7 1776.5 314.0
Texas          13.3 33.8 152.4 208.2 1603.1 2988.7 397.6
Utah            3.5 20.3  68.8 147.3 1171.6 3004.6 334.5
Vermont         1.4 15.9  30.8 101.2 1348.2 2201.0 265.2
Virginia        9.0 23.3  92.1 165.7  986.2 2521.2 226.7
Washington      4.3 39.6 106.2 224.8 1605.6 3386.9 360.3
West Virginia   6.0 13.2  42.2  90.9  597.4 1341.7 163.3
Wisconsin       2.8 12.9  52.2  63.7  846.9 2614.2 220.7
Wyoming         5.4 21.9  39.7 173.9  811.6 2772.2 282.0
;

/* Analyze the correlation matrix (the default) and save the component scores */
proc princomp data=Crime out=Crime_Components;
run;
```

Output 52.1.1: Results of Principal Component Analysis: PROC PRINCOMP

 Crime Rates per 100,000 Population by State

 The PRINCOMP Procedure

 Observations 50 Variables 7

Simple Statistics

| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|
| Mean | 7.444000000 | 25.73400000 | 124.0920000 | 211.3000000 | 1291.904000 | 2671.288000 | 377.5260000 |
| StD | 3.866768941 | 10.75962995 | 88.3485672 | 100.2530492 | 432.455711 | 725.908707 | 193.3944175 |

Correlation Matrix

| | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|
| Murder | 1.0000 | 0.6012 | 0.4837 | 0.6486 | 0.3858 | 0.1019 | 0.0688 |
| Rape | 0.6012 | 1.0000 | 0.5919 | 0.7403 | 0.7121 | 0.6140 | 0.3489 |
| Robbery | 0.4837 | 0.5919 | 1.0000 | 0.5571 | 0.6372 | 0.4467 | 0.5907 |
| Assault | 0.6486 | 0.7403 | 0.5571 | 1.0000 | 0.6229 | 0.4044 | 0.2758 |
| Burglary | 0.3858 | 0.7121 | 0.6372 | 0.6229 | 1.0000 | 0.7921 | 0.5580 |
| Larceny | 0.1019 | 0.6140 | 0.4467 | 0.4044 | 0.7921 | 1.0000 | 0.4442 |
| Auto_Theft | 0.0688 | 0.3489 | 0.5907 | 0.2758 | 0.5580 | 0.4442 | 1.0000 |

Eigenvalues of the Correlation Matrix

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 4.11495951 | 2.87623768 | 0.5879 | 0.5879 |
| 2 | 1.23872183 | 0.51290521 | 0.1770 | 0.7648 |
| 3 | 0.72581663 | 0.40938458 | 0.1037 | 0.8685 |
| 4 | 0.31643205 | 0.05845759 | 0.0452 | 0.9137 |
| 5 | 0.25797446 | 0.03593499 | 0.0369 | 0.9506 |
| 6 | 0.22203947 | 0.09798342 | 0.0317 | 0.9823 |
| 7 | 0.12405606 | | 0.0177 | 1.0000 |

Eigenvectors

| | Prin1 | Prin2 | Prin3 | Prin4 | Prin5 | Prin6 | Prin7 |
|---|---|---|---|---|---|---|---|
| Murder | 0.300279 | -.629174 | 0.178245 | -.232114 | 0.538123 | 0.259117 | 0.267593 |
| Rape | 0.431759 | -.169435 | -.244198 | 0.062216 | 0.188471 | -.773271 | -.296485 |
| Robbery | 0.396875 | 0.042247 | 0.495861 | -.557989 | -.519977 | -.114385 | -.003903 |
| Assault | 0.396652 | -.343528 | -.069510 | 0.629804 | -.506651 | 0.172363 | 0.191745 |
| Burglary | 0.440157 | 0.203341 | -.209895 | -.057555 | 0.101033 | 0.535987 | -.648117 |
| Larceny | 0.357360 | 0.402319 | -.539231 | -.234890 | 0.030099 | 0.039406 | 0.601690 |
| Auto_Theft | 0.295177 | 0.502421 | 0.568384 | 0.419238 | 0.369753 | -.057298 | 0.147046 |

The eigenvalues indicate that two or three components provide a good summary of the data, two components accounting for 76 percent of the total variance and three components explaining 87 percent. Subsequent components contribute less than 5 percent each.
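The percentages quoted above can be reproduced from the eigenvalues alone. The following is a minimal Python sketch (not part of the SAS example) that recomputes the Proportion and Cumulative columns; the variable names are illustrative, and the only inputs are the eigenvalues printed in Output 52.1.1.

```python
# Eigenvalues of the correlation matrix, copied from Output 52.1.1.
eigenvalues = [4.11495951, 1.23872183, 0.72581663, 0.31643205,
               0.25797446, 0.22203947, 0.12405606]

# For a correlation matrix, the eigenvalues sum to the number of variables (7),
# so each proportion is simply eigenvalue / 7.
total_variance = sum(eigenvalues)
proportions = [ev / total_variance for ev in eigenvalues]

# Running total of the proportions gives the Cumulative column.
cumulative = []
running = 0.0
for p in proportions:
    running += p
    cumulative.append(running)

for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Prin{k}: proportion={p:.4f}  cumulative={c:.4f}")
```

The second and third entries of `cumulative` correspond to the 76 percent and 87 percent figures in the text.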

The first component is a measure of overall crime rate since the first eigenvector shows approximately equal loadings on all variables. The second eigenvector has high positive loadings on the variables Auto_Theft and Larceny and high negative loadings on the variables Murder and Assault. There is also a small positive loading on Burglary and a small negative loading on Rape. This component seems to measure the preponderance of property crime over violent crime. The interpretation of the third component is not obvious.
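To make the role of the loadings concrete, the sketch below recomputes the first two component scores for North Dakota in Python, outside SAS. It assumes that each score is the eigenvector-weighted sum of the standardized variables (the rates minus the mean, divided by the standard deviation), which is consistent with the analysis being based on the correlation matrix. All numbers are copied from Output 52.1.1 and the Crime data set; the function and variable names are illustrative.

```python
# Order of variables: Murder, Rape, Robbery, Assault, Burglary, Larceny, Auto_Theft.
means = [7.444, 25.734, 124.092, 211.3, 1291.904, 2671.288, 377.526]
stds = [3.866768941, 10.75962995, 88.3485672, 100.2530492,
        432.455711, 725.908707, 193.3944175]

# First two eigenvectors from Output 52.1.1.
prin1 = [0.300279, 0.431759, 0.396875, 0.396652, 0.440157, 0.357360, 0.295177]
prin2 = [-0.629174, -0.169435, 0.042247, -0.343528, 0.203341, 0.402319, 0.502421]


def score(rates, eigenvector):
    """Return the eigenvector-weighted sum of the standardized crime rates."""
    z = [(x - m) / s for x, m, s in zip(rates, means, stds)]
    return sum(w * zi for w, zi in zip(eigenvector, z))


# Raw rates for North Dakota from the DATALINES block.
north_dakota = [0.9, 9.0, 13.3, 43.8, 446.1, 1843.0, 144.7]

nd_prin1 = score(north_dakota, prin1)
nd_prin2 = score(north_dakota, prin2)
print(f"North Dakota: Prin1={nd_prin1:.5f}  Prin2={nd_prin2:.5f}")
```

Because every Prin1 loading is positive and of similar size, a state with below-average rates in all seven categories, such as North Dakota, receives a strongly negative first score. The results can be compared with the Prin1 and Prin2 values saved in the OUT= data set.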

A simple way to examine the principal components in more detail is to display the output data set sorted by each of the large components. The following statements produce Output 52.1.2 and Output 52.1.3:

```
/* Sort the OUT= data set by the first component (overall crime rate) */
proc sort data=Crime_Components;
   by Prin1;
run;

proc print data=Crime_Components;
   id State;
   var Prin1 Prin2 Murder Rape Robbery
       Assault Burglary Larceny Auto_Theft;
   title2 'States Listed in Order of Overall Crime Rate';
   title3 'As Determined by the First Principal Component';
run;

/* Re-sort by the second component (property versus violent crime) */
proc sort data=Crime_Components;
   by Prin2;
run;

proc print data=Crime_Components;
   id State;
   var Prin1 Prin2 Murder Rape Robbery
       Assault Burglary Larceny Auto_Theft;
   title2 'States Listed in Order of Property Vs. Violent Crime';
   title3 'As Determined by the Second Principal Component';
run;
```

Output 52.1.2: OUT= Data Set Sorted by First Principal Component

 Crime Rates per 100,000 Population by State

 States Listed in Order of Overall Crime Rate

 As Determined by the First Principal Component

| State | Prin1 | Prin2 | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|---|---|
| North Dakota | -3.96408 | 0.38767 | 0.9 | 9.0 | 13.3 | 43.8 | 446.1 | 1843.0 | 144.7 |
| South Dakota | -3.17203 | -0.25446 | 2.0 | 13.5 | 17.9 | 155.7 | 570.5 | 1704.4 | 147.5 |
| West Virginia | -3.14772 | -0.81425 | 6.0 | 13.2 | 42.2 | 90.9 | 597.4 | 1341.7 | 163.3 |
| Iowa | -2.58156 | 0.82475 | 2.3 | 10.6 | 41.2 | 89.8 | 812.5 | 2685.1 | 219.9 |
| Wisconsin | -2.50296 | 0.78083 | 2.8 | 12.9 | 52.2 | 63.7 | 846.9 | 2614.2 | 220.7 |
| New Hampshire | -2.46562 | 0.82503 | 3.2 | 10.7 | 23.2 | 76.0 | 1041.7 | 2343.9 | 293.4 |
| Nebraska | -2.15071 | 0.22574 | 3.9 | 18.1 | 64.7 | 112.7 | 760.0 | 2316.1 | 249.1 |
| Vermont | -2.06433 | 0.94497 | 1.4 | 15.9 | 30.8 | 101.2 | 1348.2 | 2201.0 | 265.2 |
| Maine | -1.82631 | 0.57878 | 2.4 | 13.5 | 38.7 | 170.0 | 1253.1 | 2350.7 | 246.9 |
| Kentucky | -1.72691 | -1.14663 | 10.1 | 19.1 | 81.1 | 123.3 | 872.2 | 1662.1 | 245.4 |
| Pennsylvania | -1.72007 | -0.19590 | 5.6 | 19.0 | 130.3 | 128.0 | 877.5 | 1624.1 | 333.2 |
| Montana | -1.66801 | 0.27099 | 5.4 | 16.7 | 39.2 | 156.8 | 804.9 | 2773.2 | 309.2 |
| Minnesota | -1.55434 | 1.05644 | 2.7 | 19.5 | 85.9 | 85.8 | 1134.7 | 2559.3 | 343.1 |
| Mississippi | -1.50736 | -2.54671 | 14.3 | 19.6 | 65.7 | 189.1 | 915.6 | 1239.9 | 144.4 |
| Idaho | -1.43245 | -0.00801 | 5.5 | 19.4 | 39.6 | 172.5 | 1050.8 | 2599.6 | 237.6 |
| Wyoming | -1.42463 | 0.06268 | 5.4 | 21.9 | 39.7 | 173.9 | 811.6 | 2772.2 | 282.0 |
| Arkansas | -1.05441 | -1.34544 | 8.8 | 27.6 | 83.2 | 203.4 | 972.6 | 1862.1 | 183.4 |
| Utah | -1.04996 | 0.93656 | 3.5 | 20.3 | 68.8 | 147.3 | 1171.6 | 3004.6 | 334.5 |
| Virginia | -0.91621 | -0.69265 | 9.0 | 23.3 | 92.1 | 165.7 | 986.2 | 2521.2 | 226.7 |
| North Carolina | -0.69925 | -1.67027 | 10.6 | 17.0 | 61.3 | 318.3 | 1154.1 | 2037.8 | 192.1 |
| Kansas | -0.63407 | -0.02804 | 6.6 | 22.0 | 100.7 | 180.5 | 1270.4 | 2739.3 | 244.3 |
| Connecticut | -0.54133 | 1.50123 | 4.2 | 16.8 | 129.5 | 131.8 | 1346.0 | 2620.7 | 593.2 |
| Indiana | -0.49990 | 0.00003 | 7.4 | 26.5 | 123.2 | 153.5 | 1086.2 | 2498.7 | 377.4 |
| Oklahoma | -0.32136 | -0.62429 | 8.6 | 29.2 | 73.8 | 205.0 | 1288.2 | 2228.1 | 326.8 |
| Rhode Island | -0.20156 | 2.14658 | 3.6 | 10.5 | 86.5 | 201.0 | 1489.5 | 2844.1 | 791.4 |
| Tennessee | -0.13660 | -1.13498 | 10.1 | 29.7 | 145.8 | 203.9 | 1259.7 | 1776.5 | 314.0 |
| Alabama | -0.04988 | -2.09610 | 14.2 | 25.2 | 96.8 | 278.3 | 1135.5 | 1881.9 | 280.7 |
| New Jersey | 0.21787 | 0.96421 | 5.6 | 21.0 | 180.4 | 185.1 | 1435.8 | 2774.5 | 511.5 |
| Ohio | 0.23953 | 0.09053 | 7.8 | 27.3 | 190.5 | 181.1 | 1216.0 | 2696.8 | 400.4 |
| Georgia | 0.49041 | -1.38079 | 11.7 | 31.1 | 140.5 | 256.5 | 1351.1 | 2170.2 | 297.9 |
| Illinois | 0.51290 | 0.09423 | 9.9 | 21.8 | 211.3 | 209.0 | 1085.0 | 2828.5 | 528.6 |
| Missouri | 0.55637 | -0.55851 | 9.6 | 28.3 | 189.0 | 233.5 | 1318.3 | 2424.2 | 378.4 |
| Hawaii | 0.82313 | 1.82392 | 7.2 | 25.5 | 128.0 | 64.1 | 1911.5 | 3920.4 | 489.4 |
| Washington | 0.93058 | 0.73776 | 4.3 | 39.6 | 106.2 | 224.8 | 1605.6 | 3386.9 | 360.3 |
| Delaware | 0.96458 | 1.29674 | 6.0 | 24.9 | 157.0 | 194.2 | 1682.6 | 3678.4 | 467.0 |
| Massachusetts | 0.97844 | 2.63105 | 3.1 | 20.8 | 169.1 | 231.6 | 1532.2 | 2311.3 | 1140.1 |
| Louisiana | 1.12020 | -2.08327 | 15.5 | 30.9 | 142.9 | 335.5 | 1165.5 | 2469.9 | 337.7 |
| New Mexico | 1.21417 | -0.95076 | 8.8 | 39.1 | 109.6 | 343.4 | 1418.7 | 3008.6 | 259.5 |
| Texas | 1.39696 | -0.68131 | 13.3 | 33.8 | 152.4 | 208.2 | 1603.1 | 2988.7 | 397.6 |
| Oregon | 1.44900 | 0.58603 | 4.9 | 39.9 | 124.1 | 286.9 | 1636.4 | 3506.1 | 388.9 |
| South Carolina | 1.60336 | -2.16211 | 11.9 | 33.0 | 105.9 | 485.3 | 1613.6 | 2342.4 | 245.1 |
| Maryland | 2.18280 | -0.19474 | 8.0 | 34.8 | 292.1 | 358.9 | 1400.0 | 3177.7 | 428.5 |
| Michigan | 2.27333 | 0.15487 | 9.3 | 38.9 | 261.9 | 274.6 | 1522.7 | 3159.0 | 545.5 |
| Alaska | 2.42151 | 0.16652 | 10.8 | 51.6 | 96.8 | 284.0 | 1331.7 | 3369.8 | 753.3 |
| Colorado | 2.50929 | 0.91660 | 6.3 | 42.0 | 170.7 | 292.9 | 1935.2 | 3903.2 | 477.1 |
| Arizona | 3.01414 | 0.84495 | 9.5 | 34.2 | 138.2 | 312.3 | 2346.1 | 4467.4 | 439.5 |
| Florida | 3.11175 | -0.60392 | 10.2 | 39.6 | 187.9 | 449.1 | 1859.9 | 3840.5 | 351.4 |
| New York | 3.45248 | 0.43289 | 10.7 | 29.4 | 472.6 | 319.1 | 1728.0 | 2782.0 | 745.8 |
| California | 4.28380 | 0.14319 | 11.5 | 49.4 | 287.0 | 358.0 | 2139.4 | 3499.8 | 663.5 |
| Nevada | 5.26699 | -0.25262 | 15.8 | 49.1 | 323.1 | 355.0 | 2453.1 | 4212.6 | 559.2 |

Output 52.1.3: OUT= Data Set Sorted by Second Principal Component

 Crime Rates per 100,000 Population by State

 States Listed in Order of Property Vs. Violent Crime

 As Determined by the Second Principal Component

| State | Prin1 | Prin2 | Murder | Rape | Robbery | Assault | Burglary | Larceny | Auto_Theft |
|---|---|---|---|---|---|---|---|---|---|
| Mississippi | -1.50736 | -2.54671 | 14.3 | 19.6 | 65.7 | 189.1 | 915.6 | 1239.9 | 144.4 |
| South Carolina | 1.60336 | -2.16211 | 11.9 | 33.0 | 105.9 | 485.3 | 1613.6 | 2342.4 | 245.1 |
| Alabama | -0.04988 | -2.09610 | 14.2 | 25.2 | 96.8 | 278.3 | 1135.5 | 1881.9 | 280.7 |
| Louisiana | 1.12020 | -2.08327 | 15.5 | 30.9 | 142.9 | 335.5 | 1165.5 | 2469.9 | 337.7 |
| North Carolina | -0.69925 | -1.67027 | 10.6 | 17.0 | 61.3 | 318.3 | 1154.1 | 2037.8 | 192.1 |
| Georgia | 0.49041 | -1.38079 | 11.7 | 31.1 | 140.5 | 256.5 | 1351.1 | 2170.2 | 297.9 |
| Arkansas | -1.05441 | -1.34544 | 8.8 | 27.6 | 83.2 | 203.4 | 972.6 | 1862.1 | 183.4 |
| Kentucky | -1.72691 | -1.14663 | 10.1 | 19.1 | 81.1 | 123.3 | 872.2 | 1662.1 | 245.4 |
| Tennessee | -0.13660 | -1.13498 | 10.1 | 29.7 | 145.8 | 203.9 | 1259.7 | 1776.5 | 314.0 |
| New Mexico | 1.21417 | -0.95076 | 8.8 | 39.1 | 109.6 | 343.4 | 1418.7 | 3008.6 | 259.5 |
| West Virginia | -3.14772 | -0.81425 | 6.0 | 13.2 | 42.2 | 90.9 | 597.4 | 1341.7 | 163.3 |
| Virginia | -0.91621 | -0.69265 | 9.0 | 23.3 | 92.1 | 165.7 | 986.2 | 2521.2 | 226.7 |
| Texas | 1.39696 | -0.68131 | 13.3 | 33.8 | 152.4 | 208.2 | 1603.1 | 2988.7 | 397.6 |
| Oklahoma | -0.32136 | -0.62429 | 8.6 | 29.2 | 73.8 | 205.0 | 1288.2 | 2228.1 | 326.8 |
| Florida | 3.11175 | -0.60392 | 10.2 | 39.6 | 187.9 | 449.1 | 1859.9 | 3840.5 | 351.4 |
| Missouri | 0.55637 | -0.55851 | 9.6 | 28.3 | 189.0 | 233.5 | 1318.3 | 2424.2 | 378.4 |
| South Dakota | -3.17203 | -0.25446 | 2.0 | 13.5 | 17.9 | 155.7 | 570.5 | 1704.4 | 147.5 |
| Nevada | 5.26699 | -0.25262 | 15.8 | 49.1 | 323.1 | 355.0 | 2453.1 | 4212.6 | 559.2 |
| Pennsylvania | -1.72007 | -0.19590 | 5.6 | 19.0 | 130.3 | 128.0 | 877.5 | 1624.1 | 333.2 |
| Maryland | 2.18280 | -0.19474 | 8.0 | 34.8 | 292.1 | 358.9 | 1400.0 | 3177.7 | 428.5 |
| Kansas | -0.63407 | -0.02804 | 6.6 | 22.0 | 100.7 | 180.5 | 1270.4 | 2739.3 | 244.3 |
| Idaho | -1.43245 | -0.00801 | 5.5 | 19.4 | 39.6 | 172.5 | 1050.8 | 2599.6 | 237.6 |
| Indiana | -0.49990 | 0.00003 | 7.4 | 26.5 | 123.2 | 153.5 | 1086.2 | 2498.7 | 377.4 |
| Wyoming | -1.42463 | 0.06268 | 5.4 | 21.9 | 39.7 | 173.9 | 811.6 | 2772.2 | 282.0 |
| Ohio | 0.23953 | 0.09053 | 7.8 | 27.3 | 190.5 | 181.1 | 1216.0 | 2696.8 | 400.4 |
| Illinois | 0.51290 | 0.09423 | 9.9 | 21.8 | 211.3 | 209.0 | 1085.0 | 2828.5 | 528.6 |
| California | 4.28380 | 0.14319 | 11.5 | 49.4 | 287.0 | 358.0 | 2139.4 | 3499.8 | 663.5 |
| Michigan | 2.27333 | 0.15487 | 9.3 | 38.9 | 261.9 | 274.6 | 1522.7 | 3159.0 | 545.5 |
| Alaska | 2.42151 | 0.16652 | 10.8 | 51.6 | 96.8 | 284.0 | 1331.7 | 3369.8 | 753.3 |
| Nebraska | -2.15071 | 0.22574 | 3.9 | 18.1 | 64.7 | 112.7 | 760.0 | 2316.1 | 249.1 |
| Montana | -1.66801 | 0.27099 | 5.4 | 16.7 | 39.2 | 156.8 | 804.9 | 2773.2 | 309.2 |
| North Dakota | -3.96408 | 0.38767 | 0.9 | 9.0 | 13.3 | 43.8 | 446.1 | 1843.0 | 144.7 |
| New York | 3.45248 | 0.43289 | 10.7 | 29.4 | 472.6 | 319.1 | 1728.0 | 2782.0 | 745.8 |
| Maine | -1.82631 | 0.57878 | 2.4 | 13.5 | 38.7 | 170.0 | 1253.1 | 2350.7 | 246.9 |
| Oregon | 1.44900 | 0.58603 | 4.9 | 39.9 | 124.1 | 286.9 | 1636.4 | 3506.1 | 388.9 |
| Washington | 0.93058 | 0.73776 | 4.3 | 39.6 | 106.2 | 224.8 | 1605.6 | 3386.9 | 360.3 |
| Wisconsin | -2.50296 | 0.78083 | 2.8 | 12.9 | 52.2 | 63.7 | 846.9 | 2614.2 | 220.7 |
| Iowa | -2.58156 | 0.82475 | 2.3 | 10.6 | 41.2 | 89.8 | 812.5 | 2685.1 | 219.9 |
| New Hampshire | -2.46562 | 0.82503 | 3.2 | 10.7 | 23.2 | 76.0 | 1041.7 | 2343.9 | 293.4 |
| Arizona | 3.01414 | 0.84495 | 9.5 | 34.2 | 138.2 | 312.3 | 2346.1 | 4467.4 | 439.5 |
| Colorado | 2.50929 | 0.91660 | 6.3 | 42.0 | 170.7 | 292.9 | 1935.2 | 3903.2 | 477.1 |
| Utah | -1.04996 | 0.93656 | 3.5 | 20.3 | 68.8 | 147.3 | 1171.6 | 3004.6 | 334.5 |
| Vermont | -2.06433 | 0.94497 | 1.4 | 15.9 | 30.8 | 101.2 | 1348.2 | 2201.0 | 265.2 |
| New Jersey | 0.21787 | 0.96421 | 5.6 | 21.0 | 180.4 | 185.1 | 1435.8 | 2774.5 | 511.5 |
| Minnesota | -1.55434 | 1.05644 | 2.7 | 19.5 | 85.9 | 85.8 | 1134.7 | 2559.3 | 343.1 |
| Delaware | 0.96458 | 1.29674 | 6.0 | 24.9 | 157.0 | 194.2 | 1682.6 | 3678.4 | 467.0 |
| Connecticut | -0.54133 | 1.50123 | 4.2 | 16.8 | 129.5 | 131.8 | 1346.0 | 2620.7 | 593.2 |
| Hawaii | 0.82313 | 1.82392 | 7.2 | 25.5 | 128.0 | 64.1 | 1911.5 | 3920.4 | 489.4 |
| Rhode Island | -0.20156 | 2.14658 | 3.6 | 10.5 | 86.5 | 201.0 | 1489.5 | 2844.1 | 791.4 |
| Massachusetts | 0.97844 | 2.63105 | 3.1 | 20.8 | 169.1 | 231.6 | 1532.2 | 2311.3 | 1140.1 |

Another recommended procedure is to make scatter plots of the first few components. The sorted listings help to identify observations on the plots. The following statements produce Output 52.1.4 and Output 52.1.5:

```
/* Prin2 (vertical) against Prin1 (horizontal), points labeled by State */
title2 'Plot of the First Two Principal Components';
%plotit(data=Crime_Components, labelvar=State,
        plotvars=Prin2 Prin1, color=black, colors=blue);
run;

/* Prin3 (vertical) against Prin1 (horizontal) */
title2 'Plot of the First and Third Principal Components';
%plotit(data=Crime_Components, labelvar=State,
        plotvars=Prin3 Prin1, color=black, colors=blue);
run;
```

Output 52.1.4: Plot of the First Two Principal Components

Output 52.1.5: Plot of the First and Third Principal Components

It is possible to identify regional trends on the plot of the first two components. Nevada and California are at the extreme right, with high overall crime rates but an average ratio of property crime to violent crime. North and South Dakota are at the extreme left, with low overall crime rates. Southeastern states tend to be at the bottom of the plot, with a higher-than-average ratio of violent crime to property crime. New England states tend to be in the upper part of the plot, with a greater-than-average ratio of property crime to violent crime.
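The regional pattern can also be checked numerically from the Prin2 column of Output 52.1.3. The Python sketch below averages the second-component scores for two groups of states. The New England group is the usual six states; the Southeastern group is an illustrative assumption, since the text does not list which states it has in mind.

```python
from statistics import mean

# Prin2 scores copied from Output 52.1.3.
new_england = {
    "Maine": 0.57878, "New Hampshire": 0.82503, "Vermont": 0.94497,
    "Massachusetts": 2.63105, "Rhode Island": 2.14658, "Connecticut": 1.50123,
}

# Assumed grouping for illustration only; the source does not define "Southeastern".
southeast = {
    "Mississippi": -2.54671, "South Carolina": -2.16211, "Alabama": -2.09610,
    "Louisiana": -2.08327, "North Carolina": -1.67027, "Georgia": -1.38079,
}

mean_new_england = mean(new_england.values())
mean_southeast = mean(southeast.values())

# Positive Prin2 leans toward property crime, negative toward violent crime.
print(f"New England mean Prin2: {mean_new_england:.3f}")
print(f"Southeast mean Prin2:   {mean_southeast:.3f}")
```

Every state in the first group has a positive second score and every state in the second group has a negative one, which matches the positions described for the upper and lower parts of the plot.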

The most striking feature of the plot of the first and third principal components is that Massachusetts and New York are outliers on the third component.
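The third-component scores are not printed in the listings above, but they can be recomputed from Output 52.1.1. The Python sketch below does this for Massachusetts and New York, assuming as before that a score is the eigenvector-weighted sum of the standardized variables. It then divides each score by the square root of the third eigenvalue, which is the standard deviation of the third component, to express the score in standard-deviation units. The names are illustrative.

```python
import math

# Order of variables: Murder, Rape, Robbery, Assault, Burglary, Larceny, Auto_Theft.
means = [7.444, 25.734, 124.092, 211.3, 1291.904, 2671.288, 377.526]
stds = [3.866768941, 10.75962995, 88.3485672, 100.2530492,
        432.455711, 725.908707, 193.3944175]

# Third eigenvector and third eigenvalue from Output 52.1.1.
prin3 = [0.178245, -0.244198, 0.495861, -0.069510, -0.209895, -0.539231, 0.568384]
eigenvalue3 = 0.72581663

# Raw rates from the DATALINES block.
states = {
    "Massachusetts": [3.1, 20.8, 169.1, 231.6, 1532.2, 2311.3, 1140.1],
    "New York": [10.7, 29.4, 472.6, 319.1, 1728.0, 2782.0, 745.8],
}

sd3 = math.sqrt(eigenvalue3)  # standard deviation of the Prin3 scores
prin3_scores = {}
for name, rates in states.items():
    z = [(x - m) / s for x, m, s in zip(rates, means, stds)]
    prin3_scores[name] = sum(w * zi for w, zi in zip(prin3, z))
    print(f"{name}: Prin3={prin3_scores[name]:.3f} "
          f"({prin3_scores[name] / sd3:.2f} standard deviations)")
```

The large positive Prin3 loadings on Auto_Theft and Robbery suggest why these two states stand out: Massachusetts has the highest auto theft rate in the data and New York the highest robbery rate.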
