 The FACTOR Procedure

## Example 26.1: Principal Component Analysis

The following example analyzes socioeconomic data provided by Harman (1976). The five variables represent total population, median school years, total employment, miscellaneous professional services, and median house value. Each observation represents one of twelve census tracts in the Los Angeles Standard Metropolitan Statistical Area.

The first analysis is a principal component analysis. The SIMPLE and CORR options also display simple descriptive statistics and correlations. The following statements produce Output 26.1.1:

```
data SocioEconomics;
title 'Five Socioeconomic Variables';
title2 'See Page 14 of Harman: Modern Factor Analysis, 3rd Ed';
input Population School Employment Services HouseValue;
datalines;
5700     12.8      2500      270       25000
1000     10.9      600       10        10000
3400     8.8       1000      10        9000
3800     13.6      1700      140       25000
4000     12.8      1600      140       25000
8200     8.3       2600      60        12000
1200     11.4      400       10        16000
9100     11.5      3300      60        14000
9900     12.5      3400      180       18000
9600     13.7      3600      390       25000
9600     9.6       3300      80        12000
9400     11.4      4000      100       13000
;
proc factor data=SocioEconomics simple corr;
title3 'Principal Component Analysis';
run;
```

There are two large eigenvalues, 2.8733 and 1.7967, which together account for 93.4% of the standardized variance: the five standardized variables have a total variance of 5, and (2.8733 + 1.7967) / 5 = 0.934. Thus, the first two principal components provide an adequate summary of the data for most purposes. Three components, explaining 97.7% of the variation, should be sufficient for almost any application. PROC FACTOR retains two components on the basis of the eigenvalues-greater-than-one rule, since the third eigenvalue is only 0.2148.
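If a third component is wanted, the default retention rule can be overridden. The following is a minimal sketch, assuming the SocioEconomics data set created above is still available in the session. The NFACTORS= option requests a fixed number of factors, METHOD=PRINCIPAL states the default method explicitly, and the title text is only illustrative:

```sas
/* Sketch: retain three principal components instead of the default two. */
proc factor data=SocioEconomics method=principal nfactors=3;
   title3 'Principal Component Analysis: Three Components Retained';
run;
```

With three components retained, the factor pattern gains a Factor3 column, and the final communality estimates can only increase, because each communality picks up one more squared loading.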

The first component has large positive loadings for all five variables; its correlation with Services (0.93239) is especially high. The second component is a contrast of Population (0.80642) and Employment (0.72605) against School (-0.54476) and HouseValue (-0.55818), with a very small loading on Services (-0.10431).
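As a check on how the loadings relate to the eigenvalues, the squared loadings of each component can be summed over the five variables. The notation is introduced here for illustration only: $a_{jk}$ denotes the loading of variable $j$ on component $k$, and $\lambda_k$ the corresponding eigenvalue. Using the loadings reported in the factor pattern:

$$
\begin{aligned}
\sum_{j=1}^{5} a_{j1}^2 &= 0.58096^2 + 0.76704^2 + 0.67243^2 + 0.93239^2 + 0.79116^2 \approx 2.8733 = \lambda_1,\\
\sum_{j=1}^{5} a_{j2}^2 &= 0.80642^2 + (-0.54476)^2 + 0.72605^2 + (-0.10431)^2 + (-0.55818)^2 \approx 1.7967 = \lambda_2.
\end{aligned}
$$

These sums agree, to the precision of the printed loadings, with the values that the output lists as the variance explained by each factor.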

The final communality estimates show that all the variables are well accounted for by two components, with values ranging from 0.880236 for Services to 0.987826 for Population.
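For orthogonal components, each final communality is the sum of the squared loadings of that variable on the retained components. The symbol $h_j^2$ for the communality of variable $j$ is notation assumed here, not taken from the output. Working through the two extreme cases with the reported loadings:

$$
\begin{aligned}
h^2_{\text{Population}} &= 0.58096^2 + 0.80642^2 \approx 0.9878,\\
h^2_{\text{Services}} &= 0.93239^2 + (-0.10431)^2 \approx 0.8802.
\end{aligned}
$$

Both values match the output to four decimal places; the small remaining differences come from the loadings being printed to five decimals. Summed over all five variables, the communalities total 4.669974, which equals the sum of the two retained eigenvalues, $2.8733136 + 1.7966601$.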

Output 26.1.1: Principal Component Analysis

Five Socioeconomic Variables

See Page 14 of Harman: Modern Factor Analysis, 3rd Ed

Principal Component Analysis

 The FACTOR Procedure

Means and Standard Deviations from 12 Observations

| Variable | Mean | Std Dev |
|---|---:|---:|
| Population | 6241.667 | 3439.9943 |
| School | 11.442 | 1.7865 |
| Employment | 2333.333 | 1241.2115 |
| Services | 120.833 | 114.9275 |
| HouseValue | 17000.000 | 6367.5313 |

Correlations

| | Population | School | Employment | Services | HouseValue |
|---|---:|---:|---:|---:|---:|
| Population | 1.00000 | 0.00975 | 0.97245 | 0.43887 | 0.02241 |
| School | 0.00975 | 1.00000 | 0.15428 | 0.69141 | 0.86307 |
| Employment | 0.97245 | 0.15428 | 1.00000 | 0.51472 | 0.12193 |
| Services | 0.43887 | 0.69141 | 0.51472 | 1.00000 | 0.77765 |
| HouseValue | 0.02241 | 0.86307 | 0.12193 | 0.77765 | 1.00000 |

 Principal Component Analysis

The FACTOR Procedure

Initial Factor Method: Principal Components

Eigenvalues of the Correlation Matrix: Total = 5 Average = 1

| | Eigenvalue | Difference | Proportion | Cumulative |
|---|---:|---:|---:|---:|
| 1 | 2.87331359 | 1.07665350 | 0.5747 | 0.5747 |
| 2 | 1.79666009 | 1.58182321 | 0.3593 | 0.9340 |
| 3 | 0.21483689 | 0.11490283 | 0.0430 | 0.9770 |
| 4 | 0.09993405 | 0.08467868 | 0.0200 | 0.9969 |
| 5 | 0.01525537 | | 0.0031 | 1.0000 |

Factor Pattern

| | Factor1 | Factor2 |
|---|---:|---:|
| Population | 0.58096 | 0.80642 |
| School | 0.76704 | -0.54476 |
| Employment | 0.67243 | 0.72605 |
| Services | 0.93239 | -0.10431 |
| HouseValue | 0.79116 | -0.55818 |

Variance Explained by Each Factor

| Factor1 | Factor2 |
|---:|---:|
| 2.8733136 | 1.7966601 |

Final Communality Estimates: Total = 4.669974

| Population | School | Employment | Services | HouseValue |
|---:|---:|---:|---:|---:|
| 0.98782629 | 0.88510555 | 0.97930583 | 0.88023562 | 0.93750041 |
