 The MODECLUS Procedure

## Example 42.2: Cluster Analysis of Flying Mileages between Ten American Cities

This example uses distance data. Only the lower triangle of the distance matrix is entered, and the example illustrates the use of the TRANSPOSE procedure and the DATA step to fill in the upper triangle. The results are displayed in Output 42.2.1 through Output 42.2.3.

The following statements produce Output 42.2.1:

```
title 'Modeclus Analysis of 10 American Cities';
title2 'Based on Flying Mileages';
options ls=90;

*-----Read Lower Triangle: ten 5-column fields, CITY at column 53-----;
data mileages(type=distance);
   input (ATLANTA CHICAGO DENVER HOUSTON LOSANGELES
          MIAMI NEWYORK SANFRAN SEATTLE WASHDC) (5.)
         @53 CITY $15.;
   datalines;
    0                                               ATLANTA
  587    0                                          CHICAGO
 1212  920    0                                     DENVER
  701  940  879    0                                HOUSTON
 1936 1745  831 1374    0                           LOS ANGELES
  604 1188 1726  968 2339    0                      MIAMI
  748  713 1631 1420 2451 1092    0                 NEW YORK
 2139 1858  949 1645  347 2594 2571    0            SAN FRANCISCO
 2182 1737 1021 1891  959 2734 2408  678    0       SEATTLE
  543  597 1494 1220 2300  923  205 2442 2329    0  WASHINGTON D.C.
;

*-----Fill in Upper Triangle of Distance Matrix---------------;
proc transpose data=mileages out=tran;
   copy CITY;
run;

data mileages(type=distance);
   merge mileages tran;
   array var ATLANTA--WASHDC;
   array col col1-col10;
   drop col1-col10 _name_;
   do over var;
      var=sum(var,col);   /* SUM ignores the missing triangle */
   end;
run;

*-----Clustering with K-Nearest-Neighbor Density Estimates-----;
proc modeclus data=mileages all m=1 k=3;
   id CITY;
run;
```
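The fill-in step relies on implicitly subscripted arrays (`do over`), an older DATA step style. As a minimal sketch, the same step could be written with explicit subscripts. The array names `mile` and `tmile` and the index `j` are hypothetical names chosen for illustration and are not part of the original example.

```sas
*-----Sketch: explicit-subscript version of the fill-in step-----;
* Assumes MILEAGES still holds only the lower triangle and that  ;
* TRAN is the PROC TRANSPOSE output (COL1-COL10, _NAME_, CITY).  ;
data mileages(type=distance);
   merge mileages tran;
   array mile{*}  ATLANTA--WASHDC;   /* lower triangle as read in   */
   array tmile{*} col1-col10;        /* transposed (upper) triangle */
   do j = 1 to dim(mile);
      /* SUM ignores missing values, so each cell takes whichever   */
      /* triangle is nonmissing; the diagonal stays 0 + 0 = 0.      */
      mile{j} = sum(mile{j}, tmile{j});
   end;
   drop j col1-col10 _name_;
run;
```

This sketch is meant as a replacement for the DATA step that follows PROC TRANSPOSE, not as an additional step. Running it after the matrix has already been filled in would add the transposed values a second time.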

Output 42.2.1: Clustering with K-Nearest-Neighbor Density Estimates

Modeclus Analysis of 10 American Cities
Based on Flying Mileages

The MODECLUS Procedure

Nearest Neighbor List

| CITY | Neighbor | Distance |
| --- | --- | --- |
| ATLANTA | WASHINGTON D.C. | 543.0000000 |
| | CHICAGO | 587.0000000 |
| CHICAGO | ATLANTA | 587.0000000 |
| | WASHINGTON D.C. | 597.0000000 |
| DENVER | LOS ANGELES | 831.0000000 |
| | HOUSTON | 879.0000000 |
| HOUSTON | ATLANTA | 701.0000000 |
| | DENVER | 879.0000000 |
| LOS ANGELES | SAN FRANCISCO | 347.0000000 |
| | DENVER | 831.0000000 |
| MIAMI | ATLANTA | 604.0000000 |
| | WASHINGTON D.C. | 923.0000000 |
| NEW YORK | WASHINGTON D.C. | 205.0000000 |
| | CHICAGO | 713.0000000 |
| SAN FRANCISCO | LOS ANGELES | 347.0000000 |
| | SEATTLE | 678.0000000 |
| SEATTLE | SAN FRANCISCO | 678.0000000 |
| | LOS ANGELES | 959.0000000 |
| WASHINGTON D.C. | NEW YORK | 205.0000000 |
| | ATLANTA | 543.0000000 |

The MODECLUS Procedure (K=3, METHOD=1)

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ATLANTA | 0.00025554 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | CHICAGO | 0.00025126 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | HOUSTON | 0.00017065 | 0.00025554 | 0.00017065 | 0.00042619 | 0.600 |
| | MIAMI | 0.00016251 | 0.00053178 | 0 | 0.00053178 | 1.000 |
| | NEW YORK | 0.00021038 | 0.0005275 | 0 | 0.0005275 | 1.000 |
| | WASHINGTON D.C. | 0.00027624 | 0.00046592 | 0 | 0.00046592 | 1.000 |
| 2 | DENVER | 0.00017065 | 0.00018051 | 0.00017065 | 0.00035115 | 0.514 |
| | LOS ANGELES | 0.00018051 | 0.00039189 | 0 | 0.00039189 | 1.000 |
| | SAN FRANCISCO | 0.00022124 | 0.00033692 | 0 | 0.00033692 | 1.000 |
| | SEATTLE | 0.00015641 | 0.00040174 | 0 | 0.00040174 | 1.000 |

Boundary Objects

| CITY | Density | Cluster | Cluster Proportion 1 | Cluster Proportion 2 |
| --- | --- | --- | --- | --- |
| DENVER | 0.0001706485 | 2 | 0.486 | 0.514 |
| HOUSTON | 0.0001706485 | 1 | 0.600 | 0.400 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
| --- | --- | --- | --- | --- |
| 1 | 6 | 0.00027624 | 1 | 0.00017065 |
| 2 | 4 | 0.00022124 | 1 | 0.00017065 |

Cluster Summary

| K | Number of Clusters | Frequency of Unclassified Objects |
| --- | --- | --- |
| 3 | 2 | 0 |
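The density estimates in Output 42.2.1 can be checked by hand. The following is a sketch under two assumptions that are not stated in this example: that the nearest-neighbor estimate is the number of cities in the neighborhood divided by the sample size times the neighborhood volume, and that distance data are treated as one-dimensional, so that a neighborhood of radius $r$ has volume $2r$. The symbols $n$ (number of cities), $k$ (the K= value), and $r_k(i)$ (radius of the smallest neighborhood of city $i$ that holds $k$ cities, counting the city itself) are introduced here for illustration. With $n = 10$ and $k = 3$, this form reproduces the printed values:

$$
\hat f_i = \frac{k}{n \cdot 2\, r_k(i)}, \qquad
\hat f_{\text{ATLANTA}} = \frac{3}{10 \cdot 2 \cdot 587} \approx 0.00025554, \qquad
\hat f_{\text{HOUSTON}} = \frac{3}{10 \cdot 2 \cdot 879} \approx 0.00017065
$$

The cluster proportions for the boundary objects follow from the same numbers. Houston's two neighbors are Atlanta (cluster 1) and Denver (cluster 2), so its proportion in its own cluster is

$$
\frac{0.00025554}{0.00025554 + 0.00017065} \approx 0.600
$$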

The following statements produce Output 42.2.2:

```
*------Clustering with Uniform Kernel Density Estimates--------;
proc modeclus data=mileages all m=1 r=600 800;
   id CITY;
run;
```

Output 42.2.2: Clustering with Uniform Kernel Density Estimates

The MODECLUS Procedure

Nearest Neighbor List

| CITY | Neighbor | Distance |
| --- | --- | --- |
| ATLANTA | WASHINGTON D.C. | 543.0000000 |
| | CHICAGO | 587.0000000 |
| | MIAMI | 604.0000000 |
| | HOUSTON | 701.0000000 |
| | NEW YORK | 748.0000000 |
| CHICAGO | ATLANTA | 587.0000000 |
| | WASHINGTON D.C. | 597.0000000 |
| | NEW YORK | 713.0000000 |
| HOUSTON | ATLANTA | 701.0000000 |
| LOS ANGELES | SAN FRANCISCO | 347.0000000 |
| MIAMI | ATLANTA | 604.0000000 |
| NEW YORK | WASHINGTON D.C. | 205.0000000 |
| | CHICAGO | 713.0000000 |
| | ATLANTA | 748.0000000 |
| SAN FRANCISCO | LOS ANGELES | 347.0000000 |
| | SEATTLE | 678.0000000 |
| SEATTLE | SAN FRANCISCO | 678.0000000 |
| WASHINGTON D.C. | NEW YORK | 205.0000000 |
| | ATLANTA | 543.0000000 |
| | CHICAGO | 597.0000000 |

The MODECLUS Procedure (R=600, METHOD=1)

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ATLANTA | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | CHICAGO | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | NEW YORK | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | WASHINGTON D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | LOS ANGELES | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | SAN FRANCISCO | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| 3 | DENVER | 0.00008333 | 0 | 0 | 0 | . |
| 4 | HOUSTON | 0.00008333 | 0 | 0 | 0 | . |
| 5 | MIAMI | 0.00008333 | 0 | 0 | 0 | . |
| 6 | SEATTLE | 0.00008333 | 0 | 0 | 0 | . |

No Boundary Objects

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
| --- | --- | --- | --- | --- |
| 1 | 4 | 0.00033333 | 0 | . |
| 2 | 2 | 0.00016667 | 0 | . |
| 3 | 1 | 0.00008333 | 0 | . |
| 4 | 1 | 0.00008333 | 0 | . |
| 5 | 1 | 0.00008333 | 0 | . |
| 6 | 1 | 0.00008333 | 0 | . |

The MODECLUS Procedure (R=800, METHOD=1)

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ATLANTA | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | CHICAGO | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | HOUSTON | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | MIAMI | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | NEW YORK | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | WASHINGTON D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | LOS ANGELES | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | SAN FRANCISCO | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | SEATTLE | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| 3 | DENVER | 0.0000625 | 0 | 0 | 0 | . |

No Boundary Objects

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
| --- | --- | --- | --- | --- |
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 3 | 0.0001875 | 0 | . |
| 3 | 1 | 0.0000625 | 0 | . |

Cluster Summary

| R | Number of Clusters | Frequency of Unclassified Objects |
| --- | --- | --- |
| 600 | 6 | 0 |
| 800 | 3 | 0 |
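The uniform-kernel estimates in Output 42.2.2 can be checked in the same spirit. As a sketch, assume that the estimate at city $i$ is the number of cities $n_i$ within distance $r$ of it (counting the city itself) divided by the sample size $n = 10$ times the neighborhood volume, and that the volume for distance data is $2r$. The symbols $n_i$, $n$, and $r$ are introduced here for illustration. Under these assumptions the printed values are recovered from the nearest neighbor list:

$$
\hat f_i = \frac{n_i}{n \cdot 2r}, \qquad
\hat f_{\text{WASHINGTON D.C.}}\Big|_{r=600} = \frac{4}{10 \cdot 1200} \approx 0.00033333, \qquad
\hat f_{\text{ATLANTA}}\Big|_{r=800} = \frac{6}{10 \cdot 1600} = 0.000375
$$

Denver has no neighbor within 800 miles, so $n_i = 1$ and its estimate is $1/12000 \approx 0.00008333$ for R=600 and $1/16000 = 0.0000625$ for R=800.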

The following statements produce Output 42.2.3:

```
*------Uniform Kernel Density Estimates, Clustering
       Neighborhoods extended to nearest neighbor--------------;
proc modeclus data=mileages list m=1 ck=2 r=600 800;
   id CITY;
run;
```

Output 42.2.3: Uniform Kernel Density Estimates, Clustering Neighborhoods Extended to Nearest Neighbor

The MODECLUS Procedure (CK=2, R=600, METHOD=1)

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ATLANTA | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | CHICAGO | 0.00025 | 0.00058333 | 0 | 0.00058333 | 1.000 |
| | HOUSTON | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | MIAMI | 0.00008333 | 0.00025 | 0 | 0.00025 | 1.000 |
| | NEW YORK | 0.00016667 | 0.00033333 | 0 | 0.00033333 | 1.000 |
| | WASHINGTON D.C. | 0.00033333 | 0.00066667 | 0 | 0.00066667 | 1.000 |
| 2 | DENVER | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | LOS ANGELES | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | SAN FRANCISCO | 0.00016667 | 0.00016667 | 0 | 0.00016667 | 1.000 |
| | SEATTLE | 0.00008333 | 0.00016667 | 0 | 0.00016667 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
| --- | --- | --- | --- | --- |
| 1 | 6 | 0.00033333 | 0 | . |
| 2 | 4 | 0.00016667 | 0 | . |

The MODECLUS Procedure (CK=2, R=800, METHOD=1)

Sums of Density Estimates Within Neighborhood

| Cluster | CITY | Estimated Density | Same Cluster | Other Clusters | Total | Cluster Proportion Same/Total |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | ATLANTA | 0.000375 | 0.001 | 0 | 0.001 | 1.000 |
| | CHICAGO | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | HOUSTON | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | MIAMI | 0.000125 | 0.000375 | 0 | 0.000375 | 1.000 |
| | NEW YORK | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| | WASHINGTON D.C. | 0.00025 | 0.000875 | 0 | 0.000875 | 1.000 |
| 2 | DENVER | 0.0000625 | 0.000125 | 0 | 0.000125 | 1.000 |
| | LOS ANGELES | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |
| | SAN FRANCISCO | 0.0001875 | 0.00025 | 0 | 0.00025 | 1.000 |
| | SEATTLE | 0.000125 | 0.0001875 | 0 | 0.0001875 | 1.000 |

Cluster Statistics

| Cluster | Frequency | Maximum Estimated Density | Boundary Frequency | Estimated Saddle Density |
| --- | --- | --- | --- | --- |
| 1 | 6 | 0.000375 | 0 | . |
| 2 | 4 | 0.0001875 | 0 | . |

Cluster Summary

| R | CK | Number of Clusters | Frequency of Unclassified Objects |
| --- | --- | --- | --- |
| 600 | 2 | 2 | 0 |
| 800 | 2 | 2 | 0 |
