 The CALIS Procedure

## Assessment of Fit

This section contains a collection of formulas used in computing indices to assess the goodness of fit by PROC CALIS. The following notation is used:

• N for the sample size
• n for the number of manifest variables
• t for the number of parameters to estimate
• NM = N - 1 if a CORR or COV matrix is analyzed, or NM = N if a UCORR or UCOV matrix is analyzed
• df for the degrees of freedom
• $\hat{\gamma}$ for the t vector of optimal parameter estimates
• $S = (s_{ij})$ for the n × n input COV, CORR, UCOV, or UCORR matrix
• $C = (c_{ij}) = \hat{\Sigma} = \Sigma(\hat{\gamma})$ for the predicted model matrix
• W for the weight matrix (W = I for ULS, W = S for default GLS, and W = C for ML estimates)
• U for the n² × n² asymptotic covariance matrix of sample covariances
• $\Phi(x \mid \lambda, df)$ for the cumulative distribution function of the noncentral chi-squared distribution with noncentrality parameter $\lambda$
The following notation is for indices that allow testing nested models by a χ² difference test:
• f0 for the function value of the independence model
• df0 for the degrees of freedom of the independence model
• fmin = F for the function value of the fitted model
• dfmin = df for the degrees of freedom of the fitted model

The degrees of freedom dfmin and the number of parameters t are adjusted automatically when there are active constraints in the analysis. This adjustment affects the computation of many fit statistics and indices. You can turn off the automatic adjustment by using the NOADJDF option. See the section "Counting the Degrees of Freedom" for more information.

### Residuals

PROC CALIS computes four types of residuals and writes them to the OUTSTAT= data set:
• Raw Residuals
$$\mathrm{Res} = S - C, \qquad \mathrm{Res}_{ij} = s_{ij} - c_{ij}$$
The raw residuals are displayed whenever the PALL, the PRINT, or the RESIDUAL option is specified.
• Variance Standardized Residuals
$$\mathrm{VSRes}_{ij} = \frac{s_{ij} - c_{ij}}{\sqrt{s_{ii}\,s_{jj}}}$$
The variance standardized residuals are displayed when you specify either of the following:
• the PALL, the PRINT, or the RESIDUAL option and METHOD=NONE, METHOD=ULS, or METHOD=DWLS
• RESIDUAL=VARSTAND
The variance standardized residuals are equal to those computed by the EQS 3 program (Bentler 1989).
• Asymptotically Standardized Residuals
$$\mathrm{ASRes}_{ij} = \frac{s_{ij} - c_{ij}}{\sqrt{v_{ij,ij}}}, \qquad v_{ij,ij} = \mathrm{diag}\left(U - J\,\widehat{\mathrm{Cov}}(\hat{\gamma})\,J'\right)_{ij}$$
The matrix J is the n² × t Jacobian matrix $d\Sigma/d\gamma$, and $\widehat{\mathrm{Cov}}(\hat{\gamma})$ is the t × t asymptotic covariance matrix of parameter estimates (the inverse of the information matrix). Asymptotically standardized residuals are displayed when one of the following conditions is met:
• The PALL, the PRINT, or the RESIDUAL option is specified, and METHOD=ML, METHOD=GLS, or METHOD=WLS, and the expensive information and Jacobian matrices are computed for some other reason.
• RESIDUAL=ASYSTAND is specified.
The asymptotically standardized residuals are equal to those computed by the LISREL 7 program (Jöreskog and Sörbom 1988) except for the denominator NM in the definition of matrix U.
• Normalized Residuals
$$\mathrm{NRes}_{ij} = \frac{s_{ij} - c_{ij}}{\sqrt{u_{ij,ij}}}$$
where the diagonal elements $u_{ij,ij}$ of the n² × n² asymptotic covariance matrix U of sample covariances are defined for the following methods:
• GLS as $u_{ij,ij} = \frac{1}{NM}\left(s_{ii}\,s_{jj} + s_{ij}^2\right)$
• ML as $u_{ij,ij} = \frac{1}{NM}\left(c_{ii}\,c_{jj} + c_{ij}^2\right)$
• WLS as $u_{ij,ij} = w_{ij,ij}$
Normalized residuals are displayed when one of the following conditions is met:
• The PALL, the PRINT, or the RESIDUAL option is specified, and METHOD=ML, METHOD=GLS, or METHOD=WLS, and the expensive information and Jacobian matrices are not computed for some other reason.
• RESIDUAL=NORM is specified.
The normalized residuals are equal to those computed by the LISREL VI program (Jöreskog and Sörbom 1985) except for the definition of the denominator NM in matrix U.
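The residual definitions can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not PROC CALIS code. The function name `residuals` and its arguments are hypothetical, and the matrices are small nested lists. Only the GLS and ML forms of $u_{ij,ij}$ (each with the 1/NM factor) are covered, and NM is passed in directly (N - 1 for COV or CORR input, N for UCOV or UCORR input). Asymptotically standardized residuals are omitted because they need the Jacobian and information matrices.

```python
import math


def residuals(S, C, nm, method="ML"):
    """Raw, variance standardized, and normalized residual matrices.

    S, C   : n x n symmetric sample and predicted matrices (nested lists).
    nm     : NM, assumed N - 1 for COV/CORR input or N for UCOV/UCORR input.
    method : "ML" builds u_ij,ij from C; "GLS" builds it from S.
    (Hypothetical helper for illustration only.)
    """
    if method not in ("ML", "GLS"):
        raise ValueError("this sketch covers only ML and GLS")
    n = len(S)
    base = C if method == "ML" else S
    # Raw residuals: s_ij - c_ij.
    raw = [[S[i][j] - C[i][j] for j in range(n)] for i in range(n)]
    # Variance standardized residuals: divide by sqrt(s_ii * s_jj).
    varstand = [[raw[i][j] / math.sqrt(S[i][i] * S[j][j]) for j in range(n)]
                for i in range(n)]
    # Normalized residuals: divide by sqrt(u_ij,ij),
    # with u_ij,ij = (b_ii * b_jj + b_ij**2) / NM and b = C (ML) or S (GLS).
    normalized = [
        [raw[i][j] / math.sqrt((base[i][i] * base[j][j] + base[i][j] ** 2) / nm)
         for j in range(n)]
        for i in range(n)
    ]
    return raw, varstand, normalized


S = [[2.0, 0.8], [0.8, 1.0]]
C = [[2.0, 0.6], [0.6, 1.0]]
raw, vs, nres = residuals(S, C, nm=99)
print(raw[0][1], vs[0][1], nres[0][1])
```

For ML, the denominators come from the fitted matrix C. For GLS, they come from the sample matrix S, which is why the same residual can be normalized differently under the two methods.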

For estimation methods that are not BGLS estimation methods (Browne 1982, 1984), such as METHOD=NONE, METHOD=ULS, or METHOD=DWLS, the assumption of an asymptotic covariance matrix U of sample covariances does not seem to be appropriate. In this case, the normalized residuals should be replaced by the more relaxed variance standardized residuals. Computation of asymptotically standardized residuals requires computing the Jacobian and information matrices. This is computationally very expensive and is done only if the Jacobian matrix has to be computed for some other reason, that is, if at least one of the following items is true:

• The default, PRINT, or PALL displayed output is requested, and neither the NOMOD nor NOSTDERR option is specified.
• Either the MODIFICATION (included in PALL), PCOVES, or STDERR (included in default, PRINT, and PALL output) option is requested or RESIDUAL=ASYSTAND is specified.
• The LEVMAR or NEWRAP optimization technique is used.
• An OUTRAM= data set is specified without using the NOSTDERR option.
• An OUTEST= data set is specified without using the NOSTDERR option.

Since normalized residuals use an overestimate of the asymptotic covariance matrix of residuals (the diagonal of U), the normalized residuals cannot be larger in absolute value than the asymptotically standardized residuals (which use the diagonal of $U - J\,\widehat{\mathrm{Cov}}(\hat{\gamma})\,J'$).
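This inequality follows in one step. Write $v_{ij,ij}$ for the diagonal elements of $U - J\,\widehat{\mathrm{Cov}}(\hat{\gamma})\,J'$ used by the asymptotically standardized residuals, and assume $v_{ij,ij} > 0$. Because $\widehat{\mathrm{Cov}}(\hat{\gamma})$ is a covariance matrix, $J\,\widehat{\mathrm{Cov}}(\hat{\gamma})\,J'$ is positive semidefinite and has nonnegative diagonal elements. Therefore:

$$v_{ij,ij} = u_{ij,ij} - \left[J\,\widehat{\mathrm{Cov}}(\hat{\gamma})\,J'\right]_{ij,ij} \le u_{ij,ij}
\quad\Longrightarrow\quad
\frac{|s_{ij} - c_{ij}|}{\sqrt{u_{ij,ij}}} \le \frac{|s_{ij} - c_{ij}|}{\sqrt{v_{ij,ij}}}$$

The left-hand side is the absolute normalized residual, and the right-hand side is the absolute asymptotically standardized residual.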

Together with the residual matrices, the values of the average residual, the average off-diagonal residual, and the rank order of the largest values are displayed. The distribution of the normalized and standardized residuals is displayed also.

### Goodness-of-Fit Indices Based on Residuals

The following items are computed for all five kinds of estimation: ULS, GLS, ML, WLS, and DWLS. All these indices are written to the OUTRAM= data set. The goodness-of-fit index (GFI), adjusted goodness-of-fit index (AGFI), and root mean square residual (RMR) are computed as in the LISREL VI program of Jöreskog and Sörbom (1985).

• Goodness-of-Fit Index
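The goodness-of-fit index for the ULS, GLS, and ML estimation methods is
$$GFI = 1 - \frac{\mathrm{Tr}\left(\left(W^{-1}(S - C)\right)^2\right)}{\mathrm{Tr}\left(\left(W^{-1}S\right)^2\right)}$$
but for WLS and DWLS estimation, it is
$$GFI = 1 - \frac{\mathrm{Vec}(s_{ij} - c_{ij})'\,W^{-1}\,\mathrm{Vec}(s_{ij} - c_{ij})}{\mathrm{Vec}(s_{ij})'\,W^{-1}\,\mathrm{Vec}(s_{ij})}$$
where W is a diagonal weight matrix for DWLS estimation, and $\mathrm{Vec}(s_{ij} - c_{ij})$ denotes the vector of the n(n+1)/2 elements of the lower triangle of the symmetric matrix S - C. For a constant weight matrix W, the goodness-of-fit index is 1 minus the ratio of the minimum function value and the function value before any model has been fitted. The GFI should be between 0 and 1. The data probably do not fit the model if the GFI is negative or much larger than 1.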
The AGFI is the GFI adjusted for the degrees of freedom of the model:
$$AGFI = 1 - \frac{n(n+1)}{2\,df}\,(1 - GFI)$$
The AGFI corresponds to the GFI in replacing the total sum of squares by the mean sum of squares.

Caution:

• Large n and small df can result in a negative AGFI. For example, GFI=0.90, n=19, and df=2 result in an AGFI of -8.5.
• AGFI is not defined for a saturated model, due to division by df=0.
• AGFI is not sensitive to losses in df.
The AGFI should be between 0 and 1. The data probably do not fit the model if the AGFI is negative or much larger than 1. For more information, refer to Mulaik et al. (1989).
• Root Mean Square Residual
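The RMR is the square root of the mean of the squared residuals:
$$RMR = \sqrt{\frac{2}{n(n+1)} \sum_{i=1}^{n} \sum_{j=1}^{i} (s_{ij} - c_{ij})^2}$$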
• Parsimonious Goodness-of-Fit Index
The PGFI (Mulaik et al. 1989) is a modification of the GFI that takes the parsimony of the model into account:
$$PGFI = \frac{df}{df_0}\,GFI$$
The PGFI uses the same parsimonious factor as the parsimonious normed Bentler-Bonett index (James, Mulaik, and Brett 1982).
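For the ULS case, where W = I, the GFI reduces to $1 - \mathrm{Tr}((S - C)^2)/\mathrm{Tr}(S^2)$. For a symmetric matrix, $\mathrm{Tr}(A^2)$ is simply the sum of its squared elements. The Python sketch below uses that fact to compute GFI, AGFI, RMR, and PGFI from nested-list matrices.

The function names are hypothetical, and the other weightings (GLS, ML, WLS, DWLS) are not covered. The PGFI function assumes the parsimony factor df/df0, the same factor the PNFI uses. The final line reproduces the AGFI caution example.

```python
import math


def _sum_sq(A):
    """Sum of squared elements; equals Tr(A^2) when A is symmetric."""
    return sum(a * a for row in A for a in row)


def gfi_uls(S, C):
    """GFI with W = I (ULS): 1 - Tr((S - C)^2) / Tr(S^2)."""
    n = len(S)
    resid = [[S[i][j] - C[i][j] for j in range(n)] for i in range(n)]
    return 1.0 - _sum_sq(resid) / _sum_sq(S)


def agfi(gfi, n, df):
    """AGFI = 1 - n(n+1) / (2 df) * (1 - GFI); undefined when df = 0."""
    if df == 0:
        raise ValueError("AGFI is not defined for a saturated model (df = 0)")
    return 1.0 - n * (n + 1) / (2.0 * df) * (1.0 - gfi)


def rmr(S, C):
    """Square root of the mean squared residual over the lower triangle."""
    n = len(S)
    ss = sum((S[i][j] - C[i][j]) ** 2 for i in range(n) for j in range(i + 1))
    return math.sqrt(2.0 * ss / (n * (n + 1)))


def pgfi(gfi, df, df0):
    """Parsimonious GFI, assuming the parsimony factor df / df0."""
    return df / df0 * gfi


# Caution example from the text: GFI = 0.90, n = 19, df = 2.
print(agfi(0.90, 19, 2))
```

The AGFI function makes the caution concrete: with n = 19 and df = 2, the multiplier n(n+1)/(2 df) is 95, so a 0.10 shortfall in GFI becomes a drop of about 9.5.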

### Goodness-of-Fit Indices Based on the χ²

The following items are transformations of the overall χ² value and in general depend on the sample size N. These indices are not computed for ULS or DWLS estimates.

• Uncorrected χ²
The overall χ² measure is the optimum function value F multiplied by N - 1 if a CORR or COV matrix is analyzed, or multiplied by N if a UCORR or UCOV matrix is analyzed. This gives the likelihood ratio test statistic for the null hypothesis that the predicted matrix C has the specified model structure against the alternative that C is unconstrained. The test is valid only if the observations are independent and identically distributed, the analysis is based on the nonstandardized sample covariance matrix S, and the sample size N is sufficiently large (Browne 1982; Bollen 1989b; Jöreskog and Sörbom 1985). For ML and GLS estimates, the variables must also have an approximately multivariate normal distribution. The notation Prob>Chi**2 means "the probability under the null hypothesis of obtaining a greater χ² statistic than that observed."
$$\chi^2 = NM \cdot F$$
where F is the function value at the minimum.
• χ² Value of the Independence Model
The χ² value of the independence model
$$\chi^2_0 = NM \cdot f_0$$
and the corresponding degrees of freedom df0 can be used (in large samples) to evaluate the gain of explanation by fitting the specific model (Bentler 1989).
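• RMSEA Index (Steiger and Lind 1980)
The Steiger and Lind (1980) root mean square error of approximation (RMSEA) coefficient is
$$RMSEA = \sqrt{\max\left(\frac{F}{df} - \frac{1}{NM},\ 0\right)}$$
The lower and upper limits of the confidence interval are computed using the cumulative distribution function $\Phi(x \mid \lambda, df)$ of the noncentral chi-squared distribution, with $x = NM \cdot F$, $\lambda_L$ satisfying $\Phi(x \mid \lambda_L, df) = 1 - \alpha/2$, and $\lambda_U$ satisfying $\Phi(x \mid \lambda_U, df) = \alpha/2$:
$$(RMSEA_L;\ RMSEA_U) = \left(\sqrt{\frac{\lambda_L}{NM \cdot df}};\ \sqrt{\frac{\lambda_U}{NM \cdot df}}\right)$$
Refer to Browne and Du Toit (1992) for more details. The size of the confidence interval is defined by the option ALPHARMS=α, 0 ≤ α ≤ 1. The default is α = 0.1, which corresponds to the 90% confidence interval for the RMSEA.
• Probability for Test of Close Fit (Browne and Cudeck 1993)
The traditional exact χ² test hypothesis $H_0\colon RMSEA = 0$ is replaced by the null hypothesis of close fit $H_0\colon RMSEA \le 0.05$, and the exceedance probability P is computed as
$$P = 1 - \Phi(x \mid \lambda^*, df)$$
where $x = NM \cdot F$ and $\lambda^* = 0.05^2 \cdot NM \cdot df$. The null hypothesis of close fit is rejected if P is smaller than a prespecified level (for example, P < 0.05).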
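• Expected Cross Validation Index (Browne and Cudeck 1993)
For GLS and WLS, the estimator c of the ECVI is linearly related to AIC:
$$c = F(S, C) + \frac{2t}{NM}$$
For ML estimation, $c_{ML}$ is used:
$$c_{ML} = F_{ML}(S, C) + \frac{2t}{NM - n - 1}$$
The confidence interval $(c_L;\ c_U)$ for c is computed using the cumulative distribution function $\Phi(x \mid \lambda, df)$ of the noncentral chi-squared distribution,
$$(c_L;\ c_U) = \left(\frac{\lambda_L + nnt}{NM};\ \frac{\lambda_U + nnt}{NM}\right)$$
with nnt = n(n+1)/2 + t, $x = NM \cdot F$, $\Phi(x \mid \lambda_U, df) = \alpha/2$, and $\Phi(x \mid \lambda_L, df) = 1 - \alpha/2$. The confidence interval $(c^*_L;\ c^*_U)$ for $c_{ML}$ is
$$(c^*_L;\ c^*_U) = \left(\frac{\lambda^*_L + nnt}{NM - n - 1};\ \frac{\lambda^*_U + nnt}{NM - n - 1}\right)$$
where nnt = n(n+1)/2 + t, $x = (NM - n - 1) \cdot F$, $\Phi(x \mid \lambda^*_U, df) = \alpha/2$, and $\Phi(x \mid \lambda^*_L, df) = 1 - \alpha/2$. Refer to Browne and Cudeck (1993). The size of the confidence interval is defined by the option ALPHAECV=α, 0 ≤ α ≤ 1. The default is α = 0.1, which corresponds to the 90% confidence interval for the ECVI.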
• Comparative Fit Index (Bentler 1989)
$$CFI = 1 - \frac{\max(NM \cdot f_{min} - df_{min},\ 0)}{\max(NM \cdot f_0 - df_0,\ 0)}$$
• Adjusted χ² Value
If the variables are n-variate elliptic rather than normal and have significant amounts of multivariate kurtosis (leptokurtic or platykurtic), the χ² value can be adjusted to
$$\chi^2_{ell} = \frac{\chi^2}{\eta_2}$$
where $\eta_2$ is the multivariate relative kurtosis coefficient.
• Normal Theory Reweighted LS χ² Value
This index is displayed only if METHOD=ML. Instead of the function value $F_{ML}$, the reweighted goodness-of-fit function $F_{GWLS}$ is used:
$$\chi^2_{GWLS} = NM \cdot F_{GWLS}$$
where $F_{GWLS}$ is the value of the function at the minimum.
• Akaike's Information Criterion (AIC) (Akaike 1974; Akaike 1987)
$$AIC = \chi^2 - 2\,df$$
This is a criterion for selecting the best model among a number of candidate models. The model that yields the smallest value of AIC is considered the best.
• Consistent Akaike's Information Criterion (CAIC) (Bozdogan 1987)
$$CAIC = \chi^2 - (\ln(N) + 1)\,df$$
This is another criterion, similar to AIC, for selecting the best model among alternatives. The model that yields the smallest value of CAIC is considered the best. Some authors prefer CAIC to AIC or to the χ² test.
• Schwarz's Bayesian Criterion (SBC) (Schwarz 1978; Sclove 1987)
$$SBC = \chi^2 - \ln(N)\,df$$
This is another criterion, similar to AIC, for selecting the best model. The model that yields the smallest value of SBC is considered the best. Some authors prefer SBC to AIC or to the χ² test.
• McDonald's Measure of Centrality (McDonald and Hartmann 1992)
$$CENT = \exp\left(-\frac{\chi^2 - df}{2N}\right)$$
• Parsimonious Normed Fit Index (James, Mulaik, and Brett 1982)
The PNFI is a modification of Bentler and Bonett's normed fit index that takes parsimony of the model into account:
$$PNFI = \frac{df_{min}}{df_0}\,\frac{f_0 - f_{min}}{f_0}$$
The PNFI uses the same parsimonious factor as the parsimonious GFI of Mulaik et al. (1989).
• Z-Test (Wilson and Hilferty 1931)
The Z-test of Wilson and Hilferty assumes an n-variate normal distribution:
$$Z = \frac{\sqrt[3]{\chi^2/df} - \left(1 - \frac{2}{9\,df}\right)}{\sqrt{\frac{2}{9\,df}}}$$
Refer to McArdle (1988) and Bishop, Fienberg, and Holland (1977, p. 527) for an application of the Z-test.
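• Nonnormed Coefficient ρ (Bentler and Bonett 1980)
$$\rho = \frac{f_0/df_0 - f_{min}/df_{min}}{f_0/df_0 - 1/NM}$$
Refer to Tucker and Lewis (1973).
• Normed Coefficient Δ (Bentler and Bonett 1980)
$$\Delta = \frac{f_0 - f_{min}}{f_0}$$
Mulaik et al. (1989) recommend the parsimonious weighted form PNFI.
• Normed Index ρ1 (Bollen 1986)
$$\rho_1 = \frac{f_0/df_0 - f_{min}/df_{min}}{f_0/df_0}$$
$\rho_1$ is always less than or equal to 1; $\rho_1 < 0$ is unlikely in practice. Refer to the discussion in Bollen (1989a).
• Nonnormed Index Δ2 (Bollen 1989a)
$$\Delta_2 = \frac{f_0 - f_{min}}{f_0 - \frac{df}{NM}}$$
$\Delta_2$ is a modification of Bentler and Bonett's Δ that uses df and "lessens the dependence" on N. Refer to the discussion in Bollen (1989b). $\Delta_2$ is identical to Mulaik et al.'s (1989) IFI2 index.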
• Critical N Index (Hoelter 1983)
$$CN = \frac{\chi^2_{crit}}{F} + 1$$
where $\chi^2_{crit}$ is the critical chi-square value for the given df degrees of freedom and probability α = 0.05, and F is the value of the estimation criterion (minimization function). Refer to Bollen (1989b, p. 277). Hoelter (1983) suggests that CN should be at least 200; however, Bollen (1989b) notes that the CN value may lead to an overly pessimistic assessment of fit for small samples.
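Several of the χ²-based indices are closed-form functions of F, df, f0, df0, N, and t, so they can be sketched directly in Python. The function `chi2_indices` below is a hypothetical helper, not part of PROC CALIS. It assumes df_min > 0, f0 > 0, and $NM \cdot f_0 > df_0$ (so that the CFI denominator is positive), and it uses the GLS/WLS form of the ECVI. The RMSEA and ECVI confidence intervals, the close-fit probability, and the critical N are left out because they require the noncentral chi-squared CDF or chi-squared quantiles.

```python
import math


def chi2_indices(f_min, df_min, f0, df0, n_obs, t, cov_or_corr=True):
    """Point values of several chi-square-based fit indices (illustrative only).

    f_min, df_min : minimum function value F and df of the fitted model
    f0, df0       : function value and df of the independence model
    n_obs         : sample size N
    t             : number of estimated parameters
    cov_or_corr   : True for COV/CORR input (NM = N - 1), False for UCOV/UCORR (NM = N)
    """
    nm = n_obs - 1 if cov_or_corr else n_obs
    df = df_min
    chi2 = nm * f_min
    return {
        "chi2": chi2,
        "chi2_indep": nm * f0,
        "rmsea": math.sqrt(max(f_min / df - 1.0 / nm, 0.0)),
        # GLS/WLS form of the ECVI; the ML form divides 2t by NM - n - 1 instead.
        "ecvi": f_min + 2.0 * t / nm,
        "cfi": 1.0 - max(nm * f_min - df, 0.0) / max(nm * f0 - df0, 0.0),
        "aic": chi2 - 2.0 * df,
        "caic": chi2 - (math.log(n_obs) + 1.0) * df,
        "sbc": chi2 - math.log(n_obs) * df,
        "cent": math.exp(-(chi2 - df) / (2.0 * n_obs)),
        "pnfi": df / df0 * (f0 - f_min) / f0,
        # Wilson-Hilferty cube-root transformation of chi2 / df.
        "z": ((chi2 / df) ** (1.0 / 3.0) - (1.0 - 2.0 / (9.0 * df)))
        / math.sqrt(2.0 / (9.0 * df)),
        "rho": (f0 / df0 - f_min / df) / (f0 / df0 - 1.0 / nm),
        "delta": (f0 - f_min) / f0,
        "rho1": (f0 / df0 - f_min / df) / (f0 / df0),
        "delta2": (f0 - f_min) / (f0 - df / nm),
    }


# Hypothetical inputs: 6 manifest variables, 11 parameters, N = 201.
idx = chi2_indices(f_min=0.1, df_min=10, f0=2.0, df0=15, n_obs=201, t=11)
for name, value in idx.items():
    print(f"{name:>10s} {value:10.4f}")
```

With these made-up inputs, NM = 200, so χ² = 20, the RMSEA is √0.005 ≈ 0.071, and Δ = 0.95.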

### Squared Multiple Correlation

The following are measures of the squared multiple correlation for manifest and endogenous variables and are computed for all five estimation methods: ULS, GLS, ML, WLS, and DWLS. These coefficients are computed as in the LISREL VI program of Jöreskog and Sörbom (1985). The DETAE, DETSE, and DETMV determination coefficients are intended to be global means of the squared multiple correlations for different subsets of model equations and variables. These coefficients are displayed only when you specify the PDETERM option with a RAM or LINEQS model.

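• R² Values Corresponding to Endogenous Variables
$$R^2_i = 1 - \frac{\widehat{\mathrm{var}}(\zeta_i)}{\widehat{\mathrm{var}}(\eta_i)}$$
• Total Determination of All Equations
$$DETAE = 1 - \frac{|\hat{\Theta}, \hat{\Psi}|}{|\widehat{\mathrm{Cov}}(y, \eta)|}$$
• Total Determination of the Structural Equations
$$DETSE = 1 - \frac{|\hat{\Psi}|}{|\widehat{\mathrm{Cov}}(\eta)|}$$
• Total Determination of the Manifest Variables
$$DETMV = 1 - \frac{|\hat{\Theta}|}{|S|}$$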

Caution: In the LISREL program, the structural equations are defined by specifying the BETA matrix. In PROC CALIS, a structural equation has a dependent left-hand-side variable that appears at least once on the right-hand side of another equation, or the equation has at least one right-hand-side variable that is the left-hand-side variable of another equation. Therefore, PROC CALIS sometimes identifies more equations as structural equations than the LISREL program does.
