HLM2 provides the data analyst with a means of checking the fit and distributional assumptions of the model by producing residual files for the level-1 and level-2 models. These files may be requested using the Basic Model Specifications dialog box. The level-1 and level-2 residual files will be written as SPSS, SAS, STATA, SYSTAT, or ASCII data files. In the case of SPSS and STATA, the residual files are written so that the respective packages can use them immediately. For the other formats, the files must be submitted to the respective package as command streams.
The level-1 residual file
The level-1 residual file will contain level-1 residuals (the differences between the observed and fitted values), the fitted values, the square root of $\sigma^2$, the values of the level-1 and level-2 predictors entered in the model, and those of other level-1 and level-2 variables selected by the user.
The level-2 residual file
This file will contain the EB residuals (see Equation 1.10 above), OL residuals (see Equation 1.9 above), and fitted values, i.e.,

$$\hat{\gamma}_{q0} + \sum_{s=1}^{S_q} \hat{\gamma}_{qs} W_{sj}$$

for each level-1 coefficient. By adding the OL residuals to the corresponding fitted values, the analyst can also obtain the OL estimate of the corresponding level-1 coefficient, $\hat{\beta}_{qj}$. The file also contains the EB estimate of each level-1 coefficient, $\beta^{*}_{qj}$.
In addition, the file will contain Mahalanobis distances (which are discussed below), estimates of the total and residual standard deviations (log metric) within each unit, the values of the predictors used in the level-2 model, and any other level-2 prediction variables selected by the user.
The residual file contains a single record per unit. The first variable in this file is the level-2 unit ID (here named l2ID), followed by the number of level-1 units within that level-2 unit (denoted by nj), and various summary statistics (chipct through mdrsvar, explained below). These are followed by the two EB residuals (ebintrcp and ebses); the two OL residuals (olintrcp and olses); and the fitted values, that is, the predicted values based on the estimated level-2 model (fvintrcp and fvses). Next are the EB coefficients (ecintrcp and ecses), which are the sum of the fitted values and the EB residuals. The posterior variances and covariances of the level-2 residuals are given next (pv00 for the posterior variance of the intercept residual, pv10 for the posterior covariance between the intercept residual and the slope residual, and pv11 for the posterior variance of the slope residual). Next are the corresponding posterior variances and covariances of the random intercept and coefficient (pvc00 for the posterior variance of the random intercept, pvc10 for the posterior covariance between the random intercept and the random slope, and pvc11 for the posterior variance of the random slope). Finally, the file lists the level-2 predictors used in the analysis, plus any additional level-2 predictors the user requested for inclusion.
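The relationships among these columns can be sketched in a few lines of Python. This is only an illustration: the column names come from the paragraph above, but the record itself, its ID, and the helper `coefficient_estimates` are invented for the example and are not output from HLM2.

```python
# One made-up record from a level-2 residual file. The values are invented;
# only the column names (fv*, ol*, eb*) follow the naming described in the text.
record = {
    "l2ID": "unit-A",  # hypothetical unit ID
    "fvintrcp": 12.50, "fvses": 2.10,   # fitted values
    "olintrcp": -1.20, "olses": 0.40,   # OL residuals
    "ebintrcp": -0.90, "ebses": 0.15,   # EB residuals
}

def coefficient_estimates(rec, suffix):
    """Return (OL estimate, EB estimate) for one level-1 coefficient.

    Each estimate is the fitted value plus the corresponding residual.
    """
    fitted = rec["fv" + suffix]
    ol_estimate = fitted + rec["ol" + suffix]
    eb_estimate = fitted + rec["eb" + suffix]
    return ol_estimate, eb_estimate

for suffix in ("intrcp", "ses"):
    ol, eb = coefficient_estimates(record, suffix)
    print(f"{suffix}: OL estimate = {ol:.2f}, EB estimate = {eb:.2f}")
```

The EB estimate computed this way should agree with the ecintrcp and ecses columns already present in the file, so the same arithmetic can serve as a quick consistency check after import.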
While most of this is straightforward, the information contained in the first set of variables for each unit merits elaboration. nj is the number of cases for level-2 unit $j$. It is followed by two variables, chipct and mdist. If we model $q$ level-1 coefficients, mdist would be the Mahalanobis distance (i.e., the standardized squared distance of a unit from the center of a $v$-dimensional distribution, where $v$ is the number of random effects per unit). Essentially, mdist provides a single, summary measure of the distance of a unit's EB estimates, $\beta^{*}_{qj}$, from its "fitted value," $\hat{\gamma}_{q0} + \sum_{s=1}^{S_q} \hat{\gamma}_{qs} W_{sj}$.
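For readers who want to see what a standardized squared distance looks like numerically, the following is a minimal sketch of the generic Mahalanobis formula for the two-random-effect case (intercept and slope). The text does not say which dispersion matrix HLM2 uses when it computes mdist, so the function name `mahalanobis_sq_2d`, the matrix passed to it, and the numbers are all assumptions made for illustration.

```python
def mahalanobis_sq_2d(deviation, dispersion):
    """Squared standardized distance d' V^{-1} d for a 2-vector d and a 2x2 matrix V."""
    d0, d1 = deviation
    (v00, v01), (v10, v11) = dispersion
    det = v00 * v11 - v01 * v10
    if v00 <= 0 or det <= 0:
        raise ValueError("dispersion matrix must be positive definite")
    # Inverse of a 2x2 matrix written out explicitly.
    i00, i01 = v11 / det, -v01 / det
    i10, i11 = -v10 / det, v00 / det
    return d0 * (i00 * d0 + i01 * d1) + d1 * (i10 * d0 + i11 * d1)

# Invented deviations of a unit's two EB estimates from their fitted values,
# and an invented dispersion matrix; neither comes from an HLM2 run.
deviation = (-0.90, 0.15)
dispersion = ((4.0, 0.5), (0.5, 1.0))
print(round(mahalanobis_sq_2d(deviation, dispersion), 4))
```

With an identity dispersion matrix the result reduces to the ordinary squared Euclidean distance, which is a convenient sanity check on the arithmetic.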
Note that the units in the residual file are sorted in ascending order by mdist. If the normality assumption is true, then the Mahalanobis distances should be distributed approximately $\chi^2_{(v)}$. Analogous to univariate normal probability plotting, we can construct a Q-Q plot of mdist vs. chipct. The chipct values are the expected values of the order statistics for a sample of size $J$ selected from a population that is distributed $\chi^2_{(v)}$. If the Q-Q plot resembles a 45 degree line, we have evidence that the random effects are distributed $v$-variate normal. In addition, the plot will help us detect outlying units (i.e., units with large mdist values well above the 45 degree line). It should be noted that such plots are good diagnostic tools only when the level-1 sample sizes, nj, are at least moderately large. (For further discussion see Hierarchical Linear Models, pp. 274-280.)
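The idea behind the Q-Q comparison can be sketched without any plotting library. The example below assumes two random effects per unit, so the reference distribution is chi-square with 2 degrees of freedom, whose quantile function has a closed form. The plotting positions (i - 0.5) / J are a common approximation chosen for this sketch; they are not necessarily how HLM2 computes chipct, and the distances are invented.

```python
import math

def chi2_quantile_2df(p):
    """Quantile of the chi-square distribution with 2 degrees of freedom.

    For 2 df the CDF is 1 - exp(-x / 2), so the inverse has a closed form.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return -2.0 * math.log(1.0 - p)

def expected_quantiles(n_units):
    """Approximate expected order statistics via plotting positions (i - 0.5) / J."""
    return [chi2_quantile_2df((i - 0.5) / n_units) for i in range(1, n_units + 1)]

def qq_pairs(mdist_values):
    """Pair each sorted distance with its reference quantile, ready for plotting."""
    ordered = sorted(mdist_values)
    return list(zip(expected_quantiles(len(ordered)), ordered))

# Invented distances for five units, purely to show the call.
example_mdist = [0.31, 0.85, 1.40, 2.60, 9.75]
for expected, observed in qq_pairs(example_mdist):
    print(f"expected {expected:6.3f}   observed {observed:6.3f}   "
          f"difference {observed - expected:+7.3f}")
```

Points whose observed distance greatly exceeds the reference quantile correspond to units lying well above the 45 degree line. In practice the chipct column written by HLM2 can be plotted against mdist directly, so a recomputation like this is mainly useful for understanding what the column represents.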
After mdist, three estimates of the level-1 variability are given:
- The natural logarithm of the total standard deviation within each unit, lntotvar.
- The natural logarithm of the residual standard deviation within each unit based on its least squares regression, olsrsvar. Note, this estimate exists only for those units which have sufficient data to compute level-1 OLS estimates.
- The natural logarithm of the residual standard deviation from the final fitted fixed effects model, mdrsvar.
Each of these three standard deviations is reported as its natural log, with the addition of a bias-correction factor for varying degrees of freedom (see Hierarchical Linear Models, p. 219). These statistics can be used as input for the V-known option in HLM2 in research on group-level correlates of diversity (Raudenbush & Bryk, 1987).
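As a rough illustration of what such a statistic looks like, the sketch below assumes the bias correction takes the form $\ln(S_j) + 1/(2f_j)$, with approximate sampling variance $1/(2f_j)$, where $S_j$ is a unit's standard deviation and $f_j$ its degrees of freedom. That form is an assumption about the cited references, not something stated in this section, and the function name and input values are hypothetical.

```python
import math

def log_sd_with_correction(sd, df):
    """Return (d, v): a bias-corrected log standard deviation and its approximate variance.

    Assumed form: d = ln(sd) + 1 / (2 * df), v = 1 / (2 * df).
    """
    if sd <= 0 or df <= 0:
        raise ValueError("sd and df must both be positive")
    correction = 1.0 / (2.0 * df)
    return math.log(sd) + correction, correction

# Invented example: a unit with standard deviation 6.2 on 24 degrees of freedom.
d, v = log_sd_with_correction(6.2, 24)
print(f"d = {d:.4f}, approximate variance = {v:.4f}")
```

Under this assumption, the estimate and variance would form the kind of pair that a V-known analysis takes as input for each unit, with units having fewer degrees of freedom receiving a larger correction and a larger variance.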
An example of an SPSS version of a level-2 residual file is shown below. Only the data from the first ten units and the first eight variables are reproduced here. This file can be used to construct various diagnostic plots.
