  TESTFACT: Adaptive item factor analysis and factor score (EAP) estimation

This example analyzes 32 items selected from the 48-item version of the Jenkins Activity Survey for Health Prediction, Form B (Jenkins, Rosenman, and Zyzanski, 1972). The data are responses of 598 men from central Finland drawn from a larger survey sample. Most of the items are rated on three-point scales representing little or no, occasional, or frequent occurrence of the activity or behavior in question. For purposes of the present analysis, the scales have been dichotomized near the median. Wording in the positive or negative direction varies from item to item as follows (item numbers are those of the original pool of items from which those of the present form were selected):

-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,+Q251,+Q252,
+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,+Q262,+Q263,+Q264,
+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,-Q273,-Q274,-Q275,+Q276,
+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,+Q310,+Q311,-Q312,-Q313,-Q314.

The first 7 lines of the data file exampl03.dat are shown below.

201000220122112221022212202112211101122112222000
001221211011100111111111111110111102211111211020
0010.02100222122021221222112112212.0011111222001
002020220212012120011112112221221022211111222202
201000221000211221221112012211122112211111222000
001001221022011120022222212222211101121112222101
102100111022112120021212212221121212111022200021

The first 10 columns of each record are used as case identification and are read first. Starting again in the first column by using the 'T' operator, the responses to the 48 items are read as single-character fields (48A1).

(10A1,T1,48A1)
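As a rough illustration of what this format statement does, the Python sketch below slices one record the same way: 10 one-character ID fields, then a tab back to column 1 and 48 one-character item fields. It assumes each record is a plain text line; the function name `parse_record` is hypothetical and not part of TESTFACT.

```python
# Sketch of the (10A1,T1,48A1) format statement applied to one record.
NIDW = 10    # width of the case ID (NIDW on the INPUT command)
NITEM = 48   # number of items (NITEM on the PROBLEM command)


def parse_record(line: str) -> tuple[str, list[str]]:
    """Return (case_id, item_responses) for one fixed-column record."""
    case_id = line[0:NIDW]            # 10A1: columns 1-10
    # T1 moves the read position back to column 1, so the 48 single-character
    # item fields (48A1) start in the same column as the ID.
    responses = list(line[0:NITEM])
    if len(responses) != NITEM:
        raise ValueError(f"expected {NITEM} item columns, got {len(responses)}")
    return case_id, responses


first = "201000220122112221022212202112211101122112222000"
case_id, responses = parse_record(first)
print(case_id, len(responses))   # 2010002201 48
```

The ID printed here matches the `ID=2010002201` reported for the first observation in the Phase 1 output.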

The SELECT keyword on the PROBLEM command indicates that 32 items are selected from the original 48 items. The SELECT command lists the selected items in the order in which they will be used. The RESPONSE command lists the 5 response codes indicated on the PROBLEM command (RESPONSES keyword), and the KEY command provides the correct response for each of the 48 items. The NOTPRESENTED option on the PROBLEM command is required if one of the response codes identifies not-presented items. The '.' code on the RESPONSE command identifies these responses.
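The SELECT list uses a compact range notation. Reading `a(s)b` as "from a to b in steps of s" yields exactly 32 items, and applying those positions to the 48-character KEY reproduces the 32-character answer key that TESTFACT prints in its Phase 1 output. The sketch below assumes that reading of the notation; `expand_select` is a hypothetical helper, not a TESTFACT routine.

```python
def expand_select(spec: str) -> list[int]:
    """Expand a SELECT list such as '3,5,11(1)14' into 1-based item numbers.

    Assumption: 'a(s)b' means a, a+s, ..., b (inclusive).
    """
    items: list[int] = []
    for token in spec.split(","):
        token = token.strip()
        if "(" in token:
            start, rest = token.split("(")
            step, stop = rest.split(")")
            items.extend(range(int(start), int(stop) + 1, int(step)))
        else:
            items.append(int(token))
    return items


SELECT = "3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48"
KEY = "002000220022222220022222202222220002220022222000"

selected = expand_select(SELECT)
selected_key = "".join(KEY[i - 1] for i in selected)  # convert to 0-based index
print(len(selected))    # 32
print(selected_key)     # 20020222220022222022222002002200
```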

The TETRACHORIC command requests that the coefficients be printed to 3 decimal places (NDEC = 3) in the output file (LIST option). The tetrachoric correlation matrix, item parameters, rotated factor loadings, and factor scores will be saved in the files exampl03.cor, exampl03.par, exampl03.rot, and exampl03.fsc, respectively, as specified on the SAVE command. The FACTOR and FULL commands specify the parameters for the full-information item factor analysis. Three factors and ten latent roots are to be extracted, as indicated by the NFAC and NROOT keywords, respectively. A VARIMAX rotation is requested (ROTATE = VARIMAX); note that this keyword may not be abbreviated on the FACTOR command. A maximum of 80 EM cycles will be performed (CYCLES keyword on the FULL command). The convergence criterion for the EM cycles is given by the PRECISION keyword on the TECHNICAL command.

Cases will be scored by EAP (Expected A Posteriori, or Bayes) estimation with adaptive quadrature (METHOD = 2 on the SCORE command). Posterior standard deviations will also be computed. Results will be saved in the exampl03.fsc file (FSCORE option on the SAVE command). The factor scores for the first 20 cases will be listed in the output file (LIST = 20). See the next example for MAP (Maximum A Posteriori, or Bayes Modal) estimation for the same cases.

>TITLE
   ITEMS FROM THE JENKINS ACTIVITY SURVEY
       ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
       Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
       Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
       Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE  '8','0','1','2','.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT  3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE;
>INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';
(10A1,T1,48A1)
>STOP
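The idea behind EAP scoring can be illustrated in one dimension. The sketch below is a simplified, hypothetical example: it assumes a single factor, a standard normal prior, a normal-ogive response function with made-up item parameters, and a fixed, evenly spaced grid of points. TESTFACT's METHOD = 2 uses adaptive quadrature over the three factors fitted here, so this shows only the principle: the EAP score is the posterior mean, and the posterior standard deviation measures its precision.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): prior density and normal-ogive link


def eap_score(responses, slopes, intercepts, n_points=81, bound=4.0):
    """EAP estimate and posterior SD of one latent factor.

    responses: 1 (correct), 0 (incorrect) or None (not presented).
    slopes, intercepts: hypothetical item parameters for the normal-ogive
    model P(correct | theta) = Phi(intercept + slope * theta).
    A fixed grid on [-bound, bound] stands in for adaptive quadrature.
    """
    step = 2.0 * bound / (n_points - 1)
    grid = [-bound + k * step for k in range(n_points)]
    weights = []
    for theta in grid:
        w = STD_NORMAL.pdf(theta)                  # prior density
        for x, a, c in zip(responses, slopes, intercepts):
            if x is None:                          # not presented: no information
                continue
            p = STD_NORMAL.cdf(c + a * theta)
            w *= p if x == 1 else 1.0 - p          # likelihood of the response
        weights.append(w)
    total = sum(weights)
    eap = sum(t * w for t, w in zip(grid, weights)) / total
    post_var = sum((t - eap) ** 2 * w for t, w in zip(grid, weights)) / total
    return eap, sqrt(post_var)


# Hypothetical five-item example (illustrative parameters, not TESTFACT output)
slopes = [1.2, 0.8, 1.0, 1.5, 0.6]
intercepts = [0.0, -0.5, 0.3, 0.0, 0.2]
print(eap_score([1, 1, 0, 1, None], slopes, intercepts))
```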

  Discussion of output

The first part of the output contains the name of the command file (exampl03.tsf) and the name of the output file (exampl03.out). Each TESTFACT run produces output under one or more of the following headings, depending on the type of analysis.

The analysis specified in exampl03.tsf produces Phase 0, Phase 1, Phase 2, Phase 5 and Phase 7 output.

  Phase 0: Input commands

Regardless of the type of analysis, TESTFACT produces Phase 0 output, an echo of the input commands contained in the *.tsf file.

PHASE 0: INPUT COMMANDS
   ITEMS FROM THE JENKINS ACTIVITY SURVEY
       ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------
 >PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
 This example analyzes 32 items selected from the 48-item version
 of the Jenkins Activity Survey for Health Prediction, Form B (Jenkins,
 Rosenman, and Zyzanski, 1972). The data are responses of 598 men from
 central Finland drawn from a larger survey sample. Most of the items
 are rated on three-point scales representing little or no, occasional,
 or frequent occurrence of the activity or behavior in question. For
 purposes of the present analysis, the scales have been dichotomized
 near the median. Wording in the positive or negative direction varies
 from item to item as follows (item numbers are those of the original
 pool of items from which those of the present form were selected):

-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,
+Q251,+Q252,+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,
+Q262,+Q263,+Q264,+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,
-Q273,-Q274,-Q275,+Q276,+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,
+Q310,+Q311,-Q312,-Q313,-Q314.

The tetrachoric correlation matrix, item parameters, rotated factor
 loadings, and the factor scores will be saved in the files EXAMPL03.COR,
 EXAMPL03.PAR, EXAMPL03.ROT, and EXAMPL03.FSC, respectively.

Cases will be scored by EAP (Expected A Posteriori, or Bayes)
 estimation with adaptive quadrature (Method 2). Posterior standard
 deviations will also be computed. Results will be saved in the
 EXAMPL03.FSC file. See Exampl3a.tsf for MAP (Maximum A Posteriori,
 or Bayes Modal) estimation for the same cases.

>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
       Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
       Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
       Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
 >RESPONSE '8','0','1','2','.';
 >KEY 002000220022222220022222202222220002220022222000;
 >SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
 >TETRACHORIC LIST, NDEC=3;
 >FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
 >FULL CYCLES=80;
 >TECHNICAL PRECISION=0.005;
 >SCORE METHOD=2,LIST=20;
 >SAVE CORR,PARM,FSCORE;
 >INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';

   DATA FILE NAME IS EXAMPL03.DAT

DATA FORMAT=
 (10A1,T1,48A1)

  Phase 1: Data description

Values of the response categories (8, 0, 1, 2, .), the answer key, the contents of the first observation, the sum of weights, and the number of records are given. This information enables you to verify that the data values were read correctly from the data file exampl03.dat. The response categories indicate a code of '8' for omitted responses (first value) and a code of '.' for not-presented items (last value).

Thirty-two items were selected from the 48-item test. Based on the answer key, a total score is computed for each of the 598 respondents. Each item response is classified as right, wrong, omitted, or not presented. For item $j$, $j = 1, 2, \ldots, 32$, the response of person $i$, $i = 1, 2, \ldots, 598$, can be written as

$$x_{ij} = \begin{cases} 1 & \text{if the response is correct,} \\ 0 & \text{if the response is incorrect.} \end{cases}$$

At your option, omitted items can be considered either wrong or not presented. The total test score $X_i$ for person $i$ is

$$X_i = \sum_{j=1}^{32} x_{ij}.$$

Respondent 1, for example, has a total score of 19 correct out of a possible 32 as shown below.

Answer key:

20020222220022222022222002002200

Respondent 1:

10020221121022212021121101211200
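The count of 19 can be checked mechanically. This minimal sketch compares respondent 1's selected responses with the selected answer key position by position. It assumes omitted ('8') responses are scored as wrong and that not-presented ('.') items contribute nothing, which is one of the options described above; `total_score` is a hypothetical helper.

```python
def total_score(responses: str, key: str, not_presented: str = ".") -> int:
    """Count items whose response code equals the keyed (correct) code.

    Not-presented items add nothing; every other non-matching code,
    including the omit code '8', is treated as wrong (0).
    """
    if len(responses) != len(key):
        raise ValueError("responses and key must have the same length")
    return sum(1 for r, k in zip(responses, key) if r != not_presented and r == k)


selected_key = "20020222220022222022222002002200"
respondent_1 = "10020221121022212021121101211200"
print(total_score(respondent_1, selected_key))   # 19
```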

ITEMS FROM THE JENKINS ACTIVITY SURVEY
       ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------
      RESPONSE CATEGORIES: 8 0 1 2 .
      ANSWER KEY: 20020222220022222022222002002200

CONTENTS OF FIRST OBSERVATION:
      ID=2010002201
      WEIGHT=         1
      ITEM RESPONSES= 201000220122112221022212202112211101122112222000
      ITEM RESPONSES AFTER SELECTION =
                     10020221121022212021121101211200

SUM OF WEIGHTS =       598
NUMBER OF RECORDS=      598

Using this information, a frequency table of the score distribution is calculated and presented graphically.

PHASE 1: HISTOGRAM AND BASIC STATISTICS

ITEMS FROM THE JENKINS ACTIVITY SURVEY
       ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------

MAIN TEST HISTOGRAM

FREQUENCY :
       |
       |
       |                **
       |              ****
       |             *****
   8.0+             *****
       |             *****
       |             *****
       |             ***** *
       |           * ***** *
       |           *********
       |          **********
       |          ***********
       |          ***********
       |          ***********
   4.0+          ***********
       |          ***********
       |         *************
       |        **************
       |        **************
       |        **************
       |        ***************
       |        ****************
       |      *******************
       |      *******************
   0.0+-----+----+----+----+----+----+----+----+----+----+----+----+----+--
       0.   5.   10. 15.  20. 25.  30.
SCORES

NUMBER OF OBSERVATIONS AT EACH SCORE
   SCORE    COUNT  FREQ |  SCORE     COUNT FREQ |   SCORE    COUNT  FREQ
      0         0   0.0 |    11        35   5.9 |    22        21   3.5
      1         0   0.0 |    12        40   6.7 |    23        10   1.7
      2         0   0.0 |    13        38   6.4 |    24         8   1.3
      3         0   0.0 |    14        52   8.7 |    25         6   1.0
      4         1   0.2 |    15        54   9.0 |    26         1   0.2
      5         2   0.3 |    16        54   9.0 |    27         1   0.2
      6         1   0.2 |    17        56   9.4 |    28         0   0.0
      7         5   0.8 |    18        57   9.5 |    29         0   0.0
      8         7   1.2 |    19        36   6.0 |    30         0   0.0
      9        18   3.0 |    20        43   7.2 |    31         0   0.0
     10        20   3.3 |    21        32   5.4 |    32         0   0.0

 

The last portion of the Phase 1 output gives the mean (15.9) and standard deviation (4.0) of the Total Scores.

TEST    RECORD     NUMBER     MEAN     S.D.   PROPORTION     S.D.
MAIN       598       598      15.9     4.0      0.497      0.500

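The proportion of correct responses, $p$, is the mean total score divided by the number of items:

$$p = \frac{\bar{X}}{32} = \frac{15.91}{32} = 0.497,$$

with a standard deviation

$$\sqrt{p(1-p)} = \sqrt{0.497 \times 0.503} = 0.500.$$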
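These summary values can be recomputed from the score-frequency table above. The sketch expands the counts into individual scores and uses Python's `statistics` module. It uses the population standard deviation (divisor N); the divisor TESTFACT uses is not stated here, but with N = 598 the divisor N - 1 gives the same value to two decimals.

```python
from math import sqrt
from statistics import mean, pstdev

# Score -> count from NUMBER OF OBSERVATIONS AT EACH SCORE (zero counts omitted)
counts = {4: 1, 5: 2, 6: 1, 7: 5, 8: 7, 9: 18, 10: 20,
          11: 35, 12: 40, 13: 38, 14: 52, 15: 54, 16: 54, 17: 56,
          18: 57, 19: 36, 20: 43, 21: 32,
          22: 21, 23: 10, 24: 8, 25: 6, 26: 1, 27: 1}
N_ITEMS = 32

scores = [score for score, n in counts.items() for _ in range(n)]
m = mean(scores)
sd = pstdev(scores)          # population SD (divisor N)
p = m / N_ITEMS              # proportion of correct responses
sd_p = sqrt(p * (1.0 - p))   # SD of a 0/1 item score with mean p
print(len(scores), round(m, 2), round(sd, 2), round(p, 3), round(sd_p, 3))
# 598 15.91 4.01 0.497 0.5
```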

  Phase 2: Item statistics

For each item, eight statistics are produced. The NUMBER, MEAN, and S.D. for item 2, for example, are 590, 15.92, and 4.03, respectively. These values are obtained by 'deleting' each row of the data in which a not-presented code is encountered for item 2. Since 8 rows contain not-presented codes for this item, the mean and standard deviation of the total scores are calculated for the remaining 590 cases. Note, for example, that item 1 was presented to all 598 persons, while item 4 was presented to 592 persons.

PHASE 2: ITEM STATISTICS

ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
-----------------------------------------------------------

MAIN TEST ITEM STATISTICS

  ITEM         NUMBER   MEAN  S.D.  RMEAN FACILITY  DIFF     BIS  P.BIS
 1 Q158           598  15.91  4.01  14.46   0.206  16.29 -0.262  -0.185
 2 Q166           590  15.92  4.03  17.13   0.653  11.43  0.532   0.413
 3 Q167           596  15.90  4.01  16.35   0.790   9.77  0.305   0.215
 4 Q247           592  15.93  4.01  16.71   0.694  10.97  0.384   0.292
 5 Q249           594  15.92  4.01  15.89   0.466  13.34 -0.008  -0.006
 6 Q251           598  15.91  4.01  17.16   0.532  12.68  0.417   0.332
 7 Q252           598  15.91  4.01  17.39   0.490  13.10  0.451   0.360
 8 Q253           598  15.91  4.01  18.16   0.410  13.91  0.591   0.467
 9 Q254           597  15.91  4.02  18.99   0.203  16.33  0.551   0.387
10 Q257           597  15.92  4.01  17.99   0.449  13.51  0.585   0.466
   ...
31 Q313           597  15.91  4.02  16.31   0.843   8.98  0.349   0.231
32 Q314           594  15.93  4.02  16.86   0.586  12.13  0.351   0.278

The mean score for those subjects who get a specific item correct is denoted by RMEAN. For example, since 385 respondents selected the correct response for item 2, RMEAN for item 2 is calculated as the mean of the corresponding 385 Total Scores and equals 17.13.

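The item facility (FACILITY) is the proportion of correct responses to a specific item among the respondents who were presented with it. For example, 385 of the 590 respondents presented with item 2 selected the correct response, and hence

$$\text{FACILITY}_2 = \frac{385}{590} = 0.653.$$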

The delta statistic ($\Delta$, or DIFF) is calculated as

$$\Delta = 13 - 4\,\Phi^{-1}(p),$$

where $p$ is the item facility and $\Phi^{-1}$ denotes the inverse normal transformation. This statistic has an effective range of 1 to 25, with a mean and standard deviation of 13 and 4, respectively.
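A minimal sketch of the two statistics, using `statistics.NormalDist().inv_cdf` for the inverse normal transformation. The function names are hypothetical; the printed item 2 values (0.653 and 11.43) are reproduced from the counts given above.

```python
from statistics import NormalDist


def facility(n_correct: int, n_presented: int) -> float:
    """Proportion correct among the respondents who were presented the item."""
    return n_correct / n_presented


def delta(p: float) -> float:
    """Delta difficulty: 13 - 4 * inverse-normal(p), i.e. mean 13, SD 4."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)


p_item2 = facility(385, 590)
print(round(p_item2, 3), round(delta(p_item2), 2))   # 0.653 11.43
```

Easy items (large $p$) get small deltas, which is why item 31 (facility 0.843) has the smallest DIFF in the table.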

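The last two statistics are the biserial (BIS) and point biserial (P.BIS) correlations. The formula for the sample point biserial correlation is

$$r_{pb} = \frac{\mu_p - \mu}{\sigma}\sqrt{\frac{p}{1-p}},$$

where $\mu_p$ is RMEAN, $\mu$ and $\sigma$ are the MEAN and S.D. of the total scores, and $p$ is the item facility.

For item 8, for example,

$$r_{pb} = \frac{18.16 - 15.91}{4.01}\sqrt{\frac{0.410}{0.590}} \approx 0.468,$$

which agrees with the printed value of 0.467 to within the rounding of the tabled inputs.

The point biserial correlation is the correlation between the item score and the total (or subtest) score. Theoretically $-1 \le r_{pb} \le 1$, but in practice its attainable range is considerably narrower. Therefore, 0.467 indicates a relatively strong association between item 8 and the total score.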

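The formula for calculating the sample biserial correlation coefficient, BIS, is

$$r_{bis} = \frac{\mu_p - \mu}{\sigma}\,\frac{p}{h(z_p)},$$

where $z_p = \Phi^{-1}(p)$ and $h(z_p)$ is the ordinate of the standard normal density at $z_p$.

Consider, for example, the item 3 facility, which equals 0.790. From the inverse normal tables, this corresponds to a $z$-value of 0.8062, at which the normal ordinate is $h(0.8062) = 0.2883$.

For item 3,

$$r_{bis} = \frac{16.35 - 15.90}{4.01} \times \frac{0.790}{0.2883} \approx 0.31,$$

consistent with the printed value of 0.305 given the rounding of the tabled inputs.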
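Both correlations can be computed directly from the printed RMEAN, MEAN, S.D., and FACILITY columns. This sketch uses the rounded table values, so its results differ from the printed coefficients in the third decimal; the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def point_biserial(rmean: float, mean: float, sd: float, p: float) -> float:
    """(RMEAN - MEAN) / S.D. * sqrt(p / (1 - p))."""
    return (rmean - mean) / sd * sqrt(p / (1.0 - p))


def biserial(rmean: float, mean: float, sd: float, p: float) -> float:
    """(RMEAN - MEAN) / S.D. * p / h(z_p), h = standard normal density."""
    z_p = STD_NORMAL.inv_cdf(p)
    return (rmean - mean) / sd * p / STD_NORMAL.pdf(z_p)


# Item 8 (Q253) and item 3 (Q167), from the rounded Phase 2 table values
print(round(point_biserial(18.16, 15.91, 4.01, 0.410), 3))  # 0.468 (table: 0.467)
print(round(biserial(18.16, 15.91, 4.01, 0.410), 3))        # 0.592 (table: 0.591)
print(round(biserial(16.35, 15.90, 4.01, 0.790), 3))        # 0.308 (table: 0.305)
```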

  Phase 5: Tetrachoric correlations

The first part of the output contains, for each selected item, the Number of Cases, Percent Correct, Percent Omitted, Percent Not Reached and Percent Not Presented.

PHASE 5:  TETRACHORIC CORRELATIONS

ITEMS FROM THE JENKINS ACTIVITY SURVEY
       ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
 ------------------------------------------------------------

MAIN TEST MISSING RESPONSE INFORMATION
----------------------------------------------------------------------------
 ITEM        NUMBER     PERCENT     PERCENT     PERCENT      PERCENT
            OF CASES   CORRECT     OMITTED   NOT REACHED NOT PRESENTED
----------------------------------------------------------------------------
   1. Q158        598        20.6        0.0         0.0        0.0
   2. Q166        590        64.4        0.0         0.0        1.3
   3. Q167        596        78.8        0.0         0.0        0.3
   4. Q247        592        68.7        0.0         0.0        1.0
   5. Q249        594        46.3        0.0         0.0        0.7
       ...
   31. Q313        597        84.1        0.0         0.0        0.2
   32. Q314        594        58.2        0.0         0.0        0.7
----------------------------------------------------------------------------

This summary indicates that there were no omitted codes in the data and that all 598 respondents reached the end of the test (0.0 percent not reached). The percent Not Presented varies from 0.0 to a maximum of 1.3 for item 2. For item 2, this percentage is calculated as

$$\frac{598 - 590}{598} \times 100 = 1.3.$$

Note that the Percent Correct is calculated here as the number of respondents who selected the correct answer divided by the total number of cases (598), not by the number of respondents presented with the item. For item 2,

$$\frac{385}{598} \times 100 = 64.4.$$

This value differs from the facility estimate (385/590 = 0.653) given under Phase 2 of the output.
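The difference between the two denominators is easy to see side by side. A minimal sketch, with hypothetical function names, using the item 2 counts quoted above:

```python
N_TOTAL = 598  # respondents in the data file


def percent_not_presented(n_cases: int, n_total: int = N_TOTAL) -> float:
    """Share of all respondents who were not presented the item, in percent."""
    return 100.0 * (n_total - n_cases) / n_total


def percent_correct(n_correct: int, n_total: int = N_TOTAL) -> float:
    """Phase 5 style: divide by all cases, not only those presented the item."""
    return 100.0 * n_correct / n_total


print(round(percent_not_presented(590), 1))   # 1.3
print(round(percent_correct(385), 1))         # 64.4
print(round(385 / 590, 3))                    # 0.653 (Phase 2 FACILITY)
```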

Display 1: Tetrachoric correlation matrix

The tetrachoric correlation coefficient is widely used as a measure of association between two dichotomous items. Tetrachoric correlations are obtained by hypothesizing, for each item, the existence of a continuous 'latent' variable underlying the 'right-wrong' dichotomy imposed in scoring. It is additionally hypothesized that, for each pair of items, the corresponding two continuous 'latent' variables have a bivariate normal distribution.
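Under these hypotheses, the tetrachoric correlation for a pair of items is the value of $\rho$ for which a standard bivariate normal distribution, cut at thresholds matching the two proportions correct, reproduces the observed proportion of respondents right on both items. The sketch below finds that $\rho$ by bisection, evaluating the bivariate normal probability with Simpson's rule. This is a simple stand-in chosen for clarity; TESTFACT's own estimation routine is not described here, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

N01 = NormalDist()


def p_both_right(t1: float, t2: float, rho: float,
                 n: int = 2000, upper: float = 8.0) -> float:
    """P(X > t1, Y > t2) for a standard bivariate normal with correlation rho.

    Integrates phi(x) * P(Y > t2 | X = x) over (t1, upper) by Simpson's rule.
    """
    if t1 >= upper:
        return 0.0
    s = sqrt(1.0 - rho * rho)
    h = (upper - t1) / n                      # n must be even for Simpson's rule
    total = 0.0
    for k in range(n + 1):
        x = t1 + k * h
        f = N01.pdf(x) * (1.0 - N01.cdf((t2 - rho * x) / s))
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * f
    return total * h / 3.0


def tetrachoric(n11: int, n10: int, n01: int, n00: int, tol: float = 1e-6) -> float:
    """Tetrachoric correlation from a 2x2 table (1 = right, 0 = wrong).

    n10 = right on item 1 and wrong on item 2; n01 = the reverse.
    Requires non-zero margins (otherwise the thresholds are infinite).
    """
    n = n11 + n10 + n01 + n00
    p1 = (n11 + n10) / n                      # proportion right on item 1
    p2 = (n11 + n01) / n                      # proportion right on item 2
    t1 = N01.inv_cdf(1.0 - p1)                # latent threshold for item 1
    t2 = N01.inv_cdf(1.0 - p2)                # latent threshold for item 2
    target = n11 / n
    lo, hi = -0.9999, 0.9999
    while hi - lo > tol:                      # P(both right) increases with rho
        mid = (lo + hi) / 2.0
        if p_both_right(t1, t2, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


print(round(tetrachoric(400, 200, 200, 400), 3))   # 0.5
```

The check value follows from the known result that, with both thresholds at zero, the probability of "right on both" equals $1/4 + \arcsin(\rho)/(2\pi)$, which is $1/3$ when $\rho = 0.5$.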

AVERAGE TETRACHORIC CORRELATION = 0.0654
     STANDARD DEVIATION =   0.2384
     NUMBER OF VALID ITEM PAIRS =   496

DISPLAY   1.  TETRACHORIC CORRELATION MATRIX

                     1       2       3       4       5       6
                    Q158     Q166    Q167     Q247    Q249     Q251
     1  Q158      1.000
     2  Q166      -0.383   1.000
     3  Q167      -0.145   0.124   1.000
     4  Q247      -0.535   0.368   0.054   1.000
     5  Q249      0.106   -0.019   0.016   -0.161   1.000
     6  Q251      -0.065   0.017   0.019   0.016  -0.126   1.000
                   ...

In TESTFACT, $n(n-1)/2$ two-by-two frequency tables ($n$ = number of items) are used to calculate the tetrachoric coefficients. From the computer output, the number of valid item pairs is 496. Since the number of items equals 32 and $32(32-1)/2 = 496$, this data set contains no non-valid pairs. Non-valid pairs have zero off-diagonal or marginal frequencies. Examples of non-valid pairs are shown below (R = right, W = wrong; 0 marks a zero frequency, and blank cells hold non-zero frequencies):

       R   W               R   W               R   W
   R       0           R   0   0           R       0
   W   0               W                   W       0
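The pairwise tables and a validity check can be sketched as follows. The validity rule is an assumption read off the three examples above (a pair is flagged when a row or column margin is zero, or when both off-diagonal cells are zero); TESTFACT's exact rule may differ. Not-presented responses are dropped pairwise, and all names are hypothetical.

```python
from itertools import combinations


def pair_table(item_a, item_b):
    """2x2 counts [[RR, RW], [WR, WW]] over cases presented with both items.

    Each item is a list of 1 (right), 0 (wrong) or None (not presented).
    """
    table = [[0, 0], [0, 0]]
    for a, b in zip(item_a, item_b):
        if a is None or b is None:        # pairwise deletion of not-presented
            continue
        table[1 - a][1 - b] += 1          # row/column 0 = right, 1 = wrong
    return table


def is_valid(table):
    """Assumed rule, based on the three examples of non-valid pairs."""
    (rr, rw), (wr, ww) = table
    margins = (rr + rw, wr + ww, rr + wr, rw + ww)
    if min(margins) == 0:                 # an empty row or column
        return False
    if rw == 0 and wr == 0:               # both off-diagonal cells empty
        return False
    return True


def count_pairs(items):
    """Return (number of item pairs, number of valid pairs)."""
    tables = [pair_table(items[i], items[j])
              for i, j in combinations(range(len(items)), 2)]
    return len(tables), sum(is_valid(t) for t in tables)


print(pair_table([1, 0, None, 1], [1, 1, 0, 0]))   # [[1, 1], [1, 0]]
```

With 32 items, `count_pairs` always forms $32 \cdot 31 / 2 = 496$ tables; the second number it returns plays the role of the NUMBER OF VALID ITEM PAIRS line in the output.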

The average tetrachoric correlation equals 0.0654. Since the output contains both negative and positive correlation coefficients, the average value does not shed much light on the actual strength of association between item pairs. Note that tetrachoric correlation matrices are not necessarily positive definite.

  Phase 6: Factor analysis

Display 2: The positive latent roots of the correlation matrix

By definition, a symmetric matrix is positive definite if all its characteristic roots are positive. The output below shows that only the first 31 of the 32 roots are positive; the 32 × 32 matrix of tetrachoric correlations is therefore not positive definite. This can be corrected by replacing the negative roots of the matrix by zero or a small positive quantity.

DISPLAY   2.  THE POSITIVE LATENT ROOTS OF THE CORRELATION MATRIX

             1        2        3        4        5        6
     1    7.491350  3.442602 2.592276  1.745235 1.576302  1.442306

             7        8        9       10       11       12
     1    1.248438  1.118638 1.015248  0.971235 0.908476  0.835705

            13       14       15       16       17       18
     1    0.768426  0.719607 0.657375  0.638227 0.631485  0.555802

            19       20       21       22       23       24
     1    0.514488  0.461871 0.398661  0.375292 0.349726  0.312994

            25       26        27       28       29       30
     1    0.292964  0.243591 0.218973  0.183170 0.167582  0.117183

            31
     1    0.055375

Display 3: Number of items and sum of latent roots and their ratio

This section of the output shows the sum of the positive roots and the ratio by which each root must be multiplied so that the 'corrected' roots sum to the number of items. To illustrate, consider a 5 × 5 correlation matrix with latent roots 3, 1, 0.8, 0.3, and -0.1. The sum of the roots equals 5; in general, the latent roots of a correlation matrix based on n items sum to n.

Suppose the value -0.1 is replaced by 0.0001; the new sum of roots then equals 5.1001. Multiplying each root by the ratio 5/5.1001 = 0.9804 gives a 'corrected' set of roots whose sum equals 5.

From the Display 3 part of the output, the ratio required to obtain a corrected set of latent roots equals 0.9984211. The corrected set is given under the Display 4 heading.

DISPLAY   3.    NUMBER OF ITEMS AND SUM OF LATENT ROOTS
                 AND THEIR RATIO
                 32      32.0506033      0.9984211
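The ratio and the corrected roots can be reproduced from the 31 positive roots in Display 2. In this sketch the missing negative root is treated as zero (the `floor` argument), which is consistent with the printed sum of 32.0506033; `corrected_roots` is a hypothetical helper.

```python
ROOTS = [7.491350, 3.442602, 2.592276, 1.745235, 1.576302, 1.442306,
         1.248438, 1.118638, 1.015248, 0.971235, 0.908476, 0.835705,
         0.768426, 0.719607, 0.657375, 0.638227, 0.631485, 0.555802,
         0.514488, 0.461871, 0.398661, 0.375292, 0.349726, 0.312994,
         0.292964, 0.243591, 0.218973, 0.183170, 0.167582, 0.117183,
         0.055375]
N_ITEMS = 32


def corrected_roots(roots, n_items, floor=0.0):
    """Replace roots below `floor` by `floor`, then rescale to sum to n_items."""
    kept = [max(r, floor) for r in roots]
    ratio = n_items / sum(kept)
    return ratio, [r * ratio for r in kept]


ratio, corrected = corrected_roots(ROOTS, N_ITEMS)
print(round(ratio, 6), round(corrected[0], 6))   # 0.998421 7.479522

# The 5-item illustration: -0.1 replaced by 0.0001
toy_ratio, _ = corrected_roots([3, 1, 0.8, 0.3, -0.1], 5, floor=0.0001)
print(round(toy_ratio, 4))                       # 0.9804
```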

Display 4: Corrected latent roots

DISPLAY   4.  THE CORRECTED LATENT ROOTS OF THE CORRELATION MATRIX

     1        2        3        4        5        6
1     7.479522 3.437167  2.588184 1.742479  1.573814 1.440029
        ...        ...        ...        ...        ...        ...

Display 5: Initial smoothed inter-item correlation matrix

Any symmetric matrix $\mathbf{R}$ can be decomposed as

$$\mathbf{R} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{\top},$$

where $\mathbf{\Lambda}$ is a diagonal matrix whose diagonal elements are the characteristic roots of $\mathbf{R}$. As mentioned previously, if all roots are positive, that is, all the diagonal elements of $\mathbf{\Lambda}$ are positive, $\mathbf{R}$ is a positive definite matrix. When this is not the case, a 'smoothed' correlation matrix $\mathbf{R}^{*}$ may be obtained by replacing the elements of $\mathbf{\Lambda}$ with the corrected roots, with negative roots set to either 0 or some small positive quantity, so that

$$\mathbf{R}^{*} = \mathbf{V}\mathbf{\Lambda}^{*}\mathbf{V}^{\top},$$

where the columns of $\mathbf{V}$ are eigenvectors and the elements of $\mathbf{\Lambda}^{*}$ are the corrected latent roots. The elements of the smoothed correlation matrix for the first 6 of the 32 items are given below.

DISPLAY   5.  INITIAL SMOOTHED INTER-ITEM CORRELATION MATRIX

                    1       2       3       4       5       6
                   Q158     Q166    Q167     Q247    Q249     Q251
     1  Q158      1.000
     2  Q166      -0.383   1.000
     3  Q167      -0.145   0.124   1.000
     4  Q247      -0.534   0.368   0.054   1.000
     5  Q249      0.106   -0.019   0.016   -0.161   1.000
     6  Q251      -0.066   0.017   0.019   0.016   -0.126   1.000
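The eigen-decomposition route to a smoothed matrix can be sketched with NumPy's `numpy.linalg.eigh`. This is an illustration under stated assumptions: roots below a small positive floor (1e-4 here, chosen arbitrarily) are raised to it, the roots are rescaled to sum to the number of items as in Display 3, and the matrix is rebuilt as $\mathbf{V}\mathbf{\Lambda}^{*}\mathbf{V}^{\top}$. TESTFACT's exact procedure, including any adjustment of the diagonal, is not described in this section; the sketch leaves the rebuilt diagonal as is, so it is close to, but not exactly, 1.

```python
import numpy as np


def smooth_correlation(R: np.ndarray, floor: float = 1e-4) -> np.ndarray:
    """Rebuild R from its eigen-decomposition with small/negative roots raised.

    R = V diag(lam) V^T; roots below `floor` are set to `floor`, all roots are
    rescaled so they sum to n (the trace of a correlation matrix), and the
    matrix is rebuilt as V diag(lam*) V^T.
    """
    lam, V = np.linalg.eigh(R)              # eigh: for symmetric matrices
    lam = np.maximum(lam, floor)            # negative roots -> small positive
    lam = lam * (R.shape[0] / lam.sum())    # corrected roots sum to n
    return V @ np.diag(lam) @ V.T


# A 3 x 3 "correlation" matrix that is not positive definite
R = np.array([[1.0, 0.9, -0.9],
              [0.9, 1.0, 0.9],
              [-0.9, 0.9, 1.0]])
print(np.linalg.eigvalsh(R))                     # one root is negative
print(np.linalg.eigvalsh(smooth_correlation(R)))  # all roots positive
```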

Display 6: Iterated communality estimates

A communality is defined as the squared multiple correlation between an observed variable and the set of factors. The output below shows the estimated communalities for iterations 1, 2, 3, and 4. Note the small changes in the estimated values going from iteration 3 to iteration 4.

At iteration 1, the squared multiple correlation of an item with all other items is calculated for each of the 32 items. The MINRES method (see Display 7) is subsequently used to obtain post-solution improvements to these initial multiple regression communality estimates.

DISPLAY   6.  ITERATED COMMUNALITY ESTIMATES

                      1      2     3      4
     1  Q158       0.413  0.373 0.371  0.371
     2  Q166       0.370  0.325 0.323  0.322
     3  Q167       0.156  0.116 0.115  0.115
     4  Q247       0.516  0.471 0.466  0.465
     5  Q249       0.142  0.088 0.087  0.087
     6  Q251       0.351  0.269 0.257  0.255
        ...
   31  Q313       0.477  0.422 0.415  0.414
   32  Q314       0.458  0.396 0.387  0.386
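The iteration-1 values are squared multiple correlations (SMCs). A standard identity gives the SMC of variable $j$ with all the others as $1 - 1/r^{jj}$, where $r^{jj}$ is the $j$-th diagonal element of $\mathbf{R}^{-1}$. The sketch below applies that identity with NumPy; it assumes a positive definite input (for example, a smoothed matrix), and whether TESTFACT computes the SMCs from the smoothed matrix is not stated here. The function name is hypothetical.

```python
import numpy as np


def squared_multiple_correlations(R: np.ndarray) -> np.ndarray:
    """SMC of each variable with all the others: 1 - 1 / diag(R^{-1})."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))


R3 = np.array([[1.0, 0.5, 0.3],
               [0.5, 1.0, 0.4],
               [0.3, 0.4, 1.0]])
print(squared_multiple_correlations(R3))
```

The same value for the first variable can be obtained the long way, by regressing it on the other two: $\mathbf{r}^{\top}\mathbf{R}_{22}^{-1}\mathbf{r}$ with $\mathbf{r} = (0.5, 0.3)$.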

Display 7: The NROOT largest latent roots of the correlation matrix

TESTFACT uses the minimum squared residuals (MINRES) met