|
This example analyzes 32
items selected from the 48-item version of the Jenkins Activity
Survey for Health Prediction, Form B (Jenkins, Rosenman, and
Zyzanski, 1972). The data are responses of 598 men from central
Finland drawn from a larger survey sample. Most of the items
are rated on three-point scales representing little or no,
occasional, or frequent occurrence of the activity or behavior
in question. For purposes of the present analysis, the scales
have been dichotomized near the median. Wording in the positive
or negative direction varies from item to item as follows
(item numbers are those of the original pool of items from
which those of the present form were selected):
-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,+Q251,+Q252,
+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,+Q262,+Q263,+Q264,
+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,-Q273,-Q274,-Q275,+Q276,
+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,+Q310,+Q311,-Q312,-Q313,-Q314.
The first 7 lines of the
data file exampl03.dat are shown below.
201000220122112221022212202112211101122112222000
001221211011100111111111111110111102211111211020
0010.02100222122021221222112112212.0011111222001
002020220212012120011112112221221022211111222202
201000221000211221221112012211122112211111222000
001001221022011120022222212222211101121112222101
102100111022112120021212212221121212111022200021
The first 10 columns of
each record are used as case identification and are read first.
Starting again in the first column by using the 'T' operator,
the responses to the 48 items are read as single fields (48A1).
(10A1,T1,48A1)
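As an illustration only, the effect of this format statement can be mimicked in a few lines of Python. The function name read_record is hypothetical and not part of TESTFACT; the sketch assumes that T1 simply moves the read position back to column 1, so that the ID and the item responses overlap.

```python
def read_record(line, id_width=10, n_items=48):
    """Split one fixed-width record the way (10A1,T1,48A1) describes it."""
    case_id = line[:id_width]         # 10A1: columns 1-10 form the case ID
    responses = list(line[:n_items])  # T1 returns to column 1; 48A1 reads 48 one-character fields
    return case_id, responses


first_record = "201000220122112221022212202112211101122112222000"
case_id, responses = read_record(first_record)
print(case_id)         # 2010002201
print(len(responses))  # 48
```

The ID printed here matches the ID reported for the first observation in the output.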
The SELECT keyword on the
PROBLEM command indicates that 32 items are selected from
the original 48 items. The SELECT command provides the selected
items in the order in which they will be used. The RESPONSE
command lists the 5 responses indicated on the PROBLEM command
(RESPONSES keyword) and the KEY command provides the correct
responses for each of the 48 items. The NOTPRESENTED option
on the PROBLEM command is required if one of the response
codes identifies not-presented items. The '.' code on the
RESPONSE command identifies these responses.
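The item selection can be checked by hand with a short Python sketch. It assumes that an entry such as 11(1)14 means "from 11 to 14 in steps of 1"; the helper expand_select is hypothetical and only imitates that notation.

```python
def expand_select(spec):
    """Expand a list such as '3,5,11(1)14' into [3, 5, 11, 12, 13, 14]."""
    positions = []
    for token in spec.split(","):
        token = token.strip()
        if "(" in token:
            first, rest = token.split("(")
            step, last = rest.split(")")
            positions.extend(range(int(first), int(last) + 1, int(step)))
        else:
            positions.append(int(token))
    return positions


SELECT = "3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48"
KEY = "002000220022222220022222202222220002220022222000"

positions = expand_select(SELECT)
selected_key = "".join(KEY[p - 1] for p in positions)  # item numbers are 1-based
print(len(positions))  # 32
print(selected_key)    # 20020222220022222022222002002200
```

The 32-character string agrees with the answer key that TESTFACT reports after selection.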
The TETRACHORIC command
requests the printing of the coefficients to 3 decimal places
(NDEC = 3) in the printed output file (LIST option). The tetrachoric
correlation matrix,
item parameters, rotated factor loadings, and the factor scores
will be saved in the files exampl03.cor, exampl03.par,
exampl03.rot, and exampl03.fsc, respectively
as specified on the SAVE command. The FACTOR and FULL commands
are used to specify parameters for the full-information item
factor analysis. Three factors and ten latent roots are to
be extracted, as indicated by the NFAC and NROOT keywords
respectively. A VARIMAX rotation is requested. Note that this
keyword may not be abbreviated in the FACTOR command. A maximum
of 80 EM cycles will be performed (CYCLES keyword on the FULL
command). The convergence criterion for the EM cycles is given
by the PRECISION keyword on the TECHNICAL command.
Cases will be scored by
EAP (Expected A Posteriori, or Bayes) estimation
with adaptive quadrature (METHOD = 2 on the SCORE command).
Posterior standard deviations will also be computed. Results
will be saved in the exampl03.fsc file (FSCORE option
on the SAVE command). The factor scores for the first 20 cases
will be listed in the output file (LIST = 20). See next
example for MAP (Maximum A Posteriori, or Bayes Modal)
estimation for the same cases.
>TITLE
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE '8','0','1','2','.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE;
>INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';
(10A1,T1,48A1)
>STOP

 |
The first part of the output
contains the name of the command file (exampl03.tsf)
and the name of the output file (exampl03.out). Each
TESTFACT run produces output under one or more of the following
headings, depending on the type of analysis.
The analysis specified
in exampl03.tsf produces Phase 0, Phase 1, Phase 2,
Phase 5 and Phase 7 output.

Regardless of the type
of analysis, a Phase 0
output is produced, being an echo of the input commands contained
in the *.tsf file.
PHASE 0: INPUT COMMANDS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
This example analyzes 32 items
selected from the 48-item version
of the Jenkins Activity Survey
for Health Prediction, Form B (Jenkins,
Rosenman, and Zyzanski, 1972). The data are responses of 598 men from
central Finland drawn from a larger
survey sample. Most of the items
are rated on three-point scales
representing little or no, occasional,
or frequent occurance of the activity
or behavior in question. For
purposes of the present analysis,
the scales have been dichotomized
near the median. Wording in the
positive or negative direction varies
from item to time as follows (item
numbers are those of the original
pool of items from which those
of the present form was selected):
-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,
+Q251,+Q252,+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,
+Q262,+Q263,+Q264,+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,
-Q273,-Q274,-Q275,+Q276,+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,
+Q310,+Q311,-Q312,-Q313,-Q314.
The tetrachoric correlation matrix,
item parameters, rotated factor
loadings, and the factor scores
will be saved in the files EXAMPL03.COR,
EXAMPL03.PAR, EXAMPL03.ROT, and
EXAMPL03.FSC, respectively.
Cases will be scored by EAP (Expected
A Posteriori, or Bayes)
estimation with adaptive quadrature
(Method 2). Posterior standard
deviations will also be computed. Results will be saved in the
EXAMPL03.FSC file. See Exampl3a.tsf for MAP (Maximum A Posteriori,
or Bayes Modal) estimation for
the same cases.
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE '8','0','1','2','.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE;
>INPUT NIDW=10,SCORES,FILE='EXAMPL03.DAT';
DATA FILE NAME IS EXAMPL03.DAT
DATA FORMAT=
(10A1,T1,48A1)

Values of the response
categories (8, 0, 1, 2, .), the answer key, contents of the
first observation, the sum of weights and number of records
are given. This information enables you to verify that the
data values were read correctly from the data file exampl03.dat.
The response categories indicate a code of '8' for omitted
responses (first value) and a code of '.' for not-presented
items (last value).
Thirty-two items were selected
from the 48-item test. Based on the answer key values, a total
score is computed for each of the 598 respondents. Each item
has a set of responses: right, wrong, omit, or not presented.
For item j, j = 1, 2, ..., 32, the response of
person i, i = 1, 2, ..., 598, can be written as
$x_{ij} = 1$ if the response is correct,
and
$x_{ij} = 0$ if the response is incorrect.
At your option, omitted
items can be considered either wrong or not presented. The
total test score
for person i is
$X_i = \sum_{j=1}^{32} x_{ij}$.
Respondent 1, for example,
has a total score of 19 correct out of a possible 32 as shown
below.
Answer key:
20020222220022222022222002002200
Respondent 1:
10020221121022212021121101211200
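The score of 19 can be verified with a minimal Python sketch that compares the two strings position by position. The function name total_score is hypothetical; not-presented codes ('.') never match the key and therefore count as zero here, which is a simplification.

```python
key = "20020222220022222022222002002200"
respondent_1 = "10020221121022212021121101211200"


def total_score(responses, answer_key):
    """Count the positions where the response equals the keyed response."""
    return sum(1 for resp, keyed in zip(responses, answer_key) if resp == keyed)


score = total_score(respondent_1, key)
print(score)  # 19
```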
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
RESPONSE
CATEGORIES: 8 0 1 2 .
ANSWER
KEY: 20020222220022222022222002002200
CONTENTS OF FIRST OBSERVATION:
ID=2010002201
WEIGHT= 1
ITEM
RESPONSES= 201000220122112221022212202112211101122112222000
ITEM
RESPONSES AFTER SELECTION =
10020221121022212021121101211200
SUM OF WEIGHTS = 598
NUMBER OF RECORDS= 598
Using this information,
a frequency table of the score distribution is calculated
and presented graphically.
PHASE 1: HISTOGRAM AND BASIC STATISTICS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
MAIN TEST HISTOGRAM
FREQUENCY :
|
|
| **
| ****
| *****
8.0+ *****
| *****
| *****
| ***** *
| * ***** *
| *********
| **********
| ***********
| ***********
| ***********
4.0+ ***********
| ***********
| *************
| **************
| **************
| **************
| ***************
| ****************
| *******************
| *******************
0.0+-----+----+----+----+----+----+----+----+----+----+----+----+----+--
0. 5. 10. 15. 20. 25. 30.
SCORES
NUMBER OF OBSERVATIONS AT EACH SCORE
SCORE COUNT FREQ | SCORE COUNT FREQ | SCORE COUNT FREQ
0 0 0.0 | 11 35 5.9 | 22 21 3.5
1 0 0.0 | 12 40 6.7 | 23 10 1.7
2 0 0.0 | 13 38 6.4 | 24 8 1.3
3 0 0.0 | 14 52 8.7 | 25 6 1.0
4 1 0.2 | 15 54 9.0 | 26 1 0.2
5 2 0.3 | 16 54 9.0 | 27 1 0.2
6 1 0.2 | 17 56 9.4 | 28 0 0.0
7 5 0.8 | 18 57 9.5 | 29 0 0.0
8 7 1.2 | 19 36 6.0 | 30 0 0.0
9 18 3.0 | 20 43 7.2 | 31 0 0.0
10 20 3.3 | 21 32 5.4 | 32 0 0.0
The last portion of the
Phase 1
output gives the mean (15.9) and standard deviation (4.0)
of the Total Scores.
TEST RECORD NUMBER MEAN S.D. PROPORTION S.D.
MAIN 598 598 15.9 4.0 0.497 0.500
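The summary line can be reproduced from the COUNT column of the frequency table. The sketch below is illustrative only: it assumes the standard deviation is computed with divisor N (using N - 1 gives the same value to one decimal), and it takes the proportion as the mean score divided by the 32 items.

```python
import math

# COUNT column of the Phase 1 table, keyed by total score (zero counts omitted)
counts = {4: 1, 5: 2, 6: 1, 7: 5, 8: 7, 9: 18, 10: 20, 11: 35, 12: 40,
          13: 38, 14: 52, 15: 54, 16: 54, 17: 56, 18: 57, 19: 36, 20: 43,
          21: 32, 22: 21, 23: 10, 24: 8, 25: 6, 26: 1, 27: 1}

n = sum(counts.values())
mean = sum(score * c for score, c in counts.items()) / n
variance = sum(c * (score - mean) ** 2 for score, c in counts.items()) / n  # divisor N is an assumption
sd = math.sqrt(variance)

p = mean / 32                  # proportion of correct responses
sd_p = math.sqrt(p * (1 - p))  # standard deviation of a 0/1 variable with mean p

print(n, round(mean, 1), round(sd, 1), round(p, 3), round(sd_p, 3))
# 598 15.9 4.0 0.497 0.5
```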
The proportion of correct
responses, p, is
$p = 15.9 / 32 = 0.497$
with a standard deviation
$\sqrt{p(1 - p)} = \sqrt{0.497 \times 0.503} = 0.500$
For each item, eight statistics
are produced. The Number, Mean and S.D. for item 2, for example,
are 590, 15.92, and 4.03 respectively. These values are obtained
by 'deleting' each row of the data in which a not-presented code
is encountered for item 2. Since 8 rows contain not-presented
codes, the mean and standard deviation of the Total Scores
are calculated for the remaining 590 cases. Note, for example,
that item 1 was presented to all 598 persons, while item 4
was presented to 592 persons.
PHASE 2: ITEM STATISTICS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
-----------------------------------------------------------
MAIN TEST ITEM STATISTICS
ITEM NUMBER MEAN S.D. RMEAN FACILITY DIFF BIS P.BIS
1 Q158 598 15.91 4.01 14.46 0.206 16.29 -0.262 -0.185
2 Q166 590 15.92 4.03 17.13 0.653 11.43 0.532 0.413
3 Q167 596 15.90 4.01 16.35 0.790 9.77 0.305 0.215
4 Q247 592 15.93 4.01 16.71 0.694 10.97 0.384 0.292
5 Q249 594 15.92 4.01 15.89 0.466 13.34 -0.008 -0.006
6 Q251 598 15.91 4.01 17.16 0.532 12.68 0.417 0.332
7 Q252 598 15.91 4.01 17.39 0.490 13.10 0.451 0.360
8 Q253 598 15.91 4.01 18.16 0.410 13.91 0.591 0.467
9 Q254 597 15.91 4.02 18.99 0.203 16.33 0.551 0.387
10 Q257 597 15.92 4.01 17.99 0.449 13.51 0.585 0.466
...
31 Q313 597 15.91 4.02 16.31 0.843 8.98 0.349 0.231
32 Q314 594 15.93 4.02 16.86 0.586 12.13 0.351 0.278
The mean score for those
subjects who get a specific item correct is denoted by RMEAN.
For example, since 385 respondents selected the correct response
for item 2, RMEAN for item
2 is calculated as the mean of the corresponding 385 Total
Scores and equals 17.13.
The item facility (FACILITY)
is the proportion of correct responses for a specific item. For
example, 385 of the 590 respondents presented with item 2
selected the correct response, and hence
$\mathrm{FACILITY} = 385 / 590 = 0.653$
The delta statistic ($\Delta$,
or DIFF) is calculated as
$\Delta = -4\,\Phi^{-1}(p) + 13$
where p is the item
facility and
$\Phi^{-1}$ denotes the inverse normal
transformation. This statistic has an effective range of 1
to 25, with a mean and standard deviation of 13 and 4 respectively.
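The DIFF column can be checked numerically. The sketch assumes the conventional form of the delta statistic, 13 minus 4 times the inverse normal transform of the facility, which reproduces the tabled values to within rounding of the facility; the function name delta is hypothetical.

```python
from statistics import NormalDist


def delta(facility):
    """Delta (DIFF) statistic, assuming delta = 13 - 4 * inverse_normal(p)."""
    return 13.0 - 4.0 * NormalDist().inv_cdf(facility)


# Facilities of items 1, 2 and 3 as printed in the Phase 2 table
for name, p in [("Q158", 0.206), ("Q166", 0.653), ("Q167", 0.790)]:
    print(name, round(delta(p), 2))
```

The printed values can be compared with the tabled DIFF entries 16.29, 11.43 and 9.77; small differences in the second decimal come from using the rounded facilities.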
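The last 2 statistics are
the biserial (BIS) and
point biserial (P.BIS)
correlations. The formula for the sample point biserial correlation
is
$r_{pbis} = \frac{\bar{X}_R - \bar{X}}{s_X} \sqrt{\frac{p}{1 - p}}$,
where $\bar{X}_R$ is RMEAN, $\bar{X}$ and $s_X$ are the MEAN and S.D.
of the total scores, and p is the item facility.
For item 8, for example,
$r_{pbis} = \frac{18.16 - 15.91}{4.01} \sqrt{\frac{0.410}{0.590}} \approx 0.468$,
which agrees with the tabled 0.467 up to rounding of the inputs.
The point biserial correlation
is the correlation between the item score and the total score,
or subtest score. Theoretically
$-1 \le r_{pbis} \le 1$,
but in practice
the attainable range is narrower.
Therefore, 0.467 indicates
a relatively strong association between item 8 and the Total
Score.
The formula for calculating
the sample biserial correlation coefficient, BIS, is
$r_{bis} = \frac{\bar{X}_R - \bar{X}}{s_X} \cdot \frac{p}{h(z_p)}$,
where $z_p = \Phi^{-1}(p)$ and $h(z_p)$ is the ordinate of the
standard normal density at $z_p$.
Consider, for example,
the item 3 facility, which equals 0.790. From the inverse
normal tables, this corresponds to a
z-value of 0.8062, with ordinate
$h(0.8062) = \frac{1}{\sqrt{2\pi}} e^{-0.8062^2 / 2} \approx 0.288$.
For item 3,
$r_{bis} = \frac{16.35 - 15.90}{4.01} \cdot \frac{0.790}{0.288} \approx 0.308$,
close to the tabled 0.305; the difference reflects rounding in the tabled inputs.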
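Both correlations can be recomputed from the printed Phase 2 columns. The sketch below assumes the standard textbook forms: the point biserial is (RMEAN - MEAN)/S.D. times sqrt(p/(1 - p)), and the biserial is (RMEAN - MEAN)/S.D. times p divided by the normal ordinate at the inverse normal transform of p. The function names are hypothetical, and because the inputs are rounded, agreement with the table holds only to about two decimals.

```python
import math
from statistics import NormalDist


def point_biserial(rmean, mean, sd, p):
    """Point biserial correlation from summary statistics (textbook form)."""
    return (rmean - mean) / sd * math.sqrt(p / (1.0 - p))


def biserial(rmean, mean, sd, p):
    """Biserial correlation from summary statistics (textbook form)."""
    z = NormalDist().inv_cdf(p)      # inverse normal transform of the facility
    ordinate = NormalDist().pdf(z)   # height of the normal density at z
    return (rmean - mean) / sd * p / ordinate


# Item 8 (Q253) and item 3 (Q167), values taken from the Phase 2 table
pbis_item8 = point_biserial(18.16, 15.91, 4.01, 0.410)
bis_item3 = biserial(16.35, 15.90, 4.01, 0.790)
print(round(pbis_item8, 2), round(bis_item3, 2))
```

The tabled values are 0.467 for the point biserial of item 8 and 0.305 for the biserial of item 3.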

The first part of the output
contains, for each selected item, the Number of Cases, Percent
Correct, Percent Omitted, Percent Not Reached and Percent
Not Presented.
PHASE 5: TETRACHORIC
CORRELATIONS
ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
MAIN TEST MISSING RESPONSE INFORMATION
----------------------------------------------------------------------------
ITEM NUMBER PERCENT PERCENT PERCENT PERCENT
OF CASES CORRECT OMITTED NOT REACHED NOT PRESENTED
----------------------------------------------------------------------------
1. Q158 598 20.6 0.0 0.0 0.0
2. Q166 590 64.4 0.0 0.0 1.3
3. Q167 596 78.8 0.0 0.0 0.3
4. Q247 592 68.7 0.0 0.0 1.0
5. Q249 594 46.3 0.0 0.0 0.7
...
31. Q313 597 84.1 0.0 0.0 0.2
32. Q314 594 58.2 0.0 0.0 0.7
----------------------------------------------------------------------------
This summary indicates
that there were no omitted codes in the data and that all
598 respondents could complete the test. The percent Not Presented
varies from 0.0 to a maximum of 1.3 for item 2. For item 2,
this percentage is calculated as
$100 \times (598 - 590) / 598 = 1.3$
Note that the Percent Correct
is calculated here as the number of respondents who selected
the correct answer, divided by the total number of cases.
For item 2
$100 \times 385 / 598 = 64.4$
This value differs from
the facility estimate (385/590) given under Phase 2 of the
output.
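The two denominators are easy to confuse, so the arithmetic for item 2 is spelled out in a short sketch. The counts 598, 590 and 385 are those quoted in the text; the variable names are illustrative only.

```python
n_cases = 598      # all respondents
n_presented = 590  # respondents who were presented with item 2
n_correct = 385    # respondents who answered item 2 correctly

pct_not_presented = 100 * (n_cases - n_presented) / n_cases
pct_correct = 100 * n_correct / n_cases  # Phase 5: divided by all cases
facility = n_correct / n_presented       # Phase 2: divided by cases presented

print(round(pct_not_presented, 1), round(pct_correct, 1), round(facility, 3))
# 1.3 64.4 0.653
```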
Display 1: Tetrachoric correlation matrix
The tetrachoric correlation
coefficient is widely used as a measure of association between
two dichotomous items. Tetrachoric correlations are obtained
by hypothesizing, for each item, the existence of a continuous
'latent' variable underlying the 'right-wrong' dichotomy imposed
in scoring. It is additionally hypothesized that, for each
pair of items, the corresponding two continuous 'latent' variables
have a bivariate normal distribution.
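The definition above can be made concrete with a small numerical sketch. This is not the algorithm TESTFACT uses; it is a plain bisection search, written for illustration, that finds the correlation of a standard bivariate normal distribution whose upper-quadrant probability matches the observed proportion of respondents with both items right. All function names are hypothetical, and the rule used here to reject a table is an assumption.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) for correlation r."""
    s = 1.0 - r * r
    expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * s)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(s))


def prob_both_above(h, k, rho, steps=400):
    """P(X > h, Y > k): the value under independence plus the integral of
    the density over the correlation from 0 to rho (midpoint rule)."""
    base = (1.0 - STD_NORMAL.cdf(h)) * (1.0 - STD_NORMAL.cdf(k))
    width = rho / steps
    area = sum(bvn_density(h, k, (i + 0.5) * width) for i in range(steps))
    return base + width * area


def tetrachoric(a, b, c, d, tol=1e-6):
    """Tetrachoric correlation for a 2 x 2 table in which a = both right,
    b = item 1 right only, c = item 2 right only, d = both wrong."""
    n = a + b + c + d
    # Assumed validity rule: no empty margin and not both off-diagonals zero
    if min(a + b, c + d, a + c, b + d) == 0 or (b == 0 and c == 0):
        raise ValueError("non-valid item pair")
    h = STD_NORMAL.inv_cdf((c + d) / n)  # threshold on latent variable 1
    k = STD_NORMAL.inv_cdf((b + d) / n)  # threshold on latent variable 2
    target = a / n                       # observed proportion right on both
    lo, hi = -0.999, 0.999
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prob_both_above(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(round(tetrachoric(200, 100, 100, 200), 3))  # 0.5
```

With both thresholds at zero the quadrant probability is 1/4 + arcsin(rho)/(2 pi), so a table with one third of the cases in the "both right" cell corresponds to a correlation of 0.5, which the search recovers.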
AVERAGE TETRACHORIC CORRELATION = 0.0654
STANDARD DEVIATION
= 0.2384
NUMBER OF VALID
ITEM PAIRS = 496
DISPLAY 1. TETRACHORIC CORRELATION MATRIX
1 2 3 4 5 6
Q158 Q166 Q167 Q247 Q249 Q251
1 Q158 1.000
2 Q166 -0.383 1.000
3 Q167 -0.145 0.124 1.000
4 Q247 -0.535 0.368 0.054 1.000
5 Q249 0.106 -0.019 0.016 -0.161 1.000
6 Q251 -0.065 0.017 0.019 0.016 -0.126 1.000
...
In TESTFACT, use is made
of
$n(n - 1)/2$ (n = number of items)
$2 \times 2$ frequency tables to calculate
the tetrachoric coefficients. From the computer output, the
number of valid item pairs is 496. Since the number of items
equals 32 and 32(32 - 1)/2 = 496, this data set contains no non-valid
pairs. Non-valid pairs have zero off-diagonal or marginal
frequencies; examples are a $2 \times 2$ table with an empty row
or column, and a table in which both off-diagonal cells are zero.
The average tetrachoric
correlation equals 0.0654. Since the output contains both
negative and positive correlation coefficients, the average
value does not shed much light on the actual strength of association
between item pairs. Note that tetrachoric correlation matrices
are not necessarily positive definite.

Display 2: The positive latent roots
of the correlation matrix
By definition, a symmetric
matrix is positive definite if all its characteristic roots
are positive. From the output below, it is seen that only
the first 31 of the 32 roots are positive, and therefore the
matrix of tetrachoric correlations
is not positive definite. This problem can be corrected by
replacing the negative roots of the matrix by zero or a small
non-zero quantity.
DISPLAY 2. THE POSITIVE LATENT ROOTS OF THE CORRELATION MATRIX
1 2 3 4 5 6
1 7.491350 3.442602 2.592276 1.745235 1.576302 1.442306
7 8 9 10 11 12
1 1.248438 1.118638 1.015248 0.971235 0.908476 0.835705
13 14 15 16 17 18
1 0.768426 0.719607 0.657375 0.638227 0.631485 0.555802
19 20 21 22 23 24
1 0.514488 0.461871 0.398661 0.375292 0.349726 0.312994
25 26 27 28 29 30
1 0.292964 0.243591 0.218973 0.183170 0.167582 0.117183
31
1 0.055375
Display 3: Number of items and sum
of latent roots and their ratio
This section of the output
shows the sum of positive roots and the ratio with which each
root has to be multiplied to obtain a sum of 'corrected roots' which equals the number of items. To illustrate, consider
a $5 \times 5$
correlation matrix with
latent roots 3, 1, 0.8, 0.3, and -0.1. The sum of the roots
equals 5. In general, for any correlation matrix based on
n items, the sum of roots equals n.
Suppose the value of -0.1
is replaced by 0.0001, then the new sum of roots equals 5.1001.
However, by multiplying each root by the ratio 5/5.1001 =
0.9804, a 'corrected' set of roots is obtained in the sense
that their sum equals 5.
From the Display 3 part
of the output, the ratio required to obtain a corrected set
of latent roots equals 0.9984211. The corrected set is given
under the Display 4 heading.
DISPLAY 3. NUMBER OF ITEMS AND SUM OF LATENT ROOTS
AND THEIR RATIO
32 32.0506033 0.9984211
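The correction of the roots is simple enough to sketch in Python. The function correct_roots is hypothetical; it replaces every non-positive root by a small floor value and rescales all roots so that they again sum to the number of items. The last two lines check the ratio reported under Display 3.

```python
def correct_roots(roots, floor=0.0001):
    """Replace non-positive roots by `floor`, then rescale to sum to len(roots)."""
    replaced = [r if r > 0 else floor for r in roots]
    ratio = len(roots) / sum(replaced)
    return ratio, [r * ratio for r in replaced]


# The five-root illustration from the text
ratio, corrected = correct_roots([3, 1, 0.8, 0.3, -0.1])
print(round(ratio, 4))           # 0.9804
print(round(sum(corrected), 6))  # 5.0

# Ratio for the 32 items, using the sum of roots printed under Display 3
display3_ratio = 32 / 32.0506033
print(round(display3_ratio, 7))  # 0.9984211
```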
Display 4: Corrected latent roots
DISPLAY 4. THE CORRECTED LATENT ROOTS OF THE CORRELATION MATRIX
1 2 3 4 5 6
1 7.479522 3.437167 2.588184 1.742479 1.573814 1.440029
... ... ... ... ... ...
Display 5: Initial smoothed inter-item
correlation matrix
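Any symmetric matrix $\mathbf{R}$ can
be decomposed as
$\mathbf{R} = \mathbf{X} \boldsymbol{\Lambda} \mathbf{X}'$
where
$\boldsymbol{\Lambda}$ is a diagonal matrix with
diagonal elements the characteristic roots of $\mathbf{R}$.
As mentioned previously,
if all roots are positive, that is, all the diagonal elements
of
$\boldsymbol{\Lambda}$ are positive,
$\mathbf{R}$ is a positive definite
matrix. When this is not the case, a 'smoothed' correlation
matrix,
$\mathbf{R}^*$, may be obtained by replacing
the elements of
$\boldsymbol{\Lambda}$ with the corrected roots
and negative roots with either 0 or some small positive quantity,
so that
$\mathbf{R}^* = \mathbf{X} \boldsymbol{\Lambda}^* \mathbf{X}'$
where the columns of
$\mathbf{X}$ are eigenvectors and the
elements of
$\boldsymbol{\Lambda}^*$ are the corrected latent roots.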
The elements of the smoothed correlation matrix for the first
6 of the 32 items are given below.
DISPLAY 5. INITIAL SMOOTHED INTER-ITEM CORRELATION MATRIX
1 2 3 4 5 6
Q158 Q166 Q167 Q247 Q249 Q251
1 Q158 1.000
2 Q166 -0.383 1.000
3 Q167 -0.145 0.124 1.000
4 Q247 -0.534 0.368 0.054 1.000
5 Q249 0.106 -0.019 0.016 -0.161 1.000
6 Q251 -0.066 0.017 0.019 0.016 -0.126 1.000
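The smoothing step can be illustrated on a toy matrix whose eigenvectors are known in closed form, so that no eigenvalue routine is needed. The sketch assumes a matrix with unit diagonal and all off-diagonal elements equal to r; it is not the 32-item matrix of this example, and the function name is hypothetical.

```python
def smooth_equicorrelation(r, n=3, floor=0.0):
    """Smooth an n x n matrix with unit diagonal and every off-diagonal equal to r.

    Its latent roots are 1 + (n - 1) r once (eigenvector proportional to a
    vector of ones) and 1 - r repeated n - 1 times. Assumes -1 < r < 1, so
    only the first root can be negative.
    """
    root_ones = 1.0 + (n - 1) * r
    root_rest = 1.0 - r
    new_ones = root_ones if root_ones > 0 else floor  # replace a negative root
    ratio = n / (new_ones + (n - 1) * root_rest)      # rescale so the roots sum to n
    # X diag(roots) X' = root_rest * I + (new_ones - root_rest) / n * J, J = all ones
    diagonal = ratio * (root_rest + (new_ones - root_rest) / n)
    off_diagonal = ratio * (new_ones - root_rest) / n
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


# r = -0.6 gives roots -0.2, 1.6, 1.6, so the matrix is not positive definite
smoothed = smooth_equicorrelation(-0.6)
for row in smoothed:
    print([round(v, 3) for v in row])
```

In this toy case the smoothed matrix keeps a unit diagonal and its off-diagonal elements move from -0.6 to -0.5, the boundary at which the smallest root is zero. For the Jenkins data the negative root is small, which is why Display 5 differs from Display 1 only in the third decimal.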
Display 6: Iterated communality
estimates
A communality is defined
as the squared multiple correlation between an observed variable
and the set of factors. The output below shows the estimated
communalities for iterations 1, 2, 3, and 4. Note the small
changes in the estimated values going from iteration 3 to
iteration 4.
At iteration 1, the squared
multiple correlation of an item with all other items is calculated
for each of the 32 items. The MINRES method (see Display 7)
is subsequently used to obtain post-solution improvements
to these initial multiple regression communality estimates.
DISPLAY 6. ITERATED COMMUNALITY ESTIMATES
1 2 3 4
1 Q158 0.413 0.373 0.371 0.371
2 Q166 0.370 0.325 0.323 0.322
3 Q167 0.156 0.116 0.115 0.115
4 Q247 0.516 0.471 0.466 0.465
5 Q249 0.142 0.088 0.087 0.087
6 Q251 0.351 0.269 0.257 0.255
...
31 Q313 0.477 0.422 0.415 0.414
32 Q314 0.458 0.396 0.387 0.386
Display 7: The NROOT largest latent
roots of the correlation matrix
TESTFACT uses the minimum
squared residuals (MINRES) met |