R. Maritz et al.
Level 1, all 4 subsamples were analysed separately (MSKt1, MSKt2, NEURt1 and NEURt2). In Level 2, the subsamples were aggregated by rehabilitation group (MSKt1&t2, NEURt1&t2) and by time-point (t1MSK&NEUR, t2MSK&NEUR). Level 3 represents the aggregation of all 4 subsamples, i.e. the entire calibration sample (FIM_all). Together, these 3 aggregation levels resulted in 9 analysis steps (4 + 4 + 1).
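The count of 9 steps follows directly from the design: 4 single subsamples, 4 pairwise aggregations and 1 full aggregation. A minimal Python sketch that enumerates them; the labels follow the text, while the function name and data layout are illustrative only:

```python
from itertools import product

GROUPS = ("MSK", "NEUR")  # rehabilitation groups
TIMES = ("t1", "t2")      # admission, discharge

def analysis_steps():
    """Return (level, label, subsamples) for every aggregation step."""
    cells = list(product(GROUPS, TIMES))  # the 4 group x time-point subsamples
    steps = []
    # Level 1: each subsample on its own
    for g, t in cells:
        steps.append((1, f"{g}{t}", [(g, t)]))
    # Level 2a: pool both time-points within a rehabilitation group
    for g in GROUPS:
        steps.append((2, f"{g}t1&t2", [(g, t) for t in TIMES]))
    # Level 2b: pool both rehabilitation groups within a time-point
    for t in TIMES:
        steps.append((2, f"{t}MSK&NEUR", [(g, t) for g in GROUPS]))
    # Level 3: the entire calibration sample
    steps.append((3, "FIM_all", cells))
    return steps

for level, label, members in analysis_steps():
    print(level, label, len(members))
```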
For both testlet approaches, the emphasis is on making existing assessment tools work without deleting items or changing the scoring structure.
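In Rasch analysis, a testlet usually combines locally dependent items into one polytomous "super-item" whose score is the sum of its component item scores, so no item is deleted and each item's scoring stays intact. A minimal Python sketch of that summation, assuming FIM items lettered A–R and scored 1–7 (rescored here to 0–6); the motor/cognitive split below is a hypothetical grouping, not either of the testlet structures used in this study:

```python
from typing import Dict, List

def build_testlets(responses: Dict[str, int],
                   testlets: Dict[str, List[str]]) -> Dict[str, int]:
    """Sum item scores within each testlet.

    responses: item letter -> FIM score (1-7)
    testlets:  testlet name -> item letters it contains (hypothetical grouping)
    Each item is rescored 0-6 so every testlet's lowest category is 0.
    """
    used = [item for items in testlets.values() for item in items]
    if len(used) != len(set(used)) or set(used) != set(responses):
        raise ValueError("each item must belong to exactly one testlet")
    return {name: sum(responses[i] - 1 for i in items)
            for name, items in testlets.items()}

# Hypothetical grouping of the 18 FIM items: A-M (motor), N-R (cognitive)
ITEMS = [chr(c) for c in range(ord("A"), ord("R") + 1)]
TESTLETS = {"motor": ITEMS[:13], "cognitive": ITEMS[13:]}

print(build_testlets({item: 7 for item in ITEMS}, TESTLETS))
```

Because the testlet score is a plain sum, the raw total score is unchanged (apart from the 0-based rescoring), which is what allows the existing FIM™ total to be kept.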
Differential Item Functioning strategy
DIF was analysed in situations in which local dependencies could be accommodated satisfactorily with testlets. Where a lack of group invariance was observed, the testlet with the strongest DIF was split by the relevant contextual factor, and splitting continued until no further DIF was present (33). The split and unsplit solutions were then compared on the basis of their Rasch person estimates, anchored to each other with an unsplit item free of DIF. An effect size, based on the means of the person estimates, their standard deviations and the correlation between the split and unsplit versions (34), was calculated to determine whether the DIF split was necessary for the final transformation table. If the effect size was below 0.2, DIF was considered small (35) and no adjustment for DIF was made.
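The exact formula from reference (34) is not given in this section. One standard effect size that uses exactly these ingredients (the two means, their SDs and the split/unsplit correlation) is the repeated-measures d, d_rm. A minimal Python sketch, assuming that formula and the 0.2 cut-off stated above; the person estimates are toy values:

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def dif_effect_size(split, unsplit):
    """Repeated-measures d for paired person estimates (ASSUMED formula):
    d_rm = |M1 - M2| / sqrt(S1^2 + S2^2 - 2*r*S1*S2) * sqrt(2 * (1 - r))
    """
    m1, m2 = mean(split), mean(unsplit)
    s1, s2 = stdev(split), stdev(unsplit)
    r = pearson_r(split, unsplit)
    var_diff = s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2  # variance of paired differences
    if var_diff <= 0:
        # Estimates differ only by a constant: r = 1, S1 = S2, d_rm reduces to |M1 - M2| / S1
        return abs(m1 - m2) / s1
    return abs(m1 - m2) / math.sqrt(var_diff) * math.sqrt(max(0.0, 2 * (1 - r)))

def dif_split_needed(split, unsplit, threshold=0.2):
    """DIF is treated as small (no adjustment) when the effect size is below threshold."""
    return dif_effect_size(split, unsplit) >= threshold

unsplit = [0.0, 1.0, 2.0, 3.0, 4.0]            # toy person estimates (logits)
split_small = [0.05, 0.95, 2.05, 3.0, 4.0]     # nearly identical estimates
split_large = [x + 0.5 for x in split_small]   # shifted by half a logit
print(round(dif_effect_size(split_small, unsplit), 3))
print(round(dif_effect_size(split_large, unsplit), 3))
```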
Transformation table
The second specific aim of this study was to develop a transformation table if fit to the Rasch model could be achieved. The solution with the best fit to the Rasch model, i.e. the solution with the most satisfactory values on the core fit statistics for the entire calibration sample, was taken as the basis for this transformation. The transformation table from FIM™ raw ordinal total scores to the corresponding interval-scaled values was based on the respective Rasch model estimates.
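Under the Rasch model the raw total score is a sufficient statistic, so with complete data each raw score maps to a single person estimate. A minimal Python sketch of that mapping, assuming a partial credit model, a maximum-likelihood estimate (software may use a different estimator) and items scored from 0; the thresholds are invented, and the extreme scores (0 and maximum), which have no finite ML estimate, are omitted:

```python
import math
from typing import Dict, List

def category_probs(theta: float, thresholds: List[float]) -> List[float]:
    """Partial credit model probabilities for categories 0..m of one item."""
    # Log-numerator of category k is sum_{j<=k} (theta - tau_j); 0 for k = 0
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - tau))
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]

def expected_score_and_variance(theta: float, items: List[List[float]]):
    """Expected raw score and its variance (= derivative of the expectation w.r.t. theta)."""
    expected = variance = 0.0
    for thresholds in items:
        p = category_probs(theta, thresholds)
        m1 = sum(k * pk for k, pk in enumerate(p))
        m2 = sum(k * k * pk for k, pk in enumerate(p))
        expected += m1
        variance += m2 - m1 ** 2
    return expected, variance

def theta_for_raw_score(raw: int, items, tol: float = 1e-8, max_iter: int = 100) -> float:
    """ML person estimate: solve expected score = raw score by damped Newton-Raphson."""
    theta = 0.0
    for _ in range(max_iter):
        expected, variance = expected_score_and_variance(theta, items)
        step = max(-1.0, min(1.0, (raw - expected) / variance))  # cap steps at 1 logit
        theta += step
        if abs(step) < tol:
            break
    return theta

def transformation_table(items) -> Dict[int, float]:
    """Map each non-extreme raw score to its logit estimate."""
    max_score = sum(len(t) for t in items)
    return {raw: theta_for_raw_score(raw, items) for raw in range(1, max_score)}

# Invented thresholds for two 3-category items (not FIM estimates)
ITEMS = [[-1.0, 1.0], [-0.5, 0.5]]
for raw, theta in transformation_table(ITEMS).items():
    print(raw, round(theta, 3))
```

For FIM items scored 1–7, the FIM™ raw total equals this 0-based score plus the number of items.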
RESULTS
Sample characteristics
The calibration sample included 946 cases: 476 musculoskeletal and 470 neurological. A total of 474 cases were from time-point 1 (admission) and 472 from time-point 2 (discharge) (see Fig. 1). FIM™ total scores had a mean of 81.7 (standard deviation (SD) = 27.5, median = 84). The mean age of subjects in the calibration sample was 71.6 years (SD = 14.5, range 20–102 years). The calibration sample was 43% (n = 403) male and 57% (n = 543) female. In total, 41% (n = 392) were from the German-speaking region of Switzerland, 25% (n = 238) from the French-speaking region and 34% (n = 316) from the Italian-speaking region; 84% (n = 798) of the sample were Swiss and 16% (n = 148) had another nationality. Insurance status was general for 67% (n = 633), semi-private for 18% (n = 172) and private for 15% (n = 141).
Baseline Rasch analysis
In the 9 baseline analysis steps across the 3 aggregation levels of the calibration sample, no fit to the Rasch model was achieved (Table I). In all analyses, the item-trait χ² p-values were significant. Furthermore, every analysis step contained items showing local dependency, DIF and threshold disordering. Information on threshold disordering and local dependency in the baseline analyses is shown in Appendix S1¹.
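Local dependency is commonly screened with Yen's Q3, the correlation between item residuals (observed minus expected score), flagging item pairs whose residual correlation is clearly higher than the average. The criterion used in this study is not stated in this section, so the relative cut-off of 0.2 above the mean residual correlation below is an assumption, as are the toy residuals:

```python
import math
from itertools import combinations
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_local_dependency(residuals, items, margin=0.2):
    """Return item pairs whose residual correlation (Q3) exceeds mean Q3 + margin.

    residuals: one list per person, one residual per item (same order as items)
    margin:    ASSUMED relative cut-off above the average residual correlation
    """
    columns = list(zip(*residuals))  # one tuple of residuals per item
    q3 = {(items[i], items[j]): pearson_r(columns[i], columns[j])
          for i, j in combinations(range(len(items)), 2)}
    cutoff = mean(q3.values()) + margin
    return sorted(pair for pair, r in q3.items() if r > cutoff)

# Toy residuals for 5 persons on 3 items; A and B move together
RESIDUALS = [
    [1.0, 1.1, 0.2],
    [-1.0, -0.9, 0.1],
    [0.5, 0.6, -1.0],
    [-0.5, -0.4, 1.0],
    [0.0, -0.2, -0.3],
]
print(flag_local_dependency(RESIDUALS, ["A", "B", "C"]))
```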
¹http://www.medicaljournals.se/jrm/content/?doi=10.2340/16501977-0000
Table I. Functional Independence Measure (FIM™) baseline analyses

| Sample | n/CI | Item-fit residuals, mean (SD) | Person-fit residuals, mean (SD) | χ² p-value | PSI | α | DIF (items) | Paired t-test, % (lower ci %) |
|---|---|---|---|---|---|---|---|---|
| MSK_t1 | 246/4 | 0.193 (2.496) | –0.183 (1.304) | 0.000 | 0.961 | 0.967 | age (M), language (A, B, D, F, L, R) | 9.8 (0.0) |
| MSK_t2 | 230/4 | 0.098 (2.191) | –0.165 (1.359) | 0.000 | 0.966 | 0.968 | language (B, D, F, L, N, P) | 17.4 (0.0) |
| MSK_all | 476/8 | 0.193 (3.255) | –0.155 (1.280) | 0.000 | 0.963 | 0.967 | gender (Q), age (L, N), language (B, C, D, F, H, L, M, N, Q, R), time-point (L, M, N, O) | 16.2 (0.0) |
| NEUR_t1 | 228/4 | –0.046 (3.559) | –0.314 (1.745) | 0.000 | 0.964 | 0.972 | language (Q) | 17.1 (14.3) |
| NEUR_t2 | 242/4 | –0.461 (3.449) | –0.358 (1.595) | 0.000 | 0.964 | 0.973 | No DIF | 15.3 (12.5) |
| NEUR_all | 470/8 | –0.369 (4.919) | –0.349 (1.678) | 0.000 | 0.963 | 0.972 | language (D, F, M, N, P, Q), time-point (L) | 15.3 (13.3) |
| t1_all | 474/8 | 0.101 (4.274) | –0.239 (1.609) | 0.000 | 0.96 | 0.968 | age (F, I, J, N, Q, R), language (B, D, F, L, N, Q), rehab-group (C, E, K, M, O, P, Q, R) | 12.9 (10.9) |
| t2_all | 472/8 | –0.284 (3.957) | –0.293 (1.553) | 0.000 | 0.964 | 0.971 | language (B, D, M, N, Q), rehab-group (C, E, K, L, O, P, Q) | 13.1 (11.2) |
| FIM_all | 946/10 | –0.077 (5.779) | –0.265 (1.609) | 0.000 | 0.962 | 0.969 | gender (L), age (N, O), language (B, D, F, H, L, M, N, Q, R), nationality (Q), insurance (O), time-point (L, M), rehab-group (C, E, K, L, M, O, P, Q, R) | 11.1 (9.7) |
| Acceptable values | | SD < 1.4 | SD < 1.4 | > 0.01 | > 0.7 | > 0.7 | No DIF present | Lower ci < 5 |

MSK: musculoskeletal rehabilitation; NEUR: neurological rehabilitation; t1: admission; t2: discharge; all: combination of time-points and/or rehabilitation-groups; n: sample size; CI: class intervals; SD: standard deviation; PSI: Person separation index; α: Cronbach's alpha; DIF: differential item functioning; ci: confidence interval.
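The "Acceptable values" row of Table I works as a checklist. A minimal Python sketch that applies it to single rows, using the FIM_all and MSK_t1 values from the table; the field names are hypothetical and the DIF criterion is reduced to a yes/no flag:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BaselineRow:
    item_fit_sd: float     # SD of item-fit residuals
    person_fit_sd: float   # SD of person-fit residuals
    chi2_p: float          # item-trait chi-square p-value
    psi: float             # person separation index
    alpha: float           # Cronbach's alpha
    dif_present: bool      # any DIF reported in the row
    ttest_lower_ci: float  # lower ci (%) of significant paired t-tests

def failed_criteria(row: BaselineRow) -> List[str]:
    """Names of the Table I acceptability criteria that the row does not meet."""
    checks = {
        "item-fit SD < 1.4": row.item_fit_sd < 1.4,
        "person-fit SD < 1.4": row.person_fit_sd < 1.4,
        "chi2 p > 0.01": row.chi2_p > 0.01,
        "PSI > 0.7": row.psi > 0.7,
        "alpha > 0.7": row.alpha > 0.7,
        "no DIF": not row.dif_present,
        "paired t-test lower ci < 5%": row.ttest_lower_ci < 5,
    }
    return [name for name, ok in checks.items() if not ok]

fim_all = BaselineRow(5.779, 1.609, 0.000, 0.962, 0.969, True, 9.7)
msk_t1 = BaselineRow(2.496, 1.304, 0.000, 0.961, 0.967, True, 0.0)
print(failed_criteria(fim_all))
print(failed_criteria(msk_t1))
```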
www.medicaljournals.se/jrm