FIM™ internal construct validity revisited
[Fig. 1 flow chart]
Overall: cases from 23 rehabilitation clinics with complete data in all FIM™ items (n=11,103)
- Not selected in sampling (n=10,157)
- Complete calibration sample "FIM_all" (n=946)
  - Musculoskeletal subsample "MSK_all" (n=476): time point 1 "MSK_t1" (n=246); time point 2 "MSK_t2" (n=230)
  - Neurological subsample "NEUR_all" (n=470): time point 1 "NEUR_t1" (n=228); time point 2 "NEUR_t2" (n=242)
  - Time point 1 subsample "t1_all" (n=474); time point 2 subsample "t2_all" (n=472)
Fig. 1. Flow chart of the calibration sample with 3 different aggregation levels. FIM: Functional Independence Measure.
functioning (DIF) was evaluated; DIF is present when, after
accounting for the trait, an item works differently for certain
groups defined by a contextual factor, such as gender or age.
The partial credit model, previously shown to be the appropriate
parametrization for the FIM™ (17, 28), was applied.
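Under the partial credit model, a person with location θ responding to an item with thresholds δ_1, ..., δ_m obtains score x (x = 0, ..., m) with probability P(X = x) = exp(Σ_{k=1..x} (θ − δ_k)) / Σ_{h=0..m} exp(Σ_{k=1..h} (θ − δ_k)), where the empty sum for x = 0 is taken as zero. The Python sketch below simply evaluates this formula. The threshold values are hypothetical, and treating a 7-level FIM™ item as rescored 0 to 6 (6 thresholds) is an assumption made for illustration, not a detail taken from this study.

```python
import math
from typing import List, Sequence


def pcm_probabilities(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the partial credit model.

    theta: person location in logits.
    thresholds: item thresholds delta_1..delta_m in logits.
    Returns the probabilities of scores 0..m.
    """
    # Cumulative sums of (theta - delta_k); score 0 has an empty sum of 0.
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta: float, thresholds: Sequence[float]) -> float:
    """Model-expected item score (the basis for person-item residuals)."""
    probs = pcm_probabilities(theta, thresholds)
    return sum(x * p for x, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical thresholds for one item rescored 0-6 (6 thresholds).
    deltas = [-2.9, -1.7, -0.5, 0.5, 1.7, 2.9]
    print([round(p, 3) for p in pcm_probabilities(0.0, deltas)])
    print(round(expected_score(0.0, deltas), 3))
```

Because each item keeps its own set of thresholds, the model allows the distances between adjacent FIM™ categories to differ from item to item, which is what distinguishes it from a rating scale parametrization with shared thresholds.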
Baseline analyses
The baseline analysis tested how well the observed data from
all 18 items fit the Rasch model (15). To do so, the individual
and overall item fit, the person fit, the reliability indices (α and
the person separation index, PSI) and the χ² p-value of the
item-trait interaction, which indicates the overall fit of the data
to the Rasch model, were determined. The respective acceptable
levels are given in the bottom row of the corresponding results
table. In addition, local response dependency among items,
threshold disordering of item categories, and DIF were examined,
the latter for the following 7 factors: gender, age (4 age groups
defined by the quartiles of the age distribution), nationality (Swiss
or other), insurance (general, semi-private, private), rehabilitation
group (neurological or musculoskeletal rehabilitation), clinic
language (German, French or Italian) and time point of
measurement (admission t1, discharge t2). The p-values of both
the individual item-fit and the DIF analyses are Bonferroni-adjusted
in the RUMM2030 software.
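Several of the baseline statistics named above have simple closed forms. The Python sketch below assumes a complete persons × items score matrix (no missing responses, as in this calibration sample) and person estimates with standard errors already obtained from a Rasch program. It computes Cronbach's α, the PSI as (variance of person estimates − mean squared standard error) / variance of person estimates, a Bonferroni-adjusted per-test level (nominal level divided by the number of tests), and a check for disordered thresholds. All function names and toy values are hypothetical; RUMM2030 computes these internally, and its variance conventions and the exact number of comparisons it adjusts for may differ from this sketch.

```python
from statistics import mean, pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a persons x items matrix with no missing data."""
    k = len(scores[0])
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def person_separation_index(estimates: Sequence[float],
                            std_errors: Sequence[float]) -> float:
    """Share of observed person variance not attributable to measurement error."""
    observed_var = pvariance(estimates)
    error_var = mean(se ** 2 for se in std_errors)
    return (observed_var - error_var) / observed_var


def bonferroni_level(alpha: float, n_tests: int) -> float:
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests


def thresholds_disordered(thresholds: Sequence[float]) -> bool:
    """True if any threshold is not strictly above the preceding one."""
    return any(b <= a for a, b in zip(thresholds, thresholds[1:]))


if __name__ == "__main__":
    # Hypothetical toy data: 4 persons x 3 items.
    toy = [[0, 1, 1], [1, 1, 2], [2, 2, 2], [2, 3, 3]]
    print(round(cronbach_alpha(toy), 3))
    print(round(person_separation_index([-1.2, -0.1, 0.4, 1.5],
                                        [0.45, 0.40, 0.40, 0.50]), 3))
    # Example: one test per item for 18 items at a nominal 5% level.
    print(round(bonferroni_level(0.05, 18), 5))
    print(thresholds_disordered([-1.0, 0.5, 0.2]))
```

α works on raw scores, whereas the PSI works on the logit person estimates and their standard errors, which is why both are reported side by side.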
Testlet approaches
Where the local independence assumption of the Rasch model
was not met, testlet approaches were applied. A testlet is the
simple sum score of a set of associated items; it turns the set
into a single new "super"-item that absorbs their dependencies
(20, 29–31). Creating testlets yielded positive results in earlier
Rasch analyses of the FIM™ motor scale (17). Two different
testlet approaches were used. The first, referred to as the
traditional testlet approach, creates testlets from conceptually
associated items, guided by their residual correlations (32). By
grouping similar items into super-items, for example all the
transfer items of the FIM™, the traditional approach unifies
similar items, such as those for "self-care" or "transfer", and
highlights potential differences between testlets, e.g. in
dimensionality. The second, referred to as the alternative
2-testlet approach, divides conceptually similar items into 2
distinct testlets of equal size by assigning items alternately to
each testlet. This approach focuses on the total score of the
FIM™ rather than on single items or groups of items, emphasizing
the similarity of the items, as together they should measure the
concept of functional independence. As a bi-factor equivalent
approach, the alternative 2-testlet approach has the advantage of
producing testlets of equal size, as recommended by Andrich (29).
A further advantage of the 2-testlet approach is that it allows a
conditional test of fit. Furthermore, all testlet-based approaches
allow calculation of the "explained common variance" attributable
to the general "first factor", which indicates the proportion of
variance retained in creating a unidimensional latent estimate (29).
Acceptable values of these additional statistics are indicated at
the bottom of the respective testlet results table. Threshold
disordering is not reported for testlets, because a given testlet
score can arise from several different combinations of item
scores, which makes that analysis meaningless at this level.
To ensure the robustness of the results, the baseline analysis
and the best-fitting testlet approach were conducted at 3 levels
of aggregation of the calibration sample (see Fig. 1). In
J Rehabil Med 51, 2019