Journal of Rehabilitation Medicine 51-3

FIM™ internal construct validity revisited

[Fig. 1 flow chart]
Overall cases from 23 rehabilitation clinics with complete data in all FIM™ items (n=11,103)
  Not selected in sampling (n=10,157)
  Complete calibration sample “FIM_all” (n=946)
    Time point 1 subsample “t1_all” (n=474); time point 2 subsample “t2_all” (n=472)
    Musculoskeletal subsample “MSK_all” (n=476): time point 1 “MSK_t1” (n=246), time point 2 “MSK_t2” (n=230)
    Neurological subsample “NEUR_all” (n=470): time point 1 “NEUR_t1” (n=228), time point 2 “NEUR_t2” (n=242)
Fig. 1. Flow chart calibration sample with 3 different aggregation levels. FIM: Functional Independence Measure.

functioning (DIF) was evaluated, which indicates that, while accounting for the trait, an item works differently for certain groups defined by a contextual factor, such as gender or age. The partial credit model was applied, which has been shown previously to be the appropriate parametrization for the FIM™ (17, 28).

Baseline analyses

The baseline analysis tested how well the observed data from all 18 items fit the Rasch model (15). To do so, the individual and overall item-fit, the person-fit, the reliability indices α and person separation index (PSI), and the χ² p-value of the item-trait interaction standing for the fit of the data to the Rasch model were ascertained. The respective acceptable levels are represented in the bottom line of the corresponding results table. In addition, local response dependency among items was scrutinized, along with threshold disordering of item categories, and DIF for the following 7 factors: gender, age (4 age groups according to the interquartile ranges), nationality (Swiss or other), insurance (general, semi-private, private), rehabilitation group (neurological or musculoskeletal rehabilitation), clinic language (German, French or Italian) and time-point of measurement (admission t1, discharge t2).
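As an illustration of the partial credit model mentioned above, the following minimal sketch computes the category probabilities of a single polytomous item for a person at a given location on the latent trait. It is not the RUMM2030 implementation; the function name and the threshold values are hypothetical and chosen only to show the mechanics for an item with 7 ordered categories (6 thresholds).

```python
import math


def pcm_category_probabilities(theta, thresholds):
    """Return P(X = 0), ..., P(X = m) for one item under the partial credit model.

    theta      -- person location in logits
    thresholds -- list of m item thresholds in logits (hypothetical values here)
    """
    # Cumulative sums of (theta - tau_k); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for tau in thresholds:
        cumulative.append(cumulative[-1] + (theta - tau))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(cumulative)
    weights = [math.exp(c - peak) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical, ordered thresholds for a 7-category item (illustration only).
example_thresholds = [-2.0, -1.2, -0.4, 0.3, 1.1, 2.0]
probs = pcm_category_probabilities(0.0, example_thresholds)
```

When a person's location equals the single threshold of a dichotomous item, the two categories are equally likely, which is a quick sanity check on the formula.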
Both individual item-fit and DIF analysis p-values are Bonferroni adjusted in the RUMM2030 software.

Testlet approaches

Where the local independence assumption of the Rasch model was not met, testlet approaches were applied. A testlet is a simple sum score from a set of associated items, making the set into a single new “super”-item in order to absorb their dependencies (20, 29–31). The creation of testlets yielded positive results in earlier Rasch analyses of the FIM™ motor scale (17). Two different testlet approaches were used. The first, referred to as the traditional testlet approach, creates testlets from conceptually associated items on the basis of their residual correlations (32). By grouping similar items into super-items, for example all the transfer items of the FIM™, the traditional testlet approach highlights potential differences, e.g. in dimensionality, between testlets that each unify similar items, such as “self-care” or “transfer”. The other approach, referred to as the alternative 2-testlet approach, divides conceptually similar items into 2 distinct testlets of equal size by assigning alternate items to each testlet. This approach focuses on the total score of the FIM™ rather than on single items or groups of items, emphasizing the similarity of the items, as together they should measure the concept of functional independence. In delivering a bi-factor equivalent approach, the alternative 2-testlet approach has the advantage of creating testlets of equal size, as recommended by Andrich (29). Another advantage of the 2-testlet approach is that it allows for a conditional test of fit. Furthermore, all testlet-based approaches allow the calculation of the “explained common variance” attributable to the general “first factor”, indicating the proportion of variance retained to create a unidimensional latent estimate (29).
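The two ways of forming testlets described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in RUMM2030: the item keys, the grouping into “self_care” and “transfer”, and the scores are hypothetical, and only 6 items are shown rather than the full scale.

```python
def traditional_testlets(scores, groups):
    """Sum item scores within each conceptual group, giving one super-item per group."""
    return {name: sum(scores[item] for item in items) for name, items in groups.items()}


def alternating_two_testlets(scores, ordered_items):
    """Assign alternate items to two testlets and return both sum scores."""
    first = ordered_items[0::2]   # items at positions 1, 3, 5, ...
    second = ordered_items[1::2]  # items at positions 2, 4, 6, ...
    return sum(scores[i] for i in first), sum(scores[i] for i in second)


# Hypothetical item scores for one person (illustration only).
scores = {
    "eating": 7, "grooming": 6, "bathing": 4,
    "transfer_bed": 5, "transfer_toilet": 5, "transfer_tub": 3,
}

# Traditional approach: hypothetical conceptual grouping of the items.
groups = {
    "self_care": ["eating", "grooming", "bathing"],
    "transfer": ["transfer_bed", "transfer_toilet", "transfer_tub"],
}
traditional = traditional_testlets(scores, groups)

# Alternative 2-testlet approach: alternate items go to each of two equal-sized testlets.
alternating = alternating_two_testlets(scores, list(scores))
```

Both constructions preserve the person's total score; they differ only in how the items are partitioned, which is why the first emphasizes differences between conceptual domains and the second emphasizes the common total.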
Acceptable values of these additional statistics are indicated at the bottom of the respective testlet result table. The analysis of threshold disordering is not meaningful at the level of testlets, as a particular score can be derived in a number of ways, and is therefore not reported. To ensure robustness of the results, the baseline analysis and the best-fitting testlet approach were conducted at 3 levels of aggregation of the calibration sample (see Fig. 1). In
J Rehabil Med 51, 2019
