Journal of Rehabilitation Medicine 51-9

Reliability of Fugl-Meyer Assessment of upper extremity in stroke

summed scores in relatively small samples (generally ≤ 30). A recent study, however, reported weighted kappa values of ≥ 0.7 for inter-rater reliability of the individual item scores when FMA-UE scores obtained from video were compared with direct observation in chronic stroke (19). Item-level reliability evaluation is essential, since single items of the FMA-UE have been proposed for prediction of motor recovery after stroke (20). The FMA-UE has been translated into several languages, most recently into Colombian Spanish (21), following the protocol and manual of the original English/Swedish version (22). The current study used the translated Colombian Spanish FMA-UE for reliability evaluation. Thus, the aim of this study was to evaluate the intra- and inter-rater reliability of the FMA-UE at item, subscale and total score level in people with early subacute stroke.

METHODS

Population

In total, 60 individuals with stroke were consecutively included during a 17-month period (Fig. 1). The inclusion criteria were: first-ever stroke, admission to the Central Military Hospital of Colombia 4–9 days post-stroke, National Institutes of Health Stroke Scale (NIHSS) score greater than 0 at admission, and age between 18 and 90 years. Exclusion criteria were: other disorders, such as blindness, deafness, amputation of a lower or upper limb, or cerebellar stroke. Patients who could not cooperate in FMA testing due to impaired cognition or a severe medical condition were also excluded. Ethical approval was received from the ethics committee of the Central Military Hospital (Act number 9, 12 June 2013) and signed informed consent was obtained from all participants or a family member. Data collection was carried out between November 2014 and April 2016.
The Strengthening the Reporting of Observational studies in Epidemiology (STROBE) guidelines (23) and the checklist for reliability evaluation from the consensus-based standards for selection of health status measurement instruments (COSMIN) (24) were followed to ensure the methodological quality of the study. The sample size was based on preliminary results from the pilot study with 10 individuals with stroke (21) and on previous studies using the rank invariant method for reliability testing at the item level (25). For the planned study design, 60 individuals with stroke were considered to be sufficient.

Fugl-Meyer Assessment of upper extremity

The FMA-UE examines reflex activity and voluntary movements within, partially out of, and independent of synergies (22). The scale includes 33 items divided into 4 subscales: shoulder/elbow (A, 18 items), wrist (B, 5 items), hand (C, 7 items) and coordination/speed (D, 3 items). Each item is scored on an ordinal 3-point scale, where 2 points are assigned when the movement is performed fully, 1 point when it is performed partially, and 0 points when it cannot be performed. The maximum total score is 66, and a higher score indicates better sensorimotor function.

Three trained physiotherapists (raters A, B and C) with more than 20 years of clinical experience were randomly assigned into pairs to perform the assessments. For practical reasons, a fourth rater (also trained and experienced) was involved in the assessments of 4 patients. These assessments were not included in the intra-rater analysis. All raters had been involved in the translation of the FMA-UE into Spanish, which also included joint practical training with guidance from experts and data collection for a previous pilot study (21). The patient's performance on the FMA-UE was scored simultaneously, but independently, by one pair of raters on 2 consecutive days. The first assessment was performed between 4 and 9 days post-stroke.
During the first assessment, one of the raters acted as test leader (i.e. instructing the patient and scoring) and the other as observer (scoring by observing). These roles were assigned randomly and switched on the second assessment day. The raters did not communicate about the scoring during or after the testing session. Scoring protocols of different colours were used for the different days, and the completed protocols were stored in sealed envelopes, which were opened at the time of statistical analysis.

Other clinical assessments

The initial severity of the stroke was evaluated using the NIHSS at hospital admission (26–28). The minimum score of 0 indicates no impairment, and the maximum score of 42 indicates severe impairment. Stroke severity was classified as mild (0–4), moderate (5–15), severe (16–24), or very severe (≥ 25) (26). The disability level was assessed at discharge using the Modified Rankin Scale (0–6), on which a lower score indicates less disability (29).

Fig. 1. Flowchart for study inclusion.

Statistical analysis

Descriptive statistics were calculated for the background data. A floor or ceiling effect for the FMA-UE was defined as present in the patient cohort when more than 15% of patients received the lowest or the highest score of the scale, respectively (30).

For the intra- and inter-rater reliability, a rank invariant method specifically designed for the analysis of disagreements in paired ordinal data was used (18, 31, 32) (the software is available at http://avdic.se/svenssonsmetod.html). The systematic disagreement between raters was expressed as relative position (RP), relative concentration (RC) and relative rank variation (RV) (18). RP indicates the extent to which the distribution of scores from one assessment is systematically shifted towards higher or lower categories compared with the other assessment. RC shows whether the scores are more or less concentrated towards the central categories of the scale compared with the other assessment.
RP and RC values can vary from –1 to 1, where 0 means no difference between raters.

J Rehabil Med 51, 2019
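To make the RP and RC measures described above more concrete, the following Python sketch computes both from the marginal distributions of two paired ordinal assessments. It is an illustration only, not the software used in the study: the formulas follow the commonly cited formulation of the rank invariant method, and the function name, the sign convention (positive RP meaning that the second assessment is shifted towards higher categories) and the choice to return NaN when RC is undefined are all assumptions made for this example. RV is not computed here. The formulas should be checked against the cited references (18, 31, 32) before any real use.

```python
import math
from collections import Counter


def relative_position_and_concentration(x, y, categories):
    """Return (RP, RC) for paired ordinal scores x and y.

    x, y       -- equally long sequences of scores (e.g. rater 1 and rater 2)
    categories -- all possible scores in ascending order, e.g. [0, 1, 2]

    Hypothetical helper for illustration; names and conventions are assumptions.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    allowed = set(categories)
    if any(s not in allowed for s in list(x) + list(y)):
        raise ValueError("all scores must belong to the given categories")

    n = len(x)
    x_counts, y_counts = Counter(x), Counter(y)

    cum_x = cum_y = 0          # counts strictly below the current category
    p_xy = p_yx = 0.0          # estimates of P(X < Y) and P(Y < X)
    p_xyx = p_yxy = 0.0        # estimates of P(X < Y < X') and P(Y < X < Y')

    for c in categories:
        xi, yi = x_counts[c], y_counts[c]
        p_xy += yi * cum_x
        p_yx += xi * cum_y
        p_xyx += yi * cum_x * (n - cum_x - xi)
        p_yxy += xi * cum_y * (n - cum_y - yi)
        cum_x += xi
        cum_y += yi

    p_xy /= n ** 2
    p_yx /= n ** 2
    p_xyx /= n ** 3
    p_yxy /= n ** 3

    rp = p_xy - p_yx

    # Normalising constant for RC; RC is undefined when it is zero
    # (returning NaN in that case is a choice made for this sketch).
    m = min(p_xy - p_xy ** 2, p_yx - p_yx ** 2)
    rc = (p_xyx - p_yxy) / m if m > 0 else math.nan
    return rp, rc


# Usage with made-up item scores from two raters on a 0-2 scale.
rater_1 = [0, 1, 2, 1, 2, 0, 1, 2]
rater_2 = [0, 1, 2, 2, 2, 1, 1, 2]
print(relative_position_and_concentration(rater_1, rater_2, [0, 1, 2]))
```

Because only the marginal distributions enter these two measures, identical score distributions from both raters give RP = 0 and RC = 0 even when individual pairs disagree; that individual variation is what RV is intended to capture.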

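The floor and ceiling criterion stated in the statistical analysis (more than 15% of patients at the lowest or highest score) can be sketched as a short check. The function name, the returned dictionary keys and the example scores are hypothetical; the 0–66 range corresponds to the FMA-UE total score described above.

```python
def floor_ceiling_effect(total_scores, min_score=0, max_score=66, threshold=0.15):
    """Flag floor/ceiling effects using the 'more than 15%' rule.

    Hypothetical helper for illustration; not taken from the study software.
    """
    n = len(total_scores)
    if n == 0:
        raise ValueError("total_scores must not be empty")

    # Share of patients at the lowest and at the highest possible score.
    floor_share = sum(1 for s in total_scores if s == min_score) / n
    ceiling_share = sum(1 for s in total_scores if s == max_score) / n

    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,      # strictly more than 15%
        "ceiling_effect": ceiling_share > threshold,
    }


# Usage with made-up FMA-UE total scores for 10 patients.
example_scores = [0, 0, 12, 25, 30, 41, 48, 55, 60, 66]
print(floor_ceiling_effect(example_scores))
```

The comparison is strict, so a share of exactly 15% is not flagged, matching the wording "more than 15%".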