
Table III. Intra-rater agreement within each rater (A, B and C) and inter-rater agreement between all raters during test occasions 1 and 2 for Fugl-Meyer Assessment Upper Extremity (FMA-UE) subscale B, C and D items, sums and the total score A–D

                                  Intra-rater agreement (PA %)    Inter-rater agreement (PA %)
                                  Rater A   Rater B   Rater C     Test occasion 1   Test occasion 2
B. WRIST
Stability at 15° dorsiflexion       90        94        89              98                98
Repeated wrist flexion              92        91        89              97                97
Stability at 15° dorsiflexion       85        88        83              95                95
Repeated wrist flexion              87        88        89              97                97
Circumduction                       90        91        97              93                93
SUM B, range 0–10p                  66        74        72              89                89
SUM B, 1 point                      88        87        87              97                97
C. HAND
Mass flexion                        97       100       100             100               100
Mass extension                      97       100       100             100               100
Hook grasp                          90        94        89             100               100
Thumb adduction                     92        97        94              97               100
Opposition/pincer grasp             85        91        92              93               100
Cylinder grasp                      92       100        94              97                98
Spherical grasp                     92       100        97              98               100
SUM C, range 0–14p                  75        89        74              93                98
SUM C, 1 point                      95        97        92              98               100
D. COORDINATION/SPEED
Tremor                              87        94        92              93                93
Dysmetria                           82        88        83              90                92
Time                                85        88        97              98                98
SUM D, range 0–6p                   67        71        78              85                85
TOTAL A–D, range 0–66p              33 (RC)ᵃ  45        46              67                75
TOTAL A–D, 1 point difference       60        61        61              80                83
TOTAL A–D, 2 points difference      70        66        68              87                88
TOTAL A–D, 3 points difference      78        79        82              97                93

ᵃ Statistically significant disagreement (absolute value of RP/RC ≥ 0.1 and 95% CI does not include 0) marked in bold. PA: percentage of agreement; RP: relative position; RC: relative concentration.

measured as RV, were all close to zero. Exact RP and RC values, along with 95% CIs, are displayed in Tables SIII–IV¹. The PA was above 90% for all items between the raters (Tables II and III). Full agreement (100%) was observed for reflex activity (A.I.), adduction/internal rotation and elbow extension (A.II), stability of wrist (B), mass extension and flexion of the hand, hook grasp, thumb adduction, pincer grasp and spherical grasp (C) in at least 1 of the test occasions. At the subscale and total score level, the PA varied between 67% and 93% on the first test occasion and between 75% and 98% on the second test occasion, which indicates improved agreement on the second test occasion. An 80% PA was reached for subscale A and the total score A–D when a 1-point difference between the raters was accepted.

DISCUSSION

This study demonstrated that the FMA-UE is a reliable clinical instrument for the evaluation of upper extremity motor function early after stroke. Only one item (forearm pronation within synergies) in the inter-rater reliability testing and 4 items (elbow extension, forearm pronation within synergies, shoulder flexion to 90°, normal reflex activity) out of 33 in the intra-rater reliability testing showed statistically significant systematic disagreements, either in relative position or in relative concentration. A systematic shift towards higher scores on the second test occasion within the same rater was observed for some items and for the total score. In addition, the intra- and inter-rater agreement was high (79–100%) for all single items, which indicates that the use of single items from the FMA-UE might be warranted. The 70% intra-rater agreement was also reached for subscale C, but a 1-point difference was needed for subscales B and D, and a 3-point difference for subscale A and the total score A–D. Inter-rater agreement was above 80% for subscales B, C and D, and only a 1-point difference was needed for subscale A and the total score A–D to reach this level of agreement.
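For clarity, the percentage of agreement (PA) reported above, whether exact or with a 1-, 2- or 3-point difference accepted, corresponds to the share of paired ratings whose absolute difference does not exceed the accepted number of points. The following minimal Python sketch illustrates this calculation; the function name and the example scores are invented for illustration and are not data or analysis code from the study.

import numpy as np

def percentage_agreement(scores_a, scores_b, tolerance=0):
    # PA (%) = share of paired ratings differing by at most `tolerance` points
    a = np.asarray(scores_a)
    b = np.asarray(scores_b)
    return 100.0 * np.mean(np.abs(a - b) <= tolerance)

# Hypothetical FMA-UE total scores (0-66) given by one rater at two test occasions
occasion_1 = [4, 12, 30, 55, 66, 23, 8, 41]
occasion_2 = [4, 13, 28, 55, 66, 25, 8, 42]

for k in (0, 1, 2, 3):
    pa = percentage_agreement(occasion_1, occasion_2, tolerance=k)
    print(f"PA within {k} point(s): {pa:.0f}%")

As the sketch shows, widening the accepted difference can only increase the PA, which is consistent with the sums and total score, with their many possible values, reaching acceptable agreement only when a 1- to 3-point difference is allowed.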
This study is the first to investigate the item-level intra- and inter-rater reliability of the FMA-UE in a relatively large sample of patients early after stroke. Previous studies have largely evaluated reliability in relatively small samples and used statistical methods, such as the ICC, which are less suitable for ordinal data (33). However, a recent study used weighted kappa statistics and reported high item-level reliability when the FMA-UE scorings were made from video (19). Weighted kappa is a commonly used measure of agreement, but it still fails to identify systematic disagreements and ignores the rank-invariant properties of ordinal data. It also assumes that the raters have equal skill level, which