Reliability of Fugl-Meyer Assessment of upper extremity in stroke
summed scores in relatively small samples (generally
≤ 30). A recent study, however, reported weighted
kappa values of ≥ 0.7 for inter-rater reliability of
individual item scores when FMA-UE scores obtained from
video recordings were compared with those from direct observation in
chronic stroke (19). Item-level reliability evaluation
is essential, since single items of the FMA-UE have
been proposed for prediction of motor recovery after
stroke (20). The FMA-UE has been translated into several
languages, most recently into Colombian Spanish
(21), following the protocol and manual of
the original English/Swedish version (22). The
current study used the translated Colombian Spanish
FMA-UE for reliability evaluation. Thus, the aim of
this study was to evaluate the intra- and inter-rater
reliability of the FMA-UE at item, subscale and total
score level in people with early subacute stroke.
METHODS
Population
In total, 60 individuals with stroke were consecutively included
during a 17-month period (Fig. 1). The inclusion criteria were:
first-ever stroke, admitted to the Central Military Hospital of
Colombia 4–9 days post-stroke, National Institutes of Health
Stroke Scale (NIHSS) greater than 0 at admission, and age
between 18 and 90 years. Exclusion criteria were other disorders
such as blindness, deafness, or amputation of a lower or upper limb,
and cerebellar stroke. Patients who could not cooperate in FMA
testing due to impaired cognition or a severe medical condition were
also excluded. Ethical approval was obtained from the ethics
committee of the Central Military Hospital (Act number 9, 12
June 2013), and signed informed consent was obtained from all
participants or a family member. Data collection was carried
out between November 2014 and April 2016. The Strengthening
the Reporting of Observational Studies in Epidemiology
(STROBE) guidelines (23) and the checklist for reliability
evaluation from the consensus-based standards for selection
of health status measurement instruments (COSMIN) were
followed to ensure the methodological quality of the study (24).
The sample size was based on preliminary results from the
pilot study with 10 individuals with stroke (21) and previous
studies using the rank invariant method for reliability testing at
the item level (25). For the planned study design 60 individuals
with stroke were considered to be sufficient.
Fugl-Meyer Assessment of upper extremity
The FMA-UE examines reflex activity and voluntary movements
performed within synergies, partially out of synergies and
independently of synergies (22). The scale includes 33 items
divided into 4 subscales: shoulder/elbow (A, 18 items), wrist
(B, 5 items), hand (C, 7 items) and coordination/speed (D,
3 items). Each item is scored on a 3-point ordinal scale: 2 points
when the movement is performed fully, 1 point when it is
performed partially, and 0 points when it cannot be performed.
The maximum total score is 66 (33 items × 2 points), and higher
scores indicate better sensorimotor function.
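The scoring rules above can be summarised as a short summation over items. The following is a minimal sketch, assuming that item scores are stored per subscale as lists of integers; the names `SUBSCALE_ITEMS` and `fma_ue_scores` are hypothetical and not part of the FMA-UE manual or the study software.

```python
from typing import Dict, List

# Number of items per FMA-UE subscale, as listed in the text.
SUBSCALE_ITEMS: Dict[str, int] = {"A": 18, "B": 5, "C": 7, "D": 3}
VALID_ITEM_SCORES = {0, 1, 2}  # 0 = cannot perform, 1 = partial, 2 = full


def fma_ue_scores(items: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum item scores into subscale scores and a total score (0-66).

    Hypothetical helper: `items` maps a subscale letter to its item scores.
    """
    result: Dict[str, int] = {}
    for subscale, n_items in SUBSCALE_ITEMS.items():
        scores = items[subscale]
        if len(scores) != n_items:
            raise ValueError(
                f"subscale {subscale}: expected {n_items} items, got {len(scores)}"
            )
        if any(s not in VALID_ITEM_SCORES for s in scores):
            raise ValueError(f"subscale {subscale}: item scores must be 0, 1 or 2")
        result[subscale] = sum(scores)
    # Total score is the sum of the four subscale scores.
    result["total"] = sum(result[s] for s in SUBSCALE_ITEMS)
    return result


if __name__ == "__main__":
    # Hypothetical patient who performs every item partially (score 1).
    patient = {s: [1] * n for s, n in SUBSCALE_ITEMS.items()}
    print(fma_ue_scores(patient))
    # {'A': 18, 'B': 5, 'C': 7, 'D': 3, 'total': 33}
```

Validating the item count and the allowed values before summing makes incomplete or mis-entered protocols fail loudly rather than silently lowering the total.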
Three trained physiotherapists (raters A, B and C) with more
than 20 years of clinical experience were randomly assigned
into pairs to perform assessments. For practical reasons, a fourth
rater (also trained and experienced) was involved in the
assessments of 4 patients. These assessments were not included in the
intra-rater analysis. All raters were involved in the translation
process of the FMA-UE into Spanish, which also included joint
practical training under expert guidance and data collection
for a previous pilot study (21). Each patient's performance on
the FMA-UE was scored simultaneously but independently by
one pair of raters on 2 consecutive days. The first assessment
was performed between 4 and 9 days post-stroke. During the
first assessment, one rater acted as test leader (i.e.
instructing the patient and scoring) and the other as observer
(scoring by observation). These roles were assigned randomly and
switched on the second assessment day. The raters did not
communicate about the scoring during the testing session or
afterwards. Scoring protocols of different colours were used on
the two days, and the completed protocols were stored in sealed
envelopes, which were opened at the time of statistical analysis.
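The rater assignment described above can be sketched as follows. This is a minimal illustration only, assuming one random draw per patient from raters A, B and C. It ignores the fourth rater, and both helper names are hypothetical. The second helper shows one plausible way to enumerate the score sets this design makes available for comparison; it is not necessarily how the study paired its data.

```python
import random
from typing import Dict, List, Tuple

# Raters as named in the text; the fourth rater (4 patients) is ignored here.
RATERS: List[str] = ["A", "B", "C"]


def assign_session(
    rng: random.Random, raters: List[str] = RATERS
) -> Dict[int, Tuple[str, str]]:
    """Return {day: (test_leader, observer)} for the two assessment days.

    rng.sample draws two distinct raters in random order, which fixes both
    the pair and the day-1 roles; the roles are swapped for day 2.
    """
    leader, observer = rng.sample(raters, 2)
    return {1: (leader, observer), 2: (observer, leader)}


def score_pairs(session: Dict[int, Tuple[str, str]]):
    """Hypothetical helper listing which (day, rater) score sets to compare.

    Inter-rater: the two raters' scores from the same day.
    Intra-rater: the same rater's scores from day 1 and day 2.
    """
    inter = [((day, session[day][0]), (day, session[day][1])) for day in (1, 2)]
    intra = [((1, rater), (2, rater)) for rater in session[1]]
    return inter, intra


if __name__ == "__main__":
    session = assign_session(random.Random(2014))
    print(session)
    print(score_pairs(session))
```

Because each rater scores the same patient on both days, once as test leader and once as observer, the design yields both same-day (inter-rater) and same-rater (intra-rater) comparisons from a single pair.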
Other clinical assessments
The initial severity of the stroke was evaluated using the NIHSS
at hospital admission (26–28). The minimum score of 0
indicates no impairment, and the maximum score of 42 indicates
severe impairment. Stroke severity was classified as mild (0–4),
moderate (5–15), severe (16–24) or very severe (≥ 25) (26).
Disability at discharge was assessed with the Modified Rankin
Scale (0–6), on which lower scores indicate less
disability (29).
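The NIHSS severity bands above map directly onto score thresholds. The following is a minimal sketch of that mapping, assuming integer NIHSS scores in the range 0–42; the function name `nihss_severity` is hypothetical.

```python
def nihss_severity(score: int) -> str:
    """Classify an NIHSS score (0-42) into the severity bands given in the text."""
    if not 0 <= score <= 42:
        raise ValueError("NIHSS score must be between 0 and 42")
    if score <= 4:
        return "mild"  # 0-4
    if score <= 15:
        return "moderate"  # 5-15
    if score <= 24:
        return "severe"  # 16-24
    return "very severe"  # >= 25


if __name__ == "__main__":
    for s in (3, 10, 20, 30):
        print(s, nihss_severity(s))
```

Checking the bands in ascending order means each threshold only needs its upper bound, so the boundary values (4/5, 15/16, 24/25) cannot be assigned to two categories.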
Fig. 1. Flowchart for study inclusion.
Statistical analysis
Descriptive statistics were calculated for the background data.
Floor and ceiling effects for the FMA-UE were considered
present in the cohort when more than 15% of patients
received the lowest or highest possible score (30).
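The 15% criterion above amounts to comparing the share of patients at each end of the scale with a threshold. The following is a minimal sketch, assuming FMA-UE total scores between 0 and 66 and a strict "more than 15%" rule; the function name `floor_ceiling` is hypothetical.

```python
from typing import Sequence, Tuple


def floor_ceiling(
    scores: Sequence[int],
    min_score: int = 0,
    max_score: int = 66,
    threshold: float = 0.15,
) -> Tuple[bool, bool]:
    """Return (floor_effect, ceiling_effect) for a cohort of total scores.

    An effect is flagged when MORE than `threshold` of patients obtain the
    lowest (floor) or highest (ceiling) possible score.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_share = sum(s == min_score for s in scores) / n
    ceiling_share = sum(s == max_score for s in scores) / n
    return floor_share > threshold, ceiling_share > threshold


if __name__ == "__main__":
    # Hypothetical cohort of 60: 10 patients (16.7%) score the maximum of 66.
    cohort = [66] * 10 + [30] * 50
    print(floor_ceiling(cohort))
```

With 60 patients, 9 at the maximum (exactly 15%) would not meet the criterion, whereas 10 (16.7%) would.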
For intra- and inter-rater reliability, a rank-invariant method
especially designed for the analysis of disagreement in paired
ordinal data was used (18, 31, 32) (the software is available at
http://avdic.se/svenssonsmetod.html). Systematic disagreement
between raters was expressed as relative position (RP) and
relative concentration (RC), and random disagreement as relative
rank variation (RV) (18). RP indicates the extent to which the
distribution of scores from one assessment is systematically
shifted towards higher or lower categories compared with the
other assessment. RC shows whether the scores of one assessment
are more or less concentrated towards the central categories of
the scale compared with the other assessment. RP and RC values
range from –1 to 1, where 0 indicates no systematic difference between raters.
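As we understand Svensson's published definition, RP is the difference between two probabilities, both computed from the marginal distributions of the two assessments X and Y as if they were independent. The first is the probability that Y falls in a higher category than X, and the second is the reverse. The sketch below computes only RP under that assumed definition and sign convention, with positive RP meaning that Y is shifted towards higher categories. RC and RV are omitted, the function name is hypothetical, and this is not the software available at the URL above.

```python
from collections import Counter
from typing import Sequence


def relative_position(
    x: Sequence[int], y: Sequence[int], categories: Sequence[int] = (0, 1, 2)
) -> float:
    """Assumed RP = P(X < Y) - P(Y < X) from the marginal distributions.

    x, y: paired ordinal scores (e.g. one FMA-UE item from two assessments).
    categories: the ordered scale categories, lowest first.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    if any(v not in categories for v in list(x) + list(y)):
        raise ValueError("all scores must belong to `categories`")
    n = len(x)
    cx, cy = Counter(x), Counter(y)
    px = [cx[c] / n for c in categories]  # marginal proportions of X
    py = [cy[c] / n for c in categories]  # marginal proportions of Y
    m = len(categories)
    # P(X < Y): X in category i and Y in a higher category j > i.
    p_x_lt_y = sum(px[i] * py[j] for i in range(m) for j in range(i + 1, m))
    # P(Y < X): Y in category i and X in a higher category j > i.
    p_y_lt_x = sum(py[i] * px[j] for i in range(m) for j in range(i + 1, m))
    return p_x_lt_y - p_y_lt_x


if __name__ == "__main__":
    # Hypothetical item scores: the second assessment scores two patients higher.
    first = [1, 1, 1, 1]
    second = [1, 1, 2, 2]
    print(relative_position(first, second))
```

Because RP depends only on the two marginal distributions, identical distributions give RP = 0 even if individual patients are scored differently. That kind of individual, non-systematic disagreement is what RV is intended to capture.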
J Rehabil Med 51, 2019