E. D. Hernández et al.
Values between –0.1 and 0.1 were considered negligibly small
with reference to clinical relevance, while values outside this
range were considered clinically relevant disagreements (33).
The RV indicates disagreement caused by individual variability;
it ranges from 0 to 1, and a value < 0.1 means that the
disagreement is negligible. Statistically significant disagreement in
RP, RC and RV was indicated by a 95% confidence interval
(95% CI) that did not include zero. Scatterplots and
relative operating characteristic (ROC) curves were used to
visually analyse the systematic disagreements. The degree of
agreement was determined by using the percentage of agreement (PA).
Agreement ≥ 70% was considered satisfactory. For the summed scores
(subscale and total scores), the minimum difference in points
that had to be accepted to reach at least 70% PA was also calculated.
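The decision rules described above can be illustrated with a minimal Python sketch. It is not the authors' analysis code: the function names, the example ratings and the handling of values exactly equal to 0.1 (treated here as negligible) are assumptions made for illustration.

```python
def percentage_agreement(first, second):
    """Percentage of paired ratings that are identical (PA)."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty rating lists of equal length")
    same = sum(a == b for a, b in zip(first, second))
    return 100.0 * same / len(first)


def interpret_estimate(value, ci_low, ci_high, negligible=0.1):
    """Classify an RP, RC or RV estimate with the thresholds in the text.

    significant: the 95% CI does not include zero.
    relevant: the estimate lies outside the negligible range
              (assumption: exactly +/-0.1 counts as negligible).
    """
    significant = not (ci_low <= 0.0 <= ci_high)
    relevant = abs(value) > negligible
    return {"significant": significant, "relevant": relevant}


# Hypothetical item scores (0-2) from two test occasions.
occasion_1 = [0, 1, 2, 2]
occasion_2 = [0, 1, 2, 1]
pa = percentage_agreement(occasion_1, occasion_2)
print(pa, "satisfactory" if pa >= 70.0 else "not satisfactory")
print(interpret_estimate(0.15, 0.02, 0.28))
```

With these hypothetical ratings three of four pairs agree, so the PA is 75%, which would count as satisfactory under the 70% criterion.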
RESULTS
In total, 105 patients were screened, of whom 60 (48%
women, mean age 65.9 years) met the inclusion criteria
and were assessed with the FMA-UE (Table I). The
main reason for exclusion was severe cognitive impairment
that hindered cooperation during the assessment
(n = 21) (Fig. 1). Among the included patients, 93%
had ischaemic stroke and 7% haemorrhagic stroke.
The FMA-UE scores of the entire group ranged from
4 to 66 points. Of the 60 patients, 25% scored ≤ 48 and
25% scored ≥ 65. No floor effect was observed, since
all patients received some points on the first occasion.
However, 13 patients (21.7%) received the full score
of 66 points on the first occasion, which indicates a
ceiling effect.
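The floor and ceiling proportions reported above follow from simple counting. The sketch below reproduces the arithmetic; the individual scores are hypothetical placeholders chosen only so that 13 of 60 patients have the maximum score, and the function name is an assumption.

```python
def floor_ceiling(scores, minimum=0, maximum=66):
    """Return the percentage of scores at the scale minimum and maximum."""
    n = len(scores)
    floor_pct = 100.0 * sum(s == minimum for s in scores) / n
    ceiling_pct = 100.0 * sum(s == maximum for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical data: 13 patients with the full FMA-UE score of 66,
# 47 patients with an arbitrary intermediate score.
scores = [66] * 13 + [50] * 47
floor_pct, ceiling_pct = floor_ceiling(scores)
print(f"floor: {floor_pct:.1f}%, ceiling: {ceiling_pct:.1f}%")
```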
Table I. Demographic and clinical characteristics (n = 60)
Characteristics | Value
Age, years, mean (SD) | 65.9 (17.3)
Sex, male/female, % | 52/48
Ischaemic/haemorrhagic stroke, % | 93/7
Right/left hemiparesis, % | 55/45
Thrombolysis, n | 8
Hospitalization, days, mean (SD) | 12 (10)
Modified Rankin Scale, median (Q1–Q3) | 2 (1–4)
  0 Without symptoms | 3
  1 Without significant disability | 22
  2 Mild disability | 10
  3 Moderate disability | 5
  4 Moderately severe disability | 16
  5 Severe disability | 4
NIHSS Scale, median (Q1–Q3) | 5 (3–10)
  Mild 0–5 | 25
  Moderate 6–14 | 20
  Severe 15–24 | 2
  Very severe ≥ 25 | 0
  Patients without NIHSS scorings | 13
Discharged from hospital, n |
  Home | 56
  Homecare | 1
  Intermediate care | 1
  Died in hospital | 2
Fugl-Meyer Assessment of upper extremity |
  FMA-UE, 1st occasion, median (Q1–Q3) | 58 (48–65)
  FMA-UE, 2nd occasion, median (Q1–Q3) | 59.5 (45–66)
Intra-rater reliability
At the item level, statistically significant systematic
disagreement in relative position (RP) was noted for
shoulder flexion 0–90° (A.III.) and normal reflex activity
(A.V., Table II). All these disagreements were
positive, which indicates that a higher category was
systematically used more frequently for these items
on the second occasion. A negative RC value was
noted for one of the raters for elbow extension and
forearm pronation within extensor synergy, which
means that more central scores were used more often
on the first occasion than on the second
by the same rater. This disagreement showed the
same tendency as seen in the RP values, indicating that a
higher score was more frequently used on the second
test occasion than on the first for these items.
A shift towards higher scores was also seen in the total
score A–D. Individual disagreements, measured as
RV, were all close to zero across all raters. Scatterplots
showing paired intra-rater and inter-rater assessments
of the total score A–D, along with ROC curves, are presented
in Fig. 2. A curved ROC indicates disagreement in
position, and an S-shaped curve indicates that the raters
concentrate their assessments differently on the scale
categories. Exact RP and RC values along with 95%
CI are displayed in Tables SI–II¹.
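To make the sign of RP concrete, the following sketch estimates it from paired ordinal ratings. It assumes a commonly used formulation of the rank-based approach, RP = P(X < Y) − P(Y < X), estimated from the two marginal distributions; the formula is not given in this section, so the formulation, the function name and the example ratings are assumptions. Confidence intervals are not computed here.

```python
from collections import Counter


def relative_position(first, second, categories):
    """Estimate RP = P(X < Y) - P(Y < X) from the marginal frequencies.

    first, second: paired ordinal ratings (occasion 1 = X, occasion 2 = Y).
    categories: all scale categories, ordered from lowest to highest.
    A positive value means higher categories were used on the second occasion.
    """
    n = len(first)
    freq_x = Counter(first)
    freq_y = Counter(second)
    y_above_x = 0  # n^2 * P(X < Y)
    x_above_y = 0  # n^2 * P(Y < X)
    cum_x = 0      # number of X ratings below the current category
    cum_y = 0      # number of Y ratings below the current category
    for c in categories:
        y_above_x += freq_y[c] * cum_x
        x_above_y += freq_x[c] * cum_y
        cum_x += freq_x[c]
        cum_y += freq_y[c]
    return (y_above_x - x_above_y) / n ** 2


# Hypothetical item scores (0-2) for six patients.
occasion_1 = [0, 0, 1, 1, 2, 2]
occasion_2 = [0, 1, 1, 2, 2, 2]
print(relative_position(occasion_1, occasion_2, categories=[0, 1, 2]))
```

In this hypothetical example the second occasion uses the higher categories more often, so the estimate is positive, matching the interpretation of positive RP values given above.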
The PA between test occasions 1 and 2 within each
rater was above 79% for all tested items (Tables II
and III). For the reflex activity (A.I.), full agreement
was reached. Full agreement in at least one rater was
also noted for the following items: hand to lumbar spine,
mass flexion and extension of the hand, cylinder and
spherical grasp. The PA was, as expected, lower for
subscale A (48–59%), subscales B, C and D (63–89%), and
the total score A–D (33–46%) than for single items,
since the summed scores include a larger number of categories.
A 70% PA was reached for subscales B, C and D
when a 1-point difference between test occasions was
accepted. A 2-point and a 3-point difference were needed to
reach 70% PA in all 3 raters for subscale A and the
total score A–D, respectively.
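The minimum accepted difference needed to reach 70% PA can be found by widening the tolerance one point at a time. The sketch below illustrates this search under the assumption of integer summed scores; the function names and the example scores are hypothetical and do not reproduce the study data.

```python
def agreement_within(first, second, tolerance):
    """Percentage of paired scores that differ by at most `tolerance` points."""
    hits = sum(abs(a - b) <= tolerance for a, b in zip(first, second))
    return 100.0 * hits / len(first)


def minimum_tolerance(first, second, target=70.0):
    """Smallest point difference that must be accepted to reach the target PA."""
    largest_gap = max(abs(a - b) for a, b in zip(first, second))
    for tolerance in range(largest_gap + 1):
        if agreement_within(first, second, tolerance) >= target:
            return tolerance
    return largest_gap  # not reached in practice: PA is 100% at the largest gap


# Hypothetical total scores for ten patients on two occasions.
occasion_1 = [60, 58, 66, 40, 52, 63, 30, 45, 66, 55]
occasion_2 = [60, 59, 66, 42, 52, 64, 33, 45, 65, 55]
print(agreement_within(occasion_1, occasion_2, 0))
print(minimum_tolerance(occasion_1, occasion_2))
```

For these hypothetical scores exact agreement is 50%, and accepting a 1-point difference raises it to 80%, so the minimum accepted difference is 1 point.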
Inter-rater reliability
A statistically significant systematic disagreement in
RC was noted for forearm pronation (A.II.), which
means that the rater acting as leader systematically
used a more central score than
the rater who acted as observer (Table II). All other
observed systematic disagreements were negligible or
not statistically significant. Individual disagreements,
Table I, FMA-UE rows: 1st occasion, median (Q1–Q3): 58 (48–65); 2nd occasion, median (Q1–Q3): 59.5 (45–66).
FMA-UE: Fugl-Meyer Assessment Upper Extremity; SD: standard deviation;
NIHSS: National Institutes of Health Stroke Scale.
www.medicaljournals.se/jrm
¹ http://www.medicaljournals.se/jrm/content/?doi=10.2340/16501977-2590