T. Winairuk et al.
Table II. Intra-rater reliability of the S-BESTest, Brief-BESTest, and Mini-BESTest in people with subacute stroke (n = 12)

| Intra-rater reliability | S-BESTest ICC (3,5) | S-BESTest 95% CI | Brief-BESTest ICC (3,5) | Brief-BESTest 95% CI | Mini-BESTest ICC (3,5) | Mini-BESTest 95% CI |
| Total | 0.98 | 0.97–0.99 | 0.98 | 0.97–0.99 | 0.98 | 0.97–0.99 |
| Domain I: Biomechanical constraints | 0.97 | 0.93–0.99 | 0.97 | 0.93–0.99 | – | – |
| Domain II: Stability limits | 0.98 | 0.96–0.99 | 0.99 | 0.98–0.99 | – | – |
| Domain III: Anticipatory postural adjustment | 0.96 | 0.92–0.99 | 0.96 | 0.91–0.99 | 0.95 | 0.90–0.99 |
| Domain IV: Reactive postural response | 0.96 | 0.92–0.99 | 0.98 | 0.95–0.99 | 0.97 | 0.94–0.99 |
| Domain V: Sensory orientation | 0.97 | 0.94–0.99 | 0.99 | 0.98–0.99 | 0.95 | 0.90–0.98 |
| Domain VI: Stability in gait | 0.96 | 0.92–0.99 | 0.97 | 0.94–0.99 | 0.98 | 0.96–0.99 |

All intraclass correlation coefficients (ICCs) were significant (p < 0.001). 95% CI: 95% confidence interval.
and 2, k, respectively, for the S-BESTest, Brief-BESTest and
Mini-BESTest (27). The ICC values were interpreted using the following criteria: above 0.8 indicates good reliability, 0.6–0.8 indicates moderate reliability and 0.4–0.6 indicates weak reliability (20, 27).
The concurrent validity of the S-BESTest, Brief-BESTest and Mini-BESTest was assessed against the BBS using Spearman rank-order correlations. Floor and ceiling effects of the S-BESTest, Brief-BESTest, Mini-BESTest, and BESTest were calculated as the percentage of the sample scoring the minimum or maximum possible score, respectively. Floor and ceiling effects greater than or equal to 20% were interpreted as significant (28). Balance scores were compared between baseline and 2 weeks post-rehabilitation, and between 2 and 4 weeks post-rehabilitation, using paired t-tests with a significance level of p < 0.05.
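The floor and ceiling calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name, the example scores and the 0–24 score range are hypothetical, and only the 20% threshold comes from the text.

```python
from typing import Dict, Sequence


def floor_ceiling_effects(
    scores: Sequence[float],
    min_score: float,
    max_score: float,
    threshold_pct: float = 20.0,  # from the text: >= 20% is interpreted as significant
) -> Dict[str, object]:
    """Percentage of the sample at the minimum and maximum possible score."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_significant": floor_pct >= threshold_pct,
        "ceiling_significant": ceiling_pct >= threshold_pct,
    }


# Hypothetical total scores for 10 patients on a scale assumed to range 0-24.
example_scores = [0, 0, 0, 5, 12, 18, 24, 24, 9, 15]
result = floor_ceiling_effects(example_scores, min_score=0, max_score=24)
print(result)
```

In this made-up sample, 3 of 10 patients score the minimum and 2 of 10 score the maximum, so both effects reach the 20% threshold.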
Internal and external responsiveness of the S-BESTest, Brief-BESTest, Mini-BESTest, and BESTest were assessed. Internal responsiveness refers to the ability of a measure to detect change between assessments made before and after a known treatment. Internal responsiveness was examined using the standardized response mean (SRM) and minimal detectable change (MDC) (29, 30). An SRM of 0.8 or greater represented a large change, values from 0.5 to 0.8 a moderate change, and values from 0.2 to 0.5 a small change. MDC was calculated as the standard error of measurement (SEM) multiplied by 1.96 × √2 (29, 30). SEM was calculated as the standard deviation (SD) multiplied by √(1 − reliability). A limitation of internal responsiveness is that it provides no information on the quality of change, such as worsening or improvement (31).
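The internal responsiveness statistics above can be illustrated with a short sketch. Several points are assumptions rather than details from the text: the SRM is computed with its usual definition (mean change divided by the SD of the change scores), which the text does not spell out; the SD passed to the SEM formula is a placeholder, since the text does not say which SD was used; the label "trivial" for values below 0.2 and all function names and numbers are hypothetical.

```python
import math
import statistics
from typing import Sequence


def standardized_response_mean(before: Sequence[float], after: Sequence[float]) -> float:
    """SRM = mean of change scores / SD of change scores (usual definition, assumed here)."""
    changes = [a - b for b, a in zip(before, after)]
    return statistics.mean(changes) / statistics.stdev(changes)


def sem_from_reliability(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability), as given in the text."""
    return sd * math.sqrt(1.0 - reliability)


def minimal_detectable_change(sem: float) -> float:
    """MDC = SEM * 1.96 * sqrt(2), as given in the text."""
    return sem * 1.96 * math.sqrt(2.0)


def interpret_srm(srm: float) -> str:
    """Bands from the text; 'trivial' below 0.2 is a label assumed for this sketch."""
    magnitude = abs(srm)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "moderate"
    if magnitude >= 0.2:
        return "small"
    return "trivial"


# Hypothetical scores for 4 patients before and after rehabilitation.
before = [10, 12, 14, 16]
after = [13, 14, 18, 19]
srm = standardized_response_mean(before, after)

# Hypothetical SD of 5.0 points and reliability (ICC) of 0.96.
sem = sem_from_reliability(sd=5.0, reliability=0.96)
mdc = minimal_detectable_change(sem)
print(f"SRM = {srm:.2f} ({interpret_srm(srm)}), SEM = {sem:.2f}, MDC = {mdc:.2f}")
```

With these made-up inputs the SEM is about 1 point, so a change would need to exceed roughly 2.8 points to be larger than measurement error at the 95% level.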
In contrast, external responsiveness is associated with the concept of clinical relevance, which depends on the choice of external standard (32). In this study, 2 external standards, the BBS and the GRC, were selected so that the change in performance could be compared with the patient's own perception of change. External responsiveness was assessed using receiver operating characteristic (ROC) curve analysis to establish which version of the BESTest could best identify patients whose balance had improved, with a change in BBS score of 7 points as the threshold for deciding whether change had occurred (33, 34). The area under the curve (AUC) was used to reflect this. The AUC values were compared across test versions using a t-test and a significance level of 0.05. The ROC analysis was repeated using the change in GRC (5 points) as the threshold (25). The AUC was interpreted as the probability of correctly discriminating between patients with and without balance improvement (29). An AUC of 0.8 or greater indicated excellent discrimination (27). A paired t-test was used to compare the AUC between 2 testing scales, with a significance level of p < 0.01. Likelihood ratios were used to indicate the accuracy of post-test probabilities; values of LR+ above 5 and values of LR– below 0.2 were considered meaningful (27). The optimal cut-off score was also chosen from the sensitivity and specificity (27).
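The ROC-based quantities described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' procedure: the AUC is computed with the rank-based (Mann-Whitney) formulation; "improved" is taken to mean a BBS change of 7 points or more, although the text does not say whether the threshold is inclusive; the cut-off is chosen by maximising Youden's index (sensitivity + specificity − 1), whereas the text only says it was chosen from sensitivity and specificity; all names and data are hypothetical.

```python
import math
from typing import Dict, Sequence


def auc(scores: Sequence[float], improved: Sequence[bool]) -> float:
    """Probability that an improved patient scores higher than a non-improved one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, improved) if y]
    neg = [s for s, y in zip(scores, improved) if not y]
    if not pos or not neg:
        raise ValueError("both improved and non-improved patients are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def diagnostics_at_cutoff(
    scores: Sequence[float], improved: Sequence[bool], cutoff: float
) -> Dict[str, float]:
    """Sensitivity, specificity, LR+ and LR- when 'score >= cutoff' predicts improvement."""
    tp = sum(1 for s, y in zip(scores, improved) if y and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, improved) if y and s < cutoff)
    tn = sum(1 for s, y in zip(scores, improved) if not y and s < cutoff)
    fp = sum(1 for s, y in zip(scores, improved) if not y and s >= cutoff)
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1.0 - spec) if spec < 1.0 else math.inf,
        "lr_neg": (1.0 - sens) / spec if spec > 0.0 else math.inf,
    }


def best_cutoff(scores: Sequence[float], improved: Sequence[bool]) -> float:
    """Cut-off maximising Youden's index; this selection rule is an assumption of the sketch."""
    def youden(c: float) -> float:
        d = diagnostics_at_cutoff(scores, improved, c)
        return d["sensitivity"] + d["specificity"] - 1.0
    return max(sorted(set(scores)), key=youden)


# Hypothetical change scores for 8 patients.
bbs_change = [2, 9, 4, 10, 7, 1, 8, 3]    # external standard
test_change = [1, 6, 3, 7, 2, 2, 5, 4]    # change on a short-form balance test
improved = [c >= 7 for c in bbs_change]   # assumed inclusive threshold of 7 points

area = auc(test_change, improved)
cutoff = best_cutoff(test_change, improved)
at_best = diagnostics_at_cutoff(test_change, improved, cutoff)
at_four = diagnostics_at_cutoff(test_change, improved, 4)
print(area, cutoff, at_best, at_four)
```

The same functions could be reused with a GRC-based definition of improvement by replacing the `improved` labels.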
www.medicaljournals.se/jrm
RESULTS
Reliability
A total of 12 patients with stroke (8 males and 4 females) were included in the reliability assessment. The mean age of the patients was 58.42 years (SD 13.41 years), with a mean time since stroke onset of 40.60 days (SD 45.39 days). Correlations between concurrent scoring and videotape scoring of the S-BESTest total score (r = 0.97) and domain scores (r = 0.90–1.0) were excellent. The mean S-BESTest scores at time 1 (day 1) and time 2 (day 7) were 20.45 (SD 0.66) and 20.53 (SD 1.13), respectively. The corresponding Mini-BESTest scores were 12.62 (SD 1.11) (day 1) and 12.52 (SD 1.46) (day 7), and the Brief-BESTest scores were 8.32 (SD 0.53) (day 1) and 8.23 (SD 0.80) (day 7). The intra-rater and inter-rater reliability of the total and domain scores of the 3 short-form BESTests was excellent (ICC = 0.86–0.99) (Tables II and III).
Concurrent validity and floor-ceiling effect
Demographic and clinical characteristics of the 70 patients with subacute stroke selected for the validity and responsiveness study are presented in Table IV. All patients had lower extremity and balance impairment based on the FM-LE and BESTest scores. High correlations were observed between the BBS and the BESTest (r = 0.96), the S-BESTest (r = 0.95), the Brief-BESTest (r = 0.93) and the Mini-BESTest (r = 0.95), indicating excellent concurrent validity of all versions of the BESTest. Table V shows the floor and ceiling effects of all BESTest scales across the 3 assessments. The number of patients with minimum scores decreased between baseline and 4 weeks post-rehabilitation, while the number of patients with maximum scores increased over time. The Brief-BESTest and the Mini-BESTest demonstrated a significant floor effect at baseline, but the S-BESTest and the BESTest showed no significant floor effect (< 20%). Although all
4 balance scales showed no ceiling effect at 4 weeks,