Journal of Rehabilitation Medicine 51-9

686 T. Winairuk et al.

Table II. Intra-rater reliability of the S-BESTest, Brief-BESTest, and Mini-BESTest in people with subacute stroke (n = 12)

                                              S-BESTest             Brief-BESTest         Mini-BESTest
Intra-rater reliability                       ICC (3,5)  95% CI     ICC (3,5)  95% CI     ICC (3,5)  95% CI
Total                                         0.98       0.97–0.99  0.98       0.97–0.99  0.98       0.97–0.99
Domain I: Biomechanical constraints           0.97       0.93–0.99  0.97       0.93–0.99  –          –
Domain II: Stability limits                   0.98       0.96–0.99  0.99       0.98–0.99  –          –
Domain III: Anticipatory postural adjustment  0.96       0.92–0.99  0.96       0.91–0.99  0.95       0.90–0.99
Domain IV: Reactive postural response         0.96       0.92–0.99  0.98       0.95–0.99  0.97       0.94–0.99
Domain V: Sensory orientation                 0.97       0.94–0.99  0.99       0.98–0.99  0.95       0.90–0.98
Domain VI: Stability in gait                  0.96       0.92–0.99  0.97       0.94–0.99  0.98       0.96–0.99

All intraclass correlation coefficients (ICCs) were significant, with p-values of < 0.001. 95% CI: 95% confidence interval.

and 2, k, respectively, for the S-BESTest, Brief-BESTest and Mini-BESTest (27). The ICC values were interpreted using the criteria: 0.8 indicates good reliability, 0.8–0.6 indicates moderate reliability and 0.6–0.4 indicates weak reliability (20, 27). The concurrent validity of the S-BESTest, Brief-BESTest and Mini-BESTest was assessed against the BBS using Spearman rank-order correlations. Floor and ceiling effects of the S-BESTest, Brief-BESTest, Mini-BESTest, and BESTest were calculated as the percentage of the sample scoring the minimum or maximum possible score, respectively. Floor and ceiling effects greater than or equal to 20% were interpreted as significant (28). Comparisons of balance scores between baseline and 2 weeks post-rehabilitation and between 2 and 4 weeks post-rehabilitation were analysed using paired t-tests with a significance level of p < 0.05. Internal and external responsiveness of the S-BESTest, Brief-BESTest, Mini-BESTest, and BESTest were assessed. Internal responsiveness refers to the possibility of detecting any change before and after a known treatment.
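The floor and ceiling calculation described above can be illustrated with a minimal Python sketch. The function name, the example scores and the 0–28 score range are hypothetical and are not study data; only the 20% threshold comes from the text.

```python
def floor_ceiling_effects(scores, min_score, max_score, threshold=20.0):
    """Percentage of the sample at the minimum (floor) and maximum (ceiling) score.

    An effect is flagged when the percentage is greater than or equal to
    `threshold` (20% in the text). All names here are illustrative.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == min_score) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == max_score) / n
    return {
        "floor_pct": floor_pct,
        "ceiling_pct": ceiling_pct,
        "floor_significant": floor_pct >= threshold,
        "ceiling_significant": ceiling_pct >= threshold,
    }


# Hypothetical scores on a hypothetical 0-28 scale (not study data).
example = floor_ceiling_effects(
    [0, 0, 5, 10, 28, 14, 7, 0, 3, 9], min_score=0, max_score=28
)
print(example)
```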
Internal responsiveness was examined using the standardized response mean (SRM) and minimal detectable change (MDC) (29, 30). An SRM of 0.8 or greater represented a large change, values from 0.5 to 0.8 represented moderate change, and values of 0.2–0.5 represented small change. MDC was calculated as the standard error of measurement (SEM) multiplied by 1.96 × √2 (29, 30). SEM was calculated as the standard deviation (SD) multiplied by √(1 – reliability). The limitation of internal responsiveness is that it lacks information on the quality of changes, such as worsening or improvement (31). In contrast, external responsiveness is associated with the concept of clinical relevance, which depends on the choice of external standard (32). In this study, 2 external standard scales, the BBS and the GRC, were selected to compare the change in performance with the patient's own perception. External responsiveness was assessed using receiver operating characteristic (ROC) curve analysis to establish which version of the BESTest could best identify patients whose balance had improved, using a change in BBS score of 7 points as the milestone value for deciding if change had occurred (33, 34). The area under the curve (AUC) value was used to reflect this. The AUC values were compared across test versions using a t-test and significance level of 0.05. The ROC analysis was repeated using the change in GRC (5 points) (25). The AUC was used to interpret the probability of correctly discriminating between patients with and without balance improvement (29). An AUC of 0.8 or greater indicated excellent discrimination (27). A paired t-test was used to compare the AUC between 2 testing scales with significance level at p < 0.01. A likelihood ratio demonstrates the accuracy of post-test probabilities; values of LR+ above 5 and values of LR– below 0.2 were considered meaningful (27). The optimal cut-off score was also chosen from the sensitivity and specificity (27).
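The responsiveness statistics above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' analysis code. The SRM is taken in its usual form (mean change divided by the SD of the change scores), the likelihood ratios use their standard definitions from sensitivity and specificity, and the function names are hypothetical. Which SD enters the SEM is not specified in the text, so it is left as a parameter.

```python
import math
import statistics


def standardized_response_mean(baseline, follow_up):
    """SRM: mean of the paired change scores divided by their sample SD."""
    if len(baseline) != len(follow_up):
        raise ValueError("baseline and follow_up must be paired")
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return statistics.mean(changes) / statistics.stdev(changes)


def sem(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability e.g. an ICC."""
    return sd * math.sqrt(1.0 - reliability)


def mdc95(sd, reliability):
    """MDC = SEM * 1.96 * sqrt(2), as stated in the text."""
    return sem(sd, reliability) * 1.96 * math.sqrt(2.0)


def likelihood_ratios(sensitivity, specificity):
    """Standard definitions: LR+ = sens / (1 - spec), LR- = (1 - sens) / spec."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg


# Hypothetical inputs for illustration only (not study data).
srm = standardized_response_mean([10, 12, 14, 16], [13, 14, 18, 19])
mdc = mdc95(sd=10.0, reliability=0.96)
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.8)
print(srm, mdc, lr_pos, lr_neg)
```

With the thresholds given in the text, an SRM of 0.8 or greater would be read as a large change, and an LR+ above 5 or an LR– below 0.2 as meaningful.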
www.medicaljournals.se/jrm

RESULTS

Reliability

A total of 12 patients with stroke (8 males and 4 females) were included in the reliability assessment. The mean age of patients was 58.42 years (SD 13.41 years) with a mean time since stroke onset of 40.60 days (SD 45.39 days). Correlations between concurrent testing and videotape scoring of the S-BESTest total scores (r = 0.97) and domain scores (r from 0.90 to 1.0) were excellent. The mean S-BESTest scores at time 1 (day 1) and time 2 (day 7) were 20.45 (SD 0.66) and 20.53 (SD 1.13), respectively. The corresponding scores of the Mini-BESTest were 12.62 (SD 1.11) (day 1) and 12.52 (SD 1.46) (day 7), and those of the Brief-BESTest were 8.32 (SD 0.53) (day 1) and 8.23 (SD 0.80) (day 7). The intra-rater and inter-rater reliability of the total score and domain scores of the 3 short-form BESTests were excellent (ICC = 0.86–0.99) (Tables II and III).

Concurrent validity and floor-ceiling effect

Demographic and clinical characteristics of the 70 patients with subacute stroke selected for the validity and responsiveness study are presented in Table IV. All patients had lower extremity and balance impairment based on the FM-LE and BESTest scores. High correlations between the BBS and the BESTest (r = 0.96), the S-BESTest (r = 0.95), the Brief-BESTest (r = 0.93) and the Mini-BESTest (r = 0.95) were observed, indicating excellent concurrent validity of all versions of the BESTest. Table V shows floor and ceiling effects for all BESTest scales across 3 intervals. The number of patients with minimum scores decreased between baseline and 4 weeks post-rehabilitation, while the number of patients with maximum scores increased over time. The Brief-BESTest and the Mini-BESTest demonstrated a significant floor effect at baseline, but the S-BESTest and the BESTest showed no significant floor effect (< 20%). Although all 4 balance scales showed no ceiling effect at 4 weeks,
