INTRODUCTION
Lateral ankle sprains (LASs) are among the most common musculoskeletal injuries.1 Historically, LASs have been considered minor injuries with good recovery, but up to 70% of individuals experience residual symptoms and develop chronic ankle instability (CAI).1 This condition is characterized by various chronic residual symptoms that persist after 12 months following the initial LAS.2 Biological sex is considered one of the risk factors for LAS.3,4 Although a previous study has reported comparable incidence rates between males and females,5 a systematic review concluded that females are at higher risk of sustaining an initial LAS than males.3 Interestingly, following an initial LAS, the risk of sustaining a subsequent LAS was greater in males but not in females.4
CAI has been associated with impaired postural control and functional performance deficits.6 The balance and functional hop tasks used in this study include the Star Excursion Balance Test (SEBT), the static single-leg stance test using the Biodex Balance System (BBS), the triple-crossover hop (TCH) test, and the figure-of-8 hop (F8H) test. These tasks provide quantifiable measures of static and dynamic postural control, as well as functional performance. Researchers and clinicians have used these tasks to identify predictors of lower extremity injury, to quantify functional performance deficits, and/or to inform return-to-sport decisions.6–8 Therefore, accurate assessment procedures are of the utmost importance when measuring outcomes using these tasks. However, limited data are available regarding the number of practice trials required prior to actual testing to reduce learning effects.
Evidence demonstrates that a certain number of practice trials are necessary to obtain repeatable and consistent measures of the SEBT and BBS.9–14 For the SEBT, six practice trials were initially recommended,9,10 but more recent studies have suggested reducing the number of practice trials from six to four.11,12 For the BBS, while two studies have recommended two to three practice trials for dynamic postural control assessment,13,14 scientific evidence regarding practice trials for static postural control assessment remains lacking. Currently, available data support the need for practice trials to become familiar with hop tests, but the number of practice trials used varies widely, ranging from one to 10.15,16 As these tests are widely used by researchers and clinicians, it is crucial to have uniform practice procedures to ensure the highest level of repeatability.
Many studies used to determine the standard number of practice trials utilized small sample sizes and did not account for potential sex differences.9–14 Biological sex plays a significant role in physical performance due to biomechanical and/or physiological factors, including force loading rates,17 muscle mass,18 and cross-sectional area of type Ⅱa muscle fibers.19 For example, Demirbüken et al. demonstrated that while males showed significantly higher leg stiffness than females under preferred hopping conditions, the sex difference disappeared under fastest hopping conditions as females increased their leg stiffness more than males.20 The authors suggest that males and females adopt different movement strategies to achieve performance demands.20 Furthermore, females demonstrated significantly better static postural stability than males, but no differences were observed in dynamic postural stability.21 Given these sex differences, it may be necessary to determine the optimal number of practice trials required for males and females separately.
Therefore, the primary purpose of this study was to determine the number of practice trials necessary to achieve consistent performance on a clinical static balance test (BBS), a dynamic balance test (SEBT), and hop tests (TCH and F8H tests) between male and female participants. We hypothesized that the recommended number of practice trials would differ between male and female participants for balance and hop tests.
METHODS
PARTICIPANTS
One hundred participants with CAI were recruited from a university population. Participants with CAI were recruited to represent a lower extremity joint injury population because the ankle is the second most-injured joint in the body and recurrent injuries often lead to CAI.22 Participants were selected based on the recommendation of the International Ankle Consortium.2 Inclusion criteria included: (i) a history of at least one significant ankle sprain that occurred at least 12 months prior to the study enrollment, (ii) at least two “giving way” episodes in the prior six months, (iii) a score of < 90% on the Foot and Ankle Ability Measure – Activities of Daily Living (FAAM-ADL),23 a score of < 80% on the FAAM-Sports scale,23 and at least five “yes” responses on the Ankle Instability Instrument (AII),24 and (iv) a history of a moderate level of weight-bearing physical activity at least three days per week for a total of 90 min in the previous three months. Exclusion criteria included: (i) a history of lower extremity surgeries to the musculoskeletal structures (e.g., bones, joint structures, and nerves), (ii) a history of lower extremity fractures, and/or (iii) acute musculoskeletal injury to lower extremity joints in the prior three months. Each participant provided informed consent prior to enrollment and participation. This study was approved by the Brigham Young University Institutional Review Board (Human Research Ethics Committee approval number: F16455).
EXPERIMENTAL PROCEDURES
The experimental procedures are illustrated in Figure 1. Each participant reported to the biomechanics laboratory for two separate sessions: DAY 1 (practice trial session) and DAY 2 (test trial session), separated by two to three days. Both sessions were used for data collection. Participants performed the tests in athletic clothing and shoes. On DAY 1, participants were familiarized with the clinical tests, including the SEBT anterior (ANT), posteromedial (PM), and posterolateral (PL) directions; the BBS single-leg stance test; the TCH test; and the F8H test. The order of the tests was randomized using a random number generator. Given the limited data to support the appropriate number of practice trials, six practice trials were performed based on recommendations determined by Hertel et al.10 On DAY 2, participants performed one warm-up trial and three test trials for each clinical test. To minimize learning and fatigue effects on clinical testing, the test trial session was conducted on a separate day, which enabled accurate recording of both practice and test trial data. A three-minute rest was provided between each clinical test. The determination of the optimal number of practice trials was based on the statistical stabilization of performance (i.e., a plateau).10,11 This plateau was identified by assessing the point at which no further significant changes (p > 0.05) occurred, while simultaneously verifying that performance increments fell within the range of measurement error (i.e., Minimal Detectable Change [MDC]).11,14
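The measurement-error half of the plateau criterion described above can be sketched in a few lines of Python. This is a minimal illustration only, not the analysis used in the study: the function name `first_stable_trial`, the choice of the DAY 2 test value as the reference, and all numbers are hypothetical assumptions, and the significance-testing half of the criterion (p > 0.05) is not reproduced here.

```python
from typing import Optional, Sequence


def first_stable_trial(
    trial_means: Sequence[float], reference: float, mdc: float
) -> Optional[int]:
    """Return the 1-based trial number from which every remaining trial
    differs from the reference by no more than the MDC, or None if the
    final trial is still outside that range.

    Assumption: the reference is the mean of the DAY 2 test trials.
    """
    stable_from: Optional[int] = None
    for number, mean in enumerate(trial_means, start=1):
        if abs(mean - reference) <= mdc:
            # Start (or continue) a run of trials within measurement error.
            if stable_from is None:
                stable_from = number
        else:
            # A later trial left the MDC band, so the run is reset.
            stable_from = None
    return stable_from


# Hypothetical normalized reach distances (%) for six practice trials.
practice_means = [70.0, 73.0, 75.5, 77.0, 77.4, 77.8]
plateau = first_stable_trial(practice_means, reference=78.0, mdc=1.5)
print(f"Performance stays within the MDC from practice trial {plateau}")
```

With these made-up values, trials four through six all fall within 1.5 units of the reference, so the sketch reports trial four. In practice this check would be read alongside the pairwise comparisons rather than on its own.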
STAR EXCURSION BALANCE TEST
For the SEBT, Hertel et al. found a high intratester reliability with intraclass correlation coefficient (ICC) values ranging from 0.78 to 0.96.10 Participants performed the test according to the methods of Plisky et al.7 Specifically, participants performed the test barefoot in the ANT, PM, and PL directions. The participant stood on a single leg in the center of a grid, with the most distal aspect of the great toe at the starting line. While maintaining a single-leg stance, the participant was instructed to reach with the free limb in the ANT, PM, and PL directions in relation to the stance foot. The maximal reach distance was measured by marking the tape measure with erasable ink at the point where the most distal part of the foot reached. The trial was discarded and repeated if the participant (i) failed to maintain a single-leg stance, (ii) lifted or moved the stance foot from the ground, (iii) touched the ground with the reach foot, (iv) failed to return the reach foot to the starting position, or (v) removed the hands from the hips. There was a 60-second rest between each trial. Reach distances were recorded to the nearest 0.5 cm and normalized to the participant’s limb length (from anterior superior iliac spine to medial malleolus) using the formula: (reach distance / limb length) × 100.
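The normalization step described above is simple arithmetic, and a short Python sketch may make it concrete. The function names and the example measurements are hypothetical; only the formula (reach distance / limb length) × 100 and the 0.5 cm recording resolution come from the text.

```python
def round_to_half_cm(value_cm: float) -> float:
    """Round a measurement to the nearest 0.5 cm, as reach distances were recorded."""
    return round(value_cm * 2) / 2


def normalize_reach(reach_cm: float, limb_length_cm: float) -> float:
    """Express reach distance as a percentage of limb length."""
    if limb_length_cm <= 0:
        raise ValueError("limb length must be positive")
    return reach_cm / limb_length_cm * 100.0


# Hypothetical participant: 68.3 cm raw reach, 90.0 cm limb length.
recorded = round_to_half_cm(68.3)
normalized = normalize_reach(recorded, 90.0)
print(f"Recorded reach: {recorded} cm, normalized: {normalized:.1f}% of limb length")
```

Normalizing to limb length lets reach distances be compared across participants of different stature, which matters here because the male and female groups differed in height.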
BIODEX BALANCE SYSTEM
The BBS was used to measure the overall stability index (OSI) during static single-leg stance, calculated as the sum of anterior-posterior stability index (APSI) and medial-lateral stability index (MLSI). Arifin et al. found excellent intra-rater test-retest reliability with ICC values of 0.78 to 0.85.8 Participants performed the test according to the methods of Arifin et al.8 Participants performed the test barefoot on the Biodex platform. Participants were instructed to stand on their dominant leg in full extension while holding the opposite knee to 90° flexion and keeping both hands on their hips. The dominant leg was determined by asking participants which leg they preferred to use when kicking a ball. Participants were then instructed to position the tested foot at the center of the platform and look ahead at the visual feedback display adjusted at their eye level to prevent vestibular distraction and head movement. They adjusted their foot to a comfortable standing position while maintaining the moving pointer at or near the center point of the visual feedback display. The platform was then locked in a stable position, and foot placement was recorded per manufacturer’s guidelines. The position of the foot remained constant throughout the static balance test. Each trial consisted of three 20-second repetitions. There was a 60-second rest between each trial. During the 60-second rest, participants were encouraged to bear their weight on the opposite limb to minimize fatigue in the tested limb. The BBS software automatically computed the OSI, APSI, and MLSI for analysis.
TRIPLE-CROSSOVER HOP TEST
For the TCH test, Bolgla and Keskula found excellent intratester reliability with an ICC value of 0.96.25 Participants performed the test according to the methods of Wikstrom et al.26 Participants performed the test with athletic shoes. Participants were instructed to start by standing on a single leg, with their non-stance limb flexed to 90° at the knee, and their hip flexed until the thighs of both lower extremities were parallel. Participants were then instructed to hop three times from a start line in a zigzag pattern, crossing over a line that was 6 m long and 15 cm wide. The first and final hops were towards the lateral side of the tested limb. The distance was measured using a tape measure from the start line to where the heel landed on the third hop. A trial was counted as failed if the participant put the contralateral limb down, fell, touched the 15-cm wide tape, or did not complete the test. There was a 60-second rest between each trial. The total distance was recorded to the nearest 0.01 m.
FIGURE-OF-8 HOP TEST
The F8H test was positively correlated with functional ankle instability status (r = 0.31), indicating that participants with functional ankle instability performed poorly during this task.27 Participants performed the test according to the methods of Docherty et al.27 Participants performed the test with athletic shoes. Participants were instructed to maintain a single-leg stance position on the tested limb only while holding the opposite knee to 90° flexion. Participants performed a single-leg hop in a figure-of-8 pattern twice around a 5-m course outlined by two cones. An investigator recorded the total time using a handheld stopwatch. A 50-cm long plastic bar, adjusted to the participant’s chest level, was used to determine when the participant crossed the finish line. Participants were instructed to perform the test as quickly as possible. The trial was discarded and repeated if the participant touched the floor with their contralateral limb, fell, touched the cones, or did not complete the course twice. There was a 60-second rest between each trial. The total time was recorded to the nearest 0.01 seconds.
DATA ANALYSIS
For all clinical tests, data from six practice trials (DAY 1, practice trial session) and three test trials (DAY 2, test trial session) were recorded over two sessions scheduled two to three days apart. The test value was calculated as the mean of the three test trials performed on DAY 2.
STATISTICAL ANALYSIS
All collected data were analyzed using IBM SPSS Statistics version 25.0 (SPSS Inc, Chicago, IL). Two-way analysis of variance with repeated measures was used for each variable of the clinical tests. When statistical significance was achieved, pairwise comparisons with Bonferroni adjustments were used to detect a possible trial effect between sexes. When a main effect was observed, estimated marginal means were reported to demonstrate the overall size of the main effect. The effect size was calculated as partial eta squared (ηp²), which was interpreted as small (≥ 0.01), medium (≥ 0.06), or large (≥ 0.14).28 To assess the reliability between trials, ICC (3,1) values were calculated using a two-way mixed effects model with absolute agreement. ICC values were interpreted as follows: < 0.40 was poor, 0.40–0.75 was fair to good, and > 0.75 was excellent.29 Standard error of measurement (SEM) and MDC were calculated to establish random error scores and to evaluate performance stabilization. SEM was calculated as standard deviation (SD) × √(1 − ICC), and MDC was calculated as 1.96 × √2 × SEM. The level of significance for all statistical tests was set at p < 0.05.
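The measurement-error indices and interpretation thresholds above can be sketched in Python as follows. This is an illustrative sketch, not the SPSS analysis used in the study: it assumes the conventional forms SEM = SD × √(1 − ICC) and MDC = 1.96 × √2 × SEM, the function names are hypothetical, and the "negligible" label for effect sizes below 0.01 is an added assumption rather than a category defined in the text.

```python
import math


def sem(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem_value: float) -> float:
    """Minimal detectable change at 95% confidence: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem_value


def interpret_partial_eta_squared(value: float) -> str:
    """Label an effect size using the thresholds stated in the text."""
    if value >= 0.14:
        return "large"
    if value >= 0.06:
        return "medium"
    if value >= 0.01:
        return "small"
    return "negligible"  # assumption: label for values below 0.01


def interpret_icc(value: float) -> str:
    """Label an ICC using the thresholds stated in the text."""
    if value < 0.40:
        return "poor"
    if value <= 0.75:
        return "fair to good"
    return "excellent"


# Hypothetical example: SD of 10.0 units and an ICC of 0.75.
example_sem = sem(10.0, 0.75)
print(f"SEM = {example_sem:.2f}, MDC = {mdc95(example_sem):.2f}")
print(interpret_partial_eta_squared(0.07), "|", interpret_icc(0.86))
```

Because the MDC is a fixed multiple of the SEM (about 2.77), a change between trials smaller than the MDC cannot be distinguished from random measurement error at the 95% level, which is why it serves as the stabilization threshold.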
RESULTS
One hundred participants with CAI (50 males and 50 females) completed the study. Sixty-six participants had bilateral ankle sprains and chose the involved limb based on a greater perceived feeling of instability. The remaining 34 participants reported unilateral ankle sprains. Male participants were significantly taller (182.4 ± 8.1 cm vs. 166.3 ± 6.2 cm, p < 0.01) and heavier (80.8 ± 12.2 kg vs. 67.4 ± 12.2 kg, p < 0.01) than female participants. There were no significant differences between groups in sports participation, FAAM-ADL, FAAM-Sports, AII scores, or number of previous ankle sprains (p > 0.05). Participant demographic information is presented in Table 1.
INTERACTION EFFECT AND MAIN EFFECT
The interaction and main effects for group and trial are presented in Table 2. There was a significant group-by-trial interaction for the hop tests (TCH test: F(6, 588) = 6.85, p < 0.01, ηp² = 0.07; F8H test: F(6, 588) = 2.33, p = 0.03, ηp² = 0.02). In contrast, there was no significant group-by-trial interaction for the SEBT (ANT: F(6, 588) = 0.55, p = 0.77, ηp² = 0.01; PM: F(6, 588) = 0.86, p = 0.53, ηp² = 0.01; PL: F(6, 588) = 1.51, p = 0.17, ηp² = 0.02) and BBS (OSI: F(6, 588) = 0.06, p = 1.00, ηp² = 0.00; APSI: F(6, 588) = 1.02, p = 0.41, ηp² = 0.01; MLSI: F(6, 588) = 0.58, p = 0.75, ηp² = 0.01). The results of the post hoc pairwise comparisons are presented in Table 3. There was a significant main effect of trial for all dependent variables (F(6, 588) = 7.97–76.96, p < 0.01, ηp² = 0.08–0.44). The estimated marginal means are presented in Table 4. Furthermore, descriptive statistics (raw mean, 95% confidence interval [CI], and SD) and reliability indices (ICC with 95% CI, SEM, and MDC) for each clinical test are presented in Tables 5, 6, and 7.
STAR EXCURSION BALANCE TEST
Although there was no significant group-by-trial interaction, a significant main effect of trial was found (ANT: F(6, 588) = 29.22, p < 0.01, ηp² = 0.23; PM: F(6, 588) = 27.67, p < 0.01, ηp² = 0.22; PL: F(6, 588) = 24.45, p < 0.01, ηp² = 0.20) (Table 2). Overall, at least four, three, and four practice trials were recommended to achieve consistent performance in the ANT, PM, and PL directions, respectively (Figure 2). Reliability indices for these directions showed ICC values ranging from 0.66 to 0.86, with SEM and MDC values ranging from 2.59 to 5.50 and from 7.17 to 15.24, respectively (Table 5).
BIODEX BALANCE SYSTEM
Although there was no significant group-by-trial interaction, a significant main effect of trial was found (OSI: F(6, 588) = 9.80, p < 0.01, ηp² = 0.09; APSI: F(6, 588) = 8.23, p < 0.01, ηp² = 0.08; MLSI: F(6, 588) = 7.97, p < 0.01, ηp² = 0.08) (Table 2). Overall, at least four, two, and five practice trials were recommended to achieve consistent performance in the OSI, APSI, and MLSI, respectively (Figure 2). Reliability indices for these outcomes showed ICC values ranging from 0.47 to 0.57, with SEM and MDC values ranging from 0.10 to 0.30 and from 0.28 to 0.84, respectively (Table 6).
HOP TESTS
There was a significant group-by-trial interaction for the hop tests (Table 2). Specifically, male participants were required to complete more than six practice trials, whereas female participants were required to complete at least five practice trials for the TCH test (Figure 3). For the F8H test, male participants were required to complete at least five practice trials, while female participants were required to complete at least three practice trials (Figure 3). Reliability indices for these outcomes showed ICC values ranging from 0.89 to 0.92 (Table 7). For the TCH test, SEM and MDC values ranged from 0.32–0.40 and 0.87–1.11, respectively, while for the F8H test, these values ranged from 0.66–0.77 and 1.84–2.15, respectively (Table 7).
DISCUSSION
This study was designed to determine the number of practice trials necessary to achieve consistent performance on clinical static and dynamic balance tests and hop tests, with a particular focus on sex-based differences. It was hypothesized that the recommended number of practice trials would differ between male and female participants performing balance and hop tests. The findings partially supported this hypothesis. First, the recommended number of practice trials for the hop tests varied by sex. Specifically, males were required to perform more than six practice trials for the TCH test and at least five practice trials for the F8H test, whereas females were required to perform at least five and three trials, respectively. These findings suggest that females may reach a performance plateau more quickly than males in hop tests, and therefore require fewer practice trials to achieve consistent performance. Second, contrary to the hypothesis, the number of practice trials required to perform the SEBT and the BBS was not influenced by sex. Specifically, at least four, three, and four practice trials were recommended to achieve consistent performance in the ANT, PM, and PL directions for the SEBT, and four, two, and five practice trials in the OSI, APSI, and MLSI for the BBS, respectively, regardless of sex. Overall, these results highlight the importance of establishing sex-specific practice trial protocols, particularly for hop tests, to ensure accurate measurement and to minimize learning effects, which are known to influence test outcomes. Although the recommended number of practice trials is available in the literature for the SEBT and BBS,9–14 no published data have been found regarding functional hop tests, including potential sex-based differences. These findings provide novel insight into this gap and highlight the need for sex-specific considerations when standardizing clinical test procedures.
The findings of this study demonstrate that while a significant main effect of trial was observed across all dependent variables, these statistical changes should be interpreted alongside measurement error indices to determine performance stabilization. Although the high statistical power in the large sample (N = 100) could potentially render even minute performance increments as significant (p < 0.05), the practical relevance of these changes was supported by comparing them against the calculated MDC thresholds. For example, in the SEBT and BBS, although performance continued to show slight numerical improvements in later trials, the increments fell within the limits of measurement error (MDC), suggesting that participants had reached a functional plateau where further practice did not yield meaningful gains (Tables 5 and 6). Integrating these measurement error considerations provides a more robust rationale for determining optimal practice trials than relying solely on p values, ensuring that the recommended trials represent a state of consistent and reliable motor performance. Notably, the large trial effect sizes observed for the SEBT (ηp² = 0.20–0.23) and hop tests (ηp² = 0.20–0.44) indicate that practice trial number accounted for a substantial proportion of performance variance, reinforcing the clinical importance of implementing adequate practice trials prior to data collection. In contrast, the medium trial effect sizes for the BBS (ηp² = 0.08–0.09) suggest a comparatively smaller learning effect, which may reflect the less physically demanding nature of static single-leg stance compared to dynamic reaching or hopping tasks.
The findings revealed no interaction effect in the static and dynamic balance tests, whereas an interaction effect was observed in the hop tests (Table 2). The TCH test demonstrated a medium interaction effect size (ηp² = 0.07), supporting the clinical relevance of sex-specific practice trial protocols for this test. In contrast, the F8H test showed a small interaction effect size (ηp² = 0.02), indicating that although the sex difference was statistically significant, its practical impact may be more modest. For the SEBT and BBS, the interaction effect sizes were small to negligible (ηp² = 0.00–0.02), further confirming that sex does not meaningfully influence the number of practice trials for these tests. These findings could be attributed to the effect of anthropometric differences between males and females. Indeed, there were significant differences in height (p < 0.01) and mass (p < 0.01) between the male and female participants in this study (Table 1). This is supported by Howell et al. who suggested that males’ anthropometric characteristics of greater height and mass compared to females may contribute to more difficulty in controlling the trunk and center of pressure movement.30 Consequently, as anthropometric characteristics including height and mass can influence functional performance, researchers and clinicians using hop tests should consider sex-specific influences to minimize learning effects and achieve consistent performance.
The TCH test has been widely used to quantify lower extremity strength and power as an indicator of functional performance. It is particularly utilized in anterior cruciate ligament (ACL) research to evaluate self-reported function,31 likelihood of return to sport,32 and functional performance deficits resulting from injury,33 and to serve as a prognostic indicator for structural knee osteoarthritis.34 However, the scientific rationale for determining the number of practice trials to evaluate these functional performance tasks remains unclear. For example, many studies do not provide any information regarding the number of practice trials.31–33 Although some studies described the number of practice trials or testing procedures conducted prior to actual testing,15,25,34–37 the methodologies were inconsistent. Specifically, participants were either allowed only one practice trial in preparation for data collection,15,35 permitted up to two or three practice trials,25,34 or allowed to practice until they felt comfortable.36,37 The present findings show that the number of practice trials is crucial for optimizing assessment using these clinical tests. The findings also suggest that there is a sex difference in the number of practice trials required. For the TCH test, males should be permitted more than six practice trials, while females should be permitted at least five practice trials (Table 3).
The F8H test has been used to evaluate dynamic neuromuscular control and agility as it involves directional changes while hopping on a single leg. In previous studies, the F8H test has been widely used to evaluate potential functional performance deficits,26,27,38 improvements following rehabilitation interventions,39 and/or return-to-sport decisions6 in patients with CAI. As with the TCH test, no studies have been conducted to determine the optimal number of practice trials required to reach a performance plateau. Some studies did not report the number of practice trials in their procedures,6,27 while others permitted participants one to three practice trials prior to testing,38,39 or permitted them to practice until they perceived themselves to be comfortable.26 None of these studies provided a scientific rationale for the number of practice trials. The data demonstrate a sex difference in the number of practice trials required for the F8H test: males should be permitted at least five practice trials, while females should be permitted at least three practice trials (Table 3).
The SEBT is widely used as a functional assessment tool for dynamic postural control. Several researchers have investigated the number of practice trials required to optimize SEBT performance by minimizing learning effects and/or muscular fatigue.9–12 For example, using the Spearman-Brown prophecy formula, Kinzey and Armstrong9 reported that at least six practice trials in each direction are necessary to achieve intratester reliability measures ranging from 0.86 to 0.95. Similarly, Hertel et al. found a practice effect, with participants reaching farther as they performed more trials until a plateau occurred during trials seven through nine, and thus recommended performing six practice trials in each direction before recording reach distances.10 More recently, Robinson and Gribble demonstrated that maximum reach distances and stance limb kinematic displacement values plateaued by the fourth practice trial for most reach directions, thereby suggesting a reduction in the recommended number of practice trials from six to four.11 Likewise, Munro and Herrington recommended adopting a standardized protocol of four practice trials for SEBT administration.12 The current findings support this recommendation; however, previous studies were limited by small sample sizes and did not sufficiently investigate potential sex differences and/or direction-specific requirements.11,12 For practical simplicity, four practice trials are recommended for all three directions of the SEBT, regardless of sex (Figure 2). The current study strengthens this recommendation by utilizing a larger sample size (N = 100) and confirming that sex does not influence the number of practice trials required.
The BBS is a widely used and reliable assessment tool for quantifying static and dynamic postural control.8 However, there is limited evidence on the number of practice trials required to reach a performance plateau. While two or three practice trials are recommended for dynamic postural control assessment,13,14 the scientific rationale for the number of practice trials for static postural control assessment remains unclear. Nevertheless, previous studies have used one or two practice trials to reduce potential learning effects.8,40 In another study, participants were given one to two minutes of practice to familiarize themselves with the testing procedure.41 The present results indicate that four practice trials are required for the OSI, two for the APSI, and five for the MLSI, regardless of sex (Figure 2). Data from other studies may have been limited by a small sample size of 20 participants, who may have acclimated more quickly to the BBS.13,14
LIMITATIONS
There are several limitations in this study. First, as this study included only patients with CAI, the findings may not be generalizable to individuals with other pathologies (e.g., ACL injury, patellofemoral pain, hamstring strain). This limitation is not unique to this study; previous studies involving healthy individuals may have comparable limitations when generalizing their findings to individuals with musculoskeletal disorders.9–14 Second, the structure of the testing protocol may have influenced the results. The two to three day interval between DAY 1 (six practice trials) and DAY 2 (one warm-up trial and three test trials) may have integrated elements of between-session retention (i.e., the two to three day gap) and continued learning immediately prior to testing (i.e., the one warm-up trial). Consequently, it is possible that the observed plateaus may reflect performance stabilization following a fixed exposure to practice, rather than representing the absolute minimal number of trials required to reach a plateau. Third, although a 60-second rest was provided between trials and participants were encouraged to bear weight on the opposite limb during rest periods, cumulative fatigue from performing six consecutive practice trials may have influenced performance, particularly in the later trials. However, the progressive improvement observed across trials suggests that the learning effect outweighed any potential fatigue effect. Nevertheless, these findings provide practical guidance for clinicians and researchers who often utilize multi-session assessment protocols in clinical and/or athletic settings.
CONCLUSION
Static and dynamic balance and functional performance tasks are commonly used as effective ways to quantify function. The results of this study indicate that the number of practice trials required for the TCH and F8H tests varies between males and females, whereas no sex differences were observed for the SEBT and BBS. Specifically, males required more than six and at least five practice trials for the TCH and F8H tests, respectively, while females required at least five and three practice trials. For the SEBT, four practice trials are recommended for all directions, and for the BBS, four, two, and five practice trials are recommended for the OSI, APSI, and MLSI, respectively, regardless of sex. Thus, male participants may need more practice trials than female participants to achieve consistent performance on hop tests. Researchers and clinicians should therefore implement sex-specific practice trial protocols for hop tests to ensure accurate data collection.
Corresponding author:
S. Jun Son, PhD, ATC
Associate Professor & Assistant Dean
Graduate School of Sports Medicine
CHA University
222 Yatap-dong, Bundang-gu, Seongnam-si,
Gyeonggi-do, South Korea
Fax: +82.31.881.7069
Phone: +82.31.728.7910
E-mail: seongjunson@gmail.com
CONFLICT OF INTEREST
The authors declare no conflicts of interest.
ACKNOWLEDGEMENTS
The authors thank the participants for their contributions.


