INTRODUCTION
Anterior cruciate ligament (ACL) injury is the most common ligamentous knee injury.1 ACL injuries frequently occur during sports2 and are typically associated with muscle weakness, risk of instability and poor knee function.3,4 These impairments are usually present in the short term but often persist long term after ACL injury.5 Symmetrical knee extensor (KE) strength reduces the re-injury rate after ACL reconstruction (ACLR)6 and higher KE strength early after ACL injury predicts improved patient reported and functional performance outcomes after ACLR.7,8 Patients with ACL injuries have an increased risk of osteoarthritis (OA) as compared to the general population,9 but strengthening KE muscles plays a pivotal role in preventing knee OA.10 Further, knee flexor (KF) strength plays an important role in protecting the ACL.11 KF strength is particularly decreased after ACL reconstruction with hamstring graft.12 This may persist more than two years after ACL reconstruction13,14 putting the patient at risk of a secondary ACL injury. Taken together, restoring maximal knee muscle strength and limb symmetry index (LSI) following ACL injury and ACLR is therefore of great importance.
Considering the importance of evaluating muscle strength following an ACL injury, it is essential that assessment methods are easy, reliable, and valid. Assessment of maximal voluntary isometric muscle contraction (MVIC) using an isokinetic dynamometer (ID) has previously shown excellent clinimetric properties and is considered the “gold standard”.15 However, IDs are expensive, require training to handle, and are not portable, making them unavailable for most clinicians.
Handheld dynamometers (HHD) are affordable, easy to use, portable and frequently used to evaluate MVIC in clinical practice. Although HHD appear to have acceptable clinimetrics when evaluating KE strength after ACLR,16 they may underestimate the force output at higher force values,17 and it can be difficult to ensure a reliable setup.18
The ForceFrame (FF) was recently developed for assessment of maximal muscle strength. The FF (Figure 1) has foldable plates, an adjustable crossbar (which can be adjusted in height and rotated) and force transducers that can also be adjusted and rotated. The FF is portable and does not require extensive training, making it easy to use for clinicians.
Previous studies have evaluated the FF for test-retest reliability when assessing hip strength19,20 and strength of the shoulder rotators.21 These showed acceptable reliability for both hip strength (in a non-injured athletic population) and shoulder strength (in healthy athletes). However, the clinimetric properties of the FF to assess MVIC of the KE and KF in patients with ACL injuries have not previously been evaluated. Therefore, the purpose of this study was to evaluate the reliability (test-retest reliability, inter-tester reliability and test-retest agreement) and validity (concurrent validity, convergent validity and FF vs. ID agreement) of the FF dynamometer during isometric testing of the knee extensors and flexors.
MATERIALS AND METHODS
Study Design and Participants
This observational study was performed at the Department of Public Health, Aarhus University, Denmark and evaluated participants with an ACL injury or an ACL reconstruction. Participants were recruited from a public rehabilitation centre in Aarhus, Denmark between April and October 2023. Inclusion criteria were: 1) participants having had an ACL injury or ACLR in the prior 6-24 months, 2) ≥18 years of age and 3) able to speak, read and understand Danish. All participants received oral and written information regarding participation in the study and gave informed oral consent. The study is registered at the region of Southern Denmark.
Experimental Set-up
All participants completed two test sessions one to two weeks apart. During the first test session participants completed: 1) a background questionnaire, 2) testing of MVIC of the KE and KF using the FF (Vald Performance, Queensland, Australia), an HHD (Commander Echo Wireless Console and Muscle Tester, JTECH Medical, Salt Lake City, Utah, USA) and an ID (Humac NORM, CSMi, MA, USA), 3) registration of self-reported pain (using a numerical rating scale [NRS]) in the knee being tested before and after each test direction in all three test devices and 4) the “Knee Outcome Survey – Activities of Daily Living Scale” (KOS-ADLS).22 At the second test session participants completed 1) two tests of MVIC of the KE and KF using only the FF performed by two different assessors and 2) a battery of single legged hop tests if they had an LSI between the injured and uninjured leg 90% for the KE.
To minimize bias in the test results, the order of testing was randomized. In the first test session (with FF, ID, and HHD), three patients could be tested simultaneously. On some test days, only one or two patients were scheduled simultaneously. The order of testing for each patient was randomized by a simple draw (three pieces of paper indicating the three devices) to determine which device would be tested first. In the second session, only one patient could be tested at a time, and each patient was tested by two different assessors. A draw decided which assessor would perform the first assessment.
Assessors
Three assessors (7-13 years of experience) who were trained in the use of the FF, ID or HHD participated in this study. In the first session, two physiotherapists (KA, TF) and one exercise physiologist (TK) performed the tests with the FF, HHD and ID, respectively. At the second session, only two assessors were needed (KA, TF) to test all the participants using the FF.
Testing procedures
MVIC
Before the MVIC test of KE and KF, a mark was made on the skin approximately 5 cm proximal to the lateral malleolus to guide the placement of the force transducer from the HHD and FF. The HHD and FF assessors each independently measured the lever arm from the lateral joint line in the knee to the skin mark. In the ID, the lever arm was measured from the placement of the lever arm on the device and no mark on the skin was made for this test. After measuring the lever arm, participants performed a standardized five-minute warm-up on a stationary bike. The warm-up load was adjusted according to the patients’ rated perceived exertion (RPE). During the first three minutes, patients aimed for an RPE of 5-6 (out of 10). In the final two minutes, the target exertion level was increased to an RPE of 8-9. Tests of KE (regardless of device) were performed with the participant sitting on a bench with 90 degrees flexion in the hip and the lower leg hanging down with the knee in approximately 90 degrees flexion (Figures 2A, 2C and 2E).
Tests of KF (regardless of device) were performed with the participant seated upright with a knee angle of approximately 70 degrees (Figures 2B, 2D and 2F) and the force transducer placed on the lower part of the calf guided by the same mark as used for KE testing.
For the FF, a tailor-made bar was used to stabilize the device during the assessment of KE (Figure 2A). For the HHD, the device was fastened using a custom-made belt for stabilization during both KE and KF assessments (Figures 2C and 2D). For the ID, participants had a back support and were fastened using a belt over the waist (Figures 2E and 2F).
Prior to starting a test on each device, a standardized test-specific warm up protocol was performed as previously described.23 This consisted of three submaximal trials where the participants were instructed to contract at approximately 50%, 75% and 90% of their perceived maximal force. After this, at least three maximal trials were completed, separated by 45 seconds of rest between trials. If the third trial resulted in the highest value, participants were given extra attempts until no improvement occurred.
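The trial rule described above can be sketched in code. This is a minimal illustration, not the authors' software: the function names are hypothetical, and treating a tie with an earlier trial as "no improvement" is an assumption, since the protocol does not say how ties were handled.

```python
def needs_extra_trial(trial_values, min_trials=3):
    """Return True if another maximal trial should be performed.

    Sketch of the protocol: at least `min_trials` maximal trials are
    completed, and extra attempts are given for as long as the most
    recent trial is the highest so far.
    """
    if len(trial_values) < max(min_trials, 2):
        return True
    # Assumption: only a strictly higher value counts as an improvement.
    return trial_values[-1] > max(trial_values[:-1])


def best_trial(trial_values):
    """The highest trial is the value carried forward to the analysis."""
    return max(trial_values)


# Hypothetical torque values (Nm/kg) for one participant and one direction.
trials = [1.52, 1.61, 1.70]
while needs_extra_trial(trials):
    trials.append(1.68)  # placeholder for the next measured trial
print(best_trial(trials))
```

In this hypothetical run the third trial is the highest, so a fourth attempt is added; because the fourth is lower, testing stops and the third trial is retained.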
For each trial, participants were instructed to sit upright with the hands across the chest and to relax the leg being tested, while the assessor counted down from three to start the test. Participants were further instructed to press ‘as hard and fast as possible’, while trying to maintain the pressure for three seconds. Standardized verbal encouragement was provided during each trial.
For both KE and KF, the non-injured limb was tested first and the best trial for KE and KF from each limb was used for the analyses and reported as Nm/kg.
Numerical Rating Scale (NRS)
Before and after each trial for both KE and KF, participants rated their current knee pain on an 11-point NRS ranging from 0 (no pain) to 10 (worst imaginable pain).24 This was collected to evaluate if knee pain increased following the tests to ensure that no unacceptable pain occurred. The test session was terminated if participants rated their knee pain above 5 on the NRS.
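As an illustration of the unit used here, the sketch below converts a force reading to body-mass-normalized torque and computes a limb symmetry index. It assumes the device reports force in newtons and that the lever arm is entered in metres. The LSI is written in its conventional form (injured divided by non-injured, times 100). The function names and all numbers are hypothetical and do not come from the study data.

```python
def normalized_torque(force_n, lever_arm_m, body_mass_kg):
    """Torque (force x lever arm) divided by body mass, in Nm/kg."""
    return force_n * lever_arm_m / body_mass_kg


def limb_symmetry_index(injured, non_injured):
    """Conventional LSI: injured-limb value as a percentage of the non-injured limb."""
    return 100.0 * injured / non_injured


# Hypothetical participant: 78 kg body mass, 0.35 m lever arm on both legs.
injured = normalized_torque(force_n=400.0, lever_arm_m=0.35, body_mass_kg=78.0)
non_injured = normalized_torque(force_n=450.0, lever_arm_m=0.35, body_mass_kg=78.0)
print(round(injured, 2), round(non_injured, 2))
print(round(limb_symmetry_index(injured, non_injured), 1))
```

An isokinetic dynamometer typically reports torque directly, in which case only the division by body mass would apply.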
Knee Outcome Survey-Activities of Daily Living (KOS-ADLS)
KOS-ADLS is a reliable, valid and responsive knee-specific patient reported outcome measure (PROM) measuring symptoms and functional limitations with activities of daily living ranging from 0 (worst) to 100 (best).22,25 Participants completed the KOS-ADLS at the end of the first test-session.
Functional performance
Functional performance was assessed using a battery of four single legged hop tests: 1) single-leg-hop for distance (SLHD), 2) cross-over-hop for distance (CHD), 3) triple-hop for distance (THD) and 4) 6-m timed hop (6m-TH), performed as described previously.26 Results for each single leg hop test were reported as an average of two trials and presented in centimeters (SLHD, CHD and THD) or seconds (6m-TH).
Only six participants completed the battery of single legged hop tests and these results are therefore not included in this paper.
STATISTICAL ANALYSIS
Descriptive statistics were used to describe participant characteristics. Dichotomous and ordinal data are presented as frequencies (%) and continuous data are presented as means and SD if data is normally distributed or median and range if data is not normally distributed.
Prior to any analysis, QQ plots and histograms were visually inspected to ensure normal distribution of data. All results were presented for the KE and KF for both the injured leg and the non-injured leg.
Reliability
Test-Retest reliability
Day-to-day test-retest and inter-tester reliability were analyzed using the intraclass correlation coefficient (ICC) and reported with 95% confidence intervals (CI), based on single measures, absolute agreement and two-way mixed effects. Reliability was interpreted as poor (ICC < 0.50), moderate (ICC 0.50-0.74), good (ICC 0.75-0.90) or excellent (ICC > 0.90).27 Day-to-day test-retest reliability was calculated from the best trial for both KE and KF from the first and second FF test-sessions (same test-leader). Inter-tester reliability was calculated from the best trial for both KE and KF of the second test-session (two assessors).
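For readers who want to see how such a coefficient is obtained, the following is a minimal sketch of a single-measures, absolute-agreement ICC point estimate from two-way ANOVA mean squares. It is not the authors' analysis code: the function names and the example values are hypothetical, and confidence intervals are not computed.

```python
def icc_absolute_single(scores):
    """Single-measures, absolute-agreement ICC from a two-way layout.

    `scores` holds one row per participant; each row contains the k
    measurements (test sessions or assessors) for that participant.
    """
    n = len(scores)
    k = len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between sessions/assessors
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                   # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Interpretation bands as stated in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"


# Hypothetical day 1 / day 2 torque values (Nm/kg) for five participants.
example = [[1.60, 1.70], [1.90, 1.85], [2.10, 2.20], [1.40, 1.50], [1.75, 1.70]]
icc = icc_absolute_single(example)
print(round(icc, 2), interpret_icc(icc))
```

Each row is a participant and each column a session (test-retest) or an assessor (inter-tester), so the same function serves both reliability analyses.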
Day-to-Day Test-Retest Agreement
To evaluate day-to-day test-retest agreement, the best trials for KE and KF from the first and second FF test-session (same test-leader) were used. The difference in force (Nm/kg) between test and re-test was plotted against the mean force of the two measurements with 95% limits of agreement (LOA). LOA were calculated as the mean difference ± 1.96 times the standard deviation (SD) of the differences between the day-to-day tests. The standard error of measurement (SEM) was used to assess the absolute error of the measures and the smallest detectable change (SDC) was calculated (using the square root of the error variance) to reflect the smallest change exceeding test variation. SEM and SDC were calculated and presented as percentages of the group mean force.
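The agreement statistics can be illustrated with a short sketch. It rests on several assumptions that are not confirmed by the text: differences are taken as retest minus test, SEM is estimated as the SD of the differences divided by the square root of 2, and SDC is taken as 1.96 × √2 × SEM. The authors' exact computation may differ, and the function name and data are hypothetical.

```python
import math
import statistics


def agreement_stats(test, retest):
    """Bland-Altman limits of agreement plus SEM and SDC for paired sessions."""
    diffs = [b - a for a, b in zip(test, retest)]   # assumption: retest minus test
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)               # sample SD of the differences

    lower = mean_diff - 1.96 * sd_diff
    upper = mean_diff + 1.96 * sd_diff

    sem = sd_diff / math.sqrt(2)                    # assumed SEM estimator
    sdc = 1.96 * math.sqrt(2) * sem                 # assumed SDC definition

    group_mean = statistics.mean(list(test) + list(retest))
    return {
        "mean_diff": mean_diff,
        "loa": (lower, upper),
        "sem": sem,
        "sdc": sdc,
        "sem_percent": 100.0 * sem / group_mean,
        "sdc_percent": 100.0 * sdc / group_mean,
    }


# Hypothetical torque values (Nm/kg) for three participants on two days.
stats = agreement_stats(test=[1.0, 2.0, 3.0], retest=[1.1, 2.3, 2.9])
print({key: value for key, value in stats.items() if key != "loa"})
print(stats["loa"])
```

Under these assumptions the SDC equals 1.96 times the SD of the differences, that is, the half-width of the limits of agreement.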
Validity
Concurrent validity was analyzed (Pearson's correlation) to evaluate the relationship between the muscle strength results measured using FF and ID and between FF and HHD. Correlation was interpreted as poor (r < 0.30), fair (r = 0.30-0.59), moderate (r = 0.60-0.79) and very strong (r > 0.80).28,29 In addition, we analyzed agreement between assessments with the FF and the ID to report the mean differences.
Convergent validity evaluated the relationship (Spearman's correlation) between muscle strength measures and the outcome on the KOS-ADLS.
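A minimal sketch of the correlation and its interpretation bands follows. The function names and the paired values are hypothetical. Two choices are assumptions, because the stated bands do not cover them: a value of exactly 0.80 is classed as very strong, and the absolute value of r is classified.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired samples."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def interpret_r(r):
    """Bands from the text; boundary handling is an assumption."""
    size = abs(r)
    if size < 0.30:
        return "poor"
    if size < 0.60:
        return "fair"
    if size < 0.80:
        return "moderate"
    return "very strong"


# Hypothetical paired torque values (Nm/kg): FF versus ID.
ff = [1.5, 1.8, 2.0, 1.6, 2.2]
isokinetic = [2.0, 2.2, 2.6, 2.3, 2.5]
r = pearson_r(ff, isokinetic)
print(round(r, 2), interpret_r(r))
```

A high correlation shows only that the two devices rank participants similarly; it does not show that they return the same absolute values, which is why agreement is analyzed separately.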
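Because the KOS-ADLS is a questionnaire score, a rank-based coefficient is used. The sketch below computes Spearman's coefficient as the Pearson correlation of the ranks, giving tied values their average rank. The function names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for index in order[i:j + 1]:
            ranks[index] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sxx = sum((a - mean_x) ** 2 for a in rx)
    syy = sum((b - mean_y) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: FF torque (Nm/kg) and KOS-ADLS score (0-100).
torque = [1.5, 1.8, 2.0, 1.6, 2.2]
kos_adls = [80, 95, 85, 90, 85]
print(round(spearman_rho(torque, kos_adls), 2))
```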
RESULTS
In total, 24 of 27 participants completed both test sessions. Of the three who did not complete the second test session, one had increased knee pain from the first test session that did not resolve before the second session, one failed to attend the scheduled assessment, and one was prevented due to personal reasons. Of the 27 participants, 10 were female and 17 were male. The median age of the participants was 25 (19-60) years (Table 1). Mean MVIC values for KE in the injured leg were 1.80 (±0.32) Nm/kg in the FF, 2.31 (±0.50) Nm/kg in the ID and 1.68 (±0.40) Nm/kg in the HHD. For KF, mean MVIC values in the injured leg were 0.85 (±0.22) Nm/kg in the FF, 1.18 (±0.33) Nm/kg in the ID and 1.07 (±0.26) Nm/kg in the HHD (Table 2).
Reliability
Test-retest reliability
The FF presented with good day-to-day test-retest reliability for measuring MVIC of KE (ICC = 0.77, 95% CI 0.48-0.90) and KF (ICC = 0.83, 95% CI 0.61-0.92) in the injured leg as well as good day-to-day test-retest reliability of KE (ICC = 0.80, 95% CI 0.56-0.91) and KF (ICC = 0.87, 95% CI 0.72-0.94) in the non-injured leg (Figure 3A). The inter-tester reliability was excellent for measuring MVIC of KE (ICC = 0.97, 95% CI 0.94-0.98) and KF (ICC = 0.93, 95% CI 0.85-0.97) in the injured leg, and excellent inter-tester reliability was also observed for KE (ICC = 0.92, 95% CI 0.91-0.98) and KF (ICC = 0.91, 95% CI 0.64-0.97) of the non-injured leg (Figure 3B).
Day-to-Day Test-Retest Agreement
Bland & Altman plots show wide LOA between test and retest for KE and KF for both the injured and non-injured leg (Figures 4A, 4B, 5A and 5B), but with a homogeneous distribution of data points. For the injured leg, LOA ranged from -0.57 to 0.51 Nm/kg for KE and from -0.29 to 0.36 Nm/kg for KF (Figures 4A and 5A). The SEM for KE and KF on the injured leg was 8% and 9%, respectively. The SDC for KE and KF was 22% and 27% for the injured leg, respectively (Table 3).
Validity
Concurrent validity between FF and ID showed a fair correlation for KE (r = 0.56) and a poor correlation for KF (r = 0.24) when evaluated in the injured leg, and a moderate correlation for KE (r = 0.65) and a fair correlation for KF (r = 0.36) when evaluated in the non-injured leg (Figure 6A). In the injured leg, a moderate correlation was found between FF and HHD for KE (r = 0.74) and a poor correlation for KF (r = 0.12) (Figure 6B). In the non-injured leg, there was a moderate correlation for KE (r = 0.66) and a fair correlation for KF (r = 0.49) (Figure 6B). Results from the injured leg from all devices are visually presented for KE in Figure 7A and KF in Figure 7B. The Bland & Altman plots between FF and ID showed a mean difference of -0.51 Nm/kg for KE and -0.32 Nm/kg for KF on the injured leg (Figures 8A and 8B).
Convergent validity between FF and the KOS-ADLS was found to be poor for KE (r = 0.13) and negative for KF (r = -0.11).
DISCUSSION
This is the first study to evaluate the reliability and validity of the FF for assessing MVIC of KE and KF in an ACL injured population. The FF showed: 1) good day-to-day test-retest reliability when assessing MVIC of KE and KF for the injured leg and excellent inter-tester reliability when assessing MVIC of KE and KF, 2) that a change exceeding 22-27% is needed in the evaluation of KE and KF on the injured leg to ensure that a true change has taken place, 3) a fair concurrent validity when assessing KE and a poor concurrent validity when assessing KF on the injured leg compared to the ID and 4) poor convergent validity as neither KE nor KF strength on the injured leg was associated with the score on KOS-ADLS.
Reliability
Good day-to-day test-retest reliability and excellent inter-tester reliability were found for both KE and KF on both the injured and non-injured leg when using the FF. This suggests that assessments on FF can be reproduced between test days and that assessments are not impacted markedly by different assessors. The high ICCs observed are likely supported by the highly standardized test protocol, where participants were given standardized instructions during all assessments. On the second test-day, participants were asked if they had experienced a change in their knee problems since the first test-day. Four participants (of 24) reported a change in their knee problems (two better, two worse), which might have impacted their results. Furthermore, it was not possible to ensure that assessments were completed at the same time of day, which may have allowed some diurnal variation. However, the two assessments evaluating inter-tester reliability were performed on the same day, therefore excluding day-to-day variation.
Although no previous studies have evaluated reliability of the FF for KE, the FF has shown good test-retest reliability (ICC = 0.77-0.95) when assessing hip strength in football players20 and excellent test-retest reliability in Australian footballers (ICC = 0.87-0.97).30 Similar test-retest reliability results were found in a study evaluating shoulder rotational strength (ICC = 0.85-0.92), which may indicate that the FF is a reliable device for evaluating MVIC and not sensitive to different assessors when specific protocols are applied.
Agreement
LOA ranged from -0.57 to 0.51 Nm/kg for KE and from -0.29 to 0.36 Nm/kg for KF. These ranges are quite wide considering the mean force of 1.80 Nm/kg for KE and 0.85 Nm/kg for KF. The mean difference between the two test-days was only -0.03 Nm/kg for KE and 0.03 Nm/kg for KF, but considering the wide LOA, there were large differences in the absolute test results between days. This could be explained by a difference in the knee condition between test-days, but is most likely related to a less stable test position compared to the ID. A previous study evaluating agreement between HHD and an ID reported no agreement between devices when evaluating KE MVIC31 and discussed the impact of a stable test position, which likely also had an impact on the results in the present study.
The evaluation of agreement between the FF and ID indicated that the FF measured 0.51 Nm/kg and 0.32 Nm/kg lower than the ID in KE and KF, respectively. As agreement evaluates absolute values, it must be considered that the FF may not express the true MVIC, given the underestimation of KE and KF strength when using the FF. The difference between devices is again most likely explained by the different test positions, where the ID uses more optimal positioning and stabilization, allowing the participants to produce higher force outputs, as also suggested by previous research.32 From %SDC it is also notable that a change of less than 22% in KE and 27% in KF may reflect a measurement error rather than a real change. A previous review evaluated HHDs compared to IDs for assessing MVIC KE and KF and found that LOA were generally higher than 10%.32 Therefore, clinicians must carefully consider the magnitude of change over time when they interpret whether a strength improvement reflects an actual improvement rather than a potential measurement error. Of note, agreement was comparable for KE between the injured and non-injured leg, suggesting that measurement error is not only reflected on the injured leg.
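To make the practical meaning of the %SDC concrete, the sketch below checks whether an observed change exceeds it. This is only an illustration. The study expresses SDC relative to the group mean force, so applying it to one patient's baseline is a simplifying assumption. The function name and the follow-up values are hypothetical; the 22% threshold is the reported KE value for the injured leg.

```python
def exceeds_sdc(baseline, follow_up, sdc_percent):
    """Return (is_real_change, change_percent) for a repeated measurement.

    Assumption: the change is expressed relative to the individual's
    baseline and compared directly with the reported %SDC.
    """
    change_percent = 100.0 * (follow_up - baseline) / baseline
    return abs(change_percent) > sdc_percent, change_percent


# Hypothetical KE follow-ups from a baseline of 1.80 Nm/kg (SDC 22%).
for follow_up in (2.10, 2.25):
    is_real, change = exceeds_sdc(1.80, follow_up, sdc_percent=22.0)
    print(follow_up, round(change, 1), is_real)
```

Under these assumptions an increase from 1.80 to 2.10 Nm/kg (about 17%) could not be distinguished from measurement error, whereas an increase to 2.25 Nm/kg (25%) could.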
Validity
Concurrent validity between the FF and the ID for KE was fair in the injured leg and moderate in the non-injured leg, whereas for KF it was poor in the injured leg and fair in the non-injured leg. This indicates that the FF can be used to establish a valid MVIC of KE, whereas KF assessments should be interpreted very cautiously. The concurrent validity between FF and HHD for KE was higher than between FF and the ID, which may be explained by more comparable setups between these devices, as compared to the ID. When tested in FF and HHD, participants were seated on a bench without belt fixation over the hip or backrest support, in contrast to the ID. The weaker correlation between FF and ID than between FF and HHD is likely explained by the more optimal conditions available for force production in the ID, as FF generated 22% less force than the ID and only 7% more force than the HHD. The ID setup has the advantage of a completely fixated lever arm and seat, cushioning on the lever arm and seat to avoid pain, and requires less attention to maintaining a stable upper body position and remaining seated when applying maximal force.31
Previous studies have evaluated concurrent validity for the HHD using ID as reference,31,33–36 reporting moderate to excellent correlations. Although not directly comparable, this indicates that KE FF validity is within the expected range. The poor correlations between the FF and ID for KF on the injured leg suggest that the applied test position might not be optimal for assessing KF muscle strength. In this group of patients with ACL injury, 12 participants had ACL reconstruction with hamstring graft. Such patients are known to experience difficulties in regaining KF strength despite rehabilitation,37 which could influence the assessments if the setup for KF is more challenging in the FF than in the ID. In the present study, patients with hamstring graft had similar KF strength in FF compared to patients with other grafts or rehabilitation only, but lower KF strength in the ID compared to the rest of the group. Furthermore, a recent study found that deficits in KF strength for patients with ACL reconstruction were more accurately evaluated using a device assessing eccentric KF in the Nordic hamstrings exercise compared to an ID.38 Given the various adjustment options in FF, it may be considered whether other test positions should be adopted when evaluating KF, but this remains to be investigated.
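The relative differences quoted above can be reproduced from the mean KE values reported for the injured leg in the Results (FF 1.80, ID 2.31 and HHD 1.68 Nm/kg). The helper name is hypothetical.

```python
def percent_difference(value, reference):
    """Difference of `value` from `reference`, as a percentage of the reference."""
    return 100.0 * (value - reference) / reference


ff_ke, id_ke, hhd_ke = 1.80, 2.31, 1.68  # mean KE torque, injured leg (Nm/kg)

print(round(percent_difference(ff_ke, id_ke), 1))   # about -22.1: FF lower than ID
print(round(percent_difference(ff_ke, hhd_ke), 1))  # about 7.1: FF higher than HHD
```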
Methodological Considerations
This study had some limitations that should be considered when interpreting the results. First, as this study evaluated reliability, validity, and agreement of KE and KF with FF in patients with ACL injuries, the results are limited to this group of patients. Second, the authors used a tailor-made bar to stabilize the FF in the assessment of KE. This must be considered since this stabilizing bar is not part of the commercial FF kit, but was deemed necessary to ensure sufficient stability during the KE assessment. Third, despite the use of a standardized protocol, the decision to reject an attempt was based on a subjective evaluation. A higher force output is likely if a participant did not remain seated or excessively pulled the upper body back. However, it is also possible that participants felt limited in producing maximal force by the restrictions in the standardized test position.
CONCLUSIONS
The results of this study indicate that the FF can be used to obtain reliable and valid assessments of MVIC of KE. However, absolute values must be interpreted with great caution as the FF likely underestimates the true MVIC value compared to the gold standard (ID) and changes must exceed 22-27% to be considered a real change. If the FF assessments are used for evaluative and decision-making purposes this must be taken into account. The test position to assess KF in FF does not appear to be optimal, and different test-positions should be considered.
Disclosures
The study was supported with funding (for the purchase of a test device, the ForceFrame, and to cover transportation costs for participants) by the Research Council at Lillebaelt Hospital, Denmark. The authors report no conflicts of interest associated with the creation of this manuscript.
Acknowledgements
Great thanks to project nurses Jane Leonhardt and Pia Hostrup Andersen for assisting in practical planning, preparing and coordinating the test sessions.