The universal goniometer is a portable, low-cost and easy-to-use tool for quantifying joint range of motion (ROM), which has favoured its use in both clinical settings and research.1,2 Although goniometric measurements can be closely associated with those from more technologically advanced techniques,3,4 their reliability and agreement can vary due to multiple factors. These factors include the rater, the individual being measured, the joint being measured and differences in ROM measurement technique, such as test posture, imposed restraints (e.g. restriction of adjacent joints) and assistance (e.g. active vs. passive).5 Understanding the extent to which these factors influence goniometry is important for informing best practices in the clinic and in research.
Intra-rater reliability and agreement have consistently been shown to be higher than inter-rater reliability and agreement.5–8 Pain and spasticity have the potential to reduce reliability.5 Each joint has characteristics that can affect the alignment and placement of the goniometer and, in turn, the ROM measurement.9 Similarly, ROM tests that measure different movement directions of the same joint (e.g. flexion or extension) can differ in reliability and agreement because they require different positions and goniometer alignments. However, the extent to which different techniques influence goniometry is uncertain. For instance, restricting adjacent joints can be used to assess flexibility and the contribution of bi-articular and mono-articular muscles to joint ROM, which can inform assessments of function and risk of muscle injury.10 Yet it is unknown whether restricting adjacent joints influences measurement reliability and agreement in a consistent way across different ROM measurements.
Comparing results between studies is challenging because methodology, participants and raters differ, and all of these can influence the intraclass correlation coefficient (ICC) and the standard error of measurement (SEM), which have been recommended as the preferred estimates of reliability and agreement.11–13 The ICC is the ratio of between-participant variance to total variance, where the total variance is the sum of the between-participant variance and the error variance.14 When every participant has the same number of repeated measurements (at least two), a two-way model can be used to partition the error into systematic error and random error,13,14 such that the absolute agreement, two-way random effects, single measurement ICC (i.e. ICC[2,1]) can be calculated as:

$$\mathrm{ICC}[2,1] = \frac{\sigma_p^2}{\sigma_p^2 + \sigma_r^2 + \sigma_e^2}$$
where σ²p is the variance between participants, σ²r is the systematic error variance due to systematic differences between the measurements or raters being compared, and σ²e is the random error variance.11,13,15 Other ICC formulas reduce the influence of σ²r and σ²e and therefore yield higher ICC scores (e.g. ICC[3,1] and ICC[2,k]).13 However, the use and reporting of ICCs is inconsistent between studies and often lacks sufficient detail.2,16,17 Additionally, the SEM can be calculated as SEM = SDpooled × √(1 − ICC).14 Hence, both the ICC and the SEM are affected by anything that changes the components of variance within the data. For this reason, differences in methodology, participants and raters complicate the comparison of ICC and SEM scores between studies. Further, few studies report the components of variance, which limits the reconciliation of different or conflicting results.11
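The relationship between the variance components and the different ICC forms can be sketched numerically. The Python snippet below is a minimal illustration, not the authors' analysis code: the function names and the variance values are hypothetical, and the ICC[3,1] and ICC[2,k] expressions are the commonly used variance-component forms, assumed here rather than taken from this paper.

```python
import math


def icc_2_1(var_p, var_r, var_e):
    """ICC[2,1]: between-participant variance over total variance."""
    return var_p / (var_p + var_r + var_e)


def icc_3_1(var_p, var_e):
    """ICC[3,1] (assumed form): systematic variance is left out of the denominator."""
    return var_p / (var_p + var_e)


def icc_2_k(var_p, var_r, var_e, k):
    """ICC[2,k] (assumed form): error variance shrinks when k measurements are averaged."""
    return var_p / (var_p + (var_r + var_e) / k)


def sem_from_icc(sd_pooled, icc):
    """SEM = SDpooled x sqrt(1 - ICC), as given in the text."""
    return sd_pooled * math.sqrt(1.0 - icc)


# Hypothetical variance components in squared degrees (not study data).
var_p, var_r, var_e = 30.0, 10.0, 10.0

print("ICC[2,1]:", round(icc_2_1(var_p, var_r, var_e), 3))
print("ICC[3,1]:", round(icc_3_1(var_p, var_e), 3))
print("ICC[2,3]:", round(icc_2_k(var_p, var_r, var_e, 3), 3))
# The pooled SD of 7 degrees is likewise a made-up input.
print("SEM:", round(sem_from_icc(7.0, icc_2_1(var_p, var_r, var_e)), 2))
```

For the same components, the forms that leave out the systematic variance or average over k measurements return higher values than ICC[2,1], which is one reason why scores are hard to compare when the ICC form is not reported.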
Few studies have investigated the effect of technique, such as adjacent joint restriction, across more than one joint or ROM test within a single study, a design that avoids the difficulties of comparing results between studies.5 Additionally, restricting versus not restricting the adjacent joint could influence the variance between participants (e.g. through differences in bi-articular muscle flexibility that change ROM), the variance between measurements/raters and the random error in different ways. Thus, understanding whether the effect is consistent across different ROM tests, and investigating why any changes occur, can contribute to the conceptual framework used to guide goniometry in research and practice.
The purpose of this study was to quantify intra- and inter-rater reliability and levels of agreement of goniometric measurements across five ROM tests, with and without adjacent joint restriction. Using the same participants and raters as well as reporting the components of variance may improve understanding of whether adjacent joint restriction consistently influences goniometer measurement reliability and agreement across different ROM tests. This knowledge may contribute to creating a conceptual framework to guide effective use of goniometers.
MATERIALS AND METHODS
A convenience sample of 30 healthy participants provided written informed consent. All procedures were approved by the University of Toronto’s Office of Research Ethics. Two Certified Athletic Therapists and one Registered Physiotherapist rated each participant in a random order. Raters had similar field experience and worked interchangeably within the same clinic. Raters underwent specific training for this study, which included standard instructions for each ROM test (Appendix A) and a practice session involving three participants who were not part of the study. To control for differences in repositioning participants between measurements, a trained research assistant positioned all participants. A second research assistant documented the measured values.
Upon arrival, participants performed a warm-up consisting of five minutes of light stationary cycling and 10 repetitions of bodyweight squats, lunges, calf raises and arm circles. The ROM tests included ankle dorsiflexion, first metatarsophalangeal joint (MTPJ1) dorsiflexion, hip extension, hip flexion, and shoulder flexion. Each ROM test was measured twice on each side, both with and without restriction of the adjacent joint. The left side was measured first, followed by the right side, and then both sides were repeated to obtain two measurements per side. Repositioning the participant between measurements in this way avoided residual effects of holding the testing position (i.e. stretching). Each rater performed all measurements in the same order, but the raters assessed each participant in a random order, with approximately two minutes of rest between raters. This allowed for equal time between each rater’s evaluation of each ROM test.
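The measurement sequence described above can be sketched as a small schedule generator. This is only an illustration under stated assumptions: the rater labels, the order of the ROM tests, the order of the unrestricted and restricted conditions within a test, and the seeding scheme are all hypothetical, since the text specifies only the left, right, left, right sequence and the random rater order.

```python
import random

RATERS = ["rater_1", "rater_2", "rater_3"]  # hypothetical labels
ROM_TESTS = [  # order of tests is an assumption
    "ankle dorsiflexion",
    "MTPJ1 dorsiflexion",
    "hip extension",
    "hip flexion",
    "shoulder flexion",
]
CONDITIONS = ["unrestricted", "restricted"]  # order within a test is an assumption
SIDE_SEQUENCE = ["left", "right", "left", "right"]  # from the text


def session_plan(seed):
    """Return the ordered list of measurements for one participant."""
    rng = random.Random(seed)
    rater_order = RATERS[:]
    rng.shuffle(rater_order)  # random rater order for this participant

    plan = []
    for rater in rater_order:
        # Every rater follows the same fixed order of tests and conditions.
        for test in ROM_TESTS:
            for condition in CONDITIONS:
                for i, side in enumerate(SIDE_SEQUENCE):
                    trial = i // 2 + 1  # first pass = trial 1, second pass = trial 2
                    plan.append((rater, test, condition, side, trial))
    return plan


plan = session_plan(seed=1)
print(len(plan), "measurements; first:", plan[0])
```

Each rater contributes 5 tests × 2 conditions × 2 sides × 2 trials = 40 measurements, so a participant's session contains 120 measurements across the three raters.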
Details of the testing positions and goniometer placements can be found in Appendix A. The following provides a brief description:
Ankle dorsiflexion was measured in a lunge/split stance. Unrestricted ROM was measured on the front foot, whereas restricted ROM was measured on the rear foot with the rear knee kept extended to influence the gastrocnemius muscles (Figure 1A).
MTPJ1 dorsiflexion was measured while participants sat on a table with their feet hanging freely above the floor. Unrestricted ROM was measured with the ankle supported in its natural hanging position, whereas restricted ROM was measured with the ankle in maximal dorsiflexion to influence flexor hallucis longus (Figure 1B).
Hip extension was measured with the participant lying supine with their hips at the edge of a table and a rolled towel beneath their lower back. The test leg hung off the edge while the research assistant supported the non-test leg in hip and knee flexion. Unrestricted hip extension was measured by pushing the hanging leg down until the pelvis started to rotate anteriorly, while controlling for leg rotation. Restricted hip extension was measured in the same manner but with the hanging leg held in 90° of knee flexion to influence rectus femoris (Figure 1C).
Hip flexion was measured in a supine position. The test hip was flexed with the test-side knee flexed for the unrestricted measurement and extended for the restricted measurement, the latter to influence the hamstring muscle group (Figure 1D).
Shoulder flexion was measured while participants sat with their feet firmly on the ground. The research assistant supported the scapula while raising the participant’s arm with the elbow extended for the unrestricted measurement and flexed for the restricted measurement, the latter to influence the triceps brachii (Figure 1E).
To determine the effects of adjacent joint restriction on intra- and inter-rater reliability and agreement, changes in ICC and SEM scores between the unrestricted and restricted conditions were assessed across the five ROM tests. An absolute agreement, two-way random effects, single measurement ICC (i.e. ICC[2,1]) was used to measure intra- and inter-rater reliability (psych package, RGui Version 4.0.2, The R Foundation, Vienna, Austria). The inter-rater ICC was calculated using the mean of each rater’s two measurements to account for the deviation within their respective measurements. ICC scores were interpreted as poor (<0.40), fair (0.40-0.59), good (0.60-0.74) or excellent (0.75-1.0).18 SEM was calculated as SEM = SDpooled × √(1 − ICC). The mean squares from the ICC calculation were used to estimate σ²p, σ²r and σ²e.15
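The analysis steps above can be sketched in Python for readers who want to see how the mean squares map onto the variance components. This is not the authors' code (the study used the psych package in R); the function names, the example data, the truncation of negative estimates at zero and the definition of the pooled SD are assumptions made for illustration.

```python
import math


def variance_components(scores):
    """Estimate (var_p, var_r, var_e) from a participants x raters table of scores."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: participants (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    # Negative estimates are truncated at zero (an assumption of this sketch).
    var_p = max((ms_rows - ms_error) / k, 0.0)
    var_r = max((ms_cols - ms_error) / n, 0.0)
    var_e = max(ms_error, 0.0)
    return var_p, var_r, var_e


def icc_2_1_from_scores(scores):
    """ICC[2,1] as between-participant variance over total variance."""
    var_p, var_r, var_e = variance_components(scores)
    return var_p / (var_p + var_r + var_e)


def pooled_sd(scores):
    """Assumed definition: square root of the mean sample variance of the columns."""
    n, k = len(scores), len(scores[0])
    variances = []
    for j in range(k):
        col = [row[j] for row in scores]
        mean = sum(col) / n
        variances.append(sum((x - mean) ** 2 for x in col) / (n - 1))
    return math.sqrt(sum(variances) / k)


def interpret(icc):
    """Interpretation bands from the text; exactly 0.40 is treated as fair here."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Hypothetical ROM values in degrees: rows are participants, columns are raters.
scores = [[10, 12], [20, 22], [30, 32], [40, 42]]
icc = icc_2_1_from_scores(scores)
sem = pooled_sd(scores) * math.sqrt(1.0 - icc)
print(interpret(icc), round(icc, 3), round(sem, 2))
```

For an inter-rater analysis in the style described above, each cell would hold the mean of a rater's two measurements; for an intra-rater analysis, the columns would instead be one rater's first and second measurements.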
RESULTS
Participant characteristics are provided in Table 1.
ICC and SEM scores changed with the restricted technique in different ways across ROM tests (Tables 2 and 3). Specifically, with adjacent joint restriction, the mean intra-rater ICC across raters decreased and the mean SEM increased for left and right ankle and MTPJ1 dorsiflexion; the ICC increased and the SEM decreased for left and right shoulder flexion and left hip extension; and both the ICC and the SEM increased for left and right hip flexion and right hip extension. However, the changes between the unrestricted and restricted techniques differed between raters. For example, rater 2 had a decrease in ICC and an increase in SEM for shoulder flexion (Table 2).
Changes in intra-rater ICC and SEM scores between the unrestricted and restricted techniques matched changes in the variance between participants (σ²p) and the random error variance (σ²e), respectively. Specifically, with adjacent joint restriction, σ²p decreased and σ²e increased for left and right ankle and MTPJ1 dorsiflexion; σ²p increased and σ²e decreased for left and right shoulder flexion and left hip extension; and both increased for left and right hip flexion and right hip extension. Although adjacent joint restriction changed the ICC scores, intra-rater ICCs remained good to excellent across the ROM tests, with or without adjacent joint restriction (Table 2).
For some tests, inter-rater ICC and SEM scores changed with adjacent joint restriction in the same way as the intra-rater scores (left and right ankle dorsiflexion, right hip extension, left hip flexion and right shoulder flexion). For others they did not: with restriction, the ICC increased and the SEM decreased for left and right MTPJ1 dorsiflexion, the ICC decreased and the SEM increased for right hip flexion, and both the ICC and the SEM increased for left hip extension and left shoulder flexion (Table 3).
Inter-rater ICCs ranged from poor to good and had broader 95% CIs than the intra-rater ICCs. Similarly, SEM was higher between raters than within each rater’s measurements (Tables 2 and 3). The effects of adjacent joint restriction were more pronounced in the inter-rater comparison. The restricted technique reduced the inter-rater ICC score from fair to poor for left and right ankle dorsiflexion. Conversely, the restricted technique increased ICC scores from poor to fair for left and right hip extension and left shoulder flexion, and from fair to good for left MTPJ1 dorsiflexion. SEM also decreased by approximately 3° with the restricted technique for right MTPJ1 dorsiflexion (Table 3).
The systematic variance between measurements/raters (σ²r) and the random error variance (σ²e) were higher for the inter-rater comparison than for any of the intra-rater comparisons, but the changes in ICC and SEM scores with adjacent joint restriction varied because each component of variance (i.e. σ²p, σ²r and σ²e) changed differently. The lower inter-rater ICC for restricted ankle dorsiflexion was due to a decrease in σ²p and increases in σ²r and σ²e compared to unrestricted ankle dorsiflexion. The notable increases in ICC for restricted hip extension and left shoulder flexion were also due to σ²p, which increased while σ²r and σ²e remained similar to the unrestricted techniques. Conversely, the improved ICC and SEM scores for restricted MTPJ1 dorsiflexion were not due to changes in σ²p but to decreases in σ²r and σ²e compared to the unrestricted technique (Table 3).
DISCUSSION
To our knowledge, no previous study has evaluated intra- and inter-rater goniometer measurement reliability and agreement across multiple ROM tests, with and without adjacent joint restriction. Reliability and agreement were differentially influenced by adjacent joint restriction across the five ROM tests. Adjacent joint restriction did not appear to substantially influence intra-rater reliability and agreement. Conversely, inter-rater reliability and agreement were more substantially influenced by adjacent joint restriction, but the underlying changes in the components of variance used to calculate the ICC and SEM differed between ROM tests.
The differing changes in intra-rater reliability and agreement with adjacent joint restriction were primarily driven by σ²p and σ²e. Although changes in σ²p and σ²e each have the potential to influence both the ICC and the SEM, changes in ICC appeared to be influenced more by changes in σ²p, and changes in SEM more by changes in σ²e (Table 2). However, it may not be appropriate to generalize these associations. Regardless, intra-rater reliability and agreement were not substantially influenced by adjacent joint restriction, as the ICC scores remained excellent and the differences between SEM scores did not exceed approximately 1°, with the exception of right-side unrestricted shoulder flexion (Table 2). Upon examining the data, the lower ICC and higher SEM for right-side shoulder flexion were likely caused by an outlier trial from rater 3, which measured 270° of shoulder flexion. This value appears unrealistic, but the trial was retained to maintain an equal number of trials across raters.
Inter-rater comparisons presented a more complex scenario because of the higher σ²r, which was almost nonexistent in the intra-rater comparisons. The lower inter-rater ICC and higher SEM can be attributed to the systematic and random error that occurred between raters, as σ²r and σ²e were higher while σ²p was similar to the intra-rater values. Adjacent joint restriction differentially influenced these components of variance and the resulting ICC and SEM. Improvements in reliability and agreement observed with hip extension and shoulder flexion were due to changes in σ²p, which suggests that raters were not necessarily better at making the measurements even though ICC and SEM scores improved with adjacent joint restriction. Conversely, the improvements in MTPJ1 dorsiflexion with joint restriction can be attributed to smaller differences between raters and less random error, as decreases in σ²r and σ²e caused the observed improvements in ICC and SEM. When multiple raters are involved in the measurement process, it may be beneficial to determine which technique (unrestricted or restricted) provides higher reliability and agreement and to standardize it across raters. This may matter less for intra-rater comparisons, because the intra-rater scores were excellent even with the changes due to adjacent joint restriction.
The inter-rater comparison demonstrates the utility of reporting the components of variance used to calculate the ICC and SEM, as it improved interpretation of the scores. Without examining the components of variance, it is difficult to determine why changes in ICC and SEM occurred, which can lead to incorrect conclusions. Reporting the components of variance could also improve comparisons between studies that use different participants, raters and methodologies, and thereby help consolidate the knowledge used to guide goniometry. For example, standardizing the measurement protocol and training raters have been investigated as means to improve reliability and agreement.5 Evaluating the variance components within the data can show how different protocols influence the systematic and random error, which can inform how protocols are tailored to improve reliability and agreement beyond the constraints of a specific study. Interpreting the data in this way can contribute to developing a conceptual framework that informs future research and decision making in practice.
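A worked example with hypothetical numbers (not study data) illustrates why the components matter. It assumes that the SD in the SEM formula is the square root of the total variance, in which case SEM = SD × √(1 − ICC) reduces to √(σ²r + σ²e); with a differently pooled SD, the SEM values would shift somewhat.

```latex
\begin{aligned}
\text{Baseline: } & \sigma_p^2 = 20,\ \sigma_r^2 = 10,\ \sigma_e^2 = 10
  && \Rightarrow\ \mathrm{ICC}[2,1] = \tfrac{20}{20 + 10 + 10} = 0.50,
  && \mathrm{SEM} = \sqrt{10 + 10} \approx 4.5^\circ \\
\text{Scenario A: } & \sigma_p^2 = 40,\ \sigma_r^2 = 10,\ \sigma_e^2 = 10
  && \Rightarrow\ \mathrm{ICC}[2,1] = \tfrac{40}{40 + 10 + 10} \approx 0.67,
  && \mathrm{SEM} = \sqrt{10 + 10} \approx 4.5^\circ \\
\text{Scenario B: } & \sigma_p^2 = 20,\ \sigma_r^2 = 5,\ \sigma_e^2 = 5
  && \Rightarrow\ \mathrm{ICC}[2,1] = \tfrac{20}{20 + 5 + 5} \approx 0.67,
  && \mathrm{SEM} = \sqrt{5 + 5} \approx 3.2^\circ
\end{aligned}
```

Scenarios A and B produce the same ICC, yet only B reflects smaller measurement error. In A the participants simply differ more from one another, a distinction that is hidden unless the components of variance are reported.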
This investigation has several limitations. Although maximal ROM may not have changed between measurements, repositioning between raters could have contributed to σ²r and σ²e. A single researcher positioned participants to control for these differences, but no criterion measure was used to confirm the consistency and accuracy of repositioning. The study design only provides rater reliability within a single session or day. Although apparently healthy participants were investigated, they could have had limitations in bi-articular muscle flexibility, which would have influenced their ROM with adjacent joint restriction and the variance between participants (σ²p). Thus, the results of this study could represent the reliability and agreement obtained when assessing heterogeneous groups such as sport teams, but would not necessarily be representative of more homogeneous groups, such as those with a specific condition that influences muscle flexibility. Finally, because of the challenges in comparing studies and the scarcity of research on the specific ROM tests and techniques investigated here, it is difficult to make meaningful comparisons to other studies.
CONCLUSION
Restriction of adjacent joints influenced measurement reliability and agreement differentially across the five ROM tests. The changes due to adjacent joint restriction were more pronounced in inter-rater reliability and agreement, whereas intra-rater reliability and agreement were not substantially influenced. Thus, the effects of adjacent joint restriction on reliability and agreement depend on the ROM test and are of greater importance when multiple raters are involved. Estimating the components of variance improved the interpretation of the ICC and SEM scores, which demonstrates the utility of reporting variance components in future work; doing so may improve between-study comparisons and help develop a conceptual framework to guide goniometry.
DECLARATION OF INTEREST
The authors do not have any conflicts of interest to declare.
ACKNOWLEDGEMENTS
The authors would like to acknowledge the University of Toronto’s David L. Macintosh Sports Medicine Clinic staff and Dr. Catherine Sabiston for her advice on study design and analyses. The authors would also like to thank Lindsay Musalem, Malinda Hapuarachchi, Phil Toppin, Victor Chan, Rachel Micay, Justine Branco, Pedro Malvar, Joyce Kuang, Izabela Boyaninska, Sunghoon Eric Minn and Mary Claire Geneau for their assistance with data collection.