INTRODUCTION
One of the primary components of the musculoskeletal physical examination is the assessment of joint and limb motion. Clinicians can quantify the amount of joint motion a person has using a variety of devices, including three-dimensional motion tracking systems, manual and digital inclinometers, and, most commonly, manual goniometers. Clinicians have also repeatedly demonstrated acceptable to excellent levels of inter-rater and intra-rater reliability when utilizing these devices in person.1–4
The rise of virtual healthcare visits (often termed telehealth or telemedicine) has forced clinicians to either modify, or in some cases eliminate, components of the physical examination due to the logistics and barriers of administering a virtual visit.5–7 When using a virtual platform to administer a telehealth examination, motion assessments have either been eliminated or reduced to visual qualitative assessments. For example, a patient may be asked to elevate the arm in front of the body to determine how much shoulder flexion can be achieved. However, without a measuring device, clinicians are relegated to using visible landmarks or categories to document the motion achieved (e.g., the patient was able to elevate the arm to just below the ear). This raises two concerns: 1) the qualitative nature of the modified assessment is subjective and inexact, and 2) previous literature has demonstrated that motion assessment reliability is more consistent with instrumentation than with visual estimation alone.8 Furthermore, patients tend to over-estimate the amount of motion they can perform when verbally asked to quantify their joint motion, suggesting that objective motion assessments should not simply be eliminated.9 Non-objective assessments could in turn negatively affect clinical decision making: reaching an accurate diagnosis, selecting appropriate treatment, and monitoring patient progress across the course of treatment.
Although these issues could arise during a telehealth examination, it is also possible that joint motion could be assessed with acceptable reliability in a virtual environment by applying a goniometer to a simple captured image, such as a screenshot taken during the telehealth session. Therefore, the purpose of this study was to determine whether similar goniometric measurements of the upper extremity could be obtained in person and virtually. It was hypothesized that inter-rater and intra-rater test/re-test reliability for both in-person and virtual measurements would reach an acceptable level, defined as an intraclass correlation coefficient ≥0.60.
METHODS
Subjects
A publicly recruited sample of subjects volunteered to participate in this study. Inclusion criteria were: age between 18-60 years; ability to actively elevate the arm to ear level (approximately 150°); ability to actively move the elbow from an extended to a flexed position within an approximate range of 0°-90°; and ability to actively move the wrist into flexion and extension from a neutral starting position (0°) to a non-specific range of motion in both directions. Subjects were excluded if they were <18 or >60 years of age, could not move the shoulder, elbow, or wrist as described in the inclusion criteria, had a Disabilities of the Arm, Shoulder, and Hand (DASH)10 disability score ≥40%,11 or had neurological compromise that would prevent joint/limb motion from occurring.
Procedures
After reading and signing the informed consent packet, subjects provided demographic information including age, sex, height, weight, and arm dominance. Once demographic data were collected, subjects completed the DASH.10
In-Person Measurements
Prior to performing the in-person goniometric measurements for each joint, an image of each pre-determined joint position was captured using a mobile device with a camera (iPad Air 2, Apple, Inc, Cupertino, CA). This still image represented what could be captured via screenshot on a virtual platform. Next, serial in-person measurements were obtained by each of four clinician research team members: two certified athletic trainers and two occupational therapists, all with a minimum of 10 years of clinical experience. Each clinician performed all six measurements on each subject consecutively, and this process continued until all team members had performed all measurements twice in the same session, which was necessary for determining the test/re-test (intra-rater) reliability for each clinician. The goniometer dial was covered with paper so the team member obtaining the measurement could not see the values. To reduce the potential for recording bias by the team member performing the measurements, a second team member read and recorded the range of motion to the nearest degree.
The dominant arm of each subject was utilized for all measurements unless the dominant arm did not meet the inclusion criteria. Each subject was tested standing, facing sideways with the dominant arm toward the camera. This was necessary for both the in-person and virtual assessments to clearly visualize the anatomical landmarks for goniometer placement (Table 1). The testing positions for each measurement were as follows: Shoulder flexion: humerus at 150° flexion (approximately ear level); Shoulder extension: humerus at maximal extension without altering erect trunk position; Elbow flexion: humerus in line with trunk, elbow at 90° flexion, and forearm in supination; Elbow extension: humerus in line with trunk, elbow at 0° extension, and forearm supinated; Wrist flexion: humerus in line with trunk, elbow at 90° flexion, forearm pronated, and wrist maximally flexed; and Wrist extension: humerus in line with trunk, elbow at 90° flexion, forearm pronated, and wrist maximally extended.
Virtual Measurements
Approximately one week (7-10 days) after the in-person measurements were completed, the research team members measured the captured images using the same goniometric techniques. This step also utilized two team members, with one member performing the measurement on each image and the other reading and recording the goniometer value (and vice versa). The images were placed on a cloud-based shared drive so that all team members could access them from their personal computers. Team members were not permitted to alter the image characteristics (brightness, contrast, resolution, etc.) but were permitted to use the zoom function within the computer’s image viewing software to enlarge each image for more accurate placement of the goniometer. The average of the two trials was calculated for both in-person and virtual sessions and used for statistical analysis. Intra-rater reliability for each joint measurement was determined for each clinician per session (i.e., clinician #1 trial 1 versus trial 2, clinician #2 trial 1 versus trial 2, etc.), while inter-rater reliability for each joint measurement was determined by comparing all results for trial 1 versus trial 2 for all four clinicians combined for each session.
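For illustration only, the following is a minimal sketch of how the measurement data could be organized to reproduce these trial averages and reliability groupings. The column names, example values, and pandas-based layout are assumptions made for this sketch and are not taken from the study.

```python
# Hypothetical long-format layout: one row per goniometric reading.
import pandas as pd

# Illustrative rows only (not study data): one subject, one joint motion,
# two clinicians, two trials, in-person session.
df = pd.DataFrame({
    "subject": [1, 1, 1, 1],
    "session": ["in-person"] * 4,
    "clinician": [1, 1, 2, 2],
    "joint_motion": ["shoulder_flexion"] * 4,
    "trial": [1, 2, 1, 2],
    "degrees": [149.0, 151.0, 150.0, 148.0],
})

# Average of the two trials per clinician, joint motion, and session
# (used for the between-session comparisons).
trial_means = (
    df.groupby(["subject", "session", "clinician", "joint_motion"])["degrees"]
      .mean()
      .reset_index(name="mean_degrees")
)

# Intra-rater grouping: trial 1 versus trial 2 within a single clinician.
intra_clinician_1 = df[df["clinician"] == 1].pivot_table(
    index=["subject", "session", "joint_motion"], columns="trial", values="degrees"
)

# Inter-rater grouping: trial 1 versus trial 2 pooled across all clinicians.
inter_all = df.pivot_table(
    index=["subject", "session", "clinician", "joint_motion"],
    columns="trial", values="degrees",
)
```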
Statistical Analysis
Summary statistics for demographic items were calculated and reported as means and standard deviations for continuous variables and frequencies with percentages for categorical variables. The distribution of data for each variable was assessed for normality using the Shapiro-Wilk test. Intraclass correlation coefficients (ICCs) were calculated for both the in-person and virtual testing sessions using a two-way random-effects model with absolute agreement for inter-rater (ICC2,k) and intra-rater (ICC2,1) test/re-test reliability. Once the ICCs were determined, the standard error of measurement (SEM) and the minimal detectable change at the 90% (MDC90) and 95% (MDC95) confidence levels were calculated. An ICC greater than 0.75 was interpreted as excellent, 0.60-0.74 as good, 0.40-0.59 as fair, and <0.40 as poor.12 Finally, between-session comparisons of measurement values were conducted using paired t-tests or Wilcoxon signed-rank tests (based on normality results) for the overall comparisons (in-person versus virtual) and one-way analyses of variance with Bonferroni correction for between-examiner comparisons.
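As a worked illustration of the reliability statistics named above, the sketch below (assumptions: a long-format table with hypothetical column names, and the pingouin package, whose ICC2/ICC2k rows correspond to the two-way random, absolute-agreement, single- and average-measure models) computes an ICC and then derives the SEM and MDC using the standard formulas SEM = SD√(1−ICC) and MDC = SEM × z × √2. This is not the study’s analysis code, and which standard deviation enters the SEM is not specified in the text, so that choice is an assumption of the sketch.

```python
# Minimal sketch (not the study's analysis code) of the reliability statistics
# described above: ICC (two-way random, absolute agreement), SEM, and MDC.
import numpy as np
import pandas as pd
import pingouin as pg  # provides intraclass_corr

def reliability_summary(df: pd.DataFrame, icc_type: str = "ICC2") -> dict:
    """df: long-format data with assumed columns 'subject', 'rater', 'degrees'
    for one joint motion and one session. For intra-rater reliability the
    'rater' column would instead hold the trial number for a single clinician."""
    icc_table = pg.intraclass_corr(
        data=df, targets="subject", raters="rater", ratings="degrees"
    )
    icc = icc_table.loc[icc_table["Type"] == icc_type, "ICC"].item()

    sd = df["degrees"].std(ddof=1)      # SD of the measurements (assumed pooled SD)
    sem = sd * np.sqrt(1 - icc)         # standard error of measurement
    mdc90 = sem * 1.645 * np.sqrt(2)    # minimal detectable change, 90% confidence
    mdc95 = sem * 1.96 * np.sqrt(2)     # minimal detectable change, 95% confidence
    return {"ICC": icc, "SEM": sem, "MDC90": mdc90, "MDC95": mdc95}
```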
Using previously established criteria for sample size estimation, it was determined that 20 subjects would be needed to achieve a minimum intraclass correlation coefficient of 0.60 with an alpha level of 0.05 and a statistical power of 0.90.13
RESULTS
Twenty subjects (Age: 30.8±12.8 years; height: 169.8±10.2 centimeters; weight: 76.8±18.9 kilograms; DASH: 26.0±2.4%; Sex: 85% female) participated in the study.
Inter-Rater Reliability
The ICCs for five of the six in-person measurements were classified as excellent (ICC≥0.81) (Table 2). In-person wrist extension was classified as good (ICC=0.60). Similarly, the ICCs for five of the six virtual measurements were classified as excellent (ICC≥0.78). Virtual wrist flexion was classified as good (ICC=0.65).
Intra-Rater Reliability
Overall, the ICCs for the individual clinicians ranged from good to excellent for both the in-person measurements (range: 0.61-0.96) and the virtual measurements (range: 0.72-0.97) (Table 3). When examining the individual measurement results, the ICCs for both in-person (ICC≥0.84) and virtual (ICC≥0.93) shoulder extension and for in-person (ICC≥0.89) and virtual (ICC≥0.94) elbow extension were all classified as excellent. There was a greater proportion of excellent ICC values for the virtual measurements (90%) than for the in-person measurements (70%).
Between Session Comparisons
When combining all clinician measurements, there were statistically significant differences between in-person and virtual sessions for five of the six measurements (p≤0.006) (Table 4). Only the measurement of elbow extension did not differ between sessions (p=0.966).
Between-Examiner Comparisons
Upon review of the between-examiner comparisons, Examiner 1 recorded significantly lower amounts of in-person shoulder flexion compared to Examiner 3 (p=0.010) and Examiner 4 (p<0.001) (Table 5). Similarly, Examiner 1 recorded significantly lower amounts of in-person wrist extension compared to the other three examiners (p<0.001). Regarding the virtual measurements, Examiner 4 recorded significantly greater amounts of shoulder flexion compared to Examiners 1 and 2 (p<0.001). Examiners 3 and 4 recorded significantly greater amounts of wrist flexion compared to Examiner 1 (p≤0.010), while Examiner 3 also recorded significantly greater wrist flexion compared to Examiner 2 (p=0.005). Finally, Examiner 1 recorded significantly lower amounts of wrist extension compared to Examiners 2 (p=0.031) and 3 (p=0.023).
DISCUSSION
Clinicians routinely utilize range of motion measures to predict the development of and to diagnose certain pathologies, as well as to determine the function of a body part.14–29 Following COVID-19, virtual patient evaluations became more common, raising concerns about whether motion measurements can be included and how reliable they are. This study aimed to determine the reliability of measuring shoulder, elbow, and wrist range of motion virtually using a goniometer. Both virtual and in-person goniometric measurements showed good to excellent inter-rater and intra-rater reliability (ICC≥0.60).
Past researchers have examined range of motion in the shoulder,4,27,30 elbow,1–3,27,31,32 and knee33,34 using methods such as radiographs, visual estimation, inclinometers, smartphone applications, and goniometry. Blonna et al. reported good to excellent reliability between surgeons and physician assistants when comparing visual observation of elbow flexion/extension to goniometry, but noted the highest ICCs with a goniometer.31 Similarly, van de Pol et al. found a wide range of inter-rater reliability depending on the method utilized but concluded that devices such as goniometers or inclinometers should be preferred over visual observation due to more consistent and higher ICC values.8 The current results agree with these findings, as the reliability metrics for all measurements ranged from good to excellent. Using visual observation alone, Hickey et al. reported limited agreement between experienced and novice clinicians when observing and classifying asymptomatic versus symptomatic scapular motion from videotape. This suggests that more than a trained eye should be used for shoulder evaluations, and although their study did not use a goniometer, it supports the need for a more quantitative form of measurement for shoulder evaluation in a virtual medium.
The most important finding of this study is that, although both in-person and virtual measurements demonstrated good to excellent test/re-test reliability, there was a higher number of excellent ICCs for the virtual measurements. This is most likely due to the lack of movement between trials for the virtual measurements. These data support using a screenshot and goniometer during a virtual examination for assessing flexion and extension of the shoulder, elbow, and wrist. Recent literature supporting the virtual examination has focused on camera positioning and placement, in addition to clothing, to ensure the most accurate measures.6,35 These variables are possible causes of the differences in measurement values between the in-person and virtual sessions of the current study. It is important to set up a consistent space for the clinician to perform the best evaluation of a patient; however, these results point toward using the screenshot and goniometer as a reliable method of upper extremity evaluation. Future testing of virtual range of motion should address motions in the horizontal and transverse planes and should include more specific instructions regarding clothing and setting up an optimal space for recording the evaluation. Additionally, given continuing technological advances in software and devices, it is recommended that future efforts establish the psychometric properties of digital applications and devices designed to assess and quantify motion of various anatomical joints in virtual environments.
Limitations
The findings of this study show that virtual range of motion measures had high (excellent) ICCs, which suggests clinicians can obtain quantitative measurements even when the patient is not directly in front of the clinician. However, there are limitations to discuss. First, differences found in the results of this study could be due to variables such as clothing, patient posture, and patient joint position sense. The subjects who volunteered for this study were a sample of convenience and were not instructed about the type of clothing to wear. Loose-fitting blouses or patterned clothing could have hindered the clinicians’ view of the joint, making it difficult to find the same landmarks consistently. Likewise, a tight-fitting shirt could have hindered a subject’s ability to achieve full range of motion. Second, posture could have contributed to measurement differences. Subjects were not instructed to stand in anatomically correct (or ideal) posture, nor were any postural differences between subjects corrected for. For instance, a patient with forward rounded shoulders might have less range of motion than a patient whose posture is more anatomically correct or ideal. For shoulder extension, subjects who noticeably altered trunk position to gain more extension were immediately instructed to remain in their “typical” posture, but no other corrections were applied. Finally, joint position sense could have contributed to any differences in results. Still photos for all motions (shoulder flexion/extension, elbow flexion/extension, and wrist flexion/extension) were taken prior to range of motion testing. In contrast, subjects were asked to repeat the same motions multiple times for the in-person measurements, which could have increased flexibility over the course of testing (or created fatigue due to repeated positioning for four examiners) and therefore changed the end position of motion as the subject progressed through the testing.
CONCLUSION
Measuring range of motion both in person and virtually had good to excellent test/re-test reliability, suggesting either method is acceptable for clinical use. Capturing screenshots during a virtual examination to measure range of motion is recommended and is supported by the higher percentage of ICCs ranked as excellent for the virtual measurements. Using a goniometer can provide an objective component to the assessment and diagnosis of upper extremity injuries in the virtual examination, supporting more thorough and accurate clinical decision making.
Conflicts of interest
The authors report no conflicts of interest.