An estimated 2.5 million sports-related knee injuries occur in adolescents annually in the United States, resulting in significant time loss from sports participation for young athletes.1–4 Female athletes are disproportionately susceptible to sport-related knee injury compared with males, having a two to ten times greater risk of sustaining severe ligamentous injuries such as an anterior cruciate ligament (ACL) rupture.5–7 ACL injuries impose a significant burden on young athletes, including time away from sports and peers, extended rehabilitation, and high healthcare costs; thus, injury prevention interventions have been sought to reduce injury risk.8 The use of clinical screening tools and preventive interventions to address modifiable injury risk factors has been recommended to reduce the overall incidence of knee injuries in this population.7

Up to 70% of all ACL injuries occur via a non-contact mechanism commonly involving deceleration and/or a direction change on a planted foot.9 Specifically, neuromuscular deficits at the trunk and lower extremity (LE) have been identified as key modifiable risk factors for ACL injury during changes in direction or cutting maneuvers.10–13 However, clinical screening tools analyzing movement patterns during a cutting maneuver are currently limited. Weir et al. reported fair to excellent intra-rater reliability and poor to excellent inter-rater reliability for a quantitative two-dimensional (2D) assessment of a 45-degree sidestep cut in 15 junior and 15 elite senior female field hockey players.14 In their study, angular measurements demonstrated higher reliability compared to displacement measures such as foot-pelvis distance and knee valgus displacement.14 Alternatively, the Cutting Movement Assessment Score (CMAS), a qualitative assessment, was found to be a reliable and valid tool to assess movement patterns during a 90-degree cutting task in collegiate athletes.15 Another qualitative assessment, the Cutting Alignment Scoring Tool (CAST), demonstrated good inter-rater and intra-rater reliability for assessing trunk and LE alignment in the frontal plane during a 45-degree sidestep cut in young athletes (age = 14.7 ± 1.2 years).16 The CAST was further developed into the Expanded Cutting Alignment Scoring Tool (E-CAST), which added sagittal plane assessments at the knee and ankle.17 This more comprehensive E-CAST demonstrated moderate inter-rater reliability and good intra-rater reliability when assessing trunk and LE alignment in the frontal and sagittal planes during a 45-degree sidestep cut.17

While these 2D screening tools were developed to assess a cutting maneuver using either quantitative or qualitative 2D assessment criteria, it is unknown which type of 2D assessment (quantitative versus qualitative) is more reliable for clinical movement evaluations during a cutting task. To the authors’ knowledge, only one study has compared the reliability of quantitative and qualitative 2D assessments of the LE during athletic tasks. Simon et al. compared the reliability of a quantitative measurement of frontal plane projection angle with that of a qualitative visual assessment of dynamic valgus during a lateral step down task and found higher reliability with the quantitative assessment.18 However, given differences in reliability and validity of 2D assessment tools between different athletic tasks, the results of this study may not be generalizable to cutting and pivoting maneuvers.19 Thus, to fill this knowledge gap, the authors of the current study developed a quantitative version of the E-CAST, using a 2D kinematic assessment. The purpose was to examine the reliability of the quantitative version of the E-CAST among physical therapists and to compare the reliability of the quantitative E-CAST to the original qualitative E-CAST. Specifically, this study consisted of three aims: 1) to assess the inter- and intra-rater reliability of the quantitative version of the E-CAST; 2) to examine rater agreement of each component of the quantitative version of the E-CAST; and 3) to compare the reliability of the quantitative version of the E-CAST to the original qualitative scoring tool.
Given these aims, the hypotheses were: 1) there would be good to excellent inter- and intra-rater reliability; 2) there would be good to almost perfect agreement in the assessed variables, including cut width, trunk lean, knee flexion and valgus, and plantarflexion; and 3) the quantitative version of the E-CAST would demonstrate greater inter-rater and intra-rater reliability compared to the qualitative E-CAST.


Study Design

A repeated measures study design was used. The study protocol was developed based on the Declaration of Helsinki and ethical standards in sport and exercise science research.20 Institutional Review Board approval was obtained prior to commencement of the study.


A total of 25 adolescent female athletes were recruited for participation in the study from local middle school, high school, and club sport teams. These were the same participants from the original work of Butler et al.17 The sample size was selected based on a review of current research in this area. Participants were included if they were between the ages of 12 and 17 years and were active participants in sports requiring cutting and pivoting within the prior 12 months. Participants were excluded from the study if they had an LE injury within the prior six months, a history of LE surgery, a positive response on the Physical Activity Readiness Questionnaire (PAR-Q+), or a history of scoliosis. The PAR-Q+ was used to determine the participants’ readiness and safety for physical activity. A positive response on the PAR-Q+ indicates the need to seek further advice from a physician prior to engaging in physical activity.21 Written parental informed consent and participant informed assent were obtained prior to the start of the study.

Data Collection

Data collection was performed in a movement science laboratory at a local sports medicine center. A 5-minute warm-up on an exercise bike (Matrix Fitness, Cottage Grove, WI) was performed prior to performing the 45-degree sidestep cut. Participants practiced the sidestep cut three times in each direction or until they felt comfortable performing the maneuver. They were instructed to sprint at 80% of their maximum speed in a forward direction toward the “opponent cone” and plant to perform a sidestep cut. This procedure was modeled on a testing protocol described by McLean et al., which requires participants to decelerate, plant on the stance foot, and cut between two cones placed on their contralateral side along a 45-degree line of progression (Figure 1).22

Figure 1.

Participants completed three trials planting on the right foot and three trials planting on the left foot. A trial was considered “good” if the participant’s foot landed within the stance/pivot area such that video data successfully captured the cutting maneuver. The testing order was standardized for all participants following the protocol by Butler et al.16 Video data were captured at 60 frames per second with 1080p quality using three Sony RX10 IV cameras mounted at a height of 36 inches. Two cameras were positioned 136 inches from either side of the stance/pivot area, and one camera was positioned 146 inches in front of the stance/pivot area. A total of six cutting maneuvers were performed by each participant, with one trial randomly selected for analysis. All videos were slowed by 50% for visual analysis, and participants’ faces were blurred using VideoStudio.

Quantitative 2D Assessment Tool

The quantitative assessment tool was devised based on the previously reported qualitative scoring system (E-CAST).17 The original six-item assessment criteria from the E-CAST were adapted and re-defined to utilize a motion analysis application on a smartphone that allowed for the extraction of 2D kinematic measurements. The quantitative scoring tool involved a dichotomous rating system, with scoring defined as “1” when a movement fault was present and “0” when optimal movement patterns were observed. Frontal and sagittal plane variables were assessed. Frontal plane variables included trunk lean opposite to the cut direction, increased cut width, knee valgus at initial load acceptance (static valgus), and knee valgus throughout the cutting task (dynamic valgus). Sagittal plane variables included ankle plantarflexion and knee flexion. The quantitative 2D assessment tool is shown in Table 1.

Table 1. Adapted Checklist

| Item | View | 2-D Kinematic Measurement Definitions |
| --- | --- | --- |
| Trunk lean to opposite direction of cut | Frontal | At the time point of initial load acceptance, draw a line connecting the athlete’s right and left ASIS* (hip line). Next, draw a line from the center of the head to the midpoint of the hip line (trunk line). Measure the angle formed between the trunk line and vertical. If the trunk line is deviated greater than 10° score 1 (YES). If the trunk line is deviated less than or equal to 10° score 0 (NO). |
| Increased cut width | Frontal | At the time point of initial load acceptance, draw a line down from the lateral most aspect of the athlete’s stance leg hip, if the line appears to fall more than one shoe width medial to the foot score 1 (YES). If not, score 0 (NO). |
| Static valgus | Frontal | At the time point of initial load acceptance measure the angle formed between the stance limb hip, knee and ankle joint centers. If the angle formed is greater than 8° score 1 (YES). If the angle formed is less than or equal to 8° score 0 (NO). |
| Dynamic valgus | Frontal | Measure the angle formed between the stance limb hip, knee and ankle joint centers at the maximum point of knee valgus during the cut. If the angle formed is greater than 8° score 1 (YES). If the angle formed is less than or equal to 8° score 0 (NO). |
| Decreased knee flexion angle | Sagittal | At the time point of initial contact, measure the angle formed between the lateral hip, lateral knee and lateral malleolus. If the angle formed is less than 30° score 1 (YES). If angle formed is greater than or equal to 30° score 0 (NO). |
| Decreased plantar flexion angle | Sagittal | At the time point of initial contact, measure the angle formed between the lower leg and the bottom sole of the shoe. If the angle formed is less than 90° score 1 (YES). If the angle formed is greater than or equal to 90° score 0 (NO). |

*anterior superior iliac spine
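The dichotomous scoring rules in Table 1 can be sketched in code. This is a minimal illustration rather than the software used in the study: the class, function, and field names are hypothetical, and the sketch assumes that each angle (in degrees) and the cut-width offset (expressed in shoe widths) has already been measured in the motion analysis application.

```python
from dataclasses import dataclass


@dataclass
class CutMeasurements:
    """2D measurements for one cutting trial (hypothetical field names)."""
    trunk_lean_deg: float         # trunk line vs. vertical at initial load acceptance
    cut_width_shoe_widths: float  # how far medial to the foot the hip line falls
    static_valgus_deg: float      # hip-knee-ankle angle at initial load acceptance
    dynamic_valgus_deg: float     # hip-knee-ankle angle at maximum knee valgus
    knee_flexion_deg: float       # sagittal knee angle at initial contact
    plantar_flexion_deg: float    # lower leg vs. shoe sole at initial contact


def score_trial(m: CutMeasurements) -> dict:
    """Return 1 (movement fault present) or 0 for each item, per the Table 1 cutoffs."""
    return {
        "trunk_lean": int(m.trunk_lean_deg > 10),
        "cut_width": int(m.cut_width_shoe_widths > 1),
        "static_valgus": int(m.static_valgus_deg > 8),
        "dynamic_valgus": int(m.dynamic_valgus_deg > 8),
        "knee_flexion": int(m.knee_flexion_deg < 30),
        "plantar_flexion": int(m.plantar_flexion_deg < 90),
    }


def total_score(m: CutMeasurements) -> int:
    """Sum of the six dichotomous items (0 = no faults, 6 = all faults present)."""
    return sum(score_trial(m).values())


# Made-up example values, not data from the study.
trial = CutMeasurements(12.0, 0.5, 8.0, 10.0, 25.0, 95.0)
print(score_trial(trial))
print(total_score(trial))
```

Values exactly at a cutoff (for example, a static valgus angle of 8°) score 0, which follows the "less than or equal to" wording in the table.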


Two raters who were doctors of physical therapy in a pediatric sports medicine department were chosen based on their clinical roles in treating young athletes. The raters belonged to the same medical institution, and each had seven years of clinical experience. Both raters provided their informed consent to participate in the study and independently viewed 25 videos. Neither rater had participated in the original work of Butler et al.17


One video for each participant was provided to each rater along with a reference sheet containing images that demonstrated 1) how to take the 2D kinematic measurements using the smartphone application and 2) the adapted definitions for each original qualitative variable (see document, supplementary digital content 1, adapted checklist reference sheet). The raters were instructed to view the videos independently. They were allowed to review the videos and take as many measurements as they felt necessary. All videos were evaluated using each rater’s personal smartphone and a free, publicly available motion analysis application (Hudl Technique Version …). The raters were given one week to complete the first reliability session, followed by a two-week wash-out period. Then, the second reliability session was performed using the same method outlined for the first reliability session. The sequence of videos was randomized in the second reliability session using a web-based randomization tool, and both raters were blinded to the ratings they had recorded in the first reliability session.

Statistical Analysis

Reliability was determined by calculating intraclass correlation coefficients (ICC) for the scoring tool total scores, with a 2-way mixed-effects model and 95% confidence intervals (95% CIs) for inter- and intra-rater reliability. For the first aim, the individual and cumulative inter- and intra-rater reliabilities were calculated within the first and second reliability sessions. ICC values less than 0.50, between 0.50 and 0.75, between 0.75 and 0.90, and greater than 0.90 were defined as poor, moderate, good, and excellent reliability, respectively.23 To address the second aim, a kappa coefficient was calculated for each of the scoring tool variables using the formula κ = (Pr(a) − Pr(e)) / (1 − Pr(e)), where Pr(a) represented the relative observed agreement between raters and Pr(e) represented the hypothetical probability of chance agreement.24 The kappa coefficient was interpreted based on the scale of Landis and Koch with 0.01-0.20 as slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 good, and 0.81-1.00 almost perfect agreement.25 For the third aim, correlations were converted to z scores using the Fisher Z-transformation, z′ = 0.5[ln(1 + r) − ln(1 − r)], to test whether reliability differed significantly between the quantitative assessment criteria and the original qualitative assessment criteria (α = 0.05).26 All statistical analyses were conducted using SPSS Statistics 22 (IBM Corp. Released 2013. IBM SPSS Statistics for Windows, Version 22.0. Armonk, NY: IBM Corp).
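To make the kappa calculation concrete, the following is a minimal sketch of the formula above for two raters scoring one dichotomous variable (1 = fault present, 0 = fault absent). It is an illustration only, since the study's analyses were run in SPSS; the function names and the example ratings are hypothetical, and the handling of values that fall outside the ranges quoted in the text (such as κ below 0.01) is an assumption.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of 0/1 ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Pr(a): proportion of videos on which the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Pr(e): chance agreement from each rater's marginal proportion of 1s.
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def agreement_label(kappa):
    """Map kappa to the labels used in the text (upper bounds are inclusive)."""
    if kappa < 0.01:
        return "below slight"  # assumption: this range is not labeled in the text
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "good"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# Made-up ratings for ten videos, not data from the study.
rater_1 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
rater_2 = [1, 0, 0, 0, 1, 0, 1, 1, 0, 0]
k = cohen_kappa(rater_1, rater_2)
print(round(k, 3), agreement_label(k))
```

In this example the raters agree on 8 of 10 videos (Pr(a) = 0.80) while chance agreement is Pr(e) = 0.52, so κ = 0.28 / 0.48, which is roughly 0.58 and falls in the moderate range.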


A total of 25 adolescent female athletes (age 13.8 ± 1.4 years, body mass 52.4 ± 9.3 kg, height 161.7 ± 6.0 cm) participated (Table 2).

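Table 2. Participant demographics

| Statistic | Age (years) | Height (cm) | Weight (kg) | BMI (kg/m²) |
| --- | --- | --- | --- | --- |
| Minimum | 12.0 | 150.0 | 40.8 | 16.4 |
| Maximum | 16.3 | 172.5 | 72.6 | 26.3 |
| Average | 13.8 | 161.7 | 52.4 | 19.9 |
| Standard deviation | 1.4 | 6.0 | 9.3 | 2.6 |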

Intra-rater reliability for Rater 1 was moderate (ICC: 0.667, 95% CI 0.255-0.852) and intra-rater reliability for Rater 2 was excellent (ICC: 0.900, 95% CI 0.777-0.956; Table 3). The cumulative intra-rater reliability of both raters was good (ICC: 0.821, 95% CI 0.687-0.898; Table 3). Cumulative intra-rater kappa coefficients of all variables ranged from moderate to almost perfect (κ = 0.505-0.875; Table 3). Inter-rater reliability for the first reliability session was moderate (ICC: 0.747, 95% CI 0.436-0.888) and inter-rater reliability for the second reliability session was good (ICC: 0.760, 95% CI 0.463-0.894). The cumulative inter-rater reliability of both sessions was good (ICC: 0.752, 95% CI 0.565-0.859). Cumulative inter-rater kappa coefficients of all variables ranged from fair to good (κ = 0.336-0.751; Table 4). To compare the reliability of the quantitative and qualitative assessments, Fisher’s r to z transformation was used so that the transformed coefficients could be compared and tested for statistical significance with an observed z test statistic. The z-score comparing inter-rater reliability was -0.30 with a corresponding p-value of 0.382. With an alpha level of 0.05, there was no statistically significant difference in inter-rater reliability between the qualitative and quantitative assessments. The z-score comparing intra-rater reliability was -0.38 with a corresponding p-value of 0.352, indicating that there was also no statistically significant difference in intra-rater reliability between the qualitative and quantitative assessments.
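The comparison of reliability coefficients can be illustrated with a short sketch. The text does not spell out the test statistic, so the code below assumes the standard z test for two independent correlations, with a standard error of sqrt(1/(n1 − 3) + 1/(n2 − 3)), n = 25 videos in each study, and a one-tailed p-value; the function names are hypothetical. The qualitative E-CAST coefficients (0.71 inter-rater, 0.78 intra-rater) are the cumulative values from Butler et al.17 Under these assumptions the sketch gives values close to those reported above.

```python
import math


def fisher_z(r):
    """Fisher Z-transformation: z' = 0.5 * [ln(1 + r) - ln(1 - r)]."""
    return 0.5 * (math.log(1 + r) - math.log(1 - r))


def compare_correlations(r1, n1, r2, n2):
    """Observed z statistic and one-tailed p-value for r1 versus r2.

    Assumes the two coefficients come from independent samples.
    """
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z_obs = (fisher_z(r1) - fisher_z(r2)) / standard_error
    # One-tailed p-value from the standard normal distribution.
    p_one_tailed = 0.5 * math.erfc(abs(z_obs) / math.sqrt(2))
    return z_obs, p_one_tailed


n_videos = 25  # assumption: one video per participant in both studies

# Qualitative E-CAST versus quantitative version, cumulative ICCs.
z_inter, p_inter = compare_correlations(0.71, n_videos, 0.752, n_videos)
z_intra, p_intra = compare_correlations(0.78, n_videos, 0.821, n_videos)

print(f"inter-rater: z = {z_inter:.2f}, p = {p_inter:.3f}")
print(f"intra-rater: z = {z_intra:.2f}, p = {p_intra:.3f}")
```

Because both p-values are well above 0.05, the small differences in ICC (0.042 and 0.041) are not statistically distinguishable with 25 videos per study.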

Table 3. Intra-rater reliability (ICC*, 95% CI†, cumulative values) and intra-rater agreement (κ‡) for adapted checklist variables

| Raters | ICC* | 95% CI† | Cut Width (κ‡) | Trunk Lean (κ‡) | Dynamic Valgus (κ‡) | Static Valgus (κ‡) | Knee Flexion (κ‡) | Plantar Flexion (κ‡) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Rater #1 | 0.667 | 0.255-0.852 | 0.364 | 0.595 | 0.865 | 0.865 | 0.606 | 0.503 |
| Rater #2 | 0.900 | 0.777-0.956 | 0.694 | 0.457 | 0.884 | 0.803 | 0.684 | 0.481 |
| Cumulative | 0.821 | 0.687-0.898 | 0.532 | 0.558 | 0.875 | 0.831 | 0.658 | 0.505 |

*Intraclass correlation coefficient; †confidence interval; ‡kappa coefficient

Table 4. Inter-rater reliability for adapted checklist variables

| Reliability session | Cut Width (κ*) | Trunk Lean (κ*) | Dynamic Valgus (κ*) | Static Valgus (κ*) | Knee Flexion (κ*) | Plantar Flexion (κ*) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.595 | 0.627 | 0.651 | 0.694 | 0.448 | 0.493 |
| 2 | 0.816 | 0.194 | 0.865 | 0.752 | 0.532 | 0.157 |
| Cumulative | 0.733 | 0.394 | 0.751 | 0.722 | 0.493 | 0.336 |

* kappa coefficient


The purpose of this study was to assess the intra-rater and inter-rater reliability of a quantitative version of the E-CAST among physical therapists. The quantitative assessment tool demonstrated good intra-rater reliability (cumulative ICC: 0.821, 95% CI 0.687-0.898) and inter-rater reliability (cumulative ICC: 0.752, 95% CI 0.565-0.859). These findings support the first hypothesis that the quantitative assessment tool would demonstrate good to excellent inter- and intra-rater reliability. The second hypothesis was not supported, as only moderate agreement was found for the cut width variable, fair to moderate agreement for the trunk lean and plantar flexion variables, and moderate to good agreement for the dynamic valgus and knee flexion variables. Static valgus was the only variable that demonstrated good to almost perfect agreement. Furthermore, the current findings did not support the third hypothesis, as there was no significant difference in intra- and inter-rater reliability between the quantitative assessment tool and the qualitative E-CAST (Zobs = -0.38 and Zobs = -0.30, respectively). This was likely a result of the small difference between the actual values. For inter-rater reliability, there was only a 0.042 difference between the qualitative and quantitative assessments, and for intra-rater reliability there was a 0.041 difference.

Although the quantitative assessment tool resulted in slightly higher reliability compared to the qualitative E-CAST, which reported moderate inter-rater reliability (cumulative ICC: 0.71, 95% CI 0.50-0.91) and good intra-rater reliability (cumulative ICC: 0.78, 95% CI 0.59-0.96), this difference was not significant.17 From a clinical standpoint, this suggests that the use of app-based measurements may not be necessary to reliably assess trunk and LE alignment during change of direction maneuvers. This is an important finding given the time restrictions in the clinic setting. While both the quantitative and the qualitative assessment tools demonstrated adequate reliability, the original qualitative E-CAST may be more efficient. These findings also indicate that quantitative tools may still be subject to variability in time point and landmark identification, which likely contributed to this variation in reliability. Valgus variables demonstrated the highest intra- and inter-rater reliability when utilizing either quantitative or qualitative assessment. Interestingly, lower intra- and inter-rater reliability were found for the variables of trunk lean and plantarflexion with the quantitative assessment (Table 3). Similarly, lower inter-rater reliability was observed for the static valgus variable using the quantitative versus the qualitative assessment (Table 3). This may be a result of the differences in the operational definitions used for these variables between the two assessments. Specifically, for the trunk lean variable, the qualitative E-CAST uses a horizontal line reference while the quantitative assessment uses an angle measurement off a vertical line as a reference. Additionally, for plantarflexion, the qualitative E-CAST uses a point of first contact (toe-to-heel vs. heel-to-toe) definition, while the quantitative assessment uses an angle measurement that requires the rater to visually identify the sole of the shoe. Thus, it is possible that variability in landmark identification may have decreased rater agreement. Similarly, the variable of static valgus requires identification of the time point of initial load acceptance. When using a quantitative 2D measurement, differences in time point identification may contribute to poorer agreement between raters and may also explain the differences in intra-rater reliability of each rater in this study.

The findings of this study are generally in agreement with the work of Weir et al., who reported fair to excellent intra-rater reliability and poor to excellent inter-rater reliability for their quantitative 2D assessment tool.14 There are, however, important differences between the two studies. First, the study by Weir et al. assessed the reliability of joint and segment angle measurements, which are continuous variables.14 In the current study, 2D measurements were used to determine if the movement fault was “present” or “not present”, resulting in dichotomous variables. Variability in the reported rater agreement between the two studies may be attributed to differences in statistical assessment of agreement between continuous and dichotomous variables. Assessing agreement between two dichotomous variables may be more challenging given the strict response of present or not present compared to continuous variables, which allow for a wider range of potential responses and possibly more opportunity for agreement. Furthermore, the dichotomous variable derived from the 2D kinematic assessment may be more sensitive to human error than the qualitative assessment given that the extracted variables were highly influenced by landmark identification. Additionally, the study by Weir et al. used an unplanned 45-degree sidestep cut, whereas a planned cutting task was used in this study. Unplanned cutting tasks have been shown to result in higher knee joint loads, which may make movement faults easier to visually identify.27

The findings of this study are similar to those reported for other qualitative assessments of cutting. For the CMAS, excellent intra-rater reliability (ICC = 0.95) and moderate inter-rater reliability (ICC = 0.69) were reported when utilizing a qualitative scoring system to evaluate a 90-degree cutting maneuver.15 While the current study found slightly lower intra-rater reliability (ICC = 0.82) and slightly higher inter-rater reliability (ICC = 0.75), these slight differences are likely not clinically significant.

Time limitations have been previously reported as a barrier to movement screenings.28 If quantitative 2D assessment does not significantly improve reliability compared to qualitative assessment, then clinicians should consider ease and efficiency when choosing the type of assessment tool to use. Given the simplicity of qualitative visual assessments, this might support their use over more complex and technology-dependent quantitative measurements.


This study has several limitations. First, the adapted checklist evaluated reliability among two physical therapists using a two-way mixed-effects model, which reduces the generalizability. Future studies should consider assessing reliability amongst a larger group of raters using a two-way random-effects model. Furthermore, coaching staff and athletic trainers in school or club sport settings are likely best positioned to perform movement screenings; thus, reliability of this tool should be assessed in non-clinically trained personnel. Providing coaching staff with reliable and valid assessment tools will help them to identify athletes at the highest risk for injury and thus the best candidates for preventive interventions. However, it should be noted that not all coaches have the background knowledge to perform this type of assessment. Additional training for coaches on how to utilize this assessment tool may be necessary. Also of note, this study used a planned cutting task. Unplanned cutting tasks have been shown to result in greater knee joint loads compared to planned cutting maneuvers.27 However, video assessment of an unplanned cutting maneuver requires an additional camera view (two sagittal views compared to one). Adding more cameras increases the complexity of set-up and may result in decreased utilization of the tool. Additionally, this study only assessed the reliability between two raters; future studies should evaluate the tool’s reliability between multiple raters. Lastly, it is unknown if 2D qualitative or quantitative tools can predict those at risk for ACL injury. Future studies should aim to determine the sensitivity and specificity of the assessment tools in identifying athletes who are at high risk for an ACL injury. The concurrent validity of 2D qualitative and quantitative tools with 3D motion capture is also unknown and should be studied further.


The results of this study suggest that qualitative 2D assessment is comparable in reliability to more complex quantitative 2D analysis when evaluating trunk and LE alignment during a 45-degree sidestep cut. These findings highlight the potential for more efficient and feasible screening methods to identify high-risk movement patterns during cutting tasks. Additional work is needed to determine the concurrent validity of both the qualitative assessment (E-CAST) and the adapted quantitative checklist.

Funding Source

No funding.

Ethics Approval

This study was approved by the Western International Review Board for human subjects’ research and by the University of Texas Southwestern Institutional Review Board for human subjects’ research.

Conflicts of Interest

The authors report no conflicts of interest.