INTRODUCTION
The International Knee Documentation Committee (IKDC) Subjective Knee Form (SKF) is a 19-item joint-specific patient-reported outcome measure (PROM) used in orthopedics and sports medicine, with applications in clinical practice and research.1 The IKDC-SKF is intended to be used across a variety of knee pathologies (e.g., ACL injury,2 meniscal injury,3 cartilage damage,4 patellofemoral pain syndrome5) and has been translated into many languages (e.g., Chinese,6 Arabic,7 German8). Initial assessment of scale properties indicated the English version of the IKDC-SKF had adequate test-retest reliability (ICC range = 0.85-0.99)9–11 but large ceiling effects,12 while translated versions have been reported to have good responsiveness (i.e., change can be detected) without floor or ceiling effects.8,10,11 However, other analysis procedures necessary to establish the measurement properties of the IKDC-SKF for use in clinical practice and research are either lacking (e.g., multi-group invariance testing) or have identified potential concerns with the scale.
For example, internal consistency, a measure of how similar the items are in a unidimensional scale or unique subscale, should be calculated for each construct included in a scale.13–16 High alpha levels (i.e., ≥ 0.90) have been interpreted as evidence of strong internal consistency; however, these high values may be more likely to indicate item redundancy, inclusion of too many items or parallel items, construct underrepresentation, or reduced construct precision.14,15,17 Low values (i.e., < 0.70), in contrast, indicate poor internal consistency within a scale or construct. While alpha values ≥ 0.70 and < 0.90 are often considered acceptable,13,14 a range of ≥ 0.80 and < 0.90 has been recommended for scale development.15,17 Reported Cronbach’s alpha values for the IKDC-SKF have ranged from 0.77 to 0.97 across different versions of the scale.8–10,18 The reported Cronbach’s alpha values outside the recommended range, particularly those well above 0.90, along with those calculated for the entire scale, raise concerns regarding the item design, internal consistency, and dimensionality of the IKDC-SKF. The results suggest further scale modification is needed to reduce redundancy, construct underrepresentation,15,17 and response burden,19 while also improving model fit.20,21
Psychometrically sound reflective scales should also have consistent structural validity, which is often established through exploratory factor analysis (EFA)22–24 and confirmatory factor analysis (CFA) procedures20,25,26 or Rasch analysis.26 When factor analysis is used, initial procedures should follow recommended extraction techniques,27 and factor identification should follow recommended procedures, as under factoring or over factoring issues occur in scale development.28 While principal component analysis (PCA) can be used initially to reduce the number of items and factors,23,29 a common factor approach is preferred,30 and PCA should not be used as a substitute for EFA and CFA to determine the underlying latent factors.20,22,23,29 Further, once EFA procedures have been used to identify latent factors, CFA procedures in new samples to confirm the factor structure are recommended.20,22,23,29 An important step in this process is the identification of latent factors, which is often performed using multiple criteria, such as eigenvalues (e.g., Kaiser-Guttman criterion of values greater than 1.0), scree plots, percent variance explained, or minimum-average partial correlation.22–24 Parallel analysis is another approach that has been recommended because it performs well across PCA or EFA procedures for correctly identifying factors.24,30,31 As it relates to the IKDC-SKF, researchers have used PCA, EFA, and Bayesian CFA procedures to establish scale structure; however, multiple factor solutions have been found1,32,33 and best practice recommendations have not always been followed. 
For example, PCA has resulted in three-component solutions, with researchers supporting a one-factor solution,32 despite recommendations against using PCA to identify latent factors.20,22,23,29 Others have used EFA and Bayesian CFA methods, which resulted in two-factor solutions,1,33 without implementing many recommendations (e.g., parallel analysis) in the identification of latent factors.20,22,23,29 Thus, the factor structures (e.g., number of factors) have not been consistent across studies,1,32,33 and the solutions have varied in the number of items to include in the final scale (e.g., 15 items across two factors,33 all 18 items across the PCA solution).32
Short form versions have also been identified from EFA, CFA, and Rasch analysis procedures. The first short form (i.e., 15 items) solution, however, was not identified with EFA procedures using the most contemporary methods for item retention and factor identification (e.g., parallel analysis),24,30,31 and CFA results have indicated further modification of the scale is warranted to identify a sound short form version for use in clinical practice and research.8 The need for further item removal and the identification of a parsimonious short form was also supported with Rasch analysis; however, final model solutions differed, with one retaining 5 items12 and the other retaining 8 items.34 Other concerns with these studies12,34 are the use of small samples (i.e., 7734 and 160,12 respectively) and respondent pools (e.g., healthy respondents) who are not representative of the patient population with which the scale is used in clinical practice and research. Thus, further research is needed to make clearer recommendations on a parsimonious IKDC-SKF short form that can be used in practice and research.
Finally, multi-group and longitudinal measurement invariance and hypothesis testing assessment results, which help ensure scale suitability for use in research and clinical practice,20,25,26,35 have not been reported for the IKDC-SKF. Multi-group invariance testing should be conducted to ensure factorial stability exists across different populations, which establishes that measurement properties are equivalent across various subgroups (e.g., sex, age, injury type). Establishing multi-group invariance of PROMs allows clinicians and researchers to answer substantive questions regarding group differences.20,26,36 Longitudinal invariance testing is valuable for PROMs because it helps establish whether the underlying constructs are adequately measured across repeated testing, which allows clinicians or researchers to interpret score changes as true change.20,25,36 Establishing multi-group and longitudinal invariance then allows for hypothesis testing by determining whether the scale can be used to measure differences between relevant groups or across time.20,25,26
Thus, further psychometric assessment of the IKDC-SKF is warranted given the reported inconsistencies, concerns with scale measurement properties, lack of invariance analysis results, and inconsistent findings on a short form version. Performing EFA and CFA procedures in large, diverse, and separate samples is valuable for determining and then confirming or refuting the structural validity of the IKDC-SKF or an identified short form version. These procedures will allow for identification of a parsimonious and psychometrically sound scale when following contemporary factor analysis procedure recommendations. Additionally, assessing the internal consistency of the identified factors (i.e., one, two, or three factors) is warranted to confirm measurement precision without item redundancy. Finally, conducting multi-group and longitudinal invariance testing will provide insight into whether the scale can be used to measure group differences and change over time. Establishing these scale properties provides clinicians and researchers with a psychometrically sound scale to track patient progress or compare groups. Therefore, the purpose of the study was to assess the psychometric properties of the IKDC-SKF in a large, heterogenous sample. This included four separate aims: 1) to conduct EFA following best practice recommendations to identify a sound latent structure, which may include alternate forms (i.e., short forms), of the IKDC-SKF in a large, heterogenous sample; 2) to assess the internal consistency of any identified constructs; 3) to use CFA procedures to confirm the structural validity of the identified scale structure in a separate sample; and 4) to perform relevant multi-group and longitudinal invariance procedures on the identified scale to inform practitioners and researchers on scale use for assessing group differences and change over time.
METHODS
A sample of patient data obtained from the Surgical Outcome System (SOS, Arthrex, Naples, Florida) was used for the study. Patients provided informed consent prior to using the SOS and were emailed PROMs at predetermined intervals. Institutional Review Board (IRB) approval for the project was granted by the Cedars-Sinai Office of Research Compliance and Quality Improvement as part of a larger research project using SOS data. University IRB approval was not required because analysis of the deidentified data set was not considered human subject research.
For this study, patients who were classified in an arthroscopic knee surgery group and who had completed the IKDC-SKF at baseline (i.e., pre-arthroscopic knee surgery) were included in the study. For longitudinal invariance, only patients who completed the IKDC-SKF at four time points (i.e., baseline [pre-arthroscopic knee surgery], three months post-surgery, six months post-surgery, and 12-months post-surgery) were included in the analysis.
Instrumentation
International Knee Documentation Committee – Subjective Knee Form
The IKDC-SKF is a 19-item knee joint specific PROM.37 The IKDC-SKF includes one dichotomous item, four 11-point Likert scale items, and fourteen 5-point Likert scale items. Eighteen of the items are summed into one score which ranges from 0 to 100 (item #19 is not included in the score).37 A higher score represents less dysfunction, less pain, and greater knee function.37
Data Analysis
Data from a total of 1,959 individuals who completed the IKDC-SKF prior to knee arthroscopy were exported from the SOS database into the Statistical Package for Social Sciences (SPSS v. 25.0, Chicago, IL) and Analysis of Moment Structures (AMOS v. 25.0, Chicago, IL) for analysis. Cases with an absolute z-score equal to or greater than 3.3 were classified as univariate outliers and were subsequently removed. The dataset was also assessed for multivariate outliers using Mahalanobis distance; cases with a p < 0.001 according to the Chi-square test were removed from the data set. Respondent data were not excluded if demographic information was missing because the primary study purpose was to assess the IKDC-SKF. Finally, histograms and descriptive statistics (i.e., skewness and kurtosis values) were used to assess the normality of the data. Following data cleaning, the data set was randomly split into two equal samples (n1 and n2).
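The univariate screening and random split described above can be illustrated with a minimal sketch. It is not the authors' SPSS procedure: the function names, the seed, and the example scores are hypothetical, and the Mahalanobis distance step is omitted.

```python
import random
import statistics


def flag_univariate_outliers(scores, z_cutoff=3.3):
    """Return indices of cases whose absolute z-score meets or exceeds the cutoff."""
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    return [i for i, x in enumerate(scores) if abs((x - mean) / sd) >= z_cutoff]


def split_in_half(case_ids, seed=0):
    """Shuffle case IDs reproducibly and return two halves (equal when the count is even)."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical total scores: 30 typical cases plus one extreme case.
scores = [50.0 + (i % 5) for i in range(30)] + [150.0]
outliers = flag_univariate_outliers(scores)
kept = [i for i in range(len(scores)) if i not in outliers]
n1, n2 = split_in_half(kept, seed=42)
print(outliers, len(n1), len(n2))
```

With an even number of retained cases, the two halves are the same size, which mirrors the equal n1 and n2 samples used for the EFA and CFA.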
Exploratory Factor Analysis
An EFA with maximum likelihood extraction and direct oblimin rotation was conducted on sample n1 to identify a parsimonious scale. Bartlett’s test of sphericity (p < 0.001) and Kaiser-Meyer-Olkin values (≥ 0.80) were assessed, with values outside of the specified ranges constituting a violation of the test.38 Items were assessed individually and removed one at a time until a parsimonious solution was identified.22,23 Item removal was guided by theoretical (e.g., item content), design-related (e.g., item structure),19 and statistical (e.g., low factor loadings ≤ 0.40, high cross-loadings ≥ 0.30, high bivariate correlations with another item, poor contribution to internal consistency) criteria.15,23,25,38 Factor retention was guided by eigenvalues ≥ 1.0, scree plot examination, and factors that explained ≥ 5.0% of the variance.25,28,38,39 Parallel analysis was used to confirm or refute factor retention; eigenvalues of the original data set were compared to those of a randomly ordered data set to inform final factor retention.40 Cronbach’s alpha was also calculated for each factor retained. Items were considered for removal if the alpha value was ≥ 0.90; item removal was guided by statistical guidelines (i.e., which item was most redundant), theory (e.g., item content), and item design. The final EFA solution resulted in a parsimonious IKDC short form to be confirmed with CFA.
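For readers unfamiliar with the internal consistency calculation, the sketch below computes Cronbach's alpha with the standard formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The item responses are hypothetical and the helper names are illustrative; the study itself used SPSS.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, each a list of respondent scores."""
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [statistics.variance(item) for item in item_scores]
    totals = [sum(resp) for resp in zip(*item_scores)]  # per-respondent sum score
    total_variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def in_development_range(alpha, low=0.80, high=0.90):
    """True when alpha falls in the range recommended for scale development."""
    return low <= alpha < high


# Hypothetical responses from five respondents on a three-item factor.
items = [
    [1, 2, 3, 4, 5],
    [2, 1, 4, 3, 5],
    [1, 3, 2, 5, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3), in_development_range(alpha))
```

A value at or above 0.90 would, under the criteria above, prompt a review of the factor for redundant items rather than being read as a strength.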
Confirmatory Factor Analysis of Proposed IKDC-SKF Short-Form*
Sample n2 was used to conduct a CFA of the proposed IKDC-SKF short form to confirm model structure using maximum likelihood estimation in AMOS. Model fit indices used for evaluation included the Comparative Fit Index (CFI) ≥ 0.95, Tucker-Lewis Index (TLI) ≥ 0.95, and root mean square error of approximation (RMSEA) ≤ 0.06. Models with fit indices values outside of the specified ranges indicated poor model fit and were interpreted as not supporting the proposed factor structure of the IKDC-SKF.20,41 CFA procedures also included assessing localized areas of strain and the interpretability, size, and statistical significance of the model’s parameter estimates (i.e., factor variances, covariances, and indicator errors).25 If indicated, additional items were removed, and the CFA procedures were repeated with the new model.
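The fit indices named above can be derived from the model and baseline (independence) chi-square values. The sketch below uses the commonly cited textbook formulas and hypothetical chi-square values; it is not AMOS output, and some software uses N rather than N - 1 in the RMSEA denominator.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation for a single-group model."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative Fit Index relative to the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    denom = max(d_model, d_base)
    return 1.0 if denom == 0 else 1.0 - d_model / denom


def tli(chi2_model, df_model, chi2_base, df_base):
    """Tucker-Lewis Index relative to the baseline (independence) model."""
    ratio_base = chi2_base / df_base
    ratio_model = chi2_model / df_model
    return (ratio_base - ratio_model) / (ratio_base - 1.0)


def meets_cutoffs(cfi_value, tli_value, rmsea_value):
    """Apply the cutoffs stated in the text: CFI and TLI >= 0.95, RMSEA <= 0.06."""
    return cfi_value >= 0.95 and tli_value >= 0.95 and rmsea_value <= 0.06


# Hypothetical chi-square values for a tested model and its baseline model.
fit_cfi = cfi(60.0, 24, 3000.0, 36)
fit_tli = tli(60.0, 24, 3000.0, 36)
fit_rmsea = rmsea(60.0, 24, 952)
print(round(fit_cfi, 3), round(fit_tli, 3), round(fit_rmsea, 3))
print(meets_cutoffs(fit_cfi, fit_tli, fit_rmsea))
```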
Multi-group Invariance Testing
Multi-group invariance testing between participant sex and age groups was conducted on the full sample (i.e., samples n1 and n2 combined) in three stages: 1) structural invariance to assess equivalent factor structure between subgroups; 2) metric invariance to assess equal factor loadings between subgroups; and 3) scalar invariance to confirm equal loadings and intercepts between subgroups. Each model was more restricted than the previous model20 and each step was used to assess whether the items were being interpreted equally across selected subgroups (i.e., sex, age group). These steps ensure the meanings of the common factors are consistent across groups and that mean scores are not contaminated by outside factors (e.g., group specific attributes), which then allows for substantive questions to be answered to support hypothesis testing (e.g., comparison of subgroup means).20 If the metric model held, subgroups could be tested for equal variances on the latent constructs, and if the scalar model held, subgroups could be tested for equal latent means. For the purposes of multi-group analysis by age, participants were split into groups defined as youth (<18 years old), emerging adult (18-25 years old), early adulthood (26-40 years old), middle age (41-65 years old), and older adult (>65 years old)42; however, the older adult group was not analyzed because of its small sample size (n = 30). The χ2diff and CFIdiff tests were both used to assess invariance, and the scale was considered invariant at each stage if the CFIdiff was ≤ 0.01 as compared to the configural model and the fit indices previously described were met. If the model was not found to be invariant at a given step, item loadings (i.e., metric model) or item intercepts (i.e., scalar model) were released one by one, and the model was retested. Once a problematic item was identified (i.e., the one that improved CFI to be closest to the CFI of the configural model), it was removed, and the model was re-run. 
For the substantive questions, if the CFIdiff was > 0.01 compared to the configural model, it was deemed that the subgroups were not equal on the tested statistic (e.g., latent means). In these cases, another model was run in which one group served as the comparison group to determine relative latent variances or means for the other subgroups (i.e., greater than, less than, or equal to the comparison group). The χ2diff test was not weighted as heavily in the invariance process because of the effect sample size has on this statistic.20,21
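The CFIdiff decision rule used in this section can be sketched as follows. The CFI values are hypothetical, the function names are illustrative, and the sketch covers only the CFI difference against the configural model; the additional requirement that each model meet the fit index cutoffs is not shown.

```python
def cfi_diff_passes(cfi_configural, cfi_constrained, tolerance=0.01):
    """True when the CFI drop from the configural model is within tolerance."""
    # Rounding guards against floating-point noise at the boundary.
    return round(cfi_configural - cfi_constrained, 6) <= tolerance


def highest_invariance_level(cfi_by_model):
    """Walk configural -> metric -> scalar and stop at the first failure."""
    reached = "configural"
    for level in ("metric", "scalar"):
        if not cfi_diff_passes(cfi_by_model["configural"], cfi_by_model[level]):
            break
        reached = level
    return reached


# Hypothetical CFI values for two sets of nested models.
partial = {"configural": 0.990, "metric": 0.988, "scalar": 0.975}
full = {"configural": 0.990, "metric": 0.988, "scalar": 0.984}
print(highest_invariance_level(partial))
print(highest_invariance_level(full))
```

In the first hypothetical case only the metric model holds, so equal latent variances could be tested but equal latent means could not; in the second, the scalar model holds and both substantive tests would be permitted.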
Longitudinal Invariance Testing
Longitudinal invariance testing was evaluated using the same procedures outlined in the multi-group invariance section to confirm similar interpretation of items and common factors across time points. If all models held (i.e., all fit indices cut-off values were met), it indicated that substantive properties (e.g., change over time) could be evaluated, allowing for clinician assessment of patient scores over time (e.g., did scores change from baseline to 12-months post-arthroscopy). The same procedures were used as described in multi-group invariance testing to identify any problematic items and create a more parsimonious scale, when indicated.
Correlation Analyses
Bivariate correlation analysis was conducted using scores from the 18-item IKDC-SKF and scores from any generated IKDC-SKF short forms. The preferred correlation was set at r ≥ 0.90, which corresponds to at least 81% of variance explained (R2 = 0.81).43,44
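The relationship between the correlation threshold and the variance explained can be shown with a short sketch. The paired scores below are hypothetical stand-ins for full-scale and short-form scores, and the function name is illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical paired scores (full scale vs. short form) for five respondents.
full_scale = [1.0, 2.0, 3.0, 4.0, 5.0]
short_form = [2.0, 4.0, 5.0, 4.0, 5.0]

r = pearson_r(full_scale, short_form)
r_squared = r ** 2  # proportion of variance shared by the two sets of scores
meets_threshold = r >= 0.90
print(round(r, 3), round(r_squared, 3), meets_threshold)
```

Squaring the threshold itself gives 0.90 × 0.90 = 0.81, which is the R2 value quoted above.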
RESULTS
A total of 55 cases were removed during the data cleaning process (i.e., identified outliers) leaving 1,904 cases for analysis; the 1,904 total cases were then randomly split into two even data sets (i.e., 952 cases in n1 and n2). For the full sample, participants were 32.06 ± 14.16 years of age (range: 11-80 years) and included 874 males and 802 females. For sample n1, participants were 32.42 ± 14.38 years old (range: 11-74 years) and included 441 males and 388 females. For sample n2, participants were an average of 31.69 ± 13.93 years old (range: 12-80 years) and included 433 males and 414 females.
Exploratory Factor Analysis
The initial exploratory factor analysis (EFA) using all 18 items resulted in a four-factor solution with items that had low loadings and high cross-loadings. Parallel analysis indicated a three-factor solution was sufficient when all 18 items were used. Items were removed during the EFA procedures one at a time and the solution was respecified until an acceptable solution was identified; a total of nine items were removed, resulting in a 9-item, 3-factor solution, with three items in each factor (Athletic Activities, Activity Level, Activities of Daily Living [ADLs]; Table 1). The solution accounted for 75.74% of the variance and Cronbach’s alpha values fell within the suggested range for each subscale (ADLs: 0.76; Activity Level: 0.84; Athletic Activities: 0.88; Table 1) with item loadings ranging from 0.65 to 0.93. While certain criteria (e.g., scree plot, percent variance explained) supported the 3-factor structure solution, parallel analysis with the nine items supported a two-factor structure. The 3-factor, 9-item scale was retained for CFA as further modification could be conducted during those analysis procedures to support or refute factor structure.
Confirmatory Factor Analysis 9-item IKDC-SKF Short Form
The CFA of the 3-factor, 9-item IKDC-SKF short form met all model fit criteria (CFI = 0.983; TLI = 0.975; IFI = 0.983; RMSEA = 0.057; chi-square = 97.667; p < .001; Figure 1) and had factor loadings ranging from 0.55 to 0.87. Construct correlations ranged from 0.64 to 0.75, with the highest correlation between Athletic Activities and ADLs (56.25% shared variance). Modification indices indicated significant cross-loadings and potential model misspecification were present.
Although model fit criteria were met, inspection of the model (e.g., item design, latent variable correlations, modification indices) and consideration of the parallel analysis findings led to further refinement and the identification of a 2-factor, 6-item modified IKDC-SKF short form. The 2-factor (Activity Level and ADLs), 6-item model, which was supported by parallel analysis, demonstrated excellent model fit (CFI = 1.0; TLI = 0.999; IFI = 1.0; RMSEA = 0.11; chi-square = 8.943; p = 0.347; Figure 2) and addressed concerns (e.g., high latent variable correlations, cross-loadings) identified in the 9-item IKDC-SKF short form. Invariance testing (multigroup and longitudinal) was conducted on both the 9-item and 6-item IKDC-SKF short forms to provide further insight on both proposed factor structures.
Multigroup Invariance Testing
Multigroup invariance testing across sex and age groups was conducted using participant responses to the IKDC-SKF at baseline (i.e., pre-arthroscopy).
Sex
IKDC-SKF 9-item Short Form
A total of 1,676 individuals (males = 874; females = 802) reported sex and were used for analysis. Both individual models (i.e., males, females) met all fit indices criteria (Table 2). The configural model fit indices also met all recommended values (CFI = 0.990; RMSEA = 0.031; Table 2). The metric and scalar models passed the CFIdiff test, warranting examination of an equal variances and equal means model. The equal variance model passed the CFIdiff test, indicating variances were equal across groups. The equal means model also passed the CFIdiff test, indicating the means were equal for all latent variables across males and females.
IKDC-SKF 6-item Short Form
A total of 1,676 individuals (males = 874; females = 802) reported sex and were used for analysis. Both individual models (i.e., males, females) met all fit indices criteria (Table 3). The configural model also met all recommended model fit values (CFI = 0.990; RMSEA = 0.031; Table 3). The metric and scalar models passed the CFIdiff test, warranting examination of an equal variances and equal means model. The equal variance model passed the CFIdiff test, indicating variances were equal across groups. The equal means model also passed the CFIdiff test, indicating the means were equal for all latent variables across males and females.
Age Group
IKDC-SKF 9-item Short Form
A total of 1,762 individuals (youth = 321; emerging adults = 416; early adulthood = 558; middle age = 467) who reported an age (range: 11-65 years) were used for analysis. Baseline models (i.e., youth, emerging adults, early adults, middle age) met all fit indices (Table 4). The configural model fit indices met all recommended values (CFI = 0.993; RMSEA = 0.019; Table 4). The metric and scalar models passed the CFIdiff test, warranting examination of an equal variances and equal means model. The equal variance model passed the CFIdiff test, indicating variances were equal across groups. The equal means model did not pass the CFIdiff test, indicating the means were not equal for all latent variables between age groups. When means were not constrained, the middle age group had significantly lower means than all groups (i.e., more dysfunction, more pain, and less knee ability) across all three latent variables (i.e., ADLs, Activity Level, and Athletic Activities). Additionally, the early adulthood group had a significantly lower mean (i.e., more dysfunction, more pain, and less knee ability) than the youth and emerging adult groups for the ADL latent variable. Statistically significant mean differences were not found for any latent constructs between the youth and emerging adult groups.
IKDC-SKF 6-item Short Form
A total of 1,762 individuals (youth = 321; emerging adults = 416; early adulthood = 558; middle age = 467) who reported an age (range = 11-65 years) were used for analysis. Baseline models (i.e., youth, emerging adults, early adults, middle age) met all fit indices (Table 5). The configural model fit indices met all recommended values (CFI = 0.993; RMSEA = 0.019; Table 5). The metric and scalar models passed the CFIdiff test, warranting examination of an equal variances and equal means model. The equal variance model passed the CFIdiff test, indicating variances were equal across groups. The equal means model did not pass the CFIdiff test, indicating the means were not equal for all latent variables between age groups. When means were not constrained, the middle age group had significantly lower means (i.e., more dysfunction, more pain, and less knee ability) than all groups across both latent variables (i.e., ADLs and Activity Level). Additionally, the early adulthood group had a significantly lower mean (i.e., more dysfunction, more pain, and less knee ability) than the youth and emerging adult groups for the ADL latent variable. There were no significant mean differences for any latent constructs between the youth and emerging adult groups.
Longitudinal Invariance Testing
A total of 792 individuals completed the IKDC-SKF at all four time points and were retained for longitudinal invariance. The average age of participants in this subsample was 33.02 ± 15.00 years (range: 11-78 years; 354 females; 353 males).
IKDC-SKF 9-item Short Form
All baseline models (i.e., baseline, 3-months post-surgery, 6-months post-surgery, 12-months post-surgery) met all fit indices (Table 6). The configural model fit indices met all recommended values (CFI = 0.993; RMSEA = 0.019; Table 6). The metric model passed the CFIdiff test, warranting examination of an equal variances model. The equal variance model did not pass the CFIdiff test, indicating variances were not equal across time points for latent variables. The scalar model, however, did not pass the CFIdiff test, indicating potential item-level bias which did not support testing of the equal latent means model. Follow-up analysis indicated slight item bias for item #15 (i.e., “How does your knee affect your ability to run straight ahead?”).
Due to the item bias findings, invariance testing was conducted on an 8-item IKDC-SKF short form (i.e., the remaining items from the 9-item scale after item #15 was removed). All baseline models (i.e., baseline, 3-months post-surgery, 6-months post-surgery, 12-months post-surgery) met model fit indices (Table 7). The configural model fit indices met all recommended values (CFI = 0.997; RMSEA = 0.014; Table 7). The metric and scalar models passed the CFIdiff test, warranting examination of an equal variances and equal means model. The equal variance model did not pass the CFIdiff test, indicating variances were not equal across time points for latent variables. The equal means model also did not pass the CFIdiff test, indicating means were not equal across time. When not constrained to be equal, Activity Level, ADLs, and Athletic Activities latent means at 3-, 6-, and 12-months post-surgery were significantly higher than baseline (i.e., pre-arthroscopy) scores (i.e., less dysfunction, less pain, and higher knee ability), except for Activity Level latent means at three months. Scores increased/improved across time, except for Activity Level at three months, indicating patients reported scores with improved function, pain, and knee ability after surgery.
IKDC-SKF 6-item Short Form
All baseline models (i.e., baseline, three months post-surgery, six months post-surgery, 12-months post-surgery) met all fit indices (Table 8). The configural model fit indices met all recommended values (CFI = 0.998; RMSEA = 0.012). The metric model and scalar model passed the CFIdiff test, warranting examination of an equal variances and equal means model. The equal variance model did not pass the CFIdiff test, indicating variances were not equal across time points. The equal means model also did not pass the CFIdiff test, indicating means were significantly different across time points. When not constrained, Activity Level and ADL latent means at three-, six-, and 12-months post-surgery were significantly higher than baseline (i.e., pre-arthroscopy) scores (i.e., less dysfunction, less pain, and higher knee ability), except for Activity Level at six months post-surgery. Scores increased/improved across time, except for Activity Level at three months, indicating patients reported scores with improved function, pain, and knee ability after surgery.
Correlation Analysis
Individual scores for the IKDC-SKF 9-item short form were highly correlated (r = 0.924, R2 = 0.854) with the scores for the original 18-item IKDC-SKF. Individual scores for the IKDC-SKF 6-item short form were highly correlated (r = 0.889, R2 = 0.790) with the scores for the original 18-item IKDC-SKF. Scores for the IKDC-SKF 9-item short form were also highly correlated (r = 0.940, R2 = 0.884) with the scores for the 6-item IKDC-SKF short form. Finally, scores on the modified 8-item (3-dimension) IKDC-SKF short form were highly correlated with scores on the original 18-item IKDC-SKF (r = 0.919, R2 = 0.845), the 9-item IKDC-SKF short form (r = 0.992, R2 = 0.984), and the 6-item IKDC-SKF short form (r = 0.962, R2 = 0.925).
DISCUSSION
Best practice recommendations for assessing structural validity have not always been followed or reported in measurement studies of the IKDC-SKF,20,22–31 which may explain the inconsistent structural findings reported.1,32,33,37 Further, multiple short form versions of the IKDC-SKF have been suggested in the literature, but initial efforts have primarily used small samples that do not represent well the patient population that completes the IKDC-SKF.12,34 Therefore, assessment of the IKDC-SKF using recommended classical test theory procedures was warranted, and the purpose of our study was to conduct EFA, CFA, and invariance testing procedures on the IKDC-SKF in a large, heterogenous sample of patients to assess the measurement properties of the scale or an alternate, psychometrically sound short form version of the scale. EFA resulted in a 9-item, 3-factor IKDC-SKF short form (IKDC-SKF-9; Appendix 1) supported by CFA and multi-group invariance testing; however, the proposed model did not meet all recommended measurement criteria and did not pass longitudinal invariance requirements. Due to potential concerns with the identified 9-item version, subsequent 8-item (3-factor; IKDC-SKF-8; Appendix 2) and 6-item (2-factor; IKDC-SKF-6; Appendix 3) IKDC-SKF short forms were also tested with CFA and invariance procedures.
Factor Structure
The IKDC-SKF has been reported to have a unidimensional32,37 and a multidimensional1,8,33 factor structure with different items in the final models. Study methodology differences may contribute to the inconsistent findings as differences in samples (e.g., size, respondent population [e.g., healthy,12 ACL injury,1,34]) and analysis methodologies (e.g., EFA/CFA,33 PCA,32 Bayesian SEM,1 Rasch,12,34 factor and item retention criteria) exist between studies. For example, our study included a large, heterogenous sample of patients who had undergone arthroscopic knee surgeries, while others have included smaller samples, healthy respondents,12 or focused on different patient populations (e.g., ACL reconstruction,1,34 meniscal lesions,45 various patient pathologies32,33,37). Research1,8,33 using more contemporary and recommended factor analysis measurement techniques has generally supported a multidimensional structure; however, evidence exists to suggest structural validity and model fit could be improved with further item reduction.8,12,34
Three-dimensional and two-dimensional factor structures that exceeded most recommended contemporary fit criteria were identified.20,41 The retained solutions included fewer items than those found in prior research on the IKDC-SKF.1,8,33 Removing items with poor fit (e.g., cross-loadings, item redundancy) or poor design (e.g., item structure, item reading level) can improve internal consistency and scale structure, while also reducing response burden with a more concise instrument.20,38,43 The proposed short form versions improved model precision and scale structure without losing much of the information captured with all 18 items. Despite using nine or fewer items, scores on the short form versions accounted for 79% (r = 0.889), 84% (r = 0.919), and 85% (r = 0.924) of the variance in scores on the 18-item original IKDC-SKF for the 6-, 8-, and 9-item versions, respectively. Our correlational findings are in line with prior research using similar procedures to produce short-form versions of previously established PROMs43,44 and indicate the newly proposed models capture similar enough information to warrant use in comparison to the original scale. One concern, however, was the identification of an internal consistency value (0.76) for the ADLs construct that is outside of the preferred 0.80 to 0.90 range for scale development.15,17 Thus, the ADLs construct may not have the preferred precision for assessing the construct; however, scale design recommendations caution against using constructs with fewer than three items.20,21,46 Future work may be recommended to alter or add items to measure the ADLs construct more precisely. Until that time, researchers and clinicians should be aware that this construct does not meet the strictest contemporary recommendations for internal consistency.
Multi-group Invariance Testing
This study is the first to perform multi-group invariance testing with the IKDC-SKF. Multi-group invariance testing helps to ensure the association between the items and dimensions are stable between groups, which supports scale validity and allows for an instrument to be used to assess group differences (e.g., group mean differences in older individuals compared to younger individuals would be outside scale measurement error).20,21,25 Both the 9-item and 6-item IKDC-SKF short form versions in this study were found to be invariant across configural, metric, and scalar models for sex and age groups, indicating the short form models have sound measurement properties across the tested groups. Thus, researchers and clinicians could use these versions of the scale to assess differences among these groups. The findings also allow for substantive testing of whether variances or means are equal between groups, which can also support scale validity.20,21
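The stepwise logic of invariance testing (configural, then metric, then scalar) can be sketched as a simple decision rule. The Python example below is a minimal illustration only: it assumes a change-in-CFI criterion with a cutoff of 0.01 and a minimum configural CFI of 0.95, which are commonly cited conventions and not necessarily the criteria applied in this study, and the fit values passed to it are hypothetical rather than results from our data.

```python
from typing import Dict, List

# Order in which increasingly constrained models are compared.
LEVELS: List[str] = ["configural", "metric", "scalar"]


def highest_invariance_level(
    cfi: Dict[str, float],
    min_cfi: float = 0.95,   # assumed cutoff for acceptable configural fit
    max_drop: float = 0.01,  # assumed tolerable CFI decrease between steps
) -> str:
    """Return the most constrained model that is still supported.

    The configural model must fit acceptably on its own; each later
    model is supported only if CFI does not drop by more than
    max_drop relative to the preceding model.
    """
    if cfi["configural"] < min_cfi:
        return "none"
    supported = "configural"
    for previous, current in zip(LEVELS, LEVELS[1:]):
        if cfi[previous] - cfi[current] > max_drop:
            break  # stop at the first constraint that degrades fit
        supported = current
    return supported


# Hypothetical fit values, for illustration only.
full = highest_invariance_level(
    {"configural": 0.975, "metric": 0.972, "scalar": 0.969}
)
partial = highest_invariance_level(
    {"configural": 0.975, "metric": 0.972, "scalar": 0.950}
)
print(full, partial)
```

Comparisons of latent means between groups are generally interpreted only when the scalar level is supported, which is why the scalar result matters for the group comparisons discussed here.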
The current multi-group findings could provide theoretical support for the validity of the two short form versions if the findings align with expectations based on the literature. For example, widespread pain, which often includes long-standing knee pain, is more common in patients over 50 years of age,47 and self-reported knee pain has been found to be higher in people over the age of 40 compared to those under the age of 40.48 Additionally, OA, a leading diagnosis and cause of disability in older populations,49,50 has a higher prevalence and more radiographic signs with increases in age and population longevity,51 with those over the age of 45 accounting for over 98% of total knee arthroplasties.52 Further, the presence of all types of knee abnormalities (e.g., osteophytes, cartilage damage, ligamentous damage, OA) has been found to increase with age,53 and it has been reported that 85% or more of patients 50 years of age or older demonstrate articular cartilage changes to at least one knee compartment compared to 32% of patients 20 to 29 years of age and 13% of patients 20 years or younger.54 Researchers have also indicated knee functional difficulties increase with age.55–58 Gradual functional deterioration was found across the lifespan on the Knee Injury and Osteoarthritis Outcome Score (KOOS) and reported functional impairment was more apparent with functional tasks of greater difficulty (e.g., sport and recreational functional activities) across adults aged 18-84.59 Similarly, Baldwin et al. reported group mean score differences across age groups (e.g., 18 to 29 years, 30 to 39 years, 40 to 49 years, 50 to 59 years), with consistent findings of less knee impairment on the pain, ADL, sport/recreation, and quality of life constructs of the KOOS for those under 40 years of age.48 Thus, if the proposed IKDC-SKF short forms are measuring the intended constructs, similar patterns would be expected in our multi-group invariance results (e.g., higher levels of impairment in the older age groups in our sample).
The proposed models were able to identify age group differences at initial examination: statistically significant group differences were found across age groups for the 6-item and 9-item solutions. Specifically, the middle age (41-65 years of age) group had lower means (i.e., more impaired knee health) than the three younger age groups (i.e., early adult, emerging adult, youth) at baseline (i.e., pre-arthroscopic surgery). Additionally, the early adulthood (26-40 years of age) group had a lower mean score for the ADL construct than the youth and emerging adult groups. The middle age group reporting greater impairment across all three factors (i.e., ADLs, Activity Level, Athletic Activity) than the three younger age groups aligns with expectations based on KOOS findings48,59 and expectations for functional impairment across the life span.48,55–59 Similarly, the early adult age group reporting greater impairment in the Activity Level factor but similar mean scores for ADLs with younger age groups also aligns with expectations based on the literature.48,55–59 The lack of a statistically significant difference for the Athletic Activity construct might be explained by the final items included in that construct; however, prior KOOS Sport/Recreation construct findings indicate meaningful age group differences were not found until after 40 years of age for items assessing this type of construct.48,59 Finally, statistically significant differences were not found between the emerging adult and youth groups, which aligns with prior KOOS findings48,59 and expectations for the presence of knee pathological changes being less likely to have occurred this early in the lifespan.51,53,54 A weakness of our results, however, is the low number of responses (n = 30) in the older adult (66 years of age or older) group, which prevented us from including this group in the multi-group analyses of the 6- and 9-item short form versions. It would be valuable to confirm that similar group differences are found in older or elderly populations. Similarly, it would be valuable to conduct this analysis across different pathology groups (e.g., total knee arthroplasty patients vs. arthroscopy patients) to ensure the scale has the necessary measurement properties to assess group differences based on pathology and to determine whether greater levels of diagnosed pathology result in greater reported knee health impairment on the proposed short forms.
Longitudinal Invariance Testing
To the authors' knowledge, this study is also the first to assess longitudinal invariance of the IKDC-SKF or proposed short forms. Longitudinal invariance testing is valuable because it allows for the determination of whether the items and dimensions are stable across time, which supports scale validity and allows for an instrument to be used to assess change over time.20,21,25 We found that the 6-item IKDC-SKF short form was invariant across time based on the configural, metric, and scalar model findings. The 9-item IKDC-SKF short form was not invariant across time, and further analysis revealed item #15 exhibited bias. Follow-up analysis indicated that the remaining items (i.e., the 9-item IKDC-SKF short form without item #15, which forms the 8-item IKDC-SKF short form) and factor structure were invariant across time. The findings allow for the assessment of score change over time on the 6- and 8-item short form versions to determine when and where patient-reported improvement occurred following arthroscopic surgery. Finding expected improvement over time would support scale validity, while also indicating whether patients perceived improvements in their condition across time.20,21,25
In this study, individuals reported the lowest scores (i.e., greatest impairment in knee health) at baseline and the highest scores (i.e., lowest knee impairment) at 12 months post-surgery. Across both the 6- and 8-item versions of the scale, the score improvements were statistically significant for all latent means (e.g., Activity Level, ADLs) at 3, 6, and 12 months post-surgery, except for Activity Level at 3 months post-surgery. The current findings indicate patients reported statistically significant improvements across all dimensions by 6 months post-surgery and the improvements were maintained at 12 months post-surgery. The longitudinal findings were consistent across both the 6- and 8-item short form versions of the IKDC-SKF. Thus, the two versions of the scale similarly identified patient-reported improvement in the measured latent constructs across time.
The current findings support scale validity as the results are consistent with what we would expect for individuals recovering from surgery. Specifically, patients in the rehabilitation process would be expected to report improvements across items intended to measure how the prior injury impaired the previously measured constructs (i.e., Activity Levels, ADLs, and Athletic Activities) because the patient should experience health status improvement (e.g., decreased pain, increased ROM, increased strength) after surgery through a combination of treatment effectiveness, natural healing, and placebo. We would also expect to find that improvements in certain constructs (e.g., Activity Level) might not occur as quickly as other constructs (e.g., ADLs) because patients may have activity/rehabilitation restrictions or more substantial pathology that may slow improvements in specific constructs; however, the authors would also expect to then see significant improvements in those dimensions at later time points that are in line with the improvements found across the other constructs over time.
Thus, the current findings support the use of the 6- and 8-item IKDC-SKF short-form versions: sound measurement properties were demonstrated and theoretical support (e.g., patient-perceived improvements match expectations for the recovery process of the included patients in our study) was found. These results also provide support for clinicians who want to use the short form versions of the IKDC-SKF (i.e., 6- and 8-item versions) to measure change across time. The 9-item short form could be used, with caution, to assess change across time; it did not meet the strictest criterion for longitudinal invariance due to one problematic item. However, it is also important to note limitations with the 3-dimensional solutions: 1) parallel analysis better supported a 2-dimensional factor structure once problematic items had been removed; and 2) the Athletic Activities factor in the 8-item IKDC-SKF short form only contained two items, whereas three to five items per factor have been recommended.20,46,60 Clinicians and researchers should consider the full summary of findings when deciding which version of the IKDC-SKF to use within their clinical practice or research; however, the 6-item IKDC-SKF likely has the greatest measurement support for its use across various research and clinical practice scenarios. Further scale development work is needed to develop items that accurately capture the desired information of the Athletic Activities factor and truly support a 3-dimensional IKDC-SKF factor structure.
Limitations and Future Research
While the current study included the use of contemporary analysis procedures on a large, diverse sample of patients, it does have limitations. First, the data set did not include information on the type of knee pathology or procedure performed. One of the preconditions for a viable IKDC-SKF instrument is that the model is stable over a variety of knee pathologies. Without this information, we were unable to conduct multi-group invariance tests by pathology or intervention type. Additionally, our sample included a small number of patients classified in the older age (66 years or older) group, which prevented their inclusion in the multi-group age analysis. Further, responses to all 18 items were used to produce the short-form versions; while the analysis processes used are common for instrument refinement, it is possible that respondents were influenced by items not included in the final models. Additional psychometric analyses could also be conducted; for example, the new models could be tested against a criterion standard scale to support validity, Rasch analysis (e.g., person differentiation) could be performed, and responsiveness (e.g., minimal clinically important difference [MCID] values) and test-retest reliability of the new models could be assessed. Finally, as this PROM was delivered via email, potential response biases could have affected results and it was not possible to examine whether completion mode (i.e., paper or electronic) influenced results.
Future analysis should include multi-group invariance testing in older populations, while also examining the multi-group invariance properties across pathology or intervention groups. Further, researchers should examine the structural validity of the scale in different respondent groups who only answer the short form versions of the scale, while also incorporating additional items to measure the Athletic Activities factor more effectively. In addition to confirming the measurement properties of the short form versions, these analyses could provide insight into whether the 6-item short form may have other psychometric concerns (e.g., ceiling effects) when used in certain populations (e.g., competitive athletes) that could be resolved by developing an effective 3-dimensional scale. Finally, future research should also work to establish the test-retest reliability, responsiveness (e.g., MCIDs), and criterion validity of the short form versions.
CONCLUSION
The EFA and CFA resulted in short form versions of the IKDC-SKF that exceed contemporary fit recommendations. The identified models present as plausible alternatives to the IKDC-SKF as the original item pool was reduced by more than 50%, but the short forms still accounted for most of the variance in participant responses on the IKDC-SKF. Further, the 6- and 8-item IKDC-SKF short forms met all criteria for applied multi-group and longitudinal invariance tests, which indicates the scales may be used to assess group differences or change across time. The overall analysis indicated the short form versions of the IKDC-SKF were structurally valid alternatives to the IKDC-SKF with improved measurement properties, reduced scale response burden, and evidence to support the assessment of patient improvement across time.
FINANCIAL DISCLOSURE
This publication was supported by an Institutional Development Award (IDeA) from the National Institute of General Medical Sciences of the National Institutes of Health under Grant #P20GM103408 and an Idaho WWAMI Research Training Support Award.
CONFLICT OF INTEREST
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.