INTRODUCTION
A concussion can provoke changes in pupil size and the pupillary light response (PLR). These subtle, yet significant, changes have led healthcare providers to measure neurological injury with a pupillometer.1–4 Pupillometers can provide insight regarding the location of neurological lesions and predict recovery trajectory following traumatic brain injury.5 The PLR is both a response and a visual reflex to the level of light sensed in the environment and serves as an accessible marker of the autonomic nervous system.6,7 The PLR provides a comprehensive means of assessing sympathetic and parasympathetic function.7 Specifically, the sympathetic pathway controls the eye muscles responsible for pupil dilation, while the parasympathetic pathway controls the eye muscles responsible for pupil constriction.8 The parasympathetic system causes pupil constriction when a light stimulus is applied, and the sympathetic system causes pupil dilation to the baseline state when the light stimulus is removed. Studying the static and dynamic properties of the PLR has emerged as an attractive field of interest given its ease of access, non-invasiveness, and insight into numerous neurological disorders and physiological states.6,9–11
Historically, healthcare providers have used penlights to assess pupil symmetry and PLR. Concerns related to the use of penlights are low inter-rater reliability, higher error rates in prognosis, and the reduced ability to monitor the recovery of the PLR.2,5,6 Automated pupillometer systems have been developed and shown to be more accurate and reliable for examining the PLR.2,12 These devices are superior to manual observation because of their ability to monitor intracranial pressure, provide a prognosis following concussion, and assess cognitive load.1,6,11,13–16 At a minimum, automated devices provide readouts on the static (e.g., minimum and maximum pupil diameter) and dynamic (pupil constriction and dilation velocity) parameters.15 More advanced devices, such as the NeurOptics Neurologic Pupil index (NPi)-200 and NPi-300, provide the NPi with a calculation that incorporates pupil size, constriction latency, constriction velocity, and dilation velocity. These devices are useful because they can compare scores to a normative database. More specifically, they provide a score range of 0-5 points, in which a score less than 3 points indicates abnormal pupil function.17 The NeurOptics PLR-3000 (NO3000) better characterizes the PLR response as it provides the time from peak pupil constriction size to 75% of its baseline size, commonly known as the T-75 recovery time parameter (T75). Users of earlier models like NPi-300 could not obtain this parameter and had to extrapolate the graphical data to calculate T75.10
The T75 represents the sympathetic drive behind the dilation phase and is influenced by the amplitude of the light reflex.18 A larger percent change from baseline to maximum constriction size results in more time needed for the pupil to dilate and return to baseline. Researchers have reported longer T75 times in children with mild concussions and athletes with sport-related concussions compared to controls.14,19 While the T75 can discriminate between concussed and healthy groups, its reliability has not been extensively examined.14,19
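To make the T75 parameter concrete, the following Python sketch applies its definition literally (time from peak constriction until the pupil again reaches 75% of its baseline diameter) to a sampled pupil trace. The function name, the input format, and the sample values are hypothetical illustrations; this is not the NO3000's internal algorithm, which is not described here.

```python
import numpy as np

def t75_seconds(time_s, diameter_mm, baseline_mm):
    """Illustrative T75: time from peak constriction to 75% of baseline size.

    Hypothetical helper for explanation only; not the device's algorithm.
    Returns None if the trace never recovers to 75% of baseline.
    """
    t = np.asarray(time_s, dtype=float)
    d = np.asarray(diameter_mm, dtype=float)

    i_min = int(np.argmin(d))          # sample at peak constriction
    target = 0.75 * baseline_mm        # diameter that counts as "recovered"

    # Samples at or after peak constriction that meet the target diameter.
    recovered = np.nonzero(d[i_min:] >= target)[0]
    if recovered.size == 0:
        return None                    # recording ended before recovery
    return float(t[i_min + recovered[0]] - t[i_min])

# Made-up trace: 6 mm baseline, constricting to 3 mm at t = 1.0 s.
time_s = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
diameter_mm = [6.0, 4.0, 3.0, 4.0, 4.5, 5.0]
print(t75_seconds(time_s, diameter_mm, baseline_mm=6.0))
```

Under this literal definition, a trace that ends before the pupil redilates yields no T75 value, which illustrates why the recording window and stimulus settings matter for this parameter.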
Establishing the psychometric properties of commonly used PLR systems is important to ensure they appropriately acquire meaningful information that aids in diagnosis, clinical prognosis, and research. In addition to concussions and traumatic brain injuries, pupillometers show promise in better understanding different neurological and chronic diseases such as Parkinson’s and Alzheimer’s.4,6,9 As these systems, particularly the hand-held automatic pupillometers, are increasingly integrated into common healthcare settings, verifying a device’s robustness is critical.4,9,17,20 Importantly, devices have inherent differences due to their design, which may introduce measurement variability.4 The NO3000 has established inter-trial reliability, but its inter-rater reliability has not been established.21 Therefore, the purpose of our study was to establish the inter-rater reliability and confirm the inter-trial reliability of the NO3000 pupillometer among healthy adults. The authors hypothesized that inter-rater and inter-trial reliability for all PLR measures would have intraclass correlation coefficients (ICC) greater than or equal to 0.70, which has been deemed acceptable reliability.22
METHODS
Participants
Before subject recruitment, we conducted an a priori power analysis. Based on a minimum acceptable ICC of 0.70 and expected reliability ICC of 0.86, a two-tailed significance of alpha=0.05, a power of 80%, and two raters, at least 39 subjects would be needed.22 Forty-eight healthy adults were recruited from Augusta University via word of mouth and email advertisement (25 males, age = 25.0 ± 4.7 y; 23 females, age = 25.3 ± 6.4 y). Eligible participants were between the ages of 18 and 40 years and did not have a history of known neurological injury (including stroke, traumatic brain injury, concussion), cognitive impairment, neurodegenerative disorders, migraine headache diagnosis, seizure disorder, blindness, dysautonomia/postural orthostatic tachycardia syndrome, or a history of eye surgery/amblyopia/strabismus or other congenital eye disorders that could alter pupil response before measurement. This age range of 18 to 40 years was chosen because the PLR differs between pediatric and adult cohorts.19,23 Also, pupil sizes tend to decrease after the fourth decade of life.19,23 Individuals who could not provide accurate measurements due to repeated blinking throughout data collection also were excluded. All subjects signed an institutionally approved informed consent form prior to participation.
Procedures
Procedures were developed in accordance with the Quality Appraisal of Diagnostic Reliability (QAREL) Checklist (Appendix A).24,25 Both the investigator and participant were seated in identical 18" tall chairs across from one another at a standardized table in an environment with fluorescent lighting. We asked subjects to focus their non-measured eye on a fixed point located 2 meters away from them to avoid accommodation of the eye being measured. The pupillometer (NeurOptics PLR-3000, Irvine, CA), which operates in a monocular manner, was then placed against the eye (Figure 1). We used settings that were identical to those described by Asakawa et al.21 Settings included a positive pulse stimulus, light stimulus pulse intensity of 10 µW, and background intensity of 0 µW. The measurement duration was 5.01 s, the pulse duration was 0.80 s, and the pulse onset was immediate (0 s) to stimulate the PLR. Subjects remained as still as possible and refrained from blinking during the 5-s measurement period. The investigator recorded all PLR measurements (initial pupil diameter [INITIAL], end pupil diameter [END], % change [DELTA], constriction latency [LATENCY], average constriction velocity [ACV], maximum constriction velocity [MCV], average dilation velocity [ADV], and T75). The subjects rested between 30 seconds and one minute before the investigator measured the other eye. The investigators took three trials for the right eye and three trials for the left eye. Subjects rested one to two minutes before a second investigator repeated the same measurements. Raters examined subjects and recorded values independently of each other. The average of the three trials for each eye was used to determine inter-rater reliability; individual trials were used to determine inter-trial reliability.
Statistical Analysis
All analyses were conducted using IBM SPSS Statistics for Windows, Version 28 (IBM Corp, Armonk, NY, USA) with the level of significance established at the 0.05 level. Means, standard deviations, and 95% confidence intervals were calculated for all dependent measures.
Separate independent t-tests were used to compare group differences between Rater 1 and Rater 2. Separate ICC [2,3] and standard error of measurement (SEM) were used to determine inter-rater reliability and measurement precision.26 The minimal detectable change (MDC) was also calculated for inter-rater reliability to determine each measure’s responsiveness.27 The MDC represents the minimal amount of change exceeding the SEM, and thus a change beyond measurement error.27 Separate ICC [3,1] and SEM were used to determine inter-trial reliability and measurement precision for Rater 1 and Rater 2. ICC values <0.50 were indicative of poor, between 0.50 and 0.75 were indicative of moderate, between 0.75 and 0.90 were indicative of good, and >0.90 were indicative of excellent reliability.28
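As an illustration of these reliability statistics, the following Python sketch computes ICC(2,k) and ICC(3,1) from two-way ANOVA mean squares using the Shrout and Fleiss formulas, then derives SEM and MDC. The analyses in this study were run in SPSS; the function names here are hypothetical, and the SEM and MDC formulas shown (SEM = SD × √(1 − ICC); MDC at 95% confidence = 1.96 × √2 × SEM) are the commonly used forms, assumed rather than taken from this report. The example matrix is the textbook data set from Shrout and Fleiss, not study data.

```python
import numpy as np

def icc_two_way(scores):
    """Return (ICC(2,k), ICC(3,1)) for an n_subjects x k_measurements array."""
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()

    # Two-way ANOVA sums of squares (subjects = rows, raters/trials = columns).
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    bms = ss_rows / (n - 1)                 # between-subjects mean square
    jms = ss_cols / (k - 1)                 # between-columns mean square
    ems = ss_err / ((n - 1) * (k - 1))      # residual mean square

    icc_2k = (bms - ems) / (bms + (jms - ems) / n)
    icc_31 = (bms - ems) / (bms + (k - 1) * ems)
    return icc_2k, icc_31

def sem(sd, icc):
    """Standard error of measurement (assumed form: SD * sqrt(1 - ICC))."""
    return sd * np.sqrt(1.0 - icc)

def mdc95(sem_value):
    """Minimal detectable change at 95% confidence (assumed form)."""
    return 1.96 * np.sqrt(2.0) * sem_value

def classify_icc(icc):
    """Label an ICC using the cut-offs stated in the text above."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"

# Textbook example data (6 subjects x 4 raters), not data from this study.
example = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
           [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
icc_2k, icc_31 = icc_two_way(example)
print(round(icc_2k, 2), round(icc_31, 2), classify_icc(icc_2k))
```

Because the MDC scales directly with the SEM, a measure with a low ICC relative to its spread requires a larger observed change before that change can be distinguished from measurement error.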
Bland-Altman plots were used to assess agreement between raters for each measure. For this purpose, the difference (bias) between raters and the mean score (magnitude) for the raters were plotted to provide important information regarding bias.29 Between-rater score differences that were scattered (i.e., no tendency for a score to be higher or lower) were considered unbiased. The plots were also assessed for bias associated with the magnitude of a score. Bias would occur when the between-rater score differences were associated with an increase in the score magnitude.29
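The quantities behind a Bland-Altman plot can be sketched in Python as follows. The function name and sample values are hypothetical. The 95% limits of agreement and the regression slope of difference on mean are conventional additions assumed here for illustration; the plots in this study were judged visually.

```python
import numpy as np

def bland_altman(rater1, rater2):
    """Return the points and summary values for a Bland-Altman plot."""
    r1 = np.asarray(rater1, dtype=float)
    r2 = np.asarray(rater2, dtype=float)

    diff = r1 - r2                 # y-axis: between-rater difference
    mean = (r1 + r2) / 2.0         # x-axis: score magnitude

    bias = diff.mean()             # systematic offset between raters
    sd = diff.std(ddof=1)          # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)

    # Slope of difference regressed on mean: a value far from zero hints
    # that differences grow or shrink with score magnitude.
    slope = np.polyfit(mean, diff, 1)[0]

    return {"mean": mean, "diff": diff, "bias": bias,
            "limits": limits, "slope": slope}

# Made-up values in which Rater 1 reads progressively higher than Rater 2.
result = bland_altman([2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0])
print(result["bias"], result["slope"])
```

In the made-up example, the positive slope mirrors the pattern described for T75, in which differences increased with greater score magnitude.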
RESULTS
No significant differences (p > 0.05) existed between any of the measures taken by Rater 1 and Rater 2 (Table 1). For inter-rater reliability, ICC [2,3] exceeded 0.70 for all measures except for T75 (Table 2). Four of the eight measures had excellent reliability for each eye as evidenced by ICCs exceeding 0.90 (INITIAL, END, ACV, and MCV). A similar pattern of values existed (Table 3) for inter-trial reliability.
Except for T75, the Bland-Altman plots showed a random pattern between the difference and mean for each measure (Figures 2 - 4). These plots also did not show a pattern of differences increasing or decreasing as the score magnitude (mean) increased. These factors taken together suggested no bias for these measures. For T75, the Bland-Altman plots appeared less scattered, and many differences appeared to increase with greater score magnitude. This finding suggested bias for T75, especially for the right eye.
DISCUSSION
The current study was the first to examine inter-rater reliability for the eight PLR parameters obtained using the NO3000. Except for the T75, inter-rater and inter-trial reliability was moderate to excellent for all measures. ICCs for T75 were poor to moderate and did not meet the minimum acceptable ICC of 0.70, suggesting that T75 may not be a useful biomarker.
Inter-rater and Inter-trial Reliability of the NO3000
Moderate to excellent inter-rater ICCs existed when measuring seven of the eight PLR measures using the NO3000, supporting its robustness as an automated pupillometer. Equally important was acceptable inter-trial reliability, the ability of a user to repeat measures and obtain consistent results.12,21 Most inter-trial ICCs were good to excellent and agreed with Asakawa et al.,21 who used the NO3000 to examine inter-trial reliability. McKay et al.30 compared measures from the NO3000 to BrightLamp, a pupillometer app, and also found strong measurement reproducibility for the NO3000. Findings from the current study further support the reliability of the NO3000.21,30 Master et al.19 have used PLR as a biomarker for identifying sport-related concussions in adolescents; having a device with acceptable reliability is critical for clinical decision-making.31
Inconsistencies in the Inter-rater and Inter-trial Reliability of T75
The current findings suggest that both the inter-rater and inter-trial reliability of T75 were poor to moderate.32 Unacceptable T75 reliability may have resulted from limited measurement precision. To obtain T75, subjects must keep their eyes still throughout the entire measurement period. Researchers who have examined pediatric populations have reported sources of error from movement14 and shorter stimulus durations.33 In studies analyzing T75 using the PLR-2000 or PLR-3000 models, the duration of the stimulus was 154 ms or 800 ms.14,19,34–37 This variation in stimulus duration across studies may represent a source of error contributing to unacceptable reliability.
The current findings coincided with those of Asakawa et al.,21 who also used the NO3000. Asakawa et al. reported poor inter-trial T75 reliability and suggested that specified device settings be used to obtain this parameter.21 They used a 180-µwatt/cm2 stimulus with an 800-ms duration that was considerably longer than our 30-ms duration.21 Asakawa et al. concluded that poor reliability resulted from a lack of optimal settings to accommodate the time required for the eye to reach 75% of its baseline size. The time between trials also may need to be lengthened and standardized to give the eye adequate time to recover before being re-stimulated. Yoo et al.37 recorded for a 5 s duration after initiating a 180 µwatts/cm2 stimulus for 185 ms and found significant differences in T75 and pupil diameters between healthy individuals and those with Horner Syndrome. These settings differed from the settings used in the current research of 180 µwatts/cm2 for 30 ms and suggested the light stimulus settings, particularly the duration, be increased to obtain consistent T75 data. Others14,19 who used T75 analysis in populations with concussions used pupillometer settings with a 180-µwatt/cm2 intensity but a 154-ms stimulus duration. Findings from these studies were more consistent, which suggests the importance of a longer duration range to obtain reliable T75 data.14,19 Future investigators should pay special attention to the light stimulus intensity and duration when obtaining PLR parameters and to which testing conditions are needed to optimize data collection.
Clinical Implications
Measurement reliability is critical to enhance clinical decision-making.31 It supports the inference that a change in a parameter represents a “true” change in the behavior. Clinicians also can use the MDC values (Table 2) to determine if changes in a measure exceed the inherent measurement variability, thus representing a “true” change.27 Previous pupillometer models from NeurOptics have established inter-trial and inter-device reliability, supporting their use in the critical care field for the evaluation of traumatic brain injury.17 Findings from the current study generally support the use of the NO3000 to assess PLR in screening settings and research applications. The NO3000 is useful in detecting concussions because there are known changes to the pupillary light response following trauma.4 Pupillometers can increase detection, especially when clinical symptoms may be lacking, and mitigate human error.3,4 Because of the device's user-friendly and portable design, settings beyond research labs, such as sports medicine or physical therapy clinics, can use it to monitor recovery progress.3,4 However, caution is required when measuring T75 because it is subject to more sources of error. Future investigations should determine the optimal settings for this measure.
Limitations
This study has limitations. Only one setting of the light stimulus and recording period was used during data collection. Conducting trials using different light stimulus intensities and durations could have elicited ranges that accommodate and improve the reproducibility of T75. Eye dominance in our participants was not determined; thus, the authors are unable to explain the discrepancies in T75 observed in the right eye only rather than both eyes. Only the most recent version of the model was used, making the current findings generalizable only to the NO3000. A final limitation was the use of healthy subjects, which limits generalization to clinical populations.
CONCLUSION
The inter-rater and inter-trial reliability of the NO3000 was established. All parameters, except T75, exhibited moderate to excellent inter-rater and inter-trial reliability. T75 had poor to moderate inter-rater and inter-trial reliability, which likely reflected inherent challenges when obtaining this measure. The NO3000 can be used in future pupillometry studies focused on measuring static and dynamic PLR parameters, but attention and rationale regarding the stimulus settings and environment are needed to minimize measurement error. Further investigation is needed to examine if other pupillometers can reliably measure T75 using different light stimulus intensities and durations.
Conflicts of interest
The authors report no conflicts of interest.