III. Detecting treatment effects in clinical trials with different indices of pain intensity derived from ecological momentary assessment
Introduction
Although a substantial body of research has been devoted to determining reliable and valid methods of self-reported pain assessment,
an important question is whether and how the information obtained from pain intensity measures could be improved to enhance detection of treatment effects.
The overall amount of pain (typically conceptualized as the average pain level over a day or week) has served as the most common pain intensity outcome in many clinical trials. However, an exclusive focus on patients’ average pain level may miss important effects of treatment on patients’ pain experiences in daily life. For example, the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) suggests that measures of the temporal aspects of pain, including the duration of pain states and the variability in pain intensity, have not received adequate attention as clinical trial outcomes.
In addition, the United States Food and Drug Administration (FDA) draft guidance on analgesic drug development recommends assessing patients’ worst pain levels rather than changes in average pain as the primary outcome in trials of analgesic medications. To date, however, these calls for alternative pain intensity measures are not supported by a systematic evidence base.
and test whether the new measures are sensitive to detecting treatment effects. The use of EMA as a method to collect outcomes in clinical trials has not been widely considered, but its potential benefits rest on two central aspects. First, momentary measurement reduces recall biases and error, capturing patients’ current pain and not their retrospective memories or beliefs about their pain.
Second, ratings of momentary pain are obtained multiple times per day. This makes it possible to obtain a range of different pain summary measures that can serve as endpoints in clinical trials.
The amount of time patients spend in low pain or in high pain represents alternative indices of potential importance that have garnered much less attention; these indices emphasize the frequency and duration of pain at low or high levels. The available evidence suggests that pain frequency represents a distinctive feature of pain,
and our previous work suggests that an index of pain frequency may be especially sensitive to change due to treatment.
Additionally, we explore indices capturing the amount of variability in pain and diurnal aspects (pain after wake-up) of patients’ pain intensity to examine whether they can provide useful information on treatment efficacy.
Here, we re-examine two large analgesic clinical trials for the treatment of fibromyalgia to evaluate the utility of different pain indices for detecting intervention effects. In addition, we examine the extent to which changes in the different pain indices relate to Patient Global Impressions of Change (PGIC) over the treatment period, given that PGIC ratings are commonly used as anchors for evaluating clinically significant change.
To the extent that alternative pain indices relate to PGIC above changes in “average” pain, they may contribute additional information on intervention effects that patients perceive as clinically meaningful.
Methods
Study design and sample
; a Cochrane systematic review presents summaries of the original trial results and methodology.
Briefly, and most relevant to the present analyses, both studies involved the comparison of two doses of milnacipran – 100 or 200 milligrams per day – to a placebo control group, in adults between 18 and 70 years of age who met the American College of Rheumatology criteria for fibromyalgia. After an initial medication washout period of 1 to 4 weeks, patients entered a pre-treatment assessment period of 2 weeks during which they received training in the use of an electronic diary for EMA and daily data collection. To continue in the study, participants needed to complete at least 70% of the random EMA prompts during the pre-treatment period. They were then randomized and entered a 3-week dose escalation period, followed by a period of stable dose treatment that lasted 24 weeks (in study 1) or 26 weeks (in study 2). Primary end-points in the original reports included daily 24-hour recall pain ratings and questionnaires of patient global impression of change and physical functioning. To conduct the present analyses using momentary pain ratings, we obtained electronic copies of the de-identified primary data files and annotated case report forms from the sponsor. The University of Southern California Institutional Review Board approved the secondary analysis project.
Participants in study 1 were on average 49.4 (SD = 10.7) years of age, predominantly female (95.6%) and White (93.6%), with an average duration of fibromyalgia of 5.7 (SD = 5.4) years. Of 1639 patients screened, 888 were randomized into the 100 mg/day (n = 224), 200 mg/day (n = 441), or placebo (n = 223) groups. Of those who were randomized, 57.1% (100 mg/day group), 54.2% (200 mg/day group), and 65% (placebo group) completed the 27-week treatment period.
Participants in study 2 had a mean age of 50.2 (SD = 10.6) years, were 96.2% female and 93.4% White, with an average disease duration of 9.7 (SD = 8.2) years. Of 2270 patients who were screened, 1207 were initially randomized but 11 were withdrawn from the analysis sample, leaving an intent-to-treat sample of n = 1196 assigned to the 100 mg/day (n = 399), 200 mg/day (n = 396), or placebo control (n = 401) groups. Of those, 39.1% (100 mg/day group), 35.4% (200 mg/day group), and 40.4% (placebo group) completed the full 29-week period.
Collection of Patient Global Impression of Change (PGIC)
PGIC scores were assessed with a single question administered at treatment weeks 3, 7, 11, 15, 19, 23, and 27 after randomization. The question was worded “Since the start of the study, overall my fibromyalgia is”, with categorical response options 1 = very much improved, 2 = much improved, 3 = minimally improved, 4 = no change, 5 = minimally worse, 6 = much worse, 7 = very much worse.
Collection of momentary pain intensity ratings
Momentary pain intensity ratings were collected on an electronic diary with proprietary software (invivodata, inc., Scotts Valley, CA) in the morning after participants woke up, in response to approximately 3-4 random prompts throughout the day, and in the evening (at about 8 PM), for a total of approximately 5-6 measurement time-points per day. EMA pain ratings were assessed each day over the full course of the baseline and treatment period. At each EMA measurement occasion, participants were asked to “rate your current level of pain” using a visual analogue scale with response anchors 0 = no pain to 100 = worst possible pain.
Construction of summary measures (“pain intensity indices”) from momentary pain data
and because creating the measures for shorter (e.g., 24-hr) periods would likely have generated unreliable indices that would have needed to be aggregated (e.g., averaged) over multiple days (e.g., a week).

Figure 1. Illustration of indices derived from momentary pain ratings, using data from a single participant at the baseline week.
To obtain an indicator of Pain Variability, we calculated the intraindividual standard deviation (SD) of patients’ momentary pain ratings over the week, in line with previous studies on pain variability.
The intraindividual SD is arguably the most commonly used measure of variability.
It directly reflects the magnitude of fluctuations in pain regardless of their temporal ordering, and therefore can be readily computed if pain ratings are unequally spaced (as is the case in EMA protocols). Compared to other variability indices (e.g., mean squared successive differences, autocorrelation), the intraindividual SD has also been shown to require relatively fewer repeated observations to be reliably captured.
An index representing the amount of Time in High Pain was derived by calculating the percentage of momentary pain ratings that were at a level of ≥ 75 on the 0-100 scale for each patient and week, and a corresponding index of Time in Low Pain was calculated as the percentage of momentary ratings ≤ 34 on the 0-100 scale. The thresholds for high (≥ 75) and low pain (≤ 34) were selected based on previous work that has established cut-off points for severe and mild pain on the visual analogue scale for patients with chronic musculoskeletal pain.
It should be noted that whereas the indices of maximum and minimum pain capture the worst (or least) intensity of pain, the measures of time in high (or low) pain are intended to capture the frequency of pain experiences that could be characterized as severe (or mild). Finally, a measure of the Average Pain After Wake-up was constructed by averaging the first momentary pain rating of each day (i.e., selecting only the pain ratings after wake-up) across the 7 days for each patient; morning pain has been shown to be a hallmark feature in patients with fibromyalgia.
All pain intensity indices were represented as continuous variables. Increasing values conceptually reflect worsening of patients’ pain experience for all measures except for the percent of time in low pain, where increases reflect improvements in the pain experience.
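The index definitions above can be illustrated with a minimal Python sketch for one patient-week. It is an illustration only, not the authors' code. The function name, the input layout (one list of ratings per day, in time order, with the first rating of each day taken as the after-wake-up rating), and the use of the sample (n − 1) standard deviation are assumptions. The maximum and minimum pain indices are left out because their exact computation is not spelled out in this section.

```python
from statistics import mean, stdev

HIGH_PAIN_CUTOFF = 75  # ratings >= 75 count as high pain (threshold from the text)
LOW_PAIN_CUTOFF = 34   # ratings <= 34 count as low pain (threshold from the text)


def weekly_pain_indices(days):
    """Summarize one patient-week of momentary pain ratings (0-100 scale).

    `days` is a list with one list of ratings per day, in time order.
    Assumption: the first rating of each day is the after-wake-up rating.
    """
    ratings = [r for day in days for r in day]
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to compute a standard deviation")
    wake_up_ratings = [day[0] for day in days if day]
    n = len(ratings)
    return {
        "average_pain": mean(ratings),
        # Intraindividual SD; the n - 1 denominator is an assumption.
        "pain_variability": stdev(ratings),
        # Percentage of momentary ratings at or beyond each threshold.
        "time_in_high_pain": 100 * sum(r >= HIGH_PAIN_CUTOFF for r in ratings) / n,
        "time_in_low_pain": 100 * sum(r <= LOW_PAIN_CUTOFF for r in ratings) / n,
        "pain_after_wake_up": mean(wake_up_ratings),
    }
```

Because the two "time in" indices are percentages of completed ratings rather than of clock time, they stay comparable across weeks in which a patient answers different numbers of prompts.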
Data analysis
Descriptive analyses
changes in EMA completion rates were also examined. Multilevel growth models were used for this purpose,
where the number of daily completed EMA prompts served as dependent variable, and day of study served as predictor variable, allowing for random effects (individual differences) in the (linear) time trend of completed daily prompts.
We next examined descriptive statistics (means, SDs) and correlations among the different pain indices at baseline. If some of the alternative indices were near perfectly correlated with each other, this would suggest that they do not capture different aspects of pain in daily life. Additionally, we examined the test-retest reliability of each pain index with intraclass correlation coefficients (ICCs) computed between the first and second pre-treatment weeks. The rationale was that pain indices showing a low test-retest reliability would not be good candidates for measuring the impact of treatment because the reliability of a measure sets an upper limit for its validity.
Analyses of treatment effects
The goal of the primary analyses was to examine the treatment effects obtained for each pain index and compare them with the treatment effects on Average Pain. Because some of the indices were not scaled in the same way (e.g., the SD metric of the Pain Variability index differs from indices tapping average, high, or low pain levels), we compared standardized effect sizes (ES) of the treatment effects, defined as the difference in change from baseline between an active treatment group and the placebo control group relative to the pooled standard deviation of changes from baseline in each group. The second week of the 2-week pre-treatment assessment period was selected as the baseline week because it was most proximal to treatment (alternatively, we could also have created a baseline score by averaging the scores for the first and second pre-treatment weeks for each pain index, but we decided against this because baseline assessment periods of multiple weeks are not available in all trials, limiting generalizability; this decision did not impact the results). The 3-week dose escalation period was not included in the analyses because treatment effects were likely to change during this period. Thus, the analyses considered treatment effects from baseline to each of the weeks of stable dose treatment.
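As a rough illustration of the standardized effect size described above, the following Python sketch computes it from raw change scores. It assumes the conventional degrees-of-freedom-weighted pooling of the two groups' variances, which the text does not specify, and it works on observed change scores rather than on model-based estimates. The function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_effect_size(active_changes, placebo_changes):
    """Difference in mean change from baseline (active minus placebo),
    divided by the pooled SD of the change scores.

    Each argument is a list of per-patient change scores (follow-up minus
    baseline) for one pain index. The pooling formula is an assumption.
    """
    n_a, n_p = len(active_changes), len(placebo_changes)
    pooled_variance = (
        (n_a - 1) * variance(active_changes)
        + (n_p - 1) * variance(placebo_changes)
    ) / (n_a + n_p - 2)
    return (mean(active_changes) - mean(placebo_changes)) / sqrt(pooled_variance)
```

With this sign convention, a negative value means a larger reduction in the active group for indices where higher scores reflect worse pain. For Time in Low Pain, where increases reflect improvement, the sign has to be read the other way.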
such that estimating the NNT for each pain index also provided a means to evaluate the magnitude of the effects found for the indices in the context of previous findings for primary trial outcomes. The NNT is defined as the number of patients who would need to receive the active treatment in order to have one more success (or one less failure) than if treated with the placebo.
The NNT is based on the number of “responders” in each group and is calculated as NNT = 1/[responder rate in treatment group – responder rate in control group]. Consistent with the original trial analyses
and with recommendations for the reporting of core outcome measures in pain clinical trials,
we defined responders as those obtaining reductions from baseline of at least 30%, separately for each pain index. To obtain a pooled average NNT across all treatment weeks for each pain index, we used generalized estimating equations (GEE) in which each individual’s weekly responder status was the binary dependent variable and in which treatment group, week (effect coded), and the treatment by week interaction served as independent variables (the resulting log odds from this model were transformed into NNTs using the formula described above).
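The responder definition and the NNT formula can be sketched in a few lines of Python. This is a simplified illustration: the function names are hypothetical, the handling of a zero baseline is an assumption, and the GEE model itself is not reproduced. The last helper shows only how a log odds value can be converted back into a responder rate with the standard inverse-logit transformation.

```python
from math import exp


def is_responder(baseline, follow_up, threshold=0.30):
    """True if the index dropped by at least `threshold` (30%) from baseline."""
    if baseline <= 0:
        # Percent reduction is undefined at a zero baseline; treating such
        # cases as non-responders is an assumption made for this sketch.
        return False
    return (baseline - follow_up) / baseline >= threshold


def rate_from_log_odds(log_odds):
    """Inverse-logit: convert a log odds value into a probability."""
    return 1 / (1 + exp(-log_odds))


def number_needed_to_treat(rate_treatment, rate_control):
    """NNT = 1 / (responder rate in treatment - responder rate in control)."""
    difference = rate_treatment - rate_control
    if difference == 0:
        raise ValueError("NNT is undefined when the responder rates are equal")
    return 1 / difference
```

For instance, hypothetical responder rates of 40% under treatment and 30% under placebo correspond to an NNT of about 10.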
Analyses of Patient Global Impression of Change
A first set of analyses examined each pain index individually as a predictor in separate models. To evaluate whether any of the alternative pain indices showed an incremental contribution to understanding PGIC over the Average Pain index, hierarchical ordinal logistic regressions were estimated in which changes in the Average Pain index were controlled in a first step to obtain incremental effects (partial odds ratios and changes in pseudo R2) of an alternative pain index entered in the second step.
Missing data
Analyses were conducted using SAS version 8.4 (Cary, NC) and Mplus version 8.1.
to evaluate the extent to which the results (i.e., the estimated treatment effects) would be impacted by violations of the MAR assumption.
An ISNI analysis was performed for all parameters of the repeated measures ANOVA models, taking into account both intermittent nonresponses and dropout from the study (for details, see Xie et al.). The potential impact of nonignorable missingness on the parameter estimates was evaluated using the c statistic, where small values of c
The c statistic exceeded the critical value of 1.0 for all treatment effect parameters of both studies and pain indices (cs > 4.90 in study 1 and cs > 3.05 in study 2), suggesting that nonignorable missingness would have little impact on the results and that analyses assuming MAR carried little risk of bias.
Discussion
We emphasize that the Average Pain index assessed with EMA should not be viewed as interchangeable with commonly used measures of average pain that are based on retrospective (e.g., 7-day recall) reports.
Whereas indices derived from EMA may be deemed more ecologically valid, it is possible that recall reports of average pain capture additional information of clinical relevance for understanding treatment effects, which was not investigated here.
synthesized the results from 23 pharmacological treatment studies that reported effect sizes for both average pain and worst pain reports (using 24-hr or 1-week recall items) across various chronic pain conditions and found no significant difference in efficacy estimates obtained from average and worst pain ratings. Notably, the difference in effect sizes between average and worst pain outcomes was very small at d = 0.02, nearly identical to the present results. Similarly, a previously reported pooled analysis of 4 clinical trials with fibromyalgia patients testing the effects of duloxetine showed very similar treatment effect estimates from 24-hour recall ratings of average, worst, and least pain.
This, in turn, may have clinical consequences because global retrospective impressions have been shown to play a central role in patient decision making and patients’ willingness to continue a therapeutic regimen.
It is not specifically designed to reduce momentary fluctuations in patients’ pain levels (as would be the case for rescue medications) or to specifically target states of severe pain, which may explain the small effect sizes for these two indices. The notion that Pain Variability captures meaningful information in the context of clinical trial outcomes is supported by the finding that changes in Pain Variability explained incremental variance in PGIC ratings beyond Average Pain, which suggests that impressions of improvement as perceived by patients could be further augmented by treatments that target reductions in pain variability.
This study has several strengths. First, the number of EMA pain intensity ratings that were the basis for the present analyses was substantial. Patients collected momentary pain data for several months and contributed over 1 million EMA pain intensity ratings over the course of the active treatment period, providing insights into the reproducibility of results across repeated assessments and across two separate studies. Second, both studies used a stringent clinical trial design, in that both were randomized, double-blind, placebo-controlled, parallel group studies. Third, using EMA ratings as the basis of the present analysis made it possible to evaluate and compare treatment effects obtained for a relatively broad range of outcome measures characterizing different aspects of pain.
As these treatments might target different aspects of patients’ pain experience, the tailored use of a given pain index that can best reflect the therapeutically desired outcome could be a fruitful strategy for future research. Finally, the present research focused on pain outcome measures that capture basic distributional characteristics of real-time pain experiences. Additional measures capturing temporal dynamics of pain can be derived from EMA but these were not considered here because they require more specialized analyses. For example, novel applications of time-series analyses have been shown to capture unique temporal features of pain intensity, including the persistence (e.g., autocorrelation) of pain states and the amplitude of shifts between elevated and reduced pain states, which may provide important avenues for future efforts to optimize the detection of efficacious treatments.
Supporting this hypothesis, Farrar et al. found that pain variability assessed from 7 daily diaries at the baseline period moderated observed treatment effect sizes, in that higher baseline pain variability was associated with a greater likelihood of treatment response in placebo-control groups but not in active medication-treated groups. It is possible that other aspects of baseline pain such as those used in this study could yield similar moderating effects or that baseline levels of different pain indices could be useful for treatment tailoring: these are open questions for future research.
Conclusions
Alternative summary measures of pain intensity derived from EMA have the potential to broaden the scope of outcome measures that could be useful as endpoints in pain clinical trials and may contribute to detecting changes from treatment that are deemed relevant by patients. It is important for our findings to be extended to other chronic pain diagnoses and treatment approaches. Comparative effectiveness trials may especially benefit from including multiple summary measures of pain intensity to determine whether different types of treatment affect different aspects of pain intensity.
Series Concluding Remarks
Our primary goal in this series of three papers was to determine from several perspectives the characteristics of different indices of pain intensity that can be derived from EMA protocols in addition to the traditional assessment of Average Pain. We argued throughout the series that these indices provide a new viewpoint on patients’ experience of pain. We were pleased to find that it was possible to reliably measure various temporal aspects of pain intensity with momentary assessments. Perhaps most strikingly, no single pain index emerged consistently as “superior” for understanding patients’ pain experience. Instead, the results indicated that the validity and potential usefulness of different pain indices depend on the context in which they were evaluated: whether they were preferred by stakeholders, whether they enhanced understanding of patient functioning, or whether they indicated treatment effects in clinical trials. We view this variation in pain index performance as pertinent information for informing decisions about the selection of pain intensity measures for pain research and clinical practice.
Nevertheless, we hope the series makes a compelling case that broadening the scope of pain intensity measures with indices based on momentary data is useful while at the same time acknowledging that many decisions need to be made about which aspects of pain are most pertinent to measure in different contexts.
The three studies in the series each offered a unique perspective on the indices. Results from the stakeholder opinions, empirical associations with functional outcomes, and treatment detection do not squarely align in a manner that unambiguously leads one to select a single pain index. Although not all of the pain indices may be compelling as primary pain outcomes, they could serve to supplement primary measures and expand our understanding of the pain experience. Finally, we are not in a position to recommend a formula for integrating the different sources of information about the indices, as it will undoubtedly vary by the reason for collecting the data. Choosing the “right” combination of pain intensity outcomes will demand much thoughtful effort from researchers and clinicians. Our hope is that this series will generate heightened interest in the measurement of alternative indices of pain intensity that can be derived from EMA to ultimately facilitate evidence-based decision making regarding the most suitable measures of pain intensity for research and practice.