Introduction. Recommendations for colorectal cancer screening encourage patients to choose among various screening methods based on individual preferences for benefits, risks, screening frequency, and discomfort. We devised a model to illustrate how individuals with varying tolerance for screening complications risk might decide on their preferred screening strategy. Methods. We developed a discrete-time Markov mathematical model that allowed hypothetical individuals to maximize expected lifetime utility by selecting screening method, start age, stop age, and frequency. Individuals could choose from stool-based testing every 1 to 3 years, flexible sigmoidoscopy every 1 to 20 years with annual stool-based testing, colonoscopy every 1 to 20 years, or no screening. We compared the life expectancy gained from the chosen strategy with the life expectancy available from a benchmark strategy of decennial colonoscopy. Results. For an individual at average risk of colorectal cancer who was risk neutral with respect to screening complications (and therefore was willing to undergo screening if it would actuarially increase life expectancy), the model predicted that he or she would choose colonoscopy every 10 years, from age 53 to 73 years, consistent with national guidelines. For a similar individual who was moderately averse to screening complications risk (and therefore required a greater increase in life expectancy to accept potential risks of colonoscopy), the model predicted that he or she would prefer flexible sigmoidoscopy every 12 years with annual stool-based testing, with 93% of the life expectancy benefit of decennial colonoscopy. For an individual with higher risk aversion, the model predicted that he or she would prefer 2 lifetime flexible sigmoidoscopies, 20 years apart, with 70% of the life expectancy benefit of decennial colonoscopy. Conclusion. Mathematical models may formalize how individuals with different risk attitudes choose between various guideline-recommended colorectal cancer screening strategies.
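As a rough illustration of the discrete-time Markov machinery that a model of this kind rests on, the sketch below advances a cohort through yearly cycles and sums the fraction still alive to approximate life expectancy. The function name, the two-state matrix, and its 0.5 annual death probability are hypothetical placeholders, not parameters from the study, and the utility maximization over screening strategies is not reproduced.

```python
def expected_life_years(transition, start, dead_state, n_cycles):
    """Sum the alive fraction at the end of each cycle of a Markov cohort model."""
    n = len(transition)
    dist = [0.0] * n
    dist[start] = 1.0  # the whole cohort begins in the start state
    life_years = 0.0
    for _ in range(n_cycles):
        # One cycle: new_dist[j] = sum over i of dist[i] * transition[i][j]
        dist = [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]
        life_years += 1.0 - dist[dead_state]
    return life_years


# Hypothetical two-state model: state 0 = alive, state 1 = dead (absorbing).
toy_matrix = [
    [0.5, 0.5],
    [0.0, 1.0],
]

# Alive fractions after cycles 1..3 are 0.5, 0.25, 0.125.
toy_life_years = expected_life_years(toy_matrix, start=0, dead_state=1, n_cycles=3)
print(toy_life_years)
```

Under these assumptions, comparing two screening strategies would amount to calling the function with two different transition matrices and taking the difference in life years.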
Background. Numerous studies have found that cost strongly influences patients’ decision making. The objective of this study was to explore the impact of varying cost formats on patients’ preferences. Methods. Mechanical Turk workers completed a choice-based conjoint (CBC) survey. The CBC survey was designed to examine stated preferences for the use of second-line agents to treat diabetes across 5 attributes: route of administration, efficacy, risk of low blood sugar, frequency of checking blood sugar levels, and cost. We developed 7 versions of the CBC survey that were identical except for the cost attribute. We described cost in terms of: Affordability, Monthly Co-pay, Dollar Sign Rating, How Expensive, or How Cheap compared with other medications, Working Hours Equivalent (per mo) and Percent of Monthly Income. The resulting part-worth utilities were used to calculate the relative importance of cost and to estimate treatment preferences for exenatide, a sulfonylurea, and insulin. Results. The relative impact of cost varied significantly across the 7 formats. Cost had the greatest influence on participants’ decisions when framed in terms of Affordability [mean (SD) relative importance, 37.3 (0.9)] and the lowest influence when framed in terms of How Cheap (compared with other drugs) [12.1 (0.9)]. A sulfonylurea was strongly preferred across 4 of the 7 formats. Preference for insulin, the most effective, albeit riskiest, option was low across all cost formats. Conclusions. The format used to describe cost affects how the attribute impacts patients’ preferences. Individuals are most cost-sensitive when cost is framed in terms of affordability and least cost-sensitive when cost is described in terms of how cheap the medication is compared with others.
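The abstract does not state how relative importance was derived from the part-worth utilities. A common convention in conjoint analysis, assumed here, is to take each attribute's part-worth range as a share of the summed ranges across all attributes. The sketch below applies that convention to made-up part-worths for three of the attributes; the numbers and the function name are hypothetical.

```python
def relative_importance(part_worths):
    """Return each attribute's share (in percent) of the total part-worth range."""
    ranges = {
        attribute: max(levels.values()) - min(levels.values())
        for attribute, levels in part_worths.items()
    }
    total_range = sum(ranges.values())
    return {attribute: 100.0 * r / total_range for attribute, r in ranges.items()}


# Hypothetical part-worth utilities for one respondent (not data from the study).
example_part_worths = {
    "route": {"pill": 0.5, "injection": -0.5},        # range 1.0
    "efficacy": {"lower": -1.0, "higher": 1.0},       # range 2.0
    "cost": {"low": 1.5, "mid": 0.0, "high": -1.5},   # range 3.0
}

importance = relative_importance(example_part_worths)
print(importance)
```

With these invented values, cost accounts for half of the total range, so its relative importance is 50%.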
Conventional practice within the United Kingdom and beyond is to conduct economic evaluations with "health" as evaluative space and "health maximization" as the decision-making rule. However, there is increasing recognition that this evaluative framework may not always be appropriate, and this is particularly the case within public health and social care contexts. This article presents a methodological case study designed to explore the impact of changing the evaluative space within an economic evaluation from health to capability well-being and the decision-making rule from health maximization to the maximization of sufficient capability. Capability well-being is an evaluative space grounded on Amartya Sen’s capability approach and assesses well-being based on individuals’ ability to do and be the things they value in life. Sufficient capability is an egalitarian approach to decision making that aims to ensure everyone in society achieves a normatively sufficient level of capability well-being. The case study is treatment for drug addiction, and the cost-effectiveness of 2 psychological interventions relative to usual care is assessed using data from a pilot trial. Analyses are undertaken from a health care and a government perspective. For the purpose of the study, quality-adjusted life years (measured using the EQ-5D-5L) and years of full capability equivalent and years of sufficient capability equivalent (both measured using the ICECAP-A [ICEpop CAPability measure for Adults]) are estimated. The study concludes that different evaluative spaces and decision-making rules have the potential to offer opposing treatment recommendations. The implications for policy makers are discussed.
Objectives. Comprehension of risks, benefits, and alternative treatment options has been shown to be poor among patients referred for cardiac interventions. Patients’ values and preferences are rarely explicitly sought. An increasing proportion of frail and older patients are undergoing complex cardiac surgical procedures with increased risk of both mortality and prolonged institutional care. We sought input from patients and caregivers to determine the optimal approach to decision making in this vulnerable patient population. Methods. Focus groups were held with both providers and former patients. Three focus groups were convened for Coronary Artery Bypass Graft (CABG), Valve, or CABG + Valve patients ≥ 70 y old (2-y post-op, ≤ 8-wk post-op, complicated post-op course) (n = 15). Three focus groups were convened for Intermediate Medical Care Unit (IMCU) nurses, Intensive Care Unit (ICU) nurses, surgeons, anesthesiologists and cardiac intensivists (n = 20). We used a semi-structured interview format to ask questions surrounding the informed consent process. Transcribed audio data was analyzed to develop consistent and comprehensive themes. Results. We identified 5 main themes that influence the decision-making process: educational barriers, educational facilitators, patient autonomy and perceived autonomy, patient and family expectations of care, and decision-making advocates. All themes were influenced by time constraints experienced in the current consent process. Patient groups expressed a desire to receive information earlier in their care to allow time to identify personal values and preferences in developing plans for treatment. Both groups strongly supported a formal approach for shared decision making (SDM) with a decisional coach to provide information and facilitate communication with the care team. Conclusions. Identifying the barriers and facilitators to patient and caretaker engagement in decision making is a key step in the development of a structured, patient-centered SDM approach. Intervention early in the decision process, the use of individualized decision aids that employ graphic risk presentations, and a dedicated decisional coach were identified by patients and providers as approaches with a high potential for success. The impact of such a formalized shared decision making process in cardiac surgery on decisional quality will need to be formally assessed. Given the trend toward older and frail patients referred for complex cardiac procedures, the need for an effective shared decision making process is compelling.
Cervical cancer is the second most common cancer in women around the world, and the human papillomavirus (HPV) is universally known as the necessary agent for developing this disease. Through early detection of abnormal cells and HPV types, cervical cancer incidence can be reduced and disease progression prevented. We propose a finite-horizon Markov decision process model to determine the optimal screening policies for cervical cancer prevention. The optimal decision is given in terms of when and what type of screening test should be performed on a patient based on her current diagnosis, age, HPV contraction risk, and screening test results. The cost function considers the tradeoff between the cost of prevention and treatment procedures and the risk of taking no action while taking into account a cost assigned to loss of life quality in each state. We apply the model to data collected from a representative sample of 1141 affiliates at a health care provider located in Bogotá, Colombia. To track the disease incidence more effectively and avoid higher cancer rates and future costs, the optimal policies recommend more frequent colposcopies and Pap tests for women with riskier profiles.
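A finite-horizon Markov decision process of this kind is typically solved by backward induction. The minimal sketch below shows the idea on a toy three-state problem; the states, actions, transition probabilities, and costs are all invented for illustration and are not taken from the Bogotá data or from the authors' model.

```python
def backward_induction(transitions, stage_cost, terminal_cost, horizon):
    """Minimize expected total cost over a finite horizon.

    transitions[a][s][s2] is the probability of moving from s to s2 under action a;
    stage_cost[a][s] is the immediate cost of taking action a in state s.
    Returns (value at the first period, policy[t][s] for t = 0..horizon-1).
    """
    n_states = len(terminal_cost)
    value = list(terminal_cost)
    policy = []
    for _ in range(horizon):
        new_value, decision = [], []
        for s in range(n_states):
            best_action, best_cost = None, float("inf")
            for a in transitions:
                expected = sum(transitions[a][s][s2] * value[s2] for s2 in range(n_states))
                total = stage_cost[a][s] + expected
                if total < best_cost:  # ties keep the first action listed
                    best_action, best_cost = a, total
            new_value.append(best_cost)
            decision.append(best_action)
        value = new_value
        policy.insert(0, decision)  # stages are computed last-to-first
    return value, policy


# Hypothetical states: 0 = healthy, 1 = precancerous lesion, 2 = cancer (absorbing).
toy_transitions = {
    "wait":   [[0.9, 0.1, 0.0], [0.0, 0.7, 0.3], [0.0, 0.0, 1.0]],
    "screen": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.0], [0.0, 0.0, 1.0]],
}
toy_stage_cost = {"wait": [0.0, 0.0, 0.0], "screen": [1.0, 1.0, 0.0]}
toy_terminal_cost = [0.0, 0.0, 100.0]

toy_value, toy_policy = backward_induction(
    toy_transitions, toy_stage_cost, toy_terminal_cost, horizon=2
)
print(toy_policy[0])
```

In this toy setup screening is chosen only in the lesion state, which mirrors the qualitative result that riskier profiles warrant more testing; the real model conditions on far more information (age, HPV risk, prior test results).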
Background. Brunswik’s Lens Model and lens model equation (LME) have been applied extensively in medical decision making. Clinicians often face the dual challenge of formulating a judgment of patient risk for some adverse outcome and making a yes or no decision regarding a particular risk-reducing treatment option. Objective. In this article, I examine the correlation between clinical risk judgments and treatment-related decisions, referring to this linkage as "cohesion". A novel form of the LME is developed to decompose cohesion. The approach is "bifocal" in that it focuses on 2 sets of linked responses from the same individual. Methods. Data from 2 studies were analyzed to illustrate how individual differences in cohesion could be explained by differences in the parameters of the bifocal lens model equation (BiLME). Results. Cohesion varied because of differences in cognitive control, similarities in the judgment and decision policies, and a possible reliance on a subjective threshold value applied to the judgments to make decisions. The parameters of the BiLME accounted for individual differences in cohesion; however, their relative influences differed between the two studies. Conclusion. The BiLME links the results from two regression models—one linear and one logistic—based on the same set of cases. In its current form, the equation holds promise for understanding cognitive individual differences that could underlie practice variation. With minor modifications, it becomes possible to apply the equation to traditional, dual-system judgment analysis studies, where continuous judgments are compared with an ecology composed of dichotomous outcomes, or vice versa. In this regard, the BiLME is quite flexible and adds to the set of tools available to judgment analysts.
Modeling of clinical-effectiveness in a cost-effectiveness analysis typically involves some form of partitioned survival or Markov decision-analytic modeling. The health states progression-free, progression and death and the transitions between them are frequently of interest. With partitioned survival, progression is not modeled directly as a state; instead, time in that state is derived from the difference in area between the overall survival and the progression-free survival curves. With Markov decision-analytic modeling, a priori assumptions are often made with regard to the transitions rather than using the individual patient data directly to model them. This article compares a multi-state modeling survival regression approach to these two common methods. As a case study, we use a trial comparing rituximab in combination with fludarabine and cyclophosphamide v. fludarabine and cyclophosphamide alone for the first-line treatment of chronic lymphocytic leukemia. We calculated mean Life Years and QALYs that involved extrapolation of survival outcomes in the trial. We adapted an existing multi-state modeling approach to incorporate parametric distributions for transition hazards, to allow extrapolation. The comparison showed that, due to the different assumptions used in the different approaches, a discrepancy in results was evident. The partitioned survival and Markov decision-analytic modeling deemed the treatment cost-effective with ICERs of just over £16,000 and £13,000, respectively. However, the results with the multi-state modeling were less conclusive, with an ICER of just over £29,000. This work has illustrated that it is imperative to check whether assumptions are realistic, as different model choices can influence clinical and cost-effectiveness results.
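The partitioned survival calculation described above, in which time spent in the progressed state is the area between the overall survival and progression-free survival curves, can be sketched as follows. The curves are invented and the trapezoidal rule is an assumption made for the illustration; the trial data and the parametric extrapolation used in the article are not reproduced.

```python
def area_under_curve(times, survival):
    """Trapezoidal area under a survival curve sampled at the given times."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += width * (survival[i] + survival[i - 1]) / 2.0
    return area


def partitioned_state_times(times, overall_survival, progression_free_survival):
    """Mean time per state: progressed time is the area between OS and PFS."""
    os_area = area_under_curve(times, overall_survival)
    pfs_area = area_under_curve(times, progression_free_survival)
    return {
        "progression_free": pfs_area,
        "progressed": os_area - pfs_area,
        "alive": os_area,
    }


# Hypothetical yearly survival proportions (not from the rituximab trial).
toy_times = [0, 1, 2, 3, 4]
toy_os = [1.0, 0.9, 0.8, 0.7, 0.6]
toy_pfs = [1.0, 0.8, 0.6, 0.4, 0.2]

toy_state_times = partitioned_state_times(toy_times, toy_os, toy_pfs)
print(toy_state_times)
```

Over this four-year window the toy cohort spends about 2.4 years progression-free and 0.8 years progressed. A multi-state model would instead estimate each transition hazard directly from patient-level data.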
Background. A critical component of shared decision making (SDM) is the role played by health care providers in distributing decision aids (DAs) and initiating SDM conversations. Existing literature indicates that decisions about designing and implementing DAs must take provider perspectives into account. However, little is known about how differences in provider attitudes across specialties may impact DA implementation and how provider attitudes may shift after DA implementation. Group Health’s Decision Aid Implementation project was carried out in six specialties using 12 video-based DAs for preference-sensitive conditions; this study focused on two of the six specialties. Design. In-depth, qualitative interviews with specialty care providers in two specialties (orthopedics and cardiology) at two time points during DA implementation. Data were analyzed using a thematic analysis approach. Results. We interviewed 19 care providers in orthopedics and cardiology. All respondents believed that providing patients with accurate information on their health conditions and treatment options was important and that most patients wanted an active role in decision making. However, respondents diverged in decision-making styles and views on the practicality and appropriateness of using the DAs and SDM. For example, cardiology specialists were ambivalent about DAs for coronary artery disease because many viewed DAs and SDM as unnecessary or inappropriate for this clinical condition. Provider attitudes towards DAs and SDM were generally stable over two years. Limitations. Limitations include a lack of patient perspectives, social desirability bias, and possible selection bias. Conclusions. Successfully implementing DAs in clinical practice to promote SDM requires addressing individual provider attitudes, beliefs, and knowledge of SDM by specialty. During DA development and implementation, providers should be asked for input about the specific conditions and care processes that are most appropriate for SDM.
Background. Children’s preferences for health states represent an important perspective when comparing the value of alternative health care interventions related to pediatric medicine, and are fundamental to comparative effectiveness research. However, there is debate over whether these preference data can be collected and used. Purpose. The purpose of this study was to establish psychometric properties of eliciting preferences for health states from children using direct methods. Data Sources. Ovid Medline, PsycINFO, Scopus, EconLit. Study Selection. English studies, published after 1990, were identified using Medical Subject Headings or keywords. Results were reviewed to confirm that the study was based on: 1) a sample of children, and 2) preferences for health states. Data Extraction. Standardized data collection forms were used to record the preference elicitation method used, and any reported evidence regarding the validity, reliability, or feasibility of the method. Data Synthesis. Twenty-six studies were ultimately included in the analysis. The standard gamble and time tradeoff were the most commonly reported direct preference elicitation methods. Seven studies reported validity, four reported reliability, and nine reported feasibility. Of the validity reports, construct validity was assessed most often. Reliability reports typically involved the intraclass correlation coefficient. For feasibility, four studies reported completion rates. Limitations. The search was limited to four databases and restricted to English studies published after 1990. Only evidence available in published studies was considered; measurement properties may have been tested in pilot or pre-studies but were not published, and are not included in this review. Conclusion. The few studies found through this systematic review demonstrate that there is little empirical evidence on which to judge the use of direct preference elicitation methods with children regarding health states.
Background. Treatment decision making is often guided by evidence-based probabilities, which may be presented to patients during consultations. These probabilities are intrinsically imperfect and embody 2 types of uncertainties: aleatory uncertainty arising from the unpredictability of future events and epistemic uncertainty arising from limitations in the reliability and accuracy of probability estimates. Risk communication experts have recommended disclosing uncertainty. We examined whether uncertainty was discussed during cancer consultations and whether and how patients perceived uncertainty. Methods. Consecutive patient consultations with medical oncologists discussing adjuvant treatment in early-stage breast cancer were audiotaped, transcribed, and coded. Patients were interviewed after the consultation to gain insight into their perceptions of uncertainty. Results. In total, 198 patients were included by 27 oncologists. Uncertainty was disclosed in 49% (97/197) of consultations. In those 97 consultations, 23 allusions to epistemic uncertainty were made and 84 allusions to aleatory uncertainty. Overall, the allusions to the precision of the probabilities were somewhat ambiguous. Interviewed patients mainly referred to aleatory uncertainty if not prompted about epistemic uncertainty. Even when specifically asked about epistemic uncertainty, 1 in 4 utterances referred to aleatory uncertainty. When talking about epistemic uncertainty, many patients contradicted themselves. In addition, 1 in 10 patients seemed not to realize that the probabilities communicated during the consultation are imperfect. Conclusions. Uncertainty is conveyed in only half of patient consultations. When uncertainty is communicated, oncologists mainly refer to aleatory uncertainty. This is also the type of uncertainty that most patients perceive and seem comfortable discussing. Given that it is increasingly common for clinicians to discuss outcome probabilities with their patients, guidance on whether and how to best communicate uncertainty is urgently needed.
Background: Median wait times for gastroenterology services in Canada exceed consensus-recommended targets and have worsened substantially over the past decade. Meanwhile, efforts to control colorectal cancer have shifted their focus to screening asymptomatic, average-risk individuals. Along with increasing prevalence of colorectal cancer due to an aging population, screening programs are expected to add substantially to the existing burden on colonoscopy services, and create competition for limited services among individuals of varying risk. Failure to understand the effects of operational programmatic screening decisions may cause unintended harm to both screening participants and higher-risk patients, make inefficient use of limited health care resources, and ultimately hinder a program’s success. Methods: We present a new simulation model (Simulation of Cancer Outcomes for Planning Exercises, or SCOPE) for colorectal cancer screening which, unlike many other colorectal cancer screening models, reflects the effects of competition for limited colonoscopy services between patient groups and can be used to guide planning to ensure adequate resource allocation. We include verification and validation results for the SCOPE model. Results: A discrete event simulation model was developed based on an epidemiological representation of colorectal cancer in a sample population. Colonoscopy service and screening modules were added to allow observation of screening scenarios and resource considerations. The model reproduces population-based data on prevalence of colorectal cancer by stage, and mortality by cause of death, age, and sex, and attendant demand and wait times for colonoscopy services. Conclusions: The study model differs from existing screening models in that it explicitly considers the colonoscopy resource implications of screening activities and the impact of constrained resources on screening effectiveness.
Background: Discrete choice experiments incorporating duration can be used to derive health state values for EQ-5D-5L. Yet, methodological issues relating to the duration attribute and the optimal way to select health states remain. The aims of this study were to: test increasing the number of duration levels and choice sets where duration varies (aim 1); compare designs with zero and non-zero prior values (aim 2); and investigate a novel, two-stage design to incorporate prior values (aim 3). Methods: Informed by zero and non-zero prior values, two efficient designs were developed, each consisting of 120 EQ-5D-5L health profile pairs with one of six duration levels (aims 1 and 2). Another 120 health state pairs were selected, with one of six duration levels allocated in a second stage based on existing estimated utility of the states (aim 3). An online sample of 2,002 members of the UK general population completed 10 choice sets each. Differences across the regression coefficients from the three designs were assessed. Results: The zero prior value design produced a model with coefficients that were generally logically ordered, but the non-zero prior value design resulted in a set of less ordered coefficients where some differed significantly. The two-stage design resulted in ordered and significant coefficients. The non-zero prior value design may include more "difficult" choice sets, based on the proportions choosing each profile. Conclusions: There is some indication of compromised "respondent efficiency", suggesting that the use of non-zero prior values will not necessarily result in better overall precision. It is feasible to design discrete choice experiments in two stages by allocating duration values to EQ-5D-5L health state pairs based on estimates from prior studies.
Background: Sequential decision problems are frequently encountered in medical decision making and are commonly solved using Markov decision processes (MDPs). Modeling guidelines recommend conducting sensitivity analyses in decision-analytic models to assess the robustness of the model results against the uncertainty in model parameters. However, standard methods of conducting sensitivity analyses cannot be directly applied to sequential decision problems because this would require evaluating all possible decision sequences, typically in the order of trillions, which is not practically feasible. As a result, most MDP-based modeling studies do not examine confidence in their recommended policies. Method: In this study, we provide an approach to estimate uncertainty and confidence in the results of sequential decision models. Results: First, we provide a probabilistic univariate method to identify the most sensitive parameters in MDPs. Second, we present a probabilistic multivariate approach to estimate the overall confidence in the recommended optimal policy considering joint uncertainty in the model parameters. We provide a graphical representation, which we call a policy acceptability curve, to summarize the confidence in the optimal policy by incorporating stakeholders’ willingness to accept the base case policy. For a cost-effectiveness analysis, we provide an approach to construct a cost-effectiveness acceptability frontier, which shows the most cost-effective policy as well as the confidence in that policy for a given willingness-to-pay threshold. We demonstrate our approach using a simple MDP case study. Conclusions: We developed a method to conduct sensitivity analysis in sequential decision models, which could increase the credibility of these models among stakeholders.
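The abstract does not spell out how confidence in the recommended policy is computed. One simple reading, assumed here and not claimed to be the authors' method, is to draw parameter sets from their joint distribution, re-solve the model for each draw, and report the fraction of draws in which the base-case policy is still optimal. In the sketch below a one-step treat-or-wait rule stands in for a full MDP solver; every name and number is hypothetical.

```python
import random


def policy_confidence(solve, sample_params, base_params, n_samples=1000, seed=0):
    """Fraction of sampled parameter sets whose optimal policy equals the base case."""
    rng = random.Random(seed)
    base_policy = solve(base_params)
    matches = sum(solve(sample_params(rng)) == base_policy for _ in range(n_samples))
    return matches / n_samples


def toy_solve(params):
    # Stand-in for an MDP solver: treat when expected loss exceeds treatment cost.
    expected_loss = params["p_progress"] * params["loss"]
    return "treat" if expected_loss > params["treat_cost"] else "wait"


def toy_sampler(rng):
    # Hypothetical uncertainty: progression probability uniform on [0.1, 0.5].
    return {"p_progress": rng.uniform(0.1, 0.5), "loss": 10.0, "treat_cost": 2.0}


toy_base = {"p_progress": 0.3, "loss": 10.0, "treat_cost": 2.0}
toy_confidence = policy_confidence(toy_solve, toy_sampler, toy_base, n_samples=2000, seed=1)
print(toy_confidence)
```

In this toy case the base-case policy ("treat") stays optimal whenever the sampled progression probability exceeds 0.2, so the estimate should land near 0.75. A policy acceptability curve would additionally relax the match criterion according to how much loss stakeholders are willing to accept.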
Background: Estimates of life expectancy are a key input to cost-effectiveness analysis (CEA) models for cancer treatments. Due to the limited follow-up in Randomized Controlled Trials (RCTs), parametric models are frequently used to extrapolate survival outcomes beyond the RCT period. However, different parametric models that fit the RCT data equally well may generate highly divergent predictions of treatment-related gain in life expectancy. Here, we investigate the use of information external to the RCT data to inform model choice and estimation of life expectancy. Methods: We used Bayesian multi-parameter evidence synthesis to combine the RCT data with external information on general population survival, conditional survival from cancer registry databases, and expert opinion. We illustrate with a 5-year follow-up RCT of cetuximab plus radiotherapy v. radiotherapy alone for head and neck cancer. Results: Standard survival time distributions were insufficiently flexible to simultaneously fit both the RCT data and external data on general population survival. Using spline models, we were able to estimate a model that was consistent with the trial data and all external data. A model integrating all sources achieved an adequate fit and predicted a 4.7-month (95% CrL: 0.4; 9.1) gain in life expectancy due to cetuximab. Conclusions: Long-term extrapolation using parametric models based on RCT data alone is highly unreliable and these models are unlikely to be consistent with external data. External data can be integrated with RCT data using spline models to enable long-term extrapolation. Conditional survival data could be used for many cancers and general population survival may have a role in other conditions. The use of external data should be guided by knowledge of natural history and treatment mechanisms.
Background. HIV transmission is the result of complex dynamics in the risk behaviors, partnership choices, disease stage, and position along the HIV care continuum, individual characteristics that themselves can change over time. Capturing these dynamics and simulating transmissions to understand the chief sources of transmission remain important for prevention. Methods. The Progression and Transmission of HIV/AIDS (PATH 2.0) is an agent-based model of a sample of 10,000 people living with HIV (PLWH), who represent all men who have sex with men (MSM) and heterosexuals living with HIV in the U.S.A. Persons uninfected were modeled as populations, stratified by risk and gender. The model included detailed individual-level data from several large national surveillance databases. The outcomes focused on average annual transmission rates from 2008 through 2011 by disease stage, HIV care continuum, and sexual risk group. Results. The relative risk of transmission of those in the acute phase was nine times [5th and 95th percentile simulation interval (SI): 7, 12] that of those in the non-acute phase, although, on average, those with acute infections comprised 1% of all PLWH. The relative risk of transmission was 24 to 50 times as high for those in the non-acute phase who had not achieved viral load suppression as compared with those who had. The relative risk of transmission among MSM was 3.2 times [SI: 2.7, 4.0] that of heterosexuals. Men who have sex with men and women generated 46% of sexually acquired transmissions among heterosexuals. Conclusions. The model results support a continued focus on early diagnosis, treatment and adherence to ART, with an emphasis on prevention efforts for MSM, a subgroup of whom appear to play a role in transmission to heterosexuals.
There is recent interest in using discrete choice experiments (DCEs) to derive health state utility values, and results can differ from time tradeoff (TTO). Clearly, DCE is "choice based," whereas TTO is generally considered a "matching" task. We explore whether procedural adaptations to the TTO, which make the method more closely resemble a DCE, make TTO and choice converge. In particular, we test whether making the matching procedure in TTO less "transparent" to the respondent reduces disparities between TTO and DCE. We designed an interactive survey that was hosted on the Internet, and 2022 interviews were completed in the United Kingdom with a representative sample of the population. We found a marked divergence between TTO and DCE, but this was not related to the "transparency" of the TTO procedure. We conclude that the divergence is driven by a difference in the error structure between TTO and choice and by factors other than differences in utility that affect choices. The latter has fundamental implications for the way choice data are analyzed and interpreted.
Introduction. This study estimates health-related quality of life (HRQoL) or utility decrements associated with type 1 diabetes mellitus (T1DM) using data from a UK research program on the Dose Adjustment For Normal Eating (DAFNE) education program. Methods. A wide range of data was collected from 2341 individuals who undertook a DAFNE course in 2009–2012, at baseline and for 2 subsequent years. We use fixed- and random-effects linear models to generate utility estimates for T1DM using different instruments: EQ-5D, SF-6D, and EQ-VAS. We show models with and without controls for HbA1c and depression, which may be endogenous (if, for example, there is reverse causality in operation). Results. We find strong evidence of an unobserved individual effect, suggesting the superiority of the fixed-effects model. Depression shows the greatest decrement across all the models in the preferred fixed-effects model. The fixed-effects EQ-5D model also finds a significant decrement from retinopathy, body mass index, and HbA1c (%). Estimating a decrement using the fixed-effects model is not possible for some conditions where there are few new cases. In the random-effects model, diabetic foot disease shows substantial utility decrements, yet these are not significant in the fixed-effects models. Conclusion. Utility decrements have been calculated for a wide variety of health states in T1DM that can be used in economic analyses. However, despite the large data set, the low incidence of several complications leads to uncertainty in calculating the utility weights. Depression and diabetic foot disease result in a substantial loss in HRQoL for patients with T1DM. HbA1c (%) appears to have an independent negative impact on HRQoL, although concerns remain regarding the potential endogeneity of this variable.
Objective. To assess the influence of patient preferences and urologist recommendations in treatment decisions for clinically localized prostate cancer. Methods. We enrolled 257 men with clinically localized prostate cancer (prostate-specific antigen <20; Gleason score 6 or 7) seen by urologists (primarily residents and fellows) in 4 Veterans Affairs medical centers. We measured patients’ baseline preferences prior to their urology appointments, including initial treatment preference, cancer-related anxiety, and interest in sex. In longitudinal follow-up, we determined which treatment patients received. We used hierarchical logistic regression to determine the factors that predicted treatment received (active treatment v. active surveillance) and urologist recommendations. We also conducted a directed content analysis of recorded clinical encounters to determine if urologists discussed patients’ interest in sex. Results. Patients’ initial treatment preferences did not predict receipt of active treatment versus surveillance, χ²(4) = 3.67, P = 0.45. Instead, receipt of active treatment was predicted primarily by urologists’ recommendations, χ²(2) = 32.81, P < 0.001. Urologists’ recommendations, in turn, were influenced heavily by medical factors (age and Gleason score) but were unrelated to patient preferences, χ²(6) = 0, P = 1. Urologists rarely discussed patients’ interest in sex (<15% of appointments). Conclusions. Patients’ treatment decisions were based largely on urologists’ recommendations, which, in turn, were based on medical factors (age and Gleason score) and not on patients’ personal views of the relative pros and cons of treatment alternatives.
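The reported P values can be checked against the test statistics, assuming these are chi-square statistics with the degrees of freedom given in parentheses. For even degrees of freedom the upper-tail probability has a closed form, P(X > x) = exp(-x/2) multiplied by the sum of (x/2)^i / i! for i from 0 to df/2 - 1, which the sketch below implements with the standard library only. The function name is an assumption made for the illustration.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper-tail probability of a chi-square statistic for even degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Statistics as reported in the abstract above.
p_preferences = chi2_upper_tail_even_df(3.67, 4)
p_recommendation = chi2_upper_tail_even_df(32.81, 2)
p_unrelated = chi2_upper_tail_even_df(0.0, 6)
print(round(p_preferences, 2), p_recommendation < 0.001, p_unrelated)
```

Under that assumption the three values agree with the reported P = 0.45, P < 0.001, and P = 1.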
Background. Evidence suggests that advance directives may improve end-of-life care among seriously ill patients, but improving completion rates remains a challenge. Objective. This study tested the influence of increasing the number of options for completing an advance directive among seriously ill patients. Methodology. Outpatients (N = 316) receiving hemodialysis across 15 dialysis centers in the Philadelphia region between July 2014 and July 2015 were randomized to receive either the option to complete a brief advance directive form or expanded options including a brief, expanded, or comprehensive form. Patients in both groups could decline to complete an advance directive or take their selected version home. The primary outcome was a returned, completed advance directive. Secondary outcomes included whether patients wanted to complete an advance directive, decision satisfaction, quality of life at 3 months, and patient factors associated with advance directive completion. Results. Although offering more advance directive options was not significantly associated with increased rates of completion (13.1% in the standard group v. 12.2% in the expanded group, P = 0.80), it did significantly increase the proportion of patients who wanted to complete an advance directive and took one home (71.9% in standard v. 85.3% in expanded, P = 0.004). There was no difference in satisfaction (P = 0.65) or change in quality of life between groups (P = 0.63). A higher baseline quality of life was independently associated with advance directive completion (P = 0.006). Conclusions and Relevance. These results suggest that although an expanded choice set may initially nudge patients toward completing advance directives without restricting choice, increasing actual completion requires additional interventions that overcome downstream barriers.
Objective. To design a precision medicine approach aimed at exploiting significant patterns in data, in order to produce venous thromboembolism (VTE) risk predictors for cancer outpatients that might be of advantage over the currently recommended model (Khorana score). Design. Multiple kernel learning (MKL) based on support vector machines and random optimization (RO) models were used to produce VTE risk predictors (referred to as machine learning [ML]-RO) yielding the best classification performance over a training (3-fold cross-validation) and testing set. Results. Attributes of the patient data set (n = 1179) were clustered into 9 groups according to clinical significance. Our analysis produced 6 ML-RO models in the training set, which yielded better likelihood ratios (LRs) than baseline models. Of interest, the most significant LRs were observed in 2 ML-RO approaches not including the Khorana score (ML-RO-2: positive likelihood ratio [+LR] = 1.68, negative likelihood ratio [–LR] = 0.24; ML-RO-3: +LR = 1.64, –LR = 0.37). The enhanced performance of ML-RO approaches over the Khorana score was further confirmed by the analysis of the areas under the Precision-Recall curve (AUCPR), which were higher for the ML-RO approaches (best performances: ML-RO-2: AUCPR = 0.212; ML-RO-3-K: AUCPR = 0.146) than for the Khorana score (AUCPR = 0.096). Of interest, the best-fitting model was ML-RO-2, in which blood lipids and body mass index/performance status retained the strongest weights, with a weaker association with tumor site/stage and drugs. Conclusions. Although the monocentric validation of the presented predictors might represent a limitation, these results demonstrate that a model based on MKL and RO may represent a novel methodological approach to derive VTE risk classifiers. Moreover, this study highlights the advantages of optimizing the relative importance of groups of clinical attributes in the selection of VTE risk predictors.
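For readers less familiar with the likelihood ratios reported in the preceding abstract, the standard definitions can be sketched as follows. This is only an illustration: the function name `likelihood_ratios` and the sensitivity and specificity values are hypothetical and are not taken from the study.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (+LR, -LR) for a binary classifier.

    +LR = sensitivity / (1 - specificity)
    -LR = (1 - sensitivity) / specificity
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must lie in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must lie strictly between 0 and 1")
    positive_lr = sensitivity / (1.0 - specificity)
    negative_lr = (1.0 - sensitivity) / specificity
    return positive_lr, negative_lr


# Hypothetical inputs, chosen only to illustrate the arithmetic.
plr, nlr = likelihood_ratios(sensitivity=0.8, specificity=0.5)
print(f"+LR = {plr:.2f}, -LR = {nlr:.2f}")
```

A +LR further above 1 and a –LR closer to 0 both indicate a more informative classifier.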
Background. To develop and validate a new conceptual model (CM) of chronic obstructive pulmonary disease (COPD) for use in disease progression and economic modeling. The CM identifies and describes qualitative associations between disease attributes, progression and outcomes. Methods. A literature review was performed to identify any published CMs or literature reporting the impact and association of COPD disease attributes with outcomes. After critical analysis of the literature, a Steering Group of experts from the disciplines of health economics, epidemiology and clinical medicine was convened to develop a draft CM, which was refined using a Delphi process. The refined CM was validated by testing for associations between attributes using data from the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE). Results. Disease progression attributes included in the final CM were history and occurrence of exacerbations, lung function, exercise capacity, signs and symptoms (cough, sputum, dyspnea), cardiovascular disease comorbidities, ‘other’ comorbidities (including depression), body composition (body mass index), fibrinogen as a biomarker, smoking and demographic characteristics (age, gender). Mortality and health-related quality of life were determined to be the most relevant final outcome measures for this model, intended to be the foundation of an economic model of COPD. Conclusion. The CM is being used as the foundation for developing a new COPD model of disease progression and to provide a framework for the analysis of patient-level data. The CM is available as a reference for the implementation of further disease progression and economic models.
Background. Modeling breast cancer progression and the effect of various risk factors is helpful in deciding when a woman should start and end screening, and how often the screening should be undertaken. Methods. We modeled the natural progression of breast cancer using a hidden Markov process, and incorporated the effects of covariates. Patients are women aged 50–59 (older) and 40–49 (younger) years from the Canadian National Breast Screening Studies. We included prevalent cancers, estimated the screening sensitivities and rates of over-diagnosis, and validated the models using simulation. Results. We found that older women have a higher rate of transition from a healthy to preclinical state and other causes of death but a lower rate of transition from preclinical to clinical state. Reciprocally, younger women have a lower rate of transition from a healthy to preclinical state and other causes of death but a higher rate of transition from a preclinical to clinical state. Different risk factors were significant for the age groups. The mean sojourn times for older and younger women were 2.53 and 2.96 years, respectively. In the study group, the sensitivities of the initial physical examination and mammography for older and younger women were 0.87 and 0.81, respectively, and the sensitivities of the subsequent screens were 0.78 and 0.53, respectively. In the control groups, the sensitivities of the initial physical examination for older and younger women were 0.769 and 0.671, respectively, and the sensitivity of the subsequent physical examinations for the control group aged 50–59 years was 0.37. The upper-bounds for over-diagnosis in older and younger women were 25% and 27%, respectively. Conclusions. The present work offers a basis for the better modelling of cancer incidence for a population with the inclusion of prevalent cancers.
Background. Despite its widespread advocacy, shared decision making (SDM) is not routinely used for cancer screening. To better understand the implementation barriers, we describe primary care physicians’ (PCPs’) support for SDM across diverse cancer screening contexts. Methods. Surveys were mailed to a random sample of USA-based PCPs. Using multivariable logistic regression analyses, we tested for associations of PCPs’ support of SDM with the US Preventive Services Task Force (USPSTF) assigned recommendation grade, whether the decision pertained to not screening older patients, and the PCPs’ autonomous v. controlled motivation-orientation for using SDM. Results. PCPs (n = 278) were, on average, aged 52 years, 38% female, and 69% white. Of these, 79% endorsed discussing screening benefits as very important to SDM; 64% for discussing risks; and 31% for agreeing with patient’s opinion. PCPs were most likely to rate SDM as very important for colorectal cancer screening in adults aged 50–75 years (69%), and least likely for colorectal cancer screening in adults aged >85 years (34%). Regression results indicated the importance of PCPs’ having autonomous or self-determined reasons for engaging in SDM (e.g., believing in the benefits of SDM) (OR = 2.29, 95% CI, 1.87 to 2.79). PCPs’ support for SDM varied by USPSTF recommendation grade (overall contrast, χ² = 14.7; P = 0.0054), with support greatest for A-Grade recommendations. Support for SDM was lower in contexts where decisions pertained to not screening older patients (OR = 0.45, 95% CI, 0.35 to 0.56). Limitations. It is unknown whether PCPs’ perceptions of the importance of SDM behaviors differ with specific screening decisions, and the generalizability of the findings may be limited. Conclusions. 
Our results highlight the need to document SDM benefits and consider the specific contextual challenges, such as the level of uncertainty or whether evidence supports recommending/not recommending screening, when implementing SDM across an array of cancer screening contexts.
Mortality rates in Markov models, as used in health economic studies, are often estimated from summary statistics that allow limited adjustment for confounders. If interventions are targeted at multiple diseases and/or risk factors, these mortality rates need to be combined in a single model. This requires them to be mutually adjusted to avoid ‘double counting’ of mortality. We present a mathematical modeling approach to describe the joint effect of mutually dependent risk factors and chronic diseases on mortality in a consistent manner. Most importantly, this approach explicitly allows the use of readily available external data sources. An additional advantage is that existing models can be smoothly expanded to encompass more diseases/risk factors. To illustrate the usefulness of this method and how it should be implemented, we present a health economic model that links risk factors for diseases to mortality from these diseases, and describe the causal chain running from these risk factors (e.g., obesity) through to the occurrence of disease (e.g., diabetes, CVD) and death. Our results suggest that these adjustment procedures may have a large impact on estimated mortality rates. An improper adjustment of the mortality rates could result in an underestimation of disease prevalence and, therefore, disease costs.
The choice of a cycle length in state-transition models should be determined by the frequency of clinical events and interventions. Sometimes there is need to decrease the cycle length of an existing state-transition model to reduce error in outcomes resulting from discretization of the underlying continuous-time phenomena or to increase the cycle length to gain computational efficiency. Cycle length conversion is also frequently required if a new state-transition model is built using observational data that have a different measurement interval than the model’s cycle length. We show that a commonly used method of converting transition probabilities to different cycle lengths is incorrect and can provide imprecise estimates of model outcomes. We present an accurate approach that is based on finding the root of a transition probability matrix using eigendecomposition. We present underlying mathematical challenges of converting cycle length in state-transition models and provide numerical approximation methods when the eigendecomposition method fails. Several examples and analytical proofs show that our approach is more general and leads to more accurate estimates of model outcomes than the commonly used approach. MATLAB codes and a user-friendly online toolkit are made available for the implementation of the proposed methods.
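The eigendecomposition idea described in the preceding abstract can be illustrated with a short sketch. It assumes the transition matrix is diagonalizable and that a real, nonnegative root exists; the function name `convert_cycle_length` and the 3-state annual matrix are hypothetical and are not taken from the article or its MATLAB toolkit.

```python
import numpy as np


def convert_cycle_length(P, ratio):
    """Raise a transition matrix to a fractional power via eigendecomposition.

    ratio = new cycle length / old cycle length (e.g. 1/12 for annual -> monthly).
    Uses P = V diag(w) V^-1, so P**ratio = V diag(w**ratio) V^-1.
    """
    P = np.asarray(P, dtype=float)
    eigvals, eigvecs = np.linalg.eig(P)
    powered = eigvecs @ np.diag(eigvals.astype(complex) ** ratio) @ np.linalg.inv(eigvecs)
    if np.abs(powered.imag).max() > 1e-9:
        raise ValueError("no real-valued root obtained from the eigendecomposition")
    root = powered.real
    if root.min() < -1e-12:
        raise ValueError("root has negative entries and is not a valid transition matrix")
    return root


# Hypothetical annual matrix: healthy -> sick -> dead (absorbing).
annual = np.array([[0.80, 0.15, 0.05],
                   [0.00, 0.70, 0.30],
                   [0.00, 0.00, 1.00]])
monthly = convert_cycle_length(annual, 1 / 12)

# Applying the monthly matrix 12 times should recover the annual matrix.
print(np.allclose(np.linalg.matrix_power(monthly, 12), annual))
```

The two `ValueError` branches mark the cases in which a sketch like this breaks down and some other approximation would be needed.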
Background. Visual aids tend to help diverse and vulnerable individuals understand risk communications, as long as these individuals have a basic understanding of graphs (i.e., graph literacy). Tests of objective graph literacy (OGL) can effectively identify individuals with limited skills, highlighting vulnerabilities and facilitating custom-tailored risk communication. However, the administration of these tests can be time-consuming and may evoke negative emotional reactions (e.g., anxiety). Objectives. To evaluate a brief and easy-to-use assessment of subjective graph literacy (SGL) (i.e., self-reported ability to process and use graphically presented information) and to estimate the robustness and validity of the SGL scale and compare it with the leading OGL scale in diverse samples from different cultures. Participants. Demographically diverse residents (n = 470) of the United States, young adults (n = 172) and patients (n = 175) from Spain, and surgeons (n = 175) from 48 countries. Design. A focus group and 4 studies for instrument development and initial validation (study 1), reliability and convergent and discriminant validity evaluation (study 2), and predictive validity estimation (studies 3 and 4). Measures. Psychometric properties of the scale. Results. In about 1 minute, the SGL scale provides a reliable, robust, and valid assessment of skills and risk communication preferences and evokes fewer negative emotional reactions than the OGL scale. Conclusions. The SGL scale can be suitable for use in clinical research and may be useful as a communication aid in clinical practice. Theoretical mechanisms involved in SGL, emerging applications, limitations, and open questions are discussed.
Background. Physicians’ recommendations affect patients’ treatment choices. However, most research relies on physicians’ or patients’ retrospective reports of recommendations, which offer a limited perspective and are subject to recall bias. Objective. To develop a reliable and valid method to measure the strength of physician recommendations using direct observation of clinical encounters. Methods. Clinical encounters (n = 257) were recorded as part of a larger study of prostate cancer decision making. We used an iterative process to create the 5-point Physician Recommendation Coding System (PhyReCS). To determine reliability, research assistants double-coded 50 transcripts. To establish content validity, we used 1-way analyses of variance to determine whether relative treatment recommendation scores differed as a function of which treatment patients received. To establish concurrent validity, we examined whether patients’ perceived treatment recommendations matched our coded recommendations. Results. The PhyReCS was highly reliable (Krippendorff’s alpha = 0.89, 95% CI [0.86, 0.91]). The average relative treatment recommendation score for each treatment was higher for individuals who received that particular treatment. For example, the average relative surgery recommendation score was higher for individuals who received surgery versus radiation (mean difference = 0.98, SE = 0.18, P < 0.001) or active surveillance (mean difference = 1.10, SE = 0.14, P < 0.001). Patients’ perceived recommendations matched coded recommendations 81% of the time. Conclusion. The PhyReCS is a reliable and valid way to capture the strength of physician recommendations. We believe that the PhyReCS would be helpful for other researchers who wish to study physician recommendations, an important part of patient decision making.
Background. Policy evaluations taking a lifetime horizon have converted estimated changes in short-term mortality to expected life year gains using general population life expectancy. However, the life expectancy of the affected patients may differ from the general population. In trials, survival models are commonly used to extrapolate life year gains. The objective was to demonstrate the feasibility and materiality of using parametric survival models to extrapolate future survival in health care policy evaluations. Methods. We used our previous cost-effectiveness analysis of a pay-for-performance program as a motivating example. We first used the cohort of patients admitted prior to the program to compare 3 methods for estimating remaining life expectancy. We then used a difference-in-differences framework to estimate the life year gains associated with the program using general population life expectancy and survival models. Patient-level data from Hospital Episode Statistics was utilized for patients admitted to hospitals in England for pneumonia between 1 April 2007 and 31 March 2008 and between 1 April 2009 and 31 March 2010, and linked to death records for the period from 1 April 2007 to 31 March 2011. Results. In our cohort of patients, using parametric survival models rather than general population life expectancy figures reduced the estimated mean life years remaining by 30% (9.19 v. 13.15 years, respectively). However, the estimated mean life year gains associated with the program are larger using survival models (0.380 years) compared to using general population life expectancy (0.154 years). Conclusions. Using general population life expectancy to estimate the impact of health care policies can overestimate life expectancy but underestimate the impact of policies on life year gains. Using a longer follow-up period improved the accuracy of estimated survival and program impact considerably.
Objective. Survival extrapolation using a single, best-fit model ignores 2 sources of model uncertainty: uncertainty in the true underlying distribution and uncertainty about the stability of the model parameters over time. Bayesian model averaging (BMA) has been used to account for the former, but it can also account for the latter. We investigated BMA using a published comparison of the Charnley and Spectron hip prostheses using the original 8-year follow-up registry data. Methods. A wide variety of alternative distributions were fitted. Two additional distributions were used to address uncertainty about parameter stability: optimistic and skeptical. The optimistic (skeptical) model represented the model distribution with the highest (lowest) estimated probabilities of survival but reestimated using, as prior information, the most optimistic (skeptical) parameter estimated for intermediate follow-up periods. Distributions were then averaged assuming the same posterior probabilities for the optimistic, skeptical, and noninformative models. Cost-effectiveness was compared using both the original 8-year and extended 16-year follow-up data. Results. We found that all models obtained similar revision-free years during the observed period. In contrast, there was variability over the decision time horizon. Over the observed period, we detected considerable uncertainty in the shape parameter for Spectron. After BMA, Spectron was cost-effective at a threshold of £20,000 with 93% probability, whereas the best-fit model was 100%; by contrast, with a 16-year follow-up, it was 0%. Conclusions. This case study casts doubt on the ability of the single best-fit model selected by information criteria statistics to adequately capture model uncertainty. Under this scenario, BMA weighted by posterior probabilities better addressed model uncertainty. However, there is still value in regularly updating health economic models, even where decision uncertainty is low.
Background. Health state preferences vary among countries, and country-specific value sets are important in health care reimbursement decisions. When decisions are made at the regional level, regional variation in health state preferences may be important. We propose that shrinkage analysis and Bland-Altman plots can be a helpful way to investigate regional variation. Methods. The presence of regional variation can be investigated by introducing interactions between regions and the regression coefficients in the scoring algorithm. When variation is present, regional scoring algorithms can be derived through shrinkage analysis. The impact of using regional algorithms in place of the national algorithm can be investigated using simulation and illustrated using Bland-Altman plots. We applied this methodological approach to the Canadian EQ-5D-5L valuation study, which used time-tradeoff (TTO) tasks to elicit health state preferences from 1073 participants from 4 regions (Alberta, British Columbia, Ontario, and Quebec). Results. There were statistically significant interactions between the fixed effects of the scoring algorithm and region. On computing regional scoring algorithms and applying them to the EQ-5D-5L health states reported by our population, the mean utility using the Canada-wide scoring algorithm was 0.87 (standard error, 0.0013), compared to 0.85 (0.0013) on using the algorithm for Alberta, 0.80 (0.0013) on using the algorithm for British Columbia, 0.91 (0.0013) for Ontario, and 0.89 (0.0014) for Quebec. Conclusions. When health care falls under regional jurisdiction, shrinkage estimators can be used to generate regional scoring algorithms for the EQ-5D-5L and Bland-Altman plots used to assess the importance of regional variation in health state preferences. Our results suggest that mean health state preferences vary among Canada’s regions and make a sizable impact on estimates of population mean utility.
Background. Although the risk factors that contribute to postoperative complications are well recognized, prediction in the context of a particular patient is more difficult. We were interested in using a visual analog scale (VAS) to capture surgeons’ prediction of the risk of a major complication and to examine whether this could be improved. Methods. The study was performed in 3 stages. In phase I, the surgeon assessed the risk of a major complication on a 100-mm VAS immediately before and after surgery. A quality control questionnaire was designed to check if the VAS was being scored as a linear scale. In phase II, a VAS with 6 subscales for different areas of clinical risk was introduced. In phase III, predictions were completed following the presentation of detailed feedback on the accuracy of prediction of complications. Results. In total, 1295 predictions were made by 58 surgeons in 859 patients. Eight surgeons did not use a linear scale (6 logarithmic, 2 used 4 categories of risk). Surgeons made a meaningful prediction of major complications (preoperative median score 40 mm for complications v. 22 mm for no complication, P < 0.001; postoperative 46 mm v. 21 mm, P < 0.001). In phase I, the discrimination of prediction for preoperative (0.778), postoperative (0.810), and POSSUM (Physiological and Operative Severity Score for the Enumeration of Mortality and Morbidity) morbidity (0.750) prediction was similar. Although there was no improvement in prediction with a multidimensional VAS, there was a significant improvement in the discrimination of prediction after feedback (preoperative, 0.895; postoperative, 0.918). Conclusion. Awareness of different ways a VAS is scored is important when designing and interpreting studies. Clinical assessment of major complications by the surgeon was initially comparable to the prediction of the POSSUM morbidity score and improved significantly following the presentation of clinically relevant feedback.
Background. Order and amount of information influence patients’ risk perceptions, but most studies have evaluated patients’ reactions to written materials. The objective of this study was to examine the effect of 4 communication strategies, varying in their order and/or amount of information, on judgments related to an audible description of a new medication and among patients who varied in subjective numeracy. Methods. We created 5 versions of a hypothetical scenario describing a new medication. The versions were composed to elucidate whether order and/or amount of the information describing benefits and adverse events influenced how subjects valued a new medication. After listening to a randomly assigned version, perceived medication value was measured by asking subjects to choose one of the following statements: the risks outweigh the benefits, the risks and benefits are equally balanced, or the benefits outweigh the risks. Results. Of the 432 patients contacted, 389 participated in the study. Listening to a brief description of benefits followed by an extended description of adverse events resulted in a greater likelihood of perceiving that the medication’s benefits outweighed the risks compared with 1) presenting the extended adverse events description before the benefits, 2) giving a greater amount of information related to benefits, and 3) sandwiching the adverse events between benefits. These associations were only observed among subjects with average or higher subjective numeracy. Conclusion. If confirmed in future studies, our results suggest that, for patients with average or better subjective numeracy, perceived medication value is highest when a brief presentation of benefits is followed by an extended description of adverse events.
Background. Parameter uncertainty in value sets of multiattribute utility-based instruments (MAUIs) has received little attention previously. This false precision leads to underestimation of the uncertainty of the results of cost-effectiveness analyses. The aim of this study is to examine the use of multiple imputation as a method to account for this uncertainty of MAUI scoring algorithms. Method. We fitted a Bayesian model with random effects for respondents and health states to the data from the original US EQ-5D-3L valuation study, thereby estimating the uncertainty in the EQ-5D-3L scoring algorithm. We applied these results to EQ-5D-3L data from the Commonwealth Fund (CWF) Survey for Sick Adults (n = 3958), comparing the standard error of the estimated mean utility in the CWF population using the predictive distribution from the Bayesian mixed-effect model (i.e., incorporating parameter uncertainty in the value set) with the standard error of the estimated mean utilities based on multiple imputation and the standard error using the conventional approach of using MAUI (i.e., ignoring uncertainty in the value set). Result. The mean utility in the CWF population based on the predictive distribution of the Bayesian model was 0.827 with a standard error (SE) of 0.011. When utilities were derived using the conventional approach, the estimated mean utility was 0.827 with an SE of 0.003, which is only 25% of the SE based on the full predictive distribution of the mixed-effect model. Using multiple imputation with 20 imputed sets, the mean utility was 0.828 with an SE of 0.011, which is similar to the SE based on the full predictive distribution. Conclusion. Ignoring uncertainty of the predicted health utilities derived from MAUIs could lead to substantial underestimation of the variance of mean utilities. Multiple imputation corrects for this underestimation so that the results of cost-effectiveness analyses using MAUIs can report the correct degree of uncertainty.
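To illustrate how multiple imputation can carry value-set uncertainty into the standard error of a mean utility, the sketch below pools per-imputation results with Rubin's rules. The abstract does not state which pooling rule was used, so Rubin's rules are an assumption here, and the function name `pool_rubin` and all numbers are hypothetical.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m per-imputation estimates and their squared standard errors.

    Returns (pooled estimate, pooled standard error) using
    total variance = within + (1 + 1/m) * between.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least 2 imputations with matching variances")
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical mean utilities from 3 imputed value sets, each with SE^2 = 1e-5.
mean_utility, se = pool_rubin([0.82, 0.83, 0.84], [1e-5, 1e-5, 1e-5])
print(f"pooled mean = {mean_utility:.3f}, pooled SE = {se:.4f}")
```

In this sketch the between-imputation term is what reflects value-set uncertainty; dropping it leaves only the within-imputation (sampling) variance.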
Background. Side effects prompt some patients to forego otherwise-beneficial therapies. This study explored which characteristics make side effects particularly aversive. Methods. We used a psychometric approach, originating from research on risk perception, to identify the factors (or components) underlying side effect perceptions. Women (N = 149) aged 40 to 74 years were recruited from a patient registry to complete an online experiment. Participants were presented with hypothetical scenarios in which an effective and necessary medication conferred a small risk of a single side effect (e.g., nausea, dizziness). They rated a broad range of side effects on several characteristics (e.g., embarrassing, treatable). In addition, we collected 4 measures of aversiveness for each side effect: choosing to take the medication, willingness to pay to avoid the side effect (WTP), negative affective attitude associated with the side effect, and how each side effect ranks among others in terms of undesirability. A principal components analysis (PCA) was used to identify the components underlying side effect perceptions. Then, for each aversiveness measure separately, regression analyses were used to determine which components predicted differences in aversiveness among the side effects. Results. The PCA revealed 4 components underlying side effect perceptions: affective challenge (e.g., frightening), social challenge (e.g., disfiguring), physical challenge (e.g., painful), and familiarity (e.g., common). Side effects perceived as affectively and physically challenging elicited the highest levels of aversiveness across all 4 measures. Conclusions. Understanding what side effect characteristics are most aversive may inform interventions to improve medical decisions and facilitate the translation of novel biomedical therapies into clinical practice.
Background. First impressions are thought to exert a disproportionate influence on subsequent judgments; however, their role in medical diagnosis has not been systematically studied. We aimed to elicit and measure the association between first impressions and subsequent diagnoses in common presentations with subtle indications of cancer. Methods. Ninety UK family physicians conducted interactive simulated consultations online, while on the phone with a researcher. They saw 6 patient cases, 3 of which could be cancers. Each cancer case included 2 consultations, whereby each patient consulted again with nonimproving and some new symptoms. After reading an introduction (patient description and presenting problem), physicians could request more information, which the researcher displayed online. In 2 of the possible cancers, physicians thought aloud. Two raters coded independently the physicians’ first utterances (after reading the introduction but before requesting more information) as either acknowledging the possibility of cancer or not. We measured the association of these first impressions with the final diagnoses and management decisions. Results. The raters coded 297 verbalizations with high interrater agreement (Kappa = 0.89). When the possibility of cancer was initially verbalized, the odds of subsequently diagnosing it were on average 5 times higher (odds ratio 4.90 [95% CI 2.72 to 8.84], P < 0.001), while the odds of appropriate referral doubled (OR 1.98 [1.10 to 3.57], P = 0.002). The number of cancer-related questions physicians asked mediated the relationship between first impressions and subsequent diagnosis, explaining 29% of the total effect. Conclusion. We measured a strong association between family physicians’ first diagnostic impressions and subsequent diagnoses and decisions. We suggest that interventions to influence and support the diagnostic process should target its early stage of hypothesis generation.
Background. Studies show adjuvant endocrine therapy increases survival and decreases risk of breast cancer recurrence for hormone receptor–positive tumors. Yet studies also suggest that adherence rates among women taking this therapy may be as low as 50% owing largely to adverse side effects. Despite these rates, research on longitudinal patient decision making regarding this therapy is scant. Objective. We sought to map the decision-making process for women considering and initiating adjuvant endocrine therapy, paying particular attention to patterns of uncertainty and decisional change over time. Methods. A longitudinal series of semistructured interviews conducted at a multispecialty health care organization in Northern California with 35 newly diagnosed patients eligible for adjuvant endocrine therapy were analyzed. Analysis led to the identification and indexing of 3 new decision-making constructs—decisional phase, decisional direction, and decisional resolve—which were then organized using a visual matrix and examined for patterns characterizing the decision-making process. Results. Our data reveal that most patients do not make a single, discrete decision to take or not take hormone therapy but rather traverse multiple decisional states, characterized by 1) phase, 2) direction, and 3) strength of resolve. Our analysis tracks these decisional states longitudinally using a grayscale-coded matrix. Our data show that decisional resolve wavers not just when considering therapy, as the existing concept of decisional conflict suggests, but even after initiating it, which may signal future decisions to forgo therapy. Conclusions. Adjuvant endocrine therapy, like other chronic care decisions, has a longer decision-making process and implementation period. Thus, theoretical, empirical, and clinical approaches should consider further exploring the new concept and measurement of decisional resolve, as it may help to improve subsequent medication adherence.
Background. Medical decision making may be influenced by contextual factors. We evaluated whether pathologists are influenced by disease severity of recently observed cases. Methods. Pathologists independently interpreted 60 breast biopsy specimens (one slide per case; 240 total cases in the study) in a prospective randomized observational study. Pathologists interpreted the same cases in 2 phases, separated by a washout period of >6 months. Participants were not informed that the cases were identical in each phase, and the sequence was reordered randomly for each pathologist and between phases. A consensus reference diagnosis was established for each case by 3 experienced breast pathologists. Ordered logit models examined the effect of the pathologists’ diagnoses of the preceding case, or of the 5 preceding cases, on their diagnosis of the subsequent index case. Results. Among 152 pathologists, 49 provided interpretive data in both phases I and II, 66 in phase I only, and 37 in phase II only. In phase I, pathologists were more likely to indicate a more severe diagnosis than the reference diagnosis when the preceding case was diagnosed as ductal carcinoma in situ (DCIS) or invasive cancer (proportional odds ratio [POR], 1.28; 95% confidence interval [CI], 1.15–1.42). Results were similar when considering the preceding 5 cases and for the pathologists in phase II who interpreted the same cases in a different order compared with phase I (POR, 1.17; 95% CI, 1.05–1.31). Conclusion. Physicians appear to be influenced by the severity of previously interpreted test cases. Understanding types and sources of diagnostic bias may lead to improved assessment of accuracy and better patient care.
This article describes methods used to estimate parameters governing long-term survival, or times to other events, for health economic models. Specifically, the focus is on methods that combine shorter-term individual-level survival data from randomized trials with longer-term external data, thus using the longer-term data to aid extrapolation of the short-term data. This requires assumptions about how trends in survival for each treatment arm will continue after the follow-up period of the trial. Furthermore, using external data requires assumptions about how survival differs between the populations represented by the trial and external data. Study reports from a national health technology assessment program in the United Kingdom were searched, and the findings were combined with "pearl-growing" searches of the academic literature. We categorized the methods that have been used according to the assumptions they made about how the hazards of death vary between the external and internal data and through time, and we discuss the appropriateness of the assumptions in different circumstances. Modeling choices, parameter estimation, and characterization of uncertainty are discussed, and some suggestions for future research priorities in this area are given.
There is much interest in understanding decision-making processes that determine funding outcomes for health interventions. We use classification and regression trees (CART) to identify cost-effectiveness thresholds and hierarchies in the determinants of funding decisions. The hierarchical structure of CART is suited to analyzing complex conditional and nonlinear relationships. Our analysis uncovered hierarchies where interventions were grouped according to their type and objective. Cost-effectiveness thresholds varied markedly depending on which group the intervention belonged to: lifestyle-type interventions with a prevention objective had an incremental cost-effectiveness threshold of $2356, suggesting that such interventions need to be close to cost saving or dominant to be funded. For lifestyle-type interventions with a treatment objective, the threshold was much higher at $37,024. Lower down the tree, intervention attributes such as the level of patient contribution and the eligibility for government reimbursement influenced the likelihood of funding within groups of similar interventions. Comparison between our CART models and previously published results demonstrated concurrence with standard regression techniques while providing additional insights regarding the role of the funding environment and the structure of decision-maker preferences.
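The way a tree-based method turns funding decisions into a cost-effectiveness threshold can be illustrated with a single split. The sketch below is a minimal illustration, not the authors' CART model: it searches for the cut point on the incremental cost-effectiveness ratio that minimizes weighted Gini impurity, which is the standard CART splitting criterion for a binary outcome. The function names and the six data points are hypothetical and were chosen only so the result is easy to verify by hand.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (1 = funded, 0 = not funded)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(icers: List[float], funded: List[int]) -> Tuple[float, float]:
    """Return (threshold, weighted impurity) for the best rule 'ICER <= threshold'."""
    pairs = sorted(zip(icers, funded))
    n = len(pairs)
    best = (float("nan"), float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point exists between identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            # Place the threshold midway between the two neighbouring values.
            best = ((lo + hi) / 2.0, score)
    return best


# Hypothetical interventions: ICER in dollars and whether each was funded.
icers = [500.0, 1500.0, 2000.0, 3000.0, 10000.0, 40000.0]
funded = [1, 1, 1, 0, 0, 0]
threshold, impurity = best_split(icers, funded)
```

A full tree repeats this search recursively within each resulting group, which is how different thresholds can emerge for different intervention types and objectives.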
Background. The National Institute for Quality and Efficiency in Health Care (IQWiG) employs an efficiency frontier (EF) framework to facilitate setting maximum reimbursable prices for new interventions. Probabilistic sensitivity analysis (PSA) is used when yes/no reimbursement decisions are sought based on a fixed threshold. In the IQWiG framework, an additional layer of complexity arises as the EF itself may vary its shape in each PSA iteration, and thus the willingness-to-pay, indicated by the EF segments, may vary. Objectives. To explore the practical problems arising when, within the EF approach, maximum reimbursable prices for new interventions are sought through PSA. Methods. When the EF is varied in a PSA, cost recommendations for new interventions may be determined by the mean or the median of the distances between each intervention’s point estimate and each EF. Implications of using these metrics were explored in a simulation study based on the model used by IQWiG to assess the cost-effectiveness of 4 antidepressants. Results. Depending on the metric used, cost recommendations can be contradictory. Recommendations based on the mean can also be inconsistent. Results (median) suggested that costs of duloxetine, venlafaxine, mirtazapine, and bupropion should be decreased by 131, 29, 12, and 99, respectively. These recommendations were implemented and the analysis repeated. New results suggested keeping the costs as they were. The percentage of acceptable PSA outcomes increased 41% on average, and the uncertainty associated with the net health benefit was significantly reduced. Conclusions. The median of the distances between every intervention outcome and every EF is a good proxy for the cost recommendation that would be given should the EF be fixed. Adjusting costs according to the median increased the probability of acceptance and reduced the uncertainty around the net health benefit distribution, resulting in reduced uncertainty for decision makers.
Background: The Evaluation of Cinacalcet HCl Therapy to Lower Cardiovascular Events (EVOLVE) clinical trial evaluated the effects of cinacalcet on clinical events in patients with secondary hyperparathyroidism (sHPT) who were on hemodialysis. Health-related quality of life (HRQoL) was assessed by a generic, preference-based health outcome measure (EQ-5D) at scheduled visits and after a study event. Here, we report the HRQoL analysis from EVOLVE. Methods: We assessed changes in HRQoL from baseline to scheduled visits, and estimated the acute (3 mo) and chronic (beyond 3 mo) effects of sHPT-related events on HRQoL using generalized estimating equation analysis controlling for baseline HRQoL and randomized assignment. Results: Data on HRQoL were available for 3547 of 3883 subjects, with 1650 events in the placebo and 1502 in the cinacalcet arm. At the study end, no difference in change from baseline HRQoL was observed in the direct comparison of EQ-5D by treatment arms. The regression analysis showed significant effects of events on HRQoL and a modest positive effect of cinacalcet. Estimated quality-adjusted life-year gains were of similar magnitude based on the observed data or the predictions from the model, with only a small gain in precision from the predicted analysis. Conclusions: By contrast with a conventional comparison, a regression analysis demonstrated large decrements in HRQoL after events and a modest improvement in HRQoL with cinacalcet. As randomized controlled trials are rarely powered to detect differences in HRQoL, a prespecified regression analysis may be acceptable to improve precision of the effects and understand their origin.
Purpose. To explore the effects of personalized prognostic information on physicians’ intentions to communicate prognosis to cancer patients at the end of life, and to identify factors that moderate these effects. Methods. A factorial experiment was conducted in which 93 family medicine physicians were presented with a hypothetical vignette depicting an end-stage gastric cancer patient seeking prognostic information. Physicians’ intentions to communicate prognosis were assessed before and after provision of personalized prognostic information, while emotional distress of the patient and ambiguity (imprecision) of the prognostic estimate were varied between subjects. General linear models were used to test the effects of personalized prognostic information, patient distress, and ambiguity on prognostic communication intentions, and potential moderating effects of 1) perceived patient distress, 2) perceived credibility of prognostic models, 3) physician numeracy (objective and subjective), and 4) physician aversion to risk and ambiguity. Results. Provision of personalized prognostic information increased prognostic communication intentions (P < 0.001, η² = 0.38), although experimentally manipulated patient distress and prognostic ambiguity had no effects. Greater change in communication intentions was positively associated with higher perceived credibility of prognostic models (P = 0.007, η² = 0.10), higher objective numeracy (P = 0.01, η² = 0.09), female sex (P = 0.01, η² = 0.08), and lower perceived patient distress (P = 0.02, η² = 0.07). Intentions to communicate available personalized prognostic information were positively associated with higher perceived credibility of prognostic models (P = 0.02, η² = 0.09), higher subjective numeracy (P = 0.02, η² = 0.08), and lower ambiguity aversion (P = 0.06, η² = 0.04). Conclusions.
Provision of personalized prognostic information increases physicians’ prognostic communication intentions to a hypothetical end-stage cancer patient, and situational and physician characteristics moderate this effect. More research is needed to confirm these findings and elucidate the determinants of prognostic communication at the end of life.
Purpose. Traditional informed consent documents tend to be too lengthy and technical to facilitate proper patient engagement. Patient-centered, short informed consent content could be equally informative, while minimizing patient burden and producing greater patient engagement. This study aimed to develop and evaluate patient-centered, patient-designed paper and video informed consent formats. Methods. Two studies were conducted. In study 1, 118 self-identifying asthma patients recruited from a national, online pool completed survey tasks from their personal computers. Participants in study 1 were randomly assigned to examine sections of a standard informed consent document for an asthma trial and to select information they deemed critical to their decision making. In study 2, a sample of 83 self-identifying asthma patients completed experimental tasks in a university laboratory. Participants in study 2 were randomly assigned to a full informed consent document; a shortened, patient-designed informed consent document created from study 1; or a video with content matched to the shortened paper form. Results. Study 1 yielded a more readable, concise version of a standard informed consent document (5 v. 17 pages). This shortened, patient-designed form closely met normative criteria for good clinical practice. In study 2, participants who viewed either the shortened paper consent or video reported greater engagement than those viewing the standard paper consent, without lowered performance on any other decision-relevant variables (i.e., comprehension, judged risk/benefit, feelings of trust). The video consent format did not cause increased enrollment. Conclusions. 
Results suggest that providing concise informed consent content, systematically developed from patients’ self-reported information needs, may be more effective at engaging and informing clinical trial participants than the traditional consent approach, without detriment to trial comprehension, risk assessment, or enrollment.
Background. Although drug-eluting stents (DES) have been widely incorporated into clinical practice in developed countries, several countries restrict their use mainly because of their high cost and unfavorable incremental cost-effectiveness ratios (ICER). Objective. To evaluate the cost-effectiveness of DES in comparison with bare-metal stents (BMS) for treatment of coronary artery disease (CAD). Design. Markov model. Data Sources. Published literature, government database, and CAD patient cohort. Target Population. Single-vessel CAD patients. Time Horizon. One year and lifetime. Perspective. Brazilian Public Health System (SUS). Intervention. Six strategies composed of percutaneous intervention with a BMS or 1 of 5 DES (paclitaxel, sirolimus, everolimus, zotarolimus, and zotarolimus resolute). Outcome Measures. Cost per target vessel revascularization avoided and cost per quality-adjusted life-year gained. Base Case Analysis. In the short-term analysis, sirolimus was the most effective and least costly among DES (ICER of I$20,642 per target vessel revascularization avoided), with all other DES dominated by sirolimus. Lifetime cumulative costs ranged from I$18,765 to I$21,400. In the base case analysis, zotarolimus resolute had the most favorable ICER among the DES (ICER I$62,761), with sirolimus, paclitaxel, and zotarolimus being absolutely dominated and everolimus extended dominated by zotarolimus resolute, although all the results were above the willingness-to-pay threshold of 3 times the gross domestic product per capita (I$35,307). Sensitivity Analysis. In deterministic sensitivity analysis, results were sensitive to cost of DES, number of stents used per patient, baseline probability, and duration of stent thrombosis risk. The probabilistic sensitivity analysis demonstrated a probability of 81% for BMS being the strategy of choice, with 9% for everolimus and 9% for zotarolimus resolute, at the willingness-to-pay threshold. Conclusion.
DES are not good value for money from the SUS perspective, despite their benefit in reducing target vessel revascularization. Since the cost-effectiveness of DES is mainly driven by the stents’ cost difference, they should cost less than twice the BMS price to become a cost-effective alternative.
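The terms "absolutely dominated" and "extended dominated" follow from the standard incremental analysis used to compare several mutually exclusive strategies. The sketch below shows that procedure under stated assumptions: the function name and the four strategies (labeled A to D, with made-up costs and effects) are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Optional, Tuple


def incremental_analysis(
    strategies: Dict[str, Tuple[float, float]]
) -> List[Tuple[str, Optional[float]]]:
    """Map name -> (cost, effect) to the non-dominated strategies with their ICERs."""
    # Sort by cost; for equal cost, put the more effective strategy first.
    ordered = sorted(strategies.items(), key=lambda kv: (kv[1][0], -kv[1][1]))

    # Absolute (strong) dominance: costs at least as much, no more effective.
    kept: List[Tuple[str, float, float]] = []
    for name, (cost, effect) in ordered:
        if kept and effect <= kept[-1][2]:
            continue
        kept.append((name, cost, effect))

    # Extended dominance: ICER higher than that of the next, more effective option.
    changed = True
    while changed:
        changed = False
        for i in range(1, len(kept) - 1):
            icer_here = (kept[i][1] - kept[i - 1][1]) / (kept[i][2] - kept[i - 1][2])
            icer_next = (kept[i + 1][1] - kept[i][1]) / (kept[i + 1][2] - kept[i][2])
            if icer_here > icer_next:
                del kept[i]
                changed = True
                break

    # The cheapest strategy is the reference and has no ICER.
    result: List[Tuple[str, Optional[float]]] = [(kept[0][0], None)]
    for prev, cur in zip(kept, kept[1:]):
        result.append((cur[0], (cur[1] - prev[1]) / (cur[2] - prev[2])))
    return result


# Hypothetical strategies: (cost, effect in QALYs).
example = {"A": (100.0, 1.0), "B": (200.0, 0.9), "C": (300.0, 1.1), "D": (350.0, 1.3)}
frontier = incremental_analysis(example)
```

In this made-up example B is absolutely dominated by A, and C is extended dominated because its ICER against A exceeds the ICER of D against C. Each remaining ICER would then be compared with the willingness-to-pay threshold.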
Objective. We explored whether active patient involvement in decision making and greater patient knowledge are associated with better treatment decision-making experiences and better quality of life (QOL) among men with clinically localized prostate cancer. Localized prostate cancer treatment decision making is an advantageous model for studying patient treatment decision-making dynamics because there are multiple treatment options and a lack of empirical evidence to recommend one over the other; consequently, it is recommended that patients be fully involved in making the decision. Methods. Men with newly diagnosed clinically localized prostate cancer (N = 1529) completed measures of decisional control, prostate cancer knowledge, and decision-making experiences (decisional conflict and decision-making satisfaction and difficulty) shortly after they made their treatment decision. Prostate cancer–specific QOL was assessed at 6 months after treatment. Results. More active involvement in decision making and greater knowledge were associated with lower decisional conflict and higher decision-making satisfaction but greater decision-making difficulty. An interaction between decisional control and knowledge revealed that greater knowledge was only associated with greater difficulty for men actively involved in making the decision (67% of sample). Greater knowledge, but not decisional control, predicted better QOL 6 months after treatment. Conclusions. Although men who are actively involved in decision making and more knowledgeable may make more informed decisions, they could benefit from decisional support (e.g., decision-making aids, emotional support from providers, strategies for reducing emotional distress) to make the process easier. Men who were more knowledgeable about prostate cancer and treatment side effects at the time that they made their treatment decision may have appraised their QOL as higher because they had realistic expectations about side effects.
Background. Despite a gradual reduction in the workload during residency, 24-hour calls are still an integral part of most training programs. While sleep deprivation increases risk propensity, its impact on medical risk taking has not been studied. Objective. This study aimed to assess the clinical decision making and psychomotor performance of pediatric residents following a limited nap time during a 24-hour call. Methods. A neurocognitive battery (IntegNeuro) and a medical decision questionnaire were completed by 44 pediatric residents at 2 time points: after a 24-hour call and following 3 nights with no calls (sleep ≥5 hours). To monitor sleep, residents wore actigraphs and completed sleep logs. Results. Nap time during the shift was <1 hour in 14 cases (32%), 1 to 2 hours in 16 cases (35%), and 2 to 3 hours in 14 cases (32%). Residents who napped less than 1 hour chose the riskier medical option in 50% of cases compared with 36% when answering the same questionnaire after 3 nights with no calls (P = 0.002). This effect was not found in residents who napped 1 to 2 hours (no change in risk taking) or 2 to 3 hours (4% decreased risk taking) (difference between groups, P = 0.001). Risk-taking tendency inversely correlated with sustained attention scores (Pearson r = –0.433, P = 0.003). Sustained attention was the neurocognitive domain most affected by sleep deprivation (effect size = 0.29, P = 0.025). Conclusions. This study suggests that residents napping less than an hour during a night shift are prone to riskier clinical decisions. Hence, enabling residents to nap at least 1 hour during shifts is recommended.
The classification tree is a valuable methodology for predictive modeling and data mining. However, existing classification trees ignore the possibility that a subset of individuals cannot be well classified from the given set of predictor variables and would therefore be classified with a higher error rate; moreover, most existing classification trees do not use combinations of variables at each step. An algorithm for a logistic regression–based trichotomous classification tree (LRTCT) is proposed that employs a trichotomous tree structure and linear combinations of predictor variables in the recursive partitioning process. Compared with the widely used classification and regression tree in applications to a series of simulated data sets and 2 real data sets, the LRTCT performed better in several respects and did not require excessively complicated calculations.
Background. More than 1 in 4 Americans report difficulty paying medical bills. Cost-reducing strategies discussed during outpatient physician visits remain poorly characterized. Objective. We sought to determine how often patients and physicians discuss health care costs during outpatient visits and what strategies, if any, they discussed to lower patient out-of-pocket costs. Design. Retrospective analysis of dialogue from 1755 outpatient visits in community-based practices nationwide from 2010 to 2014. The study population included 677 patients with breast cancer, 422 with depression, and 656 with rheumatoid arthritis visiting 56 oncologists, 36 psychiatrists, and 26 rheumatologists, respectively. Results. Thirty percent of visits contained cost conversations (95% confidence interval [CI], 28 to 32). Forty-four percent of cost conversations involved discussion of cost-saving strategies (95% CI, 40 to 48; median duration, 68 s). We identified 4 strategies to lower costs without changing the care plan. They were, in order of overall frequency: 1) changing logistics of care, 2) facilitating co-pay assistance, 3) providing free samples, and 4) changing/adding insurance plans. We also identified 4 strategies to reduce costs by changing the care plan: 1) switching to lower-cost alternative therapy/diagnostic, 2) switching from brand name to generic, 3) changing dosage/frequency, and 4) stopping/withholding interventions. Strategies were relatively consistent across health conditions, except for switching to a lower-cost alternative (more common in breast oncology) and providing free samples (more common in depression). Limitation. Focus on 3 conditions with potentially high out-of-pocket costs. Conclusions. Despite price opacity, physicians and patients discuss a variety of out-of-pocket cost reduction strategies during clinic visits. 
Almost half of cost discussions mention 1 or more cost-saving strategies, with more frequent mention of those not requiring care-plan changes.
Research indicates that there is a preference for natural v. synthetic products, but the influence of this preference on drug choice in the medical domain is largely unknown. We present 5 studies in which participants were asked to consider a hypothetical situation in which they had a medical issue requiring pharmacological therapy. Participants (N = 1223) were asked to select a natural (plant-derived) or synthetic drug. In studies 1a and 1b, approximately 79% of participants selected the natural over the synthetic drug, even though the safety and efficacy of the drugs were identical. Furthermore, participants rated the natural drug as safer than the synthetic drug, and as that difference increased, the odds of choosing the natural over the synthetic drug increased. In studies 2 and 3, approximately 20% of participants selected the natural drug even when they were informed that it was less safe (study 2) or less effective (study 3) than the synthetic drug. Finally, in study 4, approximately 65% of participants chose a natural over a synthetic drug regardless of the severity of a specific medical condition (mild v. severe hypertension), and this choice was predicted by perceived safety and efficacy differences. Overall, these data indicate that there is a bias for natural over synthetic drugs. This bias could have implications for drug choice and usage.
Background. To develop statistical models predicting disease progression and outcomes in chronic obstructive pulmonary disease (COPD), using data from ECLIPSE, a large, observational study of current and former smokers with COPD. Methods. Based on a conceptual model of COPD disease progression and data from 2164 patients, associations were made between baseline characteristics, COPD disease progression attributes (exacerbations, lung function, exercise capacity, and symptoms), health-related quality of life (HRQoL), and survival. Linear and nonlinear functional forms of random intercept models were used to characterize these relationships. Endogeneity was addressed by time-lagging variables in the regression models. Results. At the 5% significance level, an exacerbation history in the year before baseline was associated with increased risk of future exacerbations (moderate: +125.8%; severe: +89.2%) and decline in lung function (forced expiratory volume in 1 second [FEV1]) (–94.20 mL per year). Each 1% increase in FEV1 % predicted was associated with decreased risk of exacerbations (moderate: –1.1%; severe: –3.0%) and increased 6-minute walk test distance (6MWD) (+1.5 m). Increases in baseline exercise capacity (6MWD, per meter) were associated with slightly increased risk of moderate exacerbations (+0.04%) and increased FEV1 (+0.62 mL). Symptoms (dyspnea, cough, and/or sputum) were associated with an increased risk of moderate exacerbations (+13.4% to +31.1%), and baseline dyspnea (modified Medical Research Council score ≥2 v. <2) was associated with lower FEV1 (–112.3 mL). Conclusions. A series of linked statistical regression equations have been developed to express associations between indicators of COPD disease severity and HRQoL and survival. These can be used to represent disease progression, for example, in new economic models of COPD.
Background. Influenza vaccination is strongly associated with socioeconomic status, but there is only limited evidence on the respective roles of socioeconomic differences in vaccination intentions versus corresponding differences in follow-through on initial vaccination plans for subsequent socioeconomic differences in vaccine uptake. Methods. Nonparametric mean smoothing, linear regression, and probit models were used to analyze longitudinal survey data on perceived influenza risks, behavioral vaccination intentions, and vaccination behavior of adults during the 2009–2010 influenza A/H1N1 ("swine flu") pandemic in the United States. Perceived influenza risks and behavioral vaccination intentions were elicited prior to the availability of H1N1 vaccine using a probability scale question format. H1N1 vaccine uptake was assessed at the end of the pandemic. Results. Education, income, and health insurance coverage displayed positive associations with behavioral intentions to get vaccinated for pandemic influenza while employment was negatively associated with stated H1N1 vaccination intentions. Education and health insurance coverage also displayed significant positive associations with pandemic vaccine uptake. Moreover, behavioral vaccination intentions showed a strong and statistically significant positive partial association with later H1N1 vaccination. Incorporating vaccination intentions in a statistical model for H1N1 vaccine uptake further highlighted higher levels of follow-through on initial vaccination plans among persons with higher education levels and health insurance. Limitations. Sampling bias, misreporting in self-reported data, and limited generalizability to nonpandemic influenza are potential limitations of the analysis. Conclusions. 
Closing the socioeconomic gap in influenza vaccination requires multipronged strategies that not only increase vaccination intentions by improving knowledge, attitudes, and beliefs but also facilitate follow-through on initial vaccination plans by improving behavioral control and access to vaccination for individuals with low education, employed persons, and the uninsured.
Background. Probabilistic sensitivity analyses (PSA) may lead policy makers to take nonoptimal actions due to misestimates of decision uncertainty caused by ignoring correlations. We developed a method to establish joint uncertainty distributions of quality-of-life (QoL) weights exploiting ordinal preferences over health states. Methods. Our method takes as inputs independent, univariate marginal distributions for each QoL weight and a preference ordering. It establishes a correlation matrix between QoL weights intended to preserve the ordering. It samples QoL weight values from their distributions, ordering them with the correlation matrix. It calculates the proportion of samples violating the ordering, iteratively adjusting the correlation matrix until this proportion is below an arbitrarily small threshold. We compare our method with the uncorrelated method and other methods for preserving rank ordering in terms of violation proportions and fidelity to the specified marginal distributions along with PSA and expected value of partial perfect information (EVPPI) estimates, using 2 models: 1) a decision tree with 2 decision alternatives and 2) a chronic hepatitis C virus (HCV) Markov model with 3 alternatives. Results. All methods make tradeoffs between violating preference orderings and altering marginal distributions. For both models, our method simultaneously performed best, with largest performance advantages when distributions reflected wider uncertainty. For PSA, larger changes to the marginal distributions induced by existing methods resulted in differing conclusions about which strategy was most likely optimal. For EVPPI, both preference order violations and altered marginal distributions caused existing methods to misestimate the maximum value of seeking additional information, sometimes concluding that there was no value. Conclusions. 
Analysts can characterize the joint uncertainty in QoL weights to improve PSA and value-of-information estimates using Open Source implementations of our method.
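The iterative step described in the Methods (sample, count ordering violations, adjust the correlation, repeat) can be sketched for the simplest case of two QoL weights. This is a minimal illustration under loud assumptions, not the authors' Open Source implementation: it uses normal marginals with equal standard deviations (QoL weights are bounded, so real analyses would typically use other distributions), a single correlation coefficient instead of a matrix, and a fixed step size. All names and numbers are hypothetical.

```python
import math
import random
from typing import Tuple


def violation_rate(rho: float, n: int, rng: random.Random,
                   mean_better: float, mean_worse: float, sd: float) -> float:
    """Share of joint draws in which the 'worse' state gets the higher weight."""
    violations = 0
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation rho to z1, so each
        # marginal distribution is unchanged by the induced correlation.
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        q_better = mean_better + sd * z1
        q_worse = mean_worse + sd * z2
        if q_worse > q_better:
            violations += 1
    return violations / n


def calibrate_rho(threshold: float = 0.01, step: float = 0.05,
                  n: int = 20000, seed: int = 1) -> Tuple[float, float]:
    """Raise rho in fixed steps until the violation rate falls below threshold."""
    rng = random.Random(seed)
    k = 0
    while True:
        rho = min(1.0, k * step)
        rate = violation_rate(rho, n, rng, 0.80, 0.70, 0.05)
        if rate < threshold or rho >= 1.0:
            return rho, rate
        k += 1


rho, rate = calibrate_rho()
```

With independent draws (rho = 0) a noticeable share of samples violates the ordering; raising the correlation reduces violations without altering either marginal, which is the tradeoff the abstract highlights.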
Background. Lower numerical ability is associated with poorer understanding of health statistics, such as risk reductions of medical treatment. For many people, despite good numeracy skills, math provokes anxiety that impedes an ability to evaluate numerical information. Math-anxious individuals also report less confidence in their ability to perform math tasks. We hypothesized that, independent of objective numeracy, math anxiety would be associated with poorer responding and lower confidence when calculating risk reductions of medical treatments. Methods. Objective numeracy was assessed using an 11-item objective numeracy scale. A 13-item self-report scale was used to assess math anxiety. In experiment 1, participants were asked to interpret the baseline risk of disease and risk reductions associated with treatment options. Participants in experiment 2 were additionally provided a graphical display designed to facilitate the processing of math information and alleviate effects of math anxiety. Confidence ratings were provided on a 7-point scale. Results. Individuals of higher objective numeracy were more likely to respond correctly to baseline risks and risk reductions associated with treatment options and were more confident in their interpretations. Individuals who scored high in math anxiety were instead less likely to correctly interpret the baseline risks and risk reductions and were less confident in their risk calculations as well as in their assessments of the effectiveness of treatment options. Math anxiety predicted confidence levels but not correct responding when controlling for objective numeracy. The graphical display was most effective in increasing confidence among math-anxious individuals. Conclusions. The findings suggest that math anxiety is associated with poorer medical risk interpretation but is more strongly related to confidence in interpretations.
We present a generalized model to assess the impact of regionalization on patient care outcomes in the presence of heterogeneity in provider learning. The model characterizes best regionalization policies as optimal allocations of patients across providers with heterogeneous learning abilities. We explore issues that arise when solving for the best regionalization, which depends on statistically estimated provider learning curves. We explain how to maintain the problem’s tractability and reformulate it as a binary integer programming problem to improve solvability. Using our model, best regionalization solutions can be computed within reasonable time on current-day computers. We apply the model to minimally invasive radical prostatectomy and estimate that, in comparison to current care delivery, within-state regionalization can shorten length of stay by at least 40.8%.
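One way to see how an allocation problem with nonlinear learning curves can be written with binary variables is to give each provider one indicator per possible volume level, with exactly one level selected per provider. The sketch below illustrates that idea by brute-force enumeration on a toy instance. It is an assumption-laden illustration and not the authors' formulation: the power-law learning curve, the parameter values, the capacities, and the function names are all hypothetical.

```python
from itertools import product
from typing import Sequence, Tuple


def total_los(volume: int, a: float, b: float) -> float:
    """Total length of stay at one provider, assuming per-patient LOS = a * volume**(-b)."""
    if volume == 0:
        return 0.0
    return volume * a * volume ** (-b)


def best_allocation(n_patients: int,
                    params: Sequence[Tuple[float, float]],
                    capacity: Sequence[int]) -> Tuple[Tuple[int, ...], float]:
    """Enumerate one volume level per provider and keep the cheapest feasible choice."""
    best_cost = float("inf")
    best_alloc: Tuple[int, ...] = ()
    # Each tuple selects exactly one volume level per provider, which is
    # equivalent to setting one binary indicator y[j][v] to 1 for each j.
    for alloc in product(*(range(cap + 1) for cap in capacity)):
        if sum(alloc) != n_patients:
            continue  # every patient must be assigned to some provider
        cost = sum(total_los(v, a, b) for v, (a, b) in zip(alloc, params))
        if cost < best_cost:
            best_cost, best_alloc = cost, alloc
    return best_alloc, best_cost


# Hypothetical providers: (baseline LOS a, learning rate b) and capacities.
params = [(5.0, 0.10), (6.0, 0.30), (5.5, 0.20)]
capacity = [10, 10, 10]
alloc, cost = best_allocation(12, params, capacity)
even_split_cost = sum(total_los(4, a, b) for a, b in params)
```

Because total length of stay is linear in the binary indicators once the volume levels are enumerated, the same problem could be handed to an integer programming solver for realistic sizes; enumeration is used here only to keep the example self-contained.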
Uptake of vaccination against seasonal influenza is suboptimal in most countries, and campaigns to promote vaccination may be weakened by clustering of opinions and decisions not to vaccinate. This clustering can occur at myriad interacting levels: within households, social circles, and schools. Given that influenza is more likely to be transmitted to a household contact than any other contact, clustering of vaccination decisions is arguably most problematic at the household level. We conducted an international survey study to determine whether household members across different cultures offered direct advice to each other regarding influenza vaccination and whether this advice was associated with vaccination decisions. The survey revealed that household members across the world advise one another to vaccinate, although to varying degrees, and that advice correlates with an increase in vaccination uptake. In addition, respondents in Japan, China, and the United States were less likely to offer advice to older adults than to the young, despite older adults’ being the target age group for vaccination in both Far Eastern countries. Furthermore, advice was not primarily directed to household members within the age groups advised to vaccinate by national health policies. In Japan, advice was offered more to ages outside of the policy guidelines than inside. Harnessing the influence of household members may offer a novel strategy to improve vaccination coverage across cultures worldwide.
Introduction. Because current evidence suggests that numeracy affects how people make decisions, it is an important factor to account for in studies assessing the effectiveness of medical decision support interventions. Subjective and objective numeracy assessment methods are available that vary in theoretical background, skills assessed, known relationship with decision making skills, and ease of implementation. The best way to use these tools to assess numeracy when conducting medical decision-making research is currently unknown. Methods. We conducted Internet surveys comparing numeracy assessments obtained using the subjective numeracy scale (SNS) and 5 objective numeracy scales. Each study participant completed the SNS and 1 objective numeracy measure. Following each assessment, participants indicated willingness to repeat the assessment and rated its user acceptability. Results. The overall response rate was 78%, resulting in a total sample size of 673. Spearman correlations between the SNS and the objective numeracy measures ranged from 0.19 to 0.44. Acceptability assessments for the short form of the Numeracy Understanding in Medicine Instrument and the SNS did not differ significantly. The other objective scales all had lower acceptability ratings than the SNS. Conclusions. These findings are consistent with prior research suggesting that objective and subjective numeracy scales measure related but distinct constructs. Due to current uncertainty regarding which construct is more likely to influence the effectiveness of decision support interventions, these findings warrant further investigation to determine the proper use of objective versus subjective numeracy assessments in medical decision-making research. Pending additional information, a reasonable approach is to measure both objective and subjective numeracy so that the full range of actual and perceived numeracy skills can be taken into account.
Background. The Affordable Care Act allows uninsured individuals to select health insurance from numerous private plans, a challenging decision-making process. This study examined the effectiveness of strategies to support health insurance decisions among the uninsured. Methods. Participants (N = 343) from urban, suburban, and rural areas were randomized to 1 of 3 conditions: 1) a plain language table; 2) a visual condition where participants chose what information to view and in what order; and 3) a narrative condition. We administered measures assessing knowledge (true/false responses about key features of health insurance), confidence in choices (uncertainty subscale of the Decisional Conflict Scale), satisfaction (items from the Health Information National Trends Survey), preferences for insurance features (measured on a Likert scale from not at all important to very important), and plan choice. Results. Although we did not find significant differences in knowledge, confidence in choice, or satisfaction across condition, participants across conditions made value-consistent choices, selecting plans that aligned with their preferences for key insurance features. In addition, those with adequate health literacy skills as measured by the Rapid Estimate of Adult Literacy in Medicine-Short Form (REALM-SF) had higher knowledge overall (
Background. Health messages are more effective when framed to be congruent with recipient characteristics, and health practitioners can strategically choose message features to promote adherence to recommended behaviors. We present exposure to US culture as a moderator of the impact of gain-frame versus loss-frame messages. Since US culture emphasizes individualism and approach orientation, greater cultural exposure was expected to predict improved patient choices and memory for gain-framed messages, whereas individuals with less exposure to US culture would show these advantages for loss-framed messages. Methods. 223 participants viewed a written oral health message in 1 of 3 randomized conditions—gain-frame, loss-frame, or no-message control—and were given 10 flosses. Cultural exposure was measured with the proportions of life spent and parents born in the US. At baseline and 1 week later, participants completed recall tests and reported recent flossing behavior. Results. Message frame and cultural exposure interacted to predict improved patient decisions (increased flossing) and memory maintenance for the health message over 1 week; for example, those with low cultural exposure who saw a loss-frame message flossed more. Incongruent messages led to the same flossing rates as no message. Memory retention did not explain the effect of message congruency on flossing. Limitations. Flossing behavior was self-reported. Cultural exposure may only have practical application in either highly individualistic or collectivistic countries. Conclusions. In health care settings where patients are urged to follow a behavior, asking basic demographic questions could allow medical practitioners to intentionally communicate in terms of gains or losses to improve patient decision making and treatment adherence.
Background. Multiple embryo transfers in in vitro fertilization (IVF) treatment increase the number of successful pregnancies while elevating the risk of multiple gestations. IVF-associated multiple pregnancies exhibit significant financial, social, and medical implications. Clinicians need to decide the number of embryos to be transferred considering the tradeoff between successful outcomes and multiple pregnancies. Objective. To predict implantation outcome of individual embryos in an IVF cycle with the aim of providing decision support on the number of embryos transferred. Design. Retrospective cohort study. Data Source. Electronic health records of one of the largest IVF clinics in Turkey. The study data set included 2453 embryos transferred at day 2 or day 3 after intracytoplasmic sperm injection (ICSI). Each embryo was represented with 18 clinical features and a class label, +1 or -1, indicating positive and negative implantation outcomes, respectively. Methods. For each classifier tested, a model was developed using two-thirds of the data set, and prediction performance was evaluated on the remaining one-third of the samples using receiver operating characteristic (ROC) analysis. The training-testing procedure was repeated 10 times on randomly split (two-thirds to one-third) data. The relative predictive values of clinical input characteristics were assessed using information gain feature weighting and forward feature selection methods. Results. The naïve Bayes model provided 80.4% accuracy, 63.7% sensitivity, and 17.6% false alarm rate in embryo-based implantation prediction. Multiple embryo implantations were predicted at a 63.8% sensitivity level. Predictions using the proposed model resulted in higher accuracy compared with expert judgment alone (on average, 75.7% and 60.1%, respectively). Conclusions. A machine learning–based decision support system would be useful in improving the success rates of IVF treatment.
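The three performance measures reported above (accuracy, sensitivity, and false alarm rate) can be illustrated with a minimal sketch. It assumes the +1/-1 class labels described in the abstract; the function name `implantation_metrics` and the label lists are hypothetical and are not the study's data or code.

```python
from typing import Dict, Sequence


def implantation_metrics(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, sensitivity, and false alarm rate for +1/-1 labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)    # implanted, predicted implanted
    fn = sum(1 for a, p in pairs if a == 1 and p == -1)   # implanted, predicted not
    fp = sum(1 for a, p in pairs if a == -1 and p == 1)   # not implanted, predicted implanted
    tn = sum(1 for a, p in pairs if a == -1 and p == -1)  # not implanted, predicted not
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),          # true positive rate
        "false_alarm_rate": fp / (fp + tn),     # false positive rate
    }


# Hypothetical labels for 100 embryos (illustrative only).
actual = [1] * 50 + [-1] * 50
predicted = [1] * 30 + [-1] * 20 + [1] * 10 + [-1] * 40
metrics = implantation_metrics(actual, predicted)
print(metrics)
```

Sensitivity and false alarm rate are the two coordinates of a single point on an ROC curve, which is why they are reported together with accuracy.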
Health service users err in posttest probability evaluations. Here we document for the first time that users succeed when they reason about numbers of cases and make distributive evaluations. A sample of women interested in prenatal testing incorrectly evaluated the posttest probability that a given fetus had an anomaly, but regardless of their numeracy level, they correctly apportioned the cases for and against that hypothesis. This finding shows that health service users are not doomed to fail in dealing with single-case probabilities and suggests that probabilistic data can be used effectively for communicating test results.
Background. Many healthy women consider genetic testing for breast cancer risk, yet BRCA testing issues are complex. Objective. To determine whether an intelligent tutor, BRCA Gist, grounded in fuzzy-trace theory (FTT), increases gist comprehension and knowledge about genetic testing for breast cancer risk, improving decision making. Design. In 2 experiments, 410 healthy undergraduate women were randomly assigned to 1 of 3 groups: one that completed an online module using a Web-based tutoring system (BRCA Gist) that uses artificial intelligence technology, a second that read highly similar content from the National Cancer Institute (NCI) Web site, and a third that completed an unrelated tutorial. Intervention. BRCA Gist applied FTT and was designed to help participants develop gist comprehension of topics relevant to decisions about BRCA genetic testing, including how breast cancer spreads, inherited genetic mutations, and base rates. Measures. We measured content knowledge, gist comprehension of decision-relevant information, interest in testing, and genetic risk and testing judgments. Results. Control knowledge scores ranged from 54% to 56%, NCI improved significantly to 65% and 70%, and BRCA Gist improved significantly more to 75% and 77%, P < 0.0001. BRCA Gist scored higher on gist comprehension than NCI and control, P < 0.0001. Control genetic risk-assessment mean was 48% correct; BRCA Gist (61%) and NCI (56%) were significantly higher, P < 0.0001. BRCA Gist participants recommended less testing for women without risk factors (not good candidates; 24% and 19%) than controls (50%, both experiments) and NCI (32%), experiment 2, P < 0.0001. BRCA Gist testing interest was lower than in controls, P < 0.0001. Limitations. BRCA Gist has not been tested with older women from diverse groups. Conclusions. Intelligent tutors, such as BRCA Gist, are scalable, cost-effective ways of helping people understand complex issues, improving decision making.
Background. Linking patient-reported back pain outcomes with health utility measures is valuable for informing economic evaluations. Methods. We used the Back pain Outcomes using Longitudinal Data (BOLD) registry to assess back pain and quality-of-life measures. The BOLD registry includes participants ≥65 years from 3 health systems. We used multiple baseline outcome measures: Roland-Morris Disability Questionnaire (RMDQ), EuroQol-5D (EQ-5D), and back and leg pain numerical rating scales (NRS). To develop and validate a model, we used a standard split-sample method and a novel multisite validation approach. We applied linear regression to map RMDQ to EQ-5D, adjusting for age, sex, pain numerical rating scores, and nonlinear transformations of outcome measures. We computed R², root mean squared error, and mean absolute error (MAE) for purposes of model selection. The final model included EQ-5D as the dependent variable with independent variables of age, RMDQ, and back NRS. We used this model to predict EQ-5D scores in validation samples. Results. In total, 5224 participants had both baseline RMDQ and EQ-5D. Mean age was 74 years (65% female). Negative correlations (–0.72) were observed at baseline for RMDQ and EQ-5D. The selected model from all developmental samples had R² > 0.41 and MAE < 0.119. Validation analyses indicated no differences in estimated v. observed mean EQ-5D scores in the split sample. Validation using the multisite validation approach identified prediction error variability, MAE of 0.081 to 0.119, when predicting EQ-5D. Limitations. The statistical relationship may not generalize well to all study populations as we demonstrated in our multisite analysis. Conclusions. An empirical algorithm predicting EQ-5D weights from RMDQ scores provides a currently unavailable link for conducting economic evaluations in low back pain studies.
Background. Rapid tests for malaria are being distributed through vendors to individual patients, presenting the dilemma of determining how individuals are incentivized to pursue testing for malaria, versus the traditional approach of presumptively treating fevers with antimalarial drugs. Methods and Findings. We incorporated testing and treatment data from 6 African countries into a dynamic model of malaria transmission and nonmalarial causes of fever to investigate how variations in the epidemiologic risk of malaria and the prices of rapid diagnostic tests (RDTs) and treatments affect testing and treatment choices from the perspective of febrile patients, public health officials, and drug shop owners. In environments falling below a critical threshold infection rate (entomological inoculation rate) of 282 for patients older than 5 years (95% confidence interval [CI]: 275–289) or 300 for 0- to 5-year-olds (95% CI: 203–307), testing was more beneficial than presumptive therapy in terms of health and financial costs to patients. Infection and cost conditions generally aligned the best patient-level strategy with the best public health strategy to minimize an overall population’s morbidity and mortality from both malaria and nonmalarial causes of fever. However, the infection and cost conditions of very high malaria transmission settings did not align patient interests or public health interests with the interests of private drug shop owners. In such settings, a further lowering of testing prices may realign the interests of all 3 parties. Conclusions. A threshold transmission rate exists under which malaria testing confers more health and financial benefits to patients than presumptive treatment. Studying local transmission rates and testing and treatment costs may facilitate an approach to align the interests of individual patients, public health officials, and distributors of tests and therapies.
Background. Treatment benefits and harms are often communicated as relative risk reductions and increases, which are frequently misunderstood by doctors and patients. One suggestion for improving understanding of such risk information is to also communicate the baseline risk. We investigated 1) whether the presentation format of the baseline risk influences understanding of relative risk changes and 2) the mediating role of people’s numeracy skills. Method. We presented laypeople (N = 1234) with a hypothetical scenario about a treatment that decreased (Experiments 1a, 2a) or increased (Experiments 1b, 2b) the risk of heart disease. Baseline risk was provided as a percentage or a frequency. In a forced-choice paradigm, the participants’ task was to judge the risk in the treatment group given the relative risk reduction (or increase) and the baseline risk. Numeracy was assessed using the Lipkus 11-item scale. Results. Communicating baseline risk in a frequency format facilitated correct understanding of a treatment’s benefits and harms, whereas a percentage format often impeded understanding. For example, many participants misinterpreted a relative risk reduction as referring to an absolute risk reduction. Participants with higher numeracy generally performed better than those with lower numeracy, but all participants benefitted from a frequency format. Limitations are that we used a hypothetical medical scenario and a nonrepresentative sample. Conclusions. Presenting baseline risk in a frequency format improves understanding of relative risk information, whereas a percentage format is likely to lead to misunderstandings. People’s numeracy skills play an important role in correctly understanding medical information. Overall, communicating treatment benefits and harms in the form of relative risk changes remains problematic, even when the baseline risk is explicitly provided.
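The distinction between a relative and an absolute risk change can be made concrete with a small arithmetic sketch. The baseline risk (30 in 100) and the 20% relative reduction below are hypothetical illustration values, not figures from the study, and the function names are likewise assumptions.

```python
def risk_with_treatment(baseline_risk: float, relative_change: float) -> float:
    """Correct reading: scale the baseline risk by the relative change.

    relative_change is negative for a risk reduction and positive for an increase.
    """
    return baseline_risk * (1 + relative_change)


def misread_as_absolute(baseline_risk: float, relative_change: float) -> float:
    """Common misreading: treat the relative change as percentage points."""
    return baseline_risk + relative_change


baseline = 0.30          # hypothetical: 30 in 100 people develop the disease
relative_change = -0.20  # hypothetical: a 20% relative risk reduction

correct = risk_with_treatment(baseline, relative_change)
misread = misread_as_absolute(baseline, relative_change)

# Express both answers as frequencies out of 100 people.
print(f"correct: {round(correct * 100)} in 100")
print(f"misread: {round(misread * 100)} in 100")
```

Under these assumed numbers the correct reading gives 24 in 100, whereas reading the 20% as an absolute reduction gives 10 in 100, which overstates the benefit considerably.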
Background. To identify best-fitting input sets using model calibration, individual calibration target fits are often combined into a single goodness-of-fit (GOF) measure using a set of weights. Decisions in the calibration process, such as which weights to use, influence which sets of model inputs are identified as best-fitting, potentially leading to different health economic conclusions. We present an alternative approach to identifying best-fitting input sets based on the concept of Pareto-optimality. A set of model inputs is on the Pareto frontier if no other input set simultaneously fits all calibration targets as well or better. Methods. We demonstrate the Pareto frontier approach in the calibration of 2 models: a simple, illustrative Markov model and a previously published cost-effectiveness model of transcatheter aortic valve replacement (TAVR). For each model, we compare the input sets on the Pareto frontier to an equal number of best-fitting input sets according to 2 possible weighted-sum GOF scoring systems, and we compare the health economic conclusions arising from these different definitions of best-fitting. Results. For the simple model, outcomes evaluated over the best-fitting input sets according to the 2 weighted-sum GOF schemes were virtually nonoverlapping on the cost-effectiveness plane and resulted in very different incremental cost-effectiveness ratios ($79,300 [95% CI 72,500–87,600] v. $139,700 [95% CI 79,900–182,800] per quality-adjusted life-year [QALY] gained). Input sets on the Pareto frontier spanned both regions ($79,000 [95% CI 64,900–156,200] per QALY gained). The TAVR model yielded similar results. Conclusions. Choices in generating a summary GOF score may result in different health economic conclusions. The Pareto frontier approach eliminates the need to make these choices by using an intuitive and transparent notion of optimality as the basis for identifying best-fitting input sets.
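The Pareto frontier idea can be sketched in a few lines. The sketch assumes each input set is summarized by one error value per calibration target (lower is better) and uses the standard dominance rule: an input set is dominated if another fits every target at least as well and at least one target strictly better. The error values, weights, and function names are hypothetical and do not come from the models in the study.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b on every target and strictly better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_frontier(errors: List[Sequence[float]]) -> List[int]:
    """Indices of input sets that no other input set dominates."""
    return [
        i for i, e in enumerate(errors)
        if not any(dominates(other, e) for j, other in enumerate(errors) if j != i)
    ]


def weighted_best(errors: List[Sequence[float]], weights: Sequence[float]) -> int:
    """Index of the input set with the lowest weighted-sum GOF score."""
    scores = [sum(w * x for w, x in zip(weights, e)) for e in errors]
    return scores.index(min(scores))


# Hypothetical errors of 5 input sets against 2 calibration targets.
errors = [(0.10, 0.90), (0.50, 0.50), (0.90, 0.10), (0.60, 0.60), (0.95, 0.95)]

print(pareto_frontier(errors))             # [0, 1, 2]
print(weighted_best(errors, (0.9, 0.1)))   # 0
print(weighted_best(errors, (0.1, 0.9)))   # 2
```

In this toy example the two weighting schemes select different single best input sets, while the frontier contains both of them without requiring any weights.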
Objective. When waiting lists are used to ration treatments for nonemergency procedures, a prioritization rule is required to ensure that urgent patients are admitted first. This study investigates how the introduction of an explicit prioritization guideline affected the prioritization behavior of doctors, who previously had full discretion for assigning patients. Design. The analysis exploits the publication of recommended priority categories in public hospitals. Taking the recommendations as a reference, deviations from the recommended priority assignments by doctors before and after the guideline publication are assessed. Multinomial logit models are used to control for patient and hospital characteristics. Heterogeneity in the impact of the guideline across patient characteristics is explored through interaction terms. Setting. The state of New South Wales, Australia, between July 2004 and December 2010. Participants. Admissions via waiting lists in public hospitals (N = 753,010). Main Outcome Measure. Priority categories assigned by doctors. Results. The guideline increased the likelihood that doctors would actually assign a semi-urgent priority to admissions with a recommended priority of semi-urgent by 11.7 percentage points (P < 0.001) and would assign a nonurgent priority to admissions with a recommended priority of nonurgent by 13.1 percentage points (P < 0.001). In contrast, the guideline lowered the likelihood of an urgent priority being assigned to admissions with a recommended priority of urgent by 13.7 percentage points (P < 0.001). Priority assignments are affected by payment status; specifically, a higher priority is given to paying patients, and this preferential treatment is not diminished by the presence of the guideline. Conclusion. The presence of a simple clinical priority guideline at the procedural level has not produced systematic, clinically based prioritization behaviors among doctors.
The New South Wales priority guideline has curtailed assignments to the highest priority. This result raises a question concerning the usefulness of such a guideline in improving timely and equitable access to health care.
Background: Preference for the status quo, or clinical inertia, is a barrier to implementing treat-to-target protocols in patients with chronic diseases such as rheumatoid arthritis (RA). The objectives of this study were to examine the influence of subjective numeracy on RA-patient preference for the status quo and to determine whether age modifies this relationship. Methods: RA patients participated in a single face-to-face interview. Numeracy was measured using the Subjective Numeracy Scale. Treatment preference was measured using Adaptive Conjoint Analysis. Results: Of 205 eligible subjects, 156 agreed to participate. Higher subjective numeracy was associated with lower preference for the status quo in a regression model including race, employment, and use of biologics (adjusted odds ratio [95% confidence interval] = 0.71 [0.52–0.95], P = 0.02). Higher subjective numeracy was protective against status quo preferences among subjects younger than 65 years (adjusted odds ratio [95% confidence interval] = 0.64 [0.43–0.94], P = 0.02) but not among older subjects. Conclusions: Subjective numeracy is independently associated with younger, but not older, RA patients’ preferences for the status quo. Our results add to the literature demonstrating age and numeracy differences in treatment preferences and medical decision-making processes.
Background. Few studies have compared multiple health-related quality-of-life (HRQOL) instruments simultaneously for pediatric populations. This study aimed to test psychometric properties of 4 legacy pediatric HRQOL instruments: the Child Health and Illness Profile (CHIP), the KIDSCREEN-52, the KINDL, and the Pediatric Quality of Life Inventory (PedsQL). Methods. This study used data from 908 parents whose children (ages 2–19 years) were enrolled in Florida Medicaid. Parents were asked via telephone interview to complete each instrument appropriate to the age of their children. Structural, convergent/discriminant, and known-group validities were investigated. We examined structural validity using confirmatory factor analyses. We examined convergent/discriminant validity by comparing Spearman rank correlation coefficients of homogeneous (physical functioning and physical well-being) versus heterogeneous (physical and psychological functioning) domains of the instruments. We assessed known-groups validity by examining the extent to which HRQOL differed by the status of children with special health needs (CSHCN). Results. Domain scores of the 4 instruments were not normally distributed, and ceiling effects were significant in most domains. The KIDSCREEN-52 demonstrates the best structural validity, followed by the CHIP, KINDL, and PedsQL. The PedsQL and the KIDSCREEN-52 show better convergent/discriminant validity than the other instruments. Known-groups validity in discriminating CSHCN versus no needs was the best for the PedsQL, followed by the KIDSCREEN-52, the CHIP, and the KINDL. Conclusion. No one instrument was fully satisfactory in all psychometric properties. Strategies are recommended for future comparison of item content and measurement properties across different HRQOL instruments for research and clinical use.
We describe a balance beam aid for instruction in diagnosis (BBAID) and demonstrate its potential use in supplementing the training of medical students to diagnose acute chest pain. We suggest the BBAID helps students understand the process of diagnosis because the impact of tokens (weights and helium balloons) attached to a beam at different distances from the fulcrum is analogous to the impact of evidence on the relative support for 2 diseases. The BBAID presents a list of potential findings and allows students to specify whether each is present, absent, or unknown. It displays the likelihood ratios corresponding to a positive (LR+) or negative (LR–) observation for each symptom, for any pair of diseases. For each specified finding, a token is placed on the beam at a location whose distance from the fulcrum is proportional to the finding’s log(LR): a downward force (a weight) if the finding is present and a lifting force (a balloon) if it is absent. Combining the physical torques of multiple tokens is mathematically identical to applying Bayes’ theorem to multiple independent findings, so the balance beam is a high-fidelity metaphor. Seven first-year medical students and 3 faculty members consulted the BBAID while diagnosing brief patient case vignettes. Student comments indicated the program is usable, helpful for understanding pertinent positive and negative findings’ usefulness in particular situations, and welcome as a reference or self-test. All students attended to the effect of the tokens on the beam, although some stated they did not use the numerical statistics. Faculty noted the BBAID might be particularly helpful in reminding students of diseases that should not be missed and identifying pertinent findings to ask for.
Understanding the impact of clinical findings in discriminating between possible causes of a patient’s presentation is essential in clinical judgment. A balance beam is a natural physical analogue that can accurately represent the combination of several pieces of evidence with varying ability to discriminate between disease hypotheses. Calculation of Bayes’ theorem using log(posterior odds) as a function of log(prior odds) and the logarithms of the evidence’s likelihood ratios maps onto the physical forces affecting objects placed on a balance beam. We describe the rules governing the functioning of tokens representing clinical findings in the comparison of 2 competing diseases. The likelihood ratios corresponding to positive (LR+) or negative (LR–) observations for each symptom determine the lateral position at which the symptom’s token is placed on the beam, using a weight if the finding is present and a helium balloon if it is absent. We discuss how a balance beam could represent concepts of dynamic specificity (due to changes in competitor diseases’ probabilities) and dynamic sensitivity (due to class-conditional independence). Utility-based thresholds for acting on a diagnosis could be represented by moving the balance beam’s fulcrum. It is suggested that a balance beam can be a useful aid for students learning clinical diagnosis, allowing them to build on existing intuitive understanding to develop an appreciation of how evidence combines to influence degree of belief. The balance beam could also facilitate exploration of the potential impact of available questions or investigations.
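The log-odds form of Bayes’ theorem that the balance beam represents can be sketched as follows. The sketch assumes the findings are conditionally independent, so their log likelihood ratios simply add to the log prior odds; the function name and the likelihood ratio values are hypothetical and are not taken from the BBAID.

```python
import math
from typing import Iterable


def posterior_probability(prior_prob: float, likelihood_ratios: Iterable[float]) -> float:
    """Combine a prior with likelihood ratios by adding logarithms.

    log(posterior odds) = log(prior odds) + sum of log(LR) over the findings.
    Each log(LR) plays the role of a token's position on the beam: positive
    values tip toward the first disease, negative values toward its competitor.
    """
    log_odds = math.log(prior_prob / (1 - prior_prob))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)
    odds = math.exp(log_odds)
    return odds / (1 + odds)


# Hypothetical case: the 2 diseases start equally likely; one finding is
# present (LR+ = 4.0) and another is absent (LR- = 0.5).
posterior = posterior_probability(0.5, [4.0, 0.5])
print(round(posterior, 3))
```

With these assumed values the posterior odds are 1 × 4.0 × 0.5 = 2, so the probability of the first disease relative to its competitor is 2/3. Shifting the prior corresponds to starting the beam off balance before any tokens are added.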
The intervals between screens for the early detection of diseases such as breast and colon cancer suggested by screening guidelines are typically based on the average population risk of disease. With the emergence of ever more biomarkers for cancer risk prediction and the development of personalized medicine, there is a need for risk-specific screening intervals. The interval between successive screens should be shorter with increasing cancer risk. A risk-dependent optimal interval is ideally derived from a cost-effectiveness analysis using a validated simulation model. However, this is time-consuming and costly. We propose a simplified mathematical approach for the exploratory analysis of the implications of risk level on optimal screening interval. We develop a mathematical model of the optimal screening interval for breast cancer screening. We verified the results by programming the simplified model in the MISCAN-Breast microsimulation model and comparing the results. We validated the results by comparing them with the results of a full, published MISCAN-Breast cost-effectiveness model for a number of different risk levels. The results of both the verification and validation were satisfactory. We conclude that the mathematical approach can indicate the impact of disease risk on the optimal screening interval.
Background. Decision aids are now a well-established means of supporting patients in their medical decision making. The widespread use of decision aids invites questions about how their components contribute to patient decisions. Objective. The objective of this study was to measure the importance of second opinions, patient-specific outcome forecasts, and patient testimonials relative to patient clinical and socioeconomic factors and the primary physician recommendation on the decision to undergo full knee replacement surgery to treat knee osteoarthritis. Methods. Middle-aged and older members of the RAND American Life Panel (N = 1616) chose whether to recommend surgery as a treatment for each of 3 hypothetical patients (vignettes) presented in a video-enhanced internet survey. Vignettes randomly sampled levels of scenario attributes. Results. Second opinions, person-specific outcome forecasts, and 2 consistent patient testimonials strongly affected respondents’ decision making; a single testimonial, however, did not significantly affect decisions. Conclusions. Information provided in a decision aid, including person-specific outcome forecasts and testimonials, can affect patient choices. The strong effect of testimonials and respondents’ interest in reviewing them reinforces concerns about unwanted influence when testimonials are biased.
Objective. Health professionals must enable patients to make informed decisions about health care choices through unbiased presentation of all options. This study examined whether presenting the decision as "opportunity" rather than "choice" biased individuals’ preferences in the context of trial participation for cancer treatment. Methods. Self-selecting healthy women (N = 124) were randomly assigned to the following decision frames: opportunity to take part in the trial (opt-in), opportunity to be removed from the trial (opt-out), and choice to have standard treatment or take part in the trial (choice). The computer-based task required women to make a hypothetical choice about a real-world cancer treatment trial. The software presented the framed scenario, recorded initial preference, presented comprehensive and balanced information, traced participants’ use of information during decision making, and recorded final decision. A posttask paper questionnaire assessed perceived risk, attitudes, subjective norm, perceived behavioral control, and satisfaction with decision. Results. Framing influenced women’s immediate preferences. Opportunity frames, whether opt-in or opt-out, introduced a bias as they discouraged women from choosing standard treatment. Using the choice frame avoided this bias. The opt-out opportunity frame also affected women’s perceived social norm; women felt that others endorsed the trial option. The framing bias was not present once participants had had the opportunity to view detailed information on the options within a patient decision aid format. There were no group differences in information acquisition and final decisions. Sixteen percent changed their initial preference after receiving full information. Conclusions. A "choice" frame, where all treatment options are explicit, is less likely to bias preferences. Presentation of full information in parallel, option-by-attribute format is likely to "de-bias" the decision frame. 
Tailoring of information to initial preferences would be ill-advised as preferences may change following detailed information.
Background. Patient outcomes critically depend on accuracy of physicians’ judgment, yet little is known about individual differences in cognitive styles that underlie physicians’ judgments. The objective of this study was to assess physicians’ individual differences in cognitive styles relative to age, experience, and degree and type of training. Methods. Physicians at different levels of training and career completed a web-based survey of 6 scales measuring individual differences in cognitive styles (maximizing v. satisficing, analytical v. intuitive reasoning, need for cognition, intolerance toward ambiguity, objectivism, and cognitive reflection). We measured psychometric properties (Cronbach’s α) of scales; relationship of age, experience, degree, and type of training; responses to scales; and accuracy on a conditional inference task. Results. The study included 165 trainees and 56 attending physicians (median age 31 years; range 25–69 years). All 6 constructs showed acceptable psychometric properties. Surprisingly, we found a significant negative correlation between age and satisficing (r = –0.239; P = 0.017). Maximizing (willingness to engage in alternative search strategy) also decreased with age (r = –0.220; P = 0.047). Number of incorrect inferences negatively correlated with satisficing (r = –0.246; P = 0.014). Disposition to suppress intuitive responses was associated with correct responses on 3 of 4 inferential tasks. Trainees showed a tendency to engage in analytical thinking (r = 0.265; P = 0.025), while attendings displayed an inclination toward intuitive-experiential thinking (r = 0.427; P = 0.046). However, trainees performed worse on the conditional inference task. Conclusion. Physicians capable of suppressing an immediate intuitive response to questions and those scoring higher on rational thinking made fewer inferential mistakes.
We found a negative correlation between age and maximizing: Physicians who were more advanced in their careers were less willing to spend time and effort in an exhaustive search for solutions. However, they appeared to have maintained their "mindware" for effective problem solving.
Background. Risk perceptions and worry are important determinants of health behavior. Despite extensive research on these constructs, it is unknown whether people’s self-reports of perceived risk and worry are biased by their concerns about being viewed negatively by others (social desirability). Methods. In this study, we examined whether reports of perceived risk and worry about cancer varied across survey modes differing in the salience of social desirability cues. We used data from the National Cancer Institute’s 2007 Health Information National Trends Survey, which assessed perceived cancer risk and worry in 1 of 2 survey modes: an interviewer-administered telephone survey (higher likelihood of socially desirable responding; n = 3678) and a self-administered mail survey (lower likelihood of socially desirable responding; n = 3445). Data were analyzed by regressing perceived risk and worry on survey mode and demographic factors. Results. Analyses showed no effect of survey mode on cancer risk perceptions (B = 0.02, P = 0.55, d = 0.02). However, cancer worry was significantly higher in the self-administered mode than in the interviewer-administered mode (B = 0.24, P < 0.001, d = 0.26). Education moderated this effect, with respondents lower in education exhibiting a stronger mode effect. When cancer worry was dichotomized, the odds of reporting cancer worry were approximately twice as high in the self-administered mode compared with the interviewer-administered mode (OR = 2.13, P < 0.001). Conclusions. These results bolster the veracity of self-reported cancer risk perceptions. They also suggest that interviewer-administered surveys may underestimate the frequency of cancer worry, particularly for samples lower in socioeconomic status. Studies are needed to test for this effect in clinical contexts.
Background. This review systematically appraises the quality of reporting of measures used in trials to evaluate the effectiveness of patient decision aids (PtDAs) and presents recommendations for minimum reporting standards. Methods. We reviewed measures of decision quality and decision process in 86 randomized controlled trials (RCTs) from the 2011 Cochrane Collaboration systematic review of PtDAs. Data on development of the measures, reliability, validity, responsiveness, precision, interpretability, feasibility, and acceptability were independently abstracted by 2 reviewers. Results. Information from 178 instances of use of measures was abstracted. Very few studies reported data on the performance of measures, with reliability (21%) and validity (16%) being the most common. Studies using new measures were less likely to include information about their psychometric performance. The review was limited to reporting of measures in studies included in the Cochrane review and did not consult prior publications. Conclusions. Very little is reported about the development or performance of measures used to evaluate the effectiveness of PtDAs in published trials. Minimum reporting standards are proposed to enable authors to prepare study reports, editors and reviewers to evaluate submitted papers, and readers to appraise published studies.
Background. Ethical, economic, political, and legitimacy arguments support the consideration of public preferences in health technology decision making. The objective was to assess public preferences for funding new health technologies and to compare a profile case best-worst scaling (BWS) and traditional discrete choice experiment (DCE) method. Methods. An online survey consisting of a DCE and BWS task was completed by 930 adults recruited via an Internet panel. Respondents traded between 7 technology attributes. Participation quotas broadly reflected the population of Queensland, Australia, by gender and age. Choice data were analyzed using a generalized multinomial logit model. Results. The findings from both the BWS and DCE were generally consistent in that respondents exhibited stronger preferences for technologies offering prevention or early diagnosis over other benefit types. Respondents also prioritized technologies that benefit younger people, larger numbers of people, those in rural areas, or indigenous Australians; that provide value for money; that have no available alternative; or that upgrade an existing technology. However, the relative preference weights and consequent preference orderings differed between the DCE and BWS models. Further, poor correlation between the DCE and BWS weights was observed. While only a minority of respondents reported difficulty completing either task (22.2% DCE, 31.9% BWS), the majority (72.6%) preferred the DCE over BWS task. Conclusions. This study provides reassurance that many criteria routinely used for technology decision making are considered to be relevant by the public. The findings clearly indicate the perceived importance of prevention and early diagnosis. The dissimilarity observed between DCE and profile case BWS weights is contrary to the findings of previous comparisons and raises uncertainty regarding the comparative merits of these stated preference methods in a priority-setting context.
Introduction. The quality of systematic reviews of health economic evaluations (SR-HE) is often limited because of methodological shortcomings. One reason for this poor quality is that there are no established standards for the preparation of SR-HE. The objective of this study is to compare existing methods and suggest best practices for the preparation of SR-HE. Methods. To identify the relevant methodological literature on SR-HE, a systematic literature search was performed in Embase, Medline, the National Health System Economic Evaluation Database, the Health Technology Assessment Database, and the Cochrane methodology register, and webpages of international health technology assessment agencies were searched. The study selection was performed independently by 2 reviewers. Data were extracted by one reviewer and verified by a second reviewer. On the basis of the overlaps in the recommendations for the methods of SR-HE in the included papers, suggestions for best practices for the preparation of SR-HE were developed. Results. Nineteen relevant publications were identified. The recommendations within them often differed. However, for most process steps there was some overlap between recommendations for the methods of preparation. The overlaps were taken as basis on which to develop suggestions for the following process steps of preparation: defining the research question, developing eligibility criteria, conducting a literature search, selecting studies, assessing the methodological study quality, assessing transferability, and synthesizing data. Discussion. The differences in the proposed recommendations are not always explainable by the focus on certain evaluation types, target audiences, or integration in the decision process. Currently, there seem to be no standard methods for the preparation of SR-HE. The suggestions presented here can contribute to the harmonization of methods for the preparation of SR-HE.
Background. A 2-stage randomized trial design, incorporating participant choice, provides unbiased estimates of the effects of the treatment or intervention (treatment effect), the difference between outcomes for participants who prefer one treatment compared with another (selection effect), and the interaction between participants’ preferences for treatment and the treatment actually received (preference effect). It is important to ensure that such trials are adequately powered to estimate these effects. Sample Size Formulas. This paper presents methods for determining the required sample sizes for estimating treatment, selection, and preference effects. We demonstrate the changes in sample size as various key parameters are changed. In general, approximately twice as many participants (in total) are needed to have equivalent power for detecting both treatment and selection/preference effects compared with a trial of the treatment effect alone. Primary Screening Example. We illustrate their application for the design of a primary screening trial comparing human papillomavirus DNA testing versus cervical screening (by Pap smear). Our example would require 520 participants to have 80% power to detect moderate-sized preference and selection effects and a small to moderate treatment effect. Conclusions. With the growing interest in understanding treatment choices and with the use of decision aids, well-designed and adequately powered 2-stage randomized trial designs offer the opportunity to determine the effects of participants’ preferences. Our sample size formulas will help future studies ensure that they have adequate power to detect selection and preference effects.
Background. Unipolar depression is a mental illness with a substantial health-related and economic burden. Health interventions for depression predominately focus on improving sufferers’ health-related quality of life (HRQoL). Utility is a measure of HRQoL that is required for use in model-based cost-utility analyses to assess the added value of health interventions. This review aimed to identify, summarize, and where feasible, synthesize published utilities for unipolar depression. Methods. A structured electronic search combining common terms for unipolar depression and utility was conducted in MEDLINE, EMBASE, and PsycINFO. Utility values identified were summarized, and the study designs were appraised in terms of the patient population and valuation method used to generate utilities. Random-effect meta-analyses were applied to pool mean utilities identified for 3 depressive health states (mild, moderate, and severe) elicited from direct and indirect valuation methods separately. Results. Thirty-five studies were identified that reported utilities for various levels of depression severity. The most commonly used direct valuation method for eliciting utilities was standard gamble (SG) (n = 5), and the most commonly used indirect valuation method was EQ-5D (n = 20). The pooled mean (standard deviation) utilities from studies using SG as a direct valuation method were mild = 0.69 (0.14), moderate = 0.52 (0.28), and severe = 0.27 (0.26). The pooled utilities from studies using EQ-5D as an indirect valuation method were mild = 0.56 (0.16), moderate = 0.45 (0.18), and severe = 0.25 (0.15). Conclusions. This systematic review is a useful resource for decision analysts who need health-related utility values to populate model-based cost-utility analyses of health interventions for the management of unipolar depression. Further research is necessary to understand whether direct or indirect valuation methods are the most robust sources for utilities in depression.
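The random-effects pooling step described above can be sketched as follows. This is a minimal illustration using the DerSimonian-Laird estimator, which is one common choice; the abstract does not state which estimator the review used, and the function name and input values are hypothetical.

```python
import math

def pool_random_effects(means, variances):
    """Pool study-level means with DerSimonian-Laird random-effects weights.

    `variances` are the squared standard errors of the study means.
    Returns (pooled mean, standard error, between-study variance tau^2).
    """
    k = len(means)
    w = [1.0 / v for v in variances]                      # fixed-effect weights
    fixed = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q = sum(wi * (m - fixed) ** 2 for wi, m in zip(w, means))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]        # random-effects weights
    pooled = sum(wi * m for wi, m in zip(w_star, means)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2

# Hypothetical utilities for one severity level from three studies.
example_means = [0.70, 0.60, 0.75]
example_variances = [0.0004, 0.0009, 0.0004]
pooled, se, tau2 = pool_random_effects(example_means, example_variances)
```

When the studies disagree by more than their sampling error, tau^2 is positive and the weights become more equal across studies, which widens the pooled standard error.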
Objectives. The time tradeoff (TTO) is an important method to directly obtain health utilities. Challenges of the TTO are, among others, "nontraders" and illogical answers. In TTO interviews, these challenges are resolved by the interviewer. In web-based TTOs, training procedures and logical checks are used based on the views of the researchers. As web-based TTOs will be used more often in the future, we investigated how respondents arrive at their ratings to determine the help they require. Methods. In 2 earlier studies performed by this research group, respondents valued 6 EQ-5D states on a TTO. Respondents were asked to think out loud, and all interviews were audiotaped. A random selection of these interviews was transcribed and double-coded by 2 independent raters, using a priori and inductive coding until saturation was reached. Based on the retrieved mistakes and comments, a list of frequently asked questions (FAQ) was developed. Results. In total, 91 interviews were coded. In all, 85% made at least 1 mistake, 41% made a misreading/miscalculation, 70% misunderstood the tradeoff, 27% misunderstood the EQ-5D dimensions, 29% misunderstood the scenario, 45% made a comment about the TTO, and 43% expressed frustration. More misunderstandings were reported in the Peeters study, which was performed in a realistic setting, than in the van Osch study, which was conducted in a more ideal setting. Misunderstandings of the scenario were mostly reported by patients. Conclusions. Almost all respondents need interviewer help. This may have implications for the validity of interviewer-based TTO elicitations when social acceptability bias is an issue or when there is an explicit hypothesis and the interviewer is not blinded. The FAQ list can be used to standardize interviewer help or as a help function in a web-based TTO.
Background. Metaphors influence judgments and decisions in nonmedical contexts. Objective. First, to investigate whether describing the flu metaphorically increases an individual’s willingness and interest in getting a flu vaccination, and second, to explore possible mediators and moderators of the effect that metaphors might have on vaccination intentions. Materials and Methods. Three studies, each using a between-subjects manipulation in which the flu was described literally (as a virus) or metaphorically (as a beast, riot, army, or weed), were conducted. A total of 167 psychology undergraduates (study 1) and 300 and 301 online participants (studies 2 and 3, respectively) were included. Studies 1 through 3 examined vaccination behavioral intentions, absolute risk, comparative risk, perceived flu severity, and recent flu and flu vaccination experience. Studies 2 and 3 assessed vaccination e-mail reminder requests and global affect. Study 3 evaluated affective reactions, personal control, and understanding of the flu. Results. Describing the flu metaphorically increased individuals’ willingness to get vaccinated (studies 1–3), while the impact of metaphors on requests to receive an e-mail reminder to get vaccinated was unclear (studies 2 and 3). These results were moderated by vaccination frequency in study 2, such that the effects were found among individuals who occasionally receive flu vaccinations but not among individuals who never or always receive flu vaccinations. Metaphor use did not significantly impact any of the hypothesized mediators: perceived absolute risk, comparative risk, flu severity, affect, personal control, or understanding of the flu. Limitations include convenience samples and measuring behavioral intentions but not actual vaccination behavior. Conclusions. Describing the flu virus metaphorically in decision aids or information campaigns could be a simple, cost-effective way to increase vaccinations against the flu.
Background. Risk factors increase the incidence and severity of chronic disease. To examine future trends and develop policies addressing chronic diseases, it is important to capture the relationship between exposure and disease development, which is challenging given limited data. Objective. To develop parsimonious risk factor models embeddable in chronic disease models, which are useful when longitudinal data are unavailable. Design. The model structures encode relevant features of risk factors (e.g., time-varying, modifiable) and can be embedded in chronic disease models. Calibration captures time-varying exposures for the risk factor models using available cross-sectional data. We illustrate feasibility with the policy-relevant example of smoking in India. Methods. The model is calibrated to the prevalence of male smoking in 12 Indian regions estimated from the 2009–2010 Indian Global Adult Tobacco Survey. Nelder-Mead searches (250,000 starting locations) identify distributions of starting, quitting, and restarting rates that minimize the difference between modeled and observed age-specific prevalence. We compare modeled life expectancies to estimates in the absence of time-varying risk exposures and consider gains from hypothetical smoking cessation programs delivered for 1 to 30 years. Results. Calibration achieves concordance between modeled and observed outcomes. Probabilities of starting to smoke rise and fall with age, while quitting and restarting probabilities fall with age. Accounting for time-varying smoking exposures is important, as not doing so produces smaller estimates of life expectancy losses. Estimated impacts of smoking cessation programs delivered for different periods depend on the fact that people who have been induced to abstain from smoking longer are less likely to restart. Conclusions. The approach described is feasible for important risk factors for numerous chronic diseases. 
Incorporating exposure-change rates can improve modeled estimates of chronic disease outcomes and of the long-term effects of interventions targeting risk factors.
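A minimal sketch of the kind of risk factor model and calibration target described above, assuming a three-state cohort (never, current, and former smokers) with annual starting, quitting, and restarting probabilities. The function names and rate values are hypothetical, mortality is ignored, and the optimizer itself (Nelder-Mead in the study) is not implemented here; it would search over the rate parameters to minimize `calibration_error`.

```python
def smoking_prevalence(start_p, quit_p, restart_p, ages=range(15, 80)):
    """Track a cohort through never/current/former smoking states.

    Each argument is a function mapping age to an annual probability.
    Returns {age: share of the cohort currently smoking at that age}.
    """
    never, current, former = 1.0, 0.0, 0.0
    prevalence = {}
    for age in ages:
        prevalence[age] = current
        starters = never * start_p(age)
        quitters = current * quit_p(age)
        relapsers = former * restart_p(age)
        never -= starters
        current += starters + relapsers - quitters
        former += quitters - relapsers
    return prevalence

def calibration_error(modeled, observed):
    """Sum of squared differences at the ages with observed prevalence."""
    return sum((modeled[age] - p) ** 2 for age, p in observed.items())

# Hypothetical rates: starting peaks in young adulthood, as in the abstract.
modeled = smoking_prevalence(
    start_p=lambda age: 0.03 if age < 30 else 0.005,
    quit_p=lambda age: 0.02,
    restart_p=lambda age: 0.10,
)
observed = {20: 0.12, 40: 0.30, 60: 0.28}  # hypothetical survey prevalence
error = calibration_error(modeled, observed)
```

Because only cross-sectional prevalence enters the objective, many rate combinations can fit equally well, which is consistent with the study reporting distributions of rates from many starting locations rather than a single solution.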
Background. Many medical decisions involve an implied choice between alternative survival curves, typically with differing quality of life. Common preference assessment methods neglect this structure, creating some risk of distortions. Methods. Survival curve quality-of-life assessments (SQLA) were developed from Gompertz survival curves fitting the general population’s survival. An algorithm was developed to generate relative discount rate-utility (DRU) functions from a standard survival curve and health state and an equally attractive alternative curve and state. A least means squared distance algorithm was developed to describe how nearly 3 or more DRU functions intersect. These techniques were implemented in a program called X-Trade and tested. Results. SQLA scenarios can portray realistic treatment choices. A side effect scenario portrays one prototypical choice, to extend life while experiencing some loss, such as an amputation. A risky treatment scenario portrays procedures with an initial mortality risk. A time trade scenario mimics conventional time tradeoffs. Each SQLA scenario yields DRU functions with distinctive shapes, such as sigmoid curves or vertical lines. One SQLA can imply a discount rate or utility if the other value is known and both values are temporally stable. Two SQLA exercises imply a unique discount rate and utility if the inferred DRU functions intersect. Three or more SQLA results can quantify uncertainty or inconsistency in discount rate and utility estimates. Pilot studies suggested that many subjects could learn to interpret survival curves and do SQLA. Limitations. SQLA confuse some people. Compared with SQLA, standard gambles quantify very low utilities more easily, and time tradeoffs are simpler for high utilities. When discount rates approach zero, time tradeoffs are as informative as SQLA and easier to do. Conclusions. SQLA may complement conventional utility assessment methods.
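One simplified reading of how a single SQLA result can imply a utility once the discount rate is known is sketched below. It assumes a Gompertz hazard a·exp(b·t), a standard curve lived in full health (utility 1), and indifference defined as equal discounted quality-adjusted life expectancy. These assumptions, the parameter values, and the function names are illustrative and are not taken from X-Trade.

```python
import math

def gompertz_survival(t, a, b):
    """Survival probability at time t under hazard a * exp(b * t)."""
    return math.exp(-(a / b) * (math.exp(b * t) - 1.0))

def discounted_life_expectancy(a, b, r, horizon=80.0, step=0.05):
    """Midpoint-rule integral of S(t) * exp(-r * t) from 0 to horizon."""
    total = 0.0
    for i in range(int(horizon / step)):
        t = (i + 0.5) * step
        total += gompertz_survival(t, a, b) * math.exp(-r * t) * step
    return total

def implied_utility(standard, alternative, r):
    """Utility of the alternative health state that makes the two options
    equally attractive at discount rate r (standard state has utility 1)."""
    return (discounted_life_expectancy(*standard, r)
            / discounted_life_expectancy(*alternative, r))

# Hypothetical side effect scenario: the alternative halves baseline hazard.
standard_curve = (0.002, 0.09)
alternative_curve = (0.001, 0.09)
dru_points = [(r, implied_utility(standard_curve, alternative_curve, r))
              for r in (0.0, 0.03, 0.05)]
```

Evaluating `implied_utility` over a grid of discount rates traces one curve in the discount rate-utility plane; a second exercise with different curves would give a second curve, and their crossing point would identify both values.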
Background. Making informed decisions about cancer screening involves understanding the benefits and harms in conjunction with personal values. There is little research examining factors associated with informed decision making or participation in screening in the context of a decision aid trial. Objectives. To identify factors associated with informed choice and participation in fecal occult blood testing (FOBT) among lower education populations. Design. Randomized controlled trial of an FOBT decision aid conducted between July and November 2008. Setting. Socioeconomically disadvantaged areas in New South Wales, Australia. Participants. Included 572 adults aged 55 to 64 years with lower education. Measurements. Sociodemographic variables, perceived health literacy, and involvement preferences in decision making were examined to identify predictors of informed choice (knowledge, attitudes, and behavior). Results. Multivariate analysis identified independent predictors of making an informed choice as having higher education (relative risk [RR], 1.49; 95% confidence interval [CI], 1.13–1.95; P = 0.001), receiving the decision aid (RR, 2.88; 95% CI, 1.87–4.44; P < 0.001), and being male (RR, 1.48; 95% CI, 1.11–1.97; P = 0.009). Participants with no confidence in completing forms and poorer self-reported health were less likely to make an informed choice (RR, 0.74; 95% CI, 0.53–1.03; P = 0.05 and RR, 0.57; 95% CI, 0.36–0.89; P = 0.007, respectively). Independent predictors of completing the FOBT were positive screening attitudes, receiving the standard information, preference for making the decision alone, and knowing that screening may lead to false-positive/negative results. Limitations. We did not objectively measure health literacy. Conclusions. Participants with the lowest levels of education had greater difficulties making an informed choice about participation in bowel screening. 
Alternative methods are needed to support informed decision making among lower education populations.
Objectives. Mapping algorithms are being developed in increasing numbers to derive health utilities (HUs) from health-related quality-of-life (HRQOL) data. However, the variances of the mapping-derived HUs are observed to be smaller than those of the actual HUs. Methods. Two reasons are proposed: 1) the presence of important unmeasured predictors leading to a high degree of unexplained variance and 2) ignoring that the regression coefficients are random variables themselves. We derive 3 variance estimators of HUs to account for these causes: 1) R2-adjusted estimator, 2) parametric estimator, and 3) nonparametric estimator. We tested these estimators using a simulated dataset and a real dataset involving the EQ-5D-3L and University of Washington Quality of Life questionnaire for patients with head and neck cancers. Results. The R2-adjusted estimator can be used in ordinary least squares (OLS)–based mapping algorithms and requires only the R2 from the derivation study. The parametric estimator can be used in OLS-based mapping algorithms and requires the mean square error (MSE) and design matrix from the derivation study. The nonparametric estimator can be used in any mapping algorithm and requires leave-one-out cross-validation MSE from the derivation study. In the simulated dataset, all 3 estimators are within 1% of the variance of the actual HUs. In the real dataset, the unadjusted variance was 45% less than the actual variance, while all 3 estimators are within 10% of the actual variance. Conclusions. When conducting cost-utility analyses (CUA) based on mapping algorithms, the variances of derived HUs should be properly adjusted using one of the proposed methods so that the results of the CUAs will correctly characterize uncertainty.
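The variance shrinkage described above, and an R2-based correction, can be illustrated with a small sketch. It relies on the standard OLS identity that, for a model with an intercept, the in-sample variance of the fitted values equals R2 times the variance of the outcome; dividing the mapped variance by R2 is therefore one plausible form of an R2-adjusted estimator. The abstract does not give the exact formula, and the data and names here are hypothetical.

```python
from statistics import mean, pvariance

def ols_fit(x, y):
    """Simple linear regression; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

# Hypothetical derivation data: HRQOL scores and observed utilities.
hrqol = [10, 20, 30, 40, 50, 60, 70, 80]
utility = [0.35, 0.30, 0.55, 0.50, 0.60, 0.85, 0.70, 0.90]

intercept, slope = ols_fit(hrqol, utility)
mapped = [intercept + slope * s for s in hrqol]

sse = sum((u - m) ** 2 for u, m in zip(utility, mapped))
sst = sum((u - mean(utility)) ** 2 for u in utility)
r2 = 1.0 - sse / sst

var_actual = pvariance(utility)   # variance of observed utilities
var_mapped = pvariance(mapped)    # smaller: unexplained variance is lost
var_adjusted = var_mapped / r2    # assumed R2-adjusted estimator
```

In the derivation sample the adjustment recovers the observed variance exactly; in practice the R2 would come from the derivation study and be applied to mapped utilities in a new sample, where the match is only approximate.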
Background. Current US colorectal cancer screening guidelines that call for shared decision making regarding the choice among several recommended screening options are difficult to implement. Multicriteria decision analysis (MCDA) is an established method well suited for supporting shared decision making. Our study goal was to determine whether a streamlined form of MCDA using rank-order–based judgments can accurately assess patients’ colorectal cancer screening priorities. Methods. We converted priorities for 4 decision criteria and 3 subcriteria regarding colorectal cancer screening obtained from 484 average-risk patients using the analytic hierarchy process (AHP) in a prior study into rank-order–based priorities using rank order centroids. We compared the 2 sets of priorities using Spearman rank correlation and nonparametric Bland–Altman limits of agreement analysis. We assessed the differential impact of using the rank-order–based versus the AHP-based priorities on the results of a full MCDA comparing 3 currently recommended colorectal cancer screening strategies. Generalizability of the results was assessed using Monte Carlo simulation. Results. Correlations between the 2 sets of priorities for the 7 criteria ranged from 0.55 to 0.92. The proportions of differences between rank-order–based and AHP-based priorities that were more than ±0.15 ranged from 1% to 16%. Differences in the full MCDA results were minimal, and the relative rankings of the 3 screening options were identical more than 88% of the time. The Monte Carlo simulation results were similar. Conclusions. Rank-order–based MCDA could be a simple, practical way to guide individual decisions and assess population decision priorities regarding colorectal cancer screening strategies. Additional research is warranted to further explore the use of these methods for promoting shared decision making.
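Rank order centroid weights, used above to convert a ranking of criteria into numeric priorities, follow a standard formula: the criterion ranked i of n receives weight (1/n) times the sum of 1/k for k from i to n. A minimal sketch (the function name is illustrative):

```python
def rank_order_centroid(n):
    """Weights for n criteria ranked 1 (most important) to n."""
    return [sum(1.0 / k for k in range(i, n + 1)) / n
            for i in range(1, n + 1)]

# Weights for the 4 top-level decision criteria, in ranked order.
weights = rank_order_centroid(4)
```

The weights sum to 1 and depend only on the ordering, so a patient needs to supply a ranking rather than the pairwise comparisons that the AHP requires.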
Purpose. To assess the impact of illicit drug use and chronic hepatitis C virus (HCV) on health-related quality of life (HRQoL) in women with HIV or at risk for HIV infection. Methods. Cross-sectional analysis of data from the Women’s Interagency Health Study (WIHS) of women with HIV (n = 2508) and at high risk of HIV infection (n = 889) in the US. A Short-Form-6D (SF-6D) HRQoL measure derived from the Medical Outcomes Study–HIV (MOS-HIV) questionnaire, HIV infection status, CD4 cell count (a measure of immune status), antiretroviral treatment, current illicit drug use (heroin and/or cocaine), and HCV status were assessed at a recent study visit. We developed multivariate linear regression models adjusting for age, race/ethnicity, education, and testing for interactions. Results. HIV-infected women with ≤200 CD4 cells/µL had lower mean HRQoL scores (0.69) than either HIV-infected women with >200 CD4 cells/µL (0.78) or HIV-uninfected women (0.80) (P < 0.01). In multivariate analysis, illicit drug use, chronic HCV, and low CD4 count were independently associated with lower HRQoL. There was a differential effect of HCV and illicit drug use for HIV-infected women depending on CD4 cell count: HIV-infected women with >200 CD4 cells/µL had a significantly greater reduction in HRQoL associated with illicit drug use (–0.063) and chronic HCV (–0.036) than women with ≤200 CD4 cells/µL (–0.017, –0.005 respectively). Conclusions. Poorly controlled HIV, illicit drug use, and chronic HCV are associated with lower HRQoL. Illicit drug use and chronic HCV have greater HRQoL impacts for HIV-infected women with well-controlled HIV versus those with poorly controlled HIV, which may affect clinical and policy priorities.
Background. SF-6D utility weights are conventionally produced using a standard gamble (SG). SG-derived weights consistently demonstrate a floor effect not observed with other elicitation techniques. Recent advances in discrete choice methods have allowed estimation of utility weights. The objective was to produce Australian utility weights for the SF-6D and to explore the application of discrete choice experiment (DCE) methods in this context. We hypothesized that weights derived using this method would reflect the largely monotonic construction of the SF-6D. Methods. We designed an online DCE and administered it to an Australia-representative online panel (n = 1017). A range of specifications investigating nonlinear preferences with respect to additional life expectancy were estimated using a random-effects probit model. The preferred model was then used to estimate a preference index such that full health and death were valued at 1 and 0, respectively, to provide an algorithm for Australian cost-utility analyses. Results. Physical functioning, pain, mental health, and vitality were the largest drivers of utility weights. Combining levels to remove illogical orderings did not lead to a poorer model fit. Relative to international SG-derived weights, the range of utility weights was larger with 5% of health states valued below zero. Conclusions. DCEs can be used to investigate preferences for health profiles and to estimate utility weights for multi-attribute utility instruments. Australian cost-utility analyses can now use domestic SF-6D weights. The comparability of DCE results to those using other elicitation methods for estimating utility weights for quality-adjusted life-year calculations should be further investigated.
Background: Analysts frequently estimate health state utility values from other outcomes. Utility values like EQ-5D have characteristics that make standard statistical methods inappropriate. We have developed a bespoke, mixture model approach to directly estimate EQ-5D. An indirect method, "response mapping," first estimates the level on each of the 5 dimensions of the EQ-5D and then calculates the expected tariff score. These methods have never previously been compared. Methods: We use a large observational database from patients with rheumatoid arthritis (N = 100,398). Direct estimation of UK EQ-5D scores as a function of the Health Assessment Questionnaire (HAQ), pain, and age was performed with a limited dependent variable mixture model. Indirect modeling was undertaken with a set of generalized ordered probit models with expected tariff scores calculated mathematically. Linear regression was reported for comparison purposes. Impact on cost-effectiveness was demonstrated with an existing model. Results: The linear model fits poorly, particularly at the extremes of the distribution. The bespoke mixture model and the indirect approaches improve fit over the entire range of EQ-5D. Mean average error is 10% and 5% lower compared with the linear model, respectively. Root mean squared error is 3% and 2% lower. The mixture model demonstrates superior performance to the indirect method across almost the entire range of pain and HAQ. These lead to differences in cost-effectiveness of up to 20%. Conclusions: There are limited data from patients in the most severe HAQ health states. Modeling of EQ-5D from clinical measures is best performed directly using the bespoke mixture model. This substantially outperforms the indirect method in this example. Linear models are inappropriate, suffer from systematic bias, and generate values outside the feasible range.
Background. Although terminal cancer is a widely used term, its meaning varies, which may lead to different attitudes toward end-of-life issues. The study was conducted to investigate differences in the understanding of terminal cancer and determine the relationship between this understanding and attitudes toward end-of-life issues. Methods. A questionnaire survey was performed between 2008 and 2009. A total of 1242 cancer patients, 1289 family caregivers, 303 oncologists from 17 hospitals, and 1006 participants from the general population responded. Results. A "6-month life expectancy" was the most common understanding of terminal cancer (45.6%), followed by "treatment refractoriness" (21.1%), "metastatic/recurrent disease" (19.4%), "survival of a few days/weeks" (11.4%), and "locally advanced disease" (2.5%). The combined proportion of "treatment refractoriness" and "6-month life expectancy" differed significantly between oncologists and the other groups combined (76.0% v. 65.9%, P = 0.0003). Multivariate analyses showed that patients and caregivers who understood terminal cancer as "survival of a few days/weeks" showed more negative attitudes toward disclosure of terminal status compared with participants who chose "treatment refractoriness" (adjusted odds ratio [aOR] 0.42, 95% confidence interval [CI] 0.22–0.79 for patients; aOR 0.34, 95% CI 0.18–0.63 for caregivers). Caregivers who understood terminal cancer as "locally advanced" or "metastatic/recurrent disease" showed a significantly lower percentage of agreement with withdrawal of futile life-sustaining treatment compared with those who chose "treatment refractoriness" (aOR 0.19, 95% CI 0.07–0.54 for locally advanced; aOR 0.39, 95% CI 0.21–0.72 for metastatic/recurrent). Conclusions. The understanding of terminal cancer varied among the 4 participant groups. It was associated with different preferences regarding end-of-life issues. 
Standardization of these terms is needed to better understand end-of-life care.
Objective. The IPDAS Collaboration has developed a checklist and an instrument (IPDASi v3.0) to assess the quality of patient decision aids (PDAs) in terms of their development process and shared decision-making design components. Certification of PDAs is of growing interest in the US and elsewhere. We report a modified Delphi consensus process to agree on IPDASi (v3.0) items that should be considered as minimum standards for PDA certification, for inclusion in the refined IPDASi (v4.0). Methods. A 2-stage Delphi voting process considered the inclusion of IPDASi (v3.0) items as minimum standards. Item scores and qualitative comments were analyzed, followed by expert group discussion. Results. One hundred and one people voted in round 1; 87 in round 2. Forty-seven items were reduced to 44 items across 3 new categories: 1) qualifying criteria, which are required in order for an intervention to be considered a decision aid (6 items); 2) certification criteria, without which a decision aid is judged to have a high risk of harmful bias (10 items); and 3) quality criteria, believed to strengthen a decision aid but whose omission does not present a high risk of harmful bias (28 items). Conclusions. This study provides preliminary certification criteria for PDAs. Scoring and rating processes need to be tested and finalized. However, the process of appraising the quality of the clinical evidence reported by the PDA should be used to complement these criteria; the proposed standards are designed to rate the quality of the development process and shared decision-making design elements, not the quality of the PDA’s clinical content.
Background: The trend for terminally ill patients to receive much of their end-of-life care at home necessitates the design of services to facilitate this. Care at home also requires that informal care be provided by family members and friends. This study investigated informal carers’ preferences for support services to aid the development of end-of-life health care services. Methods: This cross-sectional study used 2 discrete choice experiments to ascertain the preferences of carers supporting patients with different levels of care need, determined by the assistance needed with personal care and labeled High Care (HC) and Low Care (LC). The sample included 168 informal carers of people receiving palliative care at home from 2 palliative care services in Sydney, Australia. Data were collected in face-to-face interviews; carers chose between 2 hypothetical plans of support services and their current services. Data were analyzed with generalized multinomial logit models that were used to calculate the impact of each attribute on the probability of a carer choosing a service plan. Results: Preferred support included nursing services; the probability of choosing a plan increased significantly if it included nurse home visits and phone advice (P < 0.001). HC carers also wanted doctor home visits, home respite, and help with personal care (P < 0.05), and LC carers wanted help with household tasks, transport, and a case coordinator (P < 0.001). On average, both groups of carers preferred their current services, but this varied with characteristics of the carer and the caregiving situation. Conclusions: The most valued services are those that support carers in their caregiving role; however, supportive care preferences vary with the different circumstances of patients and carers.
Background. Profiling is increasingly being used to generate input for improvement efforts in health care. For these efforts to be successful, profiles must reflect true provider performance, requiring an appropriate statistical model. Sophisticated models are available to account for the specific features of performance data, but they may be difficult to use and explain to providers. Objective. To assess the influence of the statistical model on the performance profiles of primary care providers. Data Source. Administrative data (2006–2008) on 2.8 million members of a Dutch health insurer who were registered with 1 of 4396 general practitioners. Methods. Profiles are constructed for 6 quality measures and 5 resource use measures, controlling for differences in case mix. Models include ordinary least squares, generalized linear models, and multilevel models. Separately for each model, providers are ranked on z scores and classified as outlier if belonging to the 10% with the worst or best performance. The impact of the model is evaluated using the weighted kappa for rankings overall, percentage agreement on outlier designation, and changes in rankings over time. Results. Agreement among models was relatively high overall (kappa typically >0.85). Agreement on outlier designation was more variable and often below 80%, especially for high outliers. Rankings were more similar for processes than for outcomes and expenses. Agreement among annual rankings per model was low for all models. Conclusions. Differences among models were relatively small, but the choice of statistical model did affect the rankings. In addition, most measures appear to be driven largely by chance, regardless of the model that is used. Profilers should pay careful attention to the choice of both the statistical model and the performance measures.
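The outlier comparison described in the Methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the z scores are invented, the function names `flag_outliers` and `agreement` are hypothetical, and the abstract does not say exactly how percentage agreement was defined, so the definition used here (the share of providers flagged by one model that the other model also flags) is one plausible reading.

```python
def flag_outliers(z_scores, share=0.10):
    """Label each provider 'low', 'high', or 'typical' by the rank of its z score."""
    n = len(z_scores)
    k = max(1, round(n * share))  # number of providers in each tail
    order = sorted(range(n), key=lambda i: z_scores[i])
    labels = ["typical"] * n
    for i in order[:k]:
        labels[i] = "low"
    for i in order[-k:]:
        labels[i] = "high"
    return labels


def agreement(labels_a, labels_b, category):
    """Share of providers flagged as `category` by model A that model B also flags."""
    flagged = [i for i, lab in enumerate(labels_a) if lab == category]
    return sum(labels_b[i] == category for i in flagged) / len(flagged)


# Hypothetical z scores for the same 10 providers under two statistical models.
z_ols = [-2.1, -0.3, 0.4, 1.9, 0.1, -0.8, 0.7, 1.2, -1.5, 0.0]
z_multilevel = [-1.4, -0.2, 0.3, 1.1, 0.2, -0.6, 0.5, 1.3, -1.6, 0.1]

labels_ols = flag_outliers(z_ols)
labels_ml = flag_outliers(z_multilevel)
high_agreement = agreement(labels_ols, labels_ml, "high")
low_agreement = agreement(labels_ols, labels_ml, "low")
```

In this toy data the two models rank providers similarly overall yet flag different providers in each tail, which is the kind of pattern the Results describe: high overall agreement alongside weaker agreement on outlier designation.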
Background. Key to conducting active drug safety surveillance using longitudinal health care data is determining whether and when there is sufficient evidence to raise a safety alert. We propose to quantify the expected value of the information (VOI) to be gained through continued monitoring in terms of its potential to reduce health losses among future patients and weigh this against the health cost of exposing current patients during continued monitoring. Objective. To apply this sequential VOI approach to monitoring the comparative safety of prasugrel v. clopidogrel on gastrointestinal (GI) bleeding. Methods. We calculated expected health losses assuming expected mortality, nonfatal myocardial infarction (MI), and nonfatal stroke on clopidogrel were 1.27, 5.93, and 1.14 per 100 person-years, using historical data; relative rates on prasugrel were 0.95, 0.76, and 1.02 based on trial data; and MI, stroke, and GI bleed were 9%, 25%, and 0.1% as bad as death, respectively. We assigned gamma prior distributions to the rates of bleeding on clopidogrel and prasugrel to capture baseline uncertainty; in Monte Carlo simulations, prasugrel’s efficacy parameters were sampled from distributions. Results. Treating all patients with prasugrel minimized expected health losses, resulting in 475.3 death-equivalents over 25,000 person-years of treatment. Monitoring increased expected losses by 5, and treating all patients with clopidogrel increased losses by 46.4. In Monte Carlo simulation, monitoring on average increased expected losses by 4.6, but a reduction in losses from monitoring was supported within the bounds of uncertainty (95% confidence interval, –0.6 to 11.1). Limitations. Patient heterogeneity and the possibility of updating efficacy parameters during monitoring were not incorporated. Conclusion. The proposed approach integrates expected health harms and benefits of continued monitoring in the decision to raise a safety alert.
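As a rough illustration of the expected-loss arithmetic described in the Methods, the following Python sketch converts the stated event rates and severity weights into death-equivalents over 25,000 person-years. It is a simplified reconstruction rather than the authors' model: it omits GI bleeding (the abstract gives the bleeding rates gamma priors without reporting their values), it uses only the rounded inputs quoted above, and the names `WEIGHTS` and `death_equivalents` are hypothetical.

```python
# Event rates on clopidogrel per 100 person-years (from the abstract).
CLOPIDOGREL_RATES = {"death": 1.27, "mi": 5.93, "stroke": 1.14}
# Relative rates on prasugrel v. clopidogrel (from the abstract).
PRASUGREL_RELATIVE = {"death": 0.95, "mi": 0.76, "stroke": 1.02}
# Severity weights: how bad each event is relative to death (from the abstract).
WEIGHTS = {"death": 1.0, "mi": 0.09, "stroke": 0.25}
PERSON_YEARS = 25_000


def death_equivalents(rates, person_years=PERSON_YEARS):
    """Weighted event rate per 100 person-years, scaled to the treated person-years."""
    per_100 = sum(WEIGHTS[event] * rate for event, rate in rates.items())
    return per_100 * person_years / 100


prasugrel_rates = {
    event: CLOPIDOGREL_RATES[event] * PRASUGREL_RELATIVE[event]
    for event in CLOPIDOGREL_RATES
}
loss_clopidogrel = death_equivalents(CLOPIDOGREL_RATES)
loss_prasugrel = death_equivalents(prasugrel_rates)
excess_loss_clopidogrel = loss_clopidogrel - loss_prasugrel
```

Under these assumptions the sketch gives roughly 475.7 death-equivalents for prasugrel and about 46.5 more for clopidogrel, close to the reported 475.3 and 46.4. Small differences are expected because the sketch leaves out bleeding and works from rounded inputs.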
Cost-effectiveness analysis has become a widely accepted tool for decision making in health care. The standard textbook cost-effectiveness analysis focuses on whether to make the switch from an old or common practice technology to an innovative technology, and in doing so, it takes a global perspective. In this article, we are interested in a local perspective, and we look at the questions of whether and when the switch from old to new should be made. A new approach to cost-effectiveness from a local (e.g., a hospital) perspective, by means of a mathematical model for cost-effectiveness that explicitly incorporates time, is proposed. A decision rule is derived for establishing whether a new technology should be adopted, as well as a general rule for establishing when it pays to postpone adoption by 1 more period, and a set of decision rules that can be used to determine the optimal timing of adoption. Finally, a simple example is presented to illustrate our model and how it leads to optimal decision making in a number of cases.
Background. Efforts to predict success in chronic disease management programs have been generally unsuccessful. Objective. To identify patient subgroups associated with success at each of 6 steps in a diabetes self-management (DSM) program. Design. Using data from a randomized trial, recursive partitioning with signal detection analysis was used to identify subgroups associated with 6 sequential steps of program success: agreement to participate, completion of baseline, initial website engagement, 4-month behavior change, later engagement, and longer-term maintenance. Setting. The study was conducted in 5 primary care clinics within Kaiser Permanente Colorado. Patients. Different numbers of patients participated in each step, including 2076, 544, 270, 219, 127, and 89. All measures available were used to address success at each step. Intervention. Participants were randomized to receive either enhanced usual care or 1 of 2 Internet-based DSM programs: 1) self-administered, computer-assisted self-management and 2) the self-administered program with the addition of enhanced social support. Measurements. Two sets of potential predictor variables and 6 dichotomous outcomes were created. Results. Signal detection analysis differentiated successful and unsuccessful subgroups at all but the final step. Different patient subgroups were associated with success at these different steps. Demographic factors (education, ethnicity, income) were associated with initial participation but not with later steps, and the converse was true of health behavior variables. Limitations. Analyses were limited to one setting, and the sample sizes for some of the steps were modest. Conclusions. Signal detection and recursive partitioning methods may be useful for identifying subgroups that are more or less successful at different steps of intervention and may aid in understanding variability in outcomes.
Background. Illness-related presenteeism (suboptimal work performance) may be a significant factor in worker productivity. Until now, there has been no generally accepted best method of measuring presenteeism across different industries and occupations. This study sought to validate the Health and Work Performance Questionnaire (HPQ)–based measure of presenteeism across occupations and industries and assess the most appropriate method for data analysis. Methods. Work performance was measured using the modified version of the HPQ administered to workforce samples from the education and health sectors in Queensland, Australia (N = 30,870) during 2005 and 2006. Three approaches to data analysis of presenteeism measures were assessed: absolute performance, the ratio of own performance to others’ performance, and the difference between others’ and own performance. The best measure was judged by its sensitivity to changes in health indicators. Results. The measure that correlated best with health indicators was absolute presenteeism. For example, in the health sector, correlations between physical health status and absolute presenteeism were 4 to 5 times greater than those for the ratio or difference approaches, and in the education sector, these correlations were twice as large. Using this approach, the estimated cost of presenteeism in 2006 was $Aus8338 and $Aus8092 per worker per annum for the health and education sectors, respectively. Conclusions. The HPQ is a valid measure of presenteeism. Transforming responses by perceived performance of peers is unnecessary, as absolute presenteeism correlated best with health indicators. Absolute presenteeism was more insightful for ascertaining the cost of presenteeism.
Purpose. Multiple imputation (MI) has been proposed for handling missing data in cost-effectiveness analyses (CEAs). In CEAs that use cluster randomized trials (CRTs), the imputation model, like the analysis model, should recognize the hierarchical structure of the data. This paper contrasts a multilevel MI approach that recognizes clustering, with single-level MI and complete case analysis (CCA) in CEAs that use CRTs. Methods. We consider a multilevel MI approach compatible with multilevel analytical models for CEAs that use CRTs. We took fully observed data from a CEA that evaluated an intervention to improve diagnosis of active labor in primiparous women using a CRT (2078 patients, 14 clusters). We generated scenarios with missing costs and outcomes that differed, for example, according to the proportion with missing data (10%–50%), the covariates that predicted missing data (individual, cluster-level), and the missingness mechanism: missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). We estimated incremental net benefits (INBs) for each approach and compared them with the estimates from the fully observed data, the "true" INBs. Results. When costs and outcomes were assumed to be MCAR, the INBs for each approach were similar to the true estimates. When data were MAR, the point estimates from the CCA differed from the true estimates. Multilevel MI provided point estimates and standard errors closer to the true values than did single-level MI across all settings, including those in which a high proportion of observations had cost and outcome data MAR and when data were MNAR. Conclusions. Multilevel MI accommodates the multilevel structure of the data in CEAs that use cluster trials and provides accurate cost-effectiveness estimates across the range of circumstances considered.
A recent publication includes a review of survival extrapolation methods used in technology appraisals of treatments for advanced cancers. The author of the article also noted shortcomings and inconsistencies in the analytical methods used in appraisals. He then proposed a survival model selection process algorithm to guide modelers’ choice of projective models for use in future appraisals. This article examines the proposed algorithm and highlights various shortcomings that involve questionable assumptions, including researchers’ access to patient-level data, the relevance of proportional hazards modeling, and the appropriateness of standard probability functions for characterizing risk, which may mislead practitioners into employing biased structures for projecting limited data in decision models. An alternative paradigm is outlined. This paradigm is based on the primacy of the experimental data and adherence to the scientific method through hypothesis formulation and validation. Drawing on extensive experience of survival modeling and extrapolation in the United Kingdom, practical advice is presented on issues of importance when using data from clinical trials terminated without complete follow-up as a basis for survival extrapolation.
Background. Participation in cancer clinical trials is low, in some cases particularly among racial and ethnic minorities, which has negative consequences for the generalizability of study findings. The objective of this study was to determine what factors are associated with patients’ participation or willingness to participate and whether these factors vary by race/ethnicity. Design or Methods. White, Hispanic, and black participants were recruited through the Florida cancer registry and had been diagnosed with breast, lung, colorectal, or prostate cancer (
Objective. To obtain estimates of direct health care costs for prostate cancer (PC) from diagnosis to death to inform state transition models. Methods. A stratified random sample of PC patients residing in 3 geographically diverse regions of Ontario, Canada, and diagnosed in 1993–1994, 1997–1998, and 2001–2002, was selected from the Ontario Cancer Registry. We retrieved patients’ pathology reports to identify referring physicians and contacted surviving patients and next of kin of deceased patients for informed consent. We reviewed clinic charts to obtain data required to allocate each patient’s observation time to 11 PC-specific health states. We linked these data to health care administrative databases to calculate resource use and costs (Canadian dollars, 2008) per health state. A multivariable mixed-effects model determined predictors of costs. Results. The final sample numbered 829 patients. In the regression model, total direct costs increased with age, comorbidity, and Gleason score (all P < 0.0001). Radical prostatectomy was the most costly primary treatment health state ($4676 per 100 days). Radical prostatectomy, hormone-refractory metastatic disease ($6398 per 100 days), and final (predeath) ($13,739 per 100 days) health states were significantly more costly (P < 0.05) than nontreated nonmetastatic PC ($3440 per 100 days), whereas the postprostatectomy ($732 per 100 days) and postradiation ($1556 per 100 days) states cost significantly less (P < 0.0001). Conclusions. This study used an innovative but labor-intensive approach linking chart and administrative data to estimate health care costs. Researchers should weigh the potential benefits of this method against what is involved in implementation. Modifications in methodology may achieve similar gains with less outlay in individual studies. However, we believe that this is a promising approach for researchers wishing to advance the quality of costing in state transition modeling.
Background. Measuring utilities and health-related quality of life (HRQL) in children is challenging due to their cognitive abilities and changing developmental stages. Purpose. To identify methodological issues on utility measurements in children, we performed a systematic review on utilities measured with a single instrument, the Health Utilities Index (HUI), in pediatric acute lymphoblastic leukemia (ALL). The secondary goal was to facilitate future cost-utility analyses without the need for time-consuming assessments. Data Sources. PubMed, Embase, Cochrane Library, CINAHL, and PsycINFO were searched from inception to June 2012. Studies had to report on utility scores in pediatric ALL, either on or after treatment, to be included. Results. Fifteen studies were included. Most studies had methodological shortcomings, which mainly concerned study design and definition and representativeness of the study group. Utility scores were dependent on treatment variables, and there generally was an improvement in HRQL as treatment or survivorship advanced. In general, proxy-respondents were less reliable for subjective phenomena than for observable conditions. HUI2 and HUI3 scores were not interchangeable. Limitations. Studies may have been missed because no validated search method for utility studies exists or because of language bias or the exclusion of non–peer-reviewed papers. Conclusions. Most studies in this review were methodologically suboptimal. Future developments should focus on including developmentally appropriate items for the whole pediatric age group. Adding disease-specific domains may enhance the sensitivity and responsiveness of instruments. Efforts should be undertaken to elicit valuation of health states from older children and teenagers as much as possible. For now, it remains difficult to make valid and informed decisions on the financing of interventions until health state valuation in children has become more methodologically robust.
Background. Percutaneous coronary intervention (PCI) with either drug-eluting stents (DES) or bare metal stents (BMS) reduces angina and repeat procedures compared with optimal medical therapy alone. It remains unclear if these benefits are sufficient to offset their increased costs and small increase in adverse events. Objective. Cost utility analysis of initial medical therapy v. PCI with either BMS or DES. Design. Markov cohort decision model. Data Sources. Propensity-matched observational data from Ontario, Canada, for baseline event rates. Effectiveness and utility data obtained from the published literature, with costs from the Ontario Case Costing Initiative. Target Population. Patients with stable coronary artery disease, confirmed after angiography, stratified by risk of restenosis based on diabetic status, lesion size, and lesion length. Time Horizon. Lifetime. Perspective. Ontario Ministry of Health and Long Term Care. Interventions. Optimal medical therapy, PCI with BMS or DES. Outcome Measures. Lifetime costs, quality-adjusted life years (QALYs), and the incremental cost-effectiveness ratio (ICER). Results of Base Case Analysis. In the overall population, medical therapy had the lowest lifetime costs at $22,952 v. $25,081 and $25,536 for BMS and DES, respectively. Medical therapy had a quality-adjusted life expectancy of 10.1 v. 10.26 QALYs for BMS, producing an ICER of $13,271/QALY. The DES strategy had a quality-adjusted life expectancy of only 10.20 QALYs and was dominated by the BMS strategy. This ranking was consistent in all groups stratified by restenosis risk, except diabetic patients with long lesions in small arteries, in whom DES was cost-effective compared with medical therapy (ICER of $18,826/QALY). Limitations. There is the possibility of residual unobserved confounding. Conclusions. In patients with stable coronary artery disease, an initial BMS strategy is cost-effective.
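The base-case comparison above rests on two standard calculations, the incremental cost-effectiveness ratio and a dominance check. A minimal Python sketch of both, using only the rounded costs and QALYs quoted in the Results, might look like this. The helper names `icer` and `is_dominated` are hypothetical, and the sketch is not the authors' Markov model.

```python
def icer(cost_new, cost_old, qaly_new, qaly_old):
    """Incremental cost per QALY gained when moving from the old to the new strategy."""
    delta_qaly = qaly_new - qaly_old
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when the strategies yield equal QALYs")
    return (cost_new - cost_old) / delta_qaly


def is_dominated(cost_a, qaly_a, cost_b, qaly_b):
    """True if strategy A costs at least as much as B, yields no more QALYs, and differs from B."""
    no_better = cost_a >= cost_b and qaly_a <= qaly_b
    strictly_worse = cost_a > cost_b or qaly_a < qaly_b
    return no_better and strictly_worse


# Rounded base-case values from the abstract.
bms_vs_medical = icer(25_081, 22_952, 10.26, 10.1)
des_dominated_by_bms = is_dominated(25_536, 10.20, 25_081, 10.26)
```

With these rounded inputs the ratio comes out near $13,306/QALY rather than the reported $13,271/QALY. The difference is presumably due to rounding of the published costs and QALYs, although the abstract does not say so. The dominance check agrees with the text: DES costs more than BMS and yields fewer QALYs.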
Decision-analytic models must often be informed using data that are only indirectly related to the main model parameters. The authors outline how to implement a Bayesian synthesis of diverse sources of evidence to calibrate the parameters of a complex model. A graphical model is built to represent how observed data are generated from statistical models with unknown parameters and how those parameters are related to quantities of interest for decision making. This forms the basis of an algorithm to estimate a posterior probability distribution, which represents the updated state of evidence for all unknowns given all data and prior beliefs. This process calibrates the quantities of interest against data and, at the same time, propagates all parameter uncertainties to the results used for decision making. To illustrate these methods, the authors demonstrate how a previously developed Markov model for the progression of human papillomavirus (HPV-16) infection was rebuilt in a Bayesian framework. Transition probabilities between states of disease severity are inferred indirectly from cross-sectional observations of prevalence of HPV-16 and HPV-16–related disease by age, cervical cancer incidence, and other published information. Previously, a discrete collection of plausible scenarios was identified but with no further indication of which of these are more plausible. Instead, the authors derive a Bayesian posterior distribution, in which scenarios are implicitly weighted according to how well they are supported by the data. In particular, the authors emphasize the appropriate choice of prior distributions and checking and comparison of fitted models.
Risk attitudes include risk aversion as well as higher-order risk preferences such as prudence and temperance. This article analyzes the effects of such preferences on medical test and treatment decisions, represented either by test and treatment thresholds or—when the test result is not given—by optimal cutoff values for diagnostic tests. For a risk-averse decision maker, effective treatment is a risk-reducing strategy since it prevents the low health outcome of forgoing treatment in the sick state. Compared with risk neutrality, risk aversion thus lowers both the test and the treatment threshold and decreases the optimal test cutoff value. Risk vulnerability, which combines risk aversion, prudence, and temperance, is relevant if there is a comorbidity risk: thresholds and optimal cutoff values decrease even more. Since common utility functions imply risk vulnerability, our findings suggest that diagnostics in low prevalence settings (e.g., screening) may be considered more beneficial when risk preferences are taken into account.
During the 20th century, deaths from a range of serious infectious diseases decreased dramatically due to the development of safe and effective vaccines. However, infant immunization coverage has increased only marginally since the 1960s, and many people remain susceptible to vaccine-preventable diseases. "Catch-up vaccination" for age groups beyond infancy can be an attractive and effective means of immunizing people who were missed earlier. However, as newborn vaccination rates increase, catch-up vaccination becomes less attractive: the number of susceptible people decreases, so the cost to find and vaccinate each unvaccinated person may increase; in addition, the number of infected individuals decreases, so each unvaccinated person faces a lower risk of infection. This article presents a general framework for determining the optimal time to discontinue a catch-up vaccination program. We use a cost-effectiveness framework: we consider the cost per quality-adjusted life year gained of catch-up vaccination efforts as a function of newborn immunization rates over time and consequent disease prevalence and incidence. We illustrate our results with the example of hepatitis B catch-up vaccination in China. We contrast results from a dynamic modeling approach with an approach that ignores the impact of vaccination on future disease incidence. The latter approach is likely to be simpler for decision makers to understand and implement because of lower data requirements.
Background and Objective. Existing research concludes that measures of general numeracy can be used to predict individuals’ ability to assess health risks. We posit that the domain in which questions are posed affects the ability to perform mathematical tasks, raising the possibility of a separate construct of "health numeracy" that is distinct from general numeracy. The objective was to determine whether older adults’ ability to perform simple math depends on domain. Methods. Community-based participants completed 4 math questions posed in 3 different domains: a health domain, a financial domain, and a pure math domain. Participants were 962 individuals aged 55 and older, representative of the community-dwelling US population over age 54. Results. We found that respondents performed significantly worse when questions were posed in the health domain (54% correct) than in either the pure math domain (66% correct) or the financial domain (63% correct). Our experimental measure of numeracy consisted of only 4 questions, and it is possible that the apparent effect of domain is specific to the mathematical tasks that these questions require. Conclusions. These results suggest that health numeracy is strongly related to general numeracy but that the 2 constructs may not be the same. Further research is needed into how different aspects of general numeracy and health numeracy translate into actual medical decisions.
Background. Undescended testis (UDT) or cryptorchidism is the most common genital anomaly seen in boys and can be treated surgically by orchidopexy. The age at which orchidopexy should be performed is controversial for both congenital and acquired UDT. Methods. A decision analysis is performed in which all available knowledge is combined to assess the outcomes of orchidopexy at different ages. Results. Without surgery, unilateral congenital UDT and bilateral congenital UDT are associated with average losses in quality-adjusted life-years (QALYs) of 1.53 QALYs (3% discounting 0.66 QALYs) and 5.23 QALYs (1.91 QALYs), respectively. Surgery reduces this QALY loss to on average 0.84 QALYs (0.21 QALYs) for unilateral UDT and 1.66 QALYs (0.40 QALYs) for bilateral UDT. Surgery at detection will lead to the lowest QALY loss of 0.91 (0.34) and 1.73 (0.60) QALYs, respectively, for unilateral and bilateral acquired UDT compared with surgery during puberty and no surgery. No sensitivity analysis is able to change the preferences for these strategies. Conclusions. Based on our decision analytic model using societal valuations of health outcomes, surgery for unilateral UDT (both congenital and acquired) yielded the lowest loss in QALYs. Given the modest differences in outcomes, there is room for patient (or parent) preference with respect to the performance and timing of surgery in case of unilateral UDT. For bilateral UDT (both congenital and acquired), orchidopexy at any age provides considerable benefit, in particular through improved fertility. As there is no strong effect of timing, the age at which orchidopexy is performed should be discussed with the parents and the patient. More clinical evidence on issues related to timing may in the future modify these results and hence this advice.
Background. Some experts have proposed limiting the use of Supplemental Nutrition Assistance Program (SNAP) benefits for calorie-dense foods or subsidizing SNAP purchases of healthier foods. Objective. To estimate health effects and cost-effectiveness of banning or taxing sugar-sweetened beverages (SSBs) or subsidizing fruits and vegetables purchased with SNAP. Design. Microsimulation. Data Sources. National Health and Nutrition Examination Survey, US Department of Agriculture Quarterly Food-at-Home Price Database, and SNAP program data. Target Population. US adults aged 25 to 64 y. Time Horizon. 10 y. Perspective. Governmental. Outcome Measures. Incremental costs, quality-adjusted life-years (QALYs), body mass index, Alternative Healthy Eating Index, Food Security Score, diabetes person-years, and deaths from myocardial infarctions (MIs) and strokes. Results of Base-Case Analysis. Banning SSB purchases using SNAP benefits would be expected to avert 510,000 diabetes person-years and 52,000 deaths from MIs and strokes over the next decade, with a savings of $2900 per QALY saved. A penny-per-ounce tax on SSBs purchased with SNAP dollars would produce higher cost savings due to tax revenues but avert fewer chronic disease deaths. However, some SNAP participants are likely to preferentially purchase SSBs through their disposable income, indirectly reducing their food security. A 30% produce subsidy would be expected to avert 39,000 diabetes person-years and 4600 cardiovascular deaths over 10 y without effects on food security. Results of Sensitivity Analysis. Results are sensitive to the intake elasticities of SSBs and produce. Limitations. Input data did not provide information on heterogeneity in response to price changes within the SNAP-using population. Conclusions. SNAP restrictions on SSBs could lower chronic disease mortality, but further testing should examine indirect effects on disposable income and food security.
Subsidizing produce could confer fewer benefits or risks but at higher cost.
Background/Objective. After a curative treatment for cancer, patients enter into a posttherapeutic surveillance phase. This phase aims to detect relapses as soon as possible to improve the outcome. Mould and others predicted with a simple formula, using a parametric mixture cure model, how long early-stage breast cancer patients should be followed after treatment. However, patients in the posttherapeutic surveillance phase are at risk of different event types, with different responses according to their prognostic factors and different probabilities of being cured. This paper presents an adaptation of the method proposed by Mould and others, taking into account competing risks. Our loss function estimates, when follow-up is stopped at a given time, the proportion of patients who will fail after this time and who could have been treated successfully. Method. We use the direct approach for cumulative incidence modeling in the presence of competing risks with an improper Gompertz probability distribution as proposed by Jeong and Fine. Prognostic factors can be taken into account, leading to a proportional hazards model. In a second step, the estimates of the Gompertz model are combined with the probability for a patient to be treated successfully in case of relapse for each event type. The method is applied to 2 examples, a fictitious numerical example and a real data set on soft tissue sarcoma. Results and Conclusion. The model presented is a good tool for decision making to determine the total length of posttherapeutic surveillance. It can be applied to all cancers regardless of site.
Background/Objective. Modelers lack a tool to systematically and clearly present complex model results, including those from sensitivity analyses. The objective was to propose linear regression metamodeling as a tool to increase transparency of decision analytic models and better communicate their results. Methods. We used a simplified cancer cure model to demonstrate our approach. The model computed the lifetime cost and benefit of 3 treatment options for cancer patients. We simulated 10,000 cohorts in a probabilistic sensitivity analysis (PSA) and regressed the model outcomes on the standardized input parameter values in a set of regression analyses. We used the regression coefficients to describe measures of sensitivity analyses, including threshold and parameter sensitivity analyses. We also compared the results of the PSA to deterministic full-factorial and one-factor-at-a-time designs. Results. The regression intercept represented the estimated base-case outcome, and the other coefficients described the relative parameter uncertainty in the model. We defined simple relationships that compute the average and incremental net benefit of each intervention. Metamodeling produced outputs similar to traditional deterministic 1-way or 2-way sensitivity analyses but was more reliable since it used all parameter values. Conclusions. Linear regression metamodeling is a simple, yet powerful, tool that can assist modelers in communicating model characteristics and sensitivity analyses.
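The regression step described in the Methods can be illustrated with a small, standard-library Python sketch. Everything specific in it is an assumption made for illustration: the toy one-treatment model, the input distributions, the willingness-to-pay value `WTP`, and the helper names are all invented and are not the authors' cancer cure model. The sketch simulates PSA draws, standardizes the inputs, and fits an ordinary least squares metamodel through the normal equations.

```python
import random
import statistics

random.seed(0)
N_SIMS = 2_000
WTP = 50_000  # hypothetical willingness to pay per QALY

# Hypothetical PSA draws: (probability of cure, treatment cost, QALYs if cured).
draws = [
    (random.betavariate(20, 30), random.gammavariate(100, 200), random.gauss(15, 1))
    for _ in range(N_SIMS)
]
# Toy model outcome: net monetary benefit for each simulated cohort.
outcome = [WTP * p * q - c for p, c, q in draws]


def standardize(column):
    """Rescale a list of values to mean 0 and standard deviation 1."""
    mean, sd = statistics.fmean(column), statistics.pstdev(column)
    return [(x - mean) / sd for x in column]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[pivot] = m[pivot], m[i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= factor * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


# Design matrix columns: an intercept column plus the standardized inputs.
columns = [[1.0] * N_SIMS] + [standardize(list(col)) for col in zip(*draws)]
# Normal equations (X'X) beta = X'y.
xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns] for ci in columns]
xty = [sum(u * y for u, y in zip(ci, outcome)) for ci in columns]
intercept, b_cure, b_cost, b_qaly = solve(xtx, xty)
```

Because the inputs are standardized, the fitted intercept equals the mean simulated outcome, which is how an intercept can stand in for the estimated base-case result. Each remaining coefficient is the change in the outcome per 1 standard deviation change in that input, so the coefficients can be compared directly as a measure of relative parameter influence.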
Background: The primary aim of this study is to understand more about the perceptual-cognitive mechanisms underpinning the expert advantage in electrocardiogram (ECG) interpretation. While research has examined visual search processes in other aspects of medical decision making (e.g., radiology), this is the first study to apply the paradigm to ECG interpretation. The secondary aim is to explore the role that clinical history plays in influencing visual search behavior and diagnostic decision making. While clinical history may aid diagnostic decision making, it may also bias the visual search process. Methods: Ten final-year medical students and 10 consultant emergency medics were presented with 16 ECG traces (8 with clinical history that was not manipulated independently of case) while wearing eye tracking equipment. The ECGs represented common abnormalities encountered in emergency departments and were among those taught to final-year medical students. Participants were asked to make a diagnosis on each presented trace and report their level of diagnostic confidence. Results: Experts made significantly faster, more accurate, and more confident diagnoses, and this advantage was underpinned by differences in visual search behavior. Specifically, experts were significantly quicker at locating the leads of critical importance. Contrary to our hypothesis, clinical history had no significant effect on the readers’ ability to detect the abnormality or make an accurate diagnosis. Conclusions: Accurate ECG interpretation appears dependent on the perceptual skill of pattern recognition and specifically the time to fixate the critical lead(s). Therefore, there is potential clinical utility in developing perceptual training programs to train novices to detect abnormalities more effectively.
There is growing interest in markers that can be used to identify which patients are most likely to benefit from a treatment. For example, the Gail breast cancer risk prediction model may be useful for identifying a subset of older women for whom the benefit of tamoxifen for breast cancer prevention is likely to outweigh the harm. Two general classes of approaches to evaluating treatment selection markers have been developed. The first uses data on a cohort of untreated subjects to develop a risk prediction model, such as the Gail model, which is used to identify a high-risk subset of subjects. This model is paired with a measure of treatment effect to assess the impact of identifying and treating the high-risk subset. The second approach uses data from a randomized trial to model the treatment effect on a composite outcome that includes all effects of treatment (positive and negative). The treatment effect model is used to identify a subset of subjects with positive treatment effects and to assess the impact of identifying and treating this subset. We describe a framework that includes both existing approaches as special cases. In doing so, we review the existing approaches, clarify their underlying assumptions, and facilitate the evaluation of markers under less restrictive assumptions.
Background and Objective. Adapting best evidence to the care of the individual patient has been characterized as "contextualizing care" or "patient-centered decision making" (PCDM). PCDM incorporates clinically relevant, patient-specific circumstances and behaviors, that is, the patient’s context, into formulating a contextually appropriate plan of care. The objective was to develop a method for analyzing physician-patient interactions to ascertain whether decision making is patient centered. Methods. Patients carried concealed audio recorders during encounters with their physicians. Recordings and medical records were reviewed for clues that contextual factors, such as an inability to pay for a medication or competing responsibilities, might undermine an otherwise appropriate care plan, rendering it ineffective. Iteratively, the team refined a coding process to achieve high interrater agreement in determining (a) whether the clinician explored the clues—termed "contextual red flags"—for possible underlying contextual factors affecting care, (b) whether the presence of contextual factors was confirmed and, if so, (c) whether they were addressed in the final care plan. Results. A medical record data extraction instrument was developed to identify contextual red flags such as missed appointments or loss of control of a treatable chronic condition which signal that contextual factors may be affecting care. Interrater agreement (Cohen’s kappa) for coding whether the clinician explored contextual red flags, whether a contextual factor was identified, and whether the factors were addressed in the care plan was 88% (0.76, P < 0.001), 94% (0.88, P < 0.001), and 85% (0.69, P < 0.001) respectively. Conclusions. PCDM can be assessed with high interrater agreement using a protocol that examines whether essential contextual information (when present) is addressed in the plan of care.
Background/Purpose. The benefits of prescribing cardiac rehabilitation (CR) for patients following heart surgery are well documented; however, physicians continue to underuse CR programs, and disparities in the referral of women are common. Previous research into the causes of these problems has relied on self-report methods, which presume that physicians have insight into their referral behavior and can describe it accurately. In contrast, the research presented here used clinical judgment analysis (CJA) to discover the tacit judgment and referral policies of individual physicians. The specific aims were to determine 1) what these policies were, 2) the degree of self-insight that individual physicians had into their own policies, 3) the amount of agreement among physicians, and 4) the extent to which judgments were related to attitudes toward CR. Methods. Thirty-six Canadian physicians made judgments and decisions regarding 32 hypothetical cardiac patients, each described on 5 characteristics (gender, age, type of cardiovascular procedure, presence/absence of musculoskeletal pain, and degree of motivation) and then completed the 19 items of the Attitude towards Cardiac Rehabilitation Referral scale. Results. Consistent with previous studies, there was wide variation among physicians in their tacit and stated judgment policies, and self-insight was modest. On the whole, physicians showed evidence of systematic gender bias as they judged women as less likely than men to benefit from CR. Insight data suggest that 1 in 3 physicians were unaware of their own bias. There was greater agreement among physicians in how they described their judgments (stated policies) than in how they actually made them (tacit policies). Correlations between attitude statements and CJA measures were modest. Conclusions. These findings offer some explanation for the slow progress of efforts to improve CR referrals and for gender disparities in referral rates.
Background and Objective: Asymptomatic stenosis of the carotid arteries is associated with stroke. Carotid revascularization can reduce the future risk of stroke but can also trigger an immediate stroke. The objective was to model the generic relationship between immediate risk, long-term benefit, and life expectancy for any one-time prophylactic treatment and then apply the model to the use of revascularization in the management of asymptomatic carotid disease. Methods: In the "payoff time" framework, the possibility of losing quality-adjusted life-years (QALYs) because of revascularization failure is conceptualized as an "investment" that is eventually recouped over time, on average. Using this framework, we developed simple mathematical forms that define relationships between the following: perioperative probability of stroke (P); annual stroke rate without revascularization (r0); annual stroke rate after revascularization, conditional on not having suffered perioperative stroke (r1); utility levels assigned to the asymptomatic state (ua) and stroke state (us); and mortality rates (). Results: In patients whose life expectancy is below a critical life expectancy
Background: We sought to determine the psychometric properties of SURE, a 4-item checklist designed to screen for clinically significant decisional conflict in clinical practice. Methods: This study was a secondary analysis of a clustered randomized trial assessing the effect of DECISION+2, a 2-hour online tutorial followed by a 2-hour interactive workshop on shared decision making, on decisions to use antibiotics for acute respiratory infections. Patients completed SURE and also the Decisional Conflict Scale (DCS), as the gold standard, after consultation. We evaluated internal consistency of SURE using the Kuder-Richardson 20 coefficient (KR-20). We compared DCS and SURE scores using the Spearman correlation coefficient. We assessed sensitivity and specificity of SURE scores (cut-off score ≤3 out of 4) by identifying patients with and without clinically significant decisional conflict (DCS score >37.5 on a scale of 0–100). Results: Of the 712 patients recruited during the trial, 654 completed both tools. SURE scores showed adequate internal consistency (KR-20 coefficient of 0.7). There was a significant correlation between DCS and SURE scores (Spearman’s ρ = –0.45, P < 0.0001). The prevalence of clinically significant decisional conflict as estimated by the DCS was 5.2% (95% CI 3.7–7.3). Sensitivity and specificity of SURE ≤3 were 94.1% (95% CI 78.9–99.0) and 89.8% (95% CI 87.1–92.0), respectively. Conclusions: SURE shows adequate psychometric properties in a primary care population with a low prevalence of clinically significant decisional conflict. SURE has the potential to be a useful screening tool for practitioners, responding to the growing need for detecting clinically significant decisional conflict in patients.
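The accuracy measures in this abstract follow from a 2 × 2 table of SURE results against the DCS reference standard. The sketch below shows the arithmetic. The cell counts are not reported in the abstract; they are illustrative values, back-calculated as an assumption so that they sum to the 654 completers and reproduce the reported percentages. The function name `screening_accuracy` is likewise hypothetical.

```python
def screening_accuracy(tp, fn, tn, fp):
    """Return (sensitivity, specificity, prevalence) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)   # share of true cases that screen positive
    specificity = tn / (tn + fp)   # share of non-cases that screen negative
    prevalence = (tp + fn) / (tp + fn + tn + fp)
    return sensitivity, specificity, prevalence

# Illustrative counts only (assumed, not taken from the source):
# 34 patients with DCS > 37.5 and 620 without, 654 in total.
sens, spec, prev = screening_accuracy(tp=32, fn=2, tn=557, fp=63)
print(f"sensitivity {sens:.1%}, specificity {spec:.1%}, prevalence {prev:.1%}")
```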
Background/Objective. Economic evaluations adopting a societal perspective need to include informal care whenever relevant. However, in practice, informal care is often neglected, because there are few validated instruments to measure and value informal care for inclusion in economic evaluations. The CarerQol, which is such an instrument, measures the impact of informal care on 7 important burden dimensions (CarerQol-7D) and values this in terms of general quality of life (CarerQol-VAS). The objective of the study was to calculate utility scores based on relative utility weights for the CarerQol-7D. These tariffs will facilitate inclusion of informal care in economic evaluations. Methods. The CarerQol-7D tariff was derived with a discrete choice experiment conducted as an Internet survey among the general adult population in the Netherlands (N = 992). The choice set contained 2 unlabeled alternatives described in terms of the 7 CarerQol-7D dimensions (level range: "no," "some," and "a lot"). An efficient experimental design with priors obtained from a pilot study (N = 104) was used. Data were analyzed with a panel mixed multinomial parameter model including main and interaction effects of the attributes. Results. The utility attached to informal care situations was significantly higher when this situation was more attractive in terms of fewer problems and more fulfillment or support. The interaction term between the CarerQol-7D dimensions physical health and mental health problems also significantly explained this utility. The tariff was constructed by adding up the relative utility weights per category of all CarerQol-7D dimensions and the interaction term. Conclusions. We obtained a tariff providing standard utility scores for caring situations described with the CarerQol-7D. This facilitates the inclusion of informal care in economic evaluations.
Background. To achieve fair-coverage decision making, both material criteria and criteria of procedural justice have been proposed. The relationship between these is still unclear. Objective. To analyze hypotheses underlying the assumption that more assessment, transparency, and participation have a positive impact on the reasonableness of coverage decisions. Methods. We developed a structural equation model in which the process components were considered latent constructs and operationalized by a set of observable indicators. The dependent variable "reasonableness" was defined by the relevance of clinical, economic, and other ethical criteria in technology appraisal (as opposed to appraisal based on stakeholder lobbying). We conducted an Internet survey among conference participants familiar with coverage decisions of third-party payers in industrialized countries between 2006 and 2011. Partial least squares path modeling (PLS-PM) was used, which allows analyzing small sample sizes without distributional assumptions. Data on 97 coverage decisions from 15 countries and 40 experts were used for model estimation. Results. Stakeholder participation (regression coefficient [RC] = 0.289; P = 0.005) and scientific rigor of assessment (RC = 0.485; P < 0.001) had a significant influence on the construct of reasonableness. The path from transparency to reasonableness was not significant (RC = 0.289; P = 0.358). For the reasonableness construct, a considerable share of the variance was explained (R² = 0.44). Biases from missing data and nesting effects were assessed through sensitivity analyses. Limitations. The results are limited by a small sample size and the overrepresentation of some decision makers. Conclusions. Rigorous assessment and intense stakeholder participation appeared effective in promoting reasonable decision making, whereas the influence of transparency was not significant.
A sound evidence base seems most important as the degree of scientific rigor of assessment had the strongest effect.
Background: During the 2009 outbreak of novel influenza AH1N1, insufficient data were available to adequately inform decision makers about benefits and risks of vaccination and disease. We hypothesized that individuals would opt to mimic their peers, having no better decision anchor. We used game theory, decision analysis, and transmission models to simulate the impact of subjective risks and preference estimates on vaccination behavior. Methods: We asked 95 students to provide estimates of risk and health state valuations with regard to AH1N1 infection, complications, and expectations of vaccine benefits and risks. These estimates were included in a sequential chain of models: a dynamic epidemic model, a decision tree, and a population-level model. Additionally, participants’ intentions to vaccinate or not at varying vaccination rates were documented. Results: The model showed that at low vaccination rates, vaccination dominated. When vaccination rates increased above 78%, nonvaccination was the dominant strategy. We found that vaccination intentions did not correspond to the shift in strategy dominance and segregated into 3 types of intentions: regardless of what others do, 29 of 95 (31%) intended to vaccinate and 27 of 95 (28%) did not, whereas among 39 of 95 (41%), intention was positively associated with putative vaccination rates. Conclusions: Some people conform to the majority’s choice, either shifting epidemic dynamics toward herd immunity or, conversely, limiting societal goals. Policy leaders should use models carefully, noting their limitations and theoretical assumptions. Behavior drivers were not explicitly explored in this study, and the discrepant results warrant further investigation. Models including real subjective perceptions with empiric or subjective probabilities can provide insight into deviations from expected rational behavior and suggest interventions in order to provide better population outcomes.
Background: In economic evaluations, participants have to report their health care utilization continuously during follow-up. To unburden participants, researchers often collect data intermittently (i.e., in at least 3 months a year). However, comparability of intermittent v. continuous data collection is unknown. Therefore, this study aimed to compare costs estimated with intermittent data collection of health care utilization with those based on continuous data collection. Methods: We used continuous health care utilization data from a trial with 12 months of follow-up and simulated several intermittent data collection patterns. Then 3 imputation techniques—individual mean (IM), last observation carried forward (LOCF) and next observation carried backward (NOCB)—were used to estimate total annual costs. Estimated annual costs were compared with observed annual costs from continuous data collection both in the original sample and in 1000 bootstrap samples. Results: Analyses showed that intermittent data collection using cost diaries may offer good estimates of the actual total annual health expenditures. However, estimations of groups of costs differ between data collection patterns and imputation methods. The best estimations of annual total costs and groups of costs were obtained by random cohort data collection, using 3 random cohorts, ensuring that at least a third of the participants were measuring costs each month, combined with IM imputation. Intermittent data collection of health expenditures carries a small risk of missing infrequent expensive events. Conclusions: Continuous cost data collection remains the first choice. However, if intermittent measurement is chosen, we recommend calculating annual costs from intermittent data collection in random cohorts, combined with IM imputation. Key words: cost diary; health care utilization; continuous data collection; cost measurement; imputation.
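The three imputation techniques named in this abstract can be sketched on a single participant's monthly costs. This is a minimal illustration, not the study's code: the function names, the example costs, and the handling of gaps at the start (for LOCF) or end (for NOCB) of the year, which are filled here from the nearest observation, are all assumptions.

```python
def annual_cost_im(monthly):
    """Individual mean (IM): observed monthly mean scaled to the full period."""
    observed = [c for c in monthly if c is not None]
    return len(monthly) * sum(observed) / len(observed)

def _carry(values):
    """Carry the most recent observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled

def fill_locf(monthly):
    """Last observation carried forward.

    Assumption: months before the first observation take the first observed value.
    """
    first_obs = next(c for c in monthly if c is not None)
    return [first_obs if v is None else v for v in _carry(monthly)]

def fill_nocb(monthly):
    """Next observation carried backward, implemented as LOCF on the reversed series."""
    return fill_locf(monthly[::-1])[::-1]

# Hypothetical participant measured in months 2, 5, 8, and 11 only.
monthly = [None, 100, None, None, 40, None, None, 10, None, None, 70, None]

im_total = annual_cost_im(monthly)
locf_total = sum(fill_locf(monthly))
nocb_total = sum(fill_nocb(monthly))
print(im_total, locf_total, nocb_total)  # 660.0 690 630
```

The three estimates differ only in how unobserved months are weighted, which is why a single expensive month that happens to be observed can shift LOCF and NOCB totals more than the IM total.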
Background. There has been a growing interest around the world in developing country-specific scoring algorithms for the EQ-5D. This study systematically reviews all existing EQ-5D valuation studies to highlight their strengths and limitations, explores heterogeneity in observed utilities using meta-regression, and proposes a methodological checklist for reporting EQ-5D valuation studies. Methods. We searched Medline, EMBASE, the National Health Service Economic Evaluation Database (NHS EED) via Wiley’s Cochrane Library, and Wiley’s Health Economic Evaluation Database from inception through November 2012, as well as bibliographies of key papers and the EuroQol Plenary Meeting Proceedings from 1991 to 2012 for English-language reports of EQ-5D valuation studies. Two reviewers independently screened the titles and abstracts for relevance. Three reviewers performed data extraction and compared the characteristics and scoring algorithms developed in the included valuation studies. Results. Of the 31 studies included in the review, 19 used the time trade-off (TTO) technique, 10 used the visual analogue scale (VAS) technique, and 2 used both TTO and VAS. Most studies included respondents from the general population selected by random or quota sampling and used face-to-face interviews or postal surveys. Studies valued between 7 and 198 total states, with 1–23 states valued per respondent. Different model specifications have been proposed for scoring. Some sample or demographic factors, including gender, education, percentage urban population, and national health care expenditure, were associated with differences in observed utilities for moderate or severe health states. Conclusions. EQ-5D valuation studies conducted to date have varied widely in their design and in the resulting scoring algorithms. Therefore, we propose the Checklist for Reporting Valuation Studies of the EQ-5D (CREATE) for those conducting valuation studies.
Background and Objective. The generic preference-based measures (GPBMs) of health have been widely used to obtain health utility scores for calculating quality-adjusted life-years (QALYs) for economic evaluations. It has been recognized that GPBMs may miss relevant or important dimensions of health for some specific medical conditions. The objective of this study is to explore the effect of extending the current EQ-5D descriptive system by adding a sleep dimension. Methods. A new instrument, EQ-5D+Sleep, is proposed by adding a sleep dimension to the EQ-5D. Based on an orthogonal design, 18 EQ-5D+Sleep states and EQ-5D states were selected and a valuation study was undertaken whereby 160 members of the general public in South Yorkshire, UK, were interviewed using time tradeoff (TTO). Econometric models were fitted to the data. Two null hypotheses were tested: 1) the coefficient for the sleep dimension is not significant; and 2) the inclusion of the sleep dimension has no impact on the way people value the original dimensions of EQ-5D. Results and Conclusions. The results support these two null hypotheses. There seems to be no benefit to adding a sleep dimension to the EQ-5D. Research is required to explore the method of adding dimensions to existing descriptive systems of health.
Background. Decision-analytic models are routinely used as a framework for cost-effectiveness analyses of health care services and technologies; however, these models mostly ignore resource constraints. In this study, we use a discrete-event simulation model to inform a cost-effectiveness analysis of alternative options for the organization and delivery of clinical services in the ophthalmology department of a public hospital. The model is novel, given that it represents both disease outcomes and resource constraints in a routine clinical setting. Methods. A 5-year discrete-event simulation model representing glaucoma patient services at the Royal Adelaide Hospital (RAH) was implemented and calibrated to patient-level data. The data were sourced from routinely collected waiting and appointment lists, patient record data, and the published literature. Patient-level costs and quality-adjusted life years were estimated for a range of alternative scenarios, including combinations of alternate follow-up times, booking cycles, and treatment pathways. Results. The model shows that a) extending booking cycle length from 4 to 6 months, b) extending follow-up visit times by 2 to 3 months, and c) using laser in preference to medication are more cost-effective than current practice at the RAH eye clinic. Conclusions. The current simulation model provides a useful tool for informing improvements in the organization and delivery of glaucoma services at a local level (e.g., within a hospital), on the basis of expected effects on costs and health outcomes while accounting for current capacity constraints. Our model may be adapted to represent glaucoma services at other hospitals, whereas the general modeling approach could be applied to many other clinical service areas.
Objective. To measure the cost of nonattendance ("no-shows") and benefit of overbooking and interventions to reduce no-shows for an outpatient endoscopy suite. Methods. We used a discrete-event simulation model to determine improved overbooking scheduling policies and examine the effect of no-shows on procedure utilization and expected net gain, defined as the difference in expected revenue based on Centers for Medicare & Medicaid Services reimbursement rates and variable costs based on the sum of patient waiting time and provider and staff overtime. No-show rates were estimated from historical attendance (18% on average, with a sensitivity range of 12%–24%). We then evaluated the effectiveness of scheduling additional patients and the effect of no-show reduction interventions on the expected net gain. Results. The base schedule booked 24 patients per day. The daily expected net gain with perfect attendance is $4433.32. The daily loss attributed to the base case no-show rate of 18% is $725.42 (16.4% of net gain), ranging from $472.14 to $1019.29 (10.7%–23.0% of net gain). Implementing no-show interventions reduced net loss by $166.61 to $463.09 (3.8%–10.5% of net gain). The overbooking policy of 9 additional patients per day resulted in no loss in expected net gain when compared with the reference scenario. Conclusions. No-shows can significantly decrease the expected net gain of outpatient procedure centers. Overbooking can help mitigate the impact of no-shows on a suite’s expected net gain and has a lower expected cost of implementation to the provider than intervention strategies.
Objective. Making surrogate decisions on behalf of incapacitated patients can raise difficult questions for relatives, physicians, and society. Previous research has focused on the accuracy of surrogate decisions (i.e., the proportion of correctly inferred preferences). Less attention has been paid to the procedural satisfaction that patients’ surrogates and patients attribute to specific approaches to making surrogate decisions. The objective was to investigate hypothetical patients’ and surrogates’ procedural satisfaction with specific approaches to making surrogate decisions and whether implementing these preferences would lead to tradeoffs between procedural satisfaction and accuracy. Methods. Study 1 investigated procedural satisfaction by assigning participants (618 in a mixed-age but relatively young online sample and 50 in an older offline sample) to the roles of hypothetical surrogates or patients. Study 2 (involving 64 real multigenerational families with a total of 253 participants) investigated accuracy using 24 medical scenarios. Results. Hypothetical patients and surrogates had closely aligned preferences: Procedural satisfaction was highest with a patient-designated surrogate, followed by shared surrogate decision-making approaches and legally assigned surrogates. These approaches did not differ substantially in accuracy. Limitations are that participants’ preferences regarding existing and novel approaches to making surrogate decisions can only be elicited under hypothetical conditions. Conclusions. Next to decision making by patient-designated surrogates, shared surrogate decision making is the preferred approach among patients and surrogates alike. This approach appears to impose no tradeoff between procedural satisfaction and accuracy. 
Therefore, shared decision making should be further studied in representative samples of the general population, and if people’s preferences prove to be robust, they deserve to be weighted more strongly in legal frameworks in addition to patient-designated surrogates.