5
Network Meta-Analysis Within Cost-Effectiveness Analysis

5.1 Introduction

This chapter looks at the questions that arise when a network meta-analysis is used to inform a cost-effectiveness analysis (CEA). This is a topic that has both technical and conceptual aspects.

Throughout the entire tradition of meta-analysis, going right back to the earliest papers, it has been seen as a way of pooling the relative treatment effects in trials. The focus on relative effects, it should be said, follows the huge body of classic work in epidemiological biostatistics on pooling measures of association in 2 × 2 tables (Mantel and Haenszel, 1959; Zelen, 1971; Robins et al., 1986). The network meta-analysis models presented throughout this book belong firmly within this line of thought. Decision makers, however, must attend to the absolute differences between arms in the events or outcomes of interest, rather than to differences expressed on ratio measures such as the odds ratio or relative risk. It is the absolute difference between treatments in the number of positive outcomes achieved, or negative outcomes avoided, that determines the value of a treatment to patients and to society. This remains true regardless of whether the decision is based on an economic analysis or, for example, on numbers needed to treat.

We therefore distinguish between the model for relative treatment effects, as set out in Chapters 2 and 4, and what we refer to as the baseline model. Technically, this is a model for the effect of the reference treatment in the target population. The first issue addressed in this chapter is how this baseline model is informed and estimated and how the model for the relative treatment effects is integrated with the baseline model to generate absolute effects on every treatment. Sections 5.2 and 5.3 will look at alternative proposals about how information on absolute effects can be identified and estimated, because some methods may impact on the relative effect estimates.

The evaluation of treatments usually relies on randomised controlled trials, often with quite short-term outcomes. As a result, the baseline model must almost always be extended to represent longer-term outcomes, often 'extrapolating' trial outcomes to longer-term, even lifetime, health benefits. The embedding of the network meta-analysis model within a wider natural history model is therefore covered in Sections 5.4 and 5.5. Our focus here is on appropriate data sources for this extended baseline model and on how the relative effect model generated by the network meta-analysis is combined with the extended baseline model to generate long-term outcome predictions for all treatments. For wider issues in CEA modelling methods, readers are referred to recent textbooks (Hunink et al., 2001; Briggs et al., 2006) and good-practice statements (Weinstein et al., 2003; Briggs et al., 2012; Eddy et al., 2012).

In Section 5.6 we illustrate the embedding of a network meta-analysis within a highly stylised CEA. This allows us to show how cost-effectiveness acceptability curves (CEACs) can be generated by adding simple WinBUGS code. It also presents us with an opportunity to discuss which outputs from a random effects model should be used in a CEA.

Finally, based on the assumption that investigators will wish to adopt probabilistic decision analytic methods, as recommended by the National Institute for Health and Care Excellence (NICE, 2013a) and by leading texts (Briggs et al., 2006), Section 5.7 outlines alternative methods of statistical estimation that are suitable for embedding network meta-analysis in decision analytic models.

5.2 Sources of Evidence for Relative Treatment Effects and the Baseline Model

Most CEAs consist of two separate components: the baseline model, which represents the absolute effect of the reference treatment, often placebo or standard care, and a model for the relative treatment effects. Note that the baseline model referred to here is not a model for the control arms of each trial, even though we refer to these as 'trial-specific baselines'; it is a model for the reference treatment in the meta-analysis. Both models are on the scale of the chosen linear predictor (see Chapter 4). The baseline model may be based on trial or cohort evidence, while the relative effect model is generally based on RCT data. The absolute effect under each treatment is then obtained by adding the relative treatment effects from the network meta-analysis to the absolute effect of the reference treatment from the baseline model. This addition takes place on the linear predictor scale, so the results must finally be converted back to the appropriate scale by inverting the link function (Table 4.1).

For example, if the baseline model is that the probability of an undesirable event under standard care is 0.25 and the odds ratio for a given treatment compared to standard care is 0.8 (favouring the treatment), then, ignoring the uncertainty in these quantities, the probability, p, of an event with the treatment can be obtained from

$$\operatorname{logit}(p) = \operatorname{logit}(0.25) + \log(0.8) \qquad (5.1)$$

which readers will recognise as the model statement for a standard evidence synthesis based on binomial data from Chapter 2, except that the parameter μi for the absolute effect on the 'control' arm of trial i has been replaced by the effect given by the baseline model. This gives p = 0.21.
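This back-calculation is easy to check numerically. The following R sketch is our own illustration (the variable names are not taken from the book's code); it evaluates equation (5.1) and inverts the logit link:

p.base <- 0.25          # probability of the event under standard care
or     <- 0.8           # odds ratio, treatment vs standard care

# Add the log-odds ratio to the baseline log-odds (the linear predictor scale)
logit.p <- qlogis(p.base) + log(or)

# Invert the link function to return to the probability scale
p <- plogis(logit.p)
round(p, 2)             # 0.21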

We begin by noting that the two questions 'which is the most effective treatment?' and 'what happens under standard care?' have to be considered separately. There are two reasons for this view. As we emphasised in the introduction to this chapter, the entire basis for meta-analysis is that it pools relative treatment effects, not absolute effects. This in turn is based on the assumption that, while the absolute efficacy of a treatment may vary with the trial population, the relative effect remains relatively stable. Note that the concept of a relative treatment effect implies further assumptions about the scale of measurement. For example, when we claim that the relative treatment effect is the more stable quantity, we need to state whether it is measured as a risk difference, a risk ratio, an odds ratio or a hazard ratio. A relative effect that is stable if measured as a relative risk may be highly variable if measured as a risk difference (Deeks, 2002; Caldwell et al., 2012), and the consistency assumptions made on one scale will not transfer to another (Norton et al., 2012; van Valkenhoef and Ades, 2013).

The second reason has to do with identifying appropriate sources of evidence. We know that the appropriate source of evidence for relative treatment effects is the ensemble of trials including comparisons of two or more of the treatments in the comparator set (Chapter 1). But the trial evidence may not constitute the best evidence about absolute outcomes on placebo or standard care or on any treatment. Investigators should identify evidence sources to inform the baseline model based on a protocol-driven systematic search (Sculpher et al., 2000; Weinstein et al., 2003; Petrou and Gray, 2011). In particular, the baseline model should be as specific as possible to the target population of interest (Briggs et al., 2006; National Institute for Health and Clinical Excellence, 2008b). As such, it may be that the best evidence is from a carefully selected subset of the trials or from relevant cohort studies, register studies (Golder et al., 2005) or expert opinion (Petrou and Gray, 2011).

If trials are to be used to inform the baseline model, investigators are not obliged to use the same set of trials that inform the relative treatment effects or even a subset of these trials. Nor are they restricted to studies with data on the treatment chosen as the reference treatment in the network meta-analysis. This is because, given a set of relative treatment effect estimates, a baseline model for the reference treatment can be constructed from a model on the absolute effects of any of the treatments in the network. This is illustrated in Section 5.3.

5.3 The Baseline Model

The baseline model for the reference treatment used to define the absolute efficacy will generally be the same GLM as the model for the treatment effects. Given a set of studies that represent the absolute effect of a particular treatment J, which is among those in the network informing the relative effects, a typical approach is to put a hierarchical model on the absolute effects of that treatment. If the GLM for the treatment effects is θik = μi + δik, for k > 1, as defined in Chapter 4, one might model the absolute effects for treatment J using the same link function, as follows:

$$\theta_{iJ} = \mu_{iJ}, \qquad \mu_{iJ} \sim \mathrm{N}(m, \sigma_m^2) \qquad (5.2)$$
Absolute effects for any other treatment k, Tk, can then be reconstructed as

$$g(T_k) = m + d_{1k} - d_{1J}, \qquad d_{11} = 0 \qquad (5.3)$$

as in the relationship in equation (5.1).

This formulation calculates the absolute efficacy of treatment J at the mean of the distribution of effects in the selected data. But there is no reason why the effect seen in the future should be equated to the mean of what has been seen in the past. We recommend instead that investigators use the predictive distribution (see Chapter 3), where a new baseline effect, μJ,new, is sampled from the posterior distribution:

$$\mu_{J,\mathrm{new}} \sim \mathrm{N}(m, \sigma_m^2) \qquad (5.4)$$
We emphasise again that there is nothing in equation (5.4) to suggest that the absolute effects should be informed by the same set of trials as the relative effects. It would, in fact, be a strange coincidence if the sources of evidence on the relative treatment effects were also the most appropriate to inform the absolute effects in the target population.

5.3.1 Estimating the Baseline Model in WinBUGS

We argue against using the same studies for the relative effects and the baseline model on epidemiological grounds. However, in cases where it can be justified – perhaps in syntheses of new classes of treatments, whose position in the treatment pathway has yet to be decided – the precise estimation method requires some care. Use of a hierarchical baseline model as in equations (5.2) and (5.4) could introduce bias in the estimated relative treatment effects if the two are estimated jointly, unless the baseline model is correct. Suppose, for example, that the condition of patients included in trials with mortality as the outcome became progressively less severe over the time period covered by the data. The effect of equation (5.2) would be to shrink the absolute effect in recent trials towards higher mortality, thereby magnifying the relative treatment effects. In earlier trials the opposite would occur, and relative treatment effects would be underestimated.

It is precisely to avoid any such systematic bias that traditional meta-analysis has always conditioned out the effect on the control treatment by operating on treatment contrasts in the form of estimated log-odds ratios, (standardised) mean differences or other relative treatment effect measures. In our network meta-analysis models (Chapters 2 and 4) with arm-based likelihoods, we put vague and unrelated priors, μi ~ N(0, 100²), on the trial-specific control arm effects, for precisely the same reason.

As implied by equations (5.2) and (5.4), the absolute outcomes may be informed by several studies, requiring their own synthesis on the scale of the linear predictor. In order to have the benefits of vague and unrelated priors on the study-specific effects on the control arms when estimating the relative treatment effect model, while still constructing a hierarchical baseline model, we run them as separate analyses. This can be done by running a single WinBUGS program that incorporates the network meta-analysis model and a baseline model. This program must, of course, include both a dataset for estimating relative effects and a dataset for estimating the baseline model, so that in some cases the same trial arms may occur twice in the data. Note that different variable names must be used in the two sections. An example of this separate estimation model is given in Ch5_Smoking_separatebaselineNMA.odc, which uses the Smoking Cessation data (Hasselblad, 1998).

The Smoking Cessation data consists of 24 trials comparing four treatments: no contact, self-help, individual counselling and group counselling (Table 5.1). The network plot is presented in Figure 5.1. No contact is taken as the reference treatment; it is included in 19 trials. The code for the separate baseline model therefore loops through the ns=24 trials for the standard random effects model and then loops through the nsb=19 studies with treatment 1 arms, held as a separate dataset.

Table 5.1 Smoking Cessation Data (Lu and Ades 2006): events, r, are the number of individuals with successful smoking cessation at 6–12 months out of the total individuals randomised to each trial arm, n.

Reproduced with permission of Taylor & Francis.

Study ID na[] t[,1] t[,2] t[,3] r[,1] n[,1] r[,2] n[,2] r[,3] n[,3]

(na[] is the number of arms; t[,·] are the treatment codes for arms 1–3; r[,·] and n[,·] are the events and denominators. For two-arm trials, the third-arm events are coded NA with a placeholder denominator of 1.)
1 3 1 3 4 9 140 23 140 10 138
2 3 2 3 4 11 78 12 85 29 170
3 2 1 3 NA 75 731 363 714 NA 1
4 2 1 3 NA 2 106 9 205 NA 1
5 2 1 3 NA 58 549 237 1561 NA 1
6 2 1 3 NA 0 33 9 48 NA 1
7 2 1 3 NA 3 100 31 98 NA 1
8 2 1 3 NA 1 31 26 95 NA 1
9 2 1 3 NA 6 39 17 77 NA 1
10 2 1 2 NA 79 702 77 694 NA 1
11 2 1 2 NA 18 671 21 535 NA 1
12 2 1 3 NA 64 642 107 761 NA 1
13 2 1 3 NA 5 62 8 90 NA 1
14 2 1 3 NA 20 234 34 237 NA 1
15 2 1 4 NA 0 20 9 20 NA 1
16 2 1 2 NA 8 116 19 149 NA 1
17 2 1 3 NA 95 1107 143 1031 NA 1
18 2 1 3 NA 15 187 36 504 NA 1
19 2 1 3 NA 78 584 73 675 NA 1
20 2 1 3 NA 69 1177 54 888 NA 1
21 2 2 3 NA 20 49 16 43 NA 1
22 2 2 4 NA 7 66 32 127 NA 1
23 2 3 4 NA 12 76 20 74 NA 1
24 2 3 4 NA 9 55 3 26 NA 1

Treatment codes are defined in Figure 5.1.


Figure 5.1 Smoking Cessation network: there are 22 two-arm and 2 three-arm trials. Each circle represents a treatment; connecting lines indicate pairs of treatments that have been directly compared in randomised trials. The numbers on the lines indicate the numbers of trials making each comparison, and the numbers by the treatment names are the treatment codes used in the modelling. Line thickness is proportional to the number of trials making the comparison, and the width of each circle is proportional to the number of patients randomised to that treatment.

Although we do not recommend joint estimation of baseline and network meta-analysis parameters except in specific circumstances (see Chapter 6 for some examples), Table 5.2 compares the results from joint estimation with those from the recommended approach of separate estimation of the baseline model and treatment effects. The joint estimation model (code previously published in Dias et al. (2011d, 2013c)) has a better fit, lower between-trial variation in both relative and absolute treatment effects and lower posterior standard deviations. Nevertheless, for the reasons set out above, we would contend that the simultaneous estimation method is subject to bias. In this dataset the differences in the estimated treatment effects, of the order of 5–15%, are noticeable but not large, which seems to be a fairly typical finding.

Table 5.2 Comparison of separate estimation of absolute and relative effects, the preferred approach, with joint estimation of absolute and relative effect.

Separate models Simultaneous modelling
Mean/median sd 95% CrI Mean/median sd 95% CrI
Baseline model parameters, 'no contact'
m −2.59 0.16 (−2.94, −2.30) −2.49 0.13 (−2.75, −2.25)
σm 0.54 0.16 (0.32, 0.93) 0.45 0.11 (0.29, 0.71)
μnew −2.59 0.60 (−3.82, −1.41) −2.49 0.49 (−3.48, −1.52)
Relative treatment effects compared to 'no contact'
Self-help 0.49 0.40 (−0.29, 1.31) 0.53 0.33 (−0.11, 1.18)
Individual counselling 0.84 0.24 (0.39, 1.34) 0.78 0.19 (0.41, 1.17)
Group counselling 1.10 0.44 (0.26, 2.01) 1.05 0.34 (0.39, 1.72)
σ 0.82 0.19 (0.55, 1.27) 0.71 0.13 (0.51, 1.02)
Absolute probabilities of response based on the posterior distribution of the baseline probability
No contact 0.07 0.01 (0.05, 0.09) 0.08 0.01 (0.06, 0.10)
Self-help 0.12 0.05 (0.05, 0.23) 0.13 0.04 (0.07, 0.21)
Individual counselling 0.15 0.04 (0.09, 0.24) 0.15 0.03 (0.11, 0.21)
Group counselling 0.19 0.07 (0.08, 0.37) 0.20 0.05 (0.11, 0.31)
Model fit and DIC
Residual deviance 54.1 47.4
pD 45.0 40.1
DIC 99.1 87.5

Posterior summaries of the parameters of the baseline model and the relative treatment effects model.

5.3.2 Alternative Computation Methods for the Baseline Model

The previous section proposes two separate sections of code and data in the same WinBUGS run, one for the network meta-analysis model for the relative effects and another for a separate baseline model. An alternative is to run the two models as separate programs (Dias et al., 2011d, 2013c). There are two options. One is to bring the separate WinBUGS MCMC outputs together later, for example, in a CEA programmed in R, at which point the absolute effects Tk for each treatment k are composed from their constituents as in equation (5.4). This can be particularly effective if different baseline models are to be used, for example, in sensitivity analyses. A second, perhaps even simpler, alternative is to take the posterior mean and standard deviation of the predictive effect of the reference treatment from the WinBUGS output and use them to identify the parameters of a normal distribution with, say, mean MJ for reference treatment J and variance S². This can then be plugged into the WinBUGS code for the network meta-analysis to generate the absolute outcomes for treatment k, Tk, as described in Chapters 2 and 4:

$$\mu_{J,\mathrm{new}} \sim \mathrm{N}(M_J, S^2), \qquad g(T_k) = \mu_{J,\mathrm{new}} + d_{1k} - d_{1J}$$

This involves a degree of approximation, as the posterior of μJ,new may not be exactly normally distributed, but the approximation is likely to be acceptable except in extreme cases. This is the approach we have adopted in the exercises at the end of this chapter. Using the Smoking Cessation dataset as an example, the program Ch5_Smoking_nocontactbaseline.odc estimates the predictive distribution of the log-odds of cessation on the 'no contact' treatment. The predictive mean is −2.589, corresponding to a 7.1% chance of smoking cessation, with a standard deviation of 0.6021. These figures can then be plugged into the programs for CEA (see Exercises 5.2 and 5.3).
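To show how these plug-in values are used, here is a small R sketch of the same calculation performed outside WinBUGS. It draws the baseline from N(−2.589, 0.6021²) and, purely for illustration, treats the posterior mean log-odds ratios from Table 5.2 as fixed; a full analysis would instead sample them from their joint posterior, preserving correlations:

set.seed(1)
n.sim <- 10000

# Predictive distribution of the 'no contact' log-odds
# (from Ch5_Smoking_nocontactbaseline.odc)
mu.new <- rnorm(n.sim, mean = -2.589, sd = 0.6021)

# Posterior mean log-odds ratios vs 'no contact' (Table 5.2, separate models)
d <- c(none = 0, self.help = 0.49, individual = 0.84, group = 1.10)

# Absolute probabilities of cessation: invert the logit link, treatment by treatment
T.abs <- sapply(d, function(dk) plogis(mu.new + dk))
round(colMeans(T.abs), 2)   # compare with Table 5.2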

Of course, investigators are not necessarily obliged to carry out any formal synthesis to inform the baseline model. Expert opinion, perhaps based on evidence from clinical studies, can be used to characterise a baseline event rate, using a beta distribution for probabilities or a gamma distribution for hazard rates. These should then be put on the appropriate scale so that equation (5.3) can be used to obtain absolute effects for all treatments.
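For instance, if experts judged the baseline event probability to centre on about 0.20, a Beta(20, 80) distribution (a purely hypothetical choice) could represent that opinion. The R sketch below, our own illustration, transforms the draws to the log-odds scale and applies equation (5.3) with an illustrative log-odds ratio:

set.seed(2)
p.base <- rbeta(10000, 20, 80)   # expert opinion on the baseline probability (hypothetical)
mu     <- qlogis(p.base)         # transform to the linear predictor (log-odds) scale

d.k <- 0.5                       # log-odds ratio, treatment k vs reference (illustrative)
T.k <- plogis(mu + d.k)          # absolute probability of the event on treatment k
quantile(T.k, c(0.025, 0.5, 0.975))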

5.3.3 *Arm-Based Meta-Analytic Models

The arm-based meta-analytic models that have been proposed in a series of recent papers (Hedges, 1981; Carlin and Hong, 2014; Zhang et al., 2014; Hong et al., 2016) are a particular form of baseline model. These models, which must of course be distinguished from the arm-based likelihood approaches that form the backbone of this book (Chapters 2 and 4), provide a seemingly attractive solution that generates both relative and absolute treatment effect estimates from a single model. For binomial data, with Φ() representing the standard normal cumulative distribution function, the joint model is represented as

$$\Phi^{-1}(p_{ik}) = \nu_{ik}, \qquad (\nu_{i1}, \ldots, \nu_{iK})^{T} \sim \mathrm{MVN}\big(\boldsymbol{\mu},\ \operatorname{diag}(\boldsymbol{\sigma})\, R\, \operatorname{diag}(\boldsymbol{\sigma})\big) \qquad (5.5)$$

Here the probits, νik, of the proportion responding on trial i and treatment k have a multivariate normal distribution, with treatment-specific means μk, between-trial standard deviations σk and a between-trial correlation matrix R. The population mean absolute response probability on treatment k, πk, can be recovered from these parameters as

$$\pi_k = \Phi\!\left(\frac{\mu_k}{\sqrt{1 + \sigma_k^2}}\right)$$

From here the 'treatment effects' of, say, Y compared with X can be estimated as a risk difference, πY − πX, an odds ratio, [πY/(1 − πY)]/[πX/(1 − πX)], or a relative risk, πY/πX. The probit link in equation (5.5) could be replaced with a logit link without changing the fundamental properties of the model, but there is no commitment to a scale of measurement or, equivalently, to a link function and linear predictor on which relative treatment effects are additive and comparatively stable across different trial populations.
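The recovery of absolute probabilities and 'treatment effects' from the arm-based parameters is easily sketched in R; the parameter values below are ours, for illustration only:

# Illustrative probit-scale parameters for two treatments, X and Y
mu    <- c(X = -1.0, Y = -0.6)    # treatment-specific means
sigma <- c(X =  0.4, Y =  0.5)    # between-trial standard deviations

# Population mean response probabilities: pi_k = Phi(mu_k / sqrt(1 + sigma_k^2))
pi <- pnorm(mu / sqrt(1 + sigma^2))

rd <- unname(pi["Y"] - pi["X"])                                      # risk difference
or <- unname((pi["Y"] / (1 - pi["Y"])) / (pi["X"] / (1 - pi["X"])))  # odds ratio
rr <- unname(pi["Y"] / pi["X"])                                      # relative risk
c(rd = rd, or = or, rr = rr)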

The arm-based models, therefore, represent an intriguing challenge to the accepted wisdom that meta-analysis should be pooling relative effects and that these effects should be calculated by comparison to a group of patients selected at the same time according to the same protocol and differing only in the treatment they were randomised to (Senn et al., 2013). The proposed models discard the concept of ‘treatment effect’ and with it, apparently, the randomisation structure in the evidence. Covariates are readily incorporated, but the distinction between modifiers of relative effects and covariates acting on the baseline model is gone. The extensive literature on heterogeneity of treatment effects in meta-analysis or indeed heterogeneity in 2 × 2 tables, which has played such a prominent role in epidemiology, is apparently rendered redundant.

Suffice it to say that, from a traditional meta-analysis viewpoint, arm-based models risk generating biased estimates of treatment effects if the model for the absolute effects is wrong or if trials have unequal allocation between arms (Senn, 2010; Senn et al., 2013). A second undesirable consequence is that the posterior variance of the relative effect estimates is greatly increased because they are exposed to the high between-arm variances (Hong et al., 2016). This is precisely what meta-analysis on the relative effects was designed to avoid. Against this, it must be conceded that traditional meta-analysis will itself generate biased estimates if the wrong scale is chosen for the linear predictor.

Whether or not arm-based models are considered attractive from a statistical or epidemiological perspective, they are not well suited to current practice in decision making, because they would oblige investigators to use the same trial data to inform both the baseline model and the relative treatment effects. This is quite contrary to the guidance on how natural history models should be populated and to the way decision models are conceptualised: the idea of a relative treatment effect that can be estimated in one context and applied in another is quite fundamental to all formal decision-making methods we are aware of. To take one example, the long-standing public health debate on statin use has revolved around the baseline risk at which they should be recommended, not around their relative efficacy compared to placebo (Stone et al., 2013; National Clinical Guideline Centre, 2014). This debate is premised, of course, on having separable relative and absolute effect models. Further discussion of the advantages and disadvantages of arm-based models can be found in Dias and Ades (2016) and Hong et al. (2016).

5.3.4 Baseline Models with Covariates

5.3.4.1 Using Aggregate Data

Covariates may be included in the baseline model by adding terms to the linear predictor. For a covariate C, which could be either a continuous covariate or a dummy covariate, we would have, for arm k of trial i,

$$\theta_{ik} = \mu_i + \beta C_{ik}$$

where μi ~ N(m, σm²). But, again, we would want the baseline model with its covariate to be estimated separately from the relative effects. As before, the estimate of the covariate effect β, like the estimate of μ, could be obtained from the trial data or externally. Govan et al. (2010) give an example in which a covariate on the baseline is estimated from aggregate trial data with the purpose of reducing aggregation bias (Rothman et al., 2012). This is a phenomenon in which the presence of a strong covariate, even one balanced across arms and not a relative effect modifier, biases the estimates of the relative treatment effects towards the null. Some degree of aggregation bias inevitably occurs whenever treatment effects on the probability scale are not linear on the scale of the GLM linear predictor – in other words, whenever the link function is not the identity link.
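A two-stratum toy example in R makes the point (all numbers are hypothetical): the odds ratio is exactly 2 within each stratum of a prognostic covariate that is balanced across arms, yet the odds ratio computed from the collapsed table is attenuated towards the null:

# Control-arm risks in two equally sized strata of a prognostic covariate
p.ctrl <- c(low = 0.1, high = 0.6)

# Treated-arm risks implied by a common within-stratum odds ratio of 2
odds.trt <- 2 * p.ctrl / (1 - p.ctrl)
p.trt    <- odds.trt / (1 + odds.trt)

# Collapse over the covariate (balanced 50/50 in both arms)
p0 <- mean(p.ctrl); p1 <- mean(p.trt)
(p1 / (1 - p1)) / (p0 / (1 - p0))   # about 1.62, not 2: biased towards the null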

Covariate effects are seldom reported in every study. Govan et al. (2010) analyse an example where some studies report a breakdown by severity, others by age and others by both. They illustrate a method for dealing with missing data on covariates, although, like other methods for missing covariates (Dominici et al., 1997; Dominici, 2000), it can only be applied when there is at least one study that provides results at the finest breakdown of the covariates (see also Section 8.2.1).

5.3.4.2 Risk Equations for the Baseline Model Based on Individual Patient Data

A far more reliable approach to informing a baseline model that expresses differences in baseline progression due to covariates such as age, sex and disease severity at the onset of treatment is to use individual patient data. This is superior to aggregate data because the coefficients can be estimated more precisely and with less risk of ecological bias. The results are often presented as 'risk equations' based on multiple regressions of large trial databases, registers or cohort studies. Natural histories for each treatment are then generated by simply adding the treatment effects based on trial data to the risk equations, as if they were another risk factor. Examples are the Framingham risk equations (Anon, 2015), used to predict coronary heart disease risk, and the UK Prospective Diabetes Study equations (UKPDS Office), used to inform absolute event rates in patients with diabetes. Both models are used routinely in clinical guidelines. The main difficulty facing the cost-effectiveness analyst here is in justifying the choice of data source and its relevance to the target population. At NICE, organisations making submissions to the Health Technology Appraisals process are obliged to set out a detailed justification for the data sources used to inform baseline risk (National Institute for Health and Clinical Excellence, 2012).

5.4 The Natural History Model

Generally speaking, the source of evidence used for each natural history parameter should be determined by a protocol-driven review (Sculpher et al., 2000; National Institute for Health and Clinical Excellence, 2008b; Petrou and Gray, 2011). Previous CEAs are an important source of information on the data sources that can inform natural history. In practice, the full systematic review is generally reserved for data to inform relative treatment effects (Kaltenthaler et al., 2011). For other parameters in the natural history model, a series of strategies may be adopted, which aim as much for efficiency as for sensitivity (Bates, 1989; Pirolli and Card, 1999; Cooper et al., 2007).

Exactly how a natural history model is constructed and how the assembled information is used within it is beyond the scope of this book. However, we provide some comments on one particular aspect of these models: how treatment differences, generally based on relatively short-term trial evidence, are propagated through the extrapolation model.

In Section 5.3, we described how the relative treatment effects are put together with the baseline model to generate absolute effects for all treatments. The role of the natural history model is to extrapolate these absolute effects on trial outcomes to the full range of lifetime health benefits and clinical care costs. The simplest strategy is to assume that there are no differences between treatments in the ‘downstream’ outcomes, conditional on the shorter-term trial outcomes. We can call this the ‘single-mapping hypothesis’ as the implication is that, given the information on the short-term differences, longer-term differences can be obtained by a single mapping applicable to all treatments.

The single-mapping hypothesis is effectively an assumption of surrogacy: for example, that the effect of cholesterol-lowering or blood pressure-lowering drugs on cardiovascular outcomes is entirely predictable from their effect on cholesterol levels and blood pressure. Or, to take a more complex example, in a model assessing the cost-effectiveness of antiviral drugs for the treatment of influenza, the base-case analysis assumed that the use of antivirals affected only short-term outcomes and had no additional impact on longer-term complication and hospitalisation rates (Burch et al., 2010). Models with this property are attractive, although they make the strong assumption that the short-term outcomes are 'perfect surrogates': there is no difference between treatments in longer-term outcomes that cannot be predicted on the basis of the short-term differences.

The use of ‘surrogate endpoint’ arguments in HTA extends far beyond the outcomes classically understood as ‘surrogates’ in the clinical and statistical literature (Taylor and Elston, 2009). HTA literature makes frequent use of ‘mapping’ from short-term to longer-term outcomes, as this allows modellers to base the modelled treatment differences on short-term evidence. Needless to say, the strong assumptions required need to be carefully reviewed and justified, particularly if the treatments under consideration are based on different physiological mechanisms.

If the assumption that all downstream differences between treatments are due exclusively to differences in shorter-term trial outcomes is not supported by the evidence, then the first option is to use available randomised evidence to drive longer-term outcomes. This necessarily implies different 'mappings' for each treatment, but it does at least base them on randomised evidence. The second and least preferred option is the use of non-randomised evidence. However, as with short-term outcomes, it is essential that any use of non-randomised data that directly impacts on differential treatment effects within the model is carefully justified and that the increased uncertainty and the possibility of bias are recognised and addressed (National Institute for Health and Clinical Excellence, 2008b).

In Chapter 11 we set out a series of extensions to network meta-analysis to encompass multiple outcomes as well as multiple treatments. These models are designed to capture and exploit the clinical and logical relationships between trial outcomes. Where the outcomes occur at different time points in the natural history, the synthesis models for the relative treatment effects can no longer be kept entirely separate from the estimation of the natural history model. There is a potential advantage in having models express the logical relationships between outcomes, as well as basing treatment effects of all outcomes on randomised evidence. But, equally, it is essential that such relationships are supported by expert clinical advice.

Against this, there will be a need to extrapolate anyway, with or without additional information on longer-term treatment effects, and it is clearly preferable that data on longer-term outcomes, where available, are allowed to contribute to predictions of longer-term treatment effects. Unless integrated models for relative treatment effects at different time points are adopted, investigators will be obliged to ignore the most informative trial evidence – the trials with the longest follow-up – and fall back on short-term studies to make inferences about long-term relative effects.

5.5 Model Validation and Calibration Through Multi-Parameter Synthesis

Natural history models should be validated against independent data wherever possible. For example, in CEAs comparing a new cancer treatment with a standard comparator, the survival predicted in the standard arm could be compared with published survival, perhaps after suitable adjustment for age or other covariates. For other conditions, given an initial estimate of incidence or prevalence, together with statistics on the size of the population, the natural history model may deliver predictions of the absolute numbers admitted to hospital with certain sequelæ, complications or mortality. Once again these predictions can be checked against independent data to provide a form of validation.

A more sophisticated approach is to use this external data to 'calibrate' the natural history model. This entails changing the 'progression rate' parameters within the model so that the model accurately predicts the independent calibrating data. Calibration, particularly in a Bayesian framework, can also be seen as a form of evidence synthesis (Welton and Ades, 2005; Ades and Sutton, 2006). In this case the calibrating data are characterised as providing an estimate of a complex function of model parameters. This offers a remarkably simple form of calibration because, in principle, all that is required is that the investigator specifies the function of model parameters that the calibrating data estimate and adds a term for the likelihood of the additional data to the model (a toy numerical sketch is given after the list below). The information then propagates 'backwards' through the model to inform the basic parameters. There are many advantages of this method over standard methods of calibration, which have recently been reviewed (Vanni et al., 2011):

  1. It gives an appropriate weight to the calibrating data, taking account of sampling error.
  2. It avoids the ‘tweaking’ of model parameters until they ‘fit’ the calibrating data, a procedure that fails to capture the uncertainty in the data.
  3. It can simultaneously accommodate data informing more than one function of parameters that could be used for calibration.
  4. It avoids forcing the investigator to decide which of several natural history parameters should be changed.
  5. Assessment of whether the validating data conflicts with the rest of the model and the data supporting it can proceed using standard model diagnostics, such as residual deviance, DIC or cross-validation (Ades, 2003; Dias et al., 2011a, 2011c) (see also Chapter 3).
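A toy numerical sketch of the mechanism, our own illustration using importance weighting rather than MCMC: two progression probabilities each have direct evidence, while hypothetical calibrating data of 80 end-state cases out of 1000 estimate their product. Re-weighting prior draws by the binomial likelihood of the calibrating data updates both parameters:

set.seed(3)
n <- 100000

# Current knowledge of two progression parameters from direct data (illustrative)
theta1 <- rbeta(n, 30, 70)     # mean about 0.30
theta2 <- rbeta(n, 20, 80)     # mean about 0.20

# The model predicts the end state with probability theta1 * theta2;
# the calibrating data observed 80 end-state cases out of 1000 (hypothetical)
w <- dbinom(80, 1000, theta1 * theta2)
w <- w / sum(w)                # normalised importance weights

# Calibrated means: the information flows back to both basic parameters
c(theta1.before = mean(theta1), theta1.after = sum(w * theta1))
c(theta2.before = mean(theta2), theta2.after = sum(w * theta2))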

Examples of Bayesian calibration approaches have appeared in descriptive epidemiology (Goubar et al., 2008; Presanis et al., 2008; Sweeting et al., 2008; Presanis et al., 2012), particularly in screening applications. In a model of early-onset neonatal group B streptococcus (EOGBS) disease, a chain of decision-tree parameters on maternal infection, neonatal infection and neonatal disease was calibrated to British Isles surveillance data on the frequency of EOGBS disease (Colbourn et al., 2007a). A particularly important area of application is in cancer models: cancer registry data has been used to recalibrate parameters informed by colorectal cancer screening trials (Whyte et al., 2011). The effect of this kind of calibration is to put rather weak constraints on the individual progression parameters and quite strong constraints on complex functions of progression parameters.

Another example is the use of external cancer registry data on conditional survival, which, for many cancers, stabilises 5 or 6 years after diagnosis (Merrill and Hunter, 2010; Yu et al., 2012), to impose constraints on spline-based (Royston and Parmar, 2002) extrapolation of survival curves in cancer treatment trials (Guyot, 2014). Such calibration is necessary to extrapolate short-term trials to obtain life expectancy estimates and is distinctly superior to extrapolation by fitting standard parametric survival curves where results are notoriously sensitive to the choice of curve (Latimer, 2013).

Bayesian calibration is a very powerful application of multi-parameter synthesis. It could be applied in a number of clinical areas to harmonise model predictions with observed data on disease incidence, which is essential for coherent decision-making.

5.6 Generating the Outputs Required for Cost-Effectiveness Analysis

In this section we illustrate how a network meta-analysis can be embedded in a CEA and how the relevant outputs can be generated. We then take the opportunity to discuss which outputs correctly capture the sources of uncertainty and variation, particularly when random effects models are used for the relative treatment effects.

5.6.1 Generating a CEA

Here we build a deliberately simplistic incremental CEA around the Smoking Cessation network meta-analysis introduced earlier in this chapter (Table 5.2). In addition to the network meta-analysis itself, we define the absolute log-odds of smoking cessation using the predictive distribution, as in equation (5.4), plugging in a mean and precision estimated in a separate WinBUGS run. This delivers a vector of absolute probabilities, Tk, of giving up smoking on each intervention k.

The next step is to elaborate the costs and benefits attaching to each treatment. The expected costs of the four treatments are as follows: no contact = £0, self-help = £200, individual counselling = £6000 and group counselling = £600. We can already anticipate that group counselling will be the most cost-effective treatment, as it is the most effective but not the most costly. We assume that the benefit of smoking cessation is an additional L = 15 quality-adjusted life years (QALYs) gained, with a standard error of 4 years. A decision tree (Figure 5.2) can be drawn up showing exactly how the costs and benefits are quantified for each treatment. Notice that the benefit accrues only to the proportion who give up smoking, while the costs of each strategy apply whether the treatment is successful or not. We may now define our objective function, which will be the net benefit, monetised health gain less cost (Claxton and Posnett, 1996; Stinnett and Mullahy, 1998):

$$\mathrm{NB}(k, w) = w \times T_k \times L - C_k$$

Figure 5.2 Smoking Cessation: decision tree for cost-effectiveness analysis of the four strategies (Welton et al. 2012).

Reproduced with permission of John Wiley & Sons.

The parameter w represents the 'willingness to pay' of the decision maker: the amount of money a decision maker is prepared to spend to obtain a unit gain in health outcome, measured here in QALYs. A typical value, used for example by NICE, is £20,000 per additional QALY. A CEAC is generated by plotting the probability that each treatment is cost-effective against w, varied, for example, from £0 to £50,000. These probabilities are generated by counting, at each value of w, the proportion of iterations in which NB(k, w) for the k-th treatment is higher than for all the others. To do this we can monitor the node p.ce[k,will] in the following WinBUGS code (Ch5_Smoking_CEA.odc):

for (k in 1:nt) {
  # expected QALYs gained with treatment k: Pr(cessation) * QALY gain on quitting
  ly[k] <- T[k] * lyg
  for (will in 1:50) {
    # net benefit at willingness to pay w = will * £1000 per QALY
    nb[k,will] <- ly[k] * will * 1000 - C[k]
    # equals 1 when treatment k has the highest net benefit of all nt treatments
    p.ce[k,will] <- equals(rank(nb[,will],k), nt)
  }
}

The required probabilities of being cost-effective can then be obtained as the posterior means of the matrix p.ce[,] (Welton et al., 2012). These can be copied to an external package and plotted against w to form the CEACs (Figure 5.3). At any willingness to pay, group counselling is the strategy most likely to be cost-effective.
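For example, if the posterior means of p.ce[,] are exported as a 4 × 50 table, the CEACs can be drawn in R along the following lines (the file name and object names are illustrative, not from the book's files):

# p.ce: posterior means exported from WinBUGS; rows = treatments, cols = will = 1..50
p.ce <- as.matrix(read.table("pce_posterior_means.txt"))   # illustrative file name

w   <- (1:50) * 1000                     # willingness-to-pay grid (£ per QALY)
trt <- c("No contact", "Self-help",
         "Individual counselling", "Group counselling")

matplot(w, t(p.ce), type = "l", lty = 1, ylim = c(0, 1),
        xlab = "Willingness to pay (£ per QALY)",
        ylab = "Probability cost-effective")
legend("right", legend = trt, col = 1:4, lty = 1, bty = "n")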


Figure 5.3 Smoking Cessation: cost-effectiveness acceptability curves, (a) distribution of mean treatment effects and (b) distribution of predictive treatment effects.

5.6.2 Heterogeneity in the Context of Decision-Making

In Chapter 3 we introduced the predictive distribution of the relative treatment effect and interpreted it as the (true) effect we would expect in a ‘new trial’ whose relative treatment effect was drawn from the same random effects distribution estimated from the existing trial evidence. Its role there was to assist in predictive model checking. In this section we consider this, and other interpretations of ‘random treatment effects’, from a decision maker’s point of view.

In Section 5.3 we suggested that, if data from several studies are used to inform the absolute effect of the reference treatment 1, then the predictive distribution should be used, rather than the distribution of the mean. This is in order to capture more appropriately the actual uncertainty about what the response to the reference treatment might be in a future scenario. To understand the intuition behind this, consider a situation where there have been hundreds of trials, so that the uncertainty regarding the mean absolute response on the reference treatment is negligible. In the presence of heterogeneity, however, our uncertainty about what the baseline effect will be in a future scenario is not really reduced: the present collection of trials tells us that it is highly variable. This intuition is correctly captured if we use the predictive distribution, which is dominated by the variance of the random effects distribution, however much (or little) certainty there is about the mean.

For the relative treatment effect, decision modellers have almost invariably used the random effects mean. However, some authorities have suggested that the predictive effect might be more appropriate in a decision-making context (Spiegelhalter et al., 2004). In fact there are several outputs from a random effects model that could be used as inputs to a CEA, each reflecting a particular interpretation and origin for the random effect (Ades et al., 2005; Welton et al., 2007; Welton and Ades, 2012):

  1. Random effects mean. If the true relative treatment effect is fixed but is observed under some form of noise, then the random effects mean is exactly appropriate in a decision problem. For example, the observed variation between trials might be due to variation in the way an outcome is defined or to differences in the way a test instrument is scored.
  2. Shrunken study-specific estimate. Here we suppose that among the M trials, there are one or more in which the circumstances (setting, clinical population) are a very close match to the target population for the decision problem. The random effects model expresses the belief that the different circumstances impact on the treatment effect in an unpredictable way, but that they are drawn from a common distribution. The specific estimate for the trial (or trials) whose circumstances match the target will be influenced by being in the model and will be ‘shrunk’ towards the overall mean (see Section 2.2).
  3. Predictive distribution. In this scenario the target population/setting bears no specific relation to any of the previous trials. The target treatment effect could be considered as another sample from the random effects distribution. It should be noted that this trial-specific effect is, however, to be considered as a ‘fixed’ effect, in the sense that, if the target population/setting was to be reproduced in 20 ‘new’ trials, these trials would all estimate the same ‘fixed’ effect. There are, in effect, two models: a random effects model for the previous data and a fixed effects model for an imaginary ensemble of trials on the new target population. For example, if the predictive distribution was used to inform a prior distribution for the treatment effect (Spiegelhalter et al., 2004), it could only be updated by the likelihood of the data in the new trial if it was believed that both represented the same ‘fixed’ treatment effect. In a CEA, the use of the predictive distribution in a decision model, illustrated in Ch5_Smoking_CEApred.odc, rather than the distribution of the mean, will produce markedly different results (Figure 5.3).
  4. Whole distribution. This final option is subtly different from the predictive distribution. The key difference is that the future scenario does not consist of a single sample from the random effects distribution, but the entire distribution itself. A good model for this would be a multicentre trial of a psychotherapy protocol in which therapists in different centres came 'off protocol' to different random extents. This would generate a distribution of effects that we might expect to be duplicated in any future roll-out of the therapy. Possibly random differences between the patient populations in different centres could generate a similar result. This fourth way of using a random effects distribution requires a further level of integration within each MCMC cycle, to account for what is, in effect, patient or centre heterogeneity in the treatment effect. This has been illustrated in a stylised example (Ades et al., 2005; Welton et al., 2007), but it has yet to be applied to relative treatment effects in applications. There are examples, however, where it has been applied to account for heterogeneity in the baseline model (Welton et al., 2008a).

    In practice, of course, investigators faced with a random effects model on treatment effects will have an estimate of between-studies variation that may include random error in observation, true heterogeneity between patient populations, random variation from internal biases and random variation from centre or care differences. We have no specific advice to offer, other than to encourage investigators to give some thought to the sources of unexplained variation in treatment efficacy and how this is captured in the CEA.

5.7 Strategies to Implement Cost-Effectiveness Analyses

In this and previous chapters, we have proposed Bayesian posterior simulation as the 'engine' with which to estimate the relative treatment effects in pairwise and network syntheses, the baseline model and any further parameters required to extrapolate treatment effects in a cost-effectiveness model. The decision maker must then choose an intervention strategy under uncertainty, selecting the one that delivers the highest expected net benefit. Put formally, there is a net benefit function NB(S, θ) with uncertain parameters θ, and the optimal decision S* is the one that delivers the highest expected net benefit:

$$S^* = \arg\max_{S}\; \mathrm{E}_{\theta}\big[\mathrm{NB}(S, \theta)\big] \qquad (5.6)$$
The expectation over the uncertain parameters is therefore an expectation over a joint posterior distribution that is likely to exhibit complex correlations, because the parameters have been estimated simultaneously from the same input data. In addition, the net benefit function is likely to be non-linear in its parameters. This will necessarily be true whenever log or logistic link functions feature in the synthesis model, or when the natural history model includes a Markov model. We note in passing that the concept of an 'optimal decision' is entirely conditional on the model.

If there are correlations between parameters in the net benefit function or if it is non-linear in any of its parameters, the expectation of the net benefit is not the same as the net benefit at the expected value of the parameters. It is for this general reason that decision-making under parameter uncertainty requires integration over net benefit. Posterior simulation is a simple and popular method for carrying out the evaluation of those integrals. The posterior correlations are also very important in the incremental analysis, as they bear on the differences between the net benefits of alternative strategies (see Exercise 5.3).
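The point is easy to verify numerically. In this R sketch (our own, echoing the smoking example's values of w = £20,000 per QALY, 15 QALYs gained on quitting and a £600 cost), the net benefit depends on a log-odds parameter through the inverse logit link, so the expected net benefit differs from the net benefit evaluated at the expected parameter value:

set.seed(5)
theta <- rnorm(100000, mean = -2.6, sd = 0.6)   # uncertain log-odds of quitting

# Net benefit is nonlinear in theta through the inverse logit link
nb <- function(th) 20000 * plogis(th) * 15 - 600

mean(nb(theta))   # E[NB(theta)]: the quantity the decision maker needs
nb(mean(theta))   # NB(E[theta]): not the same, because of the nonlinearity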

There are several ways in which the results of the evidence synthesis can be incorporated into the probabilistic CEA.

5.7.1 Bayesian Posterior Simulation: One-Stage Approach

When estimation of the synthesis parameters is via sampling from a Bayesian posterior distribution, this can be integrated with the CEA as a single process within a single programming package. Bayesian MCMC simulation (Gilks et al., 1996) using WinBUGS (Lunn et al., 2000, 2013), OpenBUGS (Lunn et al., 2009) or other MCMC packages provides the obvious example. The advantage of this approach is that it not only delivers a Bayesian posterior distribution but is also simulation-based, so its outputs are perfectly compatible with the MC sampling approach that has become the standard modelling method in so many areas of science. Samples from the joint posterior distribution can be put directly through the decision analysis, so that net benefit and other outputs can be evaluated for each set of parameter samples, without any assumptions about its distributional form. Distributions of additional parameters and costs can be readily incorporated.

Development of MCMC algorithms and sampling schemes is a specialised area of research. For completeness it is worth mentioning that a broad range of non-MCMC simulation-based Bayesian updating schemes have also been proposed, including the sampling importance resampling (SIR) algorithm (Rubin, 1988), Bayesian Melding (Raftery et al., 1995; Poole and Raftery, 2000) and Bayesian Monte Carlo (MC) (Brand and Small, 1995). All of these share the key properties of Bayesian MCMC: they feature both Bayesian estimation and sampling from joint posterior distributions. The latter two were specifically designed for evidence synthesis. We describe some of them in Section 5.7.2.

5.7.2 Bayesian Posterior Simulation: Two-Stage Approach

If investigators have a preferred software package for CEA, another option is to take the posterior samples from the Bayesian MCMC, or other posterior sampling scheme, and use them as input to the CEA package. This has the same technical properties as the one-stage approach, since the full posterior distribution is preserved. From WinBUGS, the convergence diagnostics and output analysis (CODA) output, which lists all values generated from the full posterior distribution, can be exported into a spreadsheet-based program such as EXCEL using the BUGS Utility for Spreadsheets (BUS) (Hahn, 2001). The CODA output can also be read into the freely available statistical software R (R Development Core Team, 2010) for convergence diagnostics, further analysis and plotting, using add-on packages such as the Bayesian Output Analysis Program (BOA) (Smith, 2005) or coda (Plummer et al., 2006). When using the CODA output for a CEA, it is important that the correlations between the parameter estimates are preserved. This is done by ensuring that all parameter values are sampled from the same MCMC iteration: if the CODA output is stored as separate columns for each parameter with iteration values along the rows, this corresponds to sampling all the parameter values from one row at a time.

The two-stage approach is particularly useful when there is substantial autocorrelation between successive MCMC samples. This can arise in many situations but usually depends on the statistical model, the way it is parameterised and the sparseness of the data. The effect of high levels of autocorrelation is to increase the MC error, with the result that it may require hundreds of thousands of simulations, rather than tens of thousands, before stable estimates are obtained. A common practice in decision modelling has been to 'thin' the posterior samples: for example, rather than store every posterior value from the MCMC process, one might store every 10th or every 20th. This will usually reduce autocorrelation substantially, so that the decision model can be run with, say, 25,000 samples from a thinned chain rather than with the 500,000 original samples. This is particularly relevant for computationally expensive models, although users should ensure that the MC error is still appropriately small when using a reduced number of samples. MC error at less than 5% of the posterior standard deviation is the recommended target (see Chapter 2).
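A sketch of the two-stage workflow in R, using the coda package (the file names are illustrative):

library(coda)

# Read the CODA output exported from WinBUGS (chain file plus index file)
post <- as.matrix(read.coda("coda1.txt", "codaIndex.txt"))

# Thin to every 20th iteration to reduce autocorrelation
post.thin <- post[seq(1, nrow(post), by = 20), ]

# Preserve posterior correlations in the CEA: sample whole rows (one MCMC
# iteration at a time), never each parameter column independently
rows <- sample(nrow(post.thin), 10000, replace = TRUE)
psa.input <- post.thin[rows, ]

# Crude check that MC error is below 5% of the posterior standard deviation
# (treats the thinned draws as approximately independent)
apply(post.thin, 2, function(x) sd(x) / sqrt(length(x)) < 0.05 * sd(x))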

Finally, another approach to embedding an evidence synthesis in a CEA is to characterise the posterior distribution in an algebraic form, most often as a multivariate normal distribution, and then simulate from this distribution within the CEA package. Care must be taken, of course, to ensure that the assumption of multivariate normality is justified (see Exercise 5.3).
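A sketch of this approach in R (the stand-in posterior draws below are ours, for illustration; in practice they would come from the CODA output, and mvrnorm is from the MASS package):

library(MASS)   # for mvrnorm

# Stand-in for posterior draws of (mu, d[2], d[3], d[4])
set.seed(6)
draws <- cbind(mu = rnorm(5000, -2.59, 0.60), d2 = rnorm(5000, 0.49, 0.40),
               d3 = rnorm(5000, 0.84, 0.24), d4 = rnorm(5000, 1.10, 0.44))

# Characterise the joint posterior algebraically as multivariate normal
m <- colMeans(draws)
V <- cov(draws)

# Simulate parameter sets for the CEA from the fitted distribution;
# the covariance matrix V preserves the posterior correlations
new.draws <- mvrnorm(10000, mu = m, Sigma = V)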

5.7.3 Multiple Software Platforms and Automation of Network Meta-Analysis

Interfacing between software packages leads to greater flexibility and facilitates a multidisciplinary approach to CEA. For example, statisticians may wish to use general statistical software, whereas decision modellers may prefer packages designed specifically for decision modelling. Or, while WinBUGS may be chosen to conduct the network meta-analysis, packages with more advanced graphical capability, such as R, may be needed to display the results. In this section we briefly review the potential to communicate between software platforms, the integrated use of different platforms and progress towards automating network meta-analysis.

Transfer of data between packages can be effected in a variety of ways. If the analysis is to be carried out in WinBUGS, data columns can be copied directly from spreadsheet software into WinBUGS and pasted by selecting Paste Special from the WinBUGS Edit menu and choosing the Plain text option. Alternatively, XL2BUGS (Misra, 2011) is an EXCEL add-in that converts EXCEL data into WinBUGS vector format, while BAUW (Zhang and Wang, 2006) converts data in text format into WinBUGS vector or matrix format. If data are stored in R, R2WinBUGS (Sturtz et al., 2005) can be used to convert R objects into WinBUGS list data using the bugs.data function. Details on software capable of communicating with WinBUGS are available in the WinBUGS web pages (MRC Biostatistics Unit, 2015a).

Integrated platforms reduce the need to copy data and intermediate results from one screen or system to another and thereby reduce the risk of transcription errors. They also facilitate rerunning analyses on updated datasets, conducting sensitivity analyses and, more generally, promoting transparency. It is possible to integrate input, analysis and the display of results across multiple packages into a single step, adding an interface that facilitates access by clinicians. To this end, a transparent interactive decision interrogator (Bujkiewicz et al., 2011) was developed for use by the NICE Appraisals Committee: it integrated syntheses conducted in WinBUGS with graphical displays and a decision model in R, behind a 'point and click' interface in EXCEL, allowing members to rerun the analyses with different parameters and different synthesis models in real time during committee meetings (Thompson et al., 2006).

Several freely available code routines have been developed for commonly used packages in HTA, which allow them to communicate with other packages, and these can be utilised in the creation of integrated analyses (Thompson et al., 2006; Heiberger and Neuwirth, 2009; Yan and Prates, 2013).

Finally, procedures have been written that take appropriately formatted datasets as their input and automatically generate initial values and computer code, using Bayesian MCMC as the basic computational engine (van Valkenhoef et al., 2012a, 2012b). Similar software to carry out node-splitting checks for inconsistency (see Chapter 7) is also available (van Valkenhoef et al., 2012a, 2012b; van Valkenhoef and Kuiper, 2016).

5.8 Summary and Further Reading

In this chapter we have proposed that the choice of evidence to inform the baseline model should be considered entirely separately from the choice of data to inform the relative treatment effects, although this can be less convenient from a modelling viewpoint. We have briefly discussed other aspects of the natural history model and have proposed that, wherever possible, the natural history of the disease beyond the trial outcomes should be identical for all treatments, conditional on the estimates of the trial outcomes. The 'single-mapping hypothesis' is essentially an assumption of perfect surrogacy.

Given the properties of the joint posterior distribution of treatment effects, we have proposed that the simplest way to ensure that parameter uncertainty is correctly propagated through a decision model is to use the MCMC outputs directly. Frequentist network meta-analysis is an alternative that many investigators may find more convenient, but generation of a joint distribution of parameter values would then be an extra step if inputs for a CEA were required.

This is a good opportunity to briefly summarise why we believe that Bayesian MCMC is the statistical method of choice for network meta-analysis. Among the purely statistical reasons we would cite the following:

  • The generation of joint parameter distributions given the data, rather than a distribution of the data at the maximum likelihood parameter values
  • Better performance with sparse data and zero cells (see Chapter 6)
  • Flexibility to introduce informative priors for variance parameters (see Chapters 2 and 4)
  • Modular structure with relatively simple extensions to shared parameter models, meta-regression, inconsistency analysis and multiple outcomes reported irregularly (see Chapters 4, 7, 8, 10 and 11)

These reasons would hold even if the network meta-analysis was not going to be embedded in further decision analytic procedures. If there is to be such embedding, the further advantages are as follows:

  • Flexibility to combine treatment effects with natural history parameters, which might also have required complex synthesis.
  • Posterior simulation is compatible with probabilistic decision analytic methods.
  • Correct uncertainty propagation through to any objective function, in one step.

Two topics are recommended for further reading. Firstly, equation (5.6) defines the 'optimal decision based on current evidence'. It is important to appreciate that this is not necessarily the 'correct decision' that would be taken under conditions of no uncertainty. The value to the decision maker of reducing uncertainty is the subject of value of information theory (Raiffa, 1961; Raiffa and Schlaifer, 1967; Pratt et al., 1995). This fascinating theory has attracted increasing interest in the health-care field (Thompson and Evans, 1997; Felli and Hazen, 1998; Claxton, 1999; Claxton et al., 2000, 2005a). The majority of the recent work has been devoted to solving the considerable computational problems in its implementation (Brennan and Kharroubi, 2007; Welton et al., 2008a, 2014; Oakley et al., 2010; Madan et al., 2014a; Strong et al., 2015).

Secondly, we have discussed the embedding of a network meta-analysis within a CEA at some length, but net benefit (Stinnett and Mullahy, 1998) is only one example of an objective function that can be used to decide between treatments. The same principles can be applied to any objective function. Perhaps the best-known decision-making method that does not consider costs is multi-criteria decision analysis (MCDA). Here outcomes can be weighted in various ways, and the objective function is a linear combination of weights and treatment effects on absolute scales. There is a wide range of variants, including fixed weights, weight distributions or imprecise weights that can be treated like preferences (Lahdelma and Salminen, 2001; Tervonen and Lahdelma, 2007). Examples where network meta-analysis has been embedded in MCDAs are beginning to be published (van Valkenhoef et al., 2012c; Naci et al., 2014b; Tervonen et al., 2015).
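To fix ideas, here is a minimal MCDA-style objective with fixed weights in R (all values are hypothetical):

# Absolute treatment effects on two outcomes (hypothetical values)
effects <- rbind(A = c(efficacy = 0.60, tolerability = 0.90),
                 B = c(efficacy = 0.70, tolerability = 0.80))

w <- c(efficacy = 0.7, tolerability = 0.3)   # fixed preference weights, summing to 1

scores <- effects %*% w    # objective: linear combination of weights and effects
scores                     # choose the treatment with the highest score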

5.9 Exercises

  5.1 For the Smoking Cessation example, using the WinBUGS files Ch5_Smoking_CEA.odc and Ch5_Smoking_CEApred.odc, examine the posterior correlations between the T[k]: (a) when the T[k] are based on the mean treatment effect and (b) when they are based on the predictive distribution of the relative treatment effects. Explain why one set of correlations is higher than the other.

    NB: monitor d[] and record their posterior correlations for use in Exercise 5.3.

  5.2 For the Smoking Cessation example, enter the posterior summaries of the log-odds ratios d[k] based on the mean treatment effects as 'data' into a cost-effectiveness analysis, rather than embedding the network meta-analysis within the CEA. Import a distribution for the baseline model for T[1] as suggested in Section 5.3.2. Note that this means ignoring the correlations. Construct the CEACs and comment on how and why they differ from the CEACs in Figure 5.3a.
  5.3 Repeat Exercise 5.2, but now incorporate the posterior correlations of the absolute treatment effects as data. Compare to the results from Ch5_Smoking_CEA.odc and Figure 5.3.