The Relative Ethicalness of Clinical Trial Designs

TL;DR

  • Clinical trials feature the ‘explore-exploit’ trade-off.

  • Clinical trial designs exist on a possibility frontier.

  • If you are going to pick a trial design that isn’t on the frontier of power and patient benefit, make sure you have a good reason.

  • This has controversial implications because the standard RCT design—“we should randomise half the patients to one treatment, half to the other and then take stock of the results”—is not on that frontier.

  • EA funders and researchers should make an effort to consider research methods that, for any given statistical power, maximise the number of treatment successes.

This post is aimed at EAs who use RCTs and EAs who fund projects that use RCTs.

Thanks to Nathan Young and my dad for reviewing early versions of this post.

Introduction

Medical research trials balance several aims. They should

  1. be very likely to find the right answer

  2. give as many people as possible the best treatment, and

  3. be cheap.

These aims are in tension. If we gave every patient the treatment we currently think is best, we'd learn nothing about the relative safety and efficacy of the treatments. If we gave half the patients one treatment and the other half the other, we'd learn a reasonable amount, but in doing so we'd give half the patients a treatment that might, fairly quickly, come to seem obviously bad.

This is known as the explore vs exploit trade-off.[1]

Some trial designs achieve one aim at the expense of another. Some trial designs achieve none.

It is the job of the team designing the trial to choose a trial design that balances this trade-off. They should ensure that the design they use is on the efficient frontier.

Trial Designs

In clinical trials, there are lots of ways to decide who gets which treatment.

One popular method is Equal Randomisation (ER): randomly assign patients to the two treatments in a 1:1 ratio. ER is reasonably powerful and relatively simple to implement. It does not, despite a widespread misconception to the contrary, maximise statistical power in general; as we will see shortly, other methods often achieve higher power.

Another method is Thompson Sampling (TS): the probability that a patient is assigned a treatment matches the probability that that treatment is the best one. Since this probability depends on the interim results of the trial, TS is an example of an adaptive design. TS is less statistically powerful than ER but outperforms it in terms of patient benefit: it quickly identifies which treatment is best and, as its certainty grows, assigns more and more patients to that treatment.
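For binary endpoints with uniform Beta(1, 1) priors, TS can be sketched in a few lines: draw one sample from each arm's posterior and assign the patient to the arm whose sample is largest, which assigns each arm with exactly the probability that it is best. The true success rates below (0.3 and 0.6) are illustrative assumptions, not figures from this post's simulation.

```python
import random

def thompson_assign(successes, failures):
    """Pick an arm via Thompson Sampling for binary-outcome arms.

    With uniform Beta(1, 1) priors, the posterior for arm i is
    Beta(1 + successes[i], 1 + failures[i]). Sampling once from each
    posterior and taking the argmax selects each arm with probability
    equal to the posterior probability that it is the best arm.
    """
    samples = [random.betavariate(1 + s, 1 + f)
               for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda i: samples[i])

# Simulate a small two-arm trial with assumed true success rates.
random.seed(0)
true_p = [0.3, 0.6]
succ, fail = [0, 0], [0, 0]
for _ in range(200):
    arm = thompson_assign(succ, fail)
    if random.random() < true_p[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1
# As evidence accumulates, allocation drifts towards the better arm.
print(succ, fail)
```

With a clear gap between the arms, the better arm ends up treating the large majority of the 200 patients.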

In fact, there are a great many trial designs. Some maximise statistical power, some maximise patient benefit, and others strike a balance between the two. For a recent review of trial designs, see Robertson et al. (2023).

The Patient-Benefit-Power Frontier

We can simulate these designs and estimate their power and expected patient benefit. I have done this for 20 different trial designs and plotted the results in Figure 1. I have also drawn a line through the points on the patient-benefit-power frontier: the points where no other trial design does better on both power and patient benefit.

Figure 1: A scatter plot of results from a simulation study with binary endpoints, uniform priors and trials of 200 patients. Shown are the estimated power and expected number of treatment successes for 20 trial designs, with a line drawn through the points on the efficient frontier.
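The post's exact simulation code isn't shown, but the basic recipe is straightforward: fix assumed true success rates, simulate many trials under a given design, and record how often the null is rejected (power) and how many treatment successes occur (patient benefit). A minimal sketch for Equal Randomisation, using a two-proportion z-test and illustrative true rates of 0.3 and 0.5:

```python
import random
from statistics import NormalDist

def simulate_er_trial(p0, p1, n=200):
    """One equally-randomised two-arm trial with binary endpoints.

    Returns (reject_null, number_of_successes), testing the difference
    in proportions with a two-sided z-test at the 5% level.
    """
    n0 = n1 = n // 2
    s0 = sum(random.random() < p0 for _ in range(n0))
    s1 = sum(random.random() < p1 for _ in range(n1))
    pooled = (s0 + s1) / n
    se = (pooled * (1 - pooled) * (1 / n0 + 1 / n1)) ** 0.5
    if se == 0:
        return False, s0 + s1
    z = abs(s1 / n1 - s0 / n0) / se
    reject = z > NormalDist().inv_cdf(0.975)  # ~1.96
    return reject, s0 + s1

random.seed(1)
runs = 2000
results = [simulate_er_trial(0.3, 0.5) for _ in range(runs)]
power = sum(r for r, _ in results) / runs
mean_successes = sum(s for _, s in results) / runs
print(f"power ~ {power:.2f}, expected successes ~ {mean_successes:.1f}")
```

Running the same loop for each candidate design, then plotting estimated power against expected successes, produces a figure of the kind shown above.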

Inspecting Figure 1 confirms several things.

  1. Some trial designs are not on the efficient frontier.

  2. On the efficient frontier, there is a trade-off between power and patient benefit.

  3. The common method of allocating half the patients to one treatment and half to the other appears as “Equal Randomisation” on the plot, and it is not on the frontier either. If you want to maximise statistical power, allocate patients using Drop The Loser (DTL) or a similar method. These alternatives are not only more powerful; they are also more likely to give patients the best treatment.
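Drop The Loser is usually described as an urn scheme. The sketch below follows one common formulation (an immigration ball plus one ball per arm; a treatment ball is returned on success and “dropped” on failure), with illustrative success rates, and is not the post's simulation code:

```python
import random

def dtl_trial(true_p, n=200):
    """Drop-the-loser urn design for binary endpoints.

    The urn starts with one 'immigration' ball (-1) and one ball per
    arm. Drawing the immigration ball adds one ball of each arm type;
    drawing a treatment ball assigns the next patient to that arm. The
    ball is returned on a success and removed ('dropped') on a failure,
    so allocation drifts away from arms that fail often.
    """
    k = len(true_p)
    urn = [-1] + list(range(k))
    counts = [0] * k
    successes = 0
    patients = 0
    while patients < n:
        i = random.randrange(len(urn))
        ball = urn[i]
        if ball == -1:
            urn.extend(range(k))       # immigration: replenish all arms
        else:
            patients += 1
            counts[ball] += 1
            if random.random() < true_p[ball]:
                successes += 1         # success: ball stays in the urn
            else:
                urn.pop(i)             # failure: drop the ball
    return counts, successes

random.seed(2)
counts, successes = dtl_trial([0.3, 0.6])
print(counts, successes)
```

Because the worse arm's balls are dropped more often, the better arm receives the larger share of patients, while the urn's immigration mechanism stops any arm from being abandoned entirely.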

Conclusions

Human-capital constraints aside, there are few good reasons to use an allocation strategy that isn't on the efficient frontier.

Choosing a strategy that isn't on the efficient frontier means reducing the safety/efficacy of the treatments patients receive for no corresponding gain in statistical power. So unless practical reasons have ruled out every strategy that dominates it, using an allocation strategy that isn't on the efficient frontier is plainly unethical.[2]

Figure 1 shows that, under the assumed priors, Equal Randomisation is not on the patient-benefit-power frontier. As Robertson et al. (2023) put it, in general “ER does not maximize the power for a given [sample size] when responses are binary. The notion that ER maximises power in general is an established belief that appears in many papers but it only holds in specific settings (e.g. if comparing means of two normally-distributed outcomes with a common variance).” So unless those specific statistical conditions are met, or unless every design that dominates ER has been ruled out for practical reasons, it is unethical for clinical trials to use Equal Randomisation.

With the priors and sample size used in Figure 1, Equal Randomisation adds no power over TW(1/5) and yet results in 25% fewer treatment successes!

Implications for EA

I don't want to overstate my case: adaptive trials have real downsides. They are operationally harder to run, they can be harder to explain, and regulatory bodies are less familiar with them than with standard designs.

The main message I want EAs to take away is this: If you are going to pick a trial design that isn’t on the frontier of power and patient benefit, make sure you have a good reason.

If you know someone running an RCT, ask them if other trial designs, holding sample size and statistical power constant, would have higher expected patient benefit. If they’re not using those designs or if they’ve not considered the question, ask them why not. (Reasonable answers very much do exist.)

If you are designing an RCT, consider adaptive designs alongside the standard ones. Use simulation studies to estimate the pros and cons of the candidate designs in terms of power and patient benefit, weigh whether you and your team can implement them, and factor in the usual practical considerations. Whether or not you ultimately go adaptive, I'd encourage you to at least seriously consider it.

For further reading, I highly recommend Robertson et al (2023).

References

Lattimore, T. and Szepesvári, C. (2020) Bandit Algorithms. 1st edn. Cambridge University Press. Available at: https://doi.org/10.1017/9781108571401.

Christian, B. and Griffiths, T. (2017) Algorithms to Live By. Paperback. William Collins.

Robertson, D.S. et al. (2023) ‘Response-adaptive randomization in clinical trials: from myths to practical considerations’, Statistical Science, 38(2), pp. 185–208. Available at: https://doi.org/10.1214/22-STS865.

  1. ^

    For a technical survey of the relevant mathematics, see Lattimore and Szepesvári (2020). For a general discussion of this research and its applications, see Christian and Griffiths (2017).

  2. ^

    This fairly straightforward observation has a counterintuitive and controversial implication: it is unethical for doctors simply to give each patient whatever treatment currently seems best. That Greedy allocation strategy is not on the efficient frontier.