How long do you run an experiment for? How long does it take to actually get a result? At what point do you give up? Let's talk about that in more detail.
If someone in your company has developed a new experiment, a new change that they want to test, then they have a vested interest in seeing it succeed. They've put a lot of time and work into it, and they want it to be successful. Maybe you've been testing for weeks and you still haven't reached a significant outcome on this experiment, positive or negative. You know they're going to want to keep running it pretty much indefinitely in the hope that it will eventually show a positive result. It's up to you to draw the line on how long you're willing to run this experiment.
How do I know when I'm done running an A/B test? Well, it's not always straightforward to predict how long it will take to achieve a significant result, but obviously if you have achieved one, if your p-value has gone below 1 percent or 5 percent or whatever threshold you've chosen, then you're done.
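To make that concrete, here's a minimal sketch of that check in Python. The per-user conversion metrics and the 0.05 threshold are made-up assumptions for illustration, and a two-sample t-test is just one common way to get a p-value out of an A/B comparison:

```python
# Minimal sketch of a significance check for an A/B test (simulated data).
# Uses a two-sample t-test; any test that yields a p-value works the same way.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical per-user conversion metrics for control (A) and treatment (B).
control = rng.normal(loc=25.0, scale=5.0, size=10000)
treatment = rng.normal(loc=25.5, scale=5.0, size=10000)

SIGNIFICANCE_THRESHOLD = 0.05  # whatever threshold your team has agreed on

t_stat, p_value = stats.ttest_ind(control, treatment, equal_var=False)

if p_value < SIGNIFICANCE_THRESHOLD:
    # Significant result: roll the change out (positive) or remove it (negative).
    print(f"Done: p = {p_value:.4f}, t = {t_stat:.2f} -- pull the plug.")
else:
    print(f"Not significant yet: p = {p_value:.4f} -- keep watching the trend.")
```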
At that point you can pull the plug on the experiment and either roll out the change more widely or remove it because it was actually having a negative effect. You can always tell people to go back and try again, using what they learned from the experiment to try it once more with some changes; that softens the blow a little bit.
The other thing that might happen is that it's just not converging at all. If you're not seeing any trend over time in the p-value, that's a strong sign that it's not going to converge anytime soon. The change just isn't having enough of an impact on behavior to even be measurable, no matter how long you run it.
In those situations, what you want to do is plot, every day, the p-value, the t-statistic, or whatever you're using to measure the success of this experiment on a graph. If the experiment is promising, you will see that p-value start to come down over time: the more data you collect, the more significant your results should become.
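Here's a sketch of that daily tracking, again on simulated data (the daily sample sizes and the small treatment effect are assumptions for the example); each day we recompute the t-test over everything collected so far and plot the running p-value:

```python
# Sketch of daily p-value tracking for an experiment (simulated data).
# Each day, recompute the test over all data gathered so far and plot it.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
days = 30
users_per_day = 500

control_data, treatment_data = [], []
daily_p_values = []

for day in range(days):
    # Hypothetical daily samples; the treatment has a small real effect here.
    control_data.append(rng.normal(25.0, 5.0, users_per_day))
    treatment_data.append(rng.normal(25.3, 5.0, users_per_day))
    _, p = stats.ttest_ind(np.concatenate(control_data),
                           np.concatenate(treatment_data),
                           equal_var=False)
    daily_p_values.append(p)

plt.plot(range(1, days + 1), daily_p_values, marker="o")
plt.axhline(0.05, linestyle="--", label="significance threshold")
plt.xlabel("Day")
plt.ylabel("p-value")
plt.title("p-value over the life of the experiment")
plt.legend()
plt.show()
```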
Now, if you instead see a flat line, or a line that's bouncing all over the place, that tells you the p-value's not going anywhere, and no matter how long you run this experiment, it's just not going to happen. You need to agree up front: in the case where you're not seeing any trend in the p-value, what's the longest you're willing to run this experiment? Is it two weeks? Is it a month?
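One way you could mechanize that agreement (this is just an illustrative rule of my own, not a standard one) is to fit a line to the daily p-values and pull the plug once the agreed deadline passes with no downward slope:

```python
# Sketch of an agreed stopping rule: after MAX_DAYS, if the p-value series
# shows no downward trend, give up on the experiment. The slope test via
# a least-squares fit is an illustrative choice, not a standard rule.
import numpy as np

MAX_DAYS = 14  # the upper bound your team agreed on up front

def should_pull_the_plug(daily_p_values, threshold=0.05):
    p = np.asarray(daily_p_values)
    if p[-1] < threshold:
        return True   # converged: significant result, stop and act on it
    if len(p) < MAX_DAYS:
        return False  # still within the agreed window, keep running
    # Fit a line to the p-value series; a slope at or above zero means
    # the p-value isn't trending down, so more time is unlikely to help.
    slope = np.polyfit(np.arange(len(p)), p, 1)[0]
    return slope >= 0

# Example: a noisy, flat p-value series that never trends downward.
print(should_pull_the_plug([0.4, 0.55, 0.38, 0.6, 0.42, 0.5, 0.47,
                            0.52, 0.44, 0.58, 0.41, 0.49, 0.53, 0.46]))
```

Fitting a straight line is deliberately crude; the point is simply to have a pre-agreed, automatic check rather than letting the decision drift day by day.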
Time spent on experiments is a valuable commodity; you can't make more time in the world. You can only run as many experiments as you have time for in a given year. So, if you spend too much time running one experiment that has no real chance of converging on a result, that's a missed opportunity to run another, potentially more valuable experiment in its place.
It's important to draw the line on experiment lengths, because time is a very precious commodity when you're running A/B tests on a website, at least as long as you have more ideas than you have time, which hopefully is the case. Make sure you go in with agreed upper bounds on how long you're going to spend testing a given experiment, and if you're not seeing encouraging trends in the p-value, that's the time to pull the plug.