Sep 7
Manisha Arora & Julian Hsu
Experimental Design: Zeroing in on your Baseline and Expected Impact - Part 4/4
We know from our previous post that random noise in the data creates an inevitable risk of drawing the wrong conclusion. In another previous post, we decided how much of that risk we are willing to accept based on our business intuition and understanding of the context. Another place intuition comes in is how much lift we expect the experiment to have.
The larger the lift or impact, the less likely noise will drive us to make an error. We want to make sure this impact is larger than the minimum detectable effect (MDE) of the experimental design.
These expectations can come from a variety of places:
1. Your own business intuition and understanding of the industry landscape;
2. Internal sources such as the results of previous experiments or related science work; and
3. External sources such as customer surveys.
Very often, whoever proposes the experiment has an impact in mind; it may not be a specific number, but they at least have a range for what the impact will be.
At this point, you may be asking:
“Wait a minute, why are we using our subjective beliefs about whether our experiment will succeed to design it?”
This is a completely valid question! After all, experiments allow data-driven decisions, so why are we using our potentially biased beliefs in a data-driven approach?
Minimum detectable effects (MDE) can’t be too high or too low
With two examples, let me convince you that your subjective beliefs have a place in experimental design.
The first example is testing whether a control and treatment are the same. Suppose we are an online ads company and want to test what shade of orange our logo should be. In the control group, the logo is one shade of orange; in the treatment group, it is the same logo in an ever-so-slightly different shade of orange. So slight, in fact, that no one can tell the difference. You are suspicious of this, and want to know: “how long would I need to run this experiment to learn that control and treatment are the same?” In other words, the MDE is 0.
When your MDE is 0, you need to run the experiment forever. You’d have to run this experiment until the end of the universe (and maybe a little longer, just to be safe). This is because any natural variation in your metric can be interpreted as due to the treatment.

If your MDE is zero, then the clock on your experiment will never stop ticking.
The second example is when you expect the MDE to be very large. Suppose we are an ice cream store chain and want to test how increasing the number of ice cream samples impacts sales. In our control group, stores let customers try two sample spoons, while in the treatment group they get five. We expect that this will double sales: our MDE is a 100% increase. Our experiment can be very short, because even though sales naturally fluctuate, they will never fluctuate enough to mimic a 100% increase.
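Both examples fall out of the standard sample-size formula for a two-sided z-test: the required sample size per group scales with the inverse square of the MDE, so it explodes as the MDE shrinks toward zero and collapses when the MDE is huge. A minimal sketch in Python (the metric standard deviation, 5% significance level, and 80% power are illustrative assumptions, not values from this post):

```python
from scipy.stats import norm

def sample_size_per_group(mde, sigma, alpha=0.05, power=0.8):
    """Sample size per group for a two-sided z-test (normal approximation).

    mde: smallest effect we want to detect, in the metric's own units.
    sigma: standard deviation of the metric.
    """
    z_alpha = norm.ppf(1 - alpha / 2)  # critical value for the test
    z_beta = norm.ppf(power)           # quantile for the target power
    return 2 * ((z_alpha + z_beta) * sigma / mde) ** 2

# Fix the metric's std at 1.0 and shrink the MDE: required n blows up.
for mde in [1.0, 0.10, 0.01]:
    print(f"MDE={mde:>5}: n per group ~ {sample_size_per_group(mde, 1.0):,.0f}")
```

Shrinking the MDE by a factor of 10 inflates the required sample size by a factor of 100, which is why an MDE of exactly zero would keep the experiment running forever.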
Your own experiment is somewhere between these two examples. The second example seems like a good deal: expect a really large impact and have a short experiment! However, there are drawbacks of setting too large an MDE.
What is the risk behind MDE?
MDE is the minimum detectable effect size. This means that your experiment cannot rule out that impacts smaller than it are driven by natural variation and statistical noise. Let’s come back to the ice cream store example. We have determined that the MDE is 5%, and calculated that the experiment should run for 10 weeks. At the end of 10 weeks, we collect our data and estimate the impact.
Scenario one: we estimate the impact is 10%. This estimate is greater than our MDE, so we can be confident it is driven by our new ice cream sampling policy rather than noise.
Scenario two: we estimate the impact is 2%. This estimate is smaller than our MDE, so we cannot tell whether it is driven by statistical noise or by the treatment. Our experiment therefore wasted resources: we did not learn whether the new sampling policy increased sales. Had we run the experiment longer, we could have collected enough data to tell a 2% impact apart from noise.
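We can see why scenario two is inconclusive with a quick simulation: size the experiment for a 5% MDE, let the true lift be only 2%, and count how often the test reaches significance. This is a sketch under made-up assumptions (a baseline of 100, a standard deviation of 20, and 80% target power are illustrative, not from the post):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def significant(control, treatment, alpha=0.05):
    """Two-sided z-test on the difference in group means."""
    n = len(control)
    diff = treatment.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n + treatment.var(ddof=1) / n)
    return abs(diff / se) > norm.ppf(1 - alpha / 2)

# Size the experiment for a 5% MDE (5 units on a baseline of 100)...
baseline, sigma, mde = 100.0, 20.0, 5.0
n = int(2 * ((norm.ppf(0.975) + norm.ppf(0.8)) * sigma / mde) ** 2)

# ...but let the true lift be only 2%, below the MDE.
hits = sum(
    significant(rng.normal(baseline, sigma, n),
                rng.normal(baseline * 1.02, sigma, n))
    for _ in range(2000)
)
print(f"n per group: {n}; detected the 2% lift in {hits / 2000:.0%} of runs")
```

In most simulated runs the genuine 2% lift fails to reach significance, so a real experiment sized this way would usually end without an answer: exactly the wasted-experiment risk described above.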
This highlights that too large an MDE increases the risk of a wasted experiment. When choosing an MDE, ask yourself: what is the smallest impact you could safely ignore? The answer depends on your context, such as how much the treatment would cost to implement or what other external factors drive your metrics.
Do MDEs just Confirm your Beliefs?

Don’t carry your biases up the experiment design hill.
Finally, make sure that you do not set the MDE and design your experiment so that its only useful result is confirming your beliefs. We all have preconceptions and biases that can guide our experiment, but we need to actively work against them so the data can convince us otherwise.
If you strongly think that a new sampling policy for your ice cream store or your ads campaign will not work, then setting too low an MDE means running a very long experiment. Similarly, thinking that your new policy or campaign is sure to work and setting too high an MDE means that your experiment cannot reliably detect more nuanced impacts.
Use your subjective judgment to set your MDE, but think about what sort of evidence would convince you otherwise. This gives the experiment a fair chance to inform your business decision in a relatively unbiased way.
About the authors
Manisha Arora
Lead Data Scientist,
Google
Manisha has 10+ years’ experience in Data Science. She is the Experimentation Lead at Google Ads. Manisha is passionate about coaching aspiring Tech Professionals and has coached 300+ data scientists over the past 4 years.
Julian Hsu
Senior Economist,
Amazon
Julian is an innovative and approachable economist with 14+ years of experience in machine learning, experimentation, and causal ML. With 6+ years of cross-functional collaboration, he works with product, science, and tech teams to launch productionized solutions.
Copyright © 2022