Testing too small and testing too big

by James Young
Partner

As I continue to work with clients on A/B testing, the question of what makes a test too small and what makes it too broad comes up over and over. We often try to make rules around this, but that has proven difficult.

One very good way to make sure that you are not testing too broad or too small is to have a strong test hypothesis that becomes the theme, or reason, for the test. Ideally the test hypothesis should have its basis in user insights, but a hunch can be OK, since we are going to test it anyway. Read the hypothesis before every meeting to make sure everyone is on the same page about the goals of the test. Also, have someone like the program manager play the “bad cop” if anyone starts to suggest ideas that don’t satisfy the hypothesis.

Example of a hypothesis that is too broad:
“Making improvements to the homepage will increase conversion and revenue.”
Good luck with that one. Your stakeholders are going to have you spinning throughout the brainstorming and review process unless you narrow this down.

Example of a hypothesis that is too narrow:
“By changing the color of the headline from green to blue, we can increase revenue.”
We have to be allowed to think harder than this when developing ideas and recipes to increase conversion and revenue.

Example of a good hypothesis:
“Exposing users to the option to contact customer support from the homepage will give users confidence that they can get help if they need it, when they need it, which will increase conversion and revenue.”

This is a good, focused challenge grounded in user insights without being too prescriptive. It’s easy to understand what each individual recipe should focus on. When you have a good hypothesis like this, you can really dig in and come up with some widely differing recipes. What will make the user confident? Which type of help should be offered? Do I need “more cowbell” for the contact options, or can they be subtler?

Once we have a good hypothesis, we can go broad with brainstorming and then narrow the recipes down to just the set that our team feels will truly move the needle. It’s important that each recipe be different from the others. Also, we should have some sort of rationale for each one: “This will serve the hypothesis because the contact options are right there in the user’s face, and that will help them feel more confident,” or, “This subtly exposes the 800 number, which will subconsciously give the user confidence without being too overt.”

Test bigger first

One of my favorite Program Managers, Leah Bennet, often advises clients to test broad or big and then refine from there as needed. This is great advice, especially when we are just starting out on the testing journey, because we want to test for the biggest impact first and then move to the smaller stuff.

When we are discussing testing, clients often want to know, “If a test wins, how will we know exactly which element won?” Was it the headline, the better subhead, or the new layout? As long as the test appropriately addressed the hypothesis, our reply usually is, “It doesn’t matter. We roll out the winner.” Being overly concerned about which factor changed the outcome of the overall test often leads to testing too small and wasting time before rolling out the winning creative campaign. Later, if we have the luxury of time and traffic, we can run back-tests to see which small element was the most influential.
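To make “the test won” concrete, here is a minimal sketch, in Python, of how a recipe-level result might be checked against control with a standard two-proportion z-test. The traffic and conversion numbers are hypothetical and the helper function is an illustration of the idea, not a prescribed workflow; the point is that significance is evaluated for the whole recipe, which is why we can roll out a winner without attributing the lift to any single element.

import math
from scipy.stats import norm

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    # Compare conversion rates of control (a) and challenger recipe (b)
    # using a pooled two-proportion z-test.
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - norm.cdf(abs(z)))  # two-sided
    return p_a, p_b, z, p_value

# Hypothetical numbers: control vs. a recipe exposing contact options
p_a, p_b, z, p = two_proportion_z_test(conv_a=420, n_a=10000,
                                       conv_b=480, n_b=10000)
print(f"control {p_a:.2%}, recipe {p_b:.2%}, z = {z:.2f}, p = {p:.3f}")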

The time for testing small

I have had new clients come to me saying they want to get out of the mode of testing small things like buttons or headline color. In some instances, though, testing small things like buttons can make a big difference. There is really no such thing as testing too small, as long as these tests are done at the appropriate time. Once broader tests have helped establish a solid creative campaign, testing smaller elements can be beneficial. One of my long-term clients has tested button wording with great success. The famous $300 million button case is another example, but in that instance the change was based on good user insights, so it was not the stab in the dark that some small tests can be.

So testing is about knowing when to test small and when to test big. That judgment comes down to testing experience (or practice) and good user insights, which together inform a strong hypothesis. Let that be your guiding light.