Ad Testing Results - Thinking Bigger about Delivery

by Max Brown on Monday 7 November 2016

Statistical ad testing is a much-discussed topic within digital marketing, and there are seemingly dozens of free online ad testing tools, each claiming to provide the best results for your ad testing needs. Similarly, there are hundreds of blog entries from well-respected digital marketing sources espousing the benefits of statistical ad testing in general and promoting one specific tool that is expected to fulfil all of the reader’s ad testing needs. Here I hope to inspire you to think a little bigger about how to respond to ad testing and statistical significance.

To summarise the vast majority of the current crop of blog entries regarding ad testing and statistical significance, in a few bullet points:


1. Ad testing is good and you should be doing it.
2. Use this method of determining whether a test is statistically significant.
3. Put some numbers in this table!


Doing as instructed will yield a message along the lines of “We are 95% confident that ad A (or B) has a stronger CTR than ad B (or A)” or, more dishearteningly, “There is no clear winner”.
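For readers who want to see what sits behind such a message, here is a minimal sketch. The tools in question do not all use the same method, so this assumes one common choice, a two-sided pooled two-proportion z-test on CTR at the 95% level. The function names `ctr_z_test` and `verdict` and the example numbers are my own illustrations, not taken from any particular tool.

```python
import math

def ctr_z_test(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided pooled two-proportion z-test on CTR. Returns (z, p_value).

    Assumes both ads have at least one impression.
    """
    ctr_a = clicks_a / imps_a
    ctr_b = clicks_b / imps_b
    # Pooled CTR under the null hypothesis that both ads perform the same.
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if se == 0:  # no clicks at all (or every impression clicked): nothing to compare
        return 0.0, 1.0
    z = (ctr_a - ctr_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

def verdict(clicks_a, imps_a, clicks_b, imps_b, alpha=0.05):
    """Turn the test result into the kind of message the online tools show."""
    z, p_value = ctr_z_test(clicks_a, imps_a, clicks_b, imps_b)
    if p_value >= alpha:
        return "There is no clear winner"
    winner, loser = ("A", "B") if z > 0 else ("B", "A")
    return (f"We are {1 - alpha:.0%} confident that ad {winner} "
            f"has a stronger CTR than ad {loser}")

# Made-up example: ad A has 120 clicks from 2,000 impressions, ad B has 80 from 2,000.
print(verdict(120, 2000, 80, 2000))
```

With these made-up numbers ad A comes out ahead; with near-identical CTRs the function falls back to the "no clear winner" message.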

Now I am not going to quibble with bullet points 1 and 2. However, I am going to take issue with an implicit assumption in bullet point 3, the assumption that I am willing to spend time inserting the number of impressions and clicks that ad A and ad B generated, every day, for each ad test I am undertaking.

Every ad group you manage with 2 or more ads constitutes at least 1 ad test. Every day the ads in these ad groups accumulate impressions and clicks, and every day is a new chance to see a statistically significant performance difference. If you are using the above method you are either wasting time putting numbers into boxes every day or, worse, neglecting to check every day, in which case your CTR will suffer. If you have an account with 50,000 ad groups, each with only 2 ads, and assume a conservative 10 seconds to calculate the significance of each test, it would take over 5 and a half days of non-stop work to test all of them.
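The arithmetic behind that estimate is easy to check. The snippet below simply multiplies out the figures quoted above (50,000 tests at 10 seconds each); the variable names are my own.

```python
ad_groups = 50_000        # one two-ad test per ad group
seconds_per_test = 10     # the conservative manual rate assumed above

total_seconds = ad_groups * seconds_per_test
hours = total_seconds / 3600
days = total_seconds / 86_400  # 24-hour days, i.e. working without a break

print(f"{hours:.1f} hours = {days:.2f} days")  # prints "138.9 hours = 5.79 days"
```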


At agency level, we strive to provide our clients with a competitive advantage. How are you going to do this using the same tools that anyone with a browser can find? At agency level, the technology available to almost every analyst is enough to improve upon these tools.


One of the reasons everyone uses spreadsheet software is that it is far quicker than a calculator when a calculation needs to be repeated. Can we use a spreadsheet to determine the significance of multiple ad tests more efficiently? It turns out we can, and it’s not too hard.


Though there are testing tools out there that allow you to distinguish between a test of two ads and a test of three ads, it is more useful to test pairs of ads independently of each other. We can also use Excel to pre-define an ad A and an ad B for as many unique ad tests as we wish to perform, whether that means comparing the performance of a pair of individual creative IDs or comparing the aggregate performance of a pair of ‘ad types’ repeated across multiple ad groups. It is therefore possible, perhaps with a bit of research, to create a spreadsheet document into which up-to-date performance data can be pasted, and which generates a significance result for each defined ad test.
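The same paste-and-recalculate idea can be sketched outside a spreadsheet. The Python below is an illustration only: the column names (`test_id`, `ad`, `impressions`, `clicks`), the test names and the figures are all made up, and the significance check is assumed to be a pooled two-proportion z-test at the 95% level rather than any specific tool's method.

```python
import csv
import io
import math

# Hypothetical pasted export: one row per ad, with ad A and ad B pre-defined per test.
PASTED = """test_id,ad,impressions,clicks
brand_headline,A,12000,540
brand_headline,B,11800,450
promo_cta,A,3000,90
promo_cta,B,3100,95
"""

def p_value(clicks_a, imps_a, clicks_b, imps_b):
    """Two-sided p-value of a pooled two-proportion z-test on CTR."""
    pooled = (clicks_a + clicks_b) / (imps_a + imps_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_a + 1 / imps_b))
    if se == 0:
        return 1.0
    z = (clicks_a / imps_a - clicks_b / imps_b) / se
    return math.erfc(abs(z) / math.sqrt(2))

def evaluate(pasted, alpha=0.05):
    """Return {test_id: result} for every pre-defined A/B pair in the pasted data."""
    tests = {}
    for row in csv.DictReader(io.StringIO(pasted)):
        stats = (int(row["clicks"]), int(row["impressions"]))
        tests.setdefault(row["test_id"], {})[row["ad"]] = stats
    results = {}
    for test_id, ads in tests.items():
        (clicks_a, imps_a), (clicks_b, imps_b) = ads["A"], ads["B"]
        if p_value(clicks_a, imps_a, clicks_b, imps_b) >= alpha:
            results[test_id] = "no clear winner"
        elif clicks_a / imps_a > clicks_b / imps_b:
            results[test_id] = "A wins"
        else:
            results[test_id] = "B wins"
    return results

for test_id, result in evaluate(PASTED).items():
    print(test_id, "->", result)
```

Pasting in fresh data and rerunning re-evaluates every test at once, which is the point: the cost per test drops to almost nothing once the pairs are defined.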


But wait, there’s more. If you are lucky enough to be able to query AdWords API data, it is possible to receive daily ad test significance updates automatically. Our own solution at Forward 3D is an SQL query which generates an alert once a test is significant and outputs the winning ad and the losing ad, including performance data and ad location.

AdWords ad labels are available from the API and can be used to define a testing group and an ad type for potentially every ad in the account. If we assume we have 3 ad types for each test, we might generate an output such as the following.

Here we can see that in testing group 1, ad A has a significantly different (and higher) CTR than ad B, and in the other testing groups we also see a statistically significant ad test. In each case I can pause the losing ad, write a new variant of the winning ad, and the next test starts automatically. There may be additional considerations you need to make, such as retaining at least 2 ads in each ad group, but these decisions can also be automated to an extent.
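To make the label-driven approach more concrete, here is a rough Python sketch of the logic. It is not our SQL query. The rows stand in for API data that has already been labelled with a testing group and an ad type; the figures are invented, the function names are hypothetical, and the significance check is again assumed to be a pooled two-proportion z-test at the 95% level.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical labelled rows: (testing_group, ad_type, impressions, clicks).
# The same ad type can appear in several ad groups, so rows are aggregated first.
ROWS = [
    ("group 1", "A", 5000, 260),
    ("group 1", "A", 5000, 250),
    ("group 1", "B", 10000, 400),
    ("group 1", "C", 10000, 480),
    ("group 2", "A", 2000, 60),
    ("group 2", "B", 2000, 62),
    ("group 2", "C", 2000, 58),
]

def p_value(clicks_x, imps_x, clicks_y, imps_y):
    """Two-sided p-value of a pooled two-proportion z-test on CTR."""
    pooled = (clicks_x + clicks_y) / (imps_x + imps_y)
    se = math.sqrt(pooled * (1 - pooled) * (1 / imps_x + 1 / imps_y))
    if se == 0:
        return 1.0
    z = (clicks_x / imps_x - clicks_y / imps_y) / se
    return math.erfc(abs(z) / math.sqrt(2))

def find_winners(rows, alpha=0.05):
    """Return (testing_group, winner, loser, p) for each significant pair of ad types."""
    # Aggregate impressions and clicks per testing group and ad type.
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for group, ad_type, imps, clicks in rows:
        totals[group][ad_type][0] += imps
        totals[group][ad_type][1] += clicks
    alerts = []
    for group in sorted(totals):
        # Test every pair of ad types independently of the others.
        for x, y in combinations(sorted(totals[group]), 2):
            (imps_x, clicks_x), (imps_y, clicks_y) = totals[group][x], totals[group][y]
            p = p_value(clicks_x, imps_x, clicks_y, imps_y)
            if p < alpha:
                x_is_better = clicks_x / imps_x > clicks_y / imps_y
                winner, loser = (x, y) if x_is_better else (y, x)
                alerts.append((group, winner, loser, round(p, 4)))
    return alerts

for group, winner, loser, p in find_winners(ROWS):
    print(f"{group}: ad {winner} beats ad {loser} (p = {p})")
```

With 3 ad types there are 3 pairs per testing group, and only the pairs that reach significance produce an alert. Run daily against fresh data, a routine like this is what replaces the manual number-entering.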


The internet is a fantastic source of information to help you decide how your ad tests should be performed, but there are far more effective ways to use technology to scale your output. A few hours spent building an automated ad testing solution can save many more hours down the line. If you work within an agency environment or manage an account with more than a few campaigns, there are fantastic time-saving measures available to you now which put browser-based ad testing methods to shame.


Max Brown - Technical Consultant