Can You Trust Your Test Results?

Mar 26, 2007 9:43 PM | By Jude Hoffner

Direct marketing is all about the numbers. Whether it is response rate, lifetime value, square-inch analysis, or any other metric you can dream of calculating, we all rely on measurability to give our businesses the advantage they need to succeed.

This measurability also affords us the opportunity to test ideas in the marketplace with a high degree of accuracy and a low degree of risk. That’s a powerful combination. But lately we’ve been seeing some data at Lenser that caused us to ask: How high is that degree of accuracy? Are we taking anything for granted?

Executing a direct marketing test mirrors the venerable scientific method, brilliant in its simplicity: We develop a hypothesis, set up a control group, change one variable in a test group that is exactly like the control in all other respects, and measure the impact of that change. But an interesting thing happens to the scientist in us during this process: we tend to bias our analytical efforts toward the test group, and understandably so. Implicit in this method is the assumption that we know everything there is to know about the control group; otherwise we couldn’t confidently ascribe any measured changes to whatever we are testing.
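The setup described above can be sketched in a few lines of code. This is only an illustrative outline – the holdout rate, seed, and function name are assumptions for the sketch, not the mailer's actual selection logic:

```python
import random

def assign_groups(customer_ids, holdout_rate=0.10, seed=42):
    """Randomly split a mailable universe into a test group (mailed)
    and a control group (held out). Seeding the generator makes the
    split repeatable for later audits. Rate and seed are illustrative."""
    rng = random.Random(seed)
    test, control = [], []
    for cid in customer_ids:
        (control if rng.random() < holdout_rate else test).append(cid)
    return test, control

# Split a hypothetical 100,000-customer file
test, control = assign_groups(range(100_000))
```

Because assignment is random, the only systematic difference between the two groups should be the single variable being tested.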

When one mailer recently started seeing repeated (albeit isolated) counterintuitive test results, we thought it was time to take a hard look at that “control” group. We tested the predictability of the control by setting up two identical controls in a recent test instead of just one. None of the customers selected into these control files were mailed a catalog in a chosen drop. Our goal was to determine the incremental value of that specific catalog contact. When it came time to analyze the results of the test, we looked to see if the two “identical” groups of customers were behaving identically, as one would expect.
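A dual-holdout check like this one boils down to comparing demand per customer across the two control files. The sketch below uses invented sales totals and group sizes – none of these figures come from the actual test:

```python
def demand_per_customer(total_sales, group_size):
    """Sales per held-out customer, counting non-buyers as zero demand."""
    return total_sales / group_size

def holdout_gap_pct(sales_a, n_a, sales_b, n_b):
    """Percent gap in demand per customer between two supposedly
    identical holdout groups. Near zero is what we'd expect."""
    dpc_a = demand_per_customer(sales_a, n_a)
    dpc_b = demand_per_customer(sales_b, n_b)
    return abs(dpc_a - dpc_b) / min(dpc_a, dpc_b) * 100.0

# Hypothetical: two 5,000-customer holdouts drawn from one selection
gap = holdout_gap_pct(61_000, 5_000, 46_000, 5_000)  # about 32.6
```

A gap that large between identically selected groups is a red flag that something other than the tested variable is moving the numbers.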

We were lucky to be able to test across more than one title in this case, which gave us richer data, and we were also able to isolate Internet-acquired buyers from catalog-acquired buyers. The results are shown below, and what we discovered was surprising – a 32% to 34% difference in the sales generated by the two holdout control groups. At first this seemed inexplicable, and it raised an alarming possibility: if we can’t trust the control, we can’t trust the results of the test. But let’s take a closer look. That variance was driven largely by the average order value (AOV) of Internet-acquired buyers in both titles. By contrast, catalog-acquired buyers and requestors were far less volatile in their buying behavior – a contrast that is key to understanding what’s happening here.
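One way to see how a single segment's order value can drive a gap of that size: each segment's demand is customers × response rate × AOV, so a swing in one segment's AOV moves the group total even when response rates match perfectly. Every number below is invented for illustration – the segment sizes, rates, and order values are not the mailer's data:

```python
def segment_demand(customers, response_rate, aov):
    """Sales contributed by one segment: buyers times average order value."""
    return customers * response_rate * aov

# Two "identical" holdouts, each split into Internet- and catalog-acquired
# segments. Response rates match; only the Internet segment's AOV differs.
group_a = segment_demand(2_000, 0.05, 350) + segment_demand(3_000, 0.06, 148)
group_b = segment_demand(2_000, 0.05, 200) + segment_demand(3_000, 0.06, 148)

gap_pct = (group_a - group_b) / group_b * 100  # roughly 32%
```

The catalog segment is identical across the two groups; the entire gap comes from the Internet segment's AOV, mirroring the pattern described above.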

Can we break these same results out by e-mail, by search engine optimization, by search engine marketing, by affiliates? Our guess at this time is that just one or two of those factors are behind this apparently erratic behavior.

And so, as is often the case in this discipline, there is more data to mine. We are reminded that with the proliferation of marketing channels, changes in our customers’ behavior are accelerating. We would be wise to occasionally challenge assumptions that predate seismic changes in our marketing arsenal, be they tactical or strategic. And finally, multichannel marketers must be prepared for multidimensional analyses. I believe our business metrics are telling us to do it. The corollary, of course, is that our customers are telling us to do it.

Jude Hoffner is director of circulation, business-to-consumer markets, for San Rafael, CA-based catalog consultancy Lenser.