Some misconceptions about data mining

The very notion of being able to “mine” data conjures up an image of discovering a rare and valuable mineral in your database. But just as all that glitters isn’t gold, data mining may not be a marketing cure-all.

The hype surrounding data mining has been phenomenal. Most people define it as the rigorous analysis of data to discern patterns that might otherwise go undetected; data-mining proponents then try to exploit the results through more intelligent marketing programs. When data mining first became popular, many marketing managers expected it, along with its semiautomatic tools incorporating tree analysis and regression techniques, to resolve all sorts of marketing issues, including prospecting, loyalty, and retention. But these great expectations, combined with the proliferation of data-mining software, have created several fallacies in the direct marketing industry:

1) New data-mining tools mean more effective modeling. The modeling tools used to examine data and identify hard-to-find patterns have been available for years. In fact, marketers have used regression and tree-based techniques for at least 25 years. What is new is the blending of several tools into one software package. This “umbrella” framework has led many to believe that the results of a data-mining exercise far exceed what could be achieved before data mining. After all, “new is better.”

The new graphical user interfaces (GUIs) also contributed to the popularity of data mining. GUIs let users complete an analysis by selecting commands and functions with the mouse from well-designed menus and graphics. Previously, users had to type commands at the keyboard to perform the analysis.

The ease of GUIs prompted many industries to begin using data-mining tools. Mutual-fund managers used them to assist in buy-and-sell decisions. The petroleum industry used them to calculate the odds of locating lucrative oil deposits. And in the mid-1990s, catalogers jumped on the data-mining bandwagon as well. After all, margins were getting thinner, competition was getting thicker, and no one was quite sure how the Internet would affect the landscape.

But some problems that users believed were better understood through data mining could have been understood just as well with traditional modeling tools. For instance, marketers often develop a retention model to help them select which customers to contact. While this is only one small piece of a well-designed retention program, it nevertheless plays an important role, and it is one that experienced marketers have been using for 20 years!
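To make the point concrete, here is a minimal sketch of how a conventional retention model might be used to pick whom to contact. It assumes a plain logistic regression that has already been fitted; the feature names, the coefficient values, and the decision to target the customers most likely to lapse are all hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    months_since_last_order: int
    orders_last_24_months: int


# Hypothetical coefficients standing in for a logistic regression fitted earlier.
# They are illustrative values, not estimates from any real file.
INTERCEPT = -2.0
COEF_MONTHS_SINCE_LAST_ORDER = 0.2   # longer gaps raise lapse risk
COEF_ORDERS_LAST_24_MONTHS = -0.4    # more recent orders lower lapse risk


def lapse_probability(c: Customer) -> float:
    """Logistic-regression estimate of the chance this customer lapses."""
    z = (INTERCEPT
         + COEF_MONTHS_SINCE_LAST_ORDER * c.months_since_last_order
         + COEF_ORDERS_LAST_24_MONTHS * c.orders_last_24_months)
    return 1.0 / (1.0 + math.exp(-z))


def select_for_contact(customers: list[Customer], top_n: int) -> list[str]:
    """Rank by lapse risk (highest first) and return the IDs to contact."""
    ranked = sorted(customers, key=lapse_probability, reverse=True)
    return [c.customer_id for c in ranked[:top_n]]


if __name__ == "__main__":
    book = [
        Customer("A", months_since_last_order=1, orders_last_24_months=6),
        Customer("B", months_since_last_order=12, orders_last_24_months=1),
        Customer("C", months_since_last_order=6, orders_last_24_months=2),
    ]
    print(select_for_contact(book, top_n=2))  # ['B', 'C']
```

Nothing here requires a data-mining suite. Scoring and ranking are the same arithmetic that traditional modeling tools have long supported.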

This suggests that it is not the modeling or data-mining tool that improves model performance, but rather the creative use of data. No tool can replace creativity and ingenuity.

2) Data-mining tools speed up the analysis process. Many marketers don’t know how long model development and analysis take to begin with. Harte-Hanks recently asked a group of marketing managers which of the following parts of a data-mining effort takes the longest to complete:

– pre-algorithm formulation, including data cleansing and data transformation (arriving at new potential predictors that are not readily available from the analysis data set);

– algorithm formulation, including the generation of model and/or rule code; or

– post-algorithm formulation, including quality control and preparation of model documentation.

More than half of the marketers said that algorithm formulation would take the most time, demanding 40% to 70% of the total project time. But experience shows that pre-algorithm formulation is what takes the most time: up to 65% of the entire process is spent cleaning and transforming the data. That is nearly twice the time spent on modeling and post-modeling work combined.
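The transformation work in the first bullet above (arriving at new predictors that are not readily available in the analysis data set) often begins by rolling raw order records up to one row per customer. The sketch below assumes a hypothetical record layout of (customer_id, order_date, amount) and derives recency, frequency, and monetary value. These three are a familiar direct-marketing choice used here only as an example; they are not the predictors the survey or any particular project used.

```python
from collections import defaultdict
from datetime import date


def derive_predictors(transactions, as_of: date) -> dict:
    """Roll (customer_id, order_date, amount) rows up to per-customer predictors.

    recency_days: days from the customer's latest order to as_of
    frequency:    number of orders on file
    monetary:     total amount ordered
    """
    orders = defaultdict(list)
    for customer_id, order_date, amount in transactions:
        orders[customer_id].append((order_date, amount))

    features = {}
    for customer_id, rows in orders.items():
        latest = max(order_date for order_date, _ in rows)
        features[customer_id] = {
            "recency_days": (as_of - latest).days,
            "frequency": len(rows),
            "monetary": sum(amount for _, amount in rows),
        }
    return features


if __name__ == "__main__":
    sample = [
        ("c1", date(2024, 1, 10), 50.0),
        ("c1", date(2024, 3, 1), 25.0),
        ("c2", date(2023, 12, 31), 100.0),
    ]
    print(derive_predictors(sample, as_of=date(2024, 3, 31)))
```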

No current data-mining tool can take all the poor data often found in analysis files and “clean” it. For example, a recent file I worked with had a value of “0” for the number of children. While this value might have been accurate, it was inconsistent with the nature of the data being studied. After making some inquiries, we soon discovered that “0” was being used as a missing-value indicator. A typical data-mining tool would not have flagged this irregularity.
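No tool can decide what a “0” means, but a simple profile can at least put a suspicious value in front of the analyst. The sketch below is a minimal example that assumes the field arrives as a Python list with None for blanks. It reports any candidate code that makes up an unusually large share of the field; the candidate list and the 30% threshold are hypothetical defaults. Recoding is a separate step, to run only after someone has confirmed what the code means.

```python
from collections import Counter


def flag_possible_sentinels(values, candidates=(0, -1, 99, 999), share_threshold=0.3):
    """Return {code: share} for candidate codes that dominate a field.

    A high share is only a prompt to ask questions; a "0" for number of
    children may be genuine or may be a disguised missing value.
    """
    present = [v for v in values if v is not None]
    if not present:
        return {}
    counts = Counter(present)
    n = len(present)
    return {code: counts[code] / n
            for code in candidates
            if counts[code] / n > share_threshold}


def recode_missing(values, sentinel):
    """Replace a confirmed missing-value code with None."""
    return [None if v == sentinel else v for v in values]


if __name__ == "__main__":
    children = [0, 0, 0, 0, 2, 1, 0, 3, 0, None]
    suspects = flag_possible_sentinels(children)
    print(suspects)  # 0 makes up 6 of the 9 non-blank values
    # Only after inquiries confirm that 0 means "unknown":
    cleaned = recode_missing(children, sentinel=0)
```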

So data mining in and of itself will not provide the answer, and data-mining tools will not make the process faster. Someone has to look at the data and ask the appropriate questions. This pre-algorithm step can take a week or longer.

3) Data-mining tools work by themselves, requiring little human intervention to perform the analysis. Some believe that the final product of a data-mining project is self-explanatory, but this is definitely not the case. An analyst needs to interpret and make recommendations based on the results.

Take the case of an electronics catalog executive who, after completing a data-mining exercise, discovered that first-time buyers were his best customers. Yet the industry rule of thumb is that existing customers respond better to direct mail programs, so the results of his analysis appeared to directly contradict the accepted norm.

The cataloger reasoned that his aggressive and intelligent marketing had helped him attract first-time buyers, and he used the results of his model (the first one he had developed) as a guide for his very next mailing. But as he began to analyze the results of that mailing, he noticed that the better responders were coming from the segments he had expected to perform poorly.

After further examination, he discovered that the original files used to mine the data had not been complete. Indeed, the oldest 24 months of history had not been appended to these analysis files. This resulted in the existing customers “looking like” newer buyers.
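The mechanism is easy to reproduce. The sketch below assumes a hypothetical file of (customer_id, order_date) pairs and an arbitrary 365-day definition of a “first-time” buyer. Once the oldest orders are dropped, a long-standing customer is reclassified as new. A simple check for whether any orders predate an expected date is one inexpensive way to catch such a gap before modeling begins.

```python
from datetime import date


def classify_buyers(transactions, mailing_date: date, new_window_days: int = 365) -> dict:
    """Label each customer by the earliest order visible in the file."""
    first_seen = {}
    for customer_id, order_date in transactions:
        if customer_id not in first_seen or order_date < first_seen[customer_id]:
            first_seen[customer_id] = order_date
    return {
        customer_id: "first-time"
        if (mailing_date - first).days <= new_window_days else "existing"
        for customer_id, first in first_seen.items()
    }


def has_history_before(transactions, cutoff: date) -> bool:
    """Sanity check: does the file contain any orders older than cutoff?"""
    return any(order_date < cutoff for _, order_date in transactions)


if __name__ == "__main__":
    full = [
        ("c1", date(2019, 5, 1)),   # long-standing customer
        ("c1", date(2023, 9, 1)),
        ("c2", date(2023, 10, 1)),  # genuinely new customer
    ]
    # Simulate an analysis file whose oldest history was never appended.
    truncated = [row for row in full if row[1] >= date(2021, 1, 1)]

    mailing = date(2024, 1, 1)
    print(classify_buyers(full, mailing))       # c1 existing, c2 first-time
    print(classify_buyers(truncated, mailing))  # c1 now looks first-time
    print(has_history_before(truncated, date(2021, 1, 1)))  # False
```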

This type of data-mining mistake is occurring more often in the industry, as marketers mistake the tools and technology meant to help them for “magic bullets” that will do all the work. In reality, it’s important to remember that data mining doesn’t, and never will, mean automatic analysis. But in the hands of a skilled, business-savvy analyst, data mining can and will make significant contributions to your business. It should be just one of the many tools in the marketer’s tool set.