Some misconceptions about data mining
Mar 1, 2000
By Sam Kowalski
The very notion of being able to "mine" data conjures up an image of
discovering a rare and valuable mineral in your database. But just as all
that glitters isn't gold, data mining may not be a marketing cure-all.
The hype surrounding data mining has been phenomenal. Most define it as a rigorous
analysis of data to discern patterns that might otherwise go undetected;
data-mining proponents then try to exploit the results with more intelligent
marketing programs. When data mining first became popular, many marketing
managers expected
it - and its semiautomatic tools incorporating tree analysis and regression
technologies - to resolve all sorts of marketing issues, including
prospecting, loyalty, and retention. But these great expectations, combined
with the proliferation of data-mining software, have created several
fallacies in the direct marketing industry:
1) New data-mining tools mean more effective modeling. The modeling tools
used to examine data and identify hard-to-find patterns have been available
for years. In fact, marketers have used regression and tree-based
technologies for at least 25 years. What is new is the blending of several
tools into one software package. This "umbrella" framework has led
many to believe that the results of a data-mining exercise far exceed the
outcomes achievable in the pre-data-mining era. After all, "new is
better."
The new graphical user interfaces (GUIs) also contributed to the popularity
of data mining. GUIs enable users to complete an analysis by using the
mouse to select commands and functions from well-designed menus and
graphics. Previously, users had to type commands manually at the keyboard
to perform the analysis.
The ease of GUIs prompted many industries to begin using data-mining tools.
Mutual-fund managers used them to assist in buy-and-sell decisions. The
petroleum industry used them to calculate the odds of locating lucrative
oil deposits. And in the mid-1990s, catalogers jumped on the data-mining
bandwagon as well. After all, margins were getting thinner, competition was
getting thicker, and no one was quite sure how the Internet would affect
the landscape.
But some problems that users believed data mining helped them understand
better could have been understood just as well with traditional modeling
tools. For instance, marketers often develop a retention model to guide
them in selecting customers to contact. While this is only one small piece
of a well-designed retention program, it nevertheless plays an important
role - and one that experienced marketers have been using for 20 years!
This leads one to conclude that it is not the modeling or data-mining tool
that improves model performance, but rather the creative use of data.
No tool can replace creativity and ingenuity.
2) Data-mining tools speed up the analysis process. To begin with, many
marketers don't know how long model development and analysis take.
Harte-Hanks recently asked a group of marketing managers which of the
following parts of a data-mining effort takes the longest to complete:
- pre-algorithm formulation, including data cleansing and data
transformation (arriving at new potential predictors that are not readily
available from the analysis data set);
- algorithm formulation, including the generation of model and/or rule code; or
- post-algorithm formulation, including quality control and preparation of
model documentation.
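To make the "data transformation" step concrete, here is a minimal Python sketch, assuming a hypothetical file of raw orders (customer ID, order date, dollar amount). It rolls the orders up into one row per customer and derives fields such as days since last order and average order size, the kind of predictors that are not readily available on the analysis file as delivered. The field names and the function are illustrative assumptions, not part of any particular data-mining package.

```python
from datetime import date

# Hypothetical raw order records: (customer_id, order_date, order_amount).
orders = [
    ("C1", date(1999, 3, 10), 45.00),
    ("C1", date(1999, 11, 2), 120.00),
    ("C2", date(1998, 6, 21), 30.00),
]

def derive_predictors(orders, as_of):
    """Roll raw orders up to one row per customer with derived fields."""
    summary = {}
    for cust, order_date, amount in orders:
        row = summary.setdefault(
            cust, {"last_order": order_date, "orders": 0, "dollars": 0.0}
        )
        row["last_order"] = max(row["last_order"], order_date)
        row["orders"] += 1
        row["dollars"] += amount
    for row in summary.values():
        # Recency: days between the most recent order and the as-of date.
        row["days_since_last"] = (as_of - row["last_order"]).days
        # Average order size: total dollars divided by order count.
        row["avg_order"] = row["dollars"] / row["orders"]
    return summary

predictors = derive_predictors(orders, as_of=date(2000, 3, 1))
print(predictors["C1"]["orders"], predictors["C1"]["avg_order"])  # 2 82.5
```

Even in a sketch this small, choices such as the as-of date and how to treat returns or zero-dollar orders have to be made by someone who knows the business.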
More than half of the marketers stated that algorithm formulation would
take the most time, estimating that this phase demands 40%-70% of total
project time. But experience shows that it's the pre-algorithm formulation
that takes the most time - up to 65% of the entire process is spent cleaning
and transforming the data. That's nearly twice the time spent on modeling
and post-modeling combined.
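The "nearly twice" figure follows from simple arithmetic: if 65% of the project goes to pre-algorithm work, the remaining 35% covers everything else, and 0.65 / 0.35 is about 1.86. On a hypothetical ten-week project, that is 6.5 weeks of cleaning and transforming against 3.5 weeks for modeling and post-modeling. A small Python check, where the function name is illustrative:

```python
def prep_to_rest_ratio(prep_share):
    """Ratio of data-preparation time to all remaining project time."""
    return prep_share / (1.0 - prep_share)

ratio = prep_to_rest_ratio(0.65)
print(round(ratio, 2))  # 1.86
```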
No current data-mining tool can take all the poor data often found on
analysis files and "clean" them. For example, a recent file I worked with
had a value of "0" for the number of children. While that value might have
been accurate for some records, it was inconsistent with the nature of the
data being studied. After making some inquiries, we soon discovered that "0"
was being used as a missing-value indicator. A typical data-mining tool
would not have flagged this irregularity.
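A simple frequency profile can at least surface this kind of oddity, although, as the example shows, only someone asking questions can confirm what the value really means. The following Python sketch assumes a hypothetical "number of children" field held as a plain list; it reports each value's share of the records and, once an analyst has confirmed that 0 is a missing-value code, recodes it to None. The sample data and function names are made up for illustration.

```python
from collections import Counter

def value_profile(values):
    """Return (value, share of records) pairs, most common value first."""
    counts = Counter(values)
    total = len(values)
    return [(value, count / total) for value, count in counts.most_common()]

def recode_sentinel(values, sentinel):
    """Replace a value confirmed to mean 'missing' with None."""
    return [None if v == sentinel else v for v in values]

# Hypothetical "number of children" field from an analysis file.
num_children = [0, 0, 2, 0, 1, 0, 0, 3, 0, 0]

# The profile only shows that 0 dominates; it cannot say why.
print(value_profile(num_children)[0])  # (0, 0.7)

# Recode only after inquiry confirms that 0 is a missing-value code.
cleaned = recode_sentinel(num_children, sentinel=0)
```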
So data mining in and of itself will not provide the answer, and
data-mining tools will not make the process faster. Someone has to look at
the data and ask the appropriate questions. This pre-algorithm step can
take a week or longer.
3) Data-mining tools work by themselves, requiring little human
intervention to perform the analysis. Some believe that the final product
of a data-mining project is self-explanatory, but this is definitely not
the case. An analyst needs to interpret the results and make
recommendations based on them.
Take the case of an electronics catalog executive who, after completing a
data-mining exercise, discovered that first-time buyers were his best
customers. Of course, the rule of thumb in the industry is that existing
customers respond better to direct mail programs, so the results of his
analysis appeared to contradict the accepted norm.
The cataloger reasoned that his aggressive and intelligent marketing helped
him succeed in attracting first-time buyers, and he used the results of his
model (the first one he had developed) as a guide for his very next
mailing. But as he began to analyze the results of this subsequent mailing,
he noticed that the better responders were coming from what he had expected
would be the poorer-performing segments.
After further examination, he discovered that the files used to mine the
data had been incomplete. Indeed, the oldest 24 months of history had not
been appended to these analysis files. As a result, existing customers whose
earlier purchases fell in that missing period "looked like" newer buyers.
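The mechanism is easy to reproduce. This Python sketch assumes, for illustration only, that a customer counts as "first-time" when exactly one purchase is visible in the file; the dates and the function name are hypothetical. A customer who bought once early and again recently is an existing customer on the full file, but looks like a first-time buyer once the oldest 24 months are cut off.

```python
from datetime import date

def classify(purchase_dates, history_start):
    """Label a customer using only purchases on or after history_start.

    Assumption for illustration: exactly one visible purchase = "first-time".
    """
    visible = [d for d in purchase_dates if d >= history_start]
    if not visible:
        return None  # customer does not appear in this window at all
    return "first-time" if len(visible) == 1 else "existing"

# Hypothetical customer: one order in 1996, another in 1999.
purchases = [date(1996, 5, 1), date(1999, 9, 15)]

full_file = classify(purchases, history_start=date(1995, 1, 1))
# Same file with its oldest 24 months (1995-1996) missing.
truncated = classify(purchases, history_start=date(1997, 1, 1))
print(full_file, truncated)  # existing first-time
```

A quick check of the earliest date on the analysis file against the expected start of history would have caught the problem before the model was built.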
This type of data-mining mistake is occurring more often in the industry,
as marketers confuse the tools and technology meant to help them with "magic
bullets" that will do all the work. In reality, it's important to remember
that data mining does not, and never will, mean automatic analysis. But if
used by a skilled and business-savvy analyst, data mining can and will make
significant contributions to your business. It should be just one of the
many tools in the marketer's tool set.