There has been too much investment made trying to come up with new data mining tools rather than focusing on the input and manipulation sides of the problem, according to Jim Wheaton.
Wheaton, cofounder of the Chicago-based Wheaton Group and its Daystar Wheaton Group joint venture, says he is not against new data mining tools per se. What he objects to are extravagant claims by the software manufacturers, and the resulting sloppy data management.
“One extravagant claim is that experienced human analysts will no longer be required,” Wheaton says. “The problem is that it is easy to write software to identify statistical patterns in the data. But it is a lot more difficult to figure out which of these patterns make business sense and will hold up over time.”
For example, Wheaton says, consider a model to predict the short-term purchase volume of each of a retailer’s customers. If the software identifies a strong positive relationship between purchase volume and ownership of the retailer’s own credit card, should the variable be allowed into the model?
Or would your answer change if you knew that the analysis file was cut at the time of the card’s inaugural sign-up period, when the retailer’s most fervent customers had rushed to enroll? And would you expect that over time, as the retailer resorts to ever-more-enticing offers to expand the number of card users, the relationship of card ownership to purchase volume will change?
“These are the sorts of questions that an experienced human being, rather than a piece of data mining software, is equipped to ask,” Wheaton says. “Often, we see cutting-edge data mining software employed against marketing databases that house inferior data content.”
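The kind of sanity check Wheaton describes can be automated in part, even if the business judgment cannot. Below is a minimal, hypothetical sketch of one such check: measure the card-ownership/purchase-volume relationship in several time cohorts and flag the variable if its strength drifts. All function names, field names, and the tolerance threshold are illustrative assumptions, not anything from Wheaton's own practice.

```python
from statistics import mean

def lift(rows):
    """Ratio of mean purchase volume for cardholders vs. non-holders.

    Each row is a dict like {"has_card": bool, "volume": float}.
    """
    holders = [r["volume"] for r in rows if r["has_card"]]
    others = [r["volume"] for r in rows if not r["has_card"]]
    return mean(holders) / mean(others)

def is_stable(cohorts, tolerance=0.25):
    """True if the lift varies by less than `tolerance` (relative
    to the mean lift) across the time cohorts.

    A variable whose lift collapses after the sign-up rush, as in
    Wheaton's credit-card example, would fail this check.
    """
    lifts = [lift(c) for c in cohorts]
    return (max(lifts) - min(lifts)) / mean(lifts) < tolerance
```

A cohort drawn from the card's sign-up period might show a lift of 4x or more, while a cohort a year later shows almost none; `is_stable` would then return `False`, prompting the analyst to keep the variable out of the model.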
To be truly effective, data mining software requires what Wheaton refers to as best-practices marketing database content.
For example, he recently constructed a model to predict which customers would respond to a holiday promotion. Unfortunately, Wheaton says, the client’s data management vendor was rolling all data content older than 36 months off the database during every update cycle, without even archiving it.
By definition, the only way to build the holiday model was to go back to the previous holiday promotion. This reduced the historical data available to drive the model to 24 months. Even worse, Wheaton says, the model had to be validated against yet another holiday promotion.
“Of course, the most recent one had taken place two years earlier. This, in turn, reduced the amount of available data to 12 months,” Wheaton says. “Clearly, this is no way to support breakthrough data mining.”
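The arithmetic behind Wheaton's shrinking windows is simple but worth making explicit. This hypothetical sketch assumes a 36-month retention policy and promotions falling 12 and 24 months in the past, matching the figures in the anecdote; the function name is illustrative.

```python
RETENTION_MONTHS = 36  # vendor purges anything older during each update

def history_available(promo_months_ago, retention=RETENTION_MONTHS):
    """Months of customer history still on file as of a past promotion,
    when records older than `retention` months have been purged."""
    return retention - promo_months_ago

# Previous holiday promotion (12 months ago): 36 - 12 = 24 months of history.
# Validation promotion (24 months ago):       36 - 24 = 12 months of history.
print(history_available(12), history_available(24))
```

Had the purged data simply been archived, both the modeling and validation files could have drawn on the full behavioral history, which is the heart of Wheaton's complaint.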