Analyzing the Analysis Options

Regression analysis, CHAID, predictive modeling. Even if you don’t bandy these terms about, surely you’ve heard them used. But do you really know what distinguishes one from the other? Here, a quick definition of terms:

RFM

The most basic form of database analysis, RFM segments a house file or rental list by recency of purchase, frequency of purchase, and the monetary value of purchases. Say you have a file of 10,000 names. Taking recency first, you might assign a value of “5” to the 2,000 customers who purchased from you most recently. The 2,000 customers who bought next most recently would be assigned a “4,” and so on; the 2,000 customers who have gone the longest without purchasing would be assigned a “1.” Then you’d take the same 10,000 names and rank them by frequency: The 2,000 who made the most purchases would be ranked “5,” and so on. Finally, you’d rank the 10,000 names by how much money they spent with you. Once you finished, you’d assign each name a cumulative score. A customer who received a 5 for each of recency, frequency, and monetary value would score “15,” making him a better-performing customer than one who scored a “3” for recency, a “4” for frequency, and a “2” for monetary value (a total of 9).
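The quintile scoring just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production method. The `Customer` record and its field names are hypothetical. Ties at a quintile boundary are broken by file order, which is a simplification. The cumulative score is the plain sum of the three quintile scores, as in the example above.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    # Hypothetical record layout, for illustration only
    customer_id: str
    last_purchase: date
    num_purchases: int
    total_spent: float


def quintile_scores(values, higher_is_better=True):
    """Score each value 1-5 by rank, splitting the list into five equal-sized groups.

    The best fifth gets 5 and the worst fifth gets 1. Python's sort is stable,
    so tied values keep their original file order (a simplifying assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        # rank 0 is the best; (rank * 5) // n maps ranks onto groups 0..4
        scores[i] = 5 - (rank * 5) // n
    return scores


def rfm_scores(customers):
    """Return {customer_id: (R, F, M, cumulative score)}."""
    recency = quintile_scores([c.last_purchase for c in customers])  # later date = more recent
    frequency = quintile_scores([c.num_purchases for c in customers])
    monetary = quintile_scores([c.total_spent for c in customers])
    return {
        c.customer_id: (recency[i], frequency[i], monetary[i],
                        recency[i] + frequency[i] + monetary[i])
        for i, c in enumerate(customers)
    }
```

With a 10,000-name file, each quintile holds exactly 2,000 customers, so the top group in each dimension receives a 5 and the bottom group a 1.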

“Anybody who understands basic analysis will be able to do RFM on a PC,” says Debra Ellis, president of Barnardsville, NC-based Wilson & Ellis Consulting. “But if you want to drill down into your group of 1,000 people to find the 100 that are absolutely the most likely to order from you, you need to move on to profiling.”

Profiling

This technique allows you to describe your customer segments in broad terms. For instance, your typical customer may be a woman between the ages of 25 and 35 with an income of $60,000 to $70,000, while your best customers may be women ages 30 to 35 with incomes of at least $70,000. Profiling also allows you to single out specific buying patterns and preferences. For example, if profiling shows that a customer has ordered from you only in winter over the past 10 years, you might decide not to mail that customer in the summer.

“You can just run a demographic overlay over your customer base to get that information,” says Ellis. But since your customer base or product mix may change, you should profile on a continuous basis.
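The seasonal example above could be applied as a simple rule over order history: skip the summer mailing for a customer who orders only in winter. The following is a minimal sketch under stated assumptions. Order dates are available per customer. Winter means December through February, which assumes a Northern Hemisphere customer base. The function names and customer IDs are hypothetical.

```python
from datetime import date

# Assumption: Northern Hemisphere winter (December through February)
WINTER_MONTHS = {12, 1, 2}


def orders_only_in_winter(order_dates):
    """True if the customer has at least one order and every order fell in a winter month."""
    return bool(order_dates) and all(d.month in WINTER_MONTHS for d in order_dates)


def summer_mail_list(order_history):
    """Return customer IDs to keep on a summer mailing.

    order_history maps a (hypothetical) customer ID to a list of order dates.
    Customers whose entire history is winter-only are suppressed.
    """
    return sorted(
        customer_id
        for customer_id, order_dates in order_history.items()
        if not orders_only_in_winter(order_dates)
    )


history = {
    "C001": [date(2015, 1, 12), date(2018, 12, 3), date(2024, 2, 20)],
    "C002": [date(2023, 7, 4), date(2024, 1, 15)],
}
print(summer_mail_list(history))  # ['C002']
```

Because profiles drift as the customer base and product mix change, a rule like this would need to be rerun against fresh order history rather than set once.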

Predictive modeling

Also known as statistical response modeling, predictive modeling uses known customer or prospect attributes such as age, income, and household composition to produce a score that predicts behavior. For example, a home furnishings cataloger might use attributes such as recent moves and home ownership to identify the best prospects. To create a robust model, however, you need to use a number of attributes simultaneously.
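As a concrete illustration of a score, the minimal sketch below shows how a home furnishings cataloger might combine several prospect attributes at once. The `Prospect` fields are hypothetical. The weights and intercept are invented purely for illustration; in practice they would be estimated from past response data. Turning the weighted sum into a 0-to-1 likelihood with the logistic function is one common choice, not the only one.

```python
import math
from dataclasses import dataclass


@dataclass
class Prospect:
    # Hypothetical attributes for a home furnishings prospect file
    prospect_id: str
    moved_last_year: bool
    homeowner: bool
    household_income: float  # dollars


# Illustrative weights only; a real model estimates these from past response data.
INTERCEPT = -4.0
WEIGHTS = {"moved_last_year": 1.2, "homeowner": 0.8, "income_per_10k": 0.15}


def response_score(p):
    """Combine several attributes at once into a 0-1 response likelihood."""
    z = (
        INTERCEPT
        + WEIGHTS["moved_last_year"] * float(p.moved_last_year)
        + WEIGHTS["homeowner"] * float(p.homeowner)
        + WEIGHTS["income_per_10k"] * (p.household_income / 10_000)
    )
    return 1.0 / (1.0 + math.exp(-z))  # logistic transform of the weighted sum


def rank_prospects(prospects):
    """Return prospects ordered from most to least likely to respond."""
    return sorted(prospects, key=response_score, reverse=True)
```

No single attribute decides the ranking here. A recent mover who rents can still outscore a long-time homeowner, depending on income, which is why the model uses several attributes together.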

Regression analysis

Think of it as RFM meets predictive modeling. Names are scored on multiple attributes, or factors, each weighted according to how strongly it predicts future buying behavior; the factor deemed most predictive carries the most weight. Let’s say recency is the strongest predictor, with frequency, monetary value, and demographics as additional independent variables. Household income might be multiplied by a weight so that wealthier buyers are ranked as proportionately more valuable than less-wealthy customers.

Logistic regression uses a dichotomous dependent variable, one that can take only two values. Response is one such variable: The customer either responded or didn’t. Multiple regression is typically used for a continuous dependent variable, such as dollars, explains Peter Vlahakos, vice president of analytical services for Woodcliff Lake, NJ-based Donnelley Marketing.

“This is a robust technique, probably the most popular analysis,” Vlahakos says. “But something is lost without a sophisticated analyst to do it.” He adds that some semiautomated modeling systems allow an analyst to step in at various points.
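As a rough illustration of logistic regression on a dichotomous response variable, the sketch below fits an intercept and one weight per factor. The factors could be recency and frequency scores, for example. It uses batch gradient descent on the log-loss and only the Python standard library. It is a teaching sketch under stated assumptions: no regularization or feature scaling, and a fixed learning rate and iteration count chosen for small examples. Real modeling would normally use a statistical package and the analyst oversight Vlahakos describes.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Fit an intercept and one weight per factor by batch gradient descent on log-loss.

    X: list of factor rows (e.g. [recency_score, frequency_score]).
    y: list of 0/1 responses (responded or didn't).
    Returns (intercept, weights).
    """
    n, k = len(X), len(X[0])
    b, w = 0.0, [0.0] * k
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * k
        for row, target in zip(X, y):
            # Prediction error for this customer
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - target
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        # Step against the average gradient
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(b, w, row):
    """Predicted probability of response for one customer."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))
```

Each fitted weight plays the role described above. A larger positive weight means that factor pushes the predicted probability of response up more strongly. For a continuous dependent variable such as dollars, a multiple (linear) regression would be fitted instead.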

CHAID/tree analysis

Catalogers can use Chi-Square Automatic Interaction Detector (CHAID) and other types of tree analysis to divide their database into homogeneous groups by factors such as demographics and purchase history. “It’s like your entire database is a tree trunk, and you split off groups into branches based on statistical lifestyle and external factors,” Ellis explains. As you continue to branch off into smaller subsets, it is possible to see which groups outperform others and how the data interact. CHAID is flexible, allowing you to slice and dice the file as you see fit. One potential drawback: “If you miss a critical variable, you’re spending a lot of money for nothing,” says Ellis.
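The core step in CHAID is choosing which variable to split on by testing how strongly each one is associated with response. The sketch below is a deliberately simplified version of that idea. It considers only binary splits (one category versus all others), so every test has one degree of freedom and the raw Pearson chi-square statistics can be compared directly. It stops splitting when no split reaches the 0.05 significance level. Full CHAID also merges similar categories, allows multiway splits, and adjusts p-values for multiple comparisons; none of that is shown here. Field names such as `responded` are hypothetical.

```python
CHI2_CRITICAL_1DF = 3.841  # chi-square critical value for 1 degree of freedom at alpha = 0.05


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]].

    Rows: in the candidate group / everyone else.
    Columns: responded / did not respond.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        return 0.0  # no variation, so no association can be measured
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


def best_split(records, predictors, target):
    """Return (chi2, predictor, category) for the strongest one-vs-rest split."""
    best = None
    for p in predictors:
        for cat in sorted({r[p] for r in records}):
            inside = [r[target] for r in records if r[p] == cat]
            outside = [r[target] for r in records if r[p] != cat]
            a, b = sum(inside), len(inside) - sum(inside)
            c, d = sum(outside), len(outside) - sum(outside)
            chi2 = chi_square_2x2(a, b, c, d)
            if best is None or chi2 > best[0]:
                best = (chi2, p, cat)
    return best


def grow_tree(records, predictors, target="responded", depth=0, max_depth=2):
    """Recursively branch the file while a split is significant at the 0.05 level."""
    node = {"n": len(records),
            "response_rate": sum(r[target] for r in records) / len(records)}
    if depth < max_depth:
        split = best_split(records, predictors, target)
        if split is not None and split[0] >= CHI2_CRITICAL_1DF:
            _, p, cat = split
            node["split"] = (p, cat)
            node["branches"] = [
                grow_tree([r for r in records if r[p] == cat], predictors, target, depth + 1, max_depth),
                grow_tree([r for r in records if r[p] != cat], predictors, target, depth + 1, max_depth),
            ]
    return node
```

Ellis’s caveat applies directly: the tree can only branch on the predictors passed in. A critical variable left out of `predictors` never gets tested.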

Clustering

This profiling method allows you to group customers by variables such as location, age group, Standard Industrial Classification (SIC) code, income, and purchase history. You can then use this information to target mailings to specific segments of customers. For example, a cataloger selling traditional home furnishings might use zip codes to target customers who live in high-end subdivisions, explains Ellis. If your goal is to cross-sell or upsell products, Vlahakos says, you might do product affinity analysis on house file names, clustering customers based on their likelihood to buy certain types of product.
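Clustering can be done with many algorithms. K-means is one common, easy-to-follow choice, used here purely as an illustration; the text does not say which method any vendor uses. The sketch groups customers by two numeric variables, age and household income. Income is expressed in units of $10,000 so the two scales are roughly comparable. That scaling is an assumption: raw dollar amounts would otherwise dominate the distance calculation.

```python
def squared_distance(p, q):
    # Squared Euclidean distance between two feature vectors
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, max_iter=100):
    """Plain k-means (Lloyd's algorithm) on numeric feature tuples.

    init_centers: starting centers, one per cluster (chosen by the caller here
    for reproducibility; random or k-means++ seeding is common in practice).
    Returns (labels, centers).
    """
    k = len(init_centers)
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each customer joins the nearest center
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the clustering has converged
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# (age, household income in $10,000s) for six hypothetical customers
customers = [(28, 6.0), (30, 6.5), (27, 5.8), (55, 12.0), (58, 11.5), (60, 12.5)]
labels, centers = kmeans(customers, [customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Categorical variables such as SIC code or zip code would need a different encoding, or a different clustering method, before a distance-based approach like this could use them.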

By grouping people according to certain assumptions, though, you may inadvertently eliminate prime prospects. For example, a rural town with a single zip code may be dismissed as a low-income area by clustering models that ignore the presence of high-end subdivisions within the town, says Ellis.