Basic terminology

Apr 01, 1999 10:30 PM

RFM analysis: Selecting panels of buyers or prospects for promotions based on combinations of historical recency, frequency, and monetary value, among other criteria. The criteria are chosen intuitively, guided by historical performance. Each panel or cell is homogeneous in composition (for instance, all customers within a particular cell might have 0-6 months recency, two lifetime orders, and $50-$100 average order size).
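As a rough illustration of how such cells are formed, the sketch below assigns a handful of customers to RFM cells. The customer records, the break points, and the names `band` and `rfm_cell` are all hypothetical, chosen only to mirror the example cell above.

```python
from collections import defaultdict

# Hypothetical records: (id, months since last order, lifetime orders, average order size)
customers = [
    ("A", 3, 2, 72.50),
    ("B", 5, 2, 95.00),
    ("C", 14, 1, 30.00),
    ("D", 2, 6, 140.00),
]

def band(value, upper_limits, labels):
    """Return the label of the first band whose upper limit is >= value."""
    for upper, label in zip(upper_limits, labels):
        if value <= upper:
            return label
    return labels[-1]  # value exceeds every limit: use the open-ended top band

def rfm_cell(recency_months, orders, avg_order):
    # Break points are illustrative and set by intuition, as in RFM practice
    r = band(recency_months, [6, 12], ["0-6 mo", "7-12 mo", "13+ mo"])
    f = band(orders, [1, 2], ["1 order", "2 orders", "3+ orders"])
    m = band(avg_order, [50, 100], ["under 50", "50-100", "over 100"])
    return (r, f, m)

# Group customer ids by cell
cells = defaultdict(list)
for cust_id, recency, orders, avg_order in customers:
    cells[rfm_cell(recency, orders, avg_order)].append(cust_id)

for cell, members in cells.items():
    print(cell, members)
```

Customers A and B land in the same cell because they match on all three criteria, which is what makes each cell homogeneous.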

Multiple regression: A statistical technique that examines multiple potential predictors (independent variables, such as recency and frequency). The technique finds the subset that best predicts future behavior (the dependent variable, such as response) and weights each predictor with a multiplicative coefficient, so that a file of customers or prospects can be sorted from most desirable to least desirable predicted behavior. Regression assigns a score, such as predicted response, to every individual. But very different individuals can receive the same score (e.g., a person with 36-month recency and three lifetime orders might have the same score as someone else with three-month recency and one lifetime order).
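The scoring step can be sketched in a few lines. The weights below are hypothetical and were picked by hand so that the two individuals in the example tie; a real model would estimate its coefficients from past response data.

```python
# Hypothetical weights, NOT fitted to any data
INTERCEPT = 100
WEIGHT_RECENCY = -2   # score points per month since last order
WEIGHT_ORDERS = 33    # score points per lifetime order

def score(recency_months, lifetime_orders):
    """Weighted sum of the predictors, as a regression equation would produce."""
    return (INTERCEPT
            + WEIGHT_RECENCY * recency_months
            + WEIGHT_ORDERS * lifetime_orders)

# Hypothetical file: (description, recency in months, lifetime orders)
customer_file = [
    ("long-lapsed, 3 orders", 36, 3),
    ("recent, 1 order", 3, 1),
    ("recent, 4 orders", 2, 4),
    ("lapsed, 1 order", 30, 1),
]

# Sort from most desirable to least desirable predicted behavior
ranked = sorted(customer_file, key=lambda c: score(c[1], c[2]), reverse=True)
for description, recency, orders in ranked:
    print(score(recency, orders), description)
```

With these weights the 36-month, three-order customer and the three-month, one-order customer both score 127, even though their histories look nothing alike.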

Tree analysis (e.g., CHAID): A statistical method of dividing customers into homogeneous groups by such factors as purchase history and demographics. The resulting groups can be rank-ordered on a performance measure such as response or sales. Tree analysis applies to situations with one dependent variable and a number of independent variables. The groups can provide insight into customer behavior and identify marketing opportunities. Tree analysis is often used as an intermediate step in regression, to find interactions in the data. For instance, individuals who are both older and affluent might be interested in buying a Cadillac, a pattern that neither age nor affluence alone would reveal.

Tree analysis creates cells just as RFM does, but it is less dependent on human intuition. Human intuition generally still determines the initial variable breaks during the data preparation stage (e.g., average order size = $0-$25, $25.01-$50, $50.01-$75). But tree analysis uses statistics rather than human intuition to determine how those categories should be grouped.
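One way to picture the statistical grouping is a simplified version of the category-merging step used by CHAID-style methods: adjacent bands whose response rates are not significantly different are combined. The sketch below is an assumption-laden illustration, not a full CHAID implementation. The counts are invented, the function names are hypothetical, and the fixed threshold of 3.84 (the 5% critical value of chi-square with one degree of freedom) stands in for the adjusted p-values real software uses.

```python
def chi_square_2x2(resp_a, n_a, resp_b, n_b):
    """Pearson chi-square for responders vs. non-responders in two groups."""
    total = n_a + n_b
    responders = resp_a + resp_b
    stat = 0.0
    for observed_resp, n in ((resp_a, n_a), (resp_b, n_b)):
        expected_resp = n * responders / total
        expected_non = n - expected_resp
        stat += (observed_resp - expected_resp) ** 2 / expected_resp
        stat += ((n - observed_resp) - expected_non) ** 2 / expected_non
    return stat

def merge_similar_bands(bands, threshold=3.84):
    """bands: list of (label, responders, mailed) in order.
    Repeatedly merge the most similar adjacent pair until every
    adjacent pair differs by at least the threshold."""
    bands = list(bands)
    while len(bands) > 1:
        stats = [
            chi_square_2x2(bands[i][1], bands[i][2], bands[i + 1][1], bands[i + 1][2])
            for i in range(len(bands) - 1)
        ]
        i = min(range(len(stats)), key=stats.__getitem__)
        if stats[i] >= threshold:
            break  # all remaining neighbors are significantly different
        a, b = bands[i], bands[i + 1]
        bands[i:i + 2] = [(a[0] + " + " + b[0], a[1] + b[1], a[2] + b[2])]
    return bands

# Hypothetical average-order-size bands: (label, responders, pieces mailed)
order_size_bands = [
    ("0-25", 100, 10_000),
    ("25.01-50", 105, 10_000),
    ("50.01-75", 200, 10_000),
]
grouped = merge_similar_bands(order_size_bands)
print(grouped)
```

Here the two lowest bands respond at nearly the same rate, so they are merged, while the top band stays separate. The analyst supplied the initial breaks; the statistics decided the grouping.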

Multiple discriminant analysis: A classical statistical technique to classify observations and assign them to distinct, mutually exclusive groups that are reviewed and assigned descriptive names, such as “soccer moms.”

Neural networks: Originally developed to mimic the way the human brain processes information, these methodologies “learn” from data via various mechanisms. They make no assumptions about the distributions of predictor variables or targets, and they can model highly nonlinear relationships and capture difficult-to-see interactions.

Statistical significance: The determination of whether two results differ by enough that the difference can be considered real rather than due to chance. Mailers most often use it to decide whether the response difference between two test panels, or between test and control panels, is large enough to declare a winner. It is also used to evaluate research results, using statistical techniques that incorporate the following definitions:

Confidence or confidence level: Degree of certainty in the accuracy of a test result, often expressed as a range or confidence interval around the test result (see illustration above). Say you have a test result of a 1.0% response rate on a mailing quantity of 26,790. You can have 90% confidence that the true universe response rate is 0.9%-1.1% (the range). In theory, if the test were repeated 100 times, about 90 of the resulting ranges should contain the true response rate.
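The figures in this example can be reproduced with the usual normal approximation for a response rate. The sketch below assumes that approximation (the article does not say which formula it used), and the function name `response_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def response_interval(rate, mailed, confidence=0.90):
    """Normal-approximation confidence interval for a response rate."""
    # Two-sided interval: 90% confidence leaves 5% in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(rate * (1 - rate) / mailed)
    return rate - half_width, rate + half_width

low, high = response_interval(0.010, 26_790)
print(f"{low:.2%} to {high:.2%}")  # prints "0.90% to 1.10%"
```

A smaller mailing at the same response rate would give a wider range at the same confidence level.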

Precision: One-half the width of the range. If you want the results to be accurate to +/-10%, the 10% is the precision. Note: A 90% confidence level does not necessarily carry a precision of +/-10%; the precision also depends on the mailing quantity and the response rate.
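These ideas come together when two test panels are compared, as in the statistical significance entry. One common way to make that comparison is a two-proportion z-test, sketched below. The panel counts are invented, the function name `two_panel_test` is hypothetical, and the choice of this particular test is an assumption, since the article names no specific method.

```python
from math import sqrt
from statistics import NormalDist

def two_panel_test(resp_a, mailed_a, resp_b, mailed_b):
    """Two-proportion z-test; returns (z statistic, two-sided p-value)."""
    rate_a = resp_a / mailed_a
    rate_b = resp_b / mailed_b
    # Pooled rate: best estimate of response if the panels do not truly differ
    pooled = (resp_a + resp_b) / (mailed_a + mailed_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / mailed_a + 1 / mailed_b))
    z = (rate_a - rate_b) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical test: 1.2% vs. 1.0% response on 10,000 pieces each
z, p_value = two_panel_test(120, 10_000, 100, 10_000)
significant_at_90 = p_value < 0.10
print(f"z = {z:.2f}, p = {p_value:.3f}, winner at 90% confidence: {significant_at_90}")
```

In this invented case a 1.2% panel does not beat a 1.0% panel at the 90% confidence level; the gap is within what chance alone could produce at these quantities.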

Tails: The extreme areas of a distribution, to the left and right of the mean. The tails are typically taken to begin about two standard deviations below and above the mean, or average, of the distribution.
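For a normal distribution, the share of observations in the tails can be checked directly. This small sketch assumes normality, which real response or order data may not follow.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Share of observations within two standard deviations of the mean
within_two_sd = std_normal.cdf(2) - std_normal.cdf(-2)
# Share in each tail beyond two standard deviations
each_tail = std_normal.cdf(-2)

print(f"within two SD: {within_two_sd:.1%}, each tail: {each_tail:.1%}")
```

Roughly 95% of a normal distribution lies within two standard deviations, leaving a little over 2% in each tail.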

Outliers: Atypical observations. In a normal distribution, approximately 95% of the observations fall within two standard deviations of the mean. Outliers, the remaining approximately 5% of the observations, can fall into either the left or the right tail of the distribution.–CBW