The Power of Predictive Modeling

Want to build customer lift and retention? It’s easy to do.

All you need is right there in your database.

Predictive modeling allows for more informed marketing decisions and more effective resource allocation. Instead of blasting the same offers to everyone, you can target by segment or focus on particular customer groups based on current or potential value. To get started, focus on this step-by-step process:

  • Set objectives. What behavior do you hope to change? Increase the number of customers responding to a bonus offer? Discover which customers are in danger of leaving you for a competitor? Your business objectives drive the direction of your analytical work.
  • Create your dataset. Identify the modeling universe—the set of customers to whom you will apply the model. Inventory and assess the available data. How confident are you of its quality? Will your universe be constrained by spending thresholds? Geography? Recency of spend? Are you building your model at the individual or the household level? Account for suppressions—those customers you will exclude from the model. For example, a model supporting an e-mail campaign must exclude customers who have opted out of e-mail.
  • Define the outcome. What behavior are you trying to predict? Analysts commonly refer to the model’s outcome as the dependent or performance variable. The nature of the outcome variable depends on your business objectives. If you seek to increase the number of offer respondents, then your variable might be likelihood of response to an offer. Other outcomes might include likelihood to attrite, or projected revenue or profitability.
  • Determine your method. Most predictive modeling involves some degree of regression analysis, which examines the relationship of a dependent or response variable to specified independent or explanatory variables. Regression equations allow the analyst to estimate the dependent variable’s value based on the values of the independent variables. For predictive modeling purposes, the independent variables might include recency, frequency, transaction size, age, gender or income level.

    The two primary regression analysis methods are linear and logistic regression. Linear regression assumes a straight-line relationship between the explanatory variables and a continuous outcome, such as revenue, and can be quite accurate when you trust the data source, say, transactional data from your loyalty program database. Logistic regression suits yes-or-no outcomes, such as whether a customer responds to an offer, and estimates the probability of that outcome. Either method can take in additional customer dimensions, such as demographic or survey data, and both apply weights to the explanatory variables that indicate the degree and direction of their relationship to the outcome.

  • Establish the time period. Your modeling timeframe typically includes an observation period and an outcome period. For example, using 24 months of customer data, you might designate the first 12 months as the observation period and the next 12 as the outcome period. Draw the raw explanatory variables from the observation-period data set, and use the outcome-period data to measure the behavior you are predicting and to road-test the model’s validity. The length of the observation period can vary greatly based on available data. Generally, the more recent the data, the stronger its predictive power.
  • Perform pre-modeling. Pre-modeling is where analysts really earn their paychecks. Analysts scrub the data with a variety of techniques to determine which explanatory variables truly relate to the outcome. Univariate analysis examines each variable in isolation to eliminate sparsely populated variables: if only 5% of your customer records include age, then that variable won’t have much bearing on the outcome. Bivariate analysis allows the analyst to quantify each explanatory variable’s correlation with the outcome variable. For example, does age influence likelihood to respond?

    Next, analysts put the candidate variables through a process of transformation to improve their predictive power. They’ll trim away outliers, data points with extreme values that would skew the outcome. They’ll impute missing values, often by inserting the mean or the median value for those variables or inferring values through more sophisticated techniques. They’ll also drop variables that are highly correlated with one another to mitigate the effect of collinearity. In other words, the analyst disentangles two or more closely related variables that might affect the model’s reliability. Only the strongest variables survive.

  • Build and test the model. Using the surviving candidate variables, the analyst constructs an initial predictive model. Model-building is an iterative process in which the analyst prototypes, tests, analyzes and refines the model until it achieves optimum predictive power.

    Once satisfied with the initial results, your analysts run diagnostics to ensure the model’s integrity. Troubleshooting might include resolving model underfeeding, where too few variables go into the model, or its opposite, model overfeeding. In logistic analysis, do the weights applied to explanatory variables make sense? When you graph the results of your model, how well does the graph’s curve fit the actual data points? Curve over-fitting or under-fitting can wreck the usefulness of your model, so the analysts will continue to tweak it until they achieve goodness of fit: a curve that stays close to the data points without bending to follow every quirk in them.

    Finally, the analyst validates the model by running the algorithm on a hold-out customer sample from the outcome-period data set. This back test is the next best thing to testing your model live in the market.

  • Implement the model. The analytical team now prepares a new customer scoring universe with which to test the model in a live environment. The analysts check that the explanatory variable distributions resemble those in the modeling universe. Large shifts in those distributions warn that the model may not hold up in a live environment.

    The analyst moves any time-related variables forward to reflect the live market, then manually generates a handful of scores to ensure that they match computer-generated scores. Finally, the model is released into the wild. The analyst runs the model against the scoring universe to produce a set of customer scores that predict the defined outcome variable.

    Based on those scores, the analyst ranks the projections, prepares an output file and delivers a report to the marketing team.
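
To make the pre-modeling, model-building and back-testing steps above more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a production method: the customer data is synthetic, the two explanatory variables (recency and frequency) and their true effects are invented for the example, the helper names (impute_median, clip_outliers, fit_logistic, predict) are hypothetical, and a simple hold-out of the last 100 customers stands in for a hold-out sample drawn from the outcome period.

```python
import math
import random
from statistics import median


def impute_median(values):
    """Fill missing entries (None) with the median of the observed values."""
    fill = median([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower_pct=0.01, upper_pct=0.99):
    """Cap extreme values at the given percentiles instead of dropping rows."""
    ordered = sorted(values)
    last = len(ordered) - 1
    low, high = ordered[int(lower_pct * last)], ordered[int(upper_pct * last)]
    return [min(max(v, low), high) for v in values]


def predict(row, weights, bias):
    """Logistic model: squash a weighted sum of the inputs into a 0-1 score."""
    z = bias + sum(w * x for w, x in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=1.0, epochs=500):
    """Estimate the weights by plain batch gradient descent on the log loss."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    n = len(rows)
    for _ in range(epochs):
        errors = [predict(r, weights, bias) - y for r, y in zip(rows, labels)]
        for j in range(len(weights)):
            weights[j] -= learning_rate * sum(e * r[j] for e, r in zip(errors, rows)) / n
        bias -= learning_rate * sum(errors) / n
    return weights, bias


# Synthetic customers (an assumption for illustration): recent, frequent
# buyers are made more likely to respond to the offer.
random.seed(7)
recency, frequency, responded = [], [], []
for _ in range(400):
    r = random.uniform(0, 12)   # months since last purchase
    f = random.uniform(0, 10)   # purchases during the observation period
    true_p = 1.0 / (1.0 + math.exp(0.5 * (r - 6) - 0.6 * (f - 5)))
    recency.append(r)
    frequency.append(f)
    responded.append(1 if random.random() < true_p else 0)

for i in range(0, 400, 25):
    recency[i] = None           # simulate missing values
frequency[3] = 500.0            # simulate a data-entry outlier

# Pre-modeling: impute, trim outliers, then rescale both inputs to roughly 0-1.
recency = impute_median(recency)
frequency = clip_outliers(frequency)
rows = [[r / 12.0, f / 10.0] for r, f in zip(recency, frequency)]

# Build on the first 300 customers; hold out the last 100 for the back test.
weights, bias = fit_logistic(rows[:300], responded[:300])

# Score the hold-out sample, rank by score, and compare response rates.
scores = [predict(r, weights, bias) for r in rows[300:]]
ranked = sorted(zip(scores, responded[300:]), reverse=True)
top_rate = sum(y for _, y in ranked[:50]) / 50
bottom_rate = sum(y for _, y in ranked[50:]) / 50

print(f"weights: recency={weights[0]:.2f}, frequency={weights[1]:.2f}")
print(f"hold-out response rate: top half={top_rate:.2f}, bottom half={bottom_rate:.2f}")
```

If the model has picked up a real signal, the top-scored half of the hold-out sample should respond at a noticeably higher rate than the bottom half, and the signs of the weights should make business sense (here, a negative weight on recency and a positive one on frequency). A real project would normally rely on a statistics package rather than a hand-written fitting loop; the loop is spelled out only to show where the weights come from.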

Colleen Ryan is a consultant at Colloquy, a loyalty marketing, consulting and publishing firm.