Developing Predictive and Prescriptive Business Analytics: A Case Study
In its initial form, Business Intelligence (BI) existed as a “reporting function” that delivered summaries of historical operational data to decision-makers. Often, these reports tracked a variable (e.g., revenue by product or region) and summarized the data as Year-to-Date (YTD), Month-over-Month (MOM), or Year-over-Year (YOY) aggregates. These summaries provided a useful account of past performance, but gave little or no insight into what could be expected in the future.
Predictive and Prescriptive Analytics: What Is Fundamentally Different?
Predictive analytics is the practice of extracting information from historical data to identify patterns and predict future outcomes. It draws on a variety of statistical, modeling, data mining, and machine learning techniques to study recent and historical data, allowing analysts to make predictions about the future.
Prescriptive analytics goes beyond predicting future outcomes by suggesting actions to be taken in the present to create the most favorable future business outcomes.
We developed forward-looking analytics for a membership-based consumer buying club that was transforming itself from a brick-and-mortar retail business into a digital e-Commerce business. The club operated a franchise model with approximately 70 franchise locations throughout North America. Its business model was based on creating “Leads” through various marketing campaigns and activities, and on converting those leads into new “Memberships.” Members paid a one-time upfront membership fee and then had access to products from approximately 700 merchants and 1 million product SKUs at preferred wholesale prices. Memberships were renewable on an annual basis at a reduced cost. Merchandise categories included home furnishings and improvement, entertainment and outdoor equipment, flooring products, accessories, and travel.
In total, 22 lead sources were pursued across 6 marketing channels: Display ads, TV, Pay-per-click, Search Engine Optimization (SEO), Direct Mail, and Email Campaigns. Each lead source had an associated budget to carry out its marketing campaigns.
Predictive Modeling: Marketing Channel Resource Allocation Problem
To give direction to the allocation of funds by marketing channel, the relationship between the amount spent on the lead source and the resulting lead count needed to be established.
Regression techniques are a widely used method for predicting the value of a dependent variable (i.e., the number of leads generated per lead source), given a set of independent variables (i.e., the dollars spent on a lead source). We built a regression model using an “R-Linear Model” algorithm to determine this relationship.
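To make the idea of relating spend to lead count concrete, here is a minimal sketch of an ordinary least-squares fit for a single lead source. It is written in Python rather than the R tooling used in the study, it covers only one predictor (the study's model used several lead sources), and the spend and lead figures are invented for illustration; none of the names below come from the original project.

```python
# Hypothetical data for ONE lead source (not from the case study).
spend = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]  # dollars spent per period
leads = [40.0, 70.0, 100.0, 130.0, 160.0]         # leads generated per period


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_simple_ols(spend, leads)
print(f"leads = {intercept:.2f} + {slope:.4f} * dollars")
```

The slope plays the role of the coefficient discussed in the text: the estimated change in lead count per additional dollar spent on that lead source.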
In our example, we identified four of the 22 lead sources as the best predictors of the number of overall leads generated. One lead source was from SEO (C_DBCOM_CA), two lead sources were from Pay-per-click (C_NBPPCL_WP, C_NBPPC_WP) and one lead source was from Display channel advertisements (C_QS_LA).
Lead source C_QS_LA had a coefficient of 0.32 and contributed most to the overall lead generation efforts, while lead sources C_NBPPCL_WP and C_NBPPC_WP had coefficients of 0.023 and 0.025, respectively. Thus lead source C_QS_LA, which represents the most effective marketing channel, should get preferred funding, while the other lead sources contribute significantly less to the overall lead generation efforts. In particular, lead source C_DBCOM_CA had a negative coefficient of -0.039. That is, every dollar spent on this lead source created a “distraction” in the marketing channels that actually resulted in fewer overall leads! The marketing team was advised to abandon this channel immediately and to stop funding it.
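The reasoning above can be sketched in a few lines of Python. The coefficients are the ones reported in the text; treating each one as "change in overall leads per additional dollar" is an assumption consistent with the description above, and the helper function name is hypothetical.

```python
# Coefficients as reported in the text for the four selected lead sources.
coefficients = {
    "C_QS_LA": 0.32,
    "C_NBPPCL_WP": 0.023,
    "C_NBPPC_WP": 0.025,
    "C_DBCOM_CA": -0.039,
}


def expected_lead_change(source, extra_dollars):
    """Estimated change in overall leads for extra spend on one lead source.

    Assumes the coefficient is expressed in leads per dollar.
    """
    return coefficients[source] * extra_dollars


# Rank lead sources from most to least effective per dollar.
ranked = sorted(coefficients, key=coefficients.get, reverse=True)

# Lead sources with a negative coefficient are candidates for defunding.
to_defund = [source for source in ranked if coefficients[source] < 0]

for source in ranked:
    change = expected_lead_change(source, 1000)
    print(f"{source}: {change:+.1f} leads per extra $1,000")
```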
Prescriptive Modeling: Customer Retention Problem
Rather than having annual memberships expire and initiating re-acquisition discussions with the expired membership base, management started to explore options to be more pro-active and to not have memberships expire in the first place.
Analysis of historical renewal data revealed that during any given month approximately 18,000 members were up for renewal. From a corporate staffing and resource allocation perspective, it was not practical to run a proactive outreach program that called all of these members before their membership expired. It was, however, considered feasible to call a subset of “at risk” memberships.
To target and identify members who were less likely to renew their membership, we needed to build a predictive model using historical data of member engagement and renewal patterns, and construct a metric based on threshold values to determine whether a particular member will renew or not.
We collected historical data on over one million renewals over a 15-year period and developed a Decision Tree Model using R to identify discriminating renewal patterns. Decision trees are used in a variety of disciplines and are well suited to identifying which variables best discriminate between outcomes in multivariate situations.
Data for our study was collected from various sources including the Corporate ERP system, a POS system, and an e-Commerce database that specified members’ historical purchasing and buying patterns. Information about a member was constructed as a vector of attributes that contained predictor variables including historical data on a member’s total purchase orders, previous membership renewal patterns, previous 12 months’ purchase order count and dollar volume, membership renewal price, and membership status.
The output of the predictive model depicted the discriminating factors in priority order that determined whether a member is likely to renew a membership (“Active”) or to let the membership lapse (“Expired”).
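As a rough illustration of how a decision tree ranks discriminating factors, the sketch below searches for the single split that best separates “Active” from “Expired” members using Gini impurity. This is not the R model built in the study: the six member records, the attribute names, and the choice of Gini impurity as the split criterion are all assumptions made for the example.

```python
def gini(labels):
    """Gini impurity of a two-class label list ("Active" / "Expired")."""
    n = len(labels)
    if n == 0:
        return 0.0
    p_active = sum(1 for y in labels if y == "Active") / n
    return 2.0 * p_active * (1.0 - p_active)


def best_split(rows, labels, features):
    """Return (weighted_gini, feature, threshold) of the best single split.

    A split sends rows with value < threshold left and the rest right.
    """
    best = None
    n = len(rows)
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[feature] < threshold]
            right = [y for row, y in zip(rows, labels) if row[feature] >= threshold]
            if not left or not right:
                continue  # skip splits that leave one side empty
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best


# Hypothetical member records (not from the case study).
rows = [
    {"orders_12m": 0, "consecutive_renewals": 1},
    {"orders_12m": 0, "consecutive_renewals": 0},
    {"orders_12m": 0, "consecutive_renewals": 4},
    {"orders_12m": 2, "consecutive_renewals": 1},
    {"orders_12m": 1, "consecutive_renewals": 0},
    {"orders_12m": 5, "consecutive_renewals": 3},
]
labels = ["Expired", "Expired", "Active", "Active", "Active", "Active"]

best = best_split(rows, labels, ["orders_12m", "consecutive_renewals"])
print(best)
```

A full tree repeats this search inside each resulting branch, which is how a priority order of factors emerges: the first split is the most discriminating attribute, and later splits refine the groups it leaves mixed.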
According to Figure 2, the most discriminating factor identified was the “Total Order Count,” which reflected the number of purchases a member made during the previous twelve (12) months. Based on the historical data, at least one purchase interaction during the previous year made it almost certain that the member would renew the membership. As it turned out, the dollar amount of the purchase was not important; it was the engagement/interaction that mattered.
The second most important factor was the member’s previous renewal history: even if a member had not made a purchase during the previous year, a record of at least three (3) consecutive renewals in previous years meant that the member would again renew with almost “certainty.”
Based on the insights gained from the analysis, the IT department started to produce monthly “renewal lists” that identified members who i) had not renewed at least three times in previous years and ii) had not made a purchase during the previous twelve months. These members were most likely to let the membership lapse and thus were targeted proactively by the Member Care team. Once the developed heuristics were applied to the original 18,000 renewal members per month, only approximately 3,000-4,000 at-risk members remained. The analysis helped focus scarce Member Care resources and allocate them to the immediate “at risk” membership group.
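The two-part heuristic translates directly into a filter over the members who are up for renewal. The following Python sketch shows one way it could be expressed; the record layout, field names, and member IDs are hypothetical and do not reflect the actual systems used by the IT department.

```python
from dataclasses import dataclass


@dataclass
class Member:
    member_id: str
    orders_last_12_months: int   # purchases in the previous twelve months
    consecutive_renewals: int    # consecutive renewals in previous years


def is_at_risk(member):
    """At risk = no purchase in the last 12 months AND fewer than 3 renewals."""
    return member.orders_last_12_months == 0 and member.consecutive_renewals < 3


def build_renewal_list(members):
    """Return the IDs of at-risk members for proactive outreach."""
    return [m.member_id for m in members if is_at_risk(m)]


# Hypothetical members up for renewal this month.
up_for_renewal = [
    Member("M001", orders_last_12_months=2, consecutive_renewals=0),
    Member("M002", orders_last_12_months=0, consecutive_renewals=1),
    Member("M003", orders_last_12_months=0, consecutive_renewals=5),
    Member("M004", orders_last_12_months=0, consecutive_renewals=2),
]

renewal_list = build_renewal_list(up_for_renewal)
print(renewal_list)
```

Members with at least one recent purchase, or with three or more consecutive renewals, are left off the list because the analysis found them very likely to renew without intervention.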
Summary and Conclusions
Through the application of advanced analytics (e.g., regression techniques and decision tree models), we were able to show how resource allocation problems can be solved, and how a more effective and targeted membership outreach function can be created in a retail environment. Through the insights provided, Marketing and Member Care teams were able to develop new campaigns and member incentive programs that were better suited to attain desired business objectives and results.