The Marketing Science Signal

Marketing Data Science

Mike is a leader in the field of Marketing Data Science & Operational Strategy with 20+ years leading global Data Science, AI/ML, and Marketing Analytics teams at Dell Technologies, Cisco, Pure Storage, Hitachi Vantara and Hearst Media. He is also an Accredited Professional Statistician™ with the American Statistical Association.
  • The Multi-Channel Force Multiplier: How Bridging Digital Nurture and Direct Outreach Triples Conversion Lift.

    The Problem: Beyond the “3+ Rule”: It is widely accepted that a synergistic media mix will always outperform a single media vehicle. Historically, the industry adhered to the “3+ rule”—popularized in the 1970s—which suggested that three exposures to a message were required to influence a purchase. In the digital age, however, that threshold has risen to a frequency of 7 or more. While media and advertising agencies have used techniques like linear programming and mainframe-based syndicated survey data since the 1980s to optimize these mixes, modern integrated marketing campaigns require a more sophisticated touch.

    The Implementation Gap: Throughout my career, my data science teams and I have built media mix optimization models for numerous B2B companies. While these models are often adopted in principle at the executive level, they frequently prove too complex for practical implementation. Often, marketing groups remain so tactically focused that executing a systematically integrated campaign feels unfeasible, rendering the optimization an “academic exercise.” Although modern vendors provide multi-touch attribution (MTA) methods to track touches against opportunities and allocate budgets, the real value lies in using this data as a foundation for deeper optimization work.

    A Strategic Priority for the CMO: In practice, CMOs—such as Todd Forsythe and Jonathan Martin—leverage these models to calibrate marketing budgets and enhance overall effectiveness. Building a media mix model is typically the first task I undertake when launching a new data science practice. It is a baseline expectation that a CMO understands the optimal mix for generating pipeline and revenue and allocates their budget accordingly.

    Innovation through Association Rules: My approach to media mix modeling has evolved toward leveraging association analysis. The idea originated from Ling (Xiaoling) Huang, and I refined the methodology through collaborations with Yexiazi (Summer) Song and, most recently, Fuqiang Shi, to develop models for diverse business units.

    According to Miller (2015), this technique is commonly referred to as Market Basket Analysis, a concept born in retail:

    “Market basket analysis, (also called affinity or association analysis) asks, what goes with what? What products are ordered or purchased together?”

    By applying this retail-focused logic to media, we can uncover the hidden relationships between marketing channels. As Zhao et al. (2019) noted:

    “The challenge of multi-channel attribution lies not just in identifying the final touchpoint, but in uncovering the hidden synergies where the presence of one marketing stimulus significantly amplifies the effectiveness of another.”


    The Data: Constructing a Hybrid Marketing Funnel

    To demonstrate this methodology, I synthesized a consolidated dataset of over 45,000 interactions by merging the UCI Bank Marketing Dataset with the Kaggle/Criteo Multi-Touch Attribution (MTA) Dataset. While these sources represent different industries—fixed-term bank deposits and e-commerce—their combination provides a comprehensive view of the modern buyer’s journey, blending high-frequency digital touchpoints with high-touch personal outreach.

    Dataset Profiles

    1. UCI Bank Marketing Dataset

    This dataset is famous for being highly imbalanced, which is a “real-world” marketing scenario.

    • Non-Converters: About 88% of the rows. Customers who were called but did not subscribe to the term deposit.
    • Converters: About 12% of the rows.
    • Why it matters: Because most people don’t convert, the baseline conversion rate is very low; when the model finds a channel like Mobile Outreach with a high Lift, that signal stands well above chance.

    2. Kaggle/Criteo Attribution Dataset

    This dataset is designed for Multi-Touch Attribution (MTA).

    • Non-Converters: These are “Journeys” where a user clicked on ads (Email, Display, etc.) without converting to a sale.
    • Converters: Customer journeys that resulted in sales.
    • Why it matters: this helps improve marketing effectiveness and efficiency by identifying waste (channels that people click on but never lead to a sale).

    Data Preprocessing

    Raw data labels were standardized into a unified “Media Mix” master dataframe to ensure consistency across sources:

    • Email Nurture (Kaggle MTA): Represents the high-volume digital baseline.
    • Mobile Outreach (UCI Bank Mkt. “cellular”): Represents direct personal contact via mobile.
    • Telemarketing (UCI Bank Mkt. “telephone”): Represents landline-based outreach.

    The preprocessing phase involved concatenating the sources, indexing, removing non-essential characters, and handling missing values. I then formatted the dates and standardized the channel categories to enable a seamless cross-platform analysis.
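    The label-standardization step can be sketched as a simple lookup (a minimal illustration; the raw label strings shown here are assumptions, not the exact values in the source files):

```python
# Map raw channel labels from each source onto the unified "Media Mix" labels.
# The raw keys are illustrative assumptions, not the exact source values.
CHANNEL_MAP = {
    "email": "Email Nurture",        # Kaggle MTA
    "cellular": "Mobile Outreach",   # UCI Bank Marketing
    "telephone": "Telemarketing",    # UCI Bank Marketing
}

def standardize_channel(raw_label):
    """Trim, lowercase, and map a raw label; flag anything unrecognized."""
    key = raw_label.strip().lower()
    return CHANNEL_MAP.get(key, "Unknown")

print(standardize_channel("  Cellular "))  # Mobile Outreach
```

    Unrecognized labels surface as “Unknown” rather than silently merging, which makes data-quality gaps visible before modeling.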

    The Data: A Hybrid Funnel Approach

    By merging these records, we are modeling a hybrid B2C customer journey. This reflects the reality of high-value industries like financial services, where a customer might be prompted by a high-volume digital ad (Criteo) but requires a personal, high-touch mobile conversation (UCI) to finalize a complex transaction.

    This balanced view allows the association rules to uncover synergies across the entire funnel, rather than looking at digital or offline channels in isolation.

    Apriori Methodology

    Using the Apriori Algorithm (Raschka, 2018; Agrawal et al., 1996), I processed over 45,000 interactions to identify Synergy Lift, which is one of several measures of impact. According to Miller (2015), “an association rule is a division of each item set into two subsets with one subset, the antecedent, thought of as preceding the other subset, the consequent. The Apriori algorithm … deals with the large numbers of rules problem by using selection criteria that reflect the potential utility of association rules.” Essentially, the rule takes the form Antecedent (marketing mix) → Consequent (conversion). Here are the key measures:

    • Support (scale): how often this mix occurs.
      • Support = occurrences of the rule (mix + conversion) / total customer base
    • Confidence (predictability): how often a customer converts when exposed to this mix.
      • Confidence = Support(mix + conversion) / Support(mix)
    • Lift (synergy): how well this mix performs vs. the average.
      • Lift = Confidence / Support(consequent), i.e., confidence relative to the average conversion rate
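    To make the three measures concrete, here is a from-scratch toy calculation on six hypothetical customer journeys (illustrative only; the article’s actual models used the Apriori implementation in MLxtend):

```python
# Six hypothetical customer journeys; "converted" marks a conversion event.
transactions = [
    {"email", "mobile", "converted"},
    {"email", "mobile", "converted"},
    {"email"},
    {"mobile"},
    {"email", "converted"},
    {"telemarketing"},
]

def support(itemset):
    """Fraction of journeys that contain every item in the set."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Rule: {email, mobile} -> {converted}
rule_support = support({"email", "mobile", "converted"})   # 2 of 6 journeys
confidence = rule_support / support({"email", "mobile"})   # every exposed pair converted
lift = confidence / support({"converted"})                 # vs. 50% baseline -> 2.0
print(f"support={rule_support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

    A lift of 2.0 here says the email + mobile pairing converts at twice the baseline rate — exactly the “force multiplier” reading used later in the article.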

    Model Output

    Once I ran the Apriori algorithm and selected the top rules by highest lift (mix performance vs. the average), it was clear that the volume of marketing interaction was not aligned with conversion potential. The chart below must be interpreted with caution: it illustrates the danger of looking at any channel in isolation, because email looks high-volume/low-lift on its own, yet it often occurs in high-potential integrated combinations.

    Lift (>1) is the force multiplier, so that is how I evaluated the combinations for effectiveness in converting customers. For example, many combinations without mobile, direct mail, email, and telemarketing had high impact. Activities on the right-hand side of the arrows (>>) occur at the same time as the customer conversion (Converted), and so should be considered part of the overall mix.

    New product launches, combined with brand awareness, email, and retargeting had the most impact, followed by a similar mix that replaced launches with discounting; both are time-sensitive calls to action, so this makes sense. Activities on the left-hand side of the arrows typically represent the “nurture” phase, while the right-hand side is the conversion event.

    This visual shows the top media mix combinations and their performance relative to the baseline.  So, for a financial services marketer this would be the roadmap for funding and executing integrated marketing campaigns:

    If we want to look at combinations to check a particular pair of marketing channels, or create a particular tactic, a correlogram like the one below shows the pairs with the most lift.

    Optimizing Marketing Budgets: A Data-Driven Approach

    From a funding perspective, analyzing our 55,211-record blended dataset through Ridge regression allows us to move beyond raw interaction volume to true contribution. By generating and normalizing beta coefficients, we can isolate the unique impact of each channel on the final conversion event, providing a mathematical foundation for marketing spend allocation.
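    The mechanics of that step can be sketched on synthetic data: fit ridge coefficients (here via the closed-form solution rather than scikit-learn, for brevity) and normalize their absolute values into budget-allocation shares. The channel names and numbers below are illustrative, not the article’s actual results:

```python
import numpy as np

rng = np.random.default_rng(42)
n, channels = 500, ["Email", "Referral", "Search Ads"]
X = rng.normal(size=(n, len(channels)))   # standardized channel exposures (synthetic)
true_beta = np.array([0.5, 0.4, 0.1])     # assumed ground-truth contributions
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# Closed-form ridge: beta = (X'X + lambda*I)^-1 X'y
lam = 1.0
beta = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

# Normalize absolute coefficients into budget-allocation shares
shares = np.abs(beta) / np.abs(beta).sum()
for name, s in zip(channels, shares):
    print(f"{name}: {s:.1%}")
```

    The normalization is what turns raw betas into the percentage-contribution view used for spend allocation: each channel’s share is its coefficient magnitude relative to the total.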

    Based on this specific analysis, here is the performance breakdown of the primary drivers and their normalized contribution to conversion:

    Summary

    • The Efficiency of Retention: Email and Referral show the highest normalized contribution (~20% each), suggesting that “warm” audience paths are the most reliable foundation for the budget.
    • The Synergy Mandate: Funding should prioritize synergistic pairs rather than siloed channels. For example, the high weights of Social Media and Search Ads suggest they function best when funded in tandem to capture both interest and intent.
    • Awareness as Air-Cover: “Brand Awareness” channels (Social/Display) provide the necessary air-cover for time-sensitive, high-conversion calls to action like New Product Launches and Discount Offers.

    Citations:

    Criteo Labs (2018). Criteo Attribution Modeling & Bidding Dataset. Kaggle. Available at: https://www.kaggle.com/c/criteo-attribution/data

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Moro, S., Cortez, P., & Rita, P. (2014). A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, 62, 22-31. Elsevier. https://doi.org/10.1016/j.dss.2014.03.001

    Raschka, S. (2018). MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638. https://doi.org/10.21105/joss.00638

    Zhao, K., et al. (2019). Deep Learning and Association Rules for Multi-Channel Attribution. In Proceedings of the 2019 International Conference on Data Mining & Marketing Analytics.

    UCI Machine Learning Repository. (2012). Bank Marketing Dataset. https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

  • From Accounts to Contact Personas: a data-driven framework for B2B targeting and segmentation – when contacts matter.

    Introduction & The B2B Modeling Hierarchy

    In my experience, B2B contact data lacks the predictive weight found in B2C or subscriber databases. Having managed contact data at Hearst, Cisco, and DellEMC, I’ve seen firsthand that while contact attributes (PII, job titles, and history) are essential for execution, they are often secondary for targeting.

    Because contact data is lower in dimensionality and higher in volatility, it rarely contributes to B2B targeting models in a statistically significant way. Even in “greedy” models utilizing 250+ features (including firmographics, purchasing history, competitive install, intent and installed-base telemetrics) the company-level attributes consistently push out contact-level demographics.

    Machine learning and statistical algorithms (such as logistic regression, SVM, Random Forest, and XGBoost) will almost always prioritize company-level variables, often rejecting contact-level data as “noise” that offers negligible improvement in predictive accuracy.

    Starting with the Account

    B2B propensity-to-buy (and likewise response or churn likelihood), RFM or cluster segmentation, and Customer Lifetime Value are all fundamentally based on the company (or account). This has always been the case. To maximize impact, start with the company.

    Typically, the most influential variables (features) for these models include:

    • Past Purchases (RFM): The strongest indicator of future behavior.
      • Recency, Frequency, and Monetary values at the account level provide the historical baseline for any CLV calculation.
    • Firmographics: Standardizing variables like Industry and Revenue is critical.
      • These act as the primary “branches” in decision-tree models like XGBoost to segment high-value targets.
    • Engagement Data: Third-party syndicated media usage and first-party website traffic.
      • “Intent” signals indicate where an account is in the buying journey.
    • Pipeline Dynamics: Lead volume and velocity.
      • How quickly multiple contacts from the same account are engaging.
    • Market Context: Installed base telemetrics (usage) and competitive product footprint.
      • Gap analysis of current products that are at maximum use or underutilized for cross-/ up-sell, and identifying accounts that are primed for a displacement campaign based on service contracts or competitive product purchases.

    However, once the targeting or segmentation scheme is developed, that is the time that contact penetration, quality and associated attributes become critical for go-to-market execution.

    Maintaining a Contact Data Foundation

    The foundation for contact data should ideally be in place prior to program execution. This means achieving high contact penetration in key functional areas with established brand awareness and permissions. Because contact data is notoriously volatile, maintenance must be an ongoing process to prevent decay.

    Technical Note: For this analysis, I generated synthetic data mirroring a typical B2B tech environment. All data wrangling and modeling were performed using Python (pandas/NumPy) in a Jupyter Notebook, with Matplotlib/Seaborn for graphics and Scikit-Learn for K-Means clustering.

    Contact Profiling: Turning Noise into Segments

    Marketing to hundreds of thousands of free-form job titles is impossible at scale. To develop a systematic approach, a contact hierarchy is critical. While companies once had to build internal mapping tables, most modern vendors now provide standardized hierarchies to assess quality and facilitate execution.

    Start with a demographic profile of contacts currently in the marketing data lake or datamart.  The table below is an illustration of a basic contact profiling scorecard that can be used to analyze and track the total marketable contact data foundation.

    Hypothetical Contact Profiling Scorecard

    Addendum: Contact Data Health Checklist

    Framework Reference: Forrester (formerly SiriusDecisions) Data Strategy Standards

    To ensure the “Workhorse” models discussed in previous articles perform at peak efficiency, I recommend auditing your contact database against these five critical health benchmarks:

    • Accuracy: >95% for core predictive features (Job Title, Industry).
      • Scientific Impact: High accuracy reduces “label noise” and improves the gain in classification trees.
    • Density: 100% completeness for critical path fields (Email, Account Name, Contact Name, Title).
      • Scientific Impact: Eliminates data sparsity, ensuring your models don’t rely on biased imputations.
    • Timeliness/Validity: <12 months since last verification.
      • The Decay Factor: B2B data decays at ~2.1% per month (25% annually). Records older than one year significantly increase bounce rates and skew survival analysis.
    • Consistency: 100% standardization on categorical variables (Country, Company Name, Seniority).
      • Scientific Impact: Standardizing “US” vs “USA” is essential for Wickham’s (2014) Tidy Data principles and ensures correct feature grouping.
    • Buying Group Linkage: >3 contacts mapped per target account.
      • Strategic Impact: Essential for moving from individual “Lead Scoring” to “Account-Based Propensity” models.

    I wanted to work from a single synthetic dataset to maintain continuity. For the dataset I’ll be using (below), here is a basic job title distribution to use as a starting point and later we will do some basic Persona mapping.
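    A first pass at persona mapping can be a simple keyword lookup against standardized title fragments. This is a rough sketch with hypothetical rules; as noted above, production-grade hierarchies typically come from vendor mapping tables:

```python
# Keyword -> persona rules, checked in order; these rules are illustrative only.
PERSONA_RULES = [
    (("cio", "cto", "chief"), "Executive Decision Maker"),
    (("vp", "vice president", "director"), "Strategic Manager"),
    (("engineer", "architect", "administrator"), "Technical Decision Maker (TDM)"),
    (("analyst", "specialist"), "Influencer"),
]

def map_persona(job_title):
    """Assign the first persona whose keywords appear in the lowered title."""
    title = job_title.lower()
    for keywords, persona in PERSONA_RULES:
        if any(k in title for k in keywords):
            return persona
    return "Unmapped"

print(map_persona("Sr. Storage Architect"))   # Technical Decision Maker (TDM)
print(map_persona("VP of Infrastructure"))    # Strategic Manager
```

    The “Unmapped” bucket doubles as a data-quality metric: its size tells you how much of the free-form title field the hierarchy fails to cover.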

    Fitting to the Account Target

    Further cross-tabs can be done on the contact data to ascertain whether the contacts exist in high or low potential geographies, industries, etc.

    There are several ways to do this. If only a few features are available, or a specific industry is being targeted, then match the contacts to the companies:

    Contact Distribution by Country and Industry — Top 10 in Rank Order.
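    The contact-to-company match behind a distribution like this can be sketched as a keyed join on a normalized account name (hypothetical field names; real-world matching usually keys on a vendor identifier rather than a raw name):

```python
# Hypothetical target-account list and contact records.
target_accounts = {"acme corp": {"industry": "Manufacturing", "p2b": "High"}}
contacts = [
    {"name": "J. Smith", "title": "IT Director", "account": "ACME Corp "},
    {"name": "A. Lee", "title": "Analyst", "account": "Globex"},
]

def norm(name):
    """Normalize an account name for matching: trim and lowercase."""
    return name.strip().lower()

# Keep only contacts at targeted accounts, enriched with account attributes.
matched = [
    {**c, **target_accounts[norm(c["account"])]}
    for c in contacts
    if norm(c["account"]) in target_accounts
]
print(matched)  # only the ACME contact survives, now carrying industry and P2B
```

    In pandas this is a left merge against the account table; the dictionary version just makes the keying logic explicit.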

    Once the foundation is set, these cross-tabulations can be visualized through a Tree Map of the top countries and industries. The larger the box, the higher the concentration of a specific job title within that segment.

    Tree Map: Top 10 Countries, Industries and Job Titles based on contact coverage.

    The “Sweet Spot” for Sales and Marketing

    For a synthesized view, we return to the Propensity-to-Buy (P2B) models. By identifying the top job titles within high-propensity accounts, we find the “sweet spot” for marketing spend.

    Propensity to Buy Customer Distribution

    Here we have the top job titles for high-propensity accounts initially in a bar chart (below):

    Taking segmentation a step further, we use K-Means clustering to group companies by Industry, Country, and P2B score. According to Tan et al. (2019),

    “Cluster analysis groups data objects based only on information found in the data that describes the objects and their relationships. The goal is that the objects within a group be similar or related to one another and different from (or unrelated to) the objects in the other groups. The greater the similarity (or homogeneity) within a group and the greater the difference between groups, the better or more distinct the clustering.”

    This allows us to drill down into specific personas (e.g., TDMs and Strategic Managers) within “High Value/High Growth” clusters.

    Distribution of contact personas in the High Value cluster segment.
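    The clustering step was done with Scikit-Learn’s K-Means over Industry, Country, and P2B score; for intuition, here is a from-scratch toy version that clusters accounts on a single synthetic P2B score (two clusters, Lloyd’s algorithm):

```python
def kmeans_1d(xs, iters=20):
    """Two-cluster Lloyd's algorithm on one numeric feature."""
    cents = [min(xs), max(xs)]  # simple deterministic initialization
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [0 if abs(x - cents[0]) <= abs(x - cents[1]) else 1 for x in xs]
        # Move each centroid to the mean of its assigned points.
        for j in (0, 1):
            grp = [x for x, lbl in zip(xs, labels) if lbl == j]
            if grp:
                cents[j] = sum(grp) / len(grp)
    return cents, labels

# Synthetic P2B scores: a low-propensity group and a high-value group.
scores = [0.12, 0.18, 0.15, 0.81, 0.88, 0.85]
centroids, labels = kmeans_1d(scores)
print(centroids, labels)
```

    With real data you would standardize the features and encode the categoricals first; the mechanics of assign-then-recenter are the same.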

    Program Execution

    Fictitious, synthetic PII illustrates a final list of contacts in high propensity-to-buy companies. Now the hypothetical program is ready for execution.

    Illustration only: synthetic data not intended to represent actual persons or PII.

    Conclusion

    By prioritizing account fit over contact volume, marketing effectiveness improves across every metric:

    • Lower cost-per-lead.
    • Higher response rates through relevance.
    • Better pipeline quality and higher conversion rates.
    • Increased revenue.

    Targeting precision in B2B marketing isn’t about how many contacts you reach, but about reaching the right people within the right organizations. Start with the account to find your target, then use high-quality contact data to hit it—this is the most reliable path to maximizing both ROI and market impact.

    Citations

    Forrester Research. (2024). The B2B Marketing and Sales Data Strategy Toolkit. [Online]. Available at: https://www.forrester.com/report/the-forrester-b2b-marketing-and-sales-data-strategy-toolkit/RES172091

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    Integrate. (2025). Implementing the B2B Data Quality Toolkit: Standards for 2026. [Online]. Available at: https://www.integrate.com/resources/b2b-data-quality-toolkit

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Tan, P.-N., Steinbach, M., Karpatne, A., & Kumar, V. (2019). Introduction to data mining (2nd ed.). Pearson Education.

    Wickham, H. (2014). Tidy Data. Journal of Statistical Software, 59(10), 1–23. https://doi.org/10.18637/jss.v059.i10

  • Plugging the Leak: Gradient Boosting and Survival Analysis for Customer Retention

    Why churn modeling?

    Coming from subscriber marketing, customer churn and retention were always paramount for B2C marketers. The “Leaky Bucket” metaphor, first popularized by subscription marketers using CRM in the ’90s, is finally making its way into B2B marketing as a result of XaaS and Cloud technology.

    Churn modeling fits nicely with Propensity-to-Buy and Customer Lifetime Value because it employs many of the same techniques. There are two ways to look at churn which require methodologies we’ve seen in my past articles:

    What is the likelihood of a customer leaving in the next 30 days? This requires a statistical or machine-learning classifier such as XGBoost (which I used in my article on propensity to buy).  Here we want a list of customers that have a high churn likelihood, so the target for prediction is changed from purchase to churn (yes/no).

    When will a customer churn? Knowing the timing of churn means turning to survival modeling, which I employed in my article on Customer Lifetime Value analysis; here the recommended Python library is called Lifelines. It can also generate some great visualizations of the survival probability of different customer segments, which I think is an excellent way to identify the drivers and profile the customers who churn — whether they are people (B2C) or businesses (B2B accounts).

    The Dataset

    The IBM Telco Customer Churn dataset is a famous dataset used by data scientists to predict churn and work with marketers to develop customer retention strategies. It is a fictional telecommunications customer database, providing a mix of demographic, service, and financial data. It consists of data on 7,043 customers with 21 features including:

    Demographics: Information about the customer’s gender, age range (Senior Citizen), and whether they have partners or dependents.

    Account Information: How long they’ve been a customer (tenure), their contract type (Month-to-month, One year, Two year), payment method, paperless billing, and charges (MonthlyCharges and TotalCharges).

    Services: Specific services the customer has signed up for, including Phone, Multiple Lines, Internet (DSL, Fiber Optic, etc.), Online Security, Online Backup, Device Protection, Tech Support, and Streaming TV/Movies.

    The Target: The Churn column, indicating whether the customer left within the last month (Yes/No).

    Exploratory Data Analysis (EDA) and Data Quality

    An analysis of the dataset showed that it was very complete, with only 11 missing TotalCharges values, so other than a few data transformations (strings → integers) I wasn’t concerned about doing a lot of cleaning and data manipulation.

    Since the data had a field for Churn (Yes/No) a customer profile could be generated which provided a lot of insight into churned customers:

    From these charts, we can see that locking customers in with long-term contracts and automatic withdrawal seems like a good retention strategy, as churners tend to be on monthly contracts and to pay by check. Perhaps Senior Citizens are more price-sensitive and have the bandwidth to shop for discounts, but that would have to be tested. I also generated a correlation matrix showing the correlation of each feature to customer churn, as another way of looking at the characteristics of churned customers:

    The Classification Question: Will they leave? (XGBoost).

    Real-world tradeoffs in modeling the probability and timing of customer churn.

    There are a lot of features in this dataset that can be used together to predict churn. The first question is “will a customer churn?” I think of this as a propensity-to-churn model, which classifies churners based on the statistical probability of churn (essentially a propensity-to-buy model with the target changed from “purchase” to “churn”).

    Typically, in sales and marketing models we have to decide when we target:

    1. Do we have a conservative model that is relatively accurate in predicting purchases or churn, but misses a lot of customers because it is so conservative? From a marketing expense perspective this is efficient.
    2. Do we tune more aggressively and cast a wider net, but target a lot of prospects or customers that will not purchase or churn (false positives)? This is less cost-efficient but will uncover more absolute revenue or churn “by knocking on more doors.”

    In my experience, I have always leaned towards the more aggressive model (within reason), and sacrifice precision to hit more potential purchasers/churners.

    Looking at the confusion matrix below for my baseline (aggressive) model, it is great at capturing churn within a 30-day window (Recall = 0.82, meaning it captures 82% of the customers who churned) but at a high cost of false positives (Precision = 0.50, meaning any marketing or sales effort will be inefficient because it casts a very wide net). Further tuning improved precision, but at the cost of rejecting many churning customers who had lower probability scores based on the available data, so I would not go with the conservative model, or I would continue to tune.

    In short: a false positive will increase marketing or discounting, while a false negative will lose a customer.
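    The aggressive-vs-conservative choice ultimately comes down to the probability threshold applied to the classifier’s scores. A minimal sketch on toy predictions (illustrative numbers, not the model’s actual output):

```python
def precision_recall(y_true, y_prob, threshold):
    """Precision and recall for predictions at a given probability cutoff."""
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    fn = sum(1 for t, p in zip(y_true, y_prob) if p < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 0, 0]                # 1 = churned
y_prob = [0.9, 0.6, 0.3, 0.55, 0.2]     # toy churn scores

print(precision_recall(y_true, y_prob, 0.5))  # aggressive: catches 2 of 3 churners
print(precision_recall(y_true, y_prob, 0.8))  # conservative: clean but misses churners
```

    Lowering the threshold casts the wider net (higher recall, lower precision); raising it yields the efficient but conservative list.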

    In the high-risk customer base, we have two groups:

    1. High-risk, high-monetary-value customers on month-to-month contracts, who should be encouraged to sign long-term contracts.
    2. High-risk, lower-monetary-value customers locked into one- to two-year contracts, who are targets for long-term retention and customer satisfaction programs.

    For some more input on sales and marketing program design, XGBoost also produces a list of the features (variables below) that are most influential in the model, which can be used to test tactical adjustments such as discounting for two-year contracts and content rebalancing.

    The Time-based Question: When will they leave? (Lifelines)

    While the XGBoost classifier was able to tell us “This customer is at-risk” a survival model can tell us “This customer has a 60% chance of making it a year” so that a marketer can time retention efforts. To find out when a customer will churn, we need a model that is designed to predict the timing of events.  Miller finds that “a good example of a duration or survival model in marketing is customer lifetime estimation” (Miller, 2015). These are generally categorized as Survival Models:

    “… medical researchers are often interested in the effects of certain drugs on the timing of death (or recovery) among a sample of patients. In fact, these statistical models are known most as survival models because they are used often by biostatisticians, epidemiologists, and other researchers to study the time between diagnosis and death. … social and behavioral scientists have adopted these models for a variety of purposes.”

    Hoffmann, J. P. (2016)

    In Python there is a package called Lifelines (from lifelines import KaplanMeierFitter), which I used for this task. The survival probability curve by contract type highlights the value of two-year contracts.
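    The curves themselves come from lifelines’ KaplanMeierFitter; for intuition, the estimator is compact enough to sketch from scratch. At each event time it multiplies the running survival probability by (1 − deaths / customers still at risk), and censored customers leave the risk set without triggering a drop:

```python
def kaplan_meier(durations, observed):
    """Kaplan-Meier product-limit estimate.
    durations: tenure (e.g., months); observed: 1 = churned, 0 = censored."""
    pairs = sorted(zip(durations, observed))
    at_risk, surv, curve, i = len(pairs), 1.0, [], 0
    while i < len(pairs):
        t, deaths, ties = pairs[i][0], 0, 0
        while i < len(pairs) and pairs[i][0] == t:   # gather all events at time t
            deaths += pairs[i][1]
            ties += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk             # drop only on observed churn
        curve.append((t, surv))
        at_risk -= ties                              # censored + churned leave risk set
    return curve

# Toy cohort: five customers, one censored at month 3.
print(kaplan_meier([1, 2, 3, 4, 5], [1, 1, 0, 1, 1]))
```

    Fitting one curve per contract type and overlaying them reproduces the comparison described above: the slower-declining curve belongs to the stickier contract.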

    Financial Impact

    By combining the two churn modeling approaches, sales and marketing can evolve from a reactive strategy to proactively developing retention programs that improve financial performance. An aggressive XGBoost model gives the business the ability to identify high-risk/high-monetary-value customers in advance, while Survival Analysis improves marketing effectiveness by guiding the timing of retention and customer satisfaction campaigns.

    The addition of churned customer profiling can be used to develop relevant marketing communications and pricing strategies.

    Citations:

    Chen, T., & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM. https://doi.org/10.1145/2939672.2939785

    Davidson-Pilon, C. (2023). lifelines: survival analysis in Python (Version 0.27.8) [Software]. Zenodo. https://doi.org/10.5281/zenodo.8259706

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    IBM Sample Data Sets. (n.d.). Telco Customer Churn: Focused customer retention programs. Retrieved from https://community.ibm.com/community/user/businessanalytics/viewdocument/telco-customer-churn

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

  • The Financial Side of Marketing: Beyond RFM to Predictive CLV

    My previous articles on RFM analysis and propensity-to-buy modeling explored stand-alone frameworks for segmentation and targeting. However, these models also serve as the foundational inputs for the ultimate metric in account prioritization: Customer Lifetime Value (CLV).

    While I have typically used CLV with subscription-based B2C models or B2B IT hardware (storage, routers, and switches), I believe that the principles are identical for online retail. By analyzing historical purchasing patterns through RFM (Recency, Frequency, and Monetary Value), we establish a behavioral baseline. To effectively prioritize sales coverage and marketing spend, we can then project the customer’s expected life.

    Leveraging RFM, we have the CLV formula:

    CLV = (Frequency × Monetary Value) × Expected Lifespan

    By shifting from standard propensity modeling (the simple probability of purchase) to survival analysis (the timing and likelihood of the next purchase), we can distinguish between historical high-spenders and customers with the highest probability of future revenue. This allows us to treat customer acquisition and retention as a strategic financial investment—ensuring that projected cash in-flows exceed the Customer Acquisition Cost (CAC). As Miller (2015) notes:

    “Customer lifetime value analysis draws on concepts from financial management. We evaluate investments in terms of cash in-flows and out-flows. Before we pursue a prospective customer, we want to know that [sales] will exceed [costs]”

    From this, we can derive Net CLV (CLV – CAC) or the Financial Efficiency Ratio (CLV:CAC). For this exercise, I have targeted the 3:1 KPI (popularized by venture capitalist David Skok) as the benchmark for a healthy account.
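    A quick worked example of the formula and the 3:1 check, using illustrative numbers:

```python
frequency = 12           # orders per year
monetary = 50.0          # average order value
expected_lifespan = 3    # years

clv = frequency * monetary * expected_lifespan   # (Frequency x Monetary) x Lifespan
cac = 500.0                                      # assumed acquisition cost
net_clv = clv - cac
ratio = clv / cac                                # Financial Efficiency Ratio

print(f"CLV ${clv:,.0f}, Net CLV ${net_clv:,.0f}, CLV:CAC = {ratio:.1f}:1")
# Above the 3:1 benchmark, so this profile would count as a healthy account.
```

    The same three inputs drive the rest of the article: RFM supplies frequency and monetary value, and survival analysis supplies the expected lifespan.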

    Data Overview

    The Online Retail Data Set from the UC Irvine Machine Learning Repository is a publicly available real-life dataset used by students for customer analytics, RFM (Recency, Frequency, Monetary) modeling, and market basket analysis.

    Dataset Description

    This is a transactional dataset containing all transactions occurring between December 1, 2010, and December 9, 2011, for a UK-based, non-store online retail company (Chen, D., 2012).

    • Business Nature: The company primarily sells unique all-occasion giftware.
    • Customer Base: While many are individuals, a significant portion are wholesalers (which accounts for the extreme outliers in spending and quantity).
    • Scale: 541,909 transactions and 8 attributes.

    Here is a sample from that dataset:

    Exploratory Data Analysis (EDA) and Data Cleaning

     Visual inspection revealed several data quality issues. I filtered the dataset to focus strictly on product purchases, removing entries related to “bad data”: returns, exchanges, and duplicative descriptions. Examples of the values removed are below (there were many more): 

    filter_values = [
        'add stock to allocate online orders', 'adjust', 'adjustment',
        'alan hodge cant mamage this section', 'allocate stock for dotcom orders ta',
        'barcode problem', 'broken', 'came coded as 20713', "can't find",
        'check', 'check?', 'code mix up? 84930', 'counted', 'cracked',
        'crushed', 'crushed boxes', 'crushed ctn', 'damaged', 'damaged stock',
        'damages', 'damages wax', 'damages/credits from ASOS',
        'damages/display', 'damages/dotcom?', 'damages/showroom etc', 'damages?',
        # ... many more values omitted
    ]

    Data Distributions: Outliers and Skewness

    Histograms of the RFM data show that none of the three components follows a normal (Gaussian) distribution. Recency is concentrated among customers who bought recently and tapers off from there; frequency is dominated by one-time purchasers; and monetary value is right-skewed by a small number of high-spending customers.

    Model Selection and Development: Heuristic vs. Statistical Modeling

    The Manual “Heuristic” Model

    This is a “quick and dirty” approach using business rules of thumb. I utilized a step function to assign the probability of a customer remaining “alive” based on time since their last purchase:

    • 0–30 days since last purchase: 95% chance.
    • 31–180 days since last purchase: 80% chance.
    • Over 180 days: 10% chance.
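The step function above can be sketched in a few lines (a minimal illustration; the cutoffs and probabilities are simply the business rules listed above):

```python
def heuristic_p_alive(days_since_last_purchase: int) -> float:
    """Step-function estimate of the probability a customer is still 'alive'."""
    if days_since_last_purchase <= 30:
        return 0.95  # bought within the last month
    elif days_since_last_purchase <= 180:
        return 0.80  # bought within the last six months
    else:
        return 0.10  # dormant for more than six months
```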

    The Lifetimes Statistical Model (BG/NBD)

    Next, I applied the BG/NBD (Beta-Geometric/Negative Binomial Distribution) model via the Lifetimes library (Davidson-Pilon, C., 2021). Unlike the heuristic, this model analyzes the individual purchase cadence of every customer. If a customer who typically buys every 10 days hasn’t purchased in 30, the model flags them as “at risk” much faster than it would a once-a-year purchaser.
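For readers curious about the mechanics: the BG/NBD probability that a customer is still “alive,” given x repeat purchases, a last purchase at time t_x, and an observation window of length T, has a closed form. A minimal sketch, assuming the four model parameters (r, alpha, a, b) have already been estimated (for example by Lifetimes’ BetaGeoFitter); the parameter values in any real analysis come from the fitting step:

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(alive | x, t_x, T) under the BG/NBD model, given fitted parameters.

    x     : number of repeat purchases in (0, T]
    t_x   : time of the last purchase
    T     : length of the observation period
    r, alpha, a, b : fitted BG/NBD parameters
    """
    if x == 0:
        return 1.0  # no repeat purchases yet, so no chance to have 'died'
    term = (a / (b + x - 1)) * ((alpha + T) / (alpha + t_x)) ** (r + x)
    return 1.0 / (1.0 + term)
```

With illustrative parameters, a customer whose last purchase falls well short of the end of the window scores far lower than one who bought recently, which is exactly the “at risk” behavior described above.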

    I’ll return to the choice between heuristic, business-rules-based models and statistical survival analysis in a later article on customer churn, where the same comparison applies.

    Results and Model Accuracy

    The Lifetimes model proved to be more conservative than the heuristic approach. It yielded a Mean Absolute Error (MAE) of 0.98, indicating that the model’s prediction was off by less than one transaction per customer over a three-month holdout window.

    While I utilized a 12-month window to account for retail seasonality and maintain precision, this horizon is flexible; for technology hardware with longer refresh cycles, a 3–5 year window is often more appropriate.

    Model Performance Metrics: The following metrics compare the model’s predictions against actual customer behavior during the holdout period:

    • MAE: 0.9846
    • Actual Avg. Purchases (Holdout): 1.4462
    • Predicted Avg. Purchases (Holdout): 1.0706
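For reference, the MAE reported above is simply the average absolute gap between actual and predicted purchases per customer:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```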

    Financial Output & Targeting Logic: By applying the model to our customer base, we derived the following financial benchmarks:

    • Mean CLV: $3,816.56
    • Mean Opportunity Ratio: 1.30

    Using an average Customer Acquisition Cost (CAC) benchmark of $400 (a conservative estimate based on the consumer electronics industry) and the 3:1 efficiency ratio, we establish a CLV threshold of $1,200. Any customer with a predicted CLV below this mark represents a net loss, whereas those above it are prioritized for sales coverage.
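The targeting rule reduces to a one-line threshold. A minimal sketch using the $400 CAC and 3:1 ratio from the text (the customer records here are made up for illustration):

```python
# Benchmarks from the text: $400 average CAC (consumer electronics estimate)
# and the 3:1 CLV:CAC efficiency KPI popularized by David Skok.
CAC = 400
TARGET_RATIO = 3
THRESHOLD = CAC * TARGET_RATIO  # minimum predicted CLV for sales coverage

def prioritize(customers):
    """Split (customer_id, predicted_clv) pairs into sales-coverage
    targets vs. net-loss accounts."""
    covered = [c for c in customers if c[1] >= THRESHOLD]
    net_loss = [c for c in customers if c[1] < THRESHOLD]
    return covered, net_loss
```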

    Putting this all together, we have a grid for planning and targeting high-value customers:

    Conclusion

    Enterprise-level data is often more complex, so more work may be involved in tuning the CLV model; however, the core approach remains the same for strategic planning, account-based planning, and target marketing. Understanding the future value of a customer (versus only historical spend) is one of the most important capabilities in a marketer’s toolkit.

    Citations:

    Chen, D. (2012). Online Retail [Dataset]. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Online+Retail

    Davidson-Pilon, C. (2021). Lifetimes: Measuring customer lifetime value in Python (Version 0.11.3) [Computer software]. https://github.com/CamDavidsonPilon/lifetimes

    Hoffmann, J. P. (2016). Generalized Linear Models: An Applied Approach (2nd ed.). Routledge.

    Marianantoni, A. (2025, March 19). CLV to CAC Ratio: Guide for Startups 2025. M Accelerator. https://maccelerator.la/en/blog/entrepreneurship/clv-to-cac-ratio-guide-for-startups-2025/

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

    Shopify Staff. (2024, July 29). Customer acquisition costs by industry (2025). Shopify Blog. https://www.shopify.com/blog/customer-acquisition-cost-by-industry#4

    Skok, D. (2010, February 17). SaaS metrics 2.0 – A guide to measuring and improving what matters. For Entrepreneurs. https://www.forentrepreneurs.com/saas-metrics-2/

  • The Demand Generation Workhorse

    The “workhorse” of modern demand generation is a family of binary classification models. These models are designed to predict a specific response, such as “…whether a customer buys, whether a customer stays with the company or leaves to buy from another company, and whether the customer recommends a company’s products to another customer” (Miller, 2015). Bruce Ratner (2017), in his work on machine learning and data mining, calls this approach the “workhorse of response modeling.”

    There are many statistical and machine-learning techniques for building Propensity-to-Buy (P2B) models—whether predicting churn, response rates, or identifying CRM opportunities that will convert to sales. The concept has been around since the early days of direct mail in the 1980s, when direct mail marketers relied on logistic regression (still a powerful technique). Now, fueled by big data and high-performance computing, one of the most popular and effective techniques is eXtreme Gradient Boosting (XGBoost), which I use here to examine how it can be applied in marketing.

    Propensity-to-buy has been the first technique deployed at every company I’ve had the pleasure of working in, and it has evolved into the foundation of modern digital marketing. Technically efficient and scalable, a P2B model can, once the base code is set, be trained to predict different targets with limited recalibration.

    Key Applications of P2B:

    1. Purchase Likelihood for a Product: Identifying which customers and prospects are most likely to purchase specific products. This applies to both new product launches and up-sell/cross-sell for existing products.
    2. Churn Risk: Predicting which companies or individual contacts are at risk of leaving to competitors.
    3. Lead Scoring and Conversion Probability: Forecasting which CRM opportunities are most likely to convert to “Closed-Won” to optimize the marketing and sales pipeline.
    4. CLV Integration: Generating the probability scores required to calculate a Customer Lifetime Value (CLV) model.
    5. Campaign Response: Determining which contacts are most likely to engage with or respond to a specific marketing program.
    6. Customer Valuation: Identifying which customers will be the most valuable overall across their entire purchase history.  For example, comparing a company’s Total IT Spend with P2B.
    7. Strategic Profiling: Isolating a “high-propensity” population to build ideal demographic and firmographic profiles for future targeting.

    For the following example, XGBoost (an open-source machine learning ensemble method that builds decision trees sequentially, each correcting the errors of the previous ones) was a natural fit. First introduced to me six years ago by data scientist Fuqiang Shi, it has become a gold standard for structured data. XGBoost gained global fame around 2014–2016 by dominating Kaggle competitions, often outperforming popular methods such as Random Forest, Support Vector Machines, and Bayesian classifiers. That said, in practice a data scientist should test several methods and compare performance metrics to find the best model.

    The Data

    To demonstrate how to build and score a model, I utilized the Bank Marketing Dataset from the well-known UC Irvine Machine Learning Repository. This dataset consists of 45,211 rows and 17 columns, representing a real-world scenario of a direct telemarketing campaign from a Portuguese banking institution (2008-2010). Here are the first five rows:

    UCI Machine Learning Repository: Bank Dataset. University of California, Irvine.

    Model Development: A High-Level Overview

    Propensity model development process.

    Predictive Performance

    The model achieved a predictive performance of 81% (ROC AUC), which in my experience is within range for a real-world application (I have seen anywhere from 65% for prospecting models to 90% for customer models). While I initially hard-coded the hyperparameters, I followed up with a grid search (GridSearchCV) to ensure optimization; since both approaches achieved the same predictive performance, I stopped tuning at that point. [Environment: Python Jupyter Notebook (Anaconda).]
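As a refresher on the headline metric: ROC AUC is the probability that a randomly chosen buyer is scored above a randomly chosen non-buyer, and it can be computed directly from score ranks. A pure-Python sketch (in practice, scikit-learn’s roc_auc_score does this at scale):

```python
def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank-sum formulation.

    labels: 1 = positive (buyer), 0 = negative. Assumes both classes present.
    Ties in score receive averaged ranks.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # extend j to cover any tie group sharing the same score
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_ranks = [r for r, y in zip(ranks, labels) if y == 1]
    n_pos, n_neg = len(pos_ranks), len(labels) - len(pos_ranks)
    u = sum(pos_ranks) - n_pos * (n_pos + 1) / 2  # Mann-Whitney U statistic
    return u / (n_pos * n_neg)
```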

    Using the Model for Decision Support: Rebalancing Marketing Frequency

    From a targeting perspective, we now have a list of customers and can develop a profile based on the characteristics of that population – either by analyzing the segment directly or by examining the most influential variables used in the P2B model.

    The model reveals a ‘tipping point’ in telemarketing outreach. The campaign variable shows that as the number of contacts increases (red dots moving left in the SHAP plot), the propensity to buy drops. This suggests the bank is currently over-indexing on low-potential customers, essentially ‘inundating’ them—while missing the opportunity to focus that energy on high-potential segments that require lower frequency to convert.

    Scatterplot showing customers by number of telemarketing contacts and propensity to purchase with size = account balance.

    The Negative Correlation: In the summary plot, the high values for campaign/telemarketing calls (dark red dots) shift to the left of the center line, indicating that a high number of contacts during a single campaign actually decreases the probability of a purchase.

    Further examination (and data) is required to determine whether messaging, media mix, brand awareness, or other factors also come into play during execution, but this is a definite red flag, since effective reach is typically around 3+ exposures.

    SHAP Insights:

    SHAP (SHapley Additive exPlanations) visualization showing how variables change the outcome (positive or negative) and bar chart of feature importance.

    Based on the SHAP (SHapley Additive exPlanations) visualizations provided, we can determine exactly which “levers” drive purchase propensity. This is the “explainability” phase that translates a black-box model like XGBoost into actionable business insights.

    • The Bar Chart (Left) – Importance: Shows the variables prioritized by the model (e.g., “Cellular contact is the most important piece of information”). One caveat: this is an older dataset from the ML repository, used here for illustration only; interpret with caution!
    • The Summary Plot (Right) – Direction: Shows how those variables change the outcome (e.g., “Being contacted via cell phone increases propensity, while a housing loan decreases it”).

    Summary of Feature Influence

    The model shows that a mix of communication channels, past behavior, and economic stability are the primary drivers of a “Yes” prediction.

    1. Primary Driver: Communication Method (contact_cellular). This is the most influential variable and the strongest predictor of a purchase.
    2. The “Momentum” Effect (poutcome_success). Success in previous marketing campaigns is a powerful indicator of future success. This validates my previous blog’s assertion that RFM (Recency/Frequency/Monetary Value) are highly influential in P2B models. 
    3. Financial Stability (housing_no and balance). Customers without housing loans (housing_no) show a higher propensity to purchase; higher bank balances also correlate positively with conversion.
    4. Timing and Outreach (day_of_week, month_jun, campaign). The specific timing of the outreach (months like June or March) influences the model, though to a lesser degree than the contact method. Telemarketing teams and marketing programs should therefore be adjusted for seasonality.

    Conclusion

    I’ll be returning to propensity-to-buy in future articles since, like RFM analysis, this technique is foundational to successful quantitative marketing. Both techniques trace their origins to the 1980s as statistical tools for direct mailers and have evolved into the “workhorse” of modern digital marketing.

    Citations

    Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. FT Press.

    Moro, S., Rita, P., & Cortez, P. (2014). Bank Marketing [Dataset]. UCI Machine Learning Repository. https://doi.org/10.24432/C5K306.

    Ratner, B. (2017). Statistical and Machine-Learning Data Mining: Techniques for Better Predictive Modeling and Analysis of Big Data (3rd ed.). CRC Press.

  • My Favorite Segmentation Scheme

    As I recall from school, the idea of optimizing a portfolio of B2B accounts goes back at least as far as the Boston Consulting Group’s growth-share matrix and the early list-scoring techniques of direct marketers. Although the technology we use to segment customers has changed (i.e., machine learning), the underlying principle is the same: segment your B2C customer or B2B account base based on potential in order to optimize your go-to-market.

    The Segmentation Challenge

    The journey to effective segmentation often faces two extremes:

    The Black Box (Technical): Techniques like K-means clustering and principal components analysis (PCA) are powerful ML tools, but they often require massive datasets and can lack interpretability. Justifying a segmentation strategy by explaining eigenvalues or complex algorithms to a business leadership team can create friction and slow adoption.

    The Gray Area (Subjective): Conversely, creating detailed customer personas is highly intuitive but often subjective. Because they are based on opinion or aspiration, the segments can be endlessly debated, leading to unclear targeting and weak tactical execution.

    The Power and Simplicity of RFM:

    In my experience, one of the most elegant, simple and powerful segmentation schemes is based on RFM scoring, which allows the marketer to:

    • Maximize ROI: Strategically allocate sales and marketing resources to the highest-potential customers.
    • Drive Growth: Target active, high-value accounts for cross-sell and up-sell campaigns to increase purchase frequency.
    • Minimize Churn: Increase retention and proactively intervene with the most valuable customers who show signs of drifting away.
    • Improve Acquisition: Identify and target “lookalike” prospects who share the profiles of your best customers.
    • Inform Value Metrics: Serve as a core input for calculating and improving Customer Lifetime Value (CLV).

    In its simplest form, RFM (Recency, Frequency and Monetary Value) only requires customer purchase transaction data and is both statistically significant in predicting future purchases (on its own and when nested within other purchase-likelihood models) and easily understood by the business.

    As Thomas Miller describes in his textbook, Marketing Data Science:

    “Direct and database marketers build models for predicting who will buy in response to marketing promotions. Traditional models, or what are known as RFM models, consider the recency (date of most recent purchase), frequency (number of purchases), and monetary value (sales revenue) of previous purchases. More complicated models utilize a variety of explanatory variables relating to recency, frequency, monetary value, and customer demographics.”

    Miller, T. W. (2016). Marketing Data Science. Pearson Education.

    Methodology: From Data to Actionable Segments

    For each of the three categories, the customer is given a score from 1 to 5, where one is the lowest score and five is the highest (best). Assigning these scores effectively breaks your population into quintiles (20% groups) for each dimension. For simple prioritization, add the three scores together for a total RFM score between 3 and 15, creating a ranked list of highest-potential accounts.
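The quintile assignment can be sketched in pure Python (illustrative only; in practice pandas’ qcut or SQL’s NTILE(5) does the same job, and the customer values below are made up):

```python
def quintile_scores(values, higher_is_better=True):
    """Score each value from 1 (worst quintile) to 5 (best quintile)."""
    n = len(values)
    # Rank customers from worst to best on this dimension.
    order = sorted(range(n), key=lambda i: values[i], reverse=not higher_is_better)
    scores = [0] * n
    for rank, i in enumerate(order):
        scores[i] = 1 + (rank * 5) // n  # map rank position to quintile 1..5
    return scores

# Toy transaction summaries for five customers (illustrative values).
days_since_purchase = [5, 40, 400, 12, 90]         # recency: fewer days is better
purchase_counts     = [12, 3, 8, 1, 5]             # frequency
revenues            = [5000, 300, 2500, 150, 900]  # monetary value

r = quintile_scores(days_since_purchase, higher_is_better=False)
f = quintile_scores(purchase_counts)
m = quintile_scores(revenues)
totals = [sum(t) for t in zip(r, f, m)]  # each total falls between 3 and 15
```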

    Patterns in the purchase data can be used as the foundation of personas (additional attributes can be layered on for profiling), and I have found the scores to be consistently influential (statistically significant) as inputs into purchase-likelihood ML models for scoring accounts. The following are some segments typically derived from the scores in the RFM Scores table (above):

    • Champions have the highest score in all three categories (RFM) and highest total scores.
    • New or Highest Potential Customers have high recency and monetary value scores, but have just started purchasing, and so their frequency score will be in the lower quintiles.
    • Past or Churned customers have high monetary and frequency scores, but very low recency scores (i.e. bottom 20%).
    • Additional segments can be created for average customers (to benchmark), new prospects, or totally lost customers.
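Those segment definitions translate directly into threshold rules on the 1–5 scores; the cutoffs below are illustrative, not canonical:

```python
def label_segment(r, f, m):
    """Map 1-5 R/F/M quintile scores to a named segment (illustrative cutoffs)."""
    if r >= 4 and f >= 4 and m >= 4:
        return "Champion"              # top quintiles across the board
    if r >= 4 and m >= 4 and f <= 2:
        return "New / High Potential"  # recent, big spend, not yet frequent
    if r <= 1 and f >= 4 and m >= 4:
        return "Past / Churned"        # formerly valuable, now gone quiet
    return "Average"                   # benchmark population
```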

    I’ll return to RFM segmentation later for use in precision targeting, segmentation for relevant marketing communication, and media mix optimization. As I mentioned at the beginning, this technique was pioneered by early direct mail marketers using spreadsheet analysis, and although we now build Python, SQL, and R scripts to run it, the fundamentals remain the same. Readers who want to investigate further can simply Google “RFM reference material”; plenty of academic papers and videos are widely available.

    Here are a few references:

    https://www.investopedia.com/terms/r/rfm-recency-frequency-monetary-value.asp

    https://mailchimp.com/resources/rfm-analysis/