Recommender Systems: Market Basket Analysis & Next-Likely-Purchase In Cross-Sell.

Why Recommender Systems Matter

  • Discover Product Affinities: Identify items that frequently sell together to uncover hidden customer behaviors. Use these insights to optimize cross-selling and product bundles that reflect how people actually shop.
  • Boost Average Order Value (AOV): Use “Lift” metrics to place high-affinity products near each other in digital or physical layouts. This turns single-item purchases into multi-item baskets by simplifying the discovery of related goods.
  • Personalize Promotions: Move beyond generic discounts by offering coupons for “consequent” products based on what is already in the cart. This increases conversion rates by making offers feel tailored rather than random.
  • Optimize Inventory & Placement: Predict which products will face increased demand when a “seed” item goes on sale. Use these patterns to inform stocking levels and strategic shelf (or landing page) positioning.
  • Enhance Loyalty: By recommending the “next logical purchase” before the customer realizes they need it, you improve the user experience. This builds long-term retention by positioning your brand as an intuitive partner in their journey.

Introduction

In the first article of this series, My Favorite Segmentation Scheme (https://mikesdatamarketing.com/2025/12/10/my-favorite-segmentation-scheme/), I identified the Loyal or High-Potential (HiPo) customer segment and proposed a portfolio management strategy to migrate HiPo customers to Champions through cross-sell and up-sell. Here is the data visualization of the customer base from that article:

The question for a quantitative marketer in a company with multiple products is: what is the next likely purchase by these customers?  By identifying the next likely purchase we have the highest likelihood of selling and up-leveling these customers to Champions.

Aside from an opinion-based approach, or perhaps a long-term strategy to introduce new products, the only scalable quantitative option is to use a recommender system.

Any company that sells multiple products can benefit from a recommender system for cross-sell.  My expertise is primarily in B2B technology hardware marketing, but many of the techniques originated in B2C marketing.  My personal experience is in building recommenders using Association Rules Mining (also known as Market Basket Analysis) which I was first introduced to by Ling (Xiaoling) Huang and later Fuqiang (Kevin) Shi. 

However, there are other ways to build recommenders, and each has its own advantages and disadvantages.  Yexiazi (Summer) Song built a recommender using a Markov chain when she was on my team years ago.  More recently, Jidan (Joanna) Duan developed a recommender using Collaborative Filtering with Matrix Factorization to reduce latency.  In my view, any of these techniques can power website personalization, or telemarketing to a list of high-propensity accounts that are not targeted at a single product.

For my next three articles, I am going to look at each of these three techniques, beginning with Association Rules (since that has been my “go to” technique for many years now and I am most comfortable with it).  During the course of these articles, I will do a capabilities assessment of each and compare it to the other techniques to the extent that the output and metrics are comparable.

The Goal: To move away from “one-size-fits-all” marketing and toward high-propensity targeting that increases Average Order Value (AOV).


Data Overview

I returned to the Online Retail Data Set from the UC Irvine Machine Learning Repository, which I used in my article on customer lifetime value, The Financial Side of Marketing: Beyond RFM to Predictive CLV (https://mikesdatamarketing.com/2025/12/28/the-financial-side-of-marketing-beyond-rfm-to-predictive-clv/). This is a publicly available real-life dataset used by students for customer analytics, RFM (Recency, Frequency, Monetary) modeling, and market basket analysis. Although the items in this dataset are consumer products, I have used association rules on tech hardware and software, so the technique is just as applicable to B2B as to B2C.

Dataset Description

This is a transactional dataset containing all transactions occurring between December 1, 2010, and December 9, 2011, for a UK-based, non-store online retail company (Chen, D., 2012).

  • Business Nature: The company primarily sells unique all-occasion giftware.
  • Customer Base: While many are individuals, a significant portion are wholesalers (which accounts for the extreme outliers in spending and quantity).
  • Scale: 541,909 rows (invoice line items) and 8 attributes.

Here is a sample from that dataset:

I performed many of the same data cleaning tasks that I wrote about in the CLV article to prepare the data for modeling and exploratory data analysis.
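Association rules need one basket per invoice rather than one row per line item. The sketch below shows one plausible way to get there. It assumes the dataset's InvoiceNo, Description, and Quantity columns and the convention that cancelled invoices start with "C". The sample rows and the build_baskets helper are hypothetical, and the exact cleaning steps used for this article may differ.

```python
from collections import defaultdict

# Hypothetical sample rows that mimic the dataset's column names; values are made up.
rows = [
    {"InvoiceNo": "536365", "Description": "REGENCY TEACUP ", "Quantity": 6},
    {"InvoiceNo": "536365", "Description": "REGENCY CAKESTAND", "Quantity": 2},
    {"InvoiceNo": "C536379", "Description": "REGENCY TEACUP", "Quantity": -1},  # cancellation
    {"InvoiceNo": "536366", "Description": None, "Quantity": 3},  # missing description
    {"InvoiceNo": "536367", "Description": "regency teacup", "Quantity": 1},
]


def build_baskets(rows):
    """Group line items into one set of product names per invoice."""
    baskets = defaultdict(set)
    for row in rows:
        invoice = str(row["InvoiceNo"])
        description = row["Description"]
        # Assumption: skip cancelled invoices, blank descriptions, and non-positive quantities.
        if invoice.startswith("C"):
            continue
        if not description or row["Quantity"] <= 0:
            continue
        # Normalize whitespace and case so the same product is counted once.
        baskets[invoice].add(description.strip().upper())
    return dict(baskets)


baskets = build_baskets(rows)
for invoice, items in sorted(baskets.items()):
    print(invoice, sorted(items))
```

Each resulting set is one transaction for the mining step. Quantities are deliberately discarded, because association rules only ask whether an item was in the basket.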

Data Distributions: Outliers and Skewness

Histograms of the RFM data show that none of the three components is normally (Gaussian) distributed.  Recency is concentrated among customers who bought recently and then tapers off, frequency is dominated by one-time purchasers, and monetary value is right-skewed because of a few high-spending customers. So this is a great population for increasing purchase frequency through cross-sell using a recommender system!

Here are the top products:

There is some seasonality around the holidays, so that would be a good time to implement a recommender:

We can also look at shopping patterns, which are mid-morning into the afternoon:


Association Rules

In his text on marketing data science Thomas Miller (2015) describes Association Rules:

“Association Rules Mining is another way of building recommender systems. Association rules modeling asks: What goes with what? What products are ordered or purchased together? What activities go together? What website areas are viewed together? A good way of understanding association rules is to consider their application to market basket analysis.”

Miller (2015)

Methodology

I used the Apriori algorithm to identify frequent itemsets, setting a minimum support threshold of 0.01 to ensure I wasn’t chasing statistical noise. According to Tan et al. (2018), the Apriori principle states that “if an itemset is frequent, then all of its subsets must also be frequent… conversely, if an itemset is infrequent, then all of its supersets must be infrequent too.”
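To make the Apriori principle concrete, here is a minimal from-scratch sketch in plain Python. It is an illustration only, not the MLxtend implementation used for this article. The function name, the toy transactions, and the 0.4 threshold (chosen so the tiny example produces output) are all assumptions.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Share of transactions that contain every item in the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori principle): drop a candidate if any of its
        # (k-1)-item subsets is not frequent, without counting its support.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Toy baskets, purely illustrative.
transactions = [
    {"teacup", "cakestand", "napkins"},
    {"teacup", "cakestand"},
    {"teacup", "napkins"},
    {"candle"},
    {"teacup", "cakestand", "candle"},
]
frequent = apriori_frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

In this toy run the three-item set teacup, cakestand, napkins is never counted, because its subset cakestand plus napkins is already infrequent. That pruning is what keeps the search tractable on a real catalog.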

When interpreting the output of the Apriori algorithm, the resulting table provides several key interest measures. While analysis typically centers on the ‘Big Three’—Support, Confidence, and Lift—the specific focus often shifts depending on the project’s objectives. A primary advantage of association rules mining is the flexibility to filter and rank these rules using various combinations of metrics, ensuring the final recommendations align with specific business needs.

  • Antecedents: These are the items already present in the customer’s purchase history.
  • Consequents: This is the item the model proposes as a recommendation.
  • Support: The percentage of all transactions that contain both the antecedent and the consequent. It measures how popular the rule is.
    • Support (A->C) = Support(A union C), where "A union C" is the combined itemset, so a transaction counts only if it contains every item in A and every item in C
  • Confidence: The probability that a customer will buy the consequent given that they have the antecedent. It measures reliability.
    • Confidence (A->C) = Support(A union C)/Support(A)
  • Leverage: This calculates the difference between the observed frequency of A and C appearing together and the frequency that would be expected if they were independent. A leverage of 0 indicates independence.
    • Leverage (A->C) = Support(A union C) - [Support(A) x Support(C)]
  • Conviction: This measures the degree of implication. A high conviction value means that the consequent is highly dependent on the antecedent, and a conviction of 1 indicates independence.
    • Conviction (A->C) = [1 - Support(C)]/[1 - Confidence(A->C)]
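As a worked example of these measures, the sketch below computes support, confidence, leverage, and conviction for a single rule on ten toy baskets. The rule_metrics helper and the baskets are hypothetical. The leverage and conviction formulas are the standard definitions (observed minus expected co-occurrence, and (1 - Support(C)) / (1 - Confidence)).

```python
def rule_metrics(baskets, antecedent, consequent):
    """Compute interest measures for the rule antecedent -> consequent."""
    a, c = frozenset(antecedent), frozenset(consequent)
    n = len(baskets)
    sup_a = sum(1 for b in baskets if a <= b) / n
    sup_c = sum(1 for b in baskets if c <= b) / n
    # A basket supports the rule only if it holds every item from both sides.
    sup_ac = sum(1 for b in baskets if (a | c) <= b) / n
    if sup_a == 0:
        raise ValueError("antecedent never occurs, so confidence is undefined")
    confidence = sup_ac / sup_a
    leverage = sup_ac - sup_a * sup_c
    # Conviction is unbounded when the rule never fails.
    conviction = float("inf") if confidence == 1 else (1 - sup_c) / (1 - confidence)
    return {
        "support": sup_ac,
        "confidence": confidence,
        "leverage": leverage,
        "conviction": conviction,
    }


# Ten toy baskets: teacup in 4, cakestand in 4, both together in 3.
baskets = (
    [{"teacup", "cakestand"}] * 3
    + [{"teacup"}]
    + [{"cakestand"}]
    + [{"candle"}] * 5
)
metrics = rule_metrics(baskets, {"teacup"}, {"cakestand"})
for name, value in metrics.items():
    print(name, round(value, 3))
```

Here support is 0.3, confidence is 0.75, leverage is about 0.14 (well above the 0 expected under independence), and conviction is about 2.4.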

Lift is the Superior Ranking Metric

Support-based pruning (Tan et al., 2018) is widely used to shrink the set of candidate rules, but the rules that survive still have to be ranked.  While Confidence tells us how likely a purchase is, it can be highly misleading. If a “best-seller” (like the White Hanging Heart T-Light Holder) were bought by 50% of all customers, almost any antecedent would show high confidence in leading to that product simply because it is popular, not because there is a meaningful relationship.

Lift solves this by accounting for the base popularity of the consequent:

  • Lift (A->C) = [Support(A union C)]/[Support(A) x Support(C)]
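The sketch below illustrates the point with twenty made-up baskets in which a best-seller appears in half of all transactions. The helper functions and product names are hypothetical, and the numbers are constructed for illustration rather than taken from the dataset.

```python
def support(baskets, items):
    """Share of baskets that contain every item in `items`."""
    items = frozenset(items)
    return sum(1 for b in baskets if items <= b) / len(baskets)


def confidence(baskets, a, c):
    return support(baskets, set(a) | set(c)) / support(baskets, a)


def lift(baskets, a, c):
    return support(baskets, set(a) | set(c)) / (support(baskets, a) * support(baskets, c))


# Twenty toy baskets: "holder" is the best-seller (10 of 20 baskets).
baskets = (
    [{"holder", "napkins"}] * 2
    + [{"napkins"}] * 2
    + [{"teacup", "cakestand"}] * 2
    + [{"teacup"}] * 3
    + [{"holder"}] * 8
    + [{"candle"}] * 3
)

rules = [({"napkins"}, {"holder"}), ({"teacup"}, {"cakestand"})]
for a, c in rules:
    print(
        sorted(a), "->", sorted(c),
        "confidence", round(confidence(baskets, a, c), 2),
        "lift", round(lift(baskets, a, c), 2),
    )
```

Ranked by confidence, napkins -> holder (0.5) beats teacup -> cakestand (0.4). Ranked by Lift, the order flips: the first rule has a lift of about 1.0, meaning napkins tell us nothing beyond the holder's base popularity, while the second has a lift of about 4.0.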

Why I rank by Lift:

  1. Filters out “Noise”: A Lift value of 1.0 means the two items are independent. Ranking by Lift ensures you aren’t just recommending your most popular items to everyone (which provides no strategic value).
  2. Identifies “Niche Bundles”: I’ve used Association Rules for media mix optimization, where this was particularly important: one marketing group ran many tactical programmatic programs but very few truly integrated campaigns, so the integrated combinations had low support yet high lift.  With regard to this open dataset, a high Lift (e.g., > 3.0) indicates a strong, specific relationship. For example, if the Regency Teacup leads to the Regency Cakestand with a high Lift, the purchase is unlikely to be random; it looks like a deliberate “Tea Party” bundle.

Actionable Insight

For business professionals, Lift is the closest of these metrics to incremental gain. It tells the marketing team which products, when placed together or promoted via email, telemarketing or on websites, are most likely to change customer behavior rather than just reflect existing trends.

So, now we have the top selling market baskets:

Business Rules to Increase Precision

A primary limitation of recommendation engines is their reliance on historical data, which restricts suggestions to items and behaviors already present in the dataset. To address this, business logic overlays or mapping tables can be introduced. These allow for the dynamic replacement of ‘End-of-Support’ items with newer alternatives, or the substitution of low-margin products with higher-margin equivalents. Furthermore, if the customer base is non-homogeneous—varying by geography, firmographics, or demographics—the data can be segmented into distinct subsets, allowing for specialized models tailored to each specific sub-segment.
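A business-rule overlay can be as simple as a lookup applied to the model's output. The sketch below is a minimal illustration: the mapping tables, product names, and apply_overlay helper are all hypothetical, and a production version would more likely live in a maintained reference table than in code.

```python
# Hypothetical mapping tables; product names are illustrative only.
END_OF_SUPPORT = {"ROUTER V1": "ROUTER V2"}  # retired item -> successor
MARGIN_UPGRADES = {"BASIC SUPPORT PLAN": "PREMIUM SUPPORT PLAN"}  # low -> higher margin


def apply_overlay(recommendations, *mappings):
    """Pass each recommended item through the mapping tables, preserving rank order."""
    result = []
    for item in recommendations:
        for mapping in mappings:
            item = mapping.get(item, item)
        # A substitution can collide with an item already recommended, so de-duplicate.
        if item not in result:
            result.append(item)
    return result


raw = ["ROUTER V1", "BASIC SUPPORT PLAN", "ROUTER V2"]
final = apply_overlay(raw, END_OF_SUPPORT, MARGIN_UPGRADES)
print(final)
```

Because the overlay runs after the model, the rules can be updated whenever the catalog changes without re-mining the transaction history.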


Summary

The “Tea Party” and “Craft Kit” bundles identified above aren’t just interesting coincidences—they are actionable revenue drivers. By using Association Rules, we move beyond simply knowing what our best sellers are to understanding the context of the purchase.

Key Takeaways for this Method

  • Cold-Start Efficiency: Unlike many other models, Association Rules don’t require a deep customer history. Because the rules are built from item co-occurrence within transactions rather than from each customer’s history, they can score a new customer from a single basket.
  • Bundled-Pricing: Sales and Marketing can bundle, for example, hardware and software packages to increase sales revenue.
  • Targeting: Given a set of mined rules, if a marketer knows that a customer has purchased products A and B, and a high-lift rule is (A, B) -> C, then a telemarketing campaign can focus on selling product C.
  • Incremental Lift: By ranking recommendations by Lift rather than just popularity, we ensure our marketing efforts are driving new behaviors (i.e. selling high margin or newer products) rather than just suggesting items the customer likely would have found on their own.
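The targeting and ranking takeaways can be sketched as a simple rule lookup. The rules below are hypothetical (antecedent, consequent, lift) triples, not output from the dataset, and the next_likely_purchases helper, the min_lift cutoff, and top_n are assumptions for illustration.

```python
# Hypothetical mined rules as (antecedent, consequent, lift); values are illustrative.
rules = [
    (frozenset({"A", "B"}), "C", 4.2),
    (frozenset({"A"}), "D", 1.1),
    (frozenset({"B"}), "A", 2.5),
    (frozenset({"E"}), "F", 6.0),
]


def next_likely_purchases(purchased, rules, min_lift=1.0, top_n=3):
    """Return up to top_n (product, lift) pairs for rules the customer's history satisfies."""
    purchased = set(purchased)
    best = {}
    for antecedent, consequent, lift in rules:
        # A rule fires only if the customer already owns every antecedent item,
        # and it is useful only if they do not yet own the consequent.
        if antecedent <= purchased and consequent not in purchased and lift > min_lift:
            best[consequent] = max(lift, best.get(consequent, 0.0))
    # Rank by Lift rather than by raw popularity.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recs = next_likely_purchases({"A", "B"}, rules)
print(recs)
```

For a customer who owns A and B, product C comes out on top with a lift of 4.2, which is the (A, B) -> C case described above. The rule B -> A is skipped because the customer already owns A.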

While Association Rules are a powerful starting point for cross-sell, they treat every transaction as a “snapshot” in time. They don’t account for the order in which a customer explores a site or how their interests evolve over a single session.

In a subsequent article, I will explore Markov Chains, a technique that looks at the path a customer takes, allowing us to predict the next step in their journey based on their most recent move. 


References: Methodology & Python Packages

Agrawal, R., & Srikant, R. (1994). Fast algorithms for mining association rules. Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), 487-499.

Chen, D. (2012). Online Retail Data Set. UCI Machine Learning Repository. https://archive.ics.uci.edu/ml/datasets/Online+Retail

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90–95. https://doi.org/10.1109/MCSE.2007.55

Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education.

Raschka, S. (2018). MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638. https://doi.org/10.21105/joss.00638

Tan, P. N., Steinbach, M., & Kumar, V. (2018). Introduction to Data Mining (2nd ed.). Pearson Education.

Waskom, M. L. (2021). seaborn: statistical data visualization. Journal of Open Source Software, 6(60), 3021. https://doi.org/10.21105/joss.03021


Technical Keywords & Methodology Index

Methodology & Strategy: Association Rules Mining, Market Basket Analysis, Recommender Systems Architecture, Cross-Sell/Up-Sell Optimization, Cold-Start Efficiency, Business Rule Overlays.

Statistical Concepts: Apriori Algorithm, Support-Confidence-Lift Metrics, Rule Pruning, Itemset Frequency, Predictive Propensity Modeling, Niche Bundle Identification.

Data Engineering & Analytics: Transactional Data Processing, RFM (Recency, Frequency, Monetary) Modeling, Data Normalization & Skewness Handling, Outlier Management, Exploratory Data Analysis (EDA).

Revenue Operations (RevOps) Logic: Average Order Value (AOV) Enhancement, Marginal Gain Analysis, Incremental Lift Attribution, Strategic Inventory & Placement Optimization.

Python Libraries & Documentation

For data scientists and engineers looking to replicate this workflow, the following stack utilizes specialized utilities to transform raw transactional data into actionable revenue signals.

| Library | Role in Pipeline | Strategic Purpose |
|---|---|---|
| MLxtend | Association Rule Mining | Implements the Apriori algorithm to uncover frequent itemsets and calculate Lift, Support, and Confidence. |
| Pandas / NumPy | Data Wrangling | Manages the cleaning, reshaping, and skewness correction of large-scale transactional datasets. |
| Matplotlib | Visualization | Generates RFM histograms and temporal purchase patterns to identify seasonality and sales trends. |
| Seaborn | Statistical Plotting | Enhances EDA outputs for stakeholder communication, visualizing the distributions and outliers in the data. |
