Why Recommender Systems Matter
- Map Non-Linear Journeys: Customers move between “states” (social, email, search) rather than a straight line. Use Markov Chains to visualize this web and capture micro-conversions.
- Identify “Dead Ends”: Transition probabilities reveal where your funnel “leaks.” High churn from the intent stage indicates friction at opportunity closed/won (B2B) or pricing (B2C).
- Optimize Marketing Mix: Use multi-touch attribution to credit awareness channels fairly. This prevents cutting essential top-of-funnel budget.
- Increase Retention: Forecast the likelihood of customers moving from Active to Churned. This provides a window to intervene before they leave.

Introduction
In my first article on recommender systems entitled Recommender Systems: Market Basket Analysis & Next-Likely-Purchase in Cross-Sell, I explored Association Rules (Market Basket Analysis), which identifies which products tend to be purchased together in a single transaction. It’s a powerful “snapshot” tool and is built on deep historical data, like a lot of statistical and ML models.
In contrast, one of the most elegant ideas in data science is the Markov property: the future depends only on the present state. It is a seemingly simple philosophy, but it has profound implications. What a customer is doing “now” is a state rich enough to predict what comes next. This is the world of state-based journeys, where Markov Chains quantify how customers move through media, lead funnels, and product purchases over time.
Modern GTM strategies require more than a snapshot; they require a map. To optimize lead pipeline, media mix, and sequential selling, we need to quantify the customer journey over time. One of the most effective techniques for this is the Markov Chain. I was first introduced to the power of this method when a member of my team, Yexiazi (Summer) Song, utilized it to decode the complex sequences of B2B hardware and software sales for a global tech corporation.
GTM Use Cases: Moving Beyond “Last Touch”
Legacy attribution often fails because it looks at touchpoints in a vacuum. As RevSure AI notes, Markov Chain models offer a “full-funnel, unbiased insight” that standard models miss:
“In complex B2B funnels, a lot happens between the first touch and the closed deal. Relying on first- or last-touch attribution misses the rich middle… This leaves marketers knowing what happened but not why—or what to do about it.”
“Markov Chains and Next Best Action,” RevSure AI (2025)
By using Markov Chains, we move from good business intuition to an evidence-based understanding of the Next Best Action.
Methodology: The Power of the “Removal Effect”
At its core, a Markov Chain is a stochastic model named after the Russian mathematician Andrey Markov. It rests on the assumption that the probability of a system moving to a future state depends solely on the current state, not on the sequence of states that came before it. In a marketing context, we define our State Space (S) as the various channels (Email, LinkedIn, Direct Sales) plus three special states: a “Start” state and the two absorbing (terminal) states, “Conversion” and “Churn.”
The real “magic” lies in the Removal Effect. By mathematically “removing” a specific channel from the chain and observing how much the total probability of conversion drops, we can assign a weight (or value) to that channel. Concretely, the removal effect of a channel is one minus the ratio of the conversion probability without that channel to the baseline conversion probability. Normalizing the removal effects across all channels yields the attribution weights.


This calculation allows us to see which touchpoints are the “force multipliers” in the funnel and which are merely noise. Underneath it sits the Markov property itself: the probability of what a customer does next is based on what they are doing now. The chain encodes this in a transition matrix, where each row holds the probabilities of moving from one current state to every possible next state, so each row sums to one.
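To make the transition matrix concrete, here is a minimal sketch of how one could be estimated from observed journeys by counting consecutive state pairs. The journeys and the helper name `transition_matrix` are hypothetical and only illustrate the idea; they are not the data or code behind this article.

```python
from collections import Counter, defaultdict


def transition_matrix(paths):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(Counter)
    for path in paths:
        # Count every consecutive (current state -> next state) pair.
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    matrix = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        # Divide by the row total so each row sums to one.
        matrix[state] = {nxt: n / total for nxt, n in next_counts.items()}
    return matrix


# Hypothetical journeys; channel names are illustrative only.
paths = [
    ["Start", "Email", "LinkedIn", "Conversion"],
    ["Start", "LinkedIn", "Churn"],
    ["Start", "Email", "Direct Sales", "Conversion"],
    ["Start", "Email", "Churn"],
]

P = transition_matrix(paths)
for state, row in P.items():
    print(state, row, "row sum =", round(sum(row.values()), 6))
```

The absorbing states (“Conversion” and “Churn”) have no outgoing rows here because no journey continues past them.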

Applying the Probabilistic Lens to the B2B SFDC Funnel

While we often visualize the B2B funnel as a linear, gravity-fed pipe, the reality within Salesforce is far more probabilistic. In data science, we call this a stochastic process—meaning the path forward isn’t a fixed rail, but a series of possibilities influenced by where the prospect stands today. A prospect doesn’t just slide from Response to Won; they loop back for further discovery, stall in “Nurture” for six months, or skip stages entirely when a champion fast-tracks a deal.
By modeling each SFDC stage as a “state” in a Markov process, we move from simple counting to advanced Probabilistic Attribution:
- The Transition Matrix: We quantify the probability of moving from any stage (e.g., MQL) to any other stage (e.g., SAL or Lost). This reveals the true “leakage” points—such as a 60% drop-off between Sales Acceptance and Qualification—that a standard funnel report often masks.
- The Removal Effect: This is the “killer app” for the B2B marketer. By statistically “removing” a stage from the chain and observing the drop in the final “Won” probability, we can calculate the exact value-add of mid-funnel activities. If removing “Stage 2: Solution Scoping” drops our total win probability by 40%, we have a mathematical mandate to invest in sales engineering and technical content.
- Weighted Forecasting: Instead of applying a flat historical win rate to an opportunity, a Markov approach calculates the Absorption Probability. This tells us the likelihood that an account, given its current state and historical movement patterns, will eventually terminate in a “Closed/Won” state, providing a far more accurate revenue forecast for the CFO.
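The three ideas above can be sketched on a small absorbing chain. The stage names and every probability below are hypothetical, chosen only to show the mechanics (including a loop back to MQL and a fast-track from SAL to Won); the sketch assumes NumPy and a matrix ordered with transient stages first and absorbing states last.

```python
import numpy as np

# Hypothetical stage-to-stage transition probabilities (each row sums to one).
# Stage names and numbers are illustrative, not taken from a real SFDC export.
states = ["MQL", "SAL", "Scoping", "Won", "Lost"]
P = np.array([
    [0.10, 0.50, 0.00, 0.00, 0.40],  # MQL: 10% stall in place, 50% accepted
    [0.10, 0.00, 0.40, 0.05, 0.45],  # SAL: can loop back or fast-track to Won
    [0.00, 0.10, 0.00, 0.50, 0.40],  # Scoping
    [0.00, 0.00, 0.00, 1.00, 0.00],  # Won (absorbing)
    [0.00, 0.00, 0.00, 0.00, 1.00],  # Lost (absorbing)
])
N_TRANSIENT = 3      # MQL, SAL, Scoping
WON, LOST = 3, 4     # indices of the absorbing states


def win_probabilities(P):
    """Probability of eventually ending in Won from each transient stage."""
    Q = P[:N_TRANSIENT, :N_TRANSIENT]   # transient -> transient block
    R = P[:N_TRANSIENT, N_TRANSIENT:]   # transient -> absorbing block
    # Absorption probabilities B solve (I - Q) B = R.
    B = np.linalg.solve(np.eye(N_TRANSIENT) - Q, R)
    return B[:, WON - N_TRANSIENT]


def remove_stage(P, stage):
    """Send every transition into `stage` to Lost instead."""
    P2 = P.copy()
    P2[:, LOST] += P2[:, stage]
    P2[:, stage] = 0.0
    P2[stage, :] = 0.0
    P2[stage, LOST] = 1.0
    return P2


baseline = win_probabilities(P)[0]                         # starting from MQL
without_scoping = win_probabilities(remove_stage(P, 2))[0]
removal_effect = 1 - without_scoping / baseline

for name, p in zip(states, win_probabilities(P)):
    print(f"P(eventually Won | {name}) = {p:.3f}")
print(f"Removal effect of Scoping = {removal_effect:.3f}")
```

Solving the linear system rather than inverting the matrix is a common numerical choice; for a handful of stages either approach would work.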
The Data: Scaling to Real-World Complexity
To demonstrate this, I’m moving away from small transactional samples to a high-resolution eCommerce behavioral dataset consisting of over 285 million user events. A sample is shown below. This clickstream data allows us to model the journey from “View” to “Cart” to “Purchase,” providing the scale necessary to see these transition probabilities in action.
I will use this eCommerce dataset for my primary Markov Chain modeling exercise on product sales, and at the end I will examine using Markov Chains for media mix optimization.
Due to the eCommerce dataset’s size (which tested the limits of my Alienware workstation!), I implemented a chunk-based down-sampling strategy. I processed 100,000-row segments and took a random 1% sample from each to obtain a broadly representative subset that wouldn’t crash the Python kernel.
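A chunk-based down-sampling step of this kind might look like the sketch below, assuming pandas and a CSV source. The function name `sample_in_chunks`, the seed handling, and the demo column names are my assumptions for illustration, not the exact code used here.

```python
import io

import pandas as pd


def sample_in_chunks(csv_source, chunk_rows=100_000, frac=0.01, seed=42):
    """Read a large CSV in chunks and keep a random fraction of each chunk."""
    samples = []
    reader = pd.read_csv(csv_source, chunksize=chunk_rows)
    for i, chunk in enumerate(reader):
        # Offset the seed per chunk so chunks do not reuse the same row positions.
        samples.append(chunk.sample(frac=frac, random_state=seed + i))
    return pd.concat(samples, ignore_index=True)


# Tiny in-memory stand-in for the real file (column names are hypothetical).
demo_csv = io.StringIO(
    "event_type,user_session\n"
    + "\n".join(f"view,s{i}" for i in range(1000))
)
sample = sample_in_chunks(demo_csv, chunk_rows=100, frac=0.1)
print(len(sample), "rows sampled")
```

One design point worth weighing: sampling rows independently can split a session's events apart, so a variant that samples whole session IDs may preserve sequences better for the Markov step.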
High Level EDA – The Funnel Reality

Product volume (below) follows a heavily skewed distribution weighted towards electronics, so we can anticipate that the Markov Chain results will be more reliable in the high-volume categories than in the lower-volume, sparse product categories.

Looking at the raw data, we can calculate the Transition Probabilities for these top categories: given that a customer is in one product category, the probability that their next step is in another. These are an intermediate step in the analysis, so interpret them with caution; they are not the final Attribution Weights used for budget allocation.
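As a rough illustration of how such category-to-category transition probabilities can be derived from clickstream rows, the sketch below pairs each event with the next event in the same session and row-normalizes the counts. The column names (`user_session`, `event_time`, `category`) and the handful of rows are hypothetical.

```python
import pandas as pd

# Hypothetical clickstream rows; real data would have many more columns.
events = pd.DataFrame({
    "user_session": ["a", "a", "a", "b", "b", "c", "c", "c"],
    "event_time": pd.to_datetime([
        "2019-10-01 10:00", "2019-10-01 10:05", "2019-10-01 10:09",
        "2019-10-01 11:00", "2019-10-01 11:02",
        "2019-10-02 09:00", "2019-10-02 09:03", "2019-10-02 09:07",
    ]),
    "category": [
        "smartphone", "headphone", "smartphone",
        "tv", "smartphone",
        "smartphone", "headphone", "headphone",
    ],
})

# Order events within each session, then pair each event with the next one.
events = events.sort_values(["user_session", "event_time"])
events["next_category"] = events.groupby("user_session")["category"].shift(-1)

# The last event of each session has no successor, so drop it.
pairs = events.dropna(subset=["next_category"])

# Row-normalized counts: each row is a category's outgoing probabilities.
transition = pd.crosstab(
    pairs["category"], pairs["next_category"], normalize="index"
)
print(transition)
```

With sparse categories, a row may rest on only a few observed pairs, which is one reason to treat these probabilities cautiously.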

Before building a Markov Chain, we need to see the States. In this dataset those are view, cart and purchase. I’ve seen many funnels like this in B2B marketing, where a high input (leads, unique visitors, responders, etc.) results in relatively few sales. That said, we see a 1.8% conversion rate, which is right at the average for consumer electronics but low for appliances, and there is always room for improvement.

Behavioral Trends: Activity over Time
Which states are most active, and when, so that we can time interventions? This chart shows that a particular October weekend (Sat/Sun) was strongest in terms of views as well as conversions.

Results
After filtering for only the top categories, I ran the Markov Chain and arrived at the following weights. They tell the marketer how much each product category contributes to a final sale, and they can serve as the basis for funding decisions:

Cross-Promotion Opportunities
This is the final step. Now that we have the weights, we can compare each key category’s contribution to the funnel with its last-touch sales. If a category has a high weight but low direct sales, it is helping mid-funnel, a contribution that first- or last-touch attribution would miss:

As a marketer, I would bundle smartphone promotions with headphones, and follow up every TV, clock and refrigerator sale with a mobile phone offering. Ideally, TVs, clocks, refrigerators and headphones can be moved to the right-hand quadrant to become high-volume, high-impact drivers by shifting marketing investment from the low-impact products to the mid-funnel products. That should yield higher conversion rates and revenue.
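The quadrant logic can be sketched as a simple comparison of each category's Markov weight with its last-touch share. All numbers, the equal-share cutoff, and the quadrant labels below are hypothetical placeholders, not the results from this analysis.

```python
# Hypothetical inputs: markov_weight is a category's share of the Markov
# attribution, last_touch_share is its share of last-touch sales.
categories = {
    "smartphone":   {"markov_weight": 0.40, "last_touch_share": 0.60},
    "headphone":    {"markov_weight": 0.25, "last_touch_share": 0.10},
    "tv":           {"markov_weight": 0.22, "last_touch_share": 0.12},
    "refrigerator": {"markov_weight": 0.08, "last_touch_share": 0.08},
    "clock":        {"markov_weight": 0.05, "last_touch_share": 0.10},
}

# Assumed cutoff: the share each category would hold if all were equal.
cutoff = 1 / len(categories)


def classify(markov_weight, last_touch_share, cutoff):
    """Place a category in one of four quadrants."""
    high_weight = markov_weight >= cutoff
    high_direct = last_touch_share >= cutoff
    if high_weight and high_direct:
        return "core driver"
    if high_weight:
        return "mid-funnel assist"   # contributes, but rarely closes
    if high_direct:
        return "closer"              # closes, but contributes little upstream
    return "low impact"


labels = {}
for name, v in categories.items():
    labels[name] = classify(v["markov_weight"], v["last_touch_share"], cutoff)
    # Ratio above 1 means the Markov view credits more than last touch does.
    ratio = v["markov_weight"] / v["last_touch_share"]
    print(f"{name:<13} {labels[name]:<18} markov/last-touch = {ratio:.2f}")
```

Categories labeled “mid-funnel assist” are the ones a last-touch report would undervalue.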
Media Mix Optimization
We can use the same approach to optimize the media mix. Here I took the UCI bank telemarketing campaign dataset from my article The Multi-Channel Force Multiplier: How Bridging Digital Nurture and Direct Outreach Triples Conversion Lift, which initially used association rules.
Based on the dataset, the attributed conversions for each touchpoint/channel are as follows; these can be converted to weights for budget planning:
- Cellular: ~3,484 conversions
- Brand Awareness: ~579 conversions
- Telemarketing: ~311 conversions
- Email: ~218 conversions
- New Product Launch: ~179 conversions
Total conversions in the dataset: 5,289 (cases where y = 'yes').
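Converting these attributed conversions into budget weights is a matter of normalization. The sketch below uses the approximate figures listed above and, as an assumption, normalizes by their sum rather than by the 5,289 total; the 100,000 budget and the `allocate` helper are hypothetical.

```python
# Approximate attributed conversions, as listed above.
attributed = {
    "Cellular": 3484,
    "Brand Awareness": 579,
    "Telemarketing": 311,
    "Email": 218,
    "New Product Launch": 179,
}

# Assumption: weights are normalized over the listed channels only.
total_attributed = sum(attributed.values())
weights = {ch: n / total_attributed for ch, n in attributed.items()}


def allocate(budget, weights):
    """Split a budget across channels in proportion to their weights."""
    return {ch: round(budget * w, 2) for ch, w in weights.items()}


plan = allocate(100_000, weights)  # hypothetical budget
for ch in attributed:
    print(f"{ch:<20} weight = {weights[ch]:.1%}  budget = {plan[ch]:,.2f}")
```

A proportional split is only a starting point; in practice a planner would likely apply floors, caps, or diminishing-returns adjustments on top of these weights.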

By taking the ratio of Markov Weight to Last-Touch Weight, we can find the channels that contribute most to conversion from the middle of the media mix: Brand Awareness, New Product Launch, and Email (below):


Summary

The primary advantage that Markov Chains have over Association Rules Mining is the ability to capture the chronological sequence of events. While Association Rules give us a powerful “snapshot” of what happens together in a single transaction, a Markov Chain uses time-stamped, sessionized data to map how one event leads to the next over time.
The real payoff is in the Attribution Weights: the model provides a quantitative justification for budget allocation. It can show, for instance, why you might put 12% of your budget into TV promotions based on their mid-funnel contribution rather than just their last-click performance. I am still a professional fan of Association Rules for their simplicity and ease of explanation, but the predictive flow of a Markov Chain has revealed strategic capabilities and a level of visibility into the customer journey that a static snapshot simply cannot match.
Key GTM Takeaways
- Sequence Matters: Don’t just look at what was bought; look at the path taken to get there.
- Quantify the “Middle”: Use the Removal Effect to stop guessing which mid-funnel activities actually drive revenue.
- Probabilistic Forecasting: Move away from flat win-rates and toward absorption probabilities for a more accurate SFDC pipeline.
References
- Kakalejčík, L., et al. (2018): Multichannel Marketing Attribution Using Markov Chains. Journal of Applied Management and Investments.
- Anderl, E., et al. (2014): Mapping the Customer Journey with Markov Chains.
- RevSure AI (2025): Markov Chains and Next Best Action: The Future of Conversion Optimization.
Technical Keywords & Methodology Index
Methodology & Strategy: Markov Chain Modeling, Stochastic Process Mapping, Removal Effect Analysis, Probabilistic Attribution, Full-Funnel GTM Orchestration, Next Best Action (NBA) Framework.
Statistical Concepts: Transition Matrix, Absorption Probability, State Space (S) Definition, Non-Linear Journey Analytics, Funnel Leakage Diagnostics, Sequential Event Analysis.
Data Engineering & Revenue Operations: Clickstream Data Processing, Media Mix Optimization (MMO), Attribution Weights, Multi-Touch Attribution, Lead Lifecycle Quantization, Conversion Lift Analysis.
Python Libraries & Documentation
For analysts and engineers looking to reproduce this methodology, the stack leverages a sequence-focused pipeline. The following table maps the tools used in this analysis to their specific strategic role in the marketing science workflow.
| Library | Role in Pipeline | Strategic Purpose |
| --- | --- | --- |
| Anaconda / Jupyter | Environment & Execution | Provides a reproducible, modular environment for collaborative research and rapid iteration of Markov models. |
| Pandas & NumPy | Data Manipulation | Handles high-resolution clickstream data (285M+ events) and facilitates the matrix algebra required for transition probabilities. |
| Seaborn & Matplotlib | Visualization | Transforms raw transition weights into heatmaps, enabling stakeholders to visualize funnel leakage and conversion paths. |
| Statsmodels | Statistical Modeling | Provides the rigorous testing needed to validate the statistical significance of funnel weights. |
| Kaleido | High-Res Output | Ensures complex funnel visualizations are exportable at high resolutions for enterprise reporting and executive presentations. |
