Why Statistical and ML Forecasting Matters
- Blend Intuition with Math: Traditional “black box” models often fail because they ignore human context. A hybrid approach integrates sales pipeline data (human intuition) with machine learning to create a forecast that leadership can actually trust.
- Solve the Data Integrity Gap: Sparse or low-quality data doesn’t have to break your model. By focusing on outlier management and data engineering, you can transform poor-quality CRM records into a reliable foundation for prediction.
- The Champion-Challenger Method: Don’t rely on a single algorithm. Use a “shootout” between multiple methodologies to identify which model performs best for your specific business cycle and customer behavior.
- Bridge the Trust Barrier: The primary obstacle to ML adoption is lack of visibility. Hybrid architectures provide the transparency stakeholders need by showing exactly how human inputs and statistical trends combine to drive the final number.

Introduction
I am always surprised when I join a company to find that the GTM and Finance functions are still relying solely on Excel spreadsheets and field sales “expert opinion” (the Delphi Method) for forecasting. This persists despite the wealth of statistical, machine-learning, and deep-learning methods available today. Often, this reliance stems from a lack of trust in “black box” technologies; if leadership doesn’t understand the path a model takes from Point A to Point B, their hesitation is understandable—especially when financial performance and board-level predictions are on the line.
However, transitioning to modern forecasting doesn’t require abandoning qualitative insights. In fact, expert human opinion can be directly integrated into ML and statistical frameworks to improve accuracy (e.g., using field sales input to refine deal-size estimates within CRM data like Salesforce). In this article, I examine three techniques my teams and I have used, ranging from traditional statistical modeling to Deep Learning.
On a related note, as modelers, it can be tempting to pre-select a favorite technique and assume it is the best fit for the problem. However, I was trained to always evaluate at least three distinct approaches to identify which produces the highest predictive accuracy with the lowest error.
In this pursuit, statisticians often follow Occam’s Razor (the principle of parsimony): the philosophical rule that when competing models explain data equally well, the simplest model should be preferred. In this case, that would be a relatively simple univariate time series. However, when encountering significant unexplained variance, additional predictors—exogenous variables—are required. While a model like SARIMAX can outperform a standard ARIMA by accounting for these external factors, every additional variable risks increasing model complexity and error if not managed correctly. Similarly, while neural networks rule the world of Big Data, they can significantly underperform if the dataset is too small.
The Models
Rather than choosing a winner in advance, I run a three-model shoot-out and let hold-out accuracy decide which model best fits my selected business case.

1. SARIMA: The Statistical Foundation
This is the traditional statistical method. It looks strictly at the history of a single variable (here, Won Revenue) for prediction. It models the series through its past values (Autoregressive), differencing to remove trend (Integrated), its past errors (Moving Average), and its repeating cycles (Seasonality).
2. SARIMAX: The Context-Aware Statistical Model
The “X” stands for eXogenous variables. SARIMAX builds on SARIMA by adding external predictors to explain variation that the calendar alone cannot: the sales account managers’ Forecasted Revenue, marketing spend, or economic indicators. In effect, it combines time series modeling with linear regression.
3. LSTM: The Deep Learning Memory Model
As François Chollet explains in Deep Learning with Python, LSTMs are a specialized type of Recurrent Neural Network (RNN). While a simple RNN tends to forget the beginning of a sequence by the time it reaches the end, an LSTM has a carry (the Cell State), a way to keep track of long-term dependencies. Unlike SARIMA, which uses fixed formulas, the LSTM learns its own features through layers of neurons. The name Long Short-Term Memory reflects this design: learned gates decide at each step what to write to the carry, what to discard, and what to output, so that short-term memory can persist across long sequences.
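To make the carry concrete, here is a minimal NumPy sketch of a single LSTM cell stepped across a 14-day window. The weights are random rather than trained, and the names (`lstm_step`, `W`, `U`, `b`), the hidden size of 8, and the two input features are illustrative assumptions, not the network used for the results in this article.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """Advance one time step; returns the new hidden state and the new carry."""
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b            # four gate pre-activations, stacked
    i = sigmoid(z[0:hidden])                # input gate: how much new info to write
    f = sigmoid(z[hidden:2 * hidden])       # forget gate: how much carry to keep
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much carry to expose
    g = np.tanh(z[3 * hidden:])             # candidate values to write
    c_t = f * c_prev + i * g                # updated carry (cell state)
    h_t = o * np.tanh(c_t)                  # updated hidden state (the output)
    return h_t, c_t

rng = np.random.default_rng(0)
n_features, hidden = 2, 8                   # assumed: two scaled inputs per day
W = rng.normal(0, 0.1, size=(4 * hidden, n_features))
U = rng.normal(0, 0.1, size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)

window = rng.normal(size=(14, n_features))  # stand-in for 14 days of scaled inputs
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in window:
    h, c = lstm_step(x_t, h, c, W, U, b)

print(h.shape, c.shape)
```

The line `c_t = f * c_prev + i * g` is the whole idea: when the forget gate stays near 1, information written early in the window survives to the final step. In practice a framework layer would replace this hand-rolled loop.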

The Dataset and Exploratory Data Analysis (EDA)
Sparse Data Cannot Be Used For Forecasting
My first attempt to do a forecasting technique shoot-out leveraged an SFDC/CRM-style dataset (Source: Chioma Iwuchukwu – Sales Funnel Revenue Forecast). The raw data captured roughly 100 days of activity starting January 1st. While this seed data provided a realistic snapshot of B2B deal flow, it presented a cold-start problem: with only ~18 actual “Won Revenue” events, the data was too sparse for deep learning models to distinguish between a recurring pattern and random luck.

Sometimes the data is so sparse that it cannot support a forecast. Despite my best efforts, this was as close as I could get with an ultra-sparse n=18 dataset, which looked like this:

As a result, my forecasts were way off the mark:

In its original state, the data provided a snapshot of performance but was limited in volume, making it difficult for deep learning models like LSTMs to generalize without overfitting. A review of the summary statistics (df.describe()) reveals a high standard deviation (275% of the mean for Won Revenue) and an enormous range (0 to $46K for Won/Loss) across all key variables. In a B2B context, this indicates a high-variance environment where individual large deals can significantly swing daily totals. Another indicator was the fact that the mean for Won Revenue was $3,986 while the median was $0.
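The same diagnostics can be reproduced on any daily series. The sketch below uses made-up numbers (100 days with 18 randomly placed deal days and log-normal deal sizes), not the original CRM data, to show why a zero median and a standard deviation well above the mean appear together in a sparse revenue series.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical sparse series: 100 days, only 18 of them with won revenue.
won = np.zeros(100)
deal_days = rng.choice(100, size=18, replace=False)
won[deal_days] = rng.lognormal(mean=9.5, sigma=1.0, size=18)

stats = pd.Series(won, name="won_revenue").describe()
cv = stats["std"] / stats["mean"]   # coefficient of variation (std as a share of mean)
median = stats["50%"]

print(stats.round(0))
print(f"std / mean = {cv:.2f}, median = {median:.0f}")
```

With 82 of 100 days at zero, the median is zero by construction, and the standard deviation exceeds twice the mean no matter how the deal sizes are distributed. That combination is a quick signal that the series is too sparse to model day by day.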

The Augmented Dataset (n=200): Scaling for Deep Learning
To compare the techniques, I had to clean up and rebuild the n=100 dataset. I used synthetic data augmentation to expand it to 200 daily observations that keep the underlying structure of the initial data (its weekly cadence and growth) while removing the extreme volatility. This was a deliberate data engineering step to create a robust training environment for the LSTM neural network. The augmentation was calibrated to mirror the original B2B cycle mathematically:
- Deterministic Trend: I preserved the original growth trajectory, ensuring the models evaluate a business that is scaling.
- Seasonal Harmonic: Using a sinusoidal function, I reinforced the 7-day weekly cycle. This captures the essential B2B weekend dip and mid-week peak patterns (shown below).
- Stochastic Noise: I injected Gaussian Noise (random variance) to simulate market volatility. This forces the LSTM to learn signal over noise—distinguishing between a structural shift and daily chatter.
- Lookback Depth: Doubling the volume allowed for a 14-day sliding window. This gives the neural network enough temporal depth to learn from two full business cycles before making its next prediction.
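The four ingredients above can be sketched in a few lines of NumPy. The parameter values (`base`, `daily_growth`, `weekly_amp`, `noise_sd`, `seed`) and the function names are hypothetical placeholders chosen for illustration; they are not the calibration used for the article's dataset.

```python
import numpy as np

def make_synthetic_revenue(n_days=200, base=4500.0, daily_growth=7.5,
                           weekly_amp=600.0, noise_sd=400.0, seed=42):
    """Trend + 7-day sinusoid + Gaussian noise, one value per day."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_days)
    trend = base + daily_growth * t                    # deterministic growth
    seasonal = weekly_amp * np.sin(2 * np.pi * t / 7)  # weekly harmonic
    noise = rng.normal(0.0, noise_sd, size=n_days)     # stochastic volatility
    return trend + seasonal + noise

def make_windows(series, lookback=14):
    """Turn a 1-D series into (samples, lookback) inputs and next-day targets."""
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X, y

revenue = make_synthetic_revenue()
X, y = make_windows(revenue, lookback=14)
print(X.shape, y.shape)
```

Each row of `X` holds 14 consecutive days and the matching entry of `y` is the day that follows, so 200 observations yield 186 training samples. With only 100 observations the same window would leave 86, which is part of why the extra volume matters for the LSTM.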
Analysis of Variability and Decomposition
For the new dataset, things look much more promising. I removed the extreme observations (outliers) because the three $45,000 observations alone were adding $1,350 to the daily average (3 × $45,000 spread over roughly 100 days). The standard deviation is now about 0.19 times the mean (a coefficient of variation of roughly 19%, down from 275%) because the maximum is much lower and the min-to-max range is much tighter. Another check showed that the median was $5,236, very close to the $5,249 mean, which indicates good symmetry and low skewness. The data is ready to use.
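The arithmetic behind these checks is short enough to verify directly. The sketch assumes the original dataset spans about 100 days and reads the 0.19 figure as the ratio of standard deviation to mean; both are interpretations of the text above rather than values taken from the underlying notebook.

```python
# Lift that three outliers add to the daily average (assumes ~100 days of data).
n_days = 100
outlier_lift = 3 * 45_000 / n_days

# Checks on the augmented dataset, using the mean and median quoted above.
mean_rev, median_rev = 5_249, 5_236
cv = 0.19                                   # assumed: std divided by mean
implied_std = cv * mean_rev                 # standard deviation implied by that ratio
gap_pct = 100 * (mean_rev - median_rev) / mean_rev   # mean-median gap, % of mean

print(f"outlier lift on daily mean: ${outlier_lift:,.0f}")
print(f"implied std: ${implied_std:,.0f}")
print(f"mean-median gap: {gap_pct:.2f}% of the mean")
```

A mean-median gap of a fraction of a percent is what a roughly symmetric distribution looks like, in contrast to the original data where the mean was $3,986 and the median was $0.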

Here are some bar charts to show the daily trends, which are not highly variable but do show the weekend sales dip:

Typically, Time Series Decomposition (as shown in the chart below) allows us to strip away the noise to see the underlying mechanics:
- Trend: A clear, 15% positive slope indicating long-term growth.
- Seasonality: A heavy daily/weekly influence.
- Residuals: A significant amount of randomness.
Because of this high residual noise, a univariate time series model such as SARIMA, which only looks at past values of the series, is likely to be insufficient. This is why I use SARIMAX to incorporate Forecasted Revenue as an exogenous variable, and an LSTM to capture non-linear relationships that traditional statistics might miss. Forecasted Revenue comes from the field sales pipeline; it is opinion based, so including it blends business intuition with machine learning.
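For readers who want to see the mechanics of the decomposition, here is a hand-rolled additive version in NumPy: a centered 7-day moving average for the trend, day-of-cycle averages for the seasonality, and whatever is left as the residual. It is a simplified sketch on synthetic data with assumed parameters; a library routine such as the Statsmodels seasonal decomposition would normally be used instead.

```python
import numpy as np

def decompose_additive(y, period=7):
    """Classical additive decomposition for an odd period: y = trend + seasonal + resid."""
    y = np.asarray(y, dtype=float)
    n, half = len(y), period // 2
    # Centered moving average; undefined for the first and last `half` points.
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, np.ones(period) / period, mode="valid")
    detrended = y - trend
    # Average detrended value at each position in the cycle, centered on zero.
    cycle = np.array([np.nanmean(detrended[k::period]) for k in range(period)])
    cycle -= cycle.mean()
    seasonal = cycle[np.arange(n) % period]
    resid = y - trend - seasonal
    return trend, seasonal, resid

# Synthetic stand-in series (assumed parameters, not the article's data).
rng = np.random.default_rng(1)
t = np.arange(200)
y = 4500 + 7.5 * t + 600 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 400, 200)

trend, seasonal, resid = decompose_additive(y, period=7)
noise_share = np.nanvar(resid) / np.var(y)
print(f"residual share of total variance: {noise_share:.2f}")
```

The residual share of variance is a rough gauge of how much is left for exogenous predictors to explain once trend and weekly seasonality are accounted for.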
By increasing the density of the dataset, I have smoothed the influence of extreme outliers, ensuring that the resulting forecast is a reflection of systemic performance rather than a reaction to a few bad days.

Forecasting Model Results
The train/test split is controlled by the test size = 21 setting. In time-series forecasting, we typically hold out the most recent observations as a test set to validate how well the models perform against real data.
- Total Dataset (n=200): Since the frequency is daily (freq='D'), the dataset covers approximately 6.5 months (from January 1st to mid-July).
- Training Period: The first 179 days.
- Forecast (Test) Period: The final 21 days.
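The split described above, plus a simple way to score a forecast on the hold-out, can be sketched as follows. The calendar year, the synthetic series, the seasonal-naive placeholder forecast, and the choice of MAE, RMSE, and MAPE as metrics are all assumptions made for illustration; the article does not specify which accuracy measures were used.

```python
import numpy as np
import pandas as pd

TEST_SIZE = 21

# Synthetic daily series on a 200-day calendar (the year is an assumption).
rng = np.random.default_rng(3)
dates = pd.date_range("2024-01-01", periods=200, freq="D")
t = np.arange(200)
values = 4500 + 7.5 * t + 600 * np.sin(2 * np.pi * t / 7) + rng.normal(0, 400, 200)
revenue = pd.Series(values, index=dates, name="won_revenue")

# Chronological split: never shuffle a time series.
train, test = revenue.iloc[:-TEST_SIZE], revenue.iloc[-TEST_SIZE:]

def score(actual, predicted):
    """Common point-forecast error metrics (MAPE requires non-zero actuals)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = predicted - actual
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE_%": float(100 * np.mean(np.abs(err / actual))),
    }

# Placeholder challenger: repeat the last observed week across the test period.
last_week = train.to_numpy()[-7:]
naive_forecast = np.tile(last_week, TEST_SIZE // 7 + 1)[:TEST_SIZE]

print(len(train), len(test), test.index[0].date(), test.index[-1].date())
print(score(test, naive_forecast))
```

Scoring every candidate model with the same function on the same 21 days is what makes the shoot-out a fair comparison; a naive baseline like this one also gives a floor that any serious model should beat.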

Based on the measures of predictive accuracy (below), SARIMAX is the winner, followed closely by LSTM. This is not surprising: the decomposition showed that randomness (factors beyond trend and seasonality) was driving a large share of sales, and these two techniques can explain some of that variation because they incorporate human opinion (similar to the Delphi Method) through the sales pipeline data. Sales account managers know things that are out of the model’s sight: whether a key contact just quit (or joined) a company, competitor pricing and promotions, new product launch timing, and so on. The forecast now includes this information.
The code behind these results implements a hybrid forecasting strategy: it compares a traditional statistical SARIMAX model, which leverages exogenous variables such as forecasted revenue, against a deep learning LSTM network that captures complex, non-linear sequential patterns. The data is prepared as scaled sliding-window sequences for the LSTM, which lets you contrast statistical time-series forecasting with neural network-based predictive modeling and see which improves the accuracy of your revenue projections.

The fact that the SARIMAX model clearly outperformed the SARIMA model shows that, in this case at least, human opinion can add a great deal to machine learning models. For this dataset (developed using past historical forecast and actual data), forecasted and actual revenue are highly correlated:


So, humans still have a place in the world of AI!
Citations
- Chollet, F. (2021). Deep Learning with Python (2nd ed.). Manning Publications. (For logic regarding LSTM architecture and temporal representations).
- Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering. (For data visualization and time-series plotting).
- Harris, C. R., et al. (2020). Array programming with NumPy. Nature. (For synthetic data generation and mathematical arrays).
- Iwuchukwu, C. (2024). Sales Funnel Revenue Forecast [Dataset]. GitHub/Personal Collection. Augmented and expanded to n=200 using Python synthetic generation techniques (2024).
- Miller, T. W. (2015). Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python. Pearson Education. (For the application of SARIMA/SARIMAX in a business revenue context).
- Pandas Development Team. (2023). pandas-dev/pandas: Pandas 2.0.0. Zenodo. (For data manipulation and time-series resampling).
- Seabold, S., & Perktold, J. (2010). Statsmodels: Econometric and Statistical Modeling with Python. Proceedings of the 9th Python in Science Conference. (For SARIMA and Seasonal Decomposition implementation).
- Synthetic Revenue Dataset. (2024). Enlarged B2B Sales Funnel Forecast Data [Generated Dataset]. (Derived from original Sales Funnel Revenue patterns using Python-based augmentation).
Technical Keywords & Methodology Index
Forecasting & Deep Learning Frameworks: SARIMA (Seasonal Autoregressive Integrated Moving Average), SARIMAX (SARIMA with Exogenous Variables), LSTM (Long Short-Term Memory Networks), Recurrent Neural Networks (RNN), Multivariate Time Series Analysis.
Methodology & RevOps Strategy: Hybrid Forecasting Architecture, Delphi Method (Expert Opinion Integration), Champion-Challenger Method, Sales Pipeline/CRM Integration, Statistical Parsimony (Occam’s Razor).
Data Engineering & Diagnostics: Synthetic Data Augmentation, Outlier Management, Time Series Decomposition, Gaussian Noise Injection, Lookback Depth/Windowing, Stochastic Process Modeling.
Systems Architecture & Logic: Black Box Model Interpretability, Exogenous Variable Impact Analysis, Temporal Dependency Tracking, Predictive Accuracy Benchmarking.
Python Libraries & Documentation
For data scientists and revenue operations engineers looking to replicate this hybrid architecture, the following stack utilizes specialized Python utilities to bridge the gap between human intuition and machine precision.
| Library | Role in Pipeline | Strategic Purpose |
| --- | --- | --- |
| Statsmodels | Statistical Modeling | Implements SARIMA, SARIMAX, and time-series decomposition to establish the statistical baseline. |
| NumPy | Mathematical Computation | Enables synthetic data generation and the high-performance array programming required for model training. |
| Pandas | Data Wrangling | Manages complex time-series resampling and the cleaning of sparse CRM datasets. |
| Matplotlib | Visualization | Produces time-series plots and funnel decomposition charts necessary for stakeholder trust and visibility. |
